Parlia-metrics!

Team Computational Anti-Social Scientists

The Indian Lok Sabha debates are a rich source of political discourse, reflecting the complex and evolving nature of Indian democracy. These debates, which occur during the sessions of the Lok Sabha, offer a window into the key issues and concerns of Indian politics, as well as the rhetorical strategies and linguistic patterns of various political actors.

In this project, we use computational social science methods to analyze and model the proceedings of the Lok Sabha debates.


Goal


The primary goal of this project is to draw insights from the transcripts of Lok Sabha debates, a large body of public-domain Indian data. While datasets such as the Hindi Legal Documents Corpus (HLDC) and Indian Legal Documents Corpus (ILDC) have been explored and published, the parliamentary debates, despite being public, remain largely untapped in terms of comprehensive analysis and exploration.

To begin with, we extracted, cleaned, and processed the textual data from a single Lok Sabha session. This initial step was crucial in ensuring that the data was in a format suitable for further analysis. Using text mining and natural language processing techniques, we converted the unstructured text into a structured and organized format.


Once the data was prepared, we conducted exploratory data analysis. This phase involved examining the data to identify patterns, trends, and key characteristics, such as how interpretable the data is and how difficult it is to work with. Key analyses included speaker frequency counts to determine the most active participants in the debates, and identification of the topics that dominate the discussions. Sentiment analysis was also conducted to understand the emotional context and tone of the debates.


Furthermore, we analyzed the topics discussed by each speaker, allowing for a deeper understanding of individual contributions. By identifying the most prominent themes associated with each speaker, we aimed to shed light on the diverse perspectives and interests within the Lok Sabha.


To make the findings easier to interpret and access, we used visualizations such as plots, heatmaps, and interactive graphs in the report video and poster presentation. These visualizations provide an overview of the data and make it easier to identify trends and patterns. Towards the end of this project, we also explored scaling up the analyses to a larger dataset, enabling a broader exploration of the Lok Sabha debates over time.

Methodology

We began by scraping the necessary PDFs and converting them to text files for easy access and analysis. To ensure accuracy, we then cleaned up the text by removing the excessive newlines and other formatting inconsistencies.
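The cleanup step can be sketched roughly as follows. This is a minimal illustration, not our exact pipeline: the function name `clean_transcript` is hypothetical, and it assumes that a blank line marks a paragraph break while a single newline is only a soft wrap left over from the PDF layout.

```python
import re

def clean_transcript(raw: str) -> str:
    """Normalize whitespace in text extracted from a debate PDF.

    Assumption: two or more newlines separate paragraphs; a single
    newline is a soft line wrap introduced by the PDF conversion.
    """
    # Normalize Windows and old Mac line endings
    text = raw.replace("\r\n", "\n").replace("\r", "\n")
    # Collapse runs of spaces and tabs into one space
    text = re.sub(r"[ \t]+", " ", text)
    # Drop stray spaces on either side of a newline
    text = re.sub(r" ?\n ?", "\n", text)
    # Split on paragraph breaks, then unwrap soft line breaks
    paragraphs = re.split(r"\n{2,}", text)
    paragraphs = [p.replace("\n", " ").strip() for p in paragraphs]
    return "\n\n".join(p for p in paragraphs if p)
```

Calling `clean_transcript` on the text of each converted PDF yields one paragraph per block, which makes the later pattern matching simpler.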


Next, we used regular expressions to extract the speeches made by individual speakers, which we stored in a structured database. To avoid errors due to spelling and notation inconsistencies, we implemented partial (fuzzy) matching of speaker names. This allowed us to capture the speeches made in the Lok Sabha and match each one to its speaker, even when the speaker's name or the notation of the speech varied.
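A simplified sketch of this step is shown below. The pattern `SPEAKER_RE` is an assumption about how a turn is marked (an honorific, a name in capitals, an optional constituency in parentheses, then a colon); the real transcripts have more variants. The helper names `split_speeches` and `canonical_name` are hypothetical, and the standard library's `difflib` stands in for whichever partial matching method is used in practice.

```python
import re
from difflib import get_close_matches

# Assumed turn marker: honorific + capitalised name + optional (SEAT) + colon
SPEAKER_RE = re.compile(
    r"^(?P<name>(?:SHRI|SHRIMATI|DR\.|PROF\.|KUMARI) [A-Z .]+?)"
    r"(?: \((?P<seat>[A-Z ]+)\))?:\s*",
    re.MULTILINE,
)

def split_speeches(text):
    """Return (raw_speaker_name, speech_text) pairs in order of appearance."""
    matches = list(SPEAKER_RE.finditer(text))
    speeches = []
    for i, m in enumerate(matches):
        # A speech runs until the next speaker marker, or the end of the text
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        speeches.append((m.group("name").strip(), text[m.end():end].strip()))
    return speeches

def canonical_name(raw, known, cutoff=0.8):
    """Map a possibly misspelled name to the closest already-seen name.

    `known` is mutated: a name with no close match is appended to it.
    The 0.8 cutoff is an illustrative value, not a tuned one.
    """
    hit = get_close_matches(raw, known, n=1, cutoff=cutoff)
    if hit:
        return hit[0]
    known.append(raw)
    return raw
```

With this approach, a misspelled "SHRI SHASHI THAROR" is folded into an earlier "SHRI SHASHI THAROOR" instead of being counted as a new speaker. A cutoff that is too low would merge distinct MPs with similar names, so the threshold needs checking against the actual member list.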


Our analysis involved a detailed study of speaker counts and verbosity, sentiments expressed by the speakers, the temporal distribution of topics discussed, and topics discussed on a per-speaker basis. We also examined the manifestos of the BJP and Congress parties to see if their Lok Sabha speeches aligned with their promised agendas.
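As an illustration of the speaker count and verbosity measures, the sketch below counts turns and words per speaker. It assumes speeches are available as `(speaker, text)` pairs and uses whitespace-separated tokens as a rough word count; the function name `speaker_stats` is hypothetical.

```python
from collections import Counter

def speaker_stats(speeches):
    """Summarize activity per speaker.

    speeches: iterable of (speaker, text) pairs.
    Returns {speaker: {"turns", "words", "words_per_turn"}}.
    """
    turns = Counter()
    words = Counter()
    for speaker, text in speeches:
        turns[speaker] += 1
        # Whitespace tokenization is a crude but serviceable word count
        words[speaker] += len(text.split())
    return {
        s: {
            "turns": turns[s],
            "words": words[s],
            "words_per_turn": words[s] / turns[s],
        }
        for s in turns
    }
```

Sorting the result by `turns` gives the most active participants, while `words_per_turn` separates members who speak often from those who speak at length.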


To extend our findings, we expanded our study to include one year's worth of debates. We then drew subjective inferences from the results and related them to real-world incidents to gain a deeper understanding of the underlying trends. It is worth mentioning that we omitted the sentiment scores from the final version of our analysis, as they did not reveal any tangible or insightful patterns. We also performed and demonstrated Named Entity Recognition during the initial phases to search for speakers and topics, but eventually replaced it with a combination of zero-shot topic modelling and regular expression matching.


Results


The topics of discussion varied significantly from session to session, with the focus shifting depending on current events in the country. For example, in the first month of the budget session (the first session held since the repeal of the farm laws), the prominent topics were farmers, defense, and technology; the focus then gradually shifted to roads, railways, and fiscal matters. This finding suggests that Lok Sabha debates are responsive to current events and provide a valuable source of insight into the political priorities of the country's leaders.
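A shift like this can be read off a month-by-month tally of topic labels. The sketch below assumes each speech has already been assigned a date and a single topic label; the function name `topics_by_month` and any dates or labels passed to it are illustrative, not our actual results.

```python
from collections import Counter, defaultdict
from datetime import date

def topics_by_month(labelled):
    """Tally topic labels per calendar month.

    labelled: iterable of (datetime.date, topic) pairs.
    Returns {"YYYY-MM": Counter({topic: count})}.
    """
    buckets = defaultdict(Counter)
    for day, topic in labelled:
        buckets[day.strftime("%Y-%m")][topic] += 1
    return dict(buckets)

# Made-up example input, for illustration only
example = [
    (date(2000, 1, 5), "farmers"),
    (date(2000, 1, 9), "farmers"),
    (date(2000, 2, 3), "railways"),
]
monthly = topics_by_month(example)
```

Calling `monthly["2000-01"].most_common(3)` then lists the dominant topics for that month, which is the kind of view a heatmap of topics over time is built from.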


Second, the analysis revealed a difference in the language and themes used by the two major political parties in India - the Bharatiya Janata Party (BJP) and the Indian National Congress (INC). The BJP focused heavily on nationalism, security, and work-related issues, while Congress spoke more about promises, rights, equality, and women's issues. This finding matches the expectation that the two parties have distinct political ideologies and campaign strategies, with the BJP emphasizing its role as a strong nationalist government and Congress focusing on issues related to social justice and equality.


However, this difference in language and themes may also be influenced by the parties' positions of power. The BJP is currently in power and, as such, may prioritize issues related to national security and economic development. Congress, on the other hand, is in the opposition and may prioritize highlighting issues where the ruling party is weak, and showcasing its concern for social justice and women's rights, to win public support.


We also looked for overlaps between party ideologies/manifestos and the priorities of party MPs by comparing the topic modelling results from the manifestos to those from the speeches made by party members within parliament. Here, we found some overlap: Nirmala Sitharaman’s speeches often dealt with national matters (alongside the fiscal matters one would expect, as she is the Finance Minister). In contrast, most of the speeches by MPs like Nitin Gadkari concerned their own ministries (roadways, in his case).
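One simple way to put a number on such overlap is the Jaccard similarity between the set of topics found in a manifesto and the set found in a speaker's speeches. This is offered as an illustrative sketch and may differ from how the comparison was actually made; the topic labels in the example are placeholders.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two collections of labels."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # convention: two empty sets have no measurable overlap
    return len(a & b) / len(a | b)

# Placeholder topic sets, not real results
manifesto_topics = {"security", "economy", "roads"}
speech_topics = {"economy", "roads", "rights", "women"}
overlap = jaccard(manifesto_topics, speech_topics)
```

A score near 1 would mean a speaker's topics closely track the manifesto, while a score near 0 would mean they mostly speak on other matters. Because the score ignores how often each topic comes up, it is best read alongside the per-speaker topic counts.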


On the flip side, we saw opposition MPs raise regional matters in parliament, a pattern that was less pronounced in BJP leaders’ speeches. Examples include Mahua Moitra bringing up the Naxalite movement, Shashi Tharoor frequently speaking about the state affairs of Kerala, and an Assamese Congress MP bringing up tribal issues in his state. All of these are inferences we drew from the results of topic modelling on a per-speaker basis.


Through this project, we successfully explored Indian Lok Sabha debates as a dataset of scholarly interest, conducted exploratory data analyses to understand the data, and made some insightful inferences about the contents of the debates. We presented this research in an accessible manner to dozens of people during a public poster presentation, recorded our progress in a video and also documented it as this blog post.

