Controversy in Cinema: Exploring Polarization in Movie Ratings

Movies have been a source of entertainment for many generations. From the early silent films to the latest blockbusters, cinema has always been a reflection of our society. But with the advent of online reviews and ratings, it has become easier than ever to gauge the public's opinion on movies. In this post, we explore how polarization in movie reviews and ratings can help us identify controversial movies.

The Dataset

Our dataset consists of two main files: mov_reviews.csv and Mov_metadata.csv. The first file contains movie reviews from IMDb, including the movie ID, user ID, and the rating given by the user. The second file contains movie metadata from TMDB, including information such as the movie's genre, box office income, cast, language, runtime, and year of release.

Defining Controversy


Defining controversy in movies is a complex task that requires analyzing various factors, such as plot, characters, themes, and presentation. One approach to measuring controversy in movies is to look at the polarization in movie reviews and ratings.

To do this, we can perform a variance analysis on the rating data for a given movie. If a movie has a high variance in its rating data, it suggests that there is a wide range of opinions among the viewers, and hence the movie is more likely to be controversial.
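A minimal sketch of the per-movie variance computation, in Python using only the standard library. It assumes the reviews have already been parsed into (movie_id, user_id, rating) tuples; the tuple layout, the function name and the toy ratings are illustrative assumptions, not taken from mov_reviews.csv. Population variance (`statistics.pvariance`) is used so that a movie with a single rating gets a variance of 0 instead of raising an error.

```python
from collections import defaultdict
from statistics import pvariance

def rating_variance_by_movie(rows):
    """Return {movie_id: population variance of that movie's ratings}.

    `rows` is an iterable of (movie_id, user_id, rating) tuples; this
    layout is an assumption about how mov_reviews.csv has been parsed.
    """
    ratings = defaultdict(list)
    for movie_id, _user_id, rating in rows:
        ratings[movie_id].append(float(rating))
    return {movie: pvariance(values) for movie, values in ratings.items()}

# Toy data: m1 splits viewers into 1s and 10s, m2 gets consistent 7s and 8s.
reviews = [
    ("m1", "u1", 1), ("m1", "u2", 10), ("m1", "u3", 1), ("m1", "u4", 10),
    ("m2", "u1", 7), ("m2", "u2", 8), ("m2", "u3", 7), ("m2", "u4", 8),
]
variances = rating_variance_by_movie(reviews)
print(variances)  # {'m1': 20.25, 'm2': 0.25}
```

The split movie m1 has a far higher variance than m2, which is exactly the signal the variance analysis treats as a sign of controversy.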


Another approach is to use network analysis to identify clusters of users who have similar opinions on the movie. To do this, we can create a network graph in which each node represents a user who has rated the movie. If the difference between the ratings given by two users is more than 4 or less than 1, we create an edge between them, with the edge length indicating the difference between their ratings. Edges therefore link users who strongly disagree as well as users who nearly agree, which helps us identify groups of users with significantly different opinions on the movie.
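The edge rule described above can be sketched as follows. This is a hedged illustration, not the author's code: the function name `build_rating_graph`, the dict-of-ratings input and the toy ratings are assumptions, while the thresholds 4 and 1 are the ones stated in the text. Each edge is returned as a `(user_a, user_b, difference)` tuple, with the difference playing the role of the edge length.

```python
from itertools import combinations

def build_rating_graph(user_ratings, high=4, low=1):
    """Return the edges of the rating graph for a single movie.

    user_ratings maps a (hypothetical) user id to that user's rating.
    Two users are joined when their rating difference is greater than
    `high` or less than `low`; the difference is kept as the edge length.
    """
    edges = []
    for (u, r_u), (v, r_v) in combinations(user_ratings.items(), 2):
        diff = abs(r_u - r_v)
        if diff > high or diff < low:
            edges.append((u, v, diff))
    return edges

# Toy ratings for one movie: two users at 1, two at 10, one in the middle.
ratings = {"a": 1, "b": 1, "c": 10, "d": 10, "e": 5}
for edge in build_rating_graph(ratings):
    print(edge)
```

Users "a" and "e" differ by exactly 4, so they are not connected, while identical ratings (difference 0) and opposite ratings (difference 9) both produce edges. Comparing every pair of raters is quadratic in the number of users, so for heavily rated movies a random sample of users may be needed in practice.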


We can then calculate the clustering coefficient of the graph, which measures how likely it is that two neighbors of a node are also connected to each other. A high clustering coefficient indicates that users within the same cluster have similar opinions on the movie, while a low clustering coefficient suggests that there is less consensus among the viewers. By combining variance analysis and network analysis, we can gain a better understanding of the controversy surrounding a movie.
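To make the clustering coefficient concrete: for a node with k neighbors, its local coefficient is 2T / (k(k - 1)), where T is the number of edges among those neighbors, and the graph-level value is the average over nodes. The sketch below computes this with the standard library, assuming the graph is given as an edge list (edge lengths, if present, are ignored). Nodes with fewer than two neighbors contribute 0, the same convention `networkx.average_clustering` uses; raters with no edges at all are left out here, whereas networkx would count them as zeros if they were added to the graph.

```python
from collections import defaultdict
from itertools import combinations

def average_clustering(edges):
    """Average local clustering coefficient of an undirected graph.

    `edges` holds (u, v) or (u, v, length) tuples; lengths are ignored.
    Only nodes that appear in at least one edge are averaged over.
    """
    neighbors = defaultdict(set)
    for u, v, *_ in edges:
        if u != v:
            neighbors[u].add(v)
            neighbors[v].add(u)
    if not neighbors:
        return 0.0
    total = 0.0
    for node, nbrs in neighbors.items():
        k = len(nbrs)
        if k < 2:
            continue  # local coefficient is 0 for fewer than two neighbors
        # Count edges that exist between pairs of this node's neighbors.
        links = sum(1 for a, b in combinations(nbrs, 2) if b in neighbors[a])
        total += 2 * links / (k * (k - 1))
    return total / len(neighbors)

# Toy graph: a triangle a-b-c plus a pendant node d attached to c.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
print(average_clustering(toy_edges))
```

Here a and b each score 1, c scores 1/3 and d scores 0, giving an average of 7/12. For real data, building a `networkx.Graph` and calling `networkx.average_clustering` computes the same quantity.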

Combining Estimates


Combining the results from the variance analysis and the network analysis can provide a more comprehensive measure of the level of controversy in a movie. To do this, we can calculate the correlation between the variance and the clustering coefficients for each movie in the dataset. A positive correlation between the two measures would indicate that movies with higher variance in ratings also have higher clustering coefficients, suggesting more controversy. In contrast, a negative correlation would suggest that movies with higher variance have lower clustering coefficients, indicating less controversy.
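Concretely, this correlation is computed across movies: each movie contributes one (variance, clustering coefficient) pair. A minimal Pearson correlation sketch follows; the per-movie numbers are made-up placeholders, not results from the dataset. On Python 3.10 and later, `statistics.correlation` gives the same value.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)

# One entry per movie: rating variance and clustering coefficient (toy values).
variance = [20.25, 0.25, 6.0, 12.5]
clustering = [0.8, 0.3, 0.5, 0.6]
print(round(pearson(variance, clustering), 3))  # strongly positive for these toy values
```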

Once we have calculated the correlation, we can use it to identify the most controversial movies in our dataset. The most controversial movies are those that score high on both measures, a high rating variance together with a high clustering coefficient, indicating a wide range of opinions among viewers and significant differences in how the movie is perceived.
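The post does not spell out exactly how the two measures are merged into the combined estimate reported in the results. One simple option, shown here purely as an assumption and not necessarily the method behind those plots, is to standardize each measure across movies (z-scores) and average the two with equal weight, so that movies scoring high on both rise to the top. The function names and toy values are hypothetical.

```python
from statistics import mean, pstdev

def zscores(values):
    """Standardize values to mean 0 and (population) standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]

def rank_by_combined_estimate(variance_by_movie, clustering_by_movie):
    """Hypothetical combined estimate: equal-weight mean of the two z-scores.

    Returns (movie_id, score) pairs sorted from most to least controversial.
    """
    movies = sorted(variance_by_movie.keys() & clustering_by_movie.keys())
    z_var = zscores([variance_by_movie[m] for m in movies])
    z_clu = zscores([clustering_by_movie[m] for m in movies])
    scores = {m: (a + b) / 2 for m, a, b in zip(movies, z_var, z_clu)}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Placeholder per-movie measures, not real results.
variance_by_movie = {"m1": 20.25, "m2": 0.25, "m3": 6.0}
clustering_by_movie = {"m1": 0.8, "m2": 0.3, "m3": 0.5}
for movie, score in rank_by_combined_estimate(variance_by_movie, clustering_by_movie):
    print(movie, round(score, 2))
```

Standardizing first matters because rating variance and clustering coefficients live on very different scales; without it, the variance would dominate any simple sum.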

Exploring Correlations

Analyzing the relationships between different parameters can provide valuable insights into the factors that contribute to a movie's success and controversy. In addition to identifying controversial movies, we can also explore the correlations between box office income and language, genre, and estimated level of controversy. We can also investigate the relationship between language and the estimated level of controversy and the relationship between genre and the estimated level of controversy.
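For categorical attributes such as language and genre, a natural first step is to average the controversy estimate (or the box office income) within each category. A hedged sketch, assuming the metadata has already been joined with the per-movie estimate into a list of dicts; the field names `language`, `genres` and `controversy` and the toy values are hypothetical. A movie tagged with several genres counts toward each of them.

```python
from collections import defaultdict
from statistics import mean

def mean_by_category(movies, category_field, value_field):
    """Average `value_field` per category, sorted from highest to lowest.

    The category field may hold a single label (e.g. a language) or a list
    of labels (e.g. genres); in the latter case the movie counts once for
    each label.
    """
    groups = defaultdict(list)
    for movie in movies:
        labels = movie[category_field]
        if isinstance(labels, str):
            labels = [labels]
        for label in labels:
            groups[label].append(movie[value_field])
    averages = {label: mean(values) for label, values in groups.items()}
    return dict(sorted(averages.items(), key=lambda kv: kv[1], reverse=True))

# Toy records, not taken from Mov_metadata.csv.
movies = [
    {"language": "fr", "genres": ["Drama", "War"], "controversy": 0.75},
    {"language": "en", "genres": ["Musical"], "controversy": 0.125},
    {"language": "fr", "genres": ["Drama"], "controversy": 0.25},
]
print(mean_by_category(movies, "language", "controversy"))  # {'fr': 0.5, 'en': 0.125}
print(mean_by_category(movies, "genres", "controversy"))  # {'War': 0.75, 'Drama': 0.5, 'Musical': 0.125}
```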

Results



The plot above shows the combined estimate against the year of release. As can be observed in the graph, controversiality increases considerably for more recent movies, with a pronounced spike in the last few years. More and more movies now attract polarized opinions, which suggests that people in our society are becoming increasingly polarized on a range of issues.

The graph above plots movie profit against the combined polarizability estimate. The relationship is roughly linear: more polarized movies do show higher profits. Since polarizability serves as our estimate of controversiality, we conclude that more controversial movies are, in general, more profitable.
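A linear relation like this one is usually quantified with an ordinary least-squares fit of profit on the combined estimate, where a positive slope means that more polarized movies earn more. The sketch below is a generic illustration with placeholder numbers, not the fit behind the graph; the function name `linear_fit` is hypothetical, and on Python 3.10 and later `statistics.linear_regression` returns the same slope and intercept.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Placeholder data: combined estimate vs. profit (arbitrary units).
estimate = [0.0, 1.0, 2.0, 3.0]
profit = [1.0, 3.0, 5.0, 7.0]
print(linear_fit(estimate, profit))  # (2.0, 1.0)
```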

The graph above shows how the combined estimate varies across languages. Movies in Hindi, Italian and French are generally more controversial than movies in the languages at the lower end of the graph. The graph only gives an estimate of how much people's opinions on movies vary across languages.


The graph above shows the combined polarizability estimate by genre. Genres such as War, Documentary and Drama have the largest number of movies that can be considered polarized. In contrast, genres like “Musical” and “Children” usually draw unanimous, unpolarized opinions.

Conclusion

In conclusion, we have explored how polarization in movie reviews and ratings can help us identify controversial movies. We have tried to establish that polarizability can be used as an effective estimate for defining controversiality. We have used variance analysis and network analysis to calculate an estimate of the level of controversy in each movie, and we have explored the correlations between various parameters from the metadata file. By doing so, we hope to shed some light on the factors that contribute to controversy in cinema.

