Data Critique
How was the Oscars dataset originally created and what was it based on?
This dataset was constructed to analyze racial diversity within the Academy Awards. Information on nominees, winners, and award categories was first collected from publicly available sources. Notably, the dataset is described as building on The Oscar Award, 1927–2025, a dataset originally uploaded to Kaggle by Raphael Fontes. Fontes’ dataset was in turn scraped from the Academy Awards Official Database, maintained by the Academy of Motion Picture Arts and Sciences (AMPAS), using the web scraping library BeautifulSoup4 (BS4). The original dataset did not contain demographic information such as gender or race, so to support diversity-focused analysis, the creator expanded it by adding gender and racial identity attributes.
What variables are included in the dataset and what time period does it cover?
The dataset includes the following variables: the year of the film, the year of the ceremony, the ceremony number (from the 72nd to the 92nd, covering 2000 to 2020), the award category, the nominee’s name, gender, and race, the film, and whether the nominee won the award. As far as our research could determine, this dataset on Academy Awards (Oscars) nominations and winners from the 72nd to the 92nd ceremony (2000–2020) is not the product of a single organization.
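The variables listed above can be pictured as one record per nomination. The following Python sketch is only an illustration: the field names, the label values, and the example record are our own assumptions, not the dataset's actual column headers or contents.

```python
from dataclasses import dataclass

# Hypothetical field names; the dataset's real column headers may differ.
@dataclass(frozen=True)
class Nomination:
    year_film: int
    year_ceremony: int
    ceremony: int      # ceremony number, e.g. 72 for the 72nd
    category: str
    gender: str
    race: str
    name: str
    film: str
    winner: bool

# Coverage stated in the critique: 72nd to 92nd ceremony, 2000 to 2020.
FIRST_CEREMONY, LAST_CEREMONY = 72, 92
FIRST_YEAR, LAST_YEAR = 2000, 2020

def in_scope(n: Nomination) -> bool:
    """Return True if the record falls inside the period the dataset covers."""
    return (FIRST_CEREMONY <= n.ceremony <= LAST_CEREMONY
            and FIRST_YEAR <= n.year_ceremony <= LAST_YEAR)

# Placeholder record for illustration only; not a real nominee.
example = Nomination(1999, 2000, 72, "Example Category",
                     "Female", "Asian", "Nominee A", "Film A", False)
```

A check like `in_scope(example)` is one way to confirm that a row belongs to the 2000–2020 window before it is counted in any analysis.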
What patterns of racial and gender diversity were revealed by the dataset analysis?
The dataset, “Racial Diversity of the Oscars,” reveals clear disparities in who gets nominated. According to the Oscar diversity nomination index bar graph in the dataset, the majority of Oscar nominations go to White nominees, while Hispanic nominees receive the fewest. Asian nominees rank second and Black nominees third, but the gap between those two groups is small, whereas the White total is so much higher than all the others that it is an outlier. The nominations are also heavily male-dominated, with men receiving many more nominations than women. The graphs of Oscar nominations per year show the same pattern: White nominations are overwhelmingly higher than those of every other race. Together, these results showcase the racial bias and lack of diversity within the Oscars and the film industry overall, where the majority of those recognized are white and male. In our reading, those in power want people from their own group to win and remain in power within the industry. Because they decide who is at the top and who is considered “great,” this perpetuates racial inequality and the underrepresentation of other races, and it reinforces existing standards and power dynamics in society.
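The comparisons above come down to counting nominations per group and expressing each count as a share of the total. The sketch below shows one way to do that in Python. The rows are made-up placeholders chosen only to make the code runnable; they are not taken from the dataset, and the key names are our assumptions.

```python
from collections import Counter

# Placeholder rows for illustration only; NOT real nomination data.
rows = [
    {"race": "White", "gender": "Male", "winner": True},
    {"race": "White", "gender": "Female", "winner": False},
    {"race": "White", "gender": "Male", "winner": False},
    {"race": "Asian", "gender": "Male", "winner": False},
    {"race": "Black", "gender": "Female", "winner": True},
    {"race": "Hispanic", "gender": "Male", "winner": False},
]

def share_by(records, key):
    """Return each group's share of all nominations for the given key."""
    counts = Counter(record[key] for record in records)
    total = sum(counts.values())
    return {group: count / total for group, count in counts.items()}

race_share = share_by(rows, "race")
gender_share = share_by(rows, "gender")

# Print groups from largest to smallest share of nominations.
for group, share in sorted(race_share.items(), key=lambda item: -item[1]):
    print(f"{group}: {share:.0%}")
```

Reporting shares rather than raw counts makes it easier to compare groups across years in which the total number of nominations differs.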
How does the dataset reflect biases in the film industry and the concept of success?
The data that we are working with assumes that an award equals significance, and it treats gender, race, and win status as the most important identifiers. Ideologically, the dataset reinforces a narrow definition of success, one determined by elite institutional recognition. It overlooks the systemic exclusions that have historically shaped who gets nominated and who doesn’t. It also abstracts identity into basic categories, ignoring intersectional experiences and qualitative differences in contribution. If this were our only source, we would not see the broader ecosystem of cinema: indie films, non-Western media, grassroots creators, or even the personal impact of films that were never nominated. The dataset hides the social, cultural, and political forces that shape what the Academy chooses to honor.
What important context and information are missing from the dataset?
The dataset provides no performance context, such as critical reception, box office success, or audience response, which limits our ability to understand how these factors may have influenced the nominees and winners each year. Other information, such as a film’s cultural and historical impact at the time, is also left out, even though it could play a significant role in explaining why certain nominees received the award. We also have no background information on the nominees, such as their connections in the industry or their previous wins and nominations, which could likewise affect an individual’s chances of winning.
