Data Feminism Activity

School: University of Colorado, Boulder
Course: 1101
Subject: Computer Science
Date: Feb 20, 2024
Uploaded by: maddie6638
Dataset: https://www.kaggle.com/datasets/brianbgonz/the-bachelorette-contestants?select=bachelorette-contestants.csv

1. From the data that is presented, what inferences and analyses can you initially make?

Looking at The Bachelorette data, contestant ages typically range from the mid-20s to the late 30s, and many contestants come from California and Texas. Contestants listed as entrepreneurs or personal trainers tend to be eliminated quickly, and contestants aged 36 or older were typically eliminated within the first three weeks, while those between 26 and 29 have a higher chance of being chosen. Most contestants from Dallas, Texas are eliminated early on. For The Bachelor, ages typically range from the early 20s to the early 30s. Contestants 30 or older have a higher chance of being eliminated within the first five weeks, and students rarely make it to the final weeks; nurses, teachers, and waitresses are also eliminated quickly. Most contestants come from states like California and Florida. Season 21 seemed to favor older contestants, in the range of 28 to 30, and since it is the only season with heights included, height did not appear to make much difference in who reached the final weeks. Seasons 11, 12, 15, and 16 had the most contestants older than 30.

2. What demographic information seems to be included in the data set? What is missing?

Age, occupation, hometown, and sometimes height are included in the dataset, and gender is effectively implied by the show's title. Several things are missing: height is given only for Season 21 contestants; occupation is listed but income is not; and height is given for some contestants of The Bachelor but for none of The Bachelorette. There is also no clear data showing who won each season or how many weeks each season ran, nor how many dates contestants went on or whether they received roses.

3. Was the data collection process noted? Is it clear? If so, are there any apparent inherent biases or limitations in the data collection process?

The data collection process was noted: the author collected the data from Wikipedia and Reddit. This process carries sampling bias, since Wikipedia and Reddit may not represent all of the contestant information. Sourcing data from a Reddit user could also introduce bias, because the user's opinions about contestants may lead to inaccurate data, and these sources are not especially reliable, particularly Reddit, where anyone can post. There are also limitations by age, because certain age groups may have more information available about them than others.

4. Do the variables or categories used in the data set adequately capture diverse experiences and perspectives? Are there any underrepresented groups or marginalized communities whose voices are missing? How might these omissions or biases affect the interpretation and use of the data?

Most of the variables do not capture diverse experiences and perspectives. "Occupation" provides professional background but no insight into different experiences or income. Hometown is given, but ethnicity and region would help the data show more diverse perspectives, and elimination week gives no reasons or context for why a contestant was eliminated. Because the data is missing ethnicities, gender identities, and the dates of each season, one may interpret it in a very general way and draw inferences that are not accurate to contestants' experiences. There is a real possibility that someone analyzing this limited data could form stereotypes about people in certain areas, jobs, and age groups. Carruth (2023) discusses how PredPol directs police to minority neighborhoods, producing biased and stereotypical conclusions similar to what this data invites. Few initial inferences can be made from data this limited, and those that can be made are biased and rely on generalizations. No categories represent contestants' personal experiences on the show, nor their reasons for deciding to participate.

5. Is the dataset you've selected effective at conveying information in a holistic, equitable manner? How so? If not, what could get it there?

The dataset is not effective at conveying information in a holistic, equitable manner. Variables like ethnicity, race, sexual orientation, and income would provide more diverse perspectives and prevent generalizations. Qualitative data from interviews would also benefit the dataset by illuminating contestant experiences, and data on how the audience feels about and interprets the show could be valuable as well, since the audience can also influence outcomes; this could be collected from social media platforms or community discussions. One way to reduce sampling bias would be to gather anonymous insights on the show through interviews. Perkowitz (2021) writes, "A majority-white data set, for example, does not produce accurate results for dark faces, and training a system using only images of men does not guarantee valid results for women." Diverse data is crucial for an equitable dataset. Although it would make the data much larger, adding the people who auditioned but were not chosen for the show should be considered; this would clarify what kinds of people are interested in dating shows and which kinds the producers tend to pass over.

Works Cited

Carruth, Christopher M. (2023). Algorithms of Oppression [video], 20:50-21:30.

Perkowitz, Sidney (2021). "The Bias in the Machine: Facial Recognition Technology and Racial Disparities," para. 19. https://mit-serc.pubpub.org/pub/bias-in-machine/release/1
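Appendix: as a rough illustration of how the age-band and elimination-week patterns described in question 1 could be checked, the sketch below uses pandas on a small toy sample. The column names ("Age", "ElimWeek", "Hometown") and the toy rows are assumptions for illustration only; the real Kaggle CSV (loaded with pd.read_csv) may use different names and would of course give different numbers.

```python
import pandas as pd

# Toy stand-in for bachelorette-contestants.csv (hypothetical rows;
# the real file would be loaded with pd.read_csv("bachelorette-contestants.csv"))
df = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E", "F"],
    "Age": [25, 27, 29, 31, 36, 38],
    "Hometown": ["Dallas, TX", "Austin, TX", "Los Angeles, CA",
                 "Miami, FL", "Dallas, TX", "Los Angeles, CA"],
    "ElimWeek": [4, 8, 9, 5, 2, 3],
})

# Bucket contestants into age bands, then compare the average week
# in which each band was eliminated
df["AgeBand"] = pd.cut(df["Age"], bins=[0, 25, 29, 35, 99],
                       labels=["<=25", "26-29", "30-35", "36+"])
by_age = df.groupby("AgeBand", observed=True)["ElimWeek"].mean()
print(by_age)
```

On this toy sample the "36+" band averages an earlier elimination week than the "26-29" band, which is the shape of the claim made in question 1; running the same group-by on the real CSV would test whether the pattern actually holds.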