Data Protection & Cybersecurity. Data Anonymization Exercises

Exercise on statistical disclosure control

You have found the following sanitized dataset that was released in 2020:

Relationship single Gender Age City of birth * male 19-25 female 16-18 Málaga León Favorite TV Series Game of Thrones Game of Thrones in relationship male 12-15 Barcelona Friends! in relationship female 19-25 Valencia Big Bang Theory in relationship female 19-25 Madrid female 19-25 Málaga Big Bang Theory Game of Thrones single single male 16-18 León Game of Thrones single female 12-15 Barcelona Game of Thrones male 19-25 Valencia Big Bang Theory in relationship single

Additionally, you found a dataset with a TV show ratings survey. In the survey, a rating of 1 means "Definitely, I do not like it" and 5 means "I am a great fan":

Name | Email | TV Show | Rating (1 is bad)
Alice | alice1995@email.com | Friends! | 1
Bob | bobbybob@email.com | Friends! | 4
Charlie | s9charchar@email.com | Friends! | 2
Eve | evelyn@myhighscool.com | Friends! | 1
Bob | bobbybob@email.com | Game of Thrones |
Alice | alice1995@email.com | Game of Thrones | 5
Charlie | s9charchar@email.com | Game of Thrones | 5
Bob | bobbybob@email.com | Big Bang Theory | 3
Charlie | s9charchar@email.com | Big Bang Theory | 5
Alice | alice1995@email.com | Big Bang Theory | 2
Eve | evelyn@myhighscool.com | Big Bang Theory | 5

Assume that all candidates are present in both databases.

1. Where was Alice most likely born, and what is most likely her relationship status? How do you infer this solution? Hint: There is enough data for a unique solution.
2. What can you learn about Charlie?
3. And what can you learn about Bob?
4. Which are the quasi-identifiers in the first dataset?
5. Does the first dataset satisfy k-anonymity? What is the value of k?
6. Suppose you are required to have a minimum of 2-anonymity, and you are allowed to add "invented" rows to achieve it. How many rows do you have to invent?
7. What approach do you propose to achieve 2-anonymity without "inventing" data?
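As a rough illustration of the k-anonymity questions, here is a minimal Python sketch of how k can be measured: group the rows by their quasi-identifier values and take the size of the smallest group. The records, the column names, and the choice of quasi-identifiers (`gender`, `age`, `city`) are hypothetical placeholders, not the exercise's dataset or its intended answer. The helper `rows_to_add` assumes that an "invented" row may copy the quasi-identifier values of an existing group.

```python
from collections import Counter


def equivalence_classes(rows, quasi_identifiers):
    """Count how many rows share each combination of quasi-identifier values."""
    return Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)


def k_anonymity(rows, quasi_identifiers):
    """Return k: the size of the smallest equivalence class in the table."""
    return min(equivalence_classes(rows, quasi_identifiers).values())


def rows_to_add(rows, quasi_identifiers, k):
    """Count the rows needed so that every existing class reaches size k."""
    classes = equivalence_classes(rows, quasi_identifiers)
    return sum(max(0, k - size) for size in classes.values())


# Hypothetical toy records, NOT the exercise's table.
toy_rows = [
    {"gender": "female", "age": "19-25", "city": "A", "series": "X"},
    {"gender": "female", "age": "19-25", "city": "A", "series": "Y"},
    {"gender": "male", "age": "16-18", "city": "B", "series": "X"},
]
toy_qi = ["gender", "age", "city"]  # assumed quasi-identifiers

print(k_anonymity(toy_rows, toy_qi))     # 1: the male row is unique
print(rows_to_add(toy_rows, toy_qi, 2))  # 1: one more row for the male class
```

Generalizing values (for example, widening age ranges) or suppressing cells reduces the number of distinct quasi-identifier combinations, so the same `k_anonymity` check can be rerun on the generalized table to see whether the target k has been reached.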