I only need help with questions 5 and 6, please (in the photo). You’re the Chief Data Science Officer at a large bank. You’ve instructed your team to experiment with using payment data for marketing purposes, predicting which customers might be interested in a golf tournament that the bank sponsors. So the data instances correspond to customers, and the features are unique account numbers. Your newly hired team is ready to shine and has put considerable effort into building a linear model, where each account number that one can pay to is given a coefficient. The prediction model hence predicts interest based on whom the customer has made payments to. They proudly report to you that the accuracy of their model is 95% on a test set chosen in January.
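To make the setup concrete, here is a minimal sketch of how such a model could score one customer. The question only says "linear model", so the logistic link, the account identifiers, the coefficient values, and the intercept below are all assumptions made up for illustration, not the team's actual model.

```python
import math

# Hypothetical coefficients, one per payee account number (illustrative values only).
coefficients = {
    "ACC-GOLF-SHOP": 2.0,
    "ACC-COUNTRY-CLUB": 1.5,
    "ACC-GROCERY": -0.5,
}
intercept = -1.0  # assumed bias term


def predict_interest(payees):
    """Estimate interest from the set of accounts a customer has paid to.

    Each feature is binary (paid to the account or not), so the linear score
    is the intercept plus the coefficients of the accounts that were paid.
    Accounts without a coefficient contribute nothing.
    """
    score = intercept + sum(coefficients.get(acc, 0.0) for acc in set(payees))
    return 1.0 / (1.0 + math.exp(-score))  # logistic link (an assumption)


golfer = predict_interest({"ACC-GOLF-SHOP", "ACC-COUNTRY-CLUB"})
shopper = predict_interest({"ACC-GROCERY"})
print(f"golfer: {golfer:.2f}, shopper: {shopper:.2f}")
```

Because every coefficient is tied to a single payee account, a large coefficient directly exposes which payment drives a prediction, which is why the privacy and discrimination questions follow naturally from this design.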


1. What further questions would you ask on the evaluation? Think of test data, metrics, and
baselines.
2. What would be potential privacy risks related to re-identification or the revelation of sensitive
information of customers to the data science team? How would you measure these?
3. Might there be discrimination against sensitive groups, such as Muslims or women, if the
payment data is used? How would you evaluate this? Might there be certain features (account
numbers) that reveal the sensitive attribute if a customer made a payment to them? How would you
measure whether the model is using these in a discriminatory way?
4. Would the invitees of the golf tournament event require an explanation for their predicted
interest? If so, what type of explanation would you provide?
5. How would your answers change if the target variable were now credit risk (defaulting on a
loan or not) and the data were provided to an external academic research group?
6. Would you expect your data science team to have answered (or at least raised) all the previous
questions, when they report their findings?
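As a rough illustration of why questions 1 and 3 matter, the sketch below compares accuracy against a majority-class baseline and computes a simple selection-rate gap between two groups. The 5% base rate, the toy predictions, and the group labels "A" and "B" are assumptions invented for the example; nothing in the question states them.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred):
    """Fraction of truly interested customers that the model finds."""
    positives = sum(y_true)
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return hits / positives if positives else 0.0


def selection_rate_gap(y_pred, groups, a, b):
    """Difference in the share of positive predictions between groups a and b."""
    def rate(g):
        picks = [p for p, grp in zip(y_pred, groups) if grp == g]
        return sum(picks) / len(picks)
    return rate(a) - rate(b)


# Assumed scenario: only 5 of 100 customers are interested in golf.
y_true = [1] * 5 + [0] * 95
always_no = [0] * 100  # baseline that never predicts interest

baseline_accuracy = accuracy(y_true, always_no)
baseline_recall = recall(y_true, always_no)
print(f"baseline accuracy={baseline_accuracy:.2f}, recall={baseline_recall:.2f}")

# Toy predictions for eight customers in two hypothetical groups.
toy_pred = [1, 1, 1, 0, 1, 0, 0, 0]
toy_groups = ["A"] * 4 + ["B"] * 4
gap = selection_rate_gap(toy_pred, toy_groups, "A", "B")
print(f"selection-rate gap (A minus B)={gap:.2f}")
```

Under this assumed base rate, a model that invites nobody already reaches 95% accuracy while finding no interested customers, so the reported figure says little without the class balance, a baseline, and metrics such as recall or precision.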