Please find attached, in CSV format, the dataset with filename “2019 EVCP use Q3 and Q4.csv”.
This is an openly available dataset from the Web link:
https://datamillnorth.org/dataset/electric-vehicle-chargepoints-in-council-car-parks-
The dataset contains usage data for electric vehicle chargepoints in Council car parks in Leeds, UK, including how often they are used, duration of charge, electricity consumed and type of vehicle.
Use Excel, Power BI, RapidMiner, Weka, Python, R, MATLAB, or any other software (or software packages) you consider suitable, or any combination of these, to analyse the data and answer the three questions below.
Question 1. Before performing any analysis, and purely by (visually) inspecting the data, describe as completely as possible: the data features or attributes; how the data may have been acquired; their values, sizes and availability; any general patterns/trends; and the general nature of the data (e.g. the “5 C’s” of data quality). [30 marks]
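Question 1 asks for visual inspection, but a quick structural profile can support that inspection by confirming the row count, the column names, and which columns contain blanks. A minimal sketch using only the Python standard library; it makes no assumption about the column names and simply reads the header row. The `utf-8-sig` encoding is an assumption chosen so that a byte-order mark, if present, does not end up in the first column name.

```python
import csv
import io
from collections import Counter

def profile_csv(text):
    """Return the row count, column names and a per-column summary of a CSV string.

    Blank or absent fields are counted as blanks; distinct values and the most
    frequent value are computed over the non-blank fields only.
    """
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    columns = reader.fieldnames or []
    summary = {}
    for col in columns:
        values = [(row.get(col) or "").strip() for row in rows]
        filled = [v for v in values if v]
        summary[col] = {
            "blank": len(values) - len(filled),
            "distinct": len(set(filled)),
            "most_common": Counter(filled).most_common(1),
        }
    return len(rows), columns, summary

def profile_file(path):
    """Read the CSV from disk and profile it."""
    with open(path, newline="", encoding="utf-8-sig") as handle:
        return profile_csv(handle.read())
```

For example, `n_rows, columns, summary = profile_file("2019 EVCP use Q3 and Q4.csv")` gives the size of the table and, per column, how many cells are blank and how many distinct values occur, which feeds directly into the completeness and consistency parts of the “5 C’s” discussion.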
Question 2. Use software or software packages to answer the following:
(a) Data pre-processing:
(i) Are there any missing data? [2 marks]
(ii) Can the missing data be easily imputed? State why or why not. [2 marks]
(iii) Delete all rows with missing values in the affected column(s), and save the result under a different filename. How many rows are left after deletion? [5 marks]
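A minimal standard-library sketch for part (a), assuming that an empty or whitespace-only cell counts as missing. It counts missing values per column, applies listwise deletion (drop any row with a missing value in any column, which is the same as dropping rows with gaps in the affected columns), and writes the remaining rows to a new file. The output filename in the usage line is hypothetical.

```python
import csv

def is_missing(value):
    """Treat empty, whitespace-only or absent (None) fields as missing."""
    return value is None or value.strip() == ""

def missing_counts(rows, columns):
    """Number of missing values in each column."""
    return {col: sum(is_missing(row.get(col)) for row in rows) for col in columns}

def drop_incomplete_rows(rows, columns):
    """Listwise deletion: keep only rows that have a value in every listed column."""
    return [row for row in rows if not any(is_missing(row.get(col)) for col in columns)]

def write_csv(rows, columns, handle):
    """Write rows to an open text handle; extra unnamed fields are ignored."""
    writer = csv.DictWriter(handle, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)

def clean_file(src_path, dst_path):
    """Report missing values, drop incomplete rows, save them and return the remaining row count."""
    with open(src_path, newline="", encoding="utf-8-sig") as handle:
        reader = csv.DictReader(handle)
        columns = reader.fieldnames
        rows = list(reader)
    print("Missing values per column:", missing_counts(rows, columns))
    clean = drop_incomplete_rows(rows, columns)
    with open(dst_path, "w", newline="", encoding="utf-8") as handle:
        write_csv(clean, columns, handle)
    return len(clean)
```

Usage: `clean_file("2019 EVCP use Q3 and Q4.csv", "2019 EVCP use Q3 and Q4 cleaned.csv")` prints the per-column counts for (i) and returns the row count asked for in (iii). Whether the gaps can be imputed, part (ii), depends on which columns they fall in, which the counts make visible.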
(b) Analytics:
(i) What are the average and standard deviation of the power used (in kWh) for the two quarters (Q3 and Q4) combined? [5 marks]
(ii) Re-calculate the average and standard deviation of the power used (in kWh) only for quarter Q3. Then repeat for quarter Q4. What are their values? [5 marks]
Is the average power used for Q3 higher or lower than that for Q4? [2 marks]
(iii) Which site has the highest total power usage? [2 marks]
Within this site, which connector has the highest total power usage? [2 marks]
(iv) For Q3 and Q4 combined, plot a bar chart with car park ID (CP ID) on the horizontal axis and the total power used (in kWh) on the vertical axis. [7 marks]
Based on the plotted graph, which car park ID (CP ID) has the highest usage of power for the two quarters (Q3 and Q4) combined? What is its total power usage value? [2 marks]
Based on the plotted graph, which car park ID (CP ID) has the lowest usage of power for the two quarters (Q3 and Q4) combined? What is its total power usage value? [2 marks]
(v) For Q3 and Q4 combined, plot a bar chart with User ID on the horizontal axis and the total power used (in kWh) on the vertical axis. [7 marks]
Which User ID has the highest usage of power for the two quarters (Q3 and Q4) combined? What is its total power usage value? [2 marks]
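For part (b), a minimal standard-library sketch, assuming the rows have already been cleaned as in part (a). The column names (`Site`, `CP ID`, `Connector`, `User ID`, `Start Date`, `Total kWh`), the `dd/mm/yyyy` date format, and the rule that the quarter is taken from the month of the start date are all assumptions to check against the actual header row. The standard deviation is the sample SD (as with Excel `STDEV.S`), and a connector is assumed to be identified by its (CP ID, Connector) pair.

```python
import statistics
from collections import defaultdict
from datetime import datetime

# Hypothetical column names and date format: check them against the CSV header row.
SITE, CP_ID, CONNECTOR, USER_ID = "Site", "CP ID", "Connector", "User ID"
START_DATE, KWH = "Start Date", "Total kWh"
DATE_FORMAT = "%d/%m/%Y"

def quarter_of(row):
    """Label a row 'Q3' (Jul-Sep) or 'Q4' (Oct-Dec) from the month of its start date."""
    month = datetime.strptime(row[START_DATE].strip(), DATE_FORMAT).month
    return f"Q{(month - 1) // 3 + 1}"

def kwh_stats(rows, quarter=None):
    """Mean and sample standard deviation of kWh, optionally for one quarter only."""
    values = [float(r[KWH]) for r in rows if quarter is None or quarter_of(r) == quarter]
    return statistics.mean(values), statistics.stdev(values)

def total_kwh_by(rows, key):
    """Sum kWh per group; `key` maps a row to its group label."""
    totals = defaultdict(float)
    for r in rows:
        totals[key(r)] += float(r[KWH])
    return dict(totals)

def extreme(totals, largest=True):
    """Return the (group, total) pair with the largest (or smallest) total."""
    pick = max if largest else min
    return pick(totals.items(), key=lambda item: item[1])

def top_site_and_connector(rows):
    """Site with the highest total kWh, then the top connector within that site."""
    site, _ = extreme(total_kwh_by(rows, lambda r: r[SITE]))
    in_site = [r for r in rows if r[SITE] == site]
    # Assumption: connector numbers repeat across chargepoints, so a connector
    # is identified by the pair (CP ID, Connector).
    connector = extreme(total_kwh_by(in_site, lambda r: (r[CP_ID], r[CONNECTOR])))
    return site, connector

def plot_totals(totals, xlabel):
    """Bar chart of total kWh per group (requires matplotlib)."""
    import matplotlib.pyplot as plt
    plt.figure(figsize=(12, 4))
    plt.bar([str(k) for k in totals], list(totals.values()))
    plt.xlabel(xlabel)
    plt.ylabel("Total power used (kWh)")
    plt.xticks(rotation=90)
    plt.tight_layout()
    plt.show()

def report(rows):
    """Print the figures for (b)(i)-(v) and draw the two bar charts."""
    print("Q3+Q4 mean, SD:", kwh_stats(rows))
    print("Q3 mean, SD:", kwh_stats(rows, "Q3"))
    print("Q4 mean, SD:", kwh_stats(rows, "Q4"))
    print("Top site and connector:", top_site_and_connector(rows))
    by_cp = total_kwh_by(rows, lambda r: r[CP_ID])
    print("Highest / lowest CP ID:", extreme(by_cp), extreme(by_cp, largest=False))
    plot_totals(by_cp, "CP ID")
    by_user = total_kwh_by(rows, lambda r: r[USER_ID])
    print("Highest User ID:", extreme(by_user))
    plot_totals(by_user, "User ID")
```

Load the cleaned rows with `csv.DictReader` and call `report(rows)`. If a population standard deviation (Excel `STDEV.P`) is wanted instead, swap `statistics.stdev` for `statistics.pstdev`; note that pandas `Series.std()` also defaults to the sample version (`ddof=1`), so results from different tools should be compared on the same basis.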
Sub-total: 45 marks
Question 3. Describe in detail:
(a) (i) the usefulness of this dataset; [5 marks]
(ii) the limitations of this dataset; [5 marks]
and what can be done to improve the data in terms of data collection/curation; [5 marks]
(b) (i) any opportunities for policymaking; [5 marks]
and (ii) any potential business opportunities.