2261888162 - Report, Report DOC-20231218-WA0102.
docx
keyboard_arrow_up
School
Romblon State University Romblon - Sawang Campus *
*We aren’t endorsed by this school
Course
2
Subject
Information Systems
Date
Nov 24, 2024
Type
docx
Pages
21
Uploaded by DeaconAtomBison24
Data Analytics and Visualization
Task 1:
Step 1: Importing Data:
In this example, a fake statistic on grape harvests in Spain, France and Germany will be used. First, we list basic information about the growing season such as dates, amounts and areas. Once the data is ready, all you have to do is open Excel and go to the "Data" tab. To buy the accumulated data into Excel, use either Get Data or From Text/CSV. Follow the shipment wizard
to make sure that data is loaded properly.
Harvest Date: Harvest dates are carefully set to obtain the ripest grapes with the best flavor. Factors include weather forecasts, historical climate trends and the ripening time of particular grape variety.
Quantity (kg): Factors such as vineyard management practices, soil quality, and weather conditions during the growing season have an effect on how many grapes are harvested. A greater number means a bountiful harvest.
Area (hectares): The area of vineyard under cultivation is also the most important factor in determining overall grape yield. It shows the size of grape production and vineyard potential.
Temperature (Celsius): The average temperature during the growing season is influential on grape ripening. Higher temperatures enhance the accumulation of sugar in grapes, which means higher levels of alcohol content in subsequent wine.
Rainfall (mm): Grape development and the health of vines depend on rainfall. Moisture from properly timed and adequate rainfall is a necessity, but too much of it will breed disease or dilute the grapes 'flavors.
Vineyards Count: The harvest participation by vineyards measures the diversity, cooperation and scale of wine producers in a place. It can affect the overall quality and attributes of wine.
Grape Variety: Different grape varieties contribute distinct flavors, aromas, and characteristics to the wine. The choice of grape varieties is a key aspect of winemaking, influencing the final wine style.
Sugar Content (%): Sugar content in grapes is crucial for winemaking, as it determines the potential alcohol level in the wine. Brix levels are often measured to assess grape ripeness.
Acidity (pH): Acidity levels contribute to the overall balance and structure of the wine. Grapes with balanced acidity impart freshness and vibrancy to the wine, influencing its aging potential.
Region: The geographical region of grape cultivation imparts unique terroir characteristics to the wine. Each region has its own climate, soil composition, and traditions that shape the final product.
Harvest Date
Quant
ity
Regio
n
2023-01-01
100
France
2023-01-02
Germa
ny
2023-01-02
150
France
2023-01-03
120
Spain
Now, in Excel:
Go to the "Data" tab in Excel after opening the program.
For importing the information, find "From Text/CSV" on the menu.
Step 2: Initial Data Exploration:
The initial stage of gathering information is to detect discrepancies, values that are absent, or imbalances. Get a view from the air of unprocessed information by organizing and analyzing it using the tools given in the "Data" tab. The "Harvest Date," "Quantity," and "Region" boxes need to be examined in this case.
i.
Initial Data Examination:
Harvest Date:
o
Check for missing or inconsistent date entries.
o
Look for any outliers or unusual patterns in the distribution of harvest dates.
o
Verify that all dates are in the correct format and make sense chronologically.
Quantity:
o
Fill in missing values in the column labelled "Quantity".
o
See if there are outliers or unusually high/low values by looking at the distribution of quantities.
o
Units are consistent and all values numerical.
Region:
o
Check out the Region column for missing or incorrect entries.
o
It is important to check that all regions have been labeled correctly and there are no spelling mistakes.
o
Check for anomalous regions of the data set.
ii.
Using Excel Tools:
Sorting and Filtering:
o
Organize the dataset by various factors such as Harvest Date, Count and Region using Excel's sorting and filtering functions.
o
Sort the data in ascending or descending order, and find a pattern.
Data Validation:
o
Use data validation rules to check that the dates fall within a reasonable range and adhere to one form.
o
Add validation of "Quantity" values to ensure they meet expected criteria (e.g., positive numbers).
Conditional Formatting:
o
Highlight in the "Quantity" column any inconsistencies or outliers with conditional formatting.
o
Add color scales to show the division of quantities by regions.
Data Summary:
o
Apply summary statistics to 'Quantity' so as to understand the overall distribution and central tendency.
o
Create a regional harvest summary with pivot tables dates.
Charts and Visualizations:
Visualize the distribution of quantities over time and across regions with simple charts such as bar graphs or line graphs.
Explore any trends or patterns with Excel's charting tools.
iii.
Documentation:
Document Findings:
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
o
During the initial survey, make note of any discrepancies noticed.
o
Record the process of cleaning and organizing data for future reference.
Prepare for Cleaning:
o
In light of these findings, sketch out a strategy for data cleaning, indicating how to fill in missing values or deal with aberrations.
Mathematical Formula: Use Excel's sorting and filtering features:
=FILTER (A2:C5, (B2:B5 <> ""))
Step 3: Handling Missing Values:
If you find that the number is blank, then it must be filled in by a correct method. Average assignment is a handy filling-in device; you take the average of all of those numbers that are in Quantity. To do this, use the IF (ISBLANK ()...) equation in Excel.
Identify Missing Values:
Locate empty or missing values in the "Quantity" column first.
If the cell is empty, check in Excel using the IF (ISBLANK()) equation. For example: excel
This formula tests whether cell A2 is blank. If it is, then "Missing"is returned; otherwise, the original value in A2.
Calculate Average Quantity:
Compute the average of those values for which there are data in the "Quantity" column by means
of AVERAGE.
For example:
AVERAGE (B2:B100)
This formula calculates the average of values in the "Quantity" column from cell B2 to B100.
Apply Average Assignment:
Use the calculated average to fill in the missing values in the "Quantity" column. Modify the IF(ISBLANK()) formula to incorporate the average. For example: excel
IF(ISBLANK(B2), AVERAGE(B2:B100), B2)
This formula checks if B2 is blank. If it is, it fills in the average of the "Quantity" column; otherwise, it retains the original value in B2.
Extend Formula:
Drag or copy the modified formula down to cover all rows with missing values in the "Quantity" column.
Verify Results:
Examine the "Quantity" column to ensure that missing values have been filled in with the appropriate averages.
Data Validation:
Implement data validation rules to prevent future missing values or inconsistencies in the "Quantity" column.
Conditional Formatting:
Apply conditional formatting to highlight cells where missing values were initially present. Ensure the formatting is clear and easily noticeable.
Document Changes:
Keep a record of the cells where missing values were filled in, along with the applied averages.
Document the steps taken for future reference and transparency.
Mathematical Formula: To complete the "amount" pillars, you can utilize typical remediation:
=IF (ISBLANK (B2), AVERAGE (B$2:B$4), B2)
Step 4: Data Formatting:
For reasons of uniformity and convenience in a contrast, the "Harvest Date" parameter must be written as the value of a number. To do this, select the girder, then navigate to the "Home" tab and name the times correctly. To further ensure data comparability, it is advised to enhance assessment machines as applicable.
Select the Column:
o
In Excel, navigate to the worksheet containing the "Harvest Date" column.
o
Click on the header of the "Harvest Date" column to select the entire column.
o
Navigate to the "Home" Tab:
o
Go to the "Home" tab in the Excel ribbon.
Format as Number:
o
Locate the "Number" group within the "Home" tab.
Choose Date Format:
o
Click on the drop-down menu in the "Number" group.
o
Select "Short Date" or a date format that suits your preference.
Enhance Assessment:
o
Consider enhancing data comprehensibility and assessment by applying additional formatting options:
o
Custom Formats: Use custom date formats to display dates in a specific way (e.g., "yyyy-
mm-dd").
o
Sorting: Ensure that the dates are sorted chronologically for easier analysis.
o
Filtering: Apply filters to quickly analyze data for specific date ranges.
Verify Results:
o
Check the "Harvest Date" column to ensure that dates are now formatted according to the
chosen style.
Conditional Formatting:
o
Utilize conditional formatting to highlight or color-code certain date ranges for visual assessment.
Data Validation:
o
Implement data validation rules to prevent entry of incorrect date formats in the future.
Document Formatting Choices:
o
Keep a record of the chosen date format for the "Harvest Date" column.
o
Document any additional formatting choices made for enhanced assessment.
Include in Data Dictionary:
o
If applicable, update your data dictionary to include information about the chosen date format for future reference.
Mathematical Formula: Format "Harvest Date" as a date:
=DATEVALUE (A2)
Step 5: Removing Duplicates:
Verify that the collection of information is free of duplicates and delete them. In Excel, you may accomplish this by going to the "Data" tab and using the "Delete Replications" feature. Microsoft
will remove row duplication when you choose the appropriate columns.
Select the Range:
o
In Excel, navigate to the worksheet containing the data.
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
o
Select the entire range of data, including headers if applicable.
o
Navigate to the "Data" Tab:
o
Go to the "Data" tab in the Excel ribbon.
Use "Remove Duplicates" Feature:
o
Locate the "Data Tools" group within the "Data" tab.
Click "Remove Duplicates":
o
Click on the "Remove Duplicates" button. A dialog box will appear.
Choose Columns:
o
In the dialog box, select the columns where you want to check for duplicates.
o
For example, if you want to remove duplicates based on the "Region" column, select "Region."
Verify Selection:
o
Double-check that the correct columns are selected to identify duplicates accurately.
Configure Options:
o
If needed, configure additional options such as considering case sensitivity.
Click "OK":
o
Click the "OK" button to execute the removal of duplicate rows.
Review Results:
o
Excel will remove duplicate rows based on the selected columns, leaving only unique records.
Data Validation:
o
Implement data validation rules to prevent the entry of duplicate values in the future.
Conditional Formatting:
o
Use conditional formatting to highlight duplicate values for visual identification before removal.
Document Removal of Duplicates:
o
Keep a record of the columns selected and the results after removing duplicates.
o
Document any specific considerations or options chosen during the process.
Include in Data Dictionary:
o
If applicable, update your data dictionary to indicate that duplicate rows have been removed.
Step 6: Text Formatting:
The "Region" category and any other text-based information in the database should be formatted regularly for ease of reading and evaluation. For instance, to make everything consistent, you can
make all the letters lowercase.
Select the Column:
o
In Excel, navigate to the worksheet containing the "Region" column or any other text-
based information.
o
Click on the header of the respective column to select the entire column.
Navigate to the "Home" Tab:
o
Go to the "Home" tab in the Excel ribbon.
Text Formatting Options:
o
In the "Home" tab, locate the "Font" group.
Change Case:
o
Click on the drop-down menu in the "Font" group.
o
Select the desired text formatting option, such as "Lowercase" to make all letters lowercase.
Apply Consistent Formatting:
o
Ensure that all text-based information in the selected column is formatted consistently.
Verify Results:
o
Check the "Region" column or the relevant text-based column to confirm that text formatting has been applied.
Sorting and Filtering:
o
Use sorting and filtering functions to organize and evaluate data more efficiently.
Conditional Formatting:
o
Apply conditional formatting to highlight specific text patterns or conditions for visual assessment.
Document Formatting Choices:
o
Keep a record of the chosen text formatting option for the "Region" column or other text-
based column.
o
Document any additional formatting choices made for enhanced readability and evaluation.
Include in Data Dictionary:
If applicable, update your data dictionary to include information about the chosen text formatting
for future reference.
Mathematical Formula: Convert the "Region" column to uppercase:
=UPPER (C2)
Step 7: Applying Security Processes:
Click the "Review" tab to lock the assignment so no one else can make changes to it or access it without your permission. Pick the "Protect Sheet" alternative, enter a security code, and provide permissions to specific areas of the spreadsheet.
Navigate to the "Review" Tab:
o
In Excel, go to the "Review" tab in the Excel ribbon.
Select "Protect Sheet":
o
Within the "Changes" group in the "Review" tab, locate and click on the "Protect Sheet" option.
Enter Security Code:
o
A "Protect Sheet" dialog box will appear.
o
Enter a security code/password in the "Password to unprotect sheet" field. This code will be required to unlock the worksheet.
Set Permissions:
o
Choose the permissions you want to grant. Options may include:
o
Selecting locked cells: Allows users to select cells in the protected worksheet.
o
Selecting unlocked cells: Allows users to select and interact with unlocked cells.
Define Areas to Protect:
o
Specify which areas of the worksheet to protect. You can choose to protect the entire sheet or only specific ranges.
Click "OK":
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
o
After setting permissions and defining protected areas, click the "OK" button to apply the
protection.
Confirm Security Code:
o
If prompted, re-enter the security code to confirm.
Test Protection:
o
Attempt to make changes in the protected areas. Excel will prompt for the security code or restrict editing based on the specified permissions.
Worksheet Tab Color:
o
Consider changing the color of the worksheet tab to visually indicate that the sheet is protected.
Document Protection Details:
o
Keep a record of the security code used and specific areas protected.
o
Document any instructions or notes related to the protected sheet.
Include in Documentation:
o
Update your documentation to include information about the protected sheet, including the security code and areas protected.
Communicate Permissions:
o
If sharing the protected sheet with others, communicate the security code and any relevant instructions regarding permissions.
Step 8: Documentation:
Make a new section in the Excel file specifically for keeping track of paperwork. Give specifics like the arrival time, restoration process, period structure, and any other changes performed. Following these procedures has prepared your collection of data for evaluation in a consistent and secure manner, and the accompanying information is a complete record of the the preparation activities. Modify these equations according to the collection and individual needs.
Insert a New Worksheet:
o
In Excel, add a new worksheet that contains the paperwork tracking section.
o
Rename the new worksheet suitably, as Paperwork Tracking.
Define Column Headers:
o
Establish column headers to record the information concerned. For example:
o
"Arrival Time"
o
"Restoration Process"
o
"Period Structure"
o
"Changes Performed"
Enter Data:
o
Begin entering data in the lines below each column header.
o
For Arrival Time, enter the date and time that paperwork or data arrived.
o
For the Restoration Process, record any process of restoration or cleaning up.
o
In "Period Structure," indicate the period or timeframe for which data are provided.
o
In the heading above, list any changes made to the dataset during preparation.
o
In "Additional Notes," enter any other pertinent information or observations.
Include Formulas and Equations:
o
Add formulas or equations to automate calculations and track certain metrics when possible.
o
For instance, you could estimate the period of restoration or record changes that occurred.
Format the Worksheet:
o
Highlight headers with bold type, or shade cells.
o
It is recommended that you use conditional formatting to emphasize particular conditions or items.
Hyperlinks:
o
For quick reference, add hyperlinks to related documents or external resources.
Data Validation:
o
Verify data entries meet certain criteria, or conform to an established format.
Document Preparation Activities:
o
On a separate documentation sheet or section, list the preparatory work done. This includes cleaning and formatting data as necessary and adding security measures, along with creatingof paperwork tracking portion on pages 1-8 of this document.
Include in Data Dictionary:
o
Add to the data dictionary details about the new paperwork tracking section, as well any formulas or equations used.
Instructions for Modification:
o
Include clear instructions on how to change or adapt the equations and data in the paperwork tracking section for different datasets, changes of policy etc.
Task 2:
Objective:
A complete explanatory survey, with graphs broken down for each field of both grape varietals. If you already have the berries 'main and secondary axes' lengths, then execute a predictive examination of hypotheses on them. From such data, draw conclusions about the peculiarity of grape types. To support your point, include screenshots of each step with any Excel calculations used.
Primary and Secondary Axis Lengths:
o
For each grape varietal, carry out a comprehensive statistical analysis on both the primary
and secondary axes.
o
Use descriptive statistics to study the central tendency, dispersion and shape of distributions.
Graphical Representation:
o
Visualize, with diagrams such as histograms or box plots, the distribution of primary and secondary axis lengths for each grape varietal.
o
Explore possible relationships between primary and secondary axis lengths through scatter plots.
Define Null and Alternative Hypotheses:
o
Develop clearly formulated null and alternative hypotheses regarding primary axis lengths, as well secondary axis lengths.
o
For instance, find out if there is a large difference in means of primary axis lengths between grape varietals.
Select Appropriate Statistical Tests:
o
Depending on the nature of your hypotheses and data, select an appropriate statistical test--be it a t-test or ANOVA.
Conduct Hypothesis Testing:
o
Use Excel functions or data analysis tools to test hypotheses.
o
Include Excel screenshots for setting up and running the tests.
Interpret Statistical Significance:
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
o
To find out whether there is a statistical difference between primary and secondary axes, analyze the results of hypothesis testing.
Consider Practical Significance:
o
Weight the practical significance of any differences observed in terms of growing grapes and harvesting them.
Draw Insights about Grape Peculiarities:
o
Conclude on the peculiarities of grape types from statistical analysis.
o
In the case of vineyard management, wine production or other such fields.
Capture Screenshots:
o
In Excel, put in screenshots for each step of data analysis and hypothesis testing as well as the plotting.
Excel Calculations:
o
Explain the Excel formulas and calculations in this analysis.
o
The statistical research process should be open and transparent.
Propose Recommendations:
o
Drawing on these results, make recommendations for the cultivation or winemaking of grapes.
o
Now, think about how these insights from the statistical analysis can be used in practice.
Identify Areas for Further Exploration:
o
Highlight sections that need additional investigation or data collection for more thorough understanding.
Step 1: Descriptive Statistical Analysis with Histograms:
Get the information Set Ready:
import pandas as pd
# Assuming the dataset is stored in a CSV file named 'grape_data.csv'
df = pd.read_csv(
'grape_data.csv'
)
# Display the first few rows of the dataset for initial exploration
df.head()
First, use the cleaning information. Next, give both vineyards of grapes Principal and Lesser Axis Lengths.
Descriptive Statistical Analysis:
# Calculate descriptive statistics for Major Axis Length
major_axis_stats = df[
'Major Axis Length'
].describe()
print
(
major_axis_stats
)
Using Excel formulas, calculate mean, median, standard deviation and minimum or maximum for each field.
Also create histograms on a separate graph for each field, including Major and Minor Axis Length.
import matplotlib
.
pyplot as plt
import seaborn as sns
# Plot histogram for Major Axis Length
plt
.
figure
(
figsize
=
(
8
, 6
))
sns
.histplot(df[
'Major Axis Length'
], bins
=
20
, kde
=
True
)
plt
.
title
(
'Major Axis Length Histogram'
)
plt
.
show
()
Screenshots:
Step 2: Inferential Hypothesis Test:
Formulate Hypotheses:
Null Hypothesis (H0): There is no significant difference in the Major and Minor Axis Lengths between the two grape varieties.
Alternative Hypothesis (H1): There is a significant difference in the Major and Minor Axis Lengths between the two grape varieties.
alpha = 0.05 # Chosen significance level
Performing T-test:
from scipy.stats import ttest_ind
# Perform t-test
t_statistic
, p_value = ttest_ind(df[df[
'Grape Variety'
] == 'Variety A'
]
[
'Major Axis Length'
],
df[df[
'Grape Variety'
] == 'Variety B'
]
[
'Major Axis Length'
])
Select Significance Level:
alpha = 0.05 # Chosen significance level
Choose a significance level (e.g., α = 0.05).
Perform Appropriate Test:
Depending on the data distribution, choose an appropriate inferential test (e.g., t-test or Mann-
Whitney U test).
Analyze Results:
Obtain test statistics and p-values.
Compare the p-values to the chosen significance level.
Screenshots:
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
Inference and Report: Based on the results obtained from the descriptive statistical analysis and the hypothesis test, make inferences regarding the eccentricity of grape varieties. For example, if the Major and Minor Axis Lengths show significant differences, it might indicate variations in the eccentricity of grape shapes between varieties.
Task 3:
Objective:
Develop a fully interactive dashboard in Excel with slicers. The dashboard will feature three separate charts, including two Pivot Charts for each class of Raisin displaying ConvexArea and Perimeter. The third Pivot Chart is of your choice to support the dashboard's validity and recommendations. Slicers will be utilized to infer which grape variety is the largest and smallest based on the region, as well as identify the class of grape with the greatest and least perimeter. Screenshots are required for each step.
Dashboard Development:
o
Develop a dedicated sheet or section for the interactive dashboard within the Excel workbook.
Insert Pivot Tables:
o
Create Pivot Tables for each class of raisin, summarizing data related to ConvexArea and
Perimeter.
o
Utilize the "Class" as rows, and include ConvexArea and Perimeter in the Values section.
Create Pivot Charts:
o
Generate two Pivot Charts for each class: one displaying ConvexArea and the other displaying Perimeter.
o
Include appropriate chart types, such as bar charts or line charts, for effective visualization.
Include a Third Pivot Chart:
o
Introduce a third Pivot Chart of your choice that supports the overall narrative and recommendations.
o
This could be a chart highlighting trends, comparisons, or any other relevant insights.
Data Validation and Verification:
o
Verify that the Pivot Tables and Charts accurately represent the underlying data.
o
Ensure that calculations are correct and in alignment with the intended analysis.
Step 1: Creating Pivot Charts for ConvexArea and Perimeter
ConvexArea Pivot Chart:
o
Select the data range.
o
Insert a PivotTable.
o
Choose "Class of Raisin" for rows, "ConvexArea" for values.
o
Insert a Pivot Chart based on the PivotTable.
o
ConvexArea Pivot Chart
Perimeter Pivot Chart:
o
Similar to the ConvexArea chart, choose "Perimeter" for values.
o
Perimeter Pivot Chart
Step 2: Adding a Third Pivot Chart
o
Choose a Relevant Pivot Chart:
o
Decide on a chart that adds validity to your dashboard. For example, a scatter plot comparing ConvexArea and Perimeter.
o
Third Pivot Chart
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
Step 3: Implementing Slicers
o
Create Slicers:
o
Insert slicers for "Region," "Grape Variety," and "Class of Raisin."
o
Connect slicers to all charts.
o
Slicers
Step 4: Inferring Grape Variety Size and Class Perimeter
o
Analyze Results:
o
Use slicers to select different regions and observe changes in ConvexArea and Perimeter charts.
o
Identify which grape variety appears largest and smallest based on region.
o
Use slicers to infer which class of grape has the greatest and least perimeter.
Step 5: Dashboard Design and Recommendations
o
Design Dashboard Layout:
o
Organize charts in a visually appealing layout.
o
Include titles, labels, and legends for clarity.
o
Dashboard Layout
o
Based on the analysis, offer insights into the largest and smallest grape varieties in different regions.
o
Discuss the class of grape with the greatest and least perimeter.
o
Justify the design choices made in the dashboard.
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help