2261874412 - Report, Re Data_Analytics_and_Visualization_rearrange_done

docx

School

Romblon State University Romblon - Sawang Campus *

*We aren’t endorsed by this school

Course

2

Subject

Information Systems

Date

Nov 24, 2024

Type

docx

Pages

21

Uploaded by DeaconAtomBison24

Report
Data Analytics and Visualization Task 1: Step 1: Importing Data: In this example, a fake statistic on grape harvests in Spain, France and Germany will be used. First, we list basic information about the growing season such as dates, amounts and areas. Once the data is ready, all you have to do is open Excel and go to the "Data" tab. To buy the accumulated data into Excel, use either Get Data or From Text/CSV. Follow the shipment wizard to make sure that data is loaded properly. Harvest Date: Harvest dates are carefully set to obtain the ripest grapes with the best flavor. Factors include weather forecasts, historical climate trends and the ripening time of particular grape variety. Quantity (kg): Factors such as vineyard management practices, soil quality, and weather conditions during the growing season have an effect on how many grapes are harvested. A greater number means a bountiful harvest. Area (hectares): The area of vineyard under cultivation is also the most important factor in determining overall grape yield. It shows the size of grape production and vineyard potential. Temperature (Celsius): The average temperature during the growing season is influential on grape ripening. Higher temperatures enhance the accumulation of sugar in grapes, which means higher levels of alcohol content in subsequent wine. Rainfall (mm): Grape development and the health of vines depend on rainfall. Moisture from properly timed and adequate rainfall is a necessity, but too much of it will breed disease or dilute the grapes 'flavors. Vineyards Count: The harvest participation by vineyards measures the diversity, cooperation and scale of wine producers in a place. It can affect the overall quality and attributes of wine. Grape Variety: Different grape varieties contribute distinct flavors, aromas, and characteristics to the wine. The choice of grape varieties is a key aspect of winemaking, influencing the final wine style.
Sugar Content (%): Sugar content in grapes is crucial for winemaking, as it determines the potential alcohol level in the wine. Brix levels are often measured to assess grape ripeness. Acidity (pH): Acidity levels contribute to the overall balance and structure of the wine. Grapes with balanced acidity impart freshness and vibrancy to the wine, influencing its aging potential. Region: The geographical region of grape cultivation imparts unique terroir characteristics to the wine. Each region has its own climate, soil composition, and traditions that shape the final product. Harvest Date Quant ity Regio n 2023-01-01 100 France 2023-01-02 Germa ny 2023-01-02 150 France 2023-01-03 120 Spain Now, in Excel: Go to the "Data" tab in Excel after opening the program. For importing the information, find "From Text/CSV" on the menu. Step 2: Initial Data Exploration: The initial stage of gathering information is to detect discrepancies, values that are absent, or imbalances. Get a view from the air of unprocessed information by organizing and analyzing it using the tools given in the "Data" tab. The "Harvest Date," "Quantity," and "Region" boxes need to be examined in this case. i. Initial Data Examination: Harvest Date: o Check for missing or inconsistent date entries. o Look for any outliers or unusual patterns in the distribution of harvest dates. o Verify that all dates are in the correct format and make sense chronologically. Quantity: o Fill in missing values in the column labelled "Quantity".
o See if there are outliers or unusually high/low values by looking at the distribution of quantities. o Units are consistent and all values numerical. Region: o Check out the Region column for missing or incorrect entries. o It is important to check that all regions have been labeled correctly and there are no spelling mistakes. o Check for anomalous regions of the data set. ii. Using Excel Tools: Sorting and Filtering: o Take advantage of Excel's sorting and filtering functions to organize the dataset by "Harvest Date," Count, Region. o Sort the data in ascending or descending order and look for patterns. Data Validation: o Use data validation rules to check that the dates fall within a reasonable range and adhere to one form. o Add validation of "Quantity" values to ensure they meet expected criteria (e.g., positive numbers). Conditional Formatting: o Highlight in the "Quantity" column any inconsistencies or outliers with conditional formatting. o Add color scales to show the division of quantities by regions. Data Summary: o Apply summary statistics to 'Quantity' so as to understand the overall distribution and central tendency. o Create a regional harvest summary with pivot tables dates. Charts and Visualizations: Visualize the distribution of quantities over time and across regions with simple charts such as bar graphs or line graphs. Explore any trends or patterns with Excel's charting tools. iii. Documentation: Document Findings:
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
o During the initial survey, make note of any discrepancies noticed. o Record the process of cleaning and organizing data for future reference. Prepare for Cleaning: o In light of these findings, sketch out a strategy for data cleaning, indicating how to fill in missing values or deal with aberrations. Mathematical Formula: Use Excel's sorting and filtering features: =FILTER (A2:C5, (B2:B5 <> "")) Step 3: Handling Missing Values: If you find that the number is blank, then it must be filled in by a correct method. Average assignment is a handy filling-in device; you take the average of all of those numbers that are in Quantity. To do this, use the IF (ISBLANK ()...) equation in Excel. Identify Missing Values: Locate empty or missing values in the "Quantity" column first. If the cell is empty, check in Excel using the IF (ISBLANK()) equation. For example: excel This formula tests whether cell A2 is blank. If it is, then "Missing"is returned; otherwise, the original value in A2. Calculate Average Quantity: Compute the average of those values for which there are data in the "Quantity" column by means of AVERAGE. For example: AVERAGE (B2:B100) This formula calculates the average of values in the "Quantity" column from cell B2 to B100. Apply Average Assignment: Use the calculated average to fill in the missing values in the "Quantity" column. Modify the IF(ISBLANK()) formula to incorporate the average. For example: excel IF(ISBLANK(B2), AVERAGE(B2:B100), B2) This formula checks if B2 is blank. If it is, it fills in the average of the "Quantity" column; otherwise, it retains the original value in B2.
Extend Formula: Drag or copy the modified formula down to cover all rows with missing values in the "Quantity" column. Verify Results: Examine the "Quantity" column to ensure that missing values have been filled in with the appropriate averages. Data Validation: Implement data validation rules to prevent future missing values or inconsistencies in the "Quantity" column. Conditional Formatting: Apply conditional formatting to highlight cells where missing values were initially present. Ensure the formatting is clear and easily noticeable. Document Changes: Keep a record of the cells where missing values were filled in, along with the applied averages. Document the steps taken for future reference and transparency. Mathematical Formula: To complete the "amount" pillars, you can utilize typical remediation: =IF (ISBLANK (B2), AVERAGE (B$2:B$4), B2) Step 4: Data Formatting: For reasons of uniformity and convenience in a contrast, the "Harvest Date" parameter must be written as the value of a number. To do this, select the girder, then navigate to the "Home" tab and name the times correctly. To further ensure data comparability, it is advised to enhance assessment machines as applicable. Select the Column: o In Excel, navigate to the worksheet containing the "Harvest Date" column. o Click on the header of the "Harvest Date" column to select the entire column. o Navigate to the "Home" Tab: o Go to the "Home" tab in the Excel ribbon. Format as Number: o Locate the "Number" group within the "Home" tab. Choose Date Format:
o Click on the drop-down menu in the "Number" group. o Select "Short Date" or a date format that suits your preference. Enhance Assessment: o Consider enhancing data comprehensibility and assessment by applying additional formatting options: o Custom Formats: Use custom date formats to display dates in a specific way (e.g., "yyyy- mm-dd"). o Sorting: Ensure that the dates are sorted chronologically for easier analysis. o Filtering: Apply filters to quickly analyze data for specific date ranges. Verify Results: o Check the "Harvest Date" column to ensure that dates are now formatted according to the chosen style. Conditional Formatting: o Utilize conditional formatting to highlight or color-code certain date ranges for visual assessment. Data Validation: o Implement data validation rules to prevent entry of incorrect date formats in the future. Document Formatting Choices: o Keep a record of the chosen date format for the "Harvest Date" column. o Document any additional formatting choices made for enhanced assessment. Include in Data Dictionary: o If applicable, update your data dictionary to include information about the chosen date format for future reference. Mathematical Formula: Format "Harvest Date" as a date: =DATEVALUE (A2) Step 5: Removing Duplicates: Verify that the collection of information is free of duplicates and delete them. In Excel, you may accomplish this by going to the "Data" tab and using the "Delete Replications" feature. Microsoft will remove row duplication when you choose the appropriate columns. Select the Range: o In Excel, navigate to the worksheet containing the data.
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
o Select the entire range of data, including headers if applicable. o Navigate to the "Data" Tab: o Go to the "Data" tab in the Excel ribbon. Use "Remove Duplicates" Feature: o Locate the "Data Tools" group within the "Data" tab. Click "Remove Duplicates": o Click on the "Remove Duplicates" button. A dialog box will appear. Choose Columns: o In the dialog box, select the columns where you want to check for duplicates. o For example, if you want to remove duplicates based on the "Region" column, select "Region." Verify Selection: o Double-check that the correct columns are selected to identify duplicates accurately. Configure Options: o If needed, configure additional options such as considering case sensitivity. Click "OK": o Click the "OK" button to execute the removal of duplicate rows. Review Results: o Excel will remove duplicate rows based on the selected columns, leaving only unique records. Data Validation: o Implement data validation rules to prevent the entry of duplicate values in the future. Conditional Formatting: o Use conditional formatting to highlight duplicate values for visual identification before removal. Document Removal of Duplicates: o Keep a record of the columns selected and the results after removing duplicates. o Document any specific considerations or options chosen during the process. Include in Data Dictionary:
o If applicable, update your data dictionary to indicate that duplicate rows have been removed. Step 6: Text Formatting: The "Region" category and any other text-based information in the database should be formatted regularly for ease of reading and evaluation. For instance, to make everything consistent, you can make all the letters lowercase. Select the Column: o In Excel, navigate to the worksheet containing the "Region" column or any other text- based information. o Click on the header of the respective column to select the entire column. Navigate to the "Home" Tab: o Go to the "Home" tab in the Excel ribbon. Text Formatting Options: o In the "Home" tab, locate the "Font" group. Change Case: o Click on the drop-down menu in the "Font" group. o Select the desired text formatting option, such as "Lowercase" to make all letters lowercase. Apply Consistent Formatting: o Ensure that all text-based information in the selected column is formatted consistently. Verify Results: o Check the "Region" column or the relevant text-based column to confirm that text formatting has been applied. Sorting and Filtering: o Use sorting and filtering functions to organize and evaluate data more efficiently. Conditional Formatting: o Apply conditional formatting to highlight specific text patterns or conditions for visual assessment. Document Formatting Choices:
o Keep a record of the chosen text formatting option for the "Region" column or other text- based column. o Document any additional formatting choices made for enhanced readability and evaluation. Include in Data Dictionary: If applicable, update your data dictionary to include information about the chosen text formatting for future reference. Mathematical Formula: Convert the "Region" column to uppercase: =UPPER (C2) Step 7: Applying Security Processes: Click the "Review" tab to lock the assignment so no one else can make changes to it or access it without your permission. Pick the "Protect Sheet" alternative, enter a security code, and provide permissions to specific areas of the spreadsheet. Navigate to the "Review" Tab: o In Excel, go to the "Review" tab in the Excel ribbon. Select "Protect Sheet": o Within the "Changes" group in the "Review" tab, locate and click on the "Protect Sheet" option. Enter Security Code: o A "Protect Sheet" dialog box will appear. o Enter a security code/password in the "Password to unprotect sheet" field. This code will be required to unlock the worksheet. Set Permissions: o Choose the permissions you want to grant. Options may include: o Selecting locked cells: Allows users to select cells in the protected worksheet. o Selecting unlocked cells: Allows users to select and interact with unlocked cells. Define Areas to Protect: o Specify which areas of the worksheet to protect. You can choose to protect the entire sheet or only specific ranges. Click "OK":
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
o After setting permissions and defining protected areas, click the "OK" button to apply the protection. Confirm Security Code: o If prompted, re-enter the security code to confirm. Test Protection: o Attempt to make changes in the protected areas. Excel will prompt for the security code or restrict editing based on the specified permissions. Worksheet Tab Color: o Consider changing the color of the worksheet tab to visually indicate that the sheet is protected. Document Protection Details: o Keep a record of the security code used and specific areas protected. o Document any instructions or notes related to the protected sheet. Include in Documentation: o Update your documentation to include information about the protected sheet, including the security code and areas protected. Communicate Permissions: o If sharing the protected sheet with others, communicate the security code and any relevant instructions regarding permissions. Step 8: Documentation: Make a new section in the Excel file specifically for keeping track of paperwork. Give specifics like the arrival time, restoration process, period structure, and any other changes performed. Following these procedures has prepared your collection of data for evaluation in a consistent and secure manner, and the accompanying information is a complete record of the the preparation activities. Modify these equations according to the collection and individual needs. Insert a New Worksheet: o In Excel, insert a new worksheet to serve as the dedicated section for paperwork tracking. o Rename the new worksheet appropriately, such as "Paperwork Tracking." Define Column Headers: o Set up column headers to capture relevant information. For example: o "Arrival Time"
o "Restoration Process" o "Period Structure" o "Changes Performed" Enter Data: o Start entering data in the rows beneath each column header. o For "Arrival Time," record the date and time when paperwork or data arrived. o For "Restoration Process," document any steps taken to restore or clean the data. o In "Period Structure," record the time frame or period covered by the data. o Under "Changes Performed," document any modifications made to the dataset during the preparation process. o Use "Additional Notes" for any other pertinent information or observations. Include Formulas and Equations: o If applicable, include formulas or equations to automate calculations or track specific metrics. o For example, you might calculate the duration of the restoration process or summarize changes made. Format the Worksheet: o Apply formatting options to enhance readability, such as bold headers or cell shading. o Consider using conditional formatting to highlight specific conditions or entries. Hyperlinks: o Add hyperlinks to related documents or external resources for quick reference. Data Validation: o Implement data validation to ensure entries meet specific criteria or adhere to a defined format. Document Preparation Activities: o In a separate documentation sheet or section, summarize the preparation activities undertaken, including data cleaning, formatting, security measures, and the creation of the paperwork tracking section. Include in Data Dictionary: o Update the data dictionary to include details about the new paperwork tracking section and any specific formulas or equations used. Instructions for Modification:
o Provide clear instructions on how to modify or update the equations and data in the paperwork tracking section to accommodate different datasets or individual needs. Task 2: Objective: For each field of both grape varietals, conduct a comprehensive explanatory survey of statistical research that includes graphs. Once you have the berries' primary and secondary axis lengths, you may run a predictive examination of hypotheses on them. Draw conclusions about the peculiarities of grape types from the data. Include screenshots of each step, as well as any Excel calculations utilized, to prove your point. Primary and Secondary Axis Lengths: o Conduct a detailed statistical analysis for each field of both grape varietals, focusing on primary and secondary axis lengths. o Utilize descriptive statistics to understand the central tendency, dispersion, and shape of the distributions. Graphical Representation: o Create visualizations such as histograms or box plots to depict the distribution of primary and secondary axis lengths for each grape varietal. o Use scatter plots to explore potential correlations between primary and secondary axis lengths. Define Null and Alternative Hypotheses: o Formulate clear null and alternative hypotheses related to primary and secondary axis lengths. o For example, assess whether there is a significant difference in the means of primary axis lengths between grape varietals. Select Appropriate Statistical Tests: o Choose suitable statistical tests, such as t-tests or ANOVA, based on the nature of the hypotheses and the data. Conduct Hypothesis Testing:
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
o Perform hypothesis testing using Excel functions or data analysis tools. o Include screenshots of the Excel steps involved in setting up and conducting the tests. Interpret Statistical Significance: o Analyze the results of hypothesis testing to determine whether there are statistically significant differences in primary and secondary axis lengths. Consider Practical Significance: o Assess the practical significance of any observed differences, considering the context of grape cultivation and harvest. Draw Insights about Grape Peculiarities: o Formulate conclusions about the peculiarities of grape types based on the statistical analysis. o Consider the implications for vineyard management, wine production, or other relevant areas. Capture Screenshots: o Include screenshots of each step in Excel, showcasing the data analysis, hypothesis testing, and graphical representations. Excel Calculations: o Provide explanations of the Excel formulas and calculations used in the analysis. o Ensure transparency in the methods applied for statistical research. Propose Recommendations: o Based on the findings, propose recommendations for grape cultivation or winemaking practices. o Consider how the insights gained from the statistical analysis can be applied in practice. Identify Areas for Further Exploration: o Highlight any areas that require further investigation or data collection for a more comprehensive understanding. Step 1: Descriptive Statistical Analysis with Histograms: Get the information Set Ready: import pandas as pd # Assuming the dataset is stored in a CSV file named 'grape_data.csv' df = pd.read_csv( 'grape_data.csv' )
# Display the first few rows of the dataset for initial exploration df.head() First, use the cleaning information. Then, make sure that both vineyard of grapes have Principal and Lesser Axis Lengths. Descriptive Statistical Analysis: # Calculate descriptive statistics for Major Axis Length major_axis_stats = df[ 'Major Axis Length' ].describe() print ( major_axis_stats ) Calculate mean, median, standard deviation, minimum, and maximum for each field using Excel formulas. Create histograms for each field, including Major and Minor Axis Length, to visually represent the distribution. import matplotlib . pyplot as plt import seaborn as sns # Plot histogram for Major Axis Length plt . figure ( figsize = ( 8 , 6 )) sns .histplot(df[ 'Major Axis Length' ], bins = 20 , kde = True ) plt . title ( 'Major Axis Length Histogram' ) plt . show () Screenshots:
Step 2: Inferential Hypothesis Test: Formulate Hypotheses: Null Hypothesis (H0): There is no significant difference in the Major and Minor Axis Lengths between the two grape varieties. Alternative Hypothesis (H1): There is a significant difference in the Major and Minor Axis Lengths between the two grape varieties. alpha = 0.05 # Chosen significance level Performing T-test: from scipy.stats import ttest_ind # Perform t-test t_statistic , p_value = ttest_ind(df[df[ 'Grape Variety' ] == 'Variety A' ] [ 'Major Axis Length' ], df[df[ 'Grape Variety' ] == 'Variety B' ] [ 'Major Axis Length' ]) Select Significance Level: alpha = 0.05 # Chosen significance level Choose a significance level (e.g., α = 0.05). Perform Appropriate Test: Depending on the data distribution, choose an appropriate inferential test (e.g., t-test or Mann- Whitney U test). Analyze Results: Obtain test statistics and p-values. Compare the p-values to the chosen significance level. Screenshots:
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
Inference and Report: Based on the results obtained from the descriptive statistical analysis and the hypothesis test, make inferences regarding the eccentricity of grape varieties. For example, if the Major and Minor Axis Lengths show significant differences, it might indicate variations in the eccentricity of grape shapes between varieties. Task 3: Objective: Develop a fully interactive dashboard in Excel with slicers. The dashboard will feature three separate charts, including two Pivot Charts for each class of Raisin displaying ConvexArea and Perimeter. The third Pivot Chart is of your choice to support the dashboard's validity and recommendations. Slicers will be utilized to infer which grape variety is the largest and smallest based on the region, as well as identify the class of grape with the greatest and least perimeter. Screenshots are required for each step. Dashboard Development: o Develop a dedicated sheet or section for the interactive dashboard within the Excel workbook. Insert Pivot Tables: o Create Pivot Tables for each class of raisin, summarizing data related to ConvexArea and Perimeter. o Utilize the "Class" as rows, and include ConvexArea and Perimeter in the Values section.
Create Pivot Charts: o Generate two Pivot Charts for each class: one displaying ConvexArea and the other displaying Perimeter. o Include appropriate chart types, such as bar charts or line charts, for effective visualization. Include a Third Pivot Chart: o Introduce a third Pivot Chart of your choice that supports the overall narrative and recommendations. o This could be a chart highlighting trends, comparisons, or any other relevant insights. Data Validation and Verification: o Verify that the Pivot Tables and Charts accurately represent the underlying data. o Ensure that calculations are correct and in alignment with the intended analysis. Step 1: Creating Pivot Charts for ConvexArea and Perimeter ConvexArea Pivot Chart: o Select the data range. o Insert a PivotTable. o Choose "Class of Raisin" for rows, "ConvexArea" for values. o Insert a Pivot Chart based on the PivotTable. o ConvexArea Pivot Chart Perimeter Pivot Chart: o Similar to the ConvexArea chart, choose "Perimeter" for values. o Perimeter Pivot Chart
Step 2: Adding a Third Pivot Chart o Choose a Relevant Pivot Chart: o Decide on a chart that adds validity to your dashboard. For example, a scatter plot comparing ConvexArea and Perimeter. o Third Pivot Chart
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
Step 3: Implementing Slicers o Create Slicers: o Insert slicers for "Region," "Grape Variety," and "Class of Raisin." o Connect slicers to all charts. o Slicers
Step 4: Inferring Grape Variety Size and Class Perimeter o Analyze Results: o Use slicers to select different regions and observe changes in ConvexArea and Perimeter charts. o Identify which grape variety appears largest and smallest based on region. o Use slicers to infer which class of grape has the greatest and least perimeter. Step 5: Dashboard Design and Recommendations o Design Dashboard Layout: o Organize charts in a visually appealing layout. o Include titles, labels, and legends for clarity. o Dashboard Layout o Based on the analysis, offer insights into the largest and smallest grape varieties in different regions. o Discuss the class of grape with the greatest and least perimeter. o Justify the design choices made in the dashboard.
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help