DAT-501_6-1 Milestone Two Visualization and Analysis
School: Southern New Hampshire University
Course: 501
Subject: Business
Date: Feb 20, 2024
DAT-501 Foundations in Data Science
6-1 Milestone Two: Visualization and Analysis
By
Kumari Sweta
Submitted To: Frederick Mobley
Table of Contents
1. R Script
2. Boxplot
2.1 Total Profit by Region
2.2 Total Sales by Region
3. Analysis Report
3.1 Summarize the visualizations
3.2 Compare and contrast the code needed to connect
4. Reference
1. R Script:
The R script connects to the MySQL database, queries the data, and creates two boxplots: one of total sales by region and one of profit by region.
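As a minimal sketch of the plotting step only, the two boxplots can be drawn with base R's boxplot() and a formula. The data frame below is made up for illustration, and the column names region, total_sales, and profit are assumptions; in the actual script the data frame would come from the MySQL query.

```r
# Made-up illustrative data; the real values come from the database query.
# The column names (region, total_sales, profit) are assumptions.
sales_data <- data.frame(
  region      = rep(c("Central", "East", "South", "West"), each = 5),
  total_sales = c(200, 220, 240, 260, 280,
                  300, 320, 340, 360, 380,
                  150, 170, 190, 210, 230,
                  250, 270, 290, 310, 330),
  profit      = c(20, 25, 30, 35, 40,
                  30, 35, 40, 45, 50,
                  10, 15, 20, 25, 30,
                  25, 30, 35, 40, 45)
)

# One box per region; boxplot() invisibly returns the summary it plotted.
sales_box <- boxplot(total_sales ~ region, data = sales_data,
                     main = "Total Sales by Region",
                     xlab = "Region", ylab = "Total sales")

profit_box <- boxplot(profit ~ region, data = sales_data,
                      main = "Total Profit by Region",
                      xlab = "Region", ylab = "Profit")
```

The returned lists (sales_box, profit_box) hold the per-region statistics and any outliers, which is convenient when the plot needs to be described in text.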
2. Boxplot:
2.1 Total Profit by Region
2.2 Total Sales by Region
3. Analysis Report
3.1 Summarize the visualizations.
i. Explain how the boxplot visuals are useful to illustrate and communicate the data distribution of profit by region and total sales.
Boxplot visuals are useful for illustrating and communicating the distribution of profit and total sales by region because they show the following:
The median, quartiles, and range of profit and sales for each region, which show the data's central tendency and variability.
The outliers in profit and sales for each region, which are the data points that lie far from the main cluster of the data (by default in R, more than 1.5 times the interquartile range beyond the quartiles). Outliers can be important for identifying unusual or unexpected values.
The skewness of the data, that is, the asymmetry of its distribution, which can be inferred from the boxplot. If one whisker is noticeably longer than the other, the data may be skewed in that direction.
Comparison between metrics is easy. For each region, placing boxplots side by side (e.g., one for profit and one for sales) allows quick visual comparison of the distributions of the two metrics.
Stakeholders and decision-makers can quickly grasp key characteristics of the data, helping them make informed decisions based on the distribution of profits and sales across regions.
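The quantities a boxplot displays can also be computed directly. The sketch below uses base R's boxplot.stats() on a made-up vector of sales figures (not taken from the course data) to show where the box, whiskers, and outlier points come from.

```r
# Made-up sales figures for one hypothetical region (illustration only).
sales <- c(120, 135, 150, 160, 175, 180, 200, 210, 900)

# boxplot.stats() returns, in $stats, the lower whisker end, lower hinge,
# median, upper hinge, and upper whisker end; $out holds the points lying
# more than 1.5 * IQR beyond the hinges, which are drawn as outliers.
summary_stats <- boxplot.stats(sales)
five_number <- summary_stats$stats
outliers <- summary_stats$out

print(five_number)
print(outliers)
```

Here the upper whisker is longer than the lower one and a single point sits far above it, which is the pattern a reader would interpret as right skew with a high outlier.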
ii. Do any regions have outliers? Do any regions not have outliers? Explain your answers.
Based on the data and the boxplots, we can infer the following:
The East and West regions have outliers, as indicated by the points plotted well away from the main cluster of data points. The outliers represent sales amounts that are much higher than is typical for those regions.
The Central and South regions do not have any outliers. Their data points lie relatively close to each other, indicating a consistent range of sales amounts for those regions.
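The visual reading of outliers can be cross-checked in code. The following is a sketch under stated assumptions: the data frame is made up, the region labels "A" and "B" are placeholders rather than the report's regions, and the column names region and sales are hypothetical.

```r
# Made-up data with placeholder regions "A" and "B" (illustration only).
sales_data <- data.frame(
  region = rep(c("A", "B"), each = 6),
  sales  = c(10, 12, 13, 14, 15, 60,   # region A: 60 is far from the rest
             20, 21, 22, 23, 24, 25)   # region B: tightly clustered
)

# Apply the boxplot outlier rule (1.5 * IQR beyond the hinges) per region.
outliers_by_region <- lapply(
  split(sales_data$sales, sales_data$region),
  function(x) boxplot.stats(x)$out
)

# TRUE for regions with at least one outlier, FALSE otherwise.
has_outliers <- sapply(outliers_by_region, length) > 0
print(has_outliers)
```

Running the same check on the queried data frame would list, for each region, exactly which values the boxplot draws as separate points.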
3.2 Compare and contrast the code needed to connect.
i. Explain how you would connect to and retrieve data stored within an SQL Server database.
Using the ‘RODBC’ package, which provides an interface to relational database management systems that have an ODBC driver, including SQL Server, we can connect to and retrieve data from a SQL Server database in R. The following compares the code for connecting to a SQL Server database using ‘RODBC’ with the code for a MySQL database, for which we used ‘RMySQL’.
Connecting to MySQL Database (using RMySQL):
#Install and load the necessary packages.
#install.packages("RMySQL")
#install.packages("tidyverse")
library(tidyverse)
library(RMySQL)
#Connect to the MySQL DB
mydb <- dbConnect(MySQL(), user = 'snhu', password = 'snhu', dbname = 'SNHUFinal', host = 'localhost')
#Read data from MySQL into a data frame
sales_data <- dbGetQuery(mydb, "SELECT * FROM CSVImport")
#Close the connection when finished
dbDisconnect(mydb)
Connecting to SQL Server Database (using RODBC):
#Install and load the necessary packages.
#install.packages("RODBC")
#install.packages("tidyverse")
library(tidyverse)
library(RODBC)
#Connect to the SQL Server DB through a registered DSN
con <- odbcConnect("registered_dsn", uid = "server_username", pwd = "server_password")
#Read data from SQL Server into a data frame
sales_data <- sqlQuery(con, "SELECT * FROM CSVImport")
#Close the connection when finished
odbcClose(con)
Comparison and Contrast:
1. Package Difference: The primary difference is the package used to connect to each database. ‘RMySQL’ is specific to MySQL databases, while ‘RODBC’ is a more general package that can connect to any database with an ODBC driver, including SQL Server.
2. Connection Function: For MySQL, we use the ‘dbConnect’ function with the ‘MySQL()’ driver from the ‘RMySQL’ package, passing the user, password, database name, and host directly. For SQL Server, we use the ‘odbcConnect’ function from the ‘RODBC’ package. The ‘odbcConnect’ function requires a Data Source Name (DSN) to connect to the SQL Server, which can be set up using the ODBC Data Source Administrator.
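3. Query Function: The query function is also different. For MySQL, we use ‘dbGetQuery’, while for SQL Server, we use ‘sqlQuery’ from the ‘RODBC’ package. Both take the connection and an SQL string and return the result as a data frame.
4. DSN Requirement for SQL Server: Connecting to an SQL Server database using ‘odbcConnect’ requires a DSN to be set up first, which the MySQL connection does not need. (‘RODBC’ also provides ‘odbcDriverConnect’, which accepts a full connection string instead of a DSN.)
Note: A DSN for SQL Server can be set up with the ODBC Data Source Administrator. In R, odbcDataSources() lists the DSNs that are already registered; it does not create one.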
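The DSN requirement applies to ‘odbcConnect’. As a sketch of the alternative, the ‘odbcDriverConnect’ function in ‘RODBC’ takes a connection string, so no registered DSN is needed. The helper name build_sqlserver_conn_string, the driver name "SQL Server", and the server and database values below are assumptions for illustration; the driver name must match a driver actually installed on the machine.

```r
# Hypothetical helper (not part of RODBC): assembles an ODBC connection
# string for SQL Server from its parts.
build_sqlserver_conn_string <- function(server, database, uid, pwd,
                                        driver = "SQL Server") {
  paste0("Driver={", driver, "};",
         "Server=", server, ";",
         "Database=", database, ";",
         "Uid=", uid, ";",
         "Pwd=", pwd, ";")
}

# Placeholder values; replace with the real server, database, and login.
conn_string <- build_sqlserver_conn_string(
  server = "localhost", database = "SNHUFinal",
  uid = "server_username", pwd = "server_password"
)

# With RODBC installed and a reachable server, the connection would be:
# con <- RODBC::odbcDriverConnect(connection = conn_string)
# sales_data <- RODBC::sqlQuery(con, "SELECT * FROM CSVImport")
# RODBC::odbcClose(con)
```

The connection calls are left commented out because they need a live SQL Server instance; only the string construction runs as written.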
ii. Include at least one reference to support your answer.
Ripley, B., & Lapsley, M. (2023). RODBC: ODBC Database Access. R package version 1.3-23.
4. Reference:
Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. Springer-Verlag New York.
Ripley, B., & Lapsley, M. (2023). RODBC: ODBC Database Access. R package version 1.3-23.