AD571_Lecture_4
School
Boston University
Course
571
Subject
Information Systems
Date
Oct 30, 2023
Lecture 4 – Data Wrangling and Visualization
Learning Objectives
After you complete this lecture, you will be familiar with the following concepts:
Data wrangling capabilities
Importance of familiarity with the data
Exploratory and explanatory analysis
Data visualizations and communications
Data Wrangling
The ability to wrangle our data allows us to garner more value from the analysis. Data wrangling is the way we
retrieve, evaluate, and pre-process data into a usable format. Although the majority of the pre-processing of raw data
is completed for the analyst, by specialists, to get the data into the correct format on a database, the wrangling
process is ongoing and nonlinear. The nonlinearity of this process is demonstrated in Wickham’s text in chapter 9.
Notice that the transformation process can happen multiple times and this is the case in the work done to complete
the term project using the NYC Real Estate Data. Different forms and sources of data are often required for analysis
as well as the use of different types of variables (numeric, character, calculations, dates, etc.). Wrangling also includes
dealing with missing values. Many forms of wrangling will be required for cases and data structure requirements in
this course as we work with various analytics techniques in descriptive, predictive, and prescriptive methodologies.
Figure 4.1: Demonstration of Data Wrangling Process
Before we can begin modeling, visualizing, and storytelling with data, one of the most important steps is data
wrangling, or data preparation. Data preparation ensures that the data is clean and ready for analysis.
Even when there is a specific goal for analytics, the majority of the time can be spent manipulating the data. Data
wrangling is also known as transformation and manipulation. For students in this class, RStudio will be used for the
process of cleaning and manipulating the data in preparation for analysis and interpretation.
Students will use a package called “Tidyverse” for the data wrangling process. Tidyverse 1.3.0 bundles multiple
packages within a single package for ease of use. The packages installed with Tidyverse include: broom,
cli, dbplyr, dplyr, ggplot2, haven, hms, httr, jsonlite, lubridate, magrittr, modelr, pillar, purrr, readr, readxl, reprex, rlang,
rstudioapi, rvest, stringr, tibble, tidyr, and xml2. Loading Tidyverse attaches only a core subset of these, so some
(for example, lubridate in version 1.3.0) need to be loaded separately. From this extensive list of packages, we will
need to apply a few select ones. The most important packages for the data manipulation process for the NYC Real
Estate data will be dplyr, tidyr, lubridate, magrittr, ggplot2 and stringr.
In order to complete the data wrangling process, we will use dbReadTable() to extract a data frame with
the data from the tables on the database to which we are connected. After the tables have been extracted into
data frames, Tidyverse is used to join tables using data pipes (%>%). Where necessary, columns will be cleaned to
remove any extra spaces, and records may need to be filtered with the filter() function to remove inaccurate data
from the frame. Such values may include 0 or 10 recorded as the sale price or square footage. Additionally, some
of the columns may need to be created with the mutate() function. Additional data manipulation
using the group_by() and ungroup() functions is performed as well throughout working with the data.
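The wrangling steps described above can be sketched in a few lines of dplyr. This is a minimal illustration, not the course's actual pipeline: the two small tables, their column names, and the cutoff values are made up, and they stand in for data frames that would normally come from dbReadTable() on a live connection.

```r
library(dplyr)
library(stringr)

# Hypothetical stand-ins for tables that would normally be pulled with
# DBI::dbReadTable(con, "<table name>") from a live database connection.
sales <- tibble(
  sale_id         = 1:6,
  neighborhood_id = c(1L, 1L, 2L, 2L, 3L, 3L),
  sale_price      = c(500000, 0, 750000, 10, 620000, 880000),
  gross_sqft      = c(1000, 900, 1500, 0, 1240, 1600)
)
neighborhoods <- tibble(
  neighborhood_id   = 1:3,
  neighborhood_name = c(" Astoria ", "Chelsea  ", "  Harlem")
)

clean_sales <- sales %>%
  left_join(neighborhoods, by = "neighborhood_id") %>%          # join the tables
  mutate(neighborhood_name = str_trim(neighborhood_name)) %>%   # remove extra spaces
  filter(sale_price > 10, gross_sqft > 0) %>%                   # drop placeholder records
  mutate(price_per_sqft = sale_price / gross_sqft) %>%          # create a calculated column
  group_by(neighborhood_name) %>%
  mutate(neighborhood_mean = mean(price_per_sqft)) %>%          # per-group statistic
  ungroup()                                                     # return to an ungrouped frame

print(clean_sales)
```

Calling ungroup() at the end keeps later steps from silently operating per neighborhood, which is an easy mistake to make after a grouped mutate().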
Data Visualization and Business Intelligence
The goal of data visualization is to make sense of large amounts of data to gain the necessary insights for a
competitive advantage in the business environment. Data visualization focuses on historical and current data, which
places the process and concept into the descriptive analytics category. As defined by Evans, data visualization is the
process of displaying data in a meaningful fashion to provide insights that will support better decision-making, provide
managers with better analysis capabilities that reduce reliance on IT professionals, and improve collaboration and
information sharing. Visualizing data allows better communication for different functions and levels of the business,
making it possible for analysts to notice patterns and relationships throughout the process. Visualization is the first
step in moving towards increasingly advanced analytics methodologies such as predictive and prescriptive analytics.
Once we know how the data looks, this knowledge can guide us in choosing the best possible models in the future.
Background of Data
Understanding the background of your data helps to put everything into perspective. It is important to know what the
root source of the data was before it arrived into the database, so that we can think about it contextually. Knowing
what actually happened resulting in the data being collected and entered into the records will make that data useful.
One of the main benefits of having context for the data is the fact that we are able to ask and answer the question:
What else does this mean?
For example, if an analyst has data about website traffic and knows that the website sells products, then data
collected about page visits for a product can answer questions about revenue even when we don’t have that data
within reach. A product page with the highest level of traffic is highly likely to also have a higher revenue share. This is
an example of answering: What else could this mean? Context can change the perspective with which you analyze,
interpret, and represent the data to others in the communication process.
Although it is standard knowledge, we should be certain that the source of our data is trustworthy, clean, and collected
properly. In the process of interpretation, conversations with subject matter experts can help to change the
perceptions of the analyst and open up new directions for visualization. We can also apply the 5 W’s to the
background of our data to improve the perspective.
Who collected the data (whether man or machine)? This can also align with how the data was collected.
What does the data represent?
Where does the data come from geographically and can it be generalized to other locations? Is it prone to seasonality
or specific geo-trends?
When was the data collected and will the insight still hold true today?
Why was the data collected and is there possibly a specific agenda push, budget request, additional bias, or specific
outlook involved that can change the truths we are looking at?
Visualization of data is not a purely creative process nor a purely technical process. Visualizing data well requires an
understanding of what it means to the real world and how it should be interpreted as we navigate the complexity and
uncertainty of our data.
Techniques for Storytelling with Data
Dashboarding
Dashboarding is a descriptive analytics process of visualizing important data for the decision-making process and
maintaining a “pulse” on metrics important to strategic implementation and corrective action management.
The Big Book of Dashboards defines a dashboard as a visual display of data used to monitor conditions and/or
facilitate understanding. We also know that this understanding will help us to make decisions and lead to additional
analysis. Although there are various types and forms of dashboards, one important element to have in a dashboard is
that it is relevant to the people who need to make decisions. The dashboard is the big picture perspective for analysis
and can be considered a preliminary step leading to closer examination of the data.
Drilldown
The process of data drilldown is used to examine a construct in more detail. For example, measuring the construct of
product performance will require an analyst to explore the revenue, profitability, and trends of specific products by
drilling down into the annual or quarterly sales. Once a construct is connected to an indicator on a dashboard, the
analyst will gain the ability to evaluate information in increasing detail. Drilldown creates the benefit of going from a
general overview to a specific examination. This allows the analyst to tell a detailed story with increasing relevance for
different teams within the organization. For example, the story can focus on specific markets, territories, divisions,
products, types of customers, etc.
The benefits of drilldown are that the analyst can see what makes up the larger scope figures. The questions capable
of being answered are more specific. The data can also be evaluated from different perspectives and gives the
reasoning for the big picture data points. The different layers of data tell different stories and provide additional
context throughout the process of drilldown.
Once we have access to general information, we can seek additional insight from the general to specific iteratively,
with each step in the drilldown process informing the next granular step. For example, if the dashboard data shows
highest percentage growth of sales in Q4 of a specific year, we can drill down into what the contributing factors are for
the increased sales, and explore the data from the different perspectives mentioned above related to geography,
people, or product performance, until we find interesting insights that lead to more value being developed in the
future.
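The general-to-specific drilldown in the Q4 example can be sketched as a sequence of increasingly narrow summaries. The data below is invented for illustration, and the column names (year, quarter, region, revenue) are assumptions rather than fields from any course data set.

```r
library(dplyr)

# Hypothetical sales records; every figure here is made up.
sales <- tibble(
  year    = rep(c(2021, 2022), each = 8),
  quarter = rep(rep(c("Q1", "Q2", "Q3", "Q4"), each = 2), times = 2),
  region  = rep(c("East", "West"), times = 8),
  revenue = c(100, 90, 110, 95, 105, 100, 120, 100,
              110, 95, 115, 100, 120, 105, 200, 110)
)

# Level 1: the dashboard view, one figure per year.
by_year <- sales %>%
  group_by(year) %>%
  summarise(revenue = sum(revenue), .groups = "drop")

# Level 2: drill into the stronger year by quarter.
by_quarter <- sales %>%
  filter(year == 2022) %>%
  group_by(quarter) %>%
  summarise(revenue = sum(revenue), .groups = "drop")

# Level 3: drill into the standout quarter by region.
q4_by_region <- sales %>%
  filter(year == 2022, quarter == "Q4") %>%
  group_by(region) %>%
  summarise(revenue = sum(revenue), .groups = "drop")

print(by_year)
print(by_quarter)
print(q4_by_region)
```

Each level answers a narrower question than the one before it, and the result of one level decides which filter to apply at the next.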
Ad Hoc Reporting
Ad hoc reporting happens when there is a need to answer a specific question for a business need. Ad hoc reporting is
a descriptive process within business intelligence. The purpose of generating ad hoc reports is to gather only the most
relevant data for the problem being addressed. Often there is a scenario in which a regular or static report will not be
valuable for a specific question; it also may be necessary to use calculated variables to answer specific questions that
come up sporadically. So, when weekly or quarterly reports don’t answer the questions an analyst may have, an ad
hoc report may be the necessary element to fill in the gaps of a story.
Since versatility, agility, flexibility, and accessibility are important for instant decisions, a powerful business intelligence
solution may be required for an analyst to answer questions without waiting for the support of IT team members. Tools
such as Power BI, Tableau, Qlik, and R can be used to generate ad hoc reporting that gives expedited answers to
time-sensitive questions that come up on a day-to-day basis.
Exploratory Analysis
Exploratory data analysis is used to gain an understanding of what may be worthwhile to look into with more detail.
These noteworthy elements may add value to the analysis, often becoming a point of interest for revenue generating
advantages as well as for the development of operational efficiencies. In the process of exploratory data analysis,
analysts may take different perspectives and hunt for something that can be highlighted. Exploratory data analysis is
usually not very interesting to decision makers because they are mainly interested in “the point.”
What’s Interesting
Exploratory data analysis allows us to find patterns, anomalies, and imperfect dynamics, and to correct assumptions about
the business world after visualization is completed. What’s interesting can mean different things to different
stakeholders, which leads us to the concept of tailoring information to appeal to the right people, people at different
levels of the organizational hierarchy, who may be performing different functions as well. For example, from a digital
sales perspective, the administration of the department may be interested to know who the top customers are, what
sources they are coming from, what they are buying, when they are buying, and how often they are buying. These
interesting patterns and possible anomalies can be a source of valuable insight, generating considerable interest for
these types of stakeholders. Any in-depth insight can mean additional revenue, company growth, career advancement
for some, new initiatives, and many other implications. The same can be true for the negative end of the spectrum.
Other operations-based stakeholders may be interested in the biggest source of costs and inefficiencies. As long as
the information is tailored to the right party with the right message, the data storytelling process will be a success.
Understanding and Differentiation
Data Representation
The same data can be represented in different ways to convey different ideas. As noted by Nathan Yau, the key to
effective visualization, deeper understanding of the data, and relevance to the stakeholders, is the connection
between the data needed by the users and what it represents in reality. The connection to real life needs to be made
for the value to be extracted. Contextualizing the data allows analysts to glean valuable insights by reading "behind"
the numbers. Industry knowledge is the key to extracting value from any analytics technique and visualizing allows an
in-depth understanding to occur.
Variability
Variability partially accounts for the interesting occurrences in the data. Although the summary is important, the
interesting factors for decisions and storytelling are often found on the edges and different places of data distributions.
Traditionally, variability refers to how spread out the data is, which can be measured with descriptive statistics using
range, interquartile range, variance, and standard deviation. However, these measures of variability are naturally a
summary, which may only be the beginning of finding items of interest. When we explore the variability in detail, we
notice more of what’s interesting, as discussed above. The variability can flow from noticing a large range, to looking
deeper to find that a particular event happens more at a certain time of day, day of the week, time of the year, a
specific location, etc. The range means that there are peaks and valleys in the data, but looking deeper can garner
valuable insights about the variability, and juxtaposing industry knowledge can lead to improved decisions and
strategic direction.
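As a small sketch of moving from summary measures of spread to the detail behind them, the base R snippet below computes the four measures named above and then breaks the same data down by day of the week. The order counts are invented for illustration.

```r
# Hypothetical daily order counts over two weeks; the numbers are made up.
orders <- data.frame(
  weekday = rep(c("Mon", "Tue", "Wed", "Thu", "Fri"), times = 2),
  count   = c(10, 12, 11, 13, 40, 9, 11, 12, 14, 38)
)

# Summary measures of variability.
spread <- c(
  range    = diff(range(orders$count)),  # max minus min
  iqr      = IQR(orders$count),
  variance = var(orders$count),
  sd       = sd(orders$count)
)
print(spread)

# Looking deeper: where does the large range come from?
by_day <- aggregate(count ~ weekday, data = orders, FUN = mean)
print(by_day[order(-by_day$count), ])
```

The summary only says that the range is wide; the breakdown shows that, in this made-up data, one day of the week accounts for it.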
Uncertainty
Knowing that we have a limited knowledge of future events, an analyst should make it known that uncertainty is a part
of the data and things can change as future records are collected, which is why analysts should be comfortable with
communicating in confidence intervals, statistical significance, and probability distributions. All decisions involve
uncertainty, and the likelihood of an event is important to consider when data gets visualized and communication
occurs.
Visualization of Uncertainty
An analyst may want to visualize uncertainty and the distributions of various probabilities. A probability distribution
provides us with relative frequencies of events or outcomes. Evans defines a probability distribution as a
characterization of the possible values that a random variable may assume along with the probability of assuming
these values. Visualizing probability distributions is useful in telling the story of future expectations
of a specific outcome. Future expectations are important to communicate in the assessment of risks throughout the
decision-making process. Visualization of probability distributions can also help to explain the nature of the variables
we are working with. Knowing the best fitting distribution can give us increasingly accurate insight into the probability
of a specific value occurring.
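One way to put these ideas into practice is sketched below in base R. It assumes the variable is roughly normal, which is an assumption made for the illustration and would need to be checked on real data. The weekly sales values are simulated, and the threshold of 250 is arbitrary.

```r
set.seed(42)

# Simulated weekly sales; a real analysis would use observed records.
weekly_sales <- rnorm(52, mean = 200, sd = 25)

# 95% confidence interval for the mean weekly sales.
ci <- t.test(weekly_sales, conf.level = 0.95)$conf.int

# Fit a normal distribution by its sample mean and standard deviation,
# then estimate the probability of a week exceeding 250.
fit_mean <- mean(weekly_sales)
fit_sd   <- sd(weekly_sales)
p_above_250 <- pnorm(250, mean = fit_mean, sd = fit_sd, lower.tail = FALSE)

# Visualize the observed distribution with the fitted curve on top.
hist(weekly_sales, freq = FALSE,
     main = "Weekly sales with fitted normal curve",
     xlab = "Weekly sales")
curve(dnorm(x, mean = fit_mean, sd = fit_sd), add = TRUE, lwd = 2)

print(ci)
print(p_above_250)
```

Reporting the interval and the exceedance probability alongside the chart communicates both the expectation and how much it could move as new records arrive.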
Developing Data Visualizations
Importance of Data Visualization
After connections to the data sources have been established, the next steps are to visualize, monitor, and analyze the
data. Although we will explore descriptive, predictive, and prescriptive analytics in the following lectures, visualization
of the data is the next step in the process. This is where we begin to look at the big picture.
When data is not in a visualized form, it is difficult to spot trends or patterns, or to make any sense of it at all,
especially when working with big data sets. Once data is visualized, it is easier to see important insights that lead to
better decisions. Top performing products, representatives, cities, categories, and other various units of analysis in the
business can be identified for further assessment. The same is true for the lowest performing business units of
analysis.
Explanatory
Following the work that a business analytics professional may do to get the insight for the story, explanatory analysis
is used to communicate the main ideas and insight persuasively to a specific audience. Explanatory analysis is an
assessment of the discovered insights that allows the story to be developed and explained. Decision makers rarely
want to learn about what was done to get to the insight and there are many different levels of understanding to take
into account. The following sub-sections include a few tips to consider for visualization and communication.
Focus on What’s Important
Focusing on what’s important to your audience also means focusing on what is important to the specific decisions that
need to be made for the health of the organization. The same insights can be repurposed to mean different things to
the various functions. For example, insight that would lead to increased revenue generated via specific
communications channels, would have a different meaning for the marketing department than it would for
manufacturing operations teams. Same insight, different perspectives.
Focus on What’s Relevant
Organizations shift their aims frequently; focusing on relevance means the insights should be aligned with the goals of
the organization. For example, if the company is focusing on cutting costs for the quarter, ways to increase revenue
that involve increasing spending will be ignored temporarily because they do not align with the company's goals. So
even when the insight is valuable and the opportunity for revenue growth is ripe, it may not be strategically relevant at
a specific point in time. A pulse-check will be helpful to make sure that the information will be received with
goal-connected relevance.
Communicating to the right audience
A good way to start communicating insights generated is to identify the different audience groups who may be
interested in the findings. The communications would need to be different for each group, and emphasis on specific
information may be different. The importance of crafting the right story for the right audience cannot be overstated. It
is not a good idea to craft a single message to all the different groups, expecting them to determine their own
takeaways. Freytag’s model for communicating the intended story to the right audience is seen in Figure 4.2.
Figure 4.2: Using Freytag's Pyramid for Communication With Data
Now that we are keeping important communication factors in mind, let’s explore a few technical aspects of how we can
communicate in the data visualization and support the correct interpretation of the intended story.
Encoding Data for Visualization
As noted in Figure 4.3 there are various options that can be used to improve the way attention is driven to the point an
analyst wants to make with the data. In this section we will explore the scenarios in which you will notice different
attributes applied. As you work in your daily life pay attention to the ways you can use the pre-attentive attributes for
an overall improved visual communication.
Figure 4.3: Preattentive Attributes
Source: https://tableau.com
Color
Color should be intentionally applied to draw attention to and highlight the importance of a specific part of the data.
Color can also be used as a tool to set things apart. Although this is best practice, many visualizations use color to
make the visualization look appealing without considering any practical reason for the application of color.
Sequential – normally used to represent a single unit of analysis such as profit, revenue, costs, volume, etc. The
sequence of color becomes darker as the value increases.
Diverging – When the focus of a part of a dashboard is either on the very high or very low, as well as on opposing
sides, diverging colors help the viewer understand instantly that two different things are being compared. One way
that a diverging data encoding can be used would be to display profits and losses along with their extremes using two
different colors and different shades of those colors.
Categorical – The goal is to create a separation between various groupings. The goal is not to use shades of a single
color, or two colors as with diverging encoding, but to use a different color to represent each category, therefore
eliminating confusion.
Highlight – Used to draw attention to specific data without alarming the viewer. Conditionals can be used to highlight
specific data, but when highlighting is used too freely, it may reduce its impact and lose its value. Use highlighting
sparingly.
Alert – This color technique is used when critical events occur on a dashboard that will require immediate attention.
When a decision needs to be made or an intervention needs to take place alerts are displayed in bright and alarming
colors. Unlike highlights, alerts are used for more critical data.
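The color encodings above map naturally onto ggplot2 fill scales. The sketch below is one possible mapping, not a prescribed one: the data frame, the chosen colors, and the choice of "East" as the highlighted region are all assumptions made for illustration.

```r
library(ggplot2)

# Hypothetical regional figures; all values are made up.
regions <- data.frame(
  region  = c("North", "South", "East", "West"),
  revenue = c(120, 80, 150, 60),
  profit  = c(30, -15, 45, -5)
)

# Sequential: one measure, darker as the value increases.
sequential <- ggplot(regions, aes(x = region, y = revenue, fill = revenue)) +
  geom_col() +
  scale_fill_gradient(low = "#deebf7", high = "#08519c")

# Diverging: two hues meeting at a midpoint, here profit versus loss.
diverging <- ggplot(regions, aes(x = region, y = profit, fill = profit)) +
  geom_col() +
  scale_fill_gradient2(low = "firebrick", mid = "grey90",
                       high = "steelblue", midpoint = 0)

# Categorical: a distinct color per group.
categorical <- ggplot(regions, aes(x = region, y = revenue, fill = region)) +
  geom_col() +
  scale_fill_brewer(palette = "Set2")

# Highlight: mute everything except the one bar of interest.
highlight <- ggplot(regions, aes(x = region, y = revenue,
                                 fill = region == "East")) +
  geom_col() +
  scale_fill_manual(values = c("TRUE" = "darkorange", "FALSE" = "grey70"),
                    guide = "none")
```

Printing any of the four objects draws the chart. An alert could follow the same pattern as the highlight, with a brighter color and a condition tied to a critical threshold.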
Dashboard Chart Types
Bar Chart
Bar charts are a common type of graph because they are simple to develop and interpret. Bar charts can be basic or
more advanced. Variations include horizontal, grouped, stacked, and component-based bar charts. An
example of a stacked bar chart can be seen in Figure 4.5.
As seen in Figure 4.4, bar charts are most useful for displaying data that can normally fall into nominal or ordinal
categories of classification:
Nominal data is usually in a category aligned with descriptive or qualitative information such as city of
residence or type of product sold.
Ordinal data can be similar to nominal because the variables are named, but the different categories are
ranked or placed in some type of order (e.g., Good, Better, Best)
Figure 4.4: Bar Chart Of Sales Data
Source: https://powerbi.microsoft.com
When working with nominal data, the categories can be arranged in a way that makes the bars go from the largest to
the smallest category, which is helpful to interpret the data. But if this approach were applied to ordinal data it might
confuse the viewer. One additional approach is to display negative information with the bars positioned below the
x-axis. As noted above, use of color can also drive attention to the negative and positive values.
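The difference between ordering nominal and ordinal categories can be sketched with ggplot2 as follows. The city and rating data are invented for illustration; reorder() sorts the nominal bars by value, while an ordered factor preserves the inherent ranking of the ordinal categories.

```r
library(ggplot2)

# Nominal data: cities have no inherent order, so sort bars largest to smallest.
city_sales <- data.frame(
  city  = c("Boston", "Austin", "Denver", "Miami"),
  sales = c(340, 510, 275, 420)
)
nominal_plot <- ggplot(city_sales, aes(x = reorder(city, -sales), y = sales)) +
  geom_col() +
  labs(x = "City", y = "Sales")

# Ordinal data: keep the inherent ranking, regardless of bar height.
ratings <- data.frame(
  rating    = factor(c("Good", "Better", "Best"),
                     levels = c("Good", "Better", "Best"), ordered = TRUE),
  responses = c(45, 80, 30)
)
ordinal_plot <- ggplot(ratings, aes(x = rating, y = responses)) +
  geom_col() +
  labs(x = "Rating", y = "Responses")
```

Sorting the ratings by height would place "Better" before "Good" and "Best", which is the kind of confusion the paragraph above warns about.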
Figure 4.5: Stacked Bar Graph
Source: https://r4ds.had.co.nz/
Time Series
A time series graph is a standard line graph made up of repeated measurements, which have been taken at regular
time intervals for a specific period of time (Figure 4.6). Time will be depicted on the x-axis and always on an even
scale. When creating a time series graph, the points representing the data are placed at planned and regular
intervals, and the points being joined are usually connected with a straight line.
Time series graphs are a great option when the analyst needs to identify a trend or pattern in the data. An upward
trend becomes apparent when the peaks and troughs increase with time.
Seasonality
Time series can also show seasonality in the data, as many people see with the heat and electric bills.
Patterns (Cycle Variations)
Cycles and patterns are different from seasonality because they are due to a cause-and-effect relationship. For example,
when a business runs a promotion, the revenue increases by a certain percentage. However, there is no guarantee
that a promotion will happen regularly. This is where seasonality is different, and a time series is a great way to depict
both seasonality and patterns.
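To make the distinction between trend, seasonality, and a one-off pattern concrete, the sketch below simulates monthly revenue from those three components and plots it as a time series. Every number is invented, and the single promotion month is an assumption chosen for illustration.

```r
library(ggplot2)

# 36 months of simulated revenue on an even monthly scale.
months   <- seq(as.Date("2021-01-01"), by = "month", length.out = 36)
trend    <- 100 + 2 * (0:35)                     # steady upward trend
seasonal <- 15 * sin(2 * pi * (0:35) / 12)       # repeats every 12 months
promo    <- ifelse(months == as.Date("2022-05-01"), 40, 0)  # one-off promotion

sales <- data.frame(month = months, revenue = trend + seasonal + promo)

ts_plot <- ggplot(sales, aes(x = month, y = revenue)) +
  geom_line() +
  geom_point() +
  scale_x_date(date_breaks = "6 months", date_labels = "%b %Y") +
  labs(x = "Month", y = "Revenue", title = "Simulated monthly revenue")
```

In the resulting chart the seasonal swing recurs at the same point each year, while the promotion appears once as an isolated spike.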
Figure 4.6: Time Series Of Sales Data
Source: https://cran.r-project.org
Choropleth Map
Choropleth maps are a good way of displaying regional or local patterns in the data clearly (Figure 4.7). Regional or
local patterns could be extremely high product sales in neighboring cities or counties, or comparisons between these
same proximities.
Choropleth maps perform best when we are showing one variable at a time as well as comparisons between two
points in time. Choropleth maps are useful for a big picture perspective, but not for the intricacies. Choropleths are
good for depicting extremes. One of the reasons it may be difficult to focus on details is that the intervals between the
colors may not be the same as the intervals between the values of the data. Therefore, it is more conducive to
recognizing the patterns in the map, but it is difficult to associate the exact values of localities or regions with each
other. Exploring and evaluating numeric differences would be better suited for tables. Additionally, Choropleths are
not best suited for absolute data and figures; they are better suited for relative data such as rates of sales, rates of
returns, and other data that can be converted to percentages. Following this example, it would not be useful to depict
the absolute number of units sold on a Choropleth since we don’t know how many units went to the area or other
details that would make the analysis useful. Rates, however, are relative points of information and would be usefully
represented on a Choropleth.
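A short base R sketch shows why rates rather than absolute counts belong on a choropleth, and how color classes can have uneven intervals. The regions, counts, and class breaks are all hypothetical.

```r
# Hypothetical regional figures; all values are made up.
regions <- data.frame(
  region         = c("A", "B", "C"),
  units_sold     = c(10000, 2000, 500),
  units_returned = c(300, 200, 100)
)

# Convert absolute counts into a relative measure.
regions$return_rate <- regions$units_returned / regions$units_sold

# Bin the rates into color classes. The breaks are deliberately uneven,
# so equal steps in color do not correspond to equal steps in value.
regions$rate_class <- cut(
  regions$return_rate,
  breaks = c(0, 0.05, 0.15, 1),
  labels = c("low", "medium", "high")
)

print(regions)
```

Region A has the most returns in absolute terms but the lowest return rate, so a map shaded by raw counts would point attention at the wrong area.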
Figure 4.7: Choropleth of Votes In USA
Source: https://powerbi.microsoft.com
Symbol Map
A symbol map also uses regional and local mapping to communicate information about specific regions and localities.
Unlike a Choropleth, a symbol map can be used to communicate absolute data. For example, if we have a goal to
reduce the absolute number of customer complaints without considering how many customers are located in specific
regions, then a symbol map would be a good option to see where the largest number of customer complaints are
located.
What it shows
Most commonly the symbol map shows a circle of varying size to communicate the number of occurrences of a
specific event, or an absolute value that depicts an important metric. An analyst may look for the large value or a
cluster of large values. It is also important to see where there are no markers or only small ones in case you are
evaluating low sales numbers or other metrics. One drawback of a symbol map is that when the symbol gets too
large, it may cover large areas and make the information uninterpretable, as seen in Figure 4.8.
Figure 4.8: Symbol Map
Source: https://powerbi.microsoft.com
Table
Tables are structured with the goal to organize and display important and detailed information. The data in reference
is arranged into columns and rows. The information that matters is usually displayed as regular text, with words,
numbers, and grid lines for organization. Many Business Intelligence professionals remove the gridlines because they
believe it is easier to read and more effective for communication of the points.
A good table makes it easy to compare multiple pairs of related values such as annual revenue in multiple years
(Figure 4.9). Special attention can also be brought to specific figures of interest using a highlighted table as seen in
Figure 4.10. The most widely known example of a table is an organization’s financial statements. But it is important to
note that tables are not only used for the purpose of displaying quantitative information. When there are multiple sets
of values that directly impact a relationship, it may be useful to apply a table for organization. For example, strategies
may be communicated using tables displaying agendas, timing, initiatives, locations, and accountabilities.
Figure 4.9: Table of Sales Data
Source: https://docs.microsoft.com
When to Use a Table
It is important to ask how the data will be used before using a table. There are a few reasons an analyst might select a
table, instead of a graph, as the best option for visualization of the data:
1. Stakeholders are planning to use the table for the purpose of looking up specific values.
2. The information will be used to assess quantitative values to spot a pattern.
Figure 4.10: Highlighted Table
Source: https://help.tableau.com
Graphs
Graphs represent the relationships between two variables in the form of a visual display for quantitative information
using two axes (Figure 4.11). Graphs help the viewer quickly understand important information, and they are a useful
visualization tool when applied correctly. The main benefit of graphs is that they can communicate a lot of data at a
glance. Graphs can show a collection of individual values, but more importantly they can be used to show
relationships as well as the overall shape of the data. Knowing the shape of the data can aid in the process of building
models and evaluating the distributions that may go better with a specific model. Graphs are normally used to
represent mathematical functions and statistical data.
Figure 4.11: Graph of Function
Source: https://ggplot2.tidyverse.org
Pie Chart
Pie charts are best to use when comparing the parts of a whole (Figure 4.12). The drawback is that they don’t display
change over time unless you are using multiple pie charts to depict multiple periods of time.
A pie chart best applies when there is a need to evaluate the composition. An interesting use case for a pie chart
would be a comparison of attributed growth areas within a business. A pie chart can depict areas responsible for the
total turnover, profit, and risks within a business. A pie chart works best when there is only one set of data and it is
usually used to show percentage or proportions. As best practice, the pie chart should have fewer than 10 parts. More
than that and it becomes difficult to visually distinguish the size between sections. In the case of categorical data, use
of a pie chart would be helpful as each part of the pie would be represented by a different category.
Figure 4.12: Pie Chart Of Sales Composition
Source: https://www.tableau.com/about/blog/2019/1/5-unusual-alternatives-pie-charts-100071
Mosaic Chart
When an analyst needs to examine the relationship between two or more categorical variables, a mosaic chart is a
perfect visualization option.
The mosaic plot starts with a rectangle of a specified length. This initial rectangle gets divided into horizontal bars
with the width of each representing the proportions of probabilities that are associated with the categorical variables.
Then the associated bars are split vertically into additional bars that are proportionate to the conditional probabilities
of the other categorical variables (Figure 4.13). Additional variables can be added into the visualization, and this
would require splitting further with a third, or a fourth variable. When two or more variables are categorical, one of the
most common visualization tools used to analyze the relationship between them is the mosaic plot.
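The marginal and conditional proportions that determine the tile sizes can be computed directly. The sketch below uses base R's mosaicplot() rather than the ggmosaic package cited for Figure 4.13, and the survey data is invented for illustration.

```r
# Hypothetical survey of 100 respondents; the counts are made up.
survey <- data.frame(
  opinion  = c(rep("Agree", 60), rep("Disagree", 40)),
  behavior = c(rep("Recycles", 45), rep("Does not", 15),
               rep("Recycles", 10), rep("Does not", 30))
)

counts <- table(survey$opinion, survey$behavior, dnn = c("Opinion", "Behavior"))

# First split: share of each opinion among all respondents.
marginal <- prop.table(margin.table(counts, 1))

# Second split: share of each behavior within each opinion group.
conditional <- prop.table(counts, margin = 1)

print(marginal)
print(conditional)

# Draw the mosaic; tile areas are proportional to the cell counts.
mosaicplot(counts, main = "Opinion and behavior", color = TRUE)
```

If the two variables were unrelated, the second split would fall at the same position in every bar; the offset between the groups is what signals a relationship.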
Figure 4.13: Mosaic Of Social Opinion And Behavior
Source: https://cran.r-project.org/web/packages/ggmosaic/vignettes/ggmosaic.html
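The way a mosaic plot sizes its tiles can be worked through by hand. The sketch below uses plain Python rather than R, a made-up two-way table of counts, and a hypothetical helper named mosaic_tiles. It assumes every category of the first variable has at least one observation. The width of each tile is the marginal proportion of the first variable, and the height is the conditional proportion of the second variable within that category.

```python
# Made-up counts for two categorical variables: (opinion, behavior).
counts = {
    ("Agree", "Yes"): 30, ("Agree", "No"): 10,
    ("Disagree", "Yes"): 20, ("Disagree", "No"): 40,
}


def mosaic_tiles(table):
    """Return {(a, b): (width, height)} for a two-variable mosaic plot.

    width  = share of all observations in category a (marginal proportion)
    height = share of category a that falls in category b (conditional proportion)
    """
    total = sum(table.values())
    first_totals = {}
    for (a, _b), n in table.items():
        first_totals[a] = first_totals.get(a, 0) + n
    tiles = {}
    for (a, b), n in table.items():
        width = first_totals[a] / total
        height = n / first_totals[a]
        tiles[(a, b)] = (width, height)
    return tiles


tiles = mosaic_tiles(counts)
for key, (width, height) in tiles.items():
    print(key, f"width={width:.2f}", f"height={height:.2f}", f"area={width * height:.2f}")
```

The area of each tile (width times height) equals the share of all observations in that combination, so the tile areas sum to 1.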
Pyramid Chart
Pyramid charts are a strong option for comparing proportional slices of data. For example, sales data of a product for
a year can be visualized on a pyramid chart. A pyramid chart is divided into segments, with each segment representing
a specific data point (Figure 4.14). The values can be arranged in ascending or descending order. The height of each
pyramid segment can be adjusted in proportion to the entire pyramid, which represents all the values and their specific
data slices.
Segments can be distinguished by their background, border, and other visual elements. The labels and values of the
pyramid and its segments can be displayed.
Figure 4.14: Pyramid Of Sales Process
Source: https://powerbi.microsoft.com/en-us/blog/visual-awesomeness-unlocked-pyramid-3d-chart-by-collabion/
Spider Chart/Radar Chart
The spider chart is also referred to by many other names, such as web chart, radar chart, star chart, or polar chart.
The spider chart uses a two-dimensional display to depict multidimensional data structures. Although this visualization
has multidimensional capabilities, we should aim to use approximately 5 items to avoid unnecessary confusion in the
process of interpretation.
One great use case for a radar chart is comparing several items across a wide variety of metrics and characteristics.
The major advantage of a spider chart over other visualizations is its ability to compare all the important metrics for
a decision, so pros and cons can be judged at a glance (Figure 4.15). For example, if we are tasked with competitive
analysis, we can examine the top 10 features that may be important to customers and graph our competitors along with
our organization. Then we can see the weaknesses of the organization in comparison to others. Similarly, the
performance of internal team members can be charted to evaluate the strengths and weaknesses of each team member for
the purpose of talent development.
Figure 4.15: Spider Chart of Tool Strength
Source: https://www.tableau.com/about/blog/2011/10/which-way-business-intelligence-howard-dresner-13933
Scatterplot
A scatter plot, also known as a scatter chart or a scatter graph, is normally a two-dimensional representation of data.
This visualization uses points to represent the values of two different variables. The variables are plotted along the
x-axis and y-axis, respectively.
Scatter plots are useful for showing the relationship between two variables, which is why correlations are usually
presented with a scatter plot: it naturally shows how two variables vary together (Figure 4.16).
Some scatter plots add further elements. A trendline can be used to clarify the relationship. The sizes, shapes, or
colors of the points can also represent additional variables, but care should be taken to ensure that the
interpretation is not made more complex. The goal is always to communicate important information quickly to the eye.
Figure 4.16: Scatter Plot
Source: https://ggplot2.tidyverse.org/reference/geom_point.html
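The two quantities that usually accompany a scatter plot, the correlation coefficient and the trendline, can be computed directly. The sketch below uses plain Python rather than R, with made-up property sizes and prices and the hypothetical helper names pearson_r and trendline. It computes the Pearson correlation and an ordinary least-squares line.

```python
from math import sqrt

# Made-up data: property size (thousand sq ft) and price (hundred thousand USD).
size = [1.0, 2.0, 3.0, 4.0, 5.0]
price = [2.1, 3.9, 6.2, 7.8, 10.1]


def _centered_sums(xs, ys):
    """Return means and the sums of squares/cross-products around them."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return mx, my, sxy, sxx, syy


def pearson_r(xs, ys):
    """Pearson correlation: strength and direction of the linear relationship."""
    _, _, sxy, sxx, syy = _centered_sums(xs, ys)
    return sxy / sqrt(sxx * syy)


def trendline(xs, ys):
    """Least-squares slope and intercept of the line y = slope * x + intercept."""
    mx, my, sxy, sxx, _ = _centered_sums(xs, ys)
    slope = sxy / sxx
    return slope, my - slope * mx


r = pearson_r(size, price)
slope, intercept = trendline(size, price)
print(f"r = {r:.3f}, price = {slope:.2f} * size + {intercept:.2f}")
```

A value of r close to 1 or -1 corresponds to points that fall close to the trendline, while a value near 0 corresponds to a cloud with no clear linear direction.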
Trellis plot
The main application of a Trellis plot is to view relationships between variables from a multivariable data set. Trellis
graphics belong to a framework of techniques for viewing complex data sets. Panels are arranged into rows and columns,
and sometimes pages. Each panel graphs one subset of the data using the same display method. Different visualizations
can be used inside a trellis. For example, it can be made up of scatterplots to show correlations between multiple
variables, boxplots, normal bell curves, or any other technique that would be used to explore a relationship between
two variables. The trellis allows us to view the relationship between two variables when there are many variable
relationships involved.
Every Trellis display has a series of panels, arranged in a row-by-column array (Figure 4.17). The Trellis can
communicate a lot of information quickly.
Figure 4.17: Trellis Time Series
Source: https://cran.r-project.org
Area Chart
An area chart shows a time-series relationship, but unlike a line chart, an area chart can also visually communicate
volume. Information is graphed on the x-axis and y-axis in the same way, and the data points are connected with line
segments. The area between the x-axis and the data-point line is commonly shaded with a color for interpretation. Area
charts are usually used to compare two categories, or sometimes more, provided that the visualization remains easy to
interpret. Area charts are useful when we need to communicate multiple data series and part-of-the-whole
relationships. There are two commonly used variations of area charts, and the best choice depends on the situation:
1. The standard area chart is best for showing or comparing quantitative progress over a period of time.
2. The stacked area chart is best for visualizing part-of-the-whole relationships, which is helpful for
demonstrating how each category contributes to the total volume (Figure 4.18).
Figure 4.18: Stacked Area Chart
Source: https://cran.r-project.org/web/packages/billboarder/readme/README.html
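The difference between a standard and a stacked area chart comes down to where each shaded band starts. The sketch below uses plain Python rather than R, with made-up quarterly figures and a hypothetical helper named stack_series. In a stacked chart, each category is drawn on top of the running total of the categories below it, so the top edge of the last band is the total volume.

```python
# Made-up sales per quarter for two channels (illustrative values only).
series = {
    "Online": [10, 20, 30],
    "Retail": [5, 5, 10],
}


def stack_series(data):
    """Return {name: [(lower, upper), ...]} band boundaries for a stacked area chart.

    Categories are stacked in the order they appear in data. Each band starts
    where the previous band ended.
    """
    if not data:
        return {}
    lengths = {len(values) for values in data.values()}
    if len(lengths) != 1:
        raise ValueError("all series must cover the same periods")
    baseline = [0.0] * lengths.pop()
    bands = {}
    for name, values in data.items():
        upper = [low + v for low, v in zip(baseline, values)]
        bands[name] = list(zip(baseline, upper))
        baseline = upper
    return bands


bands = stack_series(series)
for name, band in bands.items():
    print(name, band)
```

A standard area chart would instead shade every series from zero, which is why overlapping series can hide each other and why stacking suits part-of-the-whole questions better.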
Heatmaps
The main goal of a heatmap is to support the visualization of the volume of locations, events, or even
behaviors/actions within a dataset. Heatmaps assist in directing the attention of the target audience towards the areas
that are most influential in the data.
Heatmaps are popular for the display of general numeric figures because they rely on color to communicate those
values. Reliance on color is most useful when we are working with a large volume of data. Because colors are easily
differentiated, they often make more sense to the target audience than absolute numbers.
Besides its ability to draw attention in the right direction, heatmapping is a popular technique for data visualization
because it does not need to be interpreted by a professional analyst. Heatmaps can be interpreted by any audience of
business users. Heatmaps are highly self-explanatory: normally a darker area, a deeper shade, or a tighter distribution
of a specific color represents a higher quantity of a variable (Figure 4.19). Although analysts can live without
heatmaps, they are great for enhancing the communication of key insights.
Figure 4.19: Sales Data Heat Map Visualization
Source: https://www.tableau.com/about/blog/2018/11/density-mark-type-brings-new-kind-heatmap-tableau-98488
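The idea that a deeper shade represents a higher quantity can be made concrete with a small calculation. The sketch below uses plain Python rather than R, with made-up event locations and the hypothetical helper names density_grid and to_intensity. It counts how many points fall into each cell of a grid and then rescales the counts to a 0 to 1 intensity, which a plotting tool could map to a color scale.

```python
# Made-up event locations, with both coordinates scaled to the range 0..1.
points = [(0.1, 0.1), (0.2, 0.3), (0.9, 0.9), (0.4, 0.1)]


def density_grid(locations, bins, lo=0.0, hi=1.0):
    """Count how many (x, y) points fall into each cell of a bins-by-bins grid."""
    grid = [[0] * bins for _ in range(bins)]
    span = hi - lo
    for x, y in locations:
        if not (lo <= x <= hi and lo <= y <= hi):
            continue  # ignore points outside the plotted area
        # Points exactly on the upper edge are placed in the last cell.
        col = min(int((x - lo) / span * bins), bins - 1)
        row = min(int((y - lo) / span * bins), bins - 1)
        grid[row][col] += 1
    return grid


def to_intensity(grid):
    """Rescale counts so the busiest cell is 1.0 and empty cells are 0.0."""
    peak = max(max(row) for row in grid)
    if peak == 0:
        return [[0.0 for _ in row] for row in grid]
    return [[count / peak for count in row] for row in grid]


grid = density_grid(points, bins=2)
intensity = to_intensity(grid)
for row in intensity:
    print(" ".join(f"{value:.2f}" for value in row))
```

Because the audience reads the relative shade rather than the raw count, the choice of grid size changes how concentrated the data appears and is worth stating alongside the chart.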
After Visualization: Insights and Decisions
Having covered the technology that contains the data as well as the conversion of data into visualizations that
communicate important details, we now have a foundational perspective on the ways we can explore the various data
points. Storytelling with data provides us with a set of tools that make the business analytics process more
effective downstream, when we communicate our findings. It is important to avoid communicating all findings all the
time; there is a need to do work up front to extract meaningful insight from the data. The most important work is
extracting the insight and communicating the bottom line up front, with the supporting details following the point. We
should identify the most important and relevant points and build the supporting story around them.
From Data to Insight
An insight is defined as the power or act of seeing into a situation. Often it is an unexpected change in the way
something is understood. The unexpected aspects of what we know present themselves as we interpret, analyze and
examine the data and visualizations. An analyst may uncover a new relationship between variables such as finding an
unexpected direction between those variables. For example, we know that larger properties are usually related to
higher prices. However, an insight may arise when we find that, in some neighborhoods, a larger property size does
not relate to higher property prices. This type of unexpected finding is an insight that would lead to additional
questions. In addition to unexpected relationships between variables, insights may also arise from unexpected
patterns, trends, or other anomalies in the data. Once uncovered, it will reshape our point of view.
It is also important to consider that there are different types of insights: some are interesting, some entertaining,
some useful, and some valuable. Some interesting insights are not valuable at all. A good insight will shift the way a
business operates. For example, if Company A, which caters to an older demographic, identifies an insight showing that
a younger demographic spends more money with them and shops more frequently, it can change the way the business runs
and in turn increase profitability. Company A can take action to change the channels it uses for advertising, the
creative it uses, the models it uses, the products it stocks in its inventory, and much more. This can then lead to
measurable improvement in business performance. The insight is both interesting and valuable for better business
decisions and better performance. An insight has the goal of changing the way we think, changing what we believe to be
true, and inspiring action to do things in a different way. The data as well as the interpretation of the data lead to
new directions.
All insights start with raw data (Figure 4.20). The collection of raw data serves as the foundation for gaining
knowledge to answer business questions. The data is then organized and synthesized in reports, converting it into
information for other people’s consumption. Then people analyze the reports and discover meaningful insights that
serve to inform decisions and drive the actions that create value. Insights offer potential value, but they do not
guarantee that the value will be realized. However, they provide the support for a data-driven decision-making process
in place of just guessing or relying solely on intuition.
Figure 4.20: Start with data and finish with business improvement
Effective communication is a necessary element to explain your insight in a way that internal and external
stakeholders understand it, and more importantly, they are compelled to act on the insights being shared. Simply
presenting information or data in the hopes that stakeholders will reap their own takeaways will not be the best option,
and the same applies for written managerial reports. It is up to the presenter and writer of the managerial report to
make the data meaningful. You need to connect the dots for your audience with a relevant and effective narrative.
This can be done in a three-step process to explain, enlighten, and engage (Dykes, 2020).
Explain to stakeholders what you noticed in the data, what the data is telling us, and why the specified insight
is important. Guide the stakeholders through the meaning of what the findings represent as they will often not
know the techniques you are using. For example, if you notice a high standard deviation in your data, it is up to
you to explain what this standard deviation means in terms of relevance to the business and decision-making
process.
Enlighten with visualizations of the data. When we have rows of data in a table, it is not easy to identify
something interesting until the data is visualized using descriptive analytics techniques or business intelligence
tools like Power BI desktop or Tableau. Point to interesting data points, patterns and outliers.
Engage with a combination of narrative and visualizations. Immerse your stakeholders in your narrative and
bring the insights to life.
Conclusion
As noted in Evans’ textbook, there are a variety of visualization tools, ranging from Excel to other enterprise-grade
tools and software available from cloud-based services as well as open-source options. For standard visualization,
Excel is a viable option, but when we need to visualize and manipulate Big Data, different tools are required. Tableau,
Microsoft Power BI, Qlik, and R allow access to more powerful tools with advanced capabilities. Each organization will
be using a different data visualization tool in the technology stack, and becoming a versatile professional comfortable
with various tools will be an important advantage in your ability to visualize, interpret, and deliver on measurable
business performance.
Additional Resources
https://www.tableau.com/learn/get-started/desktop-viz-design
https://docs.microsoft.com/en-us/power-bi/developer/power-bi-custom-visuals
https://docs.microsoft.com/en-us/power-bi/visuals/power-bi-visualization-types-for-reports-and-q-and-a
https://docs.microsoft.com/en-us/power-bi/developer/power-bi-custom-visuals-certified
Tableau Visualizations
Dykes, B. (2020). Effective Data Storytelling: How to drive change with data, narrative, and visuals. Hoboken, New Jersey: John Wiley and Sons.
Wexler, S., Shaffer, J., & Cotgreave, A. (2017). The big book of dashboards: visualizing your data using real-world business scenarios. John Wiley & Sons.
Few, S. (2012). Show Me the Numbers: Designing Tables and Graphs to Enlighten (Second Edition). Analytics Press.
Yau, N. (2013). Data Points: Visualization That Means Something. New York: Wiley.
Knaflic, C. (2015). Storytelling with data: A data visualization guide for business professionals. Hoboken, New Jersey: Wiley.