AD571_Lecture_4

School

Boston University

Course

571

Subject

Information Systems

Date

Oct 30, 2023

Lecture 4 – Data Wrangling and Visualization

Learning Objectives

After you complete this lecture, you will be familiar with the following concepts:
• Data wrangling capabilities
• Importance of familiarity with the data
• Exploratory and explanatory analysis
• Data visualizations and communications

Data Wrangling

The ability to wrangle our data allows us to garner more value from the analysis. Data wrangling is the way we retrieve, evaluate, and pre-process data into a usable format. Although specialists usually complete most of the pre-processing of raw data so that it is stored in the correct format in a database, the analyst's wrangling process is ongoing and nonlinear. The nonlinearity of this process is demonstrated in Chapter 9 of Wickham's text: notice that the transformation step can happen multiple times. This is also the case in the term project, which uses the NYC Real Estate data. Analysis often requires data from different sources and in different forms, as well as different types of variables (numeric, character, calculated, dates, etc.). Wrangling also includes dealing with missing values. Many forms of wrangling will be required in this course to meet the needs of different cases and data structures as we work with descriptive, predictive, and prescriptive analytics techniques.

Figure 4.1: Demonstration of Data Wrangling Process
Before we can begin modeling, visualizing, and storytelling with data, one of the most important steps is data wrangling, or data preparation. Data preparation ensures that the data is clean and ready for analysis. Even when there is a specific analytics goal, the majority of the time can be spent manipulating the data. Data wrangling is also known as data transformation or data manipulation.

In this class, students will use RStudio and the tidyverse package to clean and manipulate the data in preparation for analysis and interpretation. Tidyverse 1.3.0 bundles multiple packages into a single package for ease of use. The packages installed with tidyverse include: broom, cli, dbplyr, dplyr, ggplot2, haven, hms, httr, jsonlite, lubridate, magrittr, modelr, pillar, purrr, readr, readxl, reprex, rlang, rstudioapi, rvest, stringr, tibble, tidyr, and xml2. Calling library(tidyverse) attaches only the core packages (in version 1.3.0: ggplot2, dplyr, tidyr, readr, purrr, tibble, stringr, and forcats); the others, such as lubridate, must be loaded separately with library(). From this extensive list, we will need only a few. The most important packages for manipulating the NYC Real Estate data will be dplyr, tidyr, lubridate, magrittr, ggplot2, and stringr.

To complete the data wrangling process, we will use dbReadTable() (from the DBI package) to read the tables from the connected database into data frames. After the tables have been extracted into data frames, they are joined with dplyr's join functions, chained together with the pipe operator (%>%). Where necessary, columns will be cleaned to remove extra spaces, and records may need to be filtered with filter() to remove inaccurate data, such as recorded sales with a price or square footage of 0 or 10. Some columns may also need to be created with mutate(). Additional manipulation with group_by() and ungroup() is performed throughout our work with the data.
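The lecture performs these steps in R with dplyr and stringr. As a language-neutral illustration only, the sketch below walks through the same sequence (join, trim, filter, mutate, group and summarize) in plain Python on a few hypothetical rows. The column names, sample values, and the cutoff of 10 for implausible prices and square footage are assumptions, not the actual schema or rules of the NYC Real Estate database.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows standing in for two tables read from the database.
sales = [
    {"sale_id": 1, "neighborhood_id": 10, "sale_price": 950_000, "gross_sqft": 1200},
    {"sale_id": 2, "neighborhood_id": 10, "sale_price": 0, "gross_sqft": 900},
    {"sale_id": 3, "neighborhood_id": 20, "sale_price": 600_000, "gross_sqft": 800},
    {"sale_id": 4, "neighborhood_id": 20, "sale_price": 720_000, "gross_sqft": 10},
    {"sale_id": 5, "neighborhood_id": 20, "sale_price": 450_000, "gross_sqft": 750},
]
neighborhoods = [
    {"neighborhood_id": 10, "name": "  Chelsea "},
    {"neighborhood_id": 20, "name": "Astoria  "},
]

# Assumption: prices or areas at or below this value are treated as invalid records.
MIN_VALID = 10


def wrangle(sales, neighborhoods):
    # Join: attach the neighborhood name to each sale (similar to dplyr's left_join()).
    # dict(s, name=...) builds new rows, so the input tables are not modified.
    names = {n["neighborhood_id"]: n["name"] for n in neighborhoods}
    joined = [dict(s, name=names.get(s["neighborhood_id"], "")) for s in sales]

    # Clean: remove extra spaces (similar to stringr's str_trim()).
    for row in joined:
        row["name"] = row["name"].strip()

    # Filter: drop implausible prices and square footage (similar to filter()).
    valid = [r for r in joined
             if r["sale_price"] > MIN_VALID and r["gross_sqft"] > MIN_VALID]

    # Mutate: add a derived column (similar to mutate()).
    for r in valid:
        r["price_per_sqft"] = r["sale_price"] / r["gross_sqft"]

    # Group and summarize: mean price per square foot by neighborhood
    # (similar to group_by() followed by summarise()).
    groups = defaultdict(list)
    for r in valid:
        groups[r["name"]].append(r["price_per_sqft"])
    return {name: round(mean(vals), 2) for name, vals in groups.items()}


print(wrangle(sales, neighborhoods))  # {'Chelsea': 791.67, 'Astoria': 675.0}
```

Sales 2 and 4 are dropped by the filter step, so Astoria's average is based on only two valid sales.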
Data Visualization and Business Intelligence

The goal of data visualization is to make sense of large amounts of data and gain the insights necessary for a competitive advantage in the business environment. Data visualization focuses on historical and current data, which places it in the descriptive analytics category. As defined by Evans, data visualization is the process of displaying data in a meaningful fashion to provide insights that will support better decision-making, provide managers with better analysis capabilities that reduce reliance on IT professionals, and improve collaboration and information sharing. Visualizing data improves communication across the functions and levels of the business and makes it possible for analysts to notice patterns and relationships along the way. Visualization is the first step toward increasingly advanced analytics methodologies such as predictive and prescriptive analytics: once we know what the data looks like, that knowledge can guide us in choosing the best models later on.

Background of Data

Understanding the background of your data helps to put everything into perspective. It is important to know the root source of the data before it arrived in the database, so that we can think about it in context. Knowing what actually happened to cause the data to be collected and recorded is what makes the data useful. One of the main benefits of having context for the data is that we can ask and answer the question: What else does this mean?
For example, if an analyst has website traffic data and knows that the website sells products, then data about page visits for a product can help answer questions about revenue even when revenue data is not within reach: the product page with the highest traffic is highly likely to also have a higher share of revenue. This is an example of answering "What else could this mean?" Context can change the perspective with which you analyze, interpret, and present the data to others. Although it may seem obvious, we should confirm that the source of our data is trustworthy and that the data is clean and properly collected. During interpretation, conversations with subject matter experts can change the analyst's perceptions and open up new directions for visualization. We can also apply the 5 W's to the background of our data to improve our perspective:
• Who collected the data (a person or a machine)? This also relates to how the data was collected.
• What does the data represent?
• Where does the data come from geographically, and can it be generalized to other locations? Is it prone to seasonality or specific geographic trends?
• When was the data collected, and will the insight still hold true today?
• Why was the data collected? Is there possibly a specific agenda, budget request, bias, or outlook involved that could change the truths we are looking at?

Visualization of data is neither a purely creative process nor a purely technical one. Visualizing data well requires an understanding of what the data means in the real world and how it should be interpreted as we navigate its complexity and uncertainty.

Techniques for Storytelling with Data

Dashboarding

Dashboarding is a descriptive analytics process of visualizing the data that matters for decision-making and keeping a "pulse" on the metrics important to strategy implementation and the management of corrective actions.
The Big Book of Dashboards defines a dashboard as a visual display of data used to monitor conditions and/or facilitate understanding. This understanding, in turn, helps us make decisions and leads to additional analysis. Although there are various types and forms of dashboards, one important requirement is that a dashboard is relevant to the people who need to make decisions. The dashboard provides the big-picture perspective for analysis and can be considered a preliminary step leading to closer examination of the data.

Drilldown

Drilldown is used to examine a construct in more detail. For example, measuring product performance requires an analyst to explore the revenue, profitability, and trends of specific products by drilling down into annual or quarterly sales. Once a construct is connected to an indicator on a dashboard, the analyst can evaluate the information in increasing detail. Drilldown takes the analyst from a general overview to a specific examination, which makes it possible to tell a detailed story with increasing relevance for different teams within the organization; for example, the story can focus on specific markets, territories, divisions, products, or types of customers. The benefits of drilldown are that the analyst can see what makes up the larger aggregate figures, answer more specific questions, and evaluate the data from different perspectives that explain the big-picture data points. The different layers of data tell different stories and provide additional context. Once we have access to general information, we can move iteratively from the general to the specific, with each step in the drilldown informing the next, more granular step. For example, if the dashboard shows the highest percentage growth in sales in Q4 of a specific year, we can drill down into the factors that contributed to the increase and explore the data from the perspectives mentioned above (geography, people, or product performance) until we find insights that lead to more value being developed in the future.

Ad Hoc Reporting

Ad hoc reporting happens when a specific question must be answered for a business need. It is a descriptive process within business intelligence. The purpose of generating ad hoc reports is to gather only the data most relevant to the problem being addressed. Often a regular, static report will not answer a specific question, and it may be necessary to use calculated variables to answer questions that come up sporadically. So, when weekly or quarterly reports don't answer an analyst's questions, an ad hoc report may be what fills in the gaps of the story.
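The drilldown example (finding what drove a Q4 jump in sales) and the idea of an ad hoc calculated variable can be combined in a short sketch. This is a minimal Python illustration under assumptions: the quarterly, region-level records are made up, and "contribution to growth" is one possible calculated variable, not a metric defined in the lecture.

```python
from collections import defaultdict

# Hypothetical records: (year, quarter, region, sales).
records = [
    (2022, "Q3", "East", 100), (2022, "Q3", "West", 80), (2022, "Q3", "South", 60),
    (2022, "Q4", "East", 130), (2022, "Q4", "West", 85), (2022, "Q4", "South", 90),
]


def totals_by(records, key):
    """Sum sales at the level of detail defined by `key` (the drill level)."""
    out = defaultdict(float)
    for rec in records:
        out[key(rec)] += rec[3]
    return dict(out)


def contribution_to_growth(records, prev_q, curr_q):
    """Ad hoc calculated variable: each region's share of the quarter-over-quarter change."""
    by_quarter = totals_by(records, key=lambda r: (r[0], r[1]))       # dashboard level
    by_region = totals_by(records, key=lambda r: (r[0], r[1], r[2]))  # one level down
    change = by_quarter[curr_q] - by_quarter[prev_q]
    if change == 0:
        raise ValueError("no change between the two quarters to attribute")
    regions = sorted({r[2] for r in records})
    return {
        reg: (by_region.get(curr_q + (reg,), 0) - by_region.get(prev_q + (reg,), 0)) / change
        for reg in regions
    }


shares = contribution_to_growth(records, (2022, "Q3"), (2022, "Q4"))
for region, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{region}: {share:.0%} of the Q4 increase")
```

Each further drill level (for example, region and product) would only need a longer key in totals_by.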
Since versatility, agility, flexibility, and accessibility are important for quick decisions, an analyst may need a powerful business intelligence solution to answer questions without waiting for support from IT team members. Tools such as Power BI, Tableau, Qlik, and R can be used to generate ad hoc reports that give quick answers to time-sensitive questions that come up day to day.

Exploratory Analysis

Exploratory data analysis is used to understand what may be worth examining in more detail. These noteworthy elements may add value to the analysis, often becoming points of interest for revenue-generating advantages or for developing operational efficiencies. During exploratory data analysis, analysts may take different perspectives and hunt for something worth highlighting. Exploratory analysis is usually not very interesting to decision makers, because they are mainly interested in "the point."

What's Interesting

Exploratory data analysis allows us to find patterns, anomalies, and imperfect dynamics, and to correct our assumptions about the business world once visualization is complete. What's interesting can mean different things to different stakeholders, which leads to the concept of tailoring information to the right people, at different levels of the organizational hierarchy, who may also be performing different functions. For example, from a digital sales perspective, the department's administration may want to know who the top customers are, which sources they come from, what they buy, when they buy, and how often they buy. These patterns and possible anomalies can be a source of valuable insight and generate considerable interest for these stakeholders. An in-depth insight can mean additional revenue, company growth, career advancement for some, new initiatives, and many other implications. The same is true at the negative end of the spectrum: operations-focused stakeholders may be interested in the biggest sources of cost and inefficiency. As long as the information is tailored to the right party with the right message, the data storytelling process will succeed.

Understanding and Differentiation

Data Representation

The same data can be represented in different ways to convey different ideas. As noted by Nathan Yau, the key to effective visualization, deeper understanding of the data, and relevance to stakeholders is the connection between the data the users need and what it represents in reality. That connection to real life must be made for value to be extracted. Contextualizing the data allows analysts to glean valuable insights by reading "behind" the numbers. Industry knowledge is key to extracting value from any analytics technique, and visualization allows an in-depth understanding to occur.

Variability

Variability partially accounts for the interesting occurrences in the data. Although the summary is important, the interesting factors for decisions and storytelling are often found at the edges and in other regions of the data distribution.
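The traditional summary measures of variability (range, interquartile range, variance, and standard deviation) can be computed with Python's standard statistics module, and the same pass can flag values at the edges of the distribution. This is a minimal sketch: the daily sales figures are hypothetical, and the 1.5 × IQR fence used to flag edge values is a common rule of thumb, not a threshold given in the lecture.

```python
import statistics


def variability_summary(values):
    """Summary measures of spread, plus the values lying at the edges of the distribution."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points (Python 3.8+)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "range": max(values) - min(values),
        "iqr": iqr,
        "variance": statistics.variance(values),  # sample variance
        "std_dev": statistics.stdev(values),      # sample standard deviation
        "edges": [v for v in values if v < low_fence or v > high_fence],
    }


daily_sales = [12, 15, 14, 13, 16, 15, 14, 42]  # hypothetical; one unusual day
summary = variability_summary(daily_sales)
print(summary["range"], summary["iqr"], summary["edges"])  # 30 2.5 [42]
```

The summary numbers say the spread is large; the edges list points to the single day that deserves a closer look.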
Traditionally, variability refers to how spread out the data is, which can be measured with descriptive statistics such as the range, interquartile range, variance, and standard deviation. However, these measures are by nature summaries, which may only be the beginning of finding items of interest. When we explore variability in detail, we notice more of what's interesting, as discussed above. The exploration can start from noticing a large range and lead to the discovery that a particular event happens more often at a certain time of day, day of the week, time of the year, or specific location. A large range means that there are peaks and valleys in the data, but looking deeper can yield valuable insights about the variability, and combining those insights with industry knowledge can lead to improved decisions and strategic direction.

Uncertainty

Because our knowledge of future events is limited, an analyst should make it known that uncertainty is part of the data and that things can change as future records are collected. This is why analysts should be comfortable communicating with confidence intervals, statistical significance, and probability distributions. All decisions involve uncertainty, and the likelihood of an event is important to consider when data is visualized and communicated.

Visualization of Uncertainty

When an analyst wants to visualize uncertainty, one approach is to visualize probability distributions. A probability distribution provides the relative frequencies of events or outcomes. Evans defines a probability distribution as a characterization of the possible values that a random variable may assume along with the probability of assuming these values. Visualizing probability distributions is useful for telling the story of future expectations for a specific outcome, and these expectations are important to communicate when assessing risks during decision-making. Visualizing probability distributions can also help explain the nature of the variables we are working with: knowing the best-fitting distribution can give us increasingly accurate insight into the probability of a specific value occurring.

Developing Data Visualizations

Importance of Data Visualization

After connections to the data sources have been established, the next steps are to visualize, monitor, and analyze the data. Although we will explore descriptive, predictive, and prescriptive analytics in the following lectures, visualization of the data is the next step in the process. This is where we begin to look at the big picture. When data is not visualized, it is difficult to spot trends or patterns, or to make any sense of it at all, especially with large data sets. Once data is visualized, it is easier to see important insights that lead to better decisions. Top-performing products, representatives, cities, categories, and other units of analysis in the business can be identified for further assessment, and the same is true for the lowest-performing units.

Explanatory

After a business analytics professional has done the work of finding the insight for the story, explanatory analysis is used to communicate the main ideas and insights persuasively to a specific audience. Explanatory analysis is an assessment of the discovered insights that allows the story to be developed and explained.
Decision makers rarely want to learn about what was done to reach the insight, and there are many different levels of understanding to take into account. The following sub-sections include a few tips to consider for visualization and communication.

Focus on What's Important

Focusing on what's important to your audience also means focusing on what is important to the specific decisions that must be made for the health of the organization. The same insights can mean different things to different functions. For example, an insight that would lead to increased revenue through specific communication channels would have a different meaning for the marketing department than for manufacturing operations teams. Same insight, different perspectives.

Focus on What's Relevant
Organizations shift their aims frequently; focusing on relevance means the insights should be aligned with the organization's goals. For example, if the company is focused on cutting costs for the quarter, ways to increase revenue that require increased spending will be set aside temporarily because they do not align with the company's goals. So even when an insight is valuable and the opportunity for revenue growth is ripe, it may not be strategically relevant at a specific point in time. A pulse check helps make sure that the information will be received as relevant to current goals.

Communicating to the Right Audience

A good way to start communicating the insights generated is to identify the different audience groups who may be interested in the findings. The communication needs to be different for each group, and the emphasis on specific information may differ. The importance of crafting the right story for the right audience cannot be overstated. It is not a good idea to craft a single message for all the different groups and expect them to determine their own takeaways. Freytag's model for communicating the intended story to the right audience is seen in Figure 4.2.

Figure 4.2: Using Freytag's Pyramid for Communication With Data

Now that we are keeping important communication factors in mind, let's explore a few technical aspects of how we can communicate through data visualization and support the correct interpretation of the intended story.

Encoding Data for Visualization

As noted in Figure 4.3, there are various options for drawing attention to the point an analyst wants to make with the data. In this section we will explore the scenarios in which you will notice different attributes applied. In your daily life, pay attention to the ways you can use these pre-attentive attributes for improved visual communication.

Figure 4.3: Preattentive Attributes
Source: https://tableau.com

Color

Color should be applied intentionally to draw attention to and highlight the importance of a specific part of the data. Color can also be used to set things apart. Although this is best practice, many visualizations use color only to look appealing, without any practical reason for applying it.
• Sequential – normally used to represent a single measure such as profit, revenue, costs, or volume. The color becomes darker as the value increases.
• Diverging – when the focus of part of a dashboard is on the very high and very low values, or on opposing sides, diverging colors help the viewer instantly understand that two different things are being compared. For example, a diverging encoding can display profits and losses, along with their extremes, using two different colors and different shades of each.
• Categorical – the goal is to separate the various groupings. Instead of shades of a single color, or two colors as with diverging encoding, a different color represents each category, eliminating confusion.
• Highlight – used to draw attention to specific data without alarming the viewer. Conditionals can be used to highlight specific data, but when highlighting is used too freely, it loses its impact and value. Use highlighting sparingly.
• Alert – used when critical events occur on a dashboard that require immediate attention. When a decision needs to be made or an intervention needs to take place, alerts are displayed in bright, alarming colors. Unlike highlights, alerts are reserved for critical data.

Dashboard Chart Types

Bar Chart

Bar charts are a common type of graph because they are simple to develop and interpret. Bar charts can be basic or more advanced; variations include horizontal, grouped, stacked, and component-based bar charts. An example of a stacked bar chart can be seen in Figure 4.5. As seen in Figure 4.4, bar charts are most useful for displaying data that falls into nominal or ordinal categories:
• Nominal data falls into categories of descriptive or qualitative information, such as city of residence or type of product sold.
• Ordinal data is similar to nominal data in that the categories are named, but the categories are ranked or placed in some type of order (e.g., Good, Better, Best).

Figure 4.4: Bar Chart of Sales Data
Source: https://powerbi.microsoft.com

When working with nominal data, the categories can be arranged so that the bars go from the largest to the smallest category, which helps interpretation. If this approach were applied to ordinal data, however, it could confuse the viewer, because it would break the natural order of the categories. Another approach is to display negative values with bars positioned below the x-axis. As noted above, color can also draw attention to negative and positive values.

Figure 4.5: Stacked Bar Graph
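The ordering rule for bar charts (sort nominal categories from largest to smallest, but keep ordinal categories in their natural order) and the use of a diverging color encoding for positive and negative bars can be sketched as two small helper functions. This is a minimal Python sketch under assumptions: the sample data is hypothetical, and the blue/red hues with linear shading are arbitrary choices for a diverging encoding, not a palette prescribed by the lecture or by any BI tool.

```python
def order_bars(values, ordinal_levels=None):
    """Return (category, value) pairs in the order the bars should be drawn."""
    if ordinal_levels is not None:
        # Ordinal data: keep the natural ranking, e.g. Good, Better, Best.
        return [(level, values.get(level, 0)) for level in ordinal_levels]
    # Nominal data: sort from the largest to the smallest bar.
    return sorted(values.items(), key=lambda kv: kv[1], reverse=True)


WHITE, BLUE, RED = (255, 255, 255), (8, 48, 107), (103, 0, 13)  # assumed RGB hues


def diverging_color(value, limit):
    """Hex color: blue for positive, red for negative, darker as |value| nears limit (> 0)."""
    t = min(abs(value) / limit, 1.0)
    dark = BLUE if value >= 0 else RED
    r, g, b = (round(w + (d - w) * t) for w, d in zip(WHITE, dark))
    return f"#{r:02x}{g:02x}{b:02x}"


sales_by_city = {"Boston": 120, "Austin": 340, "Denver": 210}  # nominal
ratings = {"Best": 15, "Good": 40, "Better": 25}               # ordinal
profit_by_region = {"East": 50, "West": -20, "South": -50}     # positive and negative

print(order_bars(sales_by_city))
print(order_bars(ratings, ordinal_levels=["Good", "Better", "Best"]))
for region, profit in order_bars(profit_by_region):
    print(region, profit, diverging_color(profit, limit=50))
```

A value of zero maps to white, so the two hues only become visible as values move away from the midpoint in either direction.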
Source (Figure 4.5): https://r4ds.had.co.nz/

Time Series

A time series graph is a standard line graph made up of repeated measurements taken at regular time intervals over a specific period of time (Figure 4.6). Time is depicted on the x-axis, always on an even scale. When creating a time series graph, the data points are placed at planned, regular intervals and are usually connected with straight lines. Time series graphs are a great option when the analyst needs to identify a trend or pattern in the data. Trends become apparent when the peaks and troughs increase over time.

Seasonality

Time series can also show seasonality in the data, as many people see in their heating and electric bills.

Patterns (Cycle Variations)

Cycles and patterns differ from seasonality because they are due to a cause-and-effect relationship. For example, when a business runs a promotion, revenue increases by a certain percentage; however, there is no guarantee that a promotion will happen regularly. This is where seasonality is different, since seasonal effects recur at the same times each year. A time series is a great way to depict both seasonality and patterns.

Figure 4.6: Time Series of Sales Data
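Two ideas from this section can be made concrete: checking that observations are taken at regular intervals, and separating a trend from seasonality. The sketch below assumes hypothetical monthly data with a repeating yearly pattern. A 12-month trailing moving average is one common way (not the only one) to smooth out a yearly seasonal pattern so the underlying trend shows, and averaging by calendar month gives a simple seasonal profile.

```python
from datetime import date
from statistics import mean


def is_regular_monthly(dates):
    """True if consecutive observations are exactly one calendar month apart."""
    months = [d.year * 12 + d.month for d in dates]
    return all(b - a == 1 for a, b in zip(months, months[1:]))


def moving_average(values, window):
    """Trailing moving average; a 12-month window averages out a yearly seasonal pattern."""
    return [mean(values[i - window + 1:i + 1]) for i in range(window - 1, len(values))]


def seasonal_profile(dates, values):
    """Average value for each calendar month across all years."""
    by_month = {}
    for d, v in zip(dates, values):
        by_month.setdefault(d.month, []).append(v)
    return {m: mean(vs) for m, vs in sorted(by_month.items())}


# Hypothetical monthly electric bills for two years: same seasonal shape, +10 in year two.
pattern = [120, 110, 95, 80, 75, 90, 130, 140, 100, 85, 95, 115]
values = pattern + [v + 10 for v in pattern]
dates = [date(2022 + i // 12, i % 12 + 1, 1) for i in range(24)]

trend = moving_average(values, window=12)
print(is_regular_monthly(dates))       # True
print(round(trend[-1] - trend[0], 2))  # 10.0
print(seasonal_profile(dates, values)[8])
```

The moving average rises by exactly the year-over-year increase, while the seasonal profile shows August as the peak month in this made-up data.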
Source (Figure 4.6): https://cran.r-project.org

Choropleth Map

Choropleth maps are a good way to display regional or local patterns in the data clearly (Figure 4.7). Regional or local patterns could include extremely high product sales in neighboring cities or counties, or comparisons between these areas. Choropleth maps perform best when showing one variable at a time, or comparisons between two points in time. Choropleth maps are useful for a big-picture perspective but not for the intricacies; they are good for depicting extremes. One reason it may be difficult to focus on details is that the intervals between the colors may not match the intervals between the data values. A choropleth is therefore more conducive to recognizing patterns, but it is difficult to compare the exact values of localities or regions with each other; exploring and evaluating numeric differences is better suited to tables. Additionally, choropleths are not suited to absolute data and figures; they are better suited to relative data such as sales rates, return rates, and other data that can be converted to percentages. Following this example, it would not be useful to depict the absolute number of units sold on a choropleth, since a high count may simply reflect a larger market in that area, and the map would not show the details needed to make the comparison useful. Rates, however, are relative measures and can be represented usefully on a choropleth.

Figure 4.7: Choropleth of Votes in USA
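The advice to map rates rather than absolute figures, and the point that color classes may not line up with the spacing of the data values, can be illustrated with a short sketch. It assumes hypothetical unit sales and returns for three regions and uses equal-interval classing, which is only one of several classing methods (quantile classing is another common choice).

```python
def rates(numerators, denominators):
    """Convert absolute counts to rates so regions of different size are comparable."""
    return {region: numerators[region] / denominators[region] for region in numerators}


def equal_interval_classes(values, k):
    """Assign each region one of k color classes of equal width (0 = lightest)."""
    lo, hi = min(values.values()), max(values.values())
    width = (hi - lo) / k or 1.0  # if all values are equal, every region lands in class 0
    return {region: min(int((v - lo) / width), k - 1) for region, v in values.items()}


units_sold = {"North": 1000, "Central": 200, "South": 5000}  # hypothetical
returns = {"North": 50, "Central": 30, "South": 100}

return_rate = rates(returns, units_sold)
print(max(returns, key=returns.get))            # South: most returns in absolute terms
print(max(return_rate, key=return_rate.get))    # Central: highest return rate
print(equal_interval_classes(return_rate, k=3))  # {'North': 0, 'Central': 2, 'South': 0}
```

Mapping the raw counts would make South look like the problem area; mapping the rate shows that Central's customers return products most often.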
Source (Figure 4.7): https://powerbi.microsoft.com

Symbol Map

A symbol map also uses regional and local mapping to communicate information about specific regions and localities. Unlike a choropleth, a symbol map can be used to communicate absolute data. For example, if our goal is to reduce the absolute number of customer complaints, regardless of how many customers are located in each region, a symbol map would be a good option for seeing where the largest numbers of complaints are located.

What It Shows

Most commonly, a symbol map shows circles of varying size to communicate the number of occurrences of a specific event, or an absolute value of an important metric. An analyst may look for a large value or a cluster of large values. It is also important to see where there are no markers, or only small ones, in case you are evaluating low sales numbers or other metrics. One drawback of a symbol map is that when a symbol gets too large, it may cover large areas and make the information uninterpretable, as seen in Figure 4.8.

Figure 4.8: Symbol Map
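When circle size encodes an absolute value, one common convention (an assumption here; the lecture does not prescribe it) is to make each circle's area, rather than its radius, proportional to the value, so the radius grows with the square root of the value. Choosing a modest maximum radius is one way to limit the drawback described above, where oversized symbols cover large parts of the map. The complaint counts and the 30-unit maximum below are hypothetical.

```python
import math


def symbol_radii(values, max_radius=30.0):
    """Radius for each location so that circle area is proportional to its value.

    The largest value gets `max_radius`; keeping that modest limits how much of
    the map the biggest symbols can cover.
    """
    largest = max(values.values())
    return {loc: max_radius * math.sqrt(v / largest) for loc, v in values.items()}


complaints = {"Boston": 400, "Hartford": 100, "Providence": 25}  # hypothetical counts
print(symbol_radii(complaints))  # {'Boston': 30.0, 'Hartford': 15.0, 'Providence': 7.5}
```

Boston has four times Hartford's complaints, so its circle has four times the area but only twice the radius.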
Source (Figure 4.8): https://powerbi.microsoft.com

Table

Tables are structured to organize and display important, detailed information. The data is arranged into columns and rows. The information that matters is usually displayed as regular text, with words, numbers, and grid lines for organization. Many business intelligence professionals remove the gridlines because they believe this makes the table easier to read and more effective for communicating the main points. A good table makes it easy to compare multiple pairs of related values, such as annual revenue across multiple years (Figure 4.9). Special attention can also be drawn to specific figures of interest using a highlighted table, as seen in Figure 4.10. The most widely known example of a table is an organization's financial statements. It is important to note, however, that tables are not used only to display quantitative information. When multiple sets of values are directly related, a table can be useful for organizing them. For example, strategies may be communicated using tables of agendas, timing, initiatives, locations, and accountabilities.

Figure 4.9: Table of Sales Data
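A table that compares related values across years, with a highlight on figures of interest, can be produced in plain text as a quick sketch. The revenue figures and the highlight threshold below are hypothetical, and the `*` marker stands in for the color highlighting a BI tool would apply.

```python
def revenue_table(revenue, highlight_above):
    """Plain-text table of revenue by year with year-over-year change.

    Rows whose revenue exceeds `highlight_above` are marked with '*'.
    """
    lines = [f"{'Year':<6}{'Revenue':>12}{'YoY %':>8}"]
    prev = None
    for year in sorted(revenue):
        value = revenue[year]
        yoy = "" if prev is None else f"{(value - prev) / prev * 100:.1f}"
        mark = " *" if value > highlight_above else ""
        lines.append(f"{year:<6}{value:>12,}{yoy:>8}{mark}")
        prev = value
    return "\n".join(lines)


revenue = {2021: 1_200_000, 2022: 1_380_000, 2023: 1_311_000}  # hypothetical
print(revenue_table(revenue, highlight_above=1_300_000))
```

Right-aligning the numbers and omitting gridlines keeps the year-to-year comparison easy to scan.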
Source (Figure 4.9): https://docs.microsoft.com

When to Use a Table

It is important to ask how the data will be used before choosing a table. There are a few reasons an analyst might select a table, instead of a graph, as the best way to present the data:
1. Stakeholders plan to use the table to look up specific values.
2. The information will be used to assess quantitative values to spot a pattern.

Figure 4.10: Highlighted Table
Source: https://help.tableau.com

Graphs
Graphs represent the relationship between two variables as a visual display of quantitative information using two axes (Figure 4.11). Graphs help the viewer quickly understand important information, and they are a useful visualization tool when applied correctly. The main benefit of graphs is that they can communicate a lot of data at a glance. Graphs can show a collection of individual values, but more importantly, they can show relationships as well as the overall shape of the data. Knowing the shape of the data can aid in building models and in evaluating which distributions may work best with a specific model. Graphs are normally used to represent mathematical functions and statistical data.

Figure 4.11: Graph of Function
Source: https://ggplot2.tidyverse.org

Pie Chart

Pie charts are best used for comparing the parts of a whole (Figure 4.12). The drawback is that they don't display change over time unless you use multiple pie charts to depict multiple periods. A pie chart applies best when there is a need to evaluate composition. An interesting use case for a pie chart is comparing the areas within a business that contribute to growth; a pie chart can depict the areas responsible for a business's total turnover, profit, and risks. A pie chart works best when there is only one set of data, and it is usually used to show percentages or proportions. As a best practice, a pie chart should have fewer than 10 parts; with more than that, it becomes difficult to visually distinguish the sizes of the sections. For categorical data, a pie chart can be helpful, as each slice represents a different category.

Figure 4.12: Pie Chart of Sales Composition
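Turning raw values into pie slices is a matter of computing each part's share of the whole. The sketch below also folds the smallest parts into an "Other" slice so the chart stays under the recommended 10 parts. The category names and values are hypothetical, and grouping small slices into "Other" is one common remedy, not something the lecture specifies.

```python
def pie_slices(parts, max_slices=9):
    """Return (label, percent, angle_in_degrees) for each slice, largest first.

    If there are more than `max_slices` parts, the smallest ones are combined
    into a single "Other" slice, placed last.
    """
    ordered = sorted(parts.items(), key=lambda kv: kv[1], reverse=True)
    if len(ordered) > max_slices:
        kept = ordered[:max_slices - 1]
        other = sum(v for _, v in ordered[max_slices - 1:])
        ordered = kept + [("Other", other)]
    total = sum(v for _, v in ordered)
    return [(label, 100 * v / total, 360 * v / total) for label, v in ordered]


channels = {"Online": 50, "Retail": 30, "Wholesale": 20}  # hypothetical sales composition
for label, pct, angle in pie_slices(channels):
    print(f"{label}: {pct:.0f}% ({angle:.0f} degrees)")
```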
Source (Figure 4.12): https://www.tableau.com/about/blog/2019/1/5-unusual-alternatives-pie-charts-100071

Mosaic Chart

When an analyst needs to examine the relationship between two or more categorical variables, a mosaic chart is an excellent visualization option and one of the most common tools for the purpose. The mosaic plot starts from a rectangle of a specified size. This initial rectangle is first divided into bars whose widths represent the proportions (probabilities) associated with the categories of the first variable. Each bar is then split in the other direction into pieces proportional to the conditional probabilities of the second categorical variable (Figure 4.13). Additional variables can be added to the visualization by splitting further with a third or a fourth variable.

Figure 4.13: Mosaic of Social Opinion and Behavior
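The splitting procedure described above can be written out directly: widths come from the marginal proportions of the first variable, and heights within each column come from the conditional proportions of the second. This sketch assumes hypothetical survey responses and places each rectangle in a unit square; drawing and labeling are left to a plotting tool such as ggmosaic (the source of Figure 4.13).

```python
from collections import Counter


def mosaic_rectangles(pairs):
    """Map each (a, b) category pair to a rectangle (x, y, width, height) in the unit square.

    Column widths are the marginal proportions P(A = a); the heights inside a
    column are the conditional proportions P(B = b | A = a). Each rectangle's
    area therefore equals the joint proportion P(A = a, B = b).
    """
    n = len(pairs)
    a_counts = Counter(a for a, _ in pairs)
    joint = Counter(pairs)
    b_levels = sorted({b for _, b in pairs})
    rects, x = {}, 0.0
    for a in sorted(a_counts):
        width = a_counts[a] / n
        y = 0.0
        for b in b_levels:
            height = joint[(a, b)] / a_counts[a]
            rects[(a, b)] = (x, y, width, height)
            y += height
        x += width
    return rects


# Hypothetical survey: (opinion, behavior) for 100 respondents.
responses = ([("Agree", "Yes")] * 30 + [("Agree", "No")] * 10
             + [("Disagree", "Yes")] * 15 + [("Disagree", "No")] * 45)
for key, rect in mosaic_rectangles(responses).items():
    print(key, tuple(round(v, 2) for v in rect))
```

A third variable would be handled the same way, by splitting each rectangle again along the first axis in proportion to its conditional distribution.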
Source: https://cran.r-project.org/web/packages/ggmosaic/vignettes/ggmosaic.html
Pyramid Chart
Pyramid charts are a strong option for comparing proportional slices of data. For example, a product's sales data for a year can be visualized on a pyramid chart. The pyramid is divided into segments, each representing a specific data point (Figure 4.14), and the values can be arranged in ascending or descending order. The height of each segment can be adjusted so that the entire pyramid represents all of the values and each segment represents its own slice of the data. Segments can be distinguished by their background, border, and other visual elements, and labels and values can be displayed for the pyramid and its segments.
Figure 4.14: Pyramid Of Sales Process
Source: https://powerbi.microsoft.com/en-us/blog/visual-awesomeness-unlocked-pyramid-3d-chart-by-collabion/
Spider Chart/Radar Chart
The spider chart goes by many other names, such as web chart, radar chart, star chart, or polar chart. It uses a two-dimensional display to depict multidimensional data. Although the chart can show many dimensions, we should aim to compare approximately 5 items to avoid unnecessary confusion during interpretation. One great use case for a radar chart is comparing several items across a wide variety of metrics and characteristics. The major advantage of a spider chart over other visualizations is that it compares all the metrics that matter for a decision in a single view, so pros and cons can be judged at a glance (Figure 4.15). For example, in a competitive analysis, we can take the top 10 features that matter to customers and graph our competitors alongside our own organization to see where the organization is weaker than the others. Similarly, a spider chart of internal team members' performance can help evaluate each team member's strengths and weaknesses for the purpose of talent development.
Figure 4.15: Spider Chart of Tool Strength
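A radar chart fits several dimensions onto a flat page by giving each metric its own spoke around a circle. As an illustration only, this plain-Python sketch computes the corners of the polygon for one item. The clockwise-from-the-top convention, the shared 0-to-`max_score` scale, and the function name `radar_vertices` are assumptions, since charting tools differ on these details.

```python
import math


def radar_vertices(scores, max_score):
    """Return the (x, y) corner of a radar-chart polygon for each metric.

    Metric k sits on a spoke at angle 2*pi*k/n, measured clockwise from
    straight up; its distance from the centre is score / max_score, so
    every metric shares the same 0-to-1 scale.
    """
    n = len(scores)
    vertices = []
    for k, score in enumerate(scores):
        angle = 2 * math.pi * k / n
        radius = score / max_score
        # sin for x and cos for y make angle 0 point up and turn clockwise.
        vertices.append((radius * math.sin(angle), radius * math.cos(angle)))
    return vertices


# Hypothetical ratings of one tool on five metrics, each scored 1 to 5.
corners = radar_vertices([5, 3, 4, 2, 4], max_score=5)
```

Putting every metric on the same scale is what makes the polygons of different items directly comparable; without it, one metric with large raw values would dominate the shape.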
Source: https://www.tableau.com/about/blog/2011/10/which-way-business-intelligence-howard-dresner-13933
Scatterplot
A scatter plot, also known as a scatter chart or scatter graph, is normally a two-dimensional representation of data. It uses points to represent the values of two different variables, plotted along the x-axis and y-axis, respectively. Because scatter plots show directly how two variables move together, they are the primary option for visualizing relationships and correlations (Figure 4.16). Scatter plots can also be extended: some add a trendline to clarify the relationship, and the size, shape, or color of the points can represent additional variables, but care should be taken not to make the interpretation more complex. The goal is always to communicate important information quickly to the eye.
Figure 4.16: Scatter Plot
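Because scatter plots are the usual companion to correlation, it can help to see how the correlation coefficient itself is computed. This is a sketch of the standard Pearson formula in plain Python (`pearson_r` is a hypothetical name); in R, `cor()` returns the same quantity by default, and Python 3.10+ offers `statistics.correlation()`.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two paired lists of numbers.

    Returns a value between -1 and 1: near 1 means the points rise
    together along a line, near -1 means one falls as the other rises,
    and near 0 means there is no linear relationship.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two paired lists with at least two points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical square footage and sale price (in thousands).
r = pearson_r([800, 1000, 1200, 1500, 2000], [400, 480, 560, 700, 900])
```

The coefficient summarizes only the linear part of the relationship, which is why it is worth looking at the scatter plot itself before trusting the number.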
Source: https://ggplot2.tidyverse.org/reference/geom_point.html
Trellis Plot
The main application of a Trellis plot is viewing the relationships between variables in a multivariable data set. Trellis graphics belong to a framework of techniques for viewing complex data sets. Panels are arranged in rows and columns, and sometimes across pages, and each panel graphs the relationship for one subset of the data using the same display method. Different visualizations can be used inside a trellis: scatterplots to show correlations between multiple variables, boxplots, bell curves, or any other technique used to explore a relationship between two variables. The trellis therefore lets us view the relationship between two variables even when many other variables are involved. Every Trellis display has a series of panels arranged in a row-by-column array (Figure 4.17), so it can communicate a lot of information quickly.
Figure 4.17: Trellis Time Series
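The core of a trellis display is splitting the data into subsets by one or two conditioning variables and giving each subset its own panel. A minimal plain-Python sketch of that grouping step follows; the helper `trellis_panels` and the field names (`borough`, `kind`, `sqft`, `price`) are hypothetical. In ggplot2, `facet_grid()` and `facet_wrap()` perform this step when drawing the panels.

```python
from collections import defaultdict


def trellis_panels(records, row_var, col_var, x_var, y_var):
    """Group records into a row-by-column grid of trellis panels.

    Each panel holds the (x, y) pairs for one combination of the two
    conditioning variables; every panel is then drawn with the same
    plot type and axes so the panels can be compared directly.
    """
    cells = defaultdict(list)
    for rec in records:
        cells[(rec[row_var], rec[col_var])].append((rec[x_var], rec[y_var]))
    rows = sorted({row for row, _ in cells})
    cols = sorted({col for _, col in cells})
    # Missing combinations become empty panels so the grid stays rectangular.
    grid = [[cells.get((r, c), []) for c in cols] for r in rows]
    return rows, cols, grid


# Hypothetical property sales.
sales = [
    {"borough": "Brooklyn", "kind": "Condo", "sqft": 900, "price": 700},
    {"borough": "Brooklyn", "kind": "House", "sqft": 1800, "price": 1200},
    {"borough": "Queens", "kind": "Condo", "sqft": 850, "price": 500},
]
rows, cols, grid = trellis_panels(sales, "borough", "kind", "sqft", "price")
```

Each inner list would then become one small scatter plot, with the row and column labels naming the subset it shows.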
Source: https://cran.r-project.org
Area Chart
Like a line chart, an area chart shows a time series; the difference is that an area chart can visually communicate volume. As in a line chart, information is graphed on the x-axis and y-axis and the data points are connected with line segments, but the area between the x-axis and the line is shaded with a color to aid interpretation. Area charts are usually used to compare two categories, or sometimes more, as long as the visualization stays easy to interpret. They are useful for communicating multiple data series and part-to-whole relationships. There are two commonly used variations of area charts, and the best choice depends on the situation:
1. The standard area chart is best for showing or comparing quantitative progress over a period of time.
2. The stacked area chart is best for visualizing part-to-whole relationships, which helps demonstrate how each category contributes to the total volume (Figure 4.18).
Figure 4.18: Stacked Area Chart
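In a stacked area chart, each category's band sits on top of the running total of the categories below it. The sketch below (plain Python, with a hypothetical `stack_series` helper and made-up data) computes those band edges, which is the stacking arithmetic a charting tool performs before shading the areas.

```python
def stack_series(series):
    """Compute the lower and upper edge of each band in a stacked area chart.

    series maps category -> values, one value per time period. Categories
    are stacked in the order given, so each band starts where the previous
    band ended and the top edge of the last band is the total per period.
    """
    lengths = {len(values) for values in series.values()}
    if len(lengths) != 1:
        raise ValueError("need at least one series, all with the same number of periods")
    running = [0] * lengths.pop()
    bands = {}
    for name, values in series.items():
        lower = list(running)
        upper = [base + value for base, value in zip(lower, values)]
        bands[name] = (lower, upper)
        running = upper
    return bands


# Hypothetical quarterly sales by channel.
bands = stack_series({"Online": [1, 2, 3], "Retail": [4, 5, 6]})
```

Because the top edge of the last band equals the per-period total, the chart shows both each category's contribution and the overall volume at once.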
Source: https://cran.r-project.org/web/packages/billboarder/readme/README.html
Heatmaps
The main goal of a heatmap is to visualize the volume of locations, events, or even behaviors and actions within a dataset. Heatmaps direct the attention of the target audience toward the areas of the data that are most influential. Heatmaps are popular for displaying general numeric figures because they rely on color to communicate values. This reliance on color is most useful when we are working with a large volume of data: because colors are easily differentiated, they often make more sense to the target audience than absolute numbers. Besides drawing attention in the right direction, heatmaps are popular because they do not need to be interpreted by a professional analyst; any audience of business users can read them. Heatmaps are highly self-explanatory: normally a darker or deeper shade, or a denser concentration of a specific color, represents a higher quantity of the variable (Figure 4.19). Although analysts can live without heatmaps, they are great for enhancing the communication of key insights.
Figure 4.19: Sales Data Heat Map Visualization
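A density-style heatmap is built by counting how many records fall into each cell of a grid and mapping larger counts to darker shades. The following plain-Python sketch shows that binning step under stated assumptions: a square grid, equal-width bins, and shades scaled linearly to the busiest cell. The name `heatmap_grid` and the sample points are hypothetical.

```python
def heatmap_grid(points, x_range, y_range, n_bins):
    """Count points per cell of an n_bins x n_bins grid and scale to 0..1.

    Returns (counts, shades). Row 0 holds the lowest y values. shades[row][col]
    is the cell's count divided by the busiest cell's count, so 1.0 is the
    darkest colour and 0.0 the lightest. Points outside the ranges are
    skipped; points on the upper edge go into the last cell.
    """
    (x_min, x_max), (y_min, y_max) = x_range, y_range
    counts = [[0] * n_bins for _ in range(n_bins)]
    for x, y in points:
        if not (x_min <= x <= x_max and y_min <= y <= y_max):
            continue
        col = min(int((x - x_min) / (x_max - x_min) * n_bins), n_bins - 1)
        row = min(int((y - y_min) / (y_max - y_min) * n_bins), n_bins - 1)
        counts[row][col] += 1
    peak = max(max(r) for r in counts) or 1  # avoid dividing by zero on an empty grid
    shades = [[c / peak for c in r] for r in counts]
    return counts, shades


# Hypothetical store locations on a unit map, binned into a 2 x 2 grid.
counts, shades = heatmap_grid(
    [(0.1, 0.1), (0.2, 0.2), (0.9, 0.9), (1.0, 1.0), (0.6, 0.1)], (0, 1), (0, 1), 2
)
```

Scaling to the busiest cell keeps the colour scale meaningful regardless of how many records there are, which is part of why heatmaps read well for non-specialist audiences.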
Source: https://www.tableau.com/about/blog/2018/11/density-mark-type-brings-new-kind-heatmap-tableau-98488
After Visualization: Insights and Decisions
Having covered the technology that holds the data, as well as how to convert the data into visualizations that communicate important details, we now have a foundational perspective on the ways we can explore the various data points. Storytelling with data gives us a set of tools that make the business analytics process more effective downstream, when we communicate our findings. We should avoid communicating every finding every time; instead, work needs to be done up front to extract meaningful insight from the data. The most important work is extracting the insight and communicating the bottom line up front, with the supporting details following the point. We should identify the most important and relevant points and build the supporting story around them.
From Data to Insight
An insight is defined as the power or act of seeing into a situation. Often it is an unexpected change in the way something is understood. These unexpected aspects present themselves as we interpret, analyze, and examine the data and visualizations. An analyst may uncover a new relationship between variables, such as a relationship that runs in an unexpected direction. For example, we know that larger properties are usually associated with higher prices. However, an insight may arise when we find that, in some neighborhoods, a larger property size is not associated with higher property prices. This type of unexpected finding is an insight that leads to additional questions. In addition to unexpected relationships between variables, insights may also arise from unexpected patterns, trends, or other anomalies in the data. Once uncovered, an insight reshapes our point of view.
Insights also differ in type: some are interesting or entertaining, while others are useful and valuable, and some interesting insights are not valuable at all. A good insight will shift the way a business operates. For example, if Company A, which caters to an older demographic, identifies an insight showing that a younger demographic spends more money with it and shops more frequently, this can change the way the business runs and in turn increase profitability. Company A can change the channels it uses for advertising, the creative it uses, the models it uses, the products it stocks in its inventory, and much more. This then leads to measurable improvement in business performance. Such an insight is both interesting and valuable, because it supports better business decisions and better performance. An insight aims to change the way we think, change what we believe to be true, and inspire action to do things differently. The data, as well as the interpretation of the data, lead to new directions.
All insights start with raw data (Figure 4.20). The collection of raw data serves as the foundation for gaining knowledge to answer business questions. The data is then organized and synthesized in reports, converting it into information for other people’s consumption. People then analyze the reports and discover meaningful insights that inform decisions and drive the actions that create value. Insights offer potential value, but they do not guarantee that the value will be realized. What they do provide is support for a data-driven decision-making process in place of guessing or relying solely on intuition.
Figure 4.20: Start with data and finish with business improvement
Effective communication is necessary to explain your insight in a way that internal and external stakeholders understand and, more importantly, feel compelled to act on. Simply presenting information or data in the hope that stakeholders will draw their own takeaways is not the best option, and the same applies to written managerial reports. It is up to the presenter and the writer of the managerial report to make the data meaningful. You need to connect the dots for your audience with a relevant and effective narrative. This can be done with a three-step process: explain, enlighten, and engage (Dykes, 2020).
Explain to stakeholders what you noticed in the data, what the data is telling us, and why the insight is important. Guide the stakeholders through what the findings mean, as they will often not know the techniques you are using. For example, if you notice a high standard deviation in your data, it is up to you to explain what that standard deviation means for the business and the decision-making process.
Enlighten with visualizations of the data. When we have rows of data in a table, it is not easy to identify something interesting until the data is visualized using descriptive analytics techniques or business intelligence tools such as Power BI Desktop or Tableau. Point to interesting data points, patterns, and outliers.
Engage with a combination of narrative and visualizations. Immerse your stakeholders in your narrative and bring the insights to life.
Conclusion
As noted in Evans’ textbook, a variety of visualization tools are available, ranging from Excel to enterprise-grade tools and software from cloud-based services, as well as open-source options. For standard visualization, Excel is a viable option, but visualizing and manipulating Big Data requires different tools. Tableau, Microsoft Power BI, Qlik, and R offer more powerful tools with advanced capabilities. Each organization uses its own data visualization tools in its technology stack, so becoming a versatile professional who is comfortable with various tools will be an important advantage in your ability to visualize, interpret, and deliver measurable business performance.
Additional Resources
https://www.tableau.com/learn/get-started/desktop-viz-design
https://docs.microsoft.com/en-us/power-bi/developer/power-bi-custom-visuals
https://docs.microsoft.com/en-us/power-bi/visuals/power-bi-visualization-types-for-reports-and-q-and-a
https://docs.microsoft.com/en-us/power-bi/developer/power-bi-custom-visuals-certified
Tableau Visualizations
Dykes, B. (2020). Effective data storytelling: How to drive change with data, narrative, and visuals. Hoboken, NJ: John Wiley & Sons.
Few, S. (2012). Show me the numbers: Designing tables and graphs to enlighten (2nd ed.). Analytics Press.
Knaflic, C. (2015). Storytelling with data: A data visualization guide for business professionals. Hoboken, NJ: Wiley.
Wexler, S., Shaffer, J., & Cotgreave, A. (2017). The big book of dashboards: Visualizing your data using real-world business scenarios. John Wiley & Sons.
Yau, N. (2013). Data points: Visualization that means something. New York: Wiley.