DAT 430 MOD1 Journal (1)

School

Southern New Hampshire University

Course

430

Subject

Industrial Engineering

Date

Dec 6, 2023

Nathan Cumbo
DAT 430
November 1, 2023
Module 1 Journal

In this class, we are exploring seven potential approaches to preprocessing data and considering whether and how each might be useful for the projects in this course: aggregation, data sampling, dimensionality reduction, feature subset selection, feature creation, discretization and binarization, and variable transformation. As an analyst, the questions to ask are: What is the target data source for Projects One and Two? Which of these preprocessing approaches are in scope for meeting the needs of each project?

Data aggregation is the process of collecting data and presenting it in summary form so that statistical analysis can help company executives make informed decisions about marketing strategies, pricing, operations, and more. Companies often use data aggregation to improve marketing and sales. Data aggregation is relevant and applicable in the scope of Project One, where our goal is to analyze HR attrition data and present it in visual form to show why employees are leaving their jobs at the rate they are. Data aggregation applies to collecting the attrition information and summarizing it, including the causes of attrition and a visualization of any recurring patterns found in the data.

In data analytics, data sampling is the practice of analyzing a small subset drawn from a larger data set, discovering processes, patterns, and trends in that subset, and generalizing the findings to the complete data set. The main benefit of data sampling is that it saves analysts time, letting them reach reasonably accurate findings more quickly than analyzing the full data set would (Egnyte, 2022). Data sampling likely won't be necessary for the projects in this class.

Dimensionality reduction is the process of transforming high-dimensional data into low-dimensional data. It makes raw data with many dimensions easier for analysts to work with by reducing or removing many of those dimensions. The technique is common in fields that work with high-dimensional raw data, such as speech recognition, language dialect relations, signal processing, neuroinformatics, and bioinformatics. Closely related to dimensionality reduction are feature creation and feature subset selection. Feature subset selection is itself a way of reducing dimensionality, since it keeps only the most relevant features; for Projects One and Two, I believe that both dimensionality reduction and feature subset selection will play roles in the decision-making process. This project calls for the analyst to sort through HR attrition data and create metrics that help draw conclusions about why employees are leaving their jobs. When it comes to career attrition, there are a plethora of reasons for leaving: relocation, pay, family emergencies, dislike of coworkers or bosses, benefits, etc. Dimensionality reduction will allow us to sort through this data and reach more precise and accurate conclusions.

Discretization is the process of regrouping the values of a variable into a smaller number of categories. A great example of this is classifying age groups. If we were given the ages of 50 participants and asked to group them, we could list every exact age, which keeps all the detail but is hard to summarize. With discretization, we can instead sort the participants into a few labeled groups, such as 'Infant', 'Young', 'Adult', and 'Senior', or into bins by decade.
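
The age-group example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the journal names the groups but not their boundaries, so the cut-off ages (2, 18, and 65), the sample ages, and the helper name `age_group` are all hypothetical.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical cut-off ages; the journal names the groups but not their boundaries.
CUTOFFS = [2, 18, 65]
LABELS = ["Infant", "Young", "Adult", "Senior"]

def age_group(age):
    """Map an exact age to one of the labeled groups (discretization)."""
    # bisect_right counts how many cut-offs are <= age, which is the label index.
    return LABELS[bisect_right(CUTOFFS, age)]

ages = [1, 7, 16, 23, 35, 41, 58, 66, 72]  # made-up sample ages
groups = [age_group(a) for a in ages]

# Summarize how many participants fall into each group.
counts = Counter(groups)
print(counts)
```

Each exact age is replaced by a single label, so nine distinct values collapse into four categories that are easier to count and chart.
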
Discretization helps when working with a large pool of data by reducing workload with minimal data loss. While dimensionality reduction will likely be present, I don't think discretization will be a factor in the class project.

Finally, variable transformation is a way to make the data work better for our model.
There are two types of variable transformation: numeric and categorical. In both cases, the variable is converted from its original format, whether numeric or categorical, into a numeric variable. This may be necessary, at least in Project One, as we will be looking at categories and characteristics related to attrition rates. It may also apply to other numeric variables, such as employee counts, the length of time (in days, months, or years) an employee worked before quitting, hours worked per week, etc.

Of these possible approaches, I think the best ones for Projects One and Two in this class are data aggregation and dimensionality reduction. Dimensionality reduction will come into play as we discover more and more possible reasons behind rising attrition in the HR data. Data aggregation comes into play both before and after the analysis. Part of data aggregation is putting the data into summary form for the sake of visualization and presentation. This project will certainly call for visual representation of our findings, as well as descriptions of their importance and relevance.
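
As a rough sketch of what putting attrition data into summary form could look like, the Python below aggregates individual employee records into an attrition rate per group. The records, the field names `department` and `left`, and the helper `attrition_rate_by` are all made up for illustration; they are assumptions and do not reflect the actual Project One data set.

```python
from collections import defaultdict

# Made-up employee records; the real project data and its columns are not shown here.
records = [
    {"department": "Sales", "left": True},
    {"department": "Sales", "left": False},
    {"department": "Sales", "left": True},
    {"department": "Sales", "left": False},
    {"department": "R&D", "left": False},
    {"department": "R&D", "left": False},
    {"department": "R&D", "left": True},
    {"department": "R&D", "left": False},
]

def attrition_rate_by(rows, key):
    """Aggregate individual records into an attrition rate per group."""
    totals = defaultdict(int)   # employees seen in each group
    leavers = defaultdict(int)  # employees in each group who left
    for row in rows:
        totals[row[key]] += 1
        if row["left"]:
            leavers[row[key]] += 1
    return {group: leavers[group] / totals[group] for group in totals}

summary = attrition_rate_by(records, "department")
for group, rate in summary.items():
    print(f"{group}: {rate:.0%} attrition")
```

A per-group summary like this is the kind of table that could then feed a bar chart or similar visualization.
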
References:

Racickas, L. (2023, July 26). Data aggregation: Definition, benefits, and examples. Coresignal. https://coresignal.com/blog/data-aggregation/#:~:text=There%20are%20two%20primary%20types,over%20a%20given%20time%20period

Data sampling. Egnyte. (2022, April 19). https://www.egnyte.com/guides/life-sciences/data-sampling#:~:text=With%20data%20sampling%2C%20researchers%2C%20data,findings%20from%20a%20statistical%20population

Google For Developers. (2016, November 15). A.I. experiments: Visualizing High-Dimensional Space. YouTube. https://www.youtube.com/watch?v=wvsE8jm1GzE

Wikimedia Foundation. (2023, October 28). Dimensionality reduction. Wikipedia. https://en.wikipedia.org/wiki/Dimensionality_reduction

Discretization in data mining - javatpoint. www.javatpoint.com. (n.d.). https://www.javatpoint.com/discretization-in-data-mining#:~:text=ADVERTISEMENT-,Data%20discretization%20is%20a%20method%20of%20converting%20attributes%20values%20of,discrete%20attributes%20into%20binary%20attributes

DEI, M. (2020, May 1). Catalog of variable transformations to make your model work better. Medium. https://towardsdatascience.com/catalog-of-variable-transformations-to-make-your-model-works-better-7b506bf80b97#:~:text=Variable%20transformation%20is%20a%20way,variable%20to%20another%20numeric%20variable