Lab11_R copy

School: The City College of New York, CUNY

Course: 228
Subject: Computer Science
Date: Apr 3, 2024
Type: docx
Pages: 30
Uploaded by LieutenantOxide7028

Lab 11: Introduction to the R statistical computing language

Lab objectives:
I. Learn the basics of R and RStudio
   a. Setting up RStudio
   b. Objects and functions
II. Learn how to import and export files in R
   a. Directories and paths
   b. Reading and writing .csv files
   c. Basic plotting
III. Learn how to manipulate and plot data using the tidyverse
   a. Downloading and loading packages
   b. Using the tidyverse to explore data
   c. Plotting with ggplot2

Background
R is an open-source programming language designed for statistical analysis and plotting data. It comes equipped with many tools for exploratory data analysis, standard statistical tests, and data visualization. A major benefit of using R is the wide range of functionality and customization available through shareable code written by R's user community. These freely available, specialized code collections are called packages. Packages can be downloaded from the Comprehensive R Archive Network (CRAN) package repository, which currently features more than 18,000 packages. These specialized packages make R a powerful tool, because users can tailor its programming environment for statistics and visualization to their own field of research.

Working directly in R can be challenging, especially when first learning the language. An integrated development environment (IDE) such as RStudio helps by letting programmers edit, run, and inspect code through a single graphical user interface (GUI). In today's lab, you will learn the basics of the R programming language in RStudio by working with a Pokémon dataset.

Requirements:
You will need to have R and RStudio installed on your computer. R can be downloaded here and RStudio can be downloaded here. Select the correct installer for your operating system. Download the Lab_11 folder here.
Sources:
https://jcoliver.github.io/learn-r/002-intro-stats.html#solution-to-challenge-2
https://kirstenmorehouse.wordpress.com/354-2/topic-1-crash-course-in-r/
https://r4ds.had.co.nz/wrangle-intro.html
https://bookdown.org/ndphillips/YaRrr/the-four-rstudio-windows.html

I. The basics of R and RStudio
a. Setting up: Opening RStudio and the RStudio interface
Begin the lab by opening RStudio on your device. At the top left of the screen, go to File > New Project. A new window will appear. In this window, select the second option, Existing Directory, to associate a project with an existing working directory. Click the Browse button and navigate to the location on your computer where the Lab_11 folder has been extracted. Click on Lab_11, click Open, and then click Create Project. Next, go to File > New File > R Script. You should now see four panels in the RStudio application. Your screen should look similar to the image below.

Panel 1 (top left) - Source: This panel acts like a basic text editor. It is where you write, annotate, and save commands. Text that contains commands is called code or a script. You can also run commands from this panel: highlight the relevant lines and click the Run button in the top right corner of the panel. You can have multiple scripts open at the same time, and each script gets its own tab. Scripts in this panel can be saved as text files by clicking the Save (floppy disk) icon. Writing, running, and saving code in this panel gives you a shareable record of the commands used to perform an analysis, which makes the analysis easier to replicate and to troubleshoot.

Panel 2 (top right) - Environment/History: This panel includes four tabs, but only the Environment and History tabs are used in these laboratory exercises. The Environment tab shows the objects in your workspace. The contents of the Environment can also be saved by clicking the Save (floppy disk) icon on the Environment tab's toolbar. When all objects in your Environment are saved to an .RData file, they can later be loaded as a single unit without having to import all of the data again. The History tab lists all the commands that have been executed, and this list can also be saved if needed.
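The Environment tab's save button has a code counterpart. The sketch below is a minimal illustration and not a step in this lab. Base R's save() writes named objects to an .RData file, and load() restores them. The object names and the file name lab11_objects.RData are hypothetical. The file is written to a temporary folder so the snippet can run anywhere.

```r
# Two example objects (hypothetical values)
one <- 1
greeting <- "hello"

# Save both objects into one .RData file in a temporary folder
rdata_file <- file.path(tempdir(), "lab11_objects.RData")
save(one, greeting, file = rdata_file)

# Remove them from the environment, then restore them from the file
rm(one, greeting)
loaded_names <- load(rdata_file)  # load() returns the names of the restored objects

print(loaded_names)
print(one)
```

To save every object in the Environment at once, base R also provides save.image(). By default it writes to a file named .RData in the working directory.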
Panel 3 (bottom left) - Console: There are three tabs in this panel, but only the Console tab is used in these laboratory exercises. This is where your R code is executed after being run from the Source panel. It is also where you will see outputs and any errors or warnings from your commands. If you ran R without an IDE like RStudio, you would only see a console screen. You can run commands directly in the console by typing a command after the “>” character and pressing Enter. The “>” character is called the prompt. When the prompt is visible, R is “waiting” for a command. While a command is running, the prompt disappears and a stop sign (red octagon) appears in the top right of this panel. You can terminate a command by clicking this stop sign.

Panel 4 (bottom right) - Files, Plots, Packages, and Help: This panel includes six tabs, but you will likely only use the Files, Plots, Packages, and Help tabs. The Files tab acts like a file explorer and shows the files in your working directory. The Plots tab displays any graphics created by your commands. The Packages tab lists all the packages currently installed in R on your device. Packages can be loaded or unloaded by clicking the checkbox next to the package name. You can install packages from CRAN by clicking the Install button at the top left of the Packages tab. The Help tab contains detailed documentation for packages and their functions.

b. Objects and functions
Data structures, data types, and creating an object: Programming in R involves at least one object and one function. The most conventional way to think about an object is as a named shortcut to stored data, so that the data can easily be used later.
Objects are created in R with the arrow-like assignment operator “<-” (a less-than sign “<” followed by a dash “-”). It assigns the value on the right of the operator to the object name on the left. Operators are characters that perform specific tasks with a piece of code or an object. R includes many operators used for object assignment, arithmetic calculations, and coding logic. For example:
> pi <- 3.1415927
…creates an object called “pi” that holds the numerical value of pi to 7 decimal places. R already has a built-in constant called pi, so this assignment creates your own object that masks it. The direction of the arrow can be swapped as long as it points to the object name. The command…
> 3.1415927 -> pi
…is equivalent.

Object names cannot begin with a number. They also should not reuse the name of an existing function or object, because the new object masks it. Assigning a value to a name that is already in use overwrites the old object without warning. Create object names that are concise and descriptive.

Navigate to the Source panel (Panel 1) in RStudio. Create an object named “one” with a value of 1. On line 1, write the word one, followed by the assignment operator <- and the value 1. Highlight this line by clicking and dragging your cursor, and then click the Run button in the top right corner of the Source panel.
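The assignment rules above can be condensed into a short script, assuming a fresh R session. The objects one and two are illustrative. The pi lines show that reusing a built-in name masks it rather than causing an error.

```r
# Leftward assignment: the value on the right is stored under the name on the left
one <- 1

# Rightward assignment is equivalent; the arrow still points at the name
2 -> two

# Reusing a name silently overwrites the old value
one <- 100

# Reusing a built-in name masks it instead of raising an error
pi <- 3.1415927
print(pi)        # your object (printed with 7 significant digits by default)
print(base::pi)  # the built-in constant is still reachable through base::
rm(pi)           # after removing your copy, pi refers to the built-in again
print(one)
```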
Questions
1. How has the Console tab in Panel 3 changed?
Once “one<-1” was successfully run, the console displayed “> one<-1”.
2. How have the Environment and History tabs in Panel 2 changed?
The Environment now displays the newly created “one” object and its value of 1. The History displays the previously entered command “one<-1”.

You have successfully created your first R object! The object one, with a value of 1, has been stored in the R environment and can now be used later by referring to its name, one.

Objects can have different data structures, depending on how the values in the data are organized. Six data structures are commonly used in the R programming language: vectors, lists, data frames, matrices, factors, and arrays. The three most common are vectors, lists, and data frames.

Besides having a data structure, the values stored in an object also have a data type. The three data types used in this lab are numeric, character, and logical.
- Numeric values are either an integer (a whole number with no decimal places) or a double (a double-precision floating-point number, which can have a fractional part). By default, R stores numbers that you type as doubles, even whole numbers such as 1. An integer has to be requested explicitly, for example by writing 1L.
- Character data may include numbers, letters, and special characters.
- Logical data are either TRUE or FALSE for some condition.

Different data structures and data types are useful for different tasks.
Vectors are ordered collections of a single data type. The values that make up a vector are called elements. Example: 1, 2, 3, 4, 5
Lists are objects composed of ordered collections of other objects. Lists can include objects of multiple data types. Example: Name, Age, 4, TRUE
Data frames are objects that store tabular data. They resemble spreadsheets such as those from Microsoft Excel.
Data frames have more formatting rules than vectors and lists. Each column in a data frame needs a name, and each column can contain only one data type. Different columns can still hold different data types. In addition, all columns in a data frame must have the same number of items. Example:

Day   Year   Present
Sat   2013   FALSE
Sun   2021   TRUE
Fri   2020   TRUE

Each data structure is created in R with its own function, and we will cover some of these functions later.
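The three example structures above can be created with c(), list(), and data.frame(). This is a sketch. The object names numbers, record, and days are made up for illustration, and the list uses text strings in place of the placeholder words Name and Age.

```r
# Vector: an ordered collection of a single data type, built with c()
numbers <- c(1, 2, 3, 4, 5)

# List: an ordered collection that can mix data types, built with list()
record <- list("Name", "Age", 4, TRUE)

# Data frame: named columns of equal length, built with data.frame()
days <- data.frame(
  Day     = c("Sat", "Sun", "Fri"),
  Year    = c(2013, 2021, 2020),
  Present = c(FALSE, TRUE, TRUE)
)
print(days)

# Data types of individual values and columns
typeof(numbers)       # "double": whole numbers typed this way are stored as doubles
typeof(1L)            # "integer": the L suffix requests an integer
typeof("Sat")         # "character"
typeof(days$Present)  # "logical": each column holds a single data type
```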
Using functions
Knowing an object's data structure and data type is important because they determine which functions can be used with that object. A function is any piece of code that performs a specific task. A function takes input, processes it through its code, and produces output. Inputs are usually objects. Outputs are the desired product of a function and usually involve a manipulation or calculation based on the input. R has many built-in functions. You can get more functions by writing them yourself in R, or by installing packages (blocks of code) from CRAN and using the functions they provide.

Determining the data structure of the object one using functions: R has several functions that determine the data structure of an object. The following three are simple functions that need only a single input value, or “argument” (in this case, an object): is.data.frame(), is.vector(), and is.list(). Each one checks whether the object has one particular data structure and outputs either TRUE or FALSE. TRUE means the object has that data structure; FALSE means it does not.

To execute a function, type its name and place the required arguments, separated by commas, inside its parentheses. More complex functions need more arguments, which must be supplied in specific ways. The Help tab in Panel 4, at the bottom right of the RStudio screen, lists the arguments each function accepts and the order in which they are expected.

Navigate to the Source panel (Panel 1) in RStudio. On a new line, type the name of the function is.data.frame(). Between its parentheses, write the name of the object you are investigating: one. Highlight this line by clicking and dragging, and then click the Run button at the top right of the Source panel. You've just used your first function!
Repeat this step for each of the three functions to determine the data structure of the one object: is.data.frame(), is.vector(), and is.list().

Questions
3. What is the output in the Console for each of the three functions you ran? What does this mean?
> is.data.frame(one)
[1] FALSE
> is.vector(one)
[1] TRUE
> is.list(one)
[1] FALSE
This means that the “one” object is neither a data frame nor a list, but is in fact a vector.
4. What is the data structure of the object called one?
According to the results from question 3, one is a vector.

Create an object with the output of a function and use a new function
Objects can also store the output of functions for later use. Navigate to the Source panel (Panel 1) in RStudio. On a new line, create an object called dataframe_one_result to store the output of the function is.data.frame(one). Highlight this line by clicking and dragging the cursor, then click the Run button at the top right of the Source panel.
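Questions 3 through 8 can be reproduced as a single script. The comments show the results reported in this lab's answers. The typeof() calls belong to the next step.

```r
one <- 1

# Each is.* function answers TRUE or FALSE for one data structure
is.data.frame(one)  # FALSE
is.vector(one)      # TRUE: a single number is a vector of length 1
is.list(one)        # FALSE

# Store the output of a function in a new object
dataframe_one_result <- is.data.frame(one)

# typeof() reports the data type stored in each object
typeof(one)                   # "double"
typeof(dataframe_one_result)  # "logical"
```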
Now use the function typeof() on the objects one and dataframe_one_result. First, look up the function's description and arguments. Navigate to the Source panel (Panel 1) in RStudio and, on a new line, type ?typeof with the question mark directly touching the function name. Highlight this line by clicking and dragging, and then click the Run button. To learn more about any function, run its name preceded by a question mark, without parentheses or arguments. This opens the function's documentation in the Help tab in the bottom right panel of RStudio, including its description, usage, arguments, and examples.

Questions
5. What does the typeof() function do?
The typeof() function identifies the data type of its argument.
6. What is the output for the typeof() function when using the one object as the argument?
[1] "double"
7. What is the output for the typeof() function when using the dataframe_one_result object as the argument?
[1] "logical"
8. What is the data type for the one object and the dataframe_one_result object?
The one object is a double (numeric) object, and dataframe_one_result is a logical object.

Directories and paths
So far you have learned the basics of data structures and data types, how to create objects, and how to use functions. The next step is to learn how to load datasets into R. The data will usually come from an external source such as a spreadsheet. You selected the Lab_11 folder and its files earlier in this exercise, when you created the Lab_11 RStudio project.

Think of a computer as a collection of files and programs in a large, nested series of folders. The biggest folder, holding everything else, is called the root directory. A directory is a folder or location in the computer where files are stored. Directories appear as folders in the Windows, Mac, and Linux GUIs.
For R/RStudio to load files from your computer, it needs to know where they are. A working directory must therefore be specified: the single directory where R looks for input files and writes output files by default. R can only find files elsewhere on the computer if you give it directions to them in the form of a path, which is like an address within the file structure of your computer. A path is the series of folders-within-folders that must be opened to reach a particular file. It lists the folder names from the root to a particular file or directory, separated by forward slashes (/).

You can find your current working directory by running the getwd() function. It does not need an argument, so it runs with nothing in the parentheses. It returns the path to the directory that R/RStudio is working in. Navigate to the Source panel (Panel 1) in RStudio. On a new line, type getwd() and run the function.

Questions
9. What is the output of this function?
> setwd("C:\\Users\\p\\Desktop\\Lab_11\\data")
I set my working directory to this folder at the beginning of the lab for workflow efficiency.
10. What is the difference between a directory and a path?
A directory is a location (folder) in the computer's storage. A path is the series of folders you need to open to reach the target directory or file.

You can also use the Files tab in the bottom right panel to see what other files and folders are currently in the working directory. Navigate to the Files tab in Panel 4.

Question
11. What other folders and files are included in your working directory?
In the working directory I chose for efficiency's sake, I only have the required Pokémon dataset.

Sometimes you need to change your working directory. You can do this with the setwd() function, which takes a path as an argument: type the path inside the parentheses of setwd(). Paths are text strings, so they must be enclosed in quotes (this lab uses double quotes, “”). R cannot recognize a path, or any other text string, unless it is surrounded by quotes.

Today you will use the dataset Pokemon_dataset.csv in the data folder. Use the Files tab in the bottom right panel to find the location of the file. For R to load the file from your computer, you need to provide the path to the file and the file name as a string.

Most modern operating systems let a computer have separate accounts, each with its own Desktop, Documents, Downloads, and other directories. On a Mac or Linux computer, the tilde (~) is a shortcut for the current user's home directory. This is the highest directory for the currently active account, and it contains that user's Desktop, Documents, Downloads, and other directories. Remember that the quotes around the path are essential. Depending on where the folder is located on the computer, one possible command to set the working directory might therefore be:
> setwd("~/Downloads/Lab_11")
On a Windows machine, this might be:
> setwd("C:/Users/YourUserName/Downloads/Lab_11/")
The true path on your computer will depend on where you downloaded the folder. Windows displays paths with backslashes (\). Inside an R string, however, a backslash is an escape character. In R, write paths with forward slashes (/) or with doubled backslashes (\\).

Question
12. What is the path to this file on your computer?
"C:/Users/p/Desktop/Lab_11/data"
As mentioned in question 9.

Load a dataset using a function and store it as an object: Now that you know where the directory with the course data files is located, you can use a function to load the file into R. The function that does this for a comma-separated values (csv) file is called read.csv(). To import this dataset you need to provide one argument: a string with the path to the file and the file name.

Pokemon_dataset.csv should be located in the data folder, which is a subdirectory of the current working directory, so you do not need to provide the full path. The full Windows path would be "C:/Users/YourUserName/Desktop/Lab_11/data/Pokemon_dataset.csv", and the Mac/Linux path would be "~/Desktop/Lab_11/data/Pokemon_dataset.csv". Instead, you can use the relative path "data/Pokemon_dataset.csv". This works because creating the RStudio project in the Lab_11 folder made that folder the working directory.

Navigate to the Source panel (Panel 1) in RStudio. On a new line, type read.csv() and put the path argument inside the function (between the parentheses). Run this line of code.

Question
13. What is the output of the read.csv() function?
The contents of the file being read were displayed in the Console.

Notice that read.csv() reads the data file and prints it in the Console, but it does not automatically store it in an object listed in the Environment tab. On a new line, create an object named Pokemon_dataset to store the output of the read.csv() function.

Questions
14. What is the output of the read.csv() function now?
Nothing is printed apart from the command itself, because the output is now stored in the Pokemon_dataset object.
15. Has anything changed in the Environment tab in the top right panel?
The Pokemon_dataset object and its data are now listed in the Environment tab.
16. Click the words Pokemon_dataset under Data in the Environment tab (not the blue arrow). What do you now see in the Source panel (Panel 1)?
A spreadsheet-like view of all the data.
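Pokemon_dataset.csv is not available outside the lab folder, so the sketch below builds a small stand-in first. It creates a temporary Lab_11/data folder holding a three-row CSV. The Species, CP, and HP values in that CSV are copied from the head() output shown later in this lab. The sketch then reads the file with the same relative path used above. The folder location is an assumption. setwd() returns the previous working directory, which is used to restore it afterwards.

```r
# Hypothetical stand-in for the Lab_11 project folder and its data/ subfolder
project_dir <- file.path(tempdir(), "Lab_11")
dir.create(file.path(project_dir, "data"), recursive = TRUE, showWarnings = FALSE)

# Three rows copied from the head() output of the real dataset
sample_rows <- data.frame(
  Species = c("Bulbasaur", "Lunatone", "Solrock"),
  CP      = c(658, 98, 426),
  HP      = c(12, 68, 77)
)
write.csv(sample_rows, file.path(project_dir, "data", "Pokemon_dataset.csv"),
          row.names = FALSE)

# setwd() changes the working directory and returns the previous one
old_wd <- setwd(project_dir)
print(getwd())

# Relative path with forward slashes, resolved from the working directory
Pokemon_dataset <- read.csv("data/Pokemon_dataset.csv")

# Put the original working directory back
setwd(old_wd)

str(Pokemon_dataset)
```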
You can open data frame objects and view them in the Source panel, where they resemble a Microsoft Excel spreadsheet. This viewing window even keeps some of Excel's functionality, with search and filter options for exploring the dataset. Searching or filtering the data in this window does not permanently change the object. It is simply a convenient way to explore the structure and contents of a dataset.

Explore a dataset using some basic R functions: Sometimes a dataset is too large to view in full in the Source panel. Use the head() and tail() functions on the Pokemon_dataset object to get a glimpse of the data stored in it.

Questions
17. What is the output of the head() function?
  Section   EMPLID Borough       Weather   Species
1     4PS 23925061   Bronx Partly Cloudy Bulbasaur
2     4PS 23925061   Bronx Partly Cloudy  Lunatone
3     4PS 23925061   Bronx Partly Cloudy   Solrock
4     4PS 23925061   Bronx Partly Cloudy      Aron
5     4PS 23925061   Bronx Partly Cloudy   Pikipek
6     4PS 23925061   Bronx Partly Cloudy   Solrock
   CP HP Weight_kg Type_1  Type_2 Height_m
1 658 12      9.09  Grass  Poison     0.78
2  98 68    117.22   Rock Psychic     0.80
3 426 77    269.95   Rock Psychic     1.53
4  68 54     37.84  Steel    Rock     0.35
5 128 39      1.03 Normal  Flying     0.31
6  89 82     117.2   Rock Psychic     0.97
              Location.Caught
1 New York, NY, United States
2 New York, NY, United States
3 New York, NY, United States
4 New York, NY, United States
5 New York, NY, United States
6 New York, NY, United States

18. What is the output of the tail() function?
> tail(PD)
    Section   EMPLID Borough Weather Species  CP HP
776     4PS 23966294  Queens   Rainy Rhyhorn 165 49
777     4PS 23966294  Queens   Rainy Wailmer 174 84
778     4PS 23966294  Queens   Rainy   Doduo 284 48
779     4PS 23966294  Queens   Rainy Haunter 129 28
780     4PS 23966294  Queens   Rainy  Tynamo  52 26
781     4PS 23966294  Queens   Rainy Spearow  10 11
    Weight_kg   Type_1 Type_2 Height_m
776     89.39   Ground   Rock     0.77
777    113.38    Water   <NA>     1.84
778     34.66   Normal Flying     1.26
779      0.09    Ghost Poison     1.44
780      0.28 Electric   <NA>     0.20
781      2.11   Normal Flying     0.29
       Location.Caught
776 New York, New York
777 New York, New York
778 New York, New York
779 New York, New York
780 New York, New York
781 New York, New York

19. What are the names of the columns in the Pokemon_dataset?
Section, EMPLID, Borough, Weather, Species, CP, HP, Weight_kg, Type_1, Type_2, Height_m, Location.Caught
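You can practice head(), tail(), and names() without the lab file by using a six-row stand-in data frame. Its values are taken from the head() output above, and it includes only four of the twelve columns. The n argument controls how many rows are shown and defaults to 6.

```r
# Six-row stand-in using four columns from the head() output above
pokemon <- data.frame(
  Species = c("Bulbasaur", "Lunatone", "Solrock", "Aron", "Pikipek", "Solrock"),
  CP      = c(658, 98, 426, 68, 128, 89),
  HP      = c(12, 68, 77, 54, 39, 82),
  Type_1  = c("Grass", "Rock", "Rock", "Steel", "Normal", "Rock")
)

head(pokemon, n = 3)  # first 3 rows (n defaults to 6)
tail(pokemon, n = 2)  # last 2 rows
names(pokemon)        # column names
dim(pokemon)          # number of rows, then number of columns
```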
These functions show the first entries (head) and the last entries (tail) in a dataset, giving you an idea of what the data look like. Sometimes it is also useful to isolate one column of a data frame to understand its values and properties. You learned earlier that a data frame can have columns with different data types. You can isolate a column in a data frame with the “$” operator. An isolated column is treated as a vector, since it is a single collection of values of the same data type.

Question
20. What happens when you run the following code: Pokemon_dataset$HP?
The Console displays all the HP values in the dataset.

Using what you know about determining data types and isolating columns, determine the data type of each column in the Pokemon_dataset object. Hint: use a function on a column.

Question
21. What are the data types of the Species, HP, and Height_m columns?
character, double, and double, respectively

The head() and tail() functions give you a glimpse of the data in a data frame object, and the “$” operator isolates columns. Other R functions can also help you explore the data. The max() and min() functions return the maximum and minimum values of a numeric collection of values.

Questions
22. Using the min() and max() functions, determine the minimum and maximum values for all columns with numeric or integer data types in the Pokemon_dataset.
CP: min 10, max 920
HP: min 4.78, max 122
Weight_kg: min 0.09, max 98.42
Height_m: NA for both min and max
23. What happens when you use these functions on a column with a non-numeric data type, such as Pokemon_dataset$Species?
Max: Zigzagoon
Min: Abra

The max() and min() functions also work with columns that contain non-numeric data types. When the data type is non-numeric, max() and min() first sort the values in the column alphabetically.
max() then outputs the last value of the sorted column, and min() outputs the first. In this scenario, these functions do not tell you much about the data. It might be more useful to know the number of unique values in the Species column. The unique() function looks through a column and outputs each distinct value that occurs in it.

24. Use the unique() function to determine the different species in the Pokemon_dataset object.
There are 179 unique species.
  [1] "Bulbasaur"   "Lunatone"    "Solrock"
  [4] "Aron"        "Pikipek"     "Charmander"
  [7] "Yungoos"     "Horsea"      "Pidove"
 [10] "Girafarig"   "Lillipup"    "Magnemite"
 [13] "Anorith"     "Bidoof"      "Wimpod"
 [16] "Eevee"       "Magikarp"    "Skitty"
 [19] "Maril"       "Bronzor"     "Bellsprout"
 [22] "Snubbull"    "Starly"      "Oddish"
 [25] "Seviper"     "Hoothoot"    "Lotad"
 [28] "Makuhita"    "Zigzagoon"   "Yanma"
 [31] "Buneary"     "Barboach"    "Bouffalant"
 [34] "Seaking"     "Sceptile"    "Petilil"
 [37] "Froakie"     "Natu"        "Tauros"
 [40] "Glameow"     "Porygon"     "Tentacool"
 [43] "Panpour"     "Hippopotas"  "Bunnelby"
 [46] "Chespin"     "Electabuzz"  "Torchic"
 [49] "Squirtle"    "Stufful"     "Aipom"
 [52] "Staravia"    "Goldeen"     "Krabby"
 [55] "Pinsir"      "Sudowoodo"   "Meditite"
 [58] "Kricketot"   "Hitmontop"   "Woobat"
 [61] "Karrablast"  "Ledyba"      "Rhyhorn"
 [64] "Drifloon"    "Fletchling"  "Wailmer"
 [67] "Koffing"     "Combee"      "Gothita"
 [70] "Plusle"      "Minccino"    "Ledian"
 [73] "Mudkip"      "Swablu"      "Pikachu"
 [76] "Skorupi"     "Grimer"      "Voltorb"
 [79] "Geodude"     "Castform"    "Gligar"
 [82] "Sandslash"   "Chinchou"    "Seedot"
 [85] "Growlithe"   "Dunsparce"   "Lileep"
 [88] "Numel"       "Ducklett"    "Sentret"
 [91] "Deerling"    "Croconaw"    "Sandshrew"
 [94] "Diglett"     "Wooper"      "Whismur"
 [97] "Trapinch"    "Drifblim"    "Throh"
[100] "Joltik"      "Stunfisk"    "Foongus"
[103] "Electrike"   "Burmy"       "Shelmet"
[106] "Nidoran"     "Machop"      "Grovyle"
[109] "Hoppip"      "Cubone"      "Miltank"
[112] "Tyrunt"      "Hitmonchan"  "Ampharos"
[115] "Tangela"     "Mareep"      "Totodile"
[118] "Sunkern"     "Chikorita"   "Spearow"
[121] "Cyndaquil"   "Popplio"     "Stantler"
[124] "Jigglypuff"  "Shinx"       "Sharpedo"
[127] "Vulpix"      "Wingull"     "Dweeble"
[130] "Munna"       "Abra"        "Drowzee"
[133] "Ralts"       "Elgyem"      "Turtwig"
[136] "Solosis"     "Kadabra"     "Spoink"
[139] "Slowpoke"    "Sewaddle"    "Kirlia"
[142] "Staryu"      "Cosmog"      "Ekans"
[145] "Weepinbell"  "Ditto"       "Stunky"
[148] "Fletchinder" "Linoone"     "Jolteon"
[151] "Flaaffy"     "Tympole"     "Archen"
[154] "Lanturn"     "Dedenne"     "Shiftry"
[157] "Larvitar"    "Inkay"       "Carvanha"
[160] "Croagunk"    "Dewpider"    "Sewadle"
[163] "Shroomish"   "Doduo"       "Fearow"
[166] "Phanpy"      "Smeargle"    "Drilbur"
[169] "Roggenrola " "Pidgeotto"   "Scyther"
[172] "Swinub"      "Helioptile"  "Kakuna"
[175] "Duskull"     "Seadra"      "Litwick"
[178] "Haunter"     "Tynamo"

Functions can also be nested within each other. You put one function directly inside another, so that the output of the inner function becomes the input of the outer function, all in one line of code. This is an easy way to run several functions at once. The drawback is that the code becomes harder to read when you revisit it later to work out what was done in your analysis.

Question
25. Use the length() function, which counts the number of elements in a vector, together with the unique() function to determine the number of species in this dataset.
179 unique species

This dataset is simple, and you could count the unique Pokémon without using the length() and unique() functions. That is not always the case, and nesting functions can be extremely useful. The table() function goes a step further than length() and unique(): it counts how many times each unique value occurs.

Question
26. Use the table() function to find the number of occurrences of the unique values in the Type_1 column. What do you see?
     Bug     Dark Electric    Fairy Fighting     Fire
      47        3       36        5       22       28
   Ghost    grass    Grass   Ground      Ice   Normal
       6        3       50       27        1      151
  Poison  Psychic     Rock    Steel    Water
      22      240       32        7      100
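The column-exploration steps from Questions 20 to 26 can be tried on a tiny stand-in data frame. This assumes R 4.0 or later, where text columns stay character by default. The values are adapted from the head() output earlier in the lab. The NA height and the lowercase "rock" are deliberate, hypothetical additions that illustrate two behaviours:
- min() and max() return NA when a column contains a missing value, unless na.rm = TRUE is given.
- table() counts differently capitalized spellings separately.

```r
# Stand-in data adapted from the head() output; the NA and the lowercase
# "rock" are deliberate, hypothetical additions
pokemon <- data.frame(
  Species  = c("Bulbasaur", "Lunatone", "Solrock", "Aron", "Pikipek", "Solrock"),
  HP       = c(12, 68, 77, 54, 39, 82),
  Height_m = c(0.78, 0.80, 1.53, 0.35, 0.31, NA),
  Type_1   = c("Grass", "Rock", "Rock", "Steel", "Normal", "rock")
)

# $ isolates one column as a vector; typeof() reports its data type
typeof(pokemon$Species)  # "character"
typeof(pokemon$HP)       # "double"

# A single NA makes min()/max() return NA unless na.rm = TRUE
max(pokemon$Height_m)                # NA
max(pokemon$Height_m, na.rm = TRUE)  # 1.53

# On character columns, min()/max() compare values alphabetically
min(pokemon$Species)  # "Aron"
max(pokemon$Species)  # "Solrock"

# unique() lists distinct values; nesting it in length() counts them
unique(pokemon$Species)
length(unique(pokemon$Species))  # 5

# table() counts each distinct value; "Rock" and "rock" are counted separately
table(pokemon$Type_1)
```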
There appear to be two values for the same grass-type category. R is case sensitive, so it treats “grass” and “Grass” as two different values. This is an important concept to grasp: R cannot read text like a human, and capital and lowercase letters are not interchangeable. Likewise, R treats “grass”, “ grass”, and “grass ” as three different values. The last two have a space before or after the word, called a leading space and a trailing space, respectively.

In the real world, datasets are messy. Text values might be capitalized inconsistently, misspelled, or include leading and trailing spaces. Double values (numbers with decimal places) might have been recorded with differing numbers of significant figures, and so on. If these inconsistencies are not corrected, R cannot determine the correct number of unique values, and any analysis based on these categories will be wrong.

You can replace every occurrence of the “grass” value with “Grass” in the Type_1 column. The replace() function replaces values and takes three arguments:
- The first argument is a vector that contains the values targeted for replacement.
- The second argument specifies which values to replace (here, with a TRUE/FALSE condition).
- The last argument specifies the replacement value.
Grass is a categorical value and a text string, so it must be surrounded by double quotes. Otherwise R will treat it as an object name. Run the code below:
replace(Pokemon_dataset$Type_1, Pokemon_dataset$Type_1 == "grass", "Grass")

Questions
27. What happens?
In the vector printed to the Console, “grass” was replaced with “Grass”.
28. Open the Pokemon_dataset in the Environment tab. How is the file different?
It is not different: the object still contains “grass”, because the output of replace() was only printed and not stored.

In this line of code, you first give the replace() function the column to edit (Pokemon_dataset$Type_1). Next, you specify which values in the column to replace (Pokemon_dataset$Type_1 == "grass").
This condition tells R to replace the values where the Type_1 column of Pokemon_dataset equals the “grass” string. The final argument, “Grass”, tells R to use the “Grass” string as the replacement for those values.

The replace() function does not change the original dataset. It outputs the edited column as a vector in the Console. The changes are only kept if you store them in an object or a column. Here, we can store them back in the Type_1 column with the “<-” operator. R works on a copy of the data, and a change is only saved when you tell R to save it by assigning the result with <-. Run the code below.
Pokemon_dataset$Type_1 <- replace(Pokemon_dataset$Type_1, Pokemon_dataset$Type_1=="grass", "Grass")
Questions
29. What happened? The Type_1 column was overwritten with the updated “Grass” spelling (the assignment prints nothing to the console).
30. Open the Pokemon_dataset in the Environment tab. Is the file different? Yes, the data frame now reflects the change: “grass” has been replaced with “Grass”.
Now use the table() function to count the Pokémon types in the Type_1 column; the number of entries in the table is the number of unique types.
Questions
31. How many Pokémon types are there? 16
32. Is this number different than before? Yes. It was originally 17, but merging “grass” into “Grass” reduced the number to 16.
Now that you have changed a data frame object and saved the changes to the object, you may want to export the data. Data frame objects can be exported with the write.csv() function. This function usually takes three arguments: the name of the object, the name and path of the file you wish to save, and whether to save the file with row numbers (row.names). If you provide only a file name, R exports the file to the working directory. Run the code below to save the Pokémon data frame to your computer.
write.csv(Pokemon_dataset, "Pokemon_dataset_edited.csv", row.names=FALSE)
Question
33. What do you see, and where was the file saved? The newly created file, Pokemon_dataset_edited.csv, was saved in the working directory set earlier.
Locate the file on your computer and open it. The file should look similar to the original .csv file imported into R. File names are important, especially when you need to retrace your steps, so make sure that file names are concise and informative, like object names. It is also useful to include dates in file names. Tracking and managing the changes made to code or files during an analysis is known as version control. Creating objects and files with informative names is one way to track and manage changes.
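A minimal sketch of the same workflow on a hypothetical three-column data frame: table() counts each Type_1 value, the number of entries in that table is the number of types, and write.csv() followed by read.csv() round-trips the data. To avoid cluttering the working directory, the sketch writes to R's temporary folder (tempdir()) instead of the location used in the lab.

```r
# Hypothetical rows standing in for Pokemon_dataset
pokemon <- data.frame(
  Species = c("Bulbasaur", "Oddish", "Charmander", "Squirtle", "Oddish"),
  Type_1  = c("Grass", "Grass", "Fire", "Water", "Grass"),
  HP      = c(45, 45, 39, 44, 45)
)

# table() counts the rows for each Type_1 value
type_counts <- table(pokemon$Type_1)
type_counts
length(type_counts)   # 3 types

# Write without row numbers, then read the file back
out_file <- file.path(tempdir(), "Pokemon_dataset_edited.csv")
write.csv(pokemon, out_file, row.names = FALSE)
reloaded <- read.csv(out_file)

# Same shape and column names as the data frame that was written
dim(reloaded)         # 5 3
names(reloaded)       # "Species" "Type_1" "HP"
```

With row.names = FALSE the file has no extra unnamed first column, which is why the reloaded data frame has exactly three columns.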
Good names can tell the user what an object or file contains, when it was last edited, and how it differs from other versions. In coding, it is preferable to keep multiple versions of a file or object rather than one version that is constantly overwritten. Most importantly, version control lets the user go back to an older version of a file or object if something goes wrong.
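One way to put dates in file names, sketched with a hypothetical helper, make_versioned_name(), that is not part of the lab or of any package. It assumes ISO-style dates (YYYY-MM-DD), which have the side benefit of sorting chronologically when files are listed by name.

```r
# Hypothetical helper: builds "<base>_<YYYY-MM-DD>.<ext>"
make_versioned_name <- function(base, date = Sys.Date(), ext = "csv") {
  # format() with "%Y-%m-%d" gives ISO dates, which sort correctly by name
  paste0(base, "_", format(date, "%Y-%m-%d"), ".", ext)
}

make_versioned_name("Pokemon_dataset_edited", as.Date("2024-04-03"))
# "Pokemon_dataset_edited_2024-04-03.csv"

# With the default date, the name uses today's date, e.g.:
# write.csv(Pokemon_dataset, make_versioned_name("Pokemon_dataset_edited"), row.names = FALSE)
```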
c. Basic plotting
R has basic plotting functions that visualize data in various ways. Visual exploration of data helps identify patterns within it. You previously counted the number of unique values in specific columns. You can display a numeric column graphically using a histogram. A histogram shows the frequency distribution of a numerical variable whose values are grouped into ranges called bins. The x-axis of a histogram represents the variable of interest, and the y-axis represents the number of observations that fall in each bin.
Use the plotting function hist() to create a histogram for the numeric column HP. Hint: use the hist() function on the column HP in the object Pokemon_dataset. R should plot this histogram in the Plots tab of the bottom right panel.
Questions
34. Is the plot of HP normally distributed? Most Pokémon have between 10 and 70 HP.
35. What is the most common HP range and how many individuals have values in this range? The most common range is 10 to 20 HP.
Create a histogram for the CP column.
Question
36. What is the most common CP range and how many individuals have values in this range? The most common CP range is 100 to 400.
Once a plot is shown in the bottom right panel, it can be saved as an image by clicking the “Export” button in the panel's toolbar. Click Export, and then select “Save as Image” from the dropdown menu. In the window that appears, you can change the image file format, name, and dimensions, as well as the directory that will contain the exported image.
Activity & Question
37. Change the image format to PNG and the file name to “CP_histogram”, and then click the “Save” button in the bottom right of the window. The plot should be saved as “CP_histogram.png” in the current working directory. Navigate to that folder on your computer and open the image. Done.
38. Does this look like the same plot shown in RStudio? Yes.
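hist() can also return its bins without drawing, which makes a question like “what is the most common HP range?” answerable from code. The sketch below uses made-up HP values and fixed 10-unit breaks (both assumptions); note that hist() bins are right-closed by default, so a value of 20 would fall in the 10 to 20 bin. The png()/dev.off() pair is a scripted alternative to the Export button and is guarded in case the R build lacks PNG support.

```r
# Hypothetical HP values standing in for Pokemon_dataset$HP
hp <- c(12, 15, 18, 25, 33, 41, 55, 62, 70)

# plot = FALSE returns the bin edges and counts without drawing anything
h <- hist(hp, breaks = seq(0, 80, by = 10), plot = FALSE)
h$counts                      # individuals per bin

# The most common range is the bin with the largest count
top <- which.max(h$counts)
c(lower = h$breaks[top], upper = h$breaks[top + 1], n = h$counts[top])

# Save a drawn histogram as a PNG file in a temporary folder
if (capabilities("png")) {
  png(file.path(tempdir(), "HP_histogram.png"))
  hist(hp, breaks = seq(0, 80, by = 10), main = "HP", xlab = "HP")
  dev.off()
}
```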
III. Manipulate and plot data using the tidyverse
a. Downloading and loading packages
R has numerous preinstalled functions, but its utility can be greatly expanded by installing packages with RStudio. The tidyverse is a collection of packages that are frequently used in data science. Many free online guides explain how to use its functions, which makes it easy to learn. The packages within the tidyverse are designed to work together and provide many functions for data visualization, exploration, and analysis. However, tidyverse functions are written, and behave, somewhat differently from basic R. These differences are intended to make the code easier to read and understand.
Navigate to the bottom right panel in RStudio and click the Packages tab. In the top left corner of this panel, click the Install button. A new window will pop up showing (1) where packages are being installed from (CRAN) and (2) a field for the names of the packages to install. Type tidyverse into this field, make sure that the Install dependencies box is checked, and then click Install. Dependencies are other packages that the one you're installing relies on. This starts the installation of the tidyverse collection of packages, and its progress is shown in the console. When the “>” prompt reappears in the console, the installation has finished. If everything worked correctly, the console output should give the path to the downloaded tidyverse packages. R sometimes prints errors or warnings in red text in the console; don't worry about these unless the installation was unsuccessful. Unsuccessful installations tend to occur when R or RStudio is outdated.
Now that the package is installed, it must be loaded before it can be used. This can be done in the bottom right panel under the Packages tab, which lists the installed packages with their names, a short description, and their version. Use the search bar in the top right of the panel to locate the tidyverse package. Check the box to the left of the tidyverse name; this loads the tidyverse packages, and you should see this reflected in the console panel on the bottom left. Alternatively, you can load a package with the library() function, which takes the name of the package as its argument. You can unload packages with the detach() function.
b. Using the tidyverse to explore and manipulate data
The pipe operator %>% is one of the ways the tidyverse makes R code easier to read. This operator takes the output of the expression on its left and passes it as the first input to the function on its right. The benefit of the pipe is that several functions can be applied in sequence to the same object without the clutter of nested functions. Unlike nested functions, code written with the pipe also reads like English text, from left to right.
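The difference between nesting and piping can be seen with a few base R functions. This sketch attaches magrittr, the package that provides %>% (library(tidyverse) makes it available too); the numbers are arbitrary.

```r
library(magrittr)   # supplies %>%; library(tidyverse) also makes it available

values <- c(1, 4, 9, 16)

# Nested: read from the inside out (sum first, then sqrt, then round)
nested <- round(sqrt(sum(values)), 2)

# Piped: each result is passed as the first argument of the next function
piped <- values %>% sum() %>% sqrt() %>% round(2)

nested                     # 5.48
identical(nested, piped)   # TRUE
```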
This is the code for counting the number of Abra in the Pokemon dataset using basic R:
nrow(Pokemon_dataset[(Pokemon_dataset$Species=="Abra"),])
This is the code for counting the number of Abra in the dataset using the tidyverse:
Pokemon_dataset %>% filter(Species=="Abra") %>% count()
Some of these functions and conditions have not been introduced yet. The important difference between these lines is that the tidyverse version applies several functions in a simple, readable sequence, whereas the basic R version nests them inside one another. The nrow() function counts the number of rows in a data frame. Use this function to determine the number of rows in the Pokemon_dataset object. Then use the %>% pipe with the nrow() function to determine the number of rows in the Pokemon_dataset object.
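Both Abra-counting approaches can be compared on a tiny hypothetical data frame. The sketch assumes dplyr, the tidyverse package that provides filter() and count(), is installed. Note that the base R version returns a single number, while count() returns a one-row data frame whose count is in a column named n.

```r
library(dplyr)   # provides filter() and count(), and re-exports %>%

# Hypothetical rows standing in for Pokemon_dataset
pokemon <- data.frame(
  Species = c("Abra", "Natu", "Abra", "Spoink", "Natu"),
  HP      = c(25, 30, 22, 40, 35)
)

# Basic R: the row subset is nested inside nrow()
n_base <- nrow(pokemon[pokemon$Species == "Abra", ])

# tidyverse: the same steps, read left to right
n_tidy <- pokemon %>% filter(Species == "Abra") %>% count()

n_base   # 2, a single number
n_tidy   # a one-row data frame; the count is in column n

# nrow() gives the same answer with or without the pipe
nrow(pokemon) == (pokemon %>% nrow())   # TRUE
```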
Question
39. How do the outputs of these two functions differ, if at all? They give the same result; the only difference is that one uses the tidyverse pipe and the other does not.
The tidyverse can perform all the operations covered so far, as well as more complex versions of them, in easier-to-interpret code. To isolate a column, you can use the $ operator or the select() function from the tidyverse. When select() is run by itself, it takes a data frame object and selection criteria as arguments. When a data frame is passed to it with the pipe operator, the only argument required is the selection criteria. Here the selection criterion is the name of the column, which tells R to include only that column. Use the Pokemon_dataset, the pipe operator, and the select() function to isolate the HP column.
Question
40. What does this do? PD %>% select("HP") isolates the HP column of the Pokémon dataset (PD is the name used here for Pokemon_dataset).
One benefit of the select() function is that it can isolate several columns at once when the desired column names are given as arguments (selection criteria) separated by commas. This lets you create a dataset made up of the columns of interest in a single line of code. The main difference between the output of the $ operator and that of the select() function is the data structure: a column isolated with $ is a vector, whereas a column isolated with select() is a data frame. Therefore, the $ operator is convenient for working with a single column as a vector, and select() is the better choice for building a data frame from several columns. Use the Pokemon_dataset object and the select() function to create a data frame with the columns Species, HP, CP, Weather, and Borough.
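The data-structure difference between $ and select() can be checked with is.vector() and is.data.frame(), both introduced earlier in the lab. The columns and values below are hypothetical, and dplyr is assumed to be installed. select() accepts column names with or without quotes, so select("HP") and select(HP) behave the same here.

```r
library(dplyr)

# Hypothetical columns standing in for Pokemon_dataset
pokemon <- data.frame(
  Species = c("Abra", "Natu", "Spoink"),
  HP      = c(25, 30, 40),
  CP      = c(101, 250, 87),
  Borough = c("Queens", "Manhattan", "Bronx")
)

hp_vector <- pokemon$HP                             # $ returns a vector
hp_frame  <- pokemon %>% select(HP)                 # select() returns a data frame
subset_df <- pokemon %>% select(Species, HP, CP)    # several columns at once

is.vector(hp_vector)     # TRUE
is.data.frame(hp_frame)  # TRUE
names(subset_df)         # "Species" "HP" "CP"
```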
Use the “<-” operator to store this data frame in an object called Pokemon_data_subset.
When counting the number of unique values in the Type_1 column of Pokemon_dataset, you used the table() function. The output of table() is a table object rather than a data frame, so it must be converted before it can be used with other data frame functions. The same counts can be obtained in the tidyverse with the count() function, which has the added benefit of working easily with the pipe operator and other data frame manipulations. Like select(), count() can be used without the pipe operator, in which case it takes a data frame object as its argument. It can also take additional arguments that determine which values to count, such as only the values from a certain column. When count() is used with the pipe operator, it accepts column names as arguments, like select(). However, adding multiple column names will
count unique combinations of the values among those columns instead of counting the unique values of each column separately. Use the Pokemon_data_subset dataset, the pipe operator, and the count() function to count the number of unique species in this dataset.
Question
41. How many individuals of Natu are in this dataset? Using the commands below, I got 25 Natu.
Pokemon_data_subset <- PD %>% select("Species","HP","CP","Weather","Borough")
Pokemon_data_subset %>% filter(Species=="Natu") %>% count()
Now use the Pokemon_data_subset dataset, the pipe operator, and the count() function to count the number of Natu collected in Manhattan.
Pokemon_data_subset %>% filter(Species=="Natu") %>% filter(Borough=="Manhattan") %>% count()
I got 15.
Questions
42. How many individuals of Spoink were collected in Queens? I got 3 Spoink.
43. How is the output for Spoink different from the output for Natu? In Queens there are 3 Spoink and 4 Natu, a difference of 1.
You probably noticed when counting the unique boroughs that two values in the output represented the borough of Queens: “Queens” and “queens”. R treated these as unique values because it cannot know that “Queens” and “queens” refer to the same borough when the Q is capitalized differently. This resulted in an incorrect count for any combination of species and borough that involved these values. Previously, you fixed a similar issue by editing the “grass” value in the Type_1 column of Pokemon_dataset. The replace() function can replace the values, but its changes must be saved to the object before the edited dataset can be used again. The mutate() function from the tidyverse writes the output of replace() back into a column of the data frame, so the edit can be combined with other functions in the same line of code using the pipe operator.
Without the mutate() function, you would need one line of code to replace and save values with replace() and another line to run additional functions. The inputs to mutate() are the name of the column that will be changed, set equal to the replace() function with its required arguments. Run the code below to replace the “queens” value in the Borough column with the “Queens” value. Notice that the output (an edited data frame object) is saved to a new object named Pokemon_edited_Borough.
Pokemon_edited_Borough <- Pokemon_data_subset %>% mutate(Borough = replace(Borough, Borough == "queens", "Queens"))
Count the number of individuals per Borough in the Pokemon_dataset object, and then count the number of individuals per Borough in the Pokemon_edited_Borough object.
Question
44. How have the counts changed? The edited dataset has 142 individuals in Queens, while the unedited one had 112 under “Queens”.
Pokemon_data_subset %>% filter(Borough=="Queens") %>% count()
Working with large datasets often requires creating a data subset that includes specific columns or values for analysis. The select() function selects a subset of columns to retain. To keep only certain rows, use the filter() function, which subsets the rows of a data frame based on the conditions provided as arguments. When used on its own, filter() takes a data frame and a condition as arguments. When a dataset is passed to it with the pipe operator, only the condition for filtering rows is required.
Many operators are useful for creating expressions. An expression is a combination of operators, variables, and objects that sets up a condition for R to evaluate. Commonly used operators for creating expressions are listed under “Useful filter functions” in the Help page of the filter() function. One operator already used as a condition in this lab is “==”, written with two equal signs. It has two equal signs because a single “=” is used to assign values to arguments within a function call (and, like “<-”, can also assign objects). The “==” operator compares two values to determine whether they are the same. It was used to set up the condition for the replace() function in the code above.
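A small sketch of the mutate() and replace() pattern on hypothetical rows, assuming dplyr is installed. Counting before and after the edit shows “Queens” and “queens” merging into one group, and the final filter() uses == to keep only Spoink.

```r
library(dplyr)

# Hypothetical rows standing in for Pokemon_data_subset
pokemon <- data.frame(
  Species = c("Spoink", "Spoink", "Natu", "Spoink"),
  Borough = c("Queens", "queens", "Queens", "Bronx")
)

# Before the fix, "Queens" and "queens" are separate groups
pokemon %>% count(Borough)

# mutate() overwrites Borough with replace()'s output; <- keeps the result
pokemon_fixed <- pokemon %>%
  mutate(Borough = replace(Borough, Borough == "queens", "Queens"))

# After the fix there is a single Queens group
pokemon_fixed %>% count(Borough)

# filter() keeps only the rows where the condition is TRUE
spoink <- pokemon_fixed %>% filter(Species == "Spoink")
nrow(spoink)   # 3
```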
The part of the replace() function that sets up the condition for the replacement (Borough == "queens") tells R that when a value in the Borough column is equal to the string “queens”, it should be replaced.
Use the Pokemon_edited_Borough dataset, the pipe operator, and the filter() function to filter the dataset so that it only includes Spoink. The condition should tell R to keep only the rows in which the value of the Species column is equal to the string “Spoink”. Save this data frame as an object named Spoink. Use the Spoink dataset, the pipe operator, and the count() function to determine how many Spoink are from each borough.
Question
45. How is this output different from the previous count using Pokemon_data_subset? The overall count is the same, but Queens has 8 in total in the edited data instead of separate counts for “Queens” and “queens” in the unedited dataset.
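To see how count() differs with one versus several column names, the hypothetical rows below (dplyr assumed installed) include both “Queens” and “queens”. Counting by Species alone gives one row per species; counting by Species and Borough gives one row per combination, so the two spellings of Queens show up as separate rows. The order of “Queens” and “queens” in the printed output may depend on locale settings.

```r
library(dplyr)

# Hypothetical rows standing in for Pokemon_data_subset
pokemon <- data.frame(
  Species = c("Natu", "Natu", "Spoink", "Natu", "Spoink"),
  Borough = c("Manhattan", "Queens", "Queens", "Manhattan", "queens")
)

# One column name: one row per species, with the count in column n
by_species <- pokemon %>% count(Species)
by_species

# Two column names: one row per Species-Borough combination
by_combo <- pokemon %>% count(Species, Borough)
by_combo

# table() gives the same per-species counts, but as a table object
table(pokemon$Species)
```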
Pokemon_edited_Borough %>% filter(Borough=="Bronx") %>% filter(Species=="Spoink") %>% count()
spoink <- Pokemon_edited_Borough %>% filter(Species=="Spoink")
spoink %>% count(Borough)
Spoink count
To use multiple conditions within the same filter() function, separate each condition with a comma. This keeps only the rows that meet all of the listed conditions, which is the logical equivalent of and. The same and logic can be written with the “&” symbol. You can also use the symbol “|” when filtering data; this is the logical equivalent of or. Other useful operators for creating conditions are based on simple mathematical comparisons, such as “>”, “<”, “>=”, and “<=”. These operators compare values with the same logic as in mathematics, and each comparison returns TRUE or FALSE depending on whether the condition being tested is true.
Examples: Running each of the following lines of code outputs TRUE or FALSE depending on whether the condition being tested is true.
1>0 results in TRUE because 1 is greater than 0.
1<0 results in FALSE because 1 is not less than 0.
5>=1 results in TRUE because 5 is greater than or equal to 1.
5<=1 results in FALSE because 5 is not less than or equal to 1.
Operators can also be used to compare object values. This does not change the result of the comparison, because R uses the value stored in the object when testing the condition.
Examples: There is no difference between testing a condition against the value 3 and against an object assigned the value 3.
1>3 results in FALSE because 3 is greater than 1.
Random_value<-3 creates an object with the value 3 to test the condition.
1>Random_value results in FALSE because Random_value is 3, and 3 is greater than 1.
Operators can also be used to establish logical conditions, or rules that must be followed.
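The comparison and logical operators above can be tried directly in the console. This sketch reproduces a few of the lab's examples, shows that comparisons work element by element on a vector, and checks on a hypothetical three-row Spoink table (dplyr assumed installed) that a comma inside filter() gives the same rows as &, while | keeps more rows.

```r
library(dplyr)

# Comparisons return TRUE or FALSE
1 > 0            # TRUE
5 <= 1           # FALSE

# On a vector, the comparison is made element by element
cp <- c(5, 12, 30)
cp > 10          # FALSE TRUE TRUE

# An object is compared using the value stored in it
Random_value <- 3
1 > Random_value # FALSE

# & is TRUE only if both sides are TRUE; | is TRUE if at least one is
TRUE & FALSE     # FALSE
TRUE | FALSE     # TRUE

# Hypothetical Spoink rows
spoink <- data.frame(
  Borough = c("Bronx", "Bronx", "Queens"),
  CP      = c(8, 15, 20)
)

# A comma between filter() conditions means the same as &
with_comma <- spoink %>% filter(Borough == "Bronx", CP > 10)
with_and   <- spoink %>% filter(Borough == "Bronx" & CP > 10)
# | keeps rows meeting either condition, so it keeps at least as many rows
with_or    <- spoink %>% filter(Borough == "Bronx" | CP > 10)

nrow(with_and)   # 1
nrow(with_or)    # 3
```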
The “|” and “&” operators can be used to create “or” and “and” conditions.
Examples: Running the following line of code filters the Pokemon_edited_Borough dataset to include rows where the Species is Spoink OR the Weather is Rainy.
Pokemon_edited_Borough %>% filter(Species=="Spoink" | Weather=="Rainy")
This means that any row with a Species of Spoink is included, and any row with Rainy weather is included. Your subset can therefore include Spoink caught in weather other than Rainy, as well as other species caught in Rainy weather. The “|” operator thus creates a condition that keeps all rows with Spoink and all rows with Rainy weather.
Running the following line of code filters the Pokemon_edited_Borough dataset to include rows where the Species is Spoink AND the Weather is Rainy.
Pokemon_edited_Borough %>% filter(Species=="Spoink" & Weather=="Rainy")
This means that a row with a Species of Spoink is only included when that row also has Rainy weather, so your subset includes only Spoink caught in the rain. Spoink caught in other weather conditions are excluded, and so are other species caught in Rainy weather. The “&” operator thus creates a condition that keeps only the rows with Spoink caught in Rainy weather.
Problems
46. Use the Spoink object, the pipe operator, and the count() function to determine how many Spoink were collected in the Bronx that have a CP value greater than 10. Only one Spoink from the Bronx has a CP above 10.
47. Use the Spoink dataset, the pipe operator, and the count() function to determine how many Spoink are from the Bronx or have a CP value greater than 10. I got 33.
48. Which dataset condition has the most individuals? Why? The OR condition has the most individuals, because its criteria are more inclusive than the AND condition used in #46.
spoink %>% filter(Borough=="Bronx") %>% count()
Abra <- Pokemon_edited_Borough %>% filter(Species=="Abra") %>% filter(Borough=="Manhattan" | Borough=="Queens") %>% count()
c. Plotting with ggplot2
The tidyverse includes a package that can create plots that are more elaborate than those in basic R.
This is done with the package ggplot2, which provides many layers of customization. A great benefit of ggplot2 is the wide range of documentation available to help with creating plots. The R Graph Gallery (found here) is one of these resources; it contains many examples of plots created with ggplot2, along with the code used to generate them. All plots created with the ggplot2 package use the ggplot() function, and, as with other tidyverse functions, you can feed objects directly to it using the pipe operator. The arguments that ggplot() takes depend on the type of plot, but they usually include the names of the columns in the dataset that represent the x and y variables. ggplot2 uses the “+” symbol in a role similar to the pipe. In the tidyverse you can use the pipe operator to string together multiple functions to create
an output; in ggplot2 you use the “+” symbol to string together multiple plotting features with the ggplot() function to create a plot. Every plot created with ggplot2 is composed of at least three parts. The first is the dataset used for plotting. The second is the ggplot() function with arguments that specify at least one variable (x or y). The third is a geom added to the ggplot() function, which specifies how the provided data should be drawn. Previously you used the hist() function to create a histogram, and you gave the values to plot (in this case the column HP) as the argument. Using the code below, you will create a histogram of the HP values of the Spoink object using the ggplot() function.
Spoink %>% ggplot(aes(x=HP)) + geom_histogram() + theme_bw()
Notice how you are feeding the Spoink object into the ggplot() function. Within ggplot(), you specify that the x values of the plot should come from the HP column. The aes() function inside ggplot() defines the aesthetic mappings of the plot, which tell ggplot() which parts of the dataset should be used to create the plot. The + then connects the geom_histogram() geom to the ggplot() function, which tells ggplot() to create a histogram. The + symbol can be used to connect multiple features to the ggplot() function. The code theme_bw() applies one of ggplot2's built-in themes, a clean black-and-white style. Countless modifications can be made to a plot: changes to titles, colors, font size, text, and much more! You can explore the different geoms and themes available under the help section of the ggplot() function.
Final Exercise: Creating, manipulating, and plotting a dataset using the tidyverse
Now is the time to apply what you have learned throughout the lab by answering the following questions and performing the following tasks.
1. First, create a subset of the Pokemon_edited_Borough object called Abra with the following features: Species: Abra; Borough: Manhattan or Queens; CP: less than 100; Weather: any; HP: any value. Export this dataset as a .csv file named Abra.csv to your working directory.
2. Answer the following questions using the Abra dataset:
a. Which columns include numeric values? HP and CP only.
b. Count the number of individuals per value of the Weather column. Three cloudy, one partly cloudy, eight rainy, and two windy.
c. How many Abra were collected during Rainy weather in Manhattan? Six Abra were collected under these conditions.
d. How many Abra were collected in Manhattan or during Rainy weather? Twelve Abra were collected under these conditions.
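For task 3, the hist() plus Export approach from earlier works; as an alternative, the sketch below builds the same kind of histogram with ggplot2 and saves it with ggsave(). The HP values, the bin width of 10, and boundary = 0 (which makes bins start at multiples of 10) are assumptions for illustration, and the file goes to R's temporary folder rather than the working directory. ggplot_build() is used only to inspect the computed bin counts.

```r
library(ggplot2)

# Hypothetical HP values standing in for the Abra data frame
abra <- data.frame(HP = c(12, 15, 18, 25, 33, 41, 55, 62, 70))

# Dataset + aes() mapping + geom layer, joined with +; theme_bw() restyles the plot
# (abra %>% ggplot(aes(x = HP)) would build the same plot using the pipe)
p <- ggplot(abra, aes(x = HP)) +
  geom_histogram(binwidth = 10, boundary = 0) +
  theme_bw()

# Inspect the bins ggplot2 computed (one row per bar)
bins <- ggplot_build(p)$data[[1]]
bins[, c("xmin", "xmax", "count")]

# Save as PNG in a temporary folder; skipped if this R build lacks PNG support
if (capabilities("png")) {
  ggsave(file.path(tempdir(), "Abra.png"), plot = p, width = 5, height = 4, dpi = 100)
}
```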
3. Using the Abra dataset, create a histogram of the HP column. Export the plot as a PNG file named Abra.png to your working directory.
Submit the files and the answers to questions 1-3 to your teaching assistant at the end of the lab.
# Set the working directory to the lab's data folder
setwd("C:\\Users\\p\\Desktop\\Lab_11\\data")
# Objects, data structures, and data types
one <- 1
one
is.data.frame(one)
is.vector(one)
is.list(one)
dataframe_one_result <- is.data.frame(one)
typeof(one)
typeof(dataframe_one_result)
?typeof
# Import the Pokémon data (PD plays the role of Pokemon_dataset)
getwd()
PD <- read.csv("Pokemon_dataset.csv")
head(PD)
PD
tail(PD)
# Column types and ranges
pdhp <- PD$HP
pds <- PD$Species
pdh <- PD$Height_m
typeof(pdhp)
typeof(pds)
typeof(pdh)
min(PD$Height_m)
max(PD$Height_m)
typeof(PD$Weight_kg)
# Number of unique species
upd <- unique(pds)
length(upd)
# Count types, then merge "grass" into "Grass"
table(PD$Type_1)
replace(PD$Type_1, PD$Type_1=="grass", "Grass")               # prints only; PD is unchanged
PD$Type_1 <- replace(PD$Type_1, PD$Type_1=="grass", "Grass")  # saves the change
PD$Type_1
length(unique(PD$Type_1))
table(PD$Type_1)                                              # counts per type after the fix
# Export and plot
write.csv(PD, "Pokemon_dataset_edited.csv", row.names=FALSE)
hist(pdhp)
hist(PD$CP)
# Load the tidyverse first; %>%, filter(), count(), select(), and mutate() need it
library(tidyverse)
nrow(PD)
PD %>% filter(Species=="Abra") %>% count()
Pokemon_data_subset <- PD %>% select("Species","HP","CP","Weather","Borough")
Pokemon_data_subset %>% filter(Species=="Spoink") %>% filter(Borough=="Queens") %>% count()
Pokemon_data_subset %>% filter(Species=="Natu") %>% filter(Borough=="Queens") %>% count()
# Merge "queens" into "Queens"
Pokemon_edited_Borough <- Pokemon_data_subset %>% mutate(Borough = replace(Borough, Borough == "queens", "Queens"))
# Spoink per borough (one count() call instead of one filter per borough)
Pokemon_edited_Borough %>% filter(Species=="Spoink") %>% count(Borough)
spoink <- Pokemon_edited_Borough %>% filter(Species=="Spoink")
# Bronx AND CP > 10 (the two lines are equivalent)
spoink %>% filter(Borough=="Bronx" & CP>10) %>% count()
spoink %>% filter(Borough=="Bronx") %>% filter(CP>10) %>% count()
spoink %>% filter(Borough=="Bronx") %>% count()
# CP > 10 in every borough
spoink %>% filter(CP>10) %>% count(Borough)
# Bronx OR CP > 10
spoink %>% filter(Borough=="Bronx" | CP>10) %>% count()
# Final exercise: Abra in Manhattan or Queens with CP below 100
Abra <- Pokemon_edited_Borough %>% filter(Species=="Abra") %>% filter(Borough=="Manhattan" | Borough=="Queens") %>% filter(CP<100)
write.csv(Abra, "Bio228Abra.csv", row.names=FALSE)
Abra %>% filter(Borough=="Manhattan" | Weather=="Rainy") %>% count()
Abra %>% filter(Borough=="Manhattan" & Weather=="Rainy") %>% count()
hist(Abra$HP)
Glossary
Terms
Integrated development environment (IDE): Software that helps programmers develop code by providing utilities for saving, editing, and debugging code.
Graphical user interface (GUI): An interface that allows users to perform actions through graphical icons instead of text-based commands.
Directory: A particular location on a device where files are stored. The root directory is the top-level directory of the file system, under which the operating system and all other files are
stored. The working directory is a user-defined location on the computer where a program like R reads input files and writes output files.
Path: A set of directions that tells the operating system or software where on a computer a file is located or where a file should be stored.
Script: A document containing a set of coding instructions in a particular programming language.
Prompt: In R, the prompt is represented by the “>” symbol. This symbol indicates that the console is ready to receive commands.
Operators: Characters or symbols that represent specific mathematical or logical actions that can be performed with objects.
Object: A named variable that stores something (for example, a dataset) in the R environment for later use in coding or analyses.
Data structure: A specific way of organizing data within an object. Three commonly used structures are vectors, lists, and data frames.
Data type: The specific way a value is stored within an object, which depends on the value. Three commonly used types are numeric, character, and logical.
Function: A piece of code that performs a task and can be used repeatedly. Functions usually take one or more arguments as input, process them, and produce an output.
Argument: The information provided as an input to a function. This can be an object or a condition.
Package: A named collection of R functions, code, and sample data created by R users to perform specific tasks.
Operators
Basic R
The “<-” operator assigns a value to an object.
The “=” operator assigns a value to an argument within a function.
The “$” operator subsets part of a data object in R. For data frames, the “$” operator selects columns.
The “==” operator tests whether two values are equal.
The “>” and “<” operators test whether a value is greater/less than another value.
The “>=” and “<=” operators test whether a value is greater/less than or equal to another value.
The “&” operator takes two logical values and returns TRUE only when both are TRUE.
The “|” operator takes two logical values and returns TRUE when at least one of them is TRUE.
Tidyverse
The “%>%” pipe operator takes the output of one function and passes it as the input to another function.
ggplot2
The “+” operator adds plotting layers to a plot created with the ggplot() function.
Functions
Basic R
is.data.frame() checks if an object is a data frame.
is.vector() checks if an object is a vector.
is.list() checks if an object is a list.
typeof() determines the data type of an object.
getwd() returns the file path of the current working directory.
setwd() changes the current working directory using a path.
read.csv() reads a file in table format and creates a data frame from it.
write.csv() saves a data frame object to a .csv file on the computer.
head() returns the first few elements of a vector or rows of a data frame (six by default).
tail() returns the last few elements of a vector or rows of a data frame (six by default).
max() returns the maximum of all the values present.
min() returns the minimum of all the values present.
unique() returns the distinct values present in a vector.
length() determines the length (number of elements) of a vector.
table() creates a contingency table of the counts of each distinct value.
replace() replaces values in an object at the positions given by its additional arguments.
hist() plots a histogram of the given data values.
library() loads add-on packages so they can be used.
detach() unloads add-on packages (the reverse of library()).
nrow() counts the number of rows in an object.
Tidyverse
select() subsets a data frame by column names.
count() counts the occurrences of each unique value of one or more variables.
mutate() adds new variables computed from existing ones while preserving the existing variables.
filter() subsets a data frame by retaining all rows that satisfy the conditions provided as arguments.
ggplot2
ggplot() creates a ggplot plotting object that can be combined with additional layers via the “+” operator to create a customized plot.
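Most of the Basic R functions listed above can be tried on a small data frame in one short script. This is a sketch that assumes R 4.0 or later (so text columns stay character rather than becoming factors). The data frame is a made-up stand-in for the lab's dataset, and `tempdir()` is used so that nothing is written into your own working directory; the file name `pokemon_example.csv` is hypothetical.

```r
# Made-up stand-in for the lab's dataset (assumes R >= 4.0, so text stays character)
pokemon <- data.frame(
  name   = c("Bulbasaur", "Charmander", "Squirtle", "Pikachu", "Raichu"),
  type   = c("Grass", "Fire", "Water", "Electric", "Electric"),
  attack = c(49, 52, 48, 55, 90)
)

# Checking structure and type
is.data.frame(pokemon)     # TRUE
is.vector(pokemon$type)    # TRUE
is.list(pokemon)           # TRUE: a data frame is a list of columns
typeof(pokemon$attack)     # "double"

# Looking at the data
head(pokemon, n = 2)       # first 2 rows
tail(pokemon, n = 2)       # last 2 rows
nrow(pokemon)              # 5

# Summarizing one column
max(pokemon$attack)            # 90
min(pokemon$attack)            # 48
unique(pokemon$type)           # "Grass" "Fire" "Water" "Electric"
length(unique(pokemon$type))   # 4
type_counts <- table(pokemon$type)   # Electric 2, Fire 1, Grass 1, Water 1

# replace(x, list, values): put new values at the positions selected by list
short_types <- replace(pokemon$type, pokemon$type == "Electric", "Elec")

# setwd() returns the previous working directory, so it can be restored
start_dir <- getwd()
old_dir   <- setwd(tempdir())
setwd(old_dir)

# Write to and read back from a .csv file in a temporary folder
csv_path <- file.path(tempdir(), "pokemon_example.csv")
write.csv(pokemon, csv_path, row.names = FALSE)
reloaded <- read.csv(csv_path)

# hist() draws a histogram; plot = FALSE returns the bin counts instead
h <- hist(pokemon$attack, plot = FALSE)
```

One detail worth noticing: read.csv() guesses column types when it reads the file back, so the whole-number `attack` column returns as integer rather than double. The values compare equal, but the reloaded object is not identical to the original.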
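The logical operators, the tidyverse verbs, and the ggplot2 “+” operator can be combined as in the sketch below. It assumes the dplyr and ggplot2 packages (both part of the tidyverse) are installed; each part is wrapped in `requireNamespace()` so the script still runs without them. The base R subsetting at the top shows “&” and “|” without any packages, and the data frame is again made up for illustration.

```r
# Made-up data frame for illustration (not the lab's dataset)
pokemon <- data.frame(
  name   = c("Bulbasaur", "Charmander", "Squirtle", "Pikachu", "Raichu"),
  type   = c("Grass", "Fire", "Water", "Electric", "Electric"),
  attack = c(49, 52, 48, 55, 90)
)

# Base R: & keeps rows where both conditions are TRUE, | where either one is
both   <- pokemon[pokemon$type == "Electric" & pokemon$attack > 60, ]   # Raichu
either <- pokemon[pokemon$type == "Fire" | pokemon$attack > 60, ]       # Charmander, Raichu

has_dplyr <- requireNamespace("dplyr", quietly = TRUE)
if (has_dplyr) {
  suppressPackageStartupMessages(library(dplyr))
  # %>% passes the result on its left as the first argument of the next call
  electric <- pokemon %>%
    filter(type == "Electric") %>%       # keep only matching rows
    select(name, attack) %>%             # keep only these columns
    mutate(attack_x2 = attack * 2)       # add a new column, keep the others
  type_counts <- pokemon %>% count(type) # one row per type, counts in column n
}

has_ggplot2 <- requireNamespace("ggplot2", quietly = TRUE)
if (has_ggplot2) {
  library(ggplot2)
  # + adds layers to the plot object; nothing is drawn until it is printed
  p <- ggplot(pokemon, aes(x = type, y = attack)) +
    geom_col() +
    labs(title = "Attack by type (illustrative data)")
}
```

Building `p` with “+” only assembles the layers; printing it (typing `p` at the prompt or calling `print(p)`) draws the plot.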