• compute_text_file_numeric_summary(textFile: character(n)) -> numeric(6): takes a text file as its input and outputs a numeric of length 6 with the following characteristics of the text file:
1. the number of lines (from the signature we know this is just n)
2. the number of blank lines (i.e. lines that contain nothing or only whitespace)
3. the number of lines that are comments (i.e. lines that start with "#")
4. the total number of characters in the text file
5. the median line length (i.e. the median number of characters per line)
6. the max line length (i.e. the max number of characters in a line)
• compute_text_file_word_counts(textFile: character(n)) -> data.frame(k x 2): takes a text file as its input and outputs a data frame with k rows and 2 columns, where k is the number of distinct "words". Here "words" include English words, variable names, function names, or any string that starts with a letter and contains only alphanumerics, periods, and underscores. The first column consists of the distinct words and the second column is the frequency with which each word appears in the text file. The columns should be named Word and Count, and the data frame should be sorted by frequency in descending order.
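The two specifications above can be sketched as follows. This is one possible reading rather than a reference solution, and several choices are assumptions the text does not settle: a comment line is taken to have "#" as its very first character, line lengths exclude the newline, words are matched case-sensitively at a word boundary, and ties in Count are broken alphabetically.

```r
# Sketch under the assumptions stated above; not an official solution.
compute_text_file_numeric_summary <- function(textFile) {
  if (!is.character(textFile)) {
    stop("textFile must be a character vector, not ", class(textFile)[1])
  }
  lineLengths <- nchar(textFile)  # characters per line, newline excluded
  # Note: an empty input makes median() return NA and max() return -Inf.
  as.numeric(c(
    length(textFile),                 # 1. number of lines (n)
    sum(grepl("^\\s*$", textFile)),   # 2. blank or whitespace-only lines
    sum(startsWith(textFile, "#")),   # 3. comment lines (assumed: "#" first)
    sum(lineLengths),                 # 4. total characters
    median(lineLengths),              # 5. median line length
    max(lineLengths)                  # 6. max line length
  ))
}

compute_text_file_word_counts <- function(textFile) {
  if (!is.character(textFile)) {
    stop("textFile must be a character vector, not ", class(textFile)[1])
  }
  # A "word": a letter followed by letters, digits, periods, or underscores.
  # The \\b boundary is an assumption; it keeps "2a" from yielding "a".
  pattern <- "\\b[A-Za-z][A-Za-z0-9._]*"
  words <- unlist(regmatches(textFile, gregexpr(pattern, textFile, perl = TRUE)))
  if (length(words) == 0) {
    return(data.frame(Word = character(0), Count = integer(0),
                      stringsAsFactors = FALSE))
  }
  counts <- table(words)
  result <- data.frame(Word = names(counts), Count = as.integer(counts),
                       stringsAsFactors = FALSE)
  # Sort by descending frequency; alphabetical tie-break is an assumption.
  result <- result[order(-result$Count, result$Word), ]
  rownames(result) <- NULL
  result
}

# Usage on a small in-memory "file"
exampleLines <- c("# a comment", "", "   ", "x <- foo(x, 1)", "foo_bar.baz <- x")
compute_text_file_numeric_summary(exampleLines)
compute_text_file_word_counts(exampleLines)
```

Whether indented comments or mixed-case duplicates (e.g. "Data" and "data") should be merged is not stated in the assignment, so those decisions would need to be checked against the practice unit tests.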

Analyzing an .R File
Your task is to produce functions that can be used to summarize the actual code within an R script. We will
work up to the following line of code:
compute_text_file_summary(read_text_file(textFilePath))
We have broken this "monolith" task down into a few "brick" tasks below. However, you may consider
breaking these down even further.
For the read_text_file() function you will need to use the base::readLines() function. readLines()
takes in a string that is the file path (either relative or absolute) to a text file (or R script) and returns a
character vector where each element corresponds to its respective line of text (or code).
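As a quick illustration of that behaviour, the snippet below writes three lines to a temporary file and reads them back with readLines(). The file contents and the variable names tmpPath and scriptLines are made up for the example.

```r
# Write three lines to a temporary file, then read them back.
tmpPath <- tempfile(fileext = ".R")
writeLines(c("# demo script", "", "x <- 1"), tmpPath)

scriptLines <- readLines(tmpPath)
is.character(scriptLines)  # TRUE: a character vector
length(scriptLines)        # 3: one element per line, blank line included
scriptLines[3]             # "x <- 1"

unlink(tmpPath)  # remove the temporary file
```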
Below is a list of functions for you to construct. For each, we have prescribed a very specific function signature.
When we grade the assignment, we will check to ensure your functions have this signature. Be sure to
incorporate type checks for inputs of your function as well. For instance, if a function's signature denotes a
variable to be numeric and a character is passed instead, your function should throw an error with a helpful
message. See the lecture notes from 2022-09-02 for more on type checking and the function signature syntax
used below.
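The lecture notes referenced above are not reproduced here, so the following is only one way such a type check might look. The helper assert_type() and the toy function square() are hypothetical names introduced for illustration; they are not part of the assignment.

```r
# Hypothetical helper: enforces a signature entry such as `x: numeric(1)`.
assert_type <- function(value, argName, typeCheck, typeName, len = NULL) {
  if (!typeCheck(value)) {
    stop(sprintf("`%s` must be %s, but got an object of class '%s'.",
                 argName, typeName, class(value)[1]), call. = FALSE)
  }
  if (!is.null(len) && length(value) != len) {
    stop(sprintf("`%s` must have length %d, but has length %d.",
                 argName, as.integer(len), length(value)), call. = FALSE)
  }
  invisible(TRUE)
}

# Toy function with the signature square(x: numeric(1)) -> numeric(1)
square <- function(x) {
  assert_type(x, "x", is.numeric, "numeric", len = 1L)
  x^2
}

square(3)      # 9
# square("3")  # would stop with: `x` must be numeric, but got an object of class 'character'.
```

Centralising the check in one helper keeps the error messages consistent across functions, though inline checks with is.character(), is.logical(), and stop() work just as well.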
Per usual, to grade your script, we will source() it and check its performance against a variety of unit
tests. The official unit tests won't be revealed until after the assignment is graded. However, we have
provided a smaller set of unit tests in the script, grade_hw1_practice.R, to give you an idea of what we
will be looking for. To use this script, you will need to download the folder, Hw1TextFiles_practice,
containing the example test files. Then, at the top of the script, assign testFilesFolderPath to wherever
the Hw1TextFiles_practice folder is located and hw1ScriptPath to wherever your hw1.R script is located.
Running the rest of the script and reviewing the printout will tell you how your script performed against the
practice unit tests.
Your Task
Create an R script entitled hw1.R that contains the following functions:
• read_text_file(textFilePath: character(1), withBlanks: logical(1) = TRUE, withComments:
logical(1) = TRUE) -> character(n): as its inputs, it takes an argument textFilePath that is
a single character and two optional parameters withBlanks and withComments that are both single
logicals defaulting to TRUE; textFilePath is the path to the text file (or R script); if withBlanks and withComments
are set to FALSE, then read_text_file() will return the text file without blank lines (i.e. lines that
contain nothing or only whitespace) and without commented lines (i.e. lines that start with "#") respectively;
it outputs a character vector of length n where each element corresponds to its respective line of
text/code.
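A minimal sketch of read_text_file() along these lines is shown below. It assumes a comment line is one whose first character is "#" (indented comments are kept), and it rejects NA for the two logical flags; neither detail is spelled out in the assignment.

```r
# Sketch only; comment detection and NA handling are assumptions.
read_text_file <- function(textFilePath, withBlanks = TRUE, withComments = TRUE) {
  if (!is.character(textFilePath) || length(textFilePath) != 1) {
    stop("textFilePath must be a single character string.")
  }
  if (!is.logical(withBlanks) || length(withBlanks) != 1 || is.na(withBlanks)) {
    stop("withBlanks must be a single TRUE or FALSE.")
  }
  if (!is.logical(withComments) || length(withComments) != 1 || is.na(withComments)) {
    stop("withComments must be a single TRUE or FALSE.")
  }

  # warn = FALSE silences the warning about a missing final newline.
  textFile <- readLines(textFilePath, warn = FALSE)

  if (!withBlanks) {
    textFile <- textFile[!grepl("^\\s*$", textFile)]   # drop blank lines
  }
  if (!withComments) {
    textFile <- textFile[!startsWith(textFile, "#")]   # drop comment lines
  }
  textFile
}

# Usage with a temporary file (contents made up for the example)
demoPath <- tempfile(fileext = ".R")
writeLines(c("# header", "", "x <- 1", "   ", "y <- 2  # trailing"), demoPath)
read_text_file(demoPath, withBlanks = FALSE, withComments = FALSE)
```

Lines with a trailing comment after code, such as the last line of the demo file, are kept because they do not start with "#".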
• compute_text_file_summary(textFile: character(n)) -> list(2): takes in a text file (a
character vector where each element is a line of text/code) as an input and returns a named
list of length 2 as the output; the output list will contain the outputs of the
compute_text_file_numeric_summary() and compute_text_file_word_counts() functions with
names NumericSummary and WordCounts respectively.
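The wrapper itself is small. The sketch below defines placeholder versions of the two helper functions only if they do not already exist, so that the block runs on its own; the placeholders are stand-ins for illustration and do not compute real results.

```r
# Placeholders (illustration only) so this block is runnable by itself.
if (!exists("compute_text_file_numeric_summary")) {
  compute_text_file_numeric_summary <- function(textFile) {
    c(length(textFile), rep(NA_real_, 5))  # stand-in, not a real summary
  }
}
if (!exists("compute_text_file_word_counts")) {
  compute_text_file_word_counts <- function(textFile) {
    data.frame(Word = character(0), Count = integer(0),
               stringsAsFactors = FALSE)    # stand-in, always empty
  }
}

compute_text_file_summary <- function(textFile) {
  if (!is.character(textFile)) {
    stop("textFile must be a character vector, not ", class(textFile)[1])
  }
  # Named list of length 2, with the names required by the assignment.
  list(
    NumericSummary = compute_text_file_numeric_summary(textFile),
    WordCounts     = compute_text_file_word_counts(textFile)
  )
}

# Usage on an in-memory character vector
summaryList <- compute_text_file_summary(c("x <- 1", "# note", ""))
names(summaryList)  # "NumericSummary" "WordCounts"
```

With real implementations of the two helpers in place, the target line compute_text_file_summary(read_text_file(textFilePath)) would chain the pieces together.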