project3.pdf
School: University of California, Berkeley
Course: C8
Subject: Computer Science
Date: Jul 2, 2024
Pages: 12
Uploaded by CoachSheepPerson165
Question 1.2.2
Choose two different words in the dataset with a magnitude (absolute value) of correlation higher than 0.2 and plot a scatter plot with a line of best fit for them. Please do not pick “outer” and “space” or “san” and “francisco”. The code to plot the scatter plot and line of best fit is given for you; you just need to calculate the correct values for r, slope, and intercept.
Hint 1: It’s easier to think of words with a positive correlation, i.e. words that are often mentioned together. Try to think of common phrases or idioms.
Hint 2: Refer to Section 15.2 of the textbook for the formulas. For additional past examples of regression, see Homework 9.
In [62]:
word_x = 'blue'
word_y = 'moon'

# These arrays should make your code cleaner!
arr_x = movies.column(word_x)
arr_y = movies.column(word_y)

x_su = standard_units(arr_x)
y_su = standard_units(arr_y)

r = np.mean(x_su * y_su)
slope = r * np.std(arr_y) / np.std(arr_x)
intercept = np.mean(arr_y) - slope * np.mean(arr_x)

# DON'T CHANGE THESE LINES OF CODE
movies.scatter(word_x, word_y)
max_x = max(movies.column(word_x))
plots.title(f"Correlation: {r}, magnitude greater than .2: {abs(r) >= 0.2}")
plots.plot([0, max_x * 1.3], [intercept, intercept + slope * (max_x*1.3)], color='gold');
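The three quantities computed in the cell above can be checked on a small made-up dataset. The sketch below is only an illustration under stated assumptions: it uses Python's standard `statistics` module in place of NumPy (`fmean` and `pstdev` play the roles of `np.mean` and `np.std`, both using the population SD), the `standard_units` shown here is a hypothetical stand-in for the course helper of the same name, and `regression_line`, `toy_x`, and `toy_y` are invented for the example and do not come from the `movies` table.

```python
from statistics import fmean, pstdev


def standard_units(values):
    """Convert values to standard units: subtract the mean, divide by the SD."""
    mean, sd = fmean(values), pstdev(values)
    return [(v - mean) / sd for v in values]


def regression_line(xs, ys):
    """Return (r, slope, intercept) of the regression line predicting y from x."""
    # r is the mean of the products of the two variables in standard units.
    products = [a * b for a, b in zip(standard_units(xs), standard_units(ys))]
    r = fmean(products)
    # slope = r * SD(y) / SD(x); intercept = mean(y) - slope * mean(x)
    slope = r * pstdev(ys) / pstdev(xs)
    intercept = fmean(ys) - slope * fmean(xs)
    return r, slope, intercept


# Made-up data (not from the assignment) lying on the line y = 2x + 1.
toy_x = [0.0, 1.0, 2.0, 3.0, 4.0]
toy_y = [1.0, 3.0, 5.0, 7.0, 9.0]

r, slope, intercept = regression_line(toy_x, toy_y)
print(r, slope, intercept)
```

Because the toy points lie on a single straight line with positive slope, the correlation should come out as 1 (up to floating-point rounding) and the fitted line should recover the slope and intercept used to generate the data, which makes this a convenient sanity check before applying the same formulas to real word counts.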
Question 1.3.1
Draw a horizontal bar chart with two bars that show the proportion of Comedy movies in each dataset (train_movies and test_movies). The two bars should be labeled “Training” and “Test”. Complete the function comedy_proportion first; it should help you create the bar chart.
Hint: Refer to Section 7.1 of the textbook if you need a refresher on bar charts.
In [66]:
def comedy_proportion(table):
    # Return the proportion of movies in a table that have the comedy genre.
    movie_len = table.num_rows
    # The source line is cut off after ".i"; .item(0) is an assumed completion.
    movie_group = table.group('Genre').where('Genre', are.equal_to('comedy')).column('count').item(0)
    return movie_group / movie_len

# The staff solution took multiple lines. Start by creating a table.
# If you get stuck, think about what sort of table you need for barh to work
comedy_proportion_t = comedy_proportion(train_movies)
comedy_proportion_test = comedy_proportion(test_movies)
# The 'Proportions' values are cut off in the source; the array below is an assumed completion.
comedy_tbl = Table().with_columns(
    'Categories', make_array('Training', 'Test'),
    'Proportions', make_array(comedy_proportion_t, comedy_proportion_test))
comedy_tbl.barh('Categories')
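The proportion logic in the cell above can also be tried without the course's table library. The following is a minimal sketch in plain Python, not the staff solution: `genre_proportion`, `train_genres`, and `test_genres` are hypothetical names with made-up data, and the text bars only stand in for the `barh` chart.

```python
def genre_proportion(genres, target):
    """Return the proportion of entries in `genres` equal to `target`."""
    if not genres:
        # An empty list has no meaningful proportion; 0.0 is a choice made here.
        return 0.0
    matches = sum(1 for genre in genres if genre == target)
    return matches / len(genres)


# Made-up genre labels standing in for the 'Genre' column of each table.
train_genres = ['comedy', 'thriller', 'comedy', 'thriller']
test_genres = ['thriller', 'comedy', 'thriller', 'thriller']

# One (label, proportion) row per dataset, mirroring the two-bar chart.
rows = [
    ('Training', genre_proportion(train_genres, 'comedy')),
    ('Test', genre_proportion(test_genres, 'comedy')),
]

# Crude horizontal bars: one '#' per 5 percentage points.
for label, proportion in rows:
    print(f"{label:<8} {'#' * round(proportion * 20)} {proportion:.2f}")
```

Counting matches directly returns 0 when the genre is absent. A group-then-filter approach can instead produce an empty result in that case, so taking its first item would fail.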
Question 3.1.7
In two sentences or less, describe how you selected your features.
I selected these features based on the slope, looking at other words around the middle of the slope, which helps ensure that each word appears at least once in each movie.