hw03

School: Santa Barbara City College
Course: W54
Subject: Computer Science
Date: Oct 30, 2023
Type: pdf
Pages: 16
Uploaded by sarkarved

0.0.1 Question 1d

There are many ways we could choose to read tweets. Why might someone be interested in doing data analysis on tweets? Name a kind of person or institution that might be interested in this kind of analysis. Then, give two reasons why a data analysis of tweets might be interesting or useful for them. Answer in 2-3 sentences.

Politicians might analyze tweets to gauge public sentiment toward particular issues before an election. This lets them align themselves with the issues most likely to sway the public and pass their agendas more effectively. Tweet analysis can reveal which issues the public in their constituency cares about most, and what and whom the public supports. It can also reveal the effect that a speech or statement has on the public.

0.0.2 Question 2e

Given the plot above, what might we want to investigate during EDA? Name some possible questions you may have about the dataset in light of the information shown in the plot.

What are sources such as WhoSay and MobioNsider.com? We might also want to learn who tweets the most and when they tweet.

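The two follow-up questions (who tweets the most, and when) could be answered with simple frequency counts. The sketch below uses plain Python and a handful of invented `(user, hour)` records; the record layout and the values are assumptions for illustration, not the homework's actual data.

```python
from collections import Counter

# Hypothetical (user, hour-of-day) records; the values are invented.
records = [
    ("elonmusk", 2),
    ("elonmusk", 14),
    ("AOC", 14),
    ("elonmusk", 14),
    ("Cristiano", 9),
]

# Count tweets per user and per hour of day.
tweets_per_user = Counter(user for user, _ in records)
tweets_per_hour = Counter(hour for _, hour in records)

print(tweets_per_user.most_common(1))  # most active user
print(tweets_per_hour.most_common(1))  # busiest hour
```
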
0.0.3 Question 2f

We just looked at the top 5 most commonly used devices for each user. However, we used the number of tweets as a measure when it might be better to compare these distributions by comparing proportions of tweets (i.e., what percentage of all tweets for a user were published from each device). Why might the proportions of tweets be better measures than the number of tweets?

Proportions give better insight into which device each user relies on most. A high raw count for one device does not necessarily mean the user favors it, because they may have even more tweets from another device. Raw counts also make comparisons across users harder: someone like Elon Musk tweets far more than Cristiano, so the counts alone say little about either user's device preferences.

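A small numeric sketch may make the point concrete. The device names and counts below are invented, and the code uses plain Python dictionaries rather than the homework's data frames; it only illustrates dividing each count by the user's total.

```python
from collections import defaultdict

# Hypothetical tweet counts per (user, device); the numbers are invented.
counts = {
    ("elonmusk", "iPhone"): 9000,
    ("elonmusk", "Web App"): 1000,
    ("Cristiano", "iPhone"): 300,
    ("Cristiano", "Web App"): 700,
}

# Total tweets per user, used as the denominator.
totals = defaultdict(int)
for (user, _device), n in counts.items():
    totals[user] += n

# Share of each user's tweets sent from each device.
proportions = {key: n / totals[key[0]] for key, n in counts.items()}

for (user, device), share in proportions.items():
    print(f"{user:10s} {device:8s} {counts[(user, device)]:5d} {share:.0%}")
```

In this made-up data the first user has more web tweets in absolute terms (1000 versus 700), yet the web accounts for only a tenth of that user's tweets and for most of the second user's.
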
0.0.4 Question 3b

Compare Cristiano’s distribution with those of AOC and Elon Musk. In particular, compare the distributions before and after Hour 6. What differences did you notice? What might be a possible cause of that? Do the data plotted above seem reasonable? Hint: If you are not familiar with who Cristiano, AOC, and Elon Musk are, it may be helpful to Google information about these people, their occupations, and where they live.

Cristiano tweets increasingly after 6 am, while AOC’s and Musk’s tweets seem to decline from 6 am until about 11 am. This might be caused by the different time zones they live in. The data do not seem completely reasonable.

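To see how a time zone difference could shift the hourly distributions, here is a minimal sketch. It assumes the recorded hours are all in one reference zone (UTC here, which is an assumption) and uses hypothetical fixed offsets for two unnamed users; it does not claim to reflect where any of the three people actually live.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical fixed UTC offsets, chosen only for illustration.
# A real analysis should use proper time zones that handle daylight saving.
OFFSETS = {"user_a": timedelta(hours=-5), "user_b": timedelta(hours=3)}


def local_hour(utc_time: datetime, offset: timedelta) -> int:
    """Return the hour of day at the given UTC offset."""
    return utc_time.astimezone(timezone(offset)).hour


# One tweet recorded at 06:00 in the reference zone.
tweet_utc = datetime(2022, 1, 1, 6, 0, tzinfo=timezone.utc)

for user, offset in OFFSETS.items():
    print(user, local_hour(tweet_utc, offset))
```

The same recorded Hour 6 falls in the middle of the night for one user and mid-morning for the other, which is the kind of shift the answer attributes to time zones.
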
0.0.5 Question 4a

Using your own personal interpretation, please score the sentiment of one of the following words using the VADER scale (-4 means the word is extremely negative. +4 means the word is extremely positive). No code is required for this question!

  • order
  • dog
  • cat
  • technology
  • TikTok
  • security
  • science
  • climate change

What score did you give it and why? Can you describe a situation where this word would carry the opposite sentiment to the one you’ve just assigned? If not, explain why.

Science = +2. However, in the context of weapons, it would be -4.

0.0.6 Question 4g

In q4f above, we aggregated the polarity of the tweets by computing the mean sentiment score of tweets mentioning each user. What are some drawbacks of the decision to use the mean as an aggregation function? What other aggregation function(s) might be more appropriate than the mean?

The mean is highly sensitive to extreme values. A single tweet with a very high or very low sentiment score can significantly skew the average, giving a potentially misleading picture of the overall sentiment. A more robust alternative is the median. The mode is another option, though it suits continuous polarity scores less well.

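The sensitivity of the mean can be checked on a toy example. The polarity scores below are invented: four mild tweets and one extreme one.

```python
from statistics import mean, median

# Made-up polarity scores: mostly mild, plus one extremely negative tweet.
scores = [0.1, 0.2, 0.0, 0.1, -3.6]

print("mean:  ", mean(scores))
print("median:", median(scores))
```

The single outlier drags the mean to about -0.64, while the median stays at 0.1, in line with the typical tweet.
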
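0.0.7 Question 5a

Use this space to put your EDA code.

In [162]: # perform your text analysis here
          import matplotlib.pyplot as plt

          # Mean polarity of Elon Musk's tweets for each hour of the day.
          # The source line is cut off after 'me; 'mean' is assumed from the plot title.
          hourly_sentiment = tweets["elonmusk"].groupby(tweets["elonmusk"]['hour']).agg({'polarity': 'mean'})
          # The ylabel argument is also cut off in the source; the label text here is a placeholder.
          hourly_sentiment.plot(y='polarity', kind='line', title="Average Sentiment Score by Hour", ylabel="Polarity")
          plt.grid(True, which='both', linestyle='--', linewidth=0.5)
          plt.axhline(0, color='black', linewidth=0.5)
          plt.show()
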
0.0.8 Question 5b

Use this space to put your EDA description.

This analysis examines how the sentiment of tweets varies with the time of day. It could reveal the times of day at which each of the analyzed Twitter users tends to post tweets with more positive or more negative sentiment.

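The aggregation behind this description can be sketched without the homework's data. The example below uses plain Python and invented `(hour, polarity)` pairs, so the values and the row layout are assumptions; it only shows the group-by-hour-then-average step.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (hour, polarity) pairs for one user; the values are invented.
rows = [(9, 0.5), (9, -0.1), (14, 0.3), (14, 0.7), (22, -0.4)]

# Collect the polarity scores that fall in each hour.
by_hour = defaultdict(list)
for hour, polarity in rows:
    by_hour[hour].append(polarity)

# Average the scores within each hour, ordered by hour of day.
hourly_mean = {hour: mean(vals) for hour, vals in sorted(by_hour.items())}

for hour, avg in hourly_mean.items():
    print(f"{hour:02d}:00  {avg:+.2f}")
```

Hours with a positive average sit above the zero line in the plot, and hours with a negative average sit below it.
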