Assignment 3 - Jupyter Notebook
School: Western University
Course: 1000A
Subject: Statistics
Date: Jan 9, 2024
Pages: 7
Uploaded by SargentStingrayPerson1024
11/18/22, 10:04 PM
Assignment 3 - Jupyter Notebook
localhost:8888/notebooks/Documents/data science/Assignment 3.ipynb
1/7
Question 4
Labels for entire sample: 0.00001 - 0.14987
Labels in each subgroup:
Community Pharmacy: 0.00001 - 0.11323 (e.g., Community Pharmacist #587 has label 0.00587)
Hospital or Other Health Care Facility: 0.11324 - 0.13987 (e.g., the 1504th individual from this subgroup has label 0.12827)
Academia or Government: 0.13988 - 0.14320 (e.g., the 28th individual from this subgroup has label 0.14015)
Industry: 0.14321 - 0.14820 (e.g., the 462nd individual from this subgroup has label 0.14782)
Corporate, professional practice or clinic: 0.14821 - 0.14987 (e.g., the 18th individual from this subgroup has label 0.14838)
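The labelling rule above amounts to adding an individual's position within the subgroup to the total size of all earlier subgroups. Below is a minimal sketch of that rule. The helper name `label_for` and the dictionary `SUBGROUP_SIZES` are hypothetical, and the subgroup sizes are taken from the sampling cells later in this notebook.

```python
# Subgroup sizes in labelling order (values as used in the notebook's cells).
# Keys are spelled exactly as in the notebook's 'Practice' column.
SUBGROUP_SIZES = {
    'Community Pharmacy': 11323,
    'Hospital or Other Health Care Facility': 2664,
    'Academia or Governement': 333,
    'Industry': 500,
    'Corporate, professional practice or clinic': 167,
}


def label_for(subgroup, position):
    """Return the label of the position-th individual (1-based) in a subgroup."""
    size = SUBGROUP_SIZES[subgroup]
    if not 1 <= position <= size:
        raise ValueError(f"position must be between 1 and {size}")
    # Offset = number of individuals in all subgroups listed before this one.
    offset = 0
    for name, n in SUBGROUP_SIZES.items():
        if name == subgroup:
            break
        offset += n
    return f"0.{offset + position:05d}"


print(label_for('Community Pharmacy', 587))                        # 0.00587
print(label_for('Hospital or Other Health Care Facility', 1504))   # 0.12827
print(label_for('Industry', 462))                                  # 0.14782
```

Because the offsets are cumulative, the last label of each subgroup equals the running total of the sizes, which gives a quick check on the ranges listed above.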
In [1]:
import numpy as np
import pandas as pd
import random

In [2]:
# One row per individual; IDs run from 1 to 14987 in subgroup order.
df = pd.DataFrame({
    'ID': np.arange(1, 14988).tolist(),
    'Practice': ['Community Pharmacy'] * 11323 +
                ['Hospital or Other Health Care Facility'] * 2664 +
                ['Academia or Governement'] * 333 +
                ['Industry'] * 500 +
                ['Corporate, professional practice or clinic'] * 167
})
df

Out[2]:
          ID  Practice
0          1  Community Pharmacy
1          2  Community Pharmacy
2          3  Community Pharmacy
3          4  Community Pharmacy
4          5  Community Pharmacy
...      ...  ...
14982  14983  Corporate, professional practice or clinic
14983  14984  Corporate, professional practice or clinic
14984  14985  Corporate, professional practice or clinic
14985  14986  Corporate, professional practice or clinic
14986  14987  Corporate, professional practice or clinic
14987 rows × 2 columns
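As a cross-check on the construction above, the same frame can be built from a dictionary of subgroup sizes, which makes it easy to confirm that the sizes add up to 14987. This is only a sketch: the names `counts` and `df_check` are hypothetical, and the sizes are the ones used in the notebook.

```python
import numpy as np
import pandas as pd

# Subgroup sizes, in the same order as the notebook's 'Practice' column.
counts = {
    'Community Pharmacy': 11323,
    'Hospital or Other Health Care Facility': 2664,
    'Academia or Governement': 333,
    'Industry': 500,
    'Corporate, professional practice or clinic': 167,
}

# Repeat each subgroup name as many times as it has individuals.
practice = np.repeat(list(counts.keys()), list(counts.values()))

df_check = pd.DataFrame({
    'ID': np.arange(1, len(practice) + 1),
    'Practice': practice,
})

print(len(df_check))  # 14987
print(df_check['Practice'].value_counts())
```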
In [3]:
# Draw 10 of the 11323 Community Pharmacy rows without replacement.
df_Community_Pharmacy = df[df['Practice'] == 'Community Pharmacy']
df_Community_Pharmacy
random_rows = random.sample(range(11323), 10)
print(df_Community_Pharmacy.iloc[random_rows])

          ID  Practice
86        87  Community Pharmacy
10133  10134  Community Pharmacy
5450    5451  Community Pharmacy
10462  10463  Community Pharmacy
6031    6032  Community Pharmacy
9611    9612  Community Pharmacy
5735    5736  Community Pharmacy
2405    2406  Community Pharmacy
2002    2003  Community Pharmacy
7521    7522  Community Pharmacy

Selected individuals: 0.02470, 0.00794, 0.11061, 0.08303, 0.09996, 0.02463, 0.02456, 0.02390, 0.08724, 0.08562

In [5]:
# Draw 10 of the 2664 Hospital or Other Health Care Facility rows.
df_Hospital_other = df[df['Practice'] == 'Hospital or Other Health Care Facility']
df_Hospital_other
random_rows = random.sample(range(2664), 10)
print(df_Hospital_other.iloc[random_rows])

          ID  Practice
13049  13050  Hospital or Other Health Care Facility
12648  12649  Hospital or Other Health Care Facility
13069  13070  Hospital or Other Health Care Facility
11947  11948  Hospital or Other Health Care Facility
12507  12508  Hospital or Other Health Care Facility
11545  11546  Hospital or Other Health Care Facility
11361  11362  Hospital or Other Health Care Facility
11967  11968  Hospital or Other Health Care Facility
11657  11658  Hospital or Other Health Care Facility
12038  12039  Hospital or Other Health Care Facility

Selected individuals: 0.13137, 0.11745, 0.13922, 0.13778, 0.11919, 0.12236, 0.12093, 0.12128, 0.11833, 0.13877
In [7]:
# Draw 10 of the 333 Academia or Government rows.
df_Academia_governement = df[df['Practice'] == 'Academia or Governement']
df_Academia_governement
random_rows = random.sample(range(333), 10)
print(df_Academia_governement.iloc[random_rows])

          ID  Practice
14248  14249  Academia or Governement
14109  14110  Academia or Governement
14137  14138  Academia or Governement
14233  14234  Academia or Governement
14093  14094  Academia or Governement
14261  14262  Academia or Governement
14294  14295  Academia or Governement
14178  14179  Academia or Governement
14277  14278  Academia or Governement
14025  14026  Academia or Governement

Selected individuals: 0.14277, 0.14258, 0.14259, 0.14244, 0.14006, 0.14314, 0.14138, 0.14107, 0.14215, 0.14165

In [8]:
# Draw 10 of the 500 Industry rows.
df_Industry = df[df['Practice'] == 'Industry']
df_Industry
random_rows = random.sample(range(500), 10)
print(df_Industry.iloc[random_rows])

          ID  Practice
14626  14627  Industry
14392  14393  Industry
14552  14553  Industry
14413  14414  Industry
14420  14421  Industry
14775  14776  Industry
14598  14599  Industry
14540  14541  Industry
14725  14726  Industry
14365  14366  Industry

Selected individuals: 0.14350, 0.14551, 0.14472, 0.14465, 0.14369, 0.14424, 0.14432, 0.14401, 0.14715, 0.14779
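The per-subgroup cells in this question all repeat one pattern: filter to a subgroup, then draw 10 rows at random. The sketch below shows how the same stratified draw might be done in a single call with pandas' `groupby(...).sample(...)` and a fixed seed, so that the printed rows and the recorded labels stay in sync on re-runs. It is an alternative to the notebook's approach, not the approach itself. The names `counts`, `df_demo` and `stratified` are hypothetical, and the seed value 2022 is an arbitrary assumption.

```python
import numpy as np
import pandas as pd

# Rebuild the population so the example is self-contained
# (sizes and spellings as in the notebook).
counts = {
    'Community Pharmacy': 11323,
    'Hospital or Other Health Care Facility': 2664,
    'Academia or Governement': 333,
    'Industry': 500,
    'Corporate, professional practice or clinic': 167,
}
practice = np.repeat(list(counts.keys()), list(counts.values()))
df_demo = pd.DataFrame({'ID': np.arange(1, len(practice) + 1),
                        'Practice': practice})

# Draw 10 rows per subgroup without replacement; random_state is an
# arbitrary seed chosen only to make the draw reproducible.
stratified = (
    df_demo.groupby('Practice', sort=False)
           .sample(n=10, random_state=2022)
)

# Convert each selected ID to its five-digit label, e.g. 587 -> 0.00587.
stratified = stratified.assign(
    Label=stratified['ID'].map(lambda i: f"0.{int(i):05d}")
)

print(stratified)
```

Seeding matters here because an unseeded `random.sample` returns a different selection every time the cell is executed.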
In [10]:
# Draw 10 of the 167 Corporate, professional practice or clinic rows.
df_CPPC = df[df['Practice'] == 'Corporate, professional practice or clinic']
df_CPPC
random_rows = random.sample(range(167), 10)
print(df_CPPC.iloc[random_rows])

          ID  Practice
14872  14873  Corporate, professional practice or clinic
14954  14955  Corporate, professional practice or clinic
14837  14838  Corporate, professional practice or clinic
14875  14876  Corporate, professional practice or clinic
14820  14821  Corporate, professional practice or clinic
14905  14906  Corporate, professional practice or clinic
14925  14926  Corporate, professional practice or clinic
14927  14928  Corporate, professional practice or clinic
14897  14898  Corporate, professional practice or clinic
14953  14954  Corporate, professional practice or clinic

Selected individuals: 0.14861, 0.14907, 0.14935, 0.14831, 0.14936, 0.14962, 0.14924, 0.14872, 0.14837, 0.14847

All selected individuals
Community Pharmacy: 0.02470, 0.00794, 0.11061, 0.08303, 0.09996, 0.02463, 0.02456, 0.02390, 0.08724, 0.08562
Hospital or Other Health Care Facility: 0.13137, 0.11745, 0.13922, 0.13778, 0.11919, 0.12236, 0.12093, 0.12128, 0.11833, 0.13877
Academia or Government: 0.14277, 0.14258, 0.14259, 0.14244, 0.14006, 0.14314, 0.14138, 0.14107, 0.14215, 0.14165
Industry: 0.14350, 0.14551, 0.14472, 0.14465, 0.14369, 0.14424, 0.14432, 0.14401, 0.14715, 0.14779
Corporate, professional practice or clinic: 0.14861, 0.14907, 0.14935, 0.14831, 0.14936, 0.14962, 0.14924, 0.14872, 0.14837, 0.14847

Question 5
b)
In [11]:
df_Trees = pd.DataFrame({'ID': np.arange(1, 1411).tolist()})
df_Trees

Out[11]:
        ID
0        1
1        2
2        3
3        4
4        5
...    ...
1405  1406
1406  1407
1407  1408
1408  1409
1409  1410
1410 rows × 1 columns

In [12]:
def systematic_sampling(df, starting_index, step):
    # Take every step-th row position, beginning at starting_index.
    indices = np.arange(starting_index, len(df), step=step)
    systematic_sample = df.iloc[indices]
    return systematic_sample

In [13]:
random.seed(90)
random_start = random.randint(0, 78)
print(random_start)

26
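For comparison, one common convention for systematic sampling derives the step from the desired sample size, k = N // n, and draws the start uniformly from the first k row positions. The sketch below follows that convention; it is not necessarily the design this assignment intended. The function name `systematic_sample_by_size` and the choice n = 52 are assumptions, with n picked only because it yields a step of 27 for N = 1410.

```python
import random

import numpy as np
import pandas as pd


def systematic_sample_by_size(df, n, seed=None):
    """Take every k-th row, with k = len(df) // n and a random start in [0, k - 1]."""
    k = len(df) // n
    if k < 1:
        raise ValueError("n must not exceed the number of rows")
    rng = random.Random(seed)      # local generator; global state is untouched
    start = rng.randint(0, k - 1)  # randint includes both endpoints
    indices = np.arange(start, len(df), k)
    return df.iloc[indices], start, k


# Hypothetical usage on a frame shaped like df_Trees (IDs 1 to 1410).
trees = pd.DataFrame({'ID': np.arange(1, 1411)})
sample, start, k = systematic_sample_by_size(trees, n=52, seed=90)

print(k)            # 27, since 1410 // 52 == 27
print(start)        # some value between 0 and 26
print(len(sample))  # 52 or 53, depending on the start
```

Restricting the start to the first k positions gives every row the same chance of selection when N is a multiple of k; when it is not, as here, the sample size can vary by one.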
In [14]:
# The call is cut off in the export; step = 27 is inferred from the spacing of the IDs in the output.
systematic_sample = systematic_sampling(df = df_Trees, starting_index = random_start, step = 27)
systematic_sample

Out[14]:
        ID
26      27
53      54
80      81
107    108
134    135
161    162
188    189
215    216
242    243
269    270
296    297
323    324
350    351
377    378
404    405
431    432
458    459
485    486
512    513
539    540
566    567
593    594
620    621
647    648
674    675
701    702
728    729
755    756
782    783
809    810
836    837
863    864
890    891
917    918
944    945
971    972
998    999
1025  1026
1052  1053
1079  1080
1106  1107
1133  1134
1160  1161
1187  1188
1214  1215
1241  1242
1268  1269
1295  1296
1322  1323
1349  1350
1376  1377
1403  1404

In [ ]: