Assignment 3 - Jupyter Notebook

School

Western University

Course

1000A

Subject

Statistics

Date

Jan 9, 2024

Uploaded by SargentStingrayPerson1024

11/18/22, 10:04 PM Assignment 3 - Jupyter Notebook localhost:8888/notebooks/Documents/data science/Assignment 3.ipynb 1/7

Question 4

Labels for entire sample: 0.00001 - 0.14987

Labels in each subgroup

Community Pharmacy: 0.00001 - 0.11323 (ex. Community Pharmacist #587 has label 0.00587)
Hospital or Other Health Care Facility: 0.11324 - 0.13987 (ex. the 1504th individual from this subgroup has label 0.12827)
Academia or Government: 0.13988 - 0.14320 (ex. the 28th individual from this subgroup has label 0.14015)
Industry: 0.14321 - 0.14820 (ex. the 462nd individual from this subgroup has label 0.14782)
Corporate, professional practice or clinic: 0.14821 - 0.14987 (ex. the 18th individual from this subgroup has label 0.14838)

In [1]:
    import numpy as np
    import pandas as pd
    import random

In [2]:
    # One row per individual; IDs run consecutively through the subgroups.
    df = pd.DataFrame({
        'ID': np.arange(1, 14988).tolist(),
        'Practice': ['Community Pharmacy'] * 11323
                    + ['Hospital or Other Health Care Facility'] * 2664
                    + ['Academia or Governement'] * 333
                    + ['Industry'] * 500
                    + ['Corporate, professional practice or clinic'] * 167
    })
    df

Out[2]:
              ID  Practice
    0          1  Community Pharmacy
    1          2  Community Pharmacy
    2          3  Community Pharmacy
    3          4  Community Pharmacy
    4          5  Community Pharmacy
    ...      ...  ...
    14982  14983  Corporate, professional practice or clinic
    14983  14984  Corporate, professional practice or clinic
    14984  14985  Corporate, professional practice or clinic
    14985  14986  Corporate, professional practice or clinic
    14986  14987  Corporate, professional practice or clinic

    14987 rows × 2 columns
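The label ranges in Question 4 follow from the subgroup sizes by a running total. The sketch below is one way to check them with the standard library only. The names `SIZES`, `label_ranges` and `label_for` are hypothetical helpers, not part of the notebook, and the sizes are copied from the DataFrame construction (with the subgroup name spelled as it appears in the notebook's data).

```python
from itertools import accumulate

# Subgroup sizes as used when the notebook builds its DataFrame.
SIZES = {
    "Community Pharmacy": 11323,
    "Hospital or Other Health Care Facility": 2664,
    "Academia or Governement": 333,  # spelling as in the notebook's data
    "Industry": 500,
    "Corporate, professional practice or clinic": 167,
}


def label_ranges(sizes):
    """Return {subgroup: (first_id, last_id)} for consecutively numbered subgroups."""
    ends = list(accumulate(sizes.values()))
    starts = [1] + [end + 1 for end in ends[:-1]]
    return {name: (s, e) for name, s, e in zip(sizes, starts, ends)}


def label_for(subgroup, k, sizes=SIZES):
    """Label of the k-th individual (1-based) in a subgroup, e.g. '0.00587'."""
    first, last = label_ranges(sizes)[subgroup]
    if not 1 <= k <= last - first + 1:
        raise ValueError(f"{subgroup} has no individual number {k}")
    # Build the label from the integer ID to avoid floating-point rounding.
    return f"0.{first + k - 1:05d}"


for name, (first, last) in label_ranges(SIZES).items():
    print(f"{name}: 0.{first:05d} - 0.{last:05d}")
```

Calling `label_for("Hospital or Other Health Care Facility", 1504)` gives the label of the 1504th individual in that subgroup, which can be compared against the worked examples in the list of ranges.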

In [3]:
    # Simple random sample of 10 rows from the Community Pharmacy subgroup.
    df_Community_Pharmacy = df[df['Practice'] == 'Community Pharmacy']
    df_Community_Pharmacy
    random_rows = random.sample(range(11323), 10)
    print(df_Community_Pharmacy.iloc[random_rows])

              ID  Practice
    86        87  Community Pharmacy
    10133  10134  Community Pharmacy
    5450    5451  Community Pharmacy
    10462  10463  Community Pharmacy
    6031    6032  Community Pharmacy
    9611    9612  Community Pharmacy
    5735    5736  Community Pharmacy
    2405    2406  Community Pharmacy
    2002    2003  Community Pharmacy
    7521    7522  Community Pharmacy

Selected individuals: 0.02470, 0.00794, 0.11061, 0.08303, 0.09996, 0.02463, 0.02456, 0.02390, 0.08724, 0.08562

In [5]:
    # Simple random sample of 10 rows from the Hospital subgroup.
    df_Hospital_other = df[df['Practice'] == 'Hospital or Other Health Care Facility']
    df_Hospital_other
    random_rows = random.sample(range(2664), 10)
    print(df_Hospital_other.iloc[random_rows])

              ID  Practice
    13049  13050  Hospital or Other Health Care Facility
    12648  12649  Hospital or Other Health Care Facility
    13069  13070  Hospital or Other Health Care Facility
    11947  11948  Hospital or Other Health Care Facility
    12507  12508  Hospital or Other Health Care Facility
    11545  11546  Hospital or Other Health Care Facility
    11361  11362  Hospital or Other Health Care Facility
    11967  11968  Hospital or Other Health Care Facility
    11657  11658  Hospital or Other Health Care Facility
    12038  12039  Hospital or Other Health Care Facility

Selected individuals: 0.13137, 0.11745, 0.13922, 0.13778, 0.11919, 0.12236, 0.12093, 0.12128, 0.11833, 0.13877

In [7]:
    # Simple random sample of 10 rows from the Academia or Government subgroup.
    df_Academia_governement = df[df['Practice'] == 'Academia or Governement']
    df_Academia_governement
    random_rows = random.sample(range(333), 10)
    print(df_Academia_governement.iloc[random_rows])

              ID  Practice
    14248  14249  Academia or Governement
    14109  14110  Academia or Governement
    14137  14138  Academia or Governement
    14233  14234  Academia or Governement
    14093  14094  Academia or Governement
    14261  14262  Academia or Governement
    14294  14295  Academia or Governement
    14178  14179  Academia or Governement
    14277  14278  Academia or Governement
    14025  14026  Academia or Governement

Selected individuals: 0.14277, 0.14258, 0.14259, 0.14244, 0.14006, 0.14314, 0.14138, 0.14107, 0.14215, 0.14165

In [8]:
    # Simple random sample of 10 rows from the Industry subgroup.
    df_Industry = df[df['Practice'] == 'Industry']
    df_Industry
    random_rows = random.sample(range(500), 10)
    print(df_Industry.iloc[random_rows])

              ID  Practice
    14626  14627  Industry
    14392  14393  Industry
    14552  14553  Industry
    14413  14414  Industry
    14420  14421  Industry
    14775  14776  Industry
    14598  14599  Industry
    14540  14541  Industry
    14725  14726  Industry
    14365  14366  Industry

Selected individuals: 0.14350, 0.14551, 0.14472, 0.14465, 0.14369, 0.14424, 0.14432, 0.14401, 0.14715, 0.14779
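The per-subgroup draws in Question 4 all follow the same pattern: restrict to one stratum, then take a simple random sample of 10. A minimal sketch of that pattern in one pass is shown here, using only the standard library so it runs without the DataFrame. The function name `stratified_sample`, the `STRATA` table and the seed value are assumptions made for illustration; the notebook itself does not seed these draws, so its selections will differ.

```python
import random

# ID ranges per stratum, taken from the label ranges in Question 4
# (range end is exclusive, so each upper bound is last ID + 1).
STRATA = {
    "Community Pharmacy": range(1, 11324),
    "Hospital or Other Health Care Facility": range(11324, 13988),
    "Academia or Governement": range(13988, 14321),  # spelling as in the notebook's data
    "Industry": range(14321, 14821),
    "Corporate, professional practice or clinic": range(14821, 14988),
}


def stratified_sample(strata, n_per_stratum, seed=None):
    """Draw n_per_stratum IDs without replacement from every stratum."""
    rng = random.Random(seed)  # a private generator keeps the draw reproducible
    return {
        name: sorted(rng.sample(ids, n_per_stratum))
        for name, ids in strata.items()
    }


# seed=2022 is an arbitrary, hypothetical choice.
sample = stratified_sample(STRATA, n_per_stratum=10, seed=2022)
for name, ids in sample.items():
    print(name, [f"0.{i:05d}" for i in ids])
```

Passing a fixed seed means the same individuals are selected on every run, which makes the written list of selected labels easy to check against the code.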

In [10]:
    # Simple random sample of 10 rows from the Corporate subgroup.
    df_CPPC = df[df['Practice'] == 'Corporate, professional practice or clinic']
    df_CPPC
    random_rows = random.sample(range(167), 10)
    print(df_CPPC.iloc[random_rows])

              ID  Practice
    14872  14873  Corporate, professional practice or clinic
    14954  14955  Corporate, professional practice or clinic
    14837  14838  Corporate, professional practice or clinic
    14875  14876  Corporate, professional practice or clinic
    14820  14821  Corporate, professional practice or clinic
    14905  14906  Corporate, professional practice or clinic
    14925  14926  Corporate, professional practice or clinic
    14927  14928  Corporate, professional practice or clinic
    14897  14898  Corporate, professional practice or clinic
    14953  14954  Corporate, professional practice or clinic

Selected individuals: 0.14861, 0.14907, 0.14935, 0.14831, 0.14936, 0.14962, 0.14924, 0.14872, 0.14837, 0.14847

All selected individuals

Community Pharmacy: 0.02470, 0.00794, 0.11061, 0.08303, 0.09996, 0.02463, 0.02456, 0.02390, 0.08724, 0.08562
Hospital or Other Health Care Facility: 0.13137, 0.11745, 0.13922, 0.13778, 0.11919, 0.12236, 0.12093, 0.12128, 0.11833, 0.13877
Academia or Government: 0.14277, 0.14258, 0.14259, 0.14244, 0.14006, 0.14314, 0.14138, 0.14107, 0.14215, 0.14165
Industry: 0.14350, 0.14551, 0.14472, 0.14465, 0.14369, 0.14424, 0.14432, 0.14401, 0.14715, 0.14779
Corporate, professional practice or clinic: 0.14861, 0.14907, 0.14935, 0.14831, 0.14936, 0.14962, 0.14924, 0.14872, 0.14837, 0.14847

Question 5 b)

In [11]:
    # Population of 1410 trees, numbered 1 to 1410.
    df_Trees = pd.DataFrame({'ID': np.arange(1, 1411).tolist()})
    df_Trees

Out[11]:
            ID
    0        1
    1        2
    2        3
    3        4
    4        5
    ...    ...
    1405  1406
    1406  1407
    1407  1408
    1408  1409
    1409  1410

    1410 rows × 1 columns

In [12]:
    def systematic_sampling(df, starting_index, step):
        # Take every step-th row, beginning at starting_index.
        indices = np.arange(starting_index, len(df), step=step)
        systematic_sample = df.iloc[indices]
        return systematic_sample

In [13]:
    random.seed(90)
    random_start = random.randint(0, 78)
    print(random_start)

    26

In [14]:
    # The call is cut off in the PDF export; step=27 is inferred from the
    # spacing of the rows in the output.
    systematic_sample = systematic_sampling(df=df_Trees, starting_index=random_start, step=27)
    systematic_sample

Out[14]:
          ID
    26    27
    53    54
    80    81
    107  108
    134  135
    161  162
    188  189
    215  216
    242  243
    269  270
    296  297
    323  324
    350  351
    377  378
    404  405
    431  432
    458  459
    485  486
    512  513
    539  540
    566  567
    593  594
    620  621
    647  648
    674  675
    701  702
    728  729
    755  756
    782  783
    809  810
    836  837
    863  864

Out[14] (continued):
            ID
    890    891
    917    918
    944    945
    971    972
    998    999
    1025  1026
    1052  1053
    1079  1080
    1106  1107
    1133  1134
    1160  1161
    1187  1188
    1214  1215
    1241  1242
    1268  1269
    1295  1296
    1322  1323
    1349  1350
    1376  1377
    1403  1404
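The systematic sample for Question 5 b) can be cross-checked without pandas. The sketch below is a standard-library version under stated assumptions: `systematic_indices` is a hypothetical helper, the step of 27 is read off the spacing of the output rows, and the random start is drawn from 0 to step - 1, which is the usual textbook convention and may differ from the range used in the notebook.

```python
import random


def systematic_indices(population_size, step, start=None, seed=None):
    """Return every step-th 0-based index, beginning at start.

    If start is omitted, it is drawn uniformly from 0 .. step - 1.
    """
    if start is None:
        start = random.Random(seed).randrange(step)
    if not 0 <= start < step:
        raise ValueError("start must lie in the first interval, 0 <= start < step")
    return list(range(start, population_size, step))


# Reproduce the notebook's selection: 1410 trees, start index 26, step 27.
indices = systematic_indices(population_size=1410, step=27, start=26)
tree_ids = [i + 1 for i in indices]  # IDs are 1-based, indices 0-based
print(len(tree_ids), tree_ids[:3], tree_ids[-1])
```

With a start of 26 and a step of 27 this selects 52 trees, from ID 27 through ID 1404, matching the rows listed in the output.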