Stats Project2

School

University of Texas, Dallas

Course

6313

Subject

Computer Science

Date

Dec 6, 2023

CS6313 STATISTICAL METHODS FOR DATA SCIENCE
PROJECT 2
Problem A: Study Group
Aishwarya Vinod Menon (AXV220062)
Trishala Reddy (TXR220017)
Dibyanshi Singh (DXS210139)

Chapter 8.1 and 8.2 Descriptive Statistics

Citation: Statology.org, 2020; StackOverflow, 2021.

Step 0: Identification and language

# Name: Aishwarya Vinod Menon, Trishala Reddy
# Language: Python 3.x
import numpy as np
import pandas as pd

Step 1: Load Chapter8.txt (As demonstrated by the professor)

def loadData():
    data = list()
    fp = open("Chapter8.txt", "r")
    text = fp.readline()
    fp.close()
    dataset1 = text.split(",")
    for item in dataset1:
        data.append(int(item))
    return data

data = loadData()

Step 2. The Population

length = len(data)
min_data = np.min(data)
max_data = np.max(data)
mean = np.mean(data)
variance = np.var(data, ddof=0)
std_dev = np.sqrt(variance)
quartile_25 = np.percentile(data, 25)
quartile_75 = np.percentile(data, 75)
interquartile_range = quartile_75 - quartile_25
lower_outlier_limit = quartile_25 - 1.5 * interquartile_range
upper_outlier_limit = quartile_75 + 1.5 * interquartile_range

# Convert the list to an array so the element-wise comparison is explicit
data_arr = np.asarray(data)
outliers = np.where((data_arr < lower_outlier_limit) | (data_arr > upper_outlier_limit))
# np.where returns a tuple of index arrays; count the indices, not the tuple
num_outliers = len(outliers[0])
print("Length:", length)
print("Minimum:", min_data)
print("Maximum:", max_data)
print("Mean:", mean)
print("Variance:", variance)
print("Standard Deviation:", std_dev)
print("25% Quartile:", quartile_25)
print("75% Quartile:", quartile_75)
print("Interquartile Range (IQR):", interquartile_range)
print("Number of Outliers:", num_outliers)

Output:

Step 2. 1000 Unit Sample:

# np.random.choice samples with replacement by default
sample = np.random.choice(data, 1000)
Num_datapoints = len(sample)
minimum = min(sample)
maximum = max(sample)
mean = np.mean(sample)
lower_quartile = np.percentile(sample, 25)
upper_quartile = np.percentile(sample, 75)
inter_quartile_range = upper_quartile - lower_quartile
lower_limit = lower_quartile - 1.5 * inter_quartile_range
upper_limit = upper_quartile + 1.5 * inter_quartile_range
outlier = np.where((sample < lower_limit) | (sample > upper_limit))
# Count the outlier indices in the first (only) array of the tuple
outliers_no = len(outlier[0])
print("Number of datapoints:", Num_datapoints)
print("Minimum Value:", minimum)
print("Maximum Value:", maximum)
print("Mean:", mean)
print("25% Quartile:", lower_quartile)
print("75% Quartile:", upper_quartile)
print("The lower Outlier limit:", lower_limit)
print("The upper Outlier limit:", upper_limit)
print("The number of outliers:", outliers_no)

Output:

Step 3. 10000 Unit Sample

sample_s = np.random.choice(data, 10000)
datapoints = len(sample_s)
min_s = min(sample_s)
max_s = max(sample_s)
mean_s = np.mean(sample_s)
lower_quartile_s = np.percentile(sample_s, 25)
upper_quartile_s = np.percentile(sample_s, 75)
inter_quartile_range_s = upper_quartile_s - lower_quartile_s
lower_limit_s = lower_quartile_s - 1.5 * inter_quartile_range_s
upper_limit_s = upper_quartile_s + 1.5 * inter_quartile_range_s
outlier_s = np.where((sample_s < lower_limit_s) | (sample_s > upper_limit_s))
# np.where returns a tuple; the outlier indices are in its first element
outliers_no_s = len(outlier_s[0])
print("Number of datapoints:", datapoints)
print("Minimum Value:", min_s)
print("Maximum Value:", max_s)
print("Mean:", mean_s)
print("25% Quartile:", lower_quartile_s)
print("75% Quartile:", upper_quartile_s)
print("The lower Outlier limit:", lower_limit_s)
print("The upper Outlier limit:", upper_limit_s)
print("The number of outliers:", outliers_no_s)

Output:

Step 4: 100000 Unit Sample

sample_se = np.random.choice(data, 100000)
# Lowercase name, matching the print statement that uses it later
datapoints_se = len(sample_se)
min_se = min(sample_se)


max_se = max(sample_se)
mean_se = np.mean(sample_se)
lower_quartile_se = np.percentile(sample_se, 25)
upper_quartile_se = np.percentile(sample_se, 75)
inter_quartile_range_se = upper_quartile_se - lower_quartile_se
lower_limit_se = lower_quartile_se - 1.5 * inter_quartile_range_se
upper_limit_se = upper_quartile_se + 1.5 * inter_quartile_range_se
outlier_se = np.where((sample_se < lower_limit_se) | (sample_se > upper_limit_se))
# np.where returns a tuple; the outlier indices are in its first element
outliers_no_se = len(outlier_se[0])
print("Number of datapoints:", datapoints_se)
print("Minimum Value:", min_se)
print("Maximum Value:", max_se)
print("Mean:", mean_se)
print("25% Quartile:", lower_quartile_se)
print("75% Quartile:", upper_quartile_se)
print("The lower Outlier limit:", lower_limit_se)
print("The upper Outlier limit:", upper_limit_se)
print("The number of outliers:", outliers_no_se)

Output:

Step 5. The Central Limit Theorem

def calculate_statistics(data):
    # Summary statistics and 1.5 * IQR outlier limits for one dataset
    num_datapoints = len(data)
    min_value = min(data)
    max_value = max(data)
    mean = np.mean(data)
    quartile_25 = np.percentile(data, 25)
    quartile_75 = np.percentile(data, 75)
    iqr = quartile_75 - quartile_25
    lower_outlier_limit = quartile_25 - 1.5 * iqr
    upper_outlier_limit = quartile_75 + 1.5 * iqr
    # Count the number of outliers in the sample
    outliers = [x for x in data if x < lower_outlier_limit or x > upper_outlier_limit]
    num_outliers = len(outliers)

    return (num_datapoints, min_value, max_value, mean, quartile_25, quartile_75,
            iqr, lower_outlier_limit, upper_outlier_limit, num_outliers)

# Calculate statistics for the population
population_statistics = calculate_statistics(data)

# Randomly sample from the population for different sample sizes
sample_sizes = [1000, 10000, 100000]
sample_statistics = []
for sample_size in sample_sizes:
    # np.random.choice (with replacement) matches the sampling used in the earlier
    # steps; the original random.sample call needed an `import random` that was never
    # made, and it fails when sample_size exceeds the population size
    sample = np.random.choice(data, sample_size)
    sample_statistics.append(calculate_statistics(sample))

# Create a table of calculations
data1 = {
    'Sample Size': sample_sizes + ['Population'],
    'Number of Datapoints': [stats[0] for stats in sample_statistics] + [population_statistics[0]],
    'Minimum Value': [stats[1] for stats in sample_statistics] + [population_statistics[1]],
    'Maximum Value': [stats[2] for stats in sample_statistics] + [population_statistics[2]],
    'Mean': [stats[3] for stats in sample_statistics] + [population_statistics[3]],
    '25% Quartile': [stats[4] for stats in sample_statistics] + [population_statistics[4]],
    '75% Quartile': [stats[5] for stats in sample_statistics] + [population_statistics[5]],
    'Interquartile Range': [stats[6] for stats in sample_statistics] + [population_statistics[6]],
    'Lower Outlier Limit': [stats[7] for stats in sample_statistics] + [population_statistics[7]],
    'Upper Outlier Limit': [stats[8] for stats in sample_statistics] + [population_statistics[8]],
    'Number of Outliers': [stats[9] for stats in sample_statistics] + [population_statistics[9]]
}
df = pd.DataFrame(data1)

# Print the table
print(df)

# Answer the question about the central limit theorem
print("\nAnswer:")
print("As the sample size increases, the sample statistics converge to the "
      "corresponding population values. This is consistent with the central limit "
      "theorem, which states that the sampling distribution of the sample mean "
      "approaches a normal distribution centered on the population mean as the "
      "sample size increases.")
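
The answer above can be checked numerically. The following is a minimal sketch, not part of the original submission: since Chapter8.txt is not available here, it assumes a hypothetical skewed population (`population`) and a hypothetical helper (`sample_means`). It draws many samples of each size and compares the observed spread of the sample means with the standard error the central limit theorem predicts, sigma / sqrt(n).

```python
import numpy as np

# Hypothetical stand-in for the Chapter8.txt population: skewed integer data,
# so the population itself is clearly not normally distributed.
rng = np.random.default_rng(seed=0)
population = rng.exponential(scale=50.0, size=50_000).astype(int)

def sample_means(pop, sample_size, repeats=2_000):
    """Draw `repeats` samples (with replacement) and return each sample's mean."""
    draws = rng.choice(pop, size=(repeats, sample_size), replace=True)
    return draws.mean(axis=1)

pop_mean = population.mean()
pop_std = population.std(ddof=0)

for n in (10, 100, 1000):
    means = sample_means(population, n)
    # CLT prediction: the means center on pop_mean with
    # standard error pop_std / sqrt(n).
    predicted_se = pop_std / np.sqrt(n)
    print(f"n={n:>5}  mean of means={means.mean():8.3f}  "
          f"observed SE={means.std(ddof=0):7.3f}  predicted SE={predicted_se:7.3f}")
```

Under these assumptions the observed and predicted standard errors should track each other closely, and both shrink by roughly a factor of sqrt(10) each time n grows tenfold. That shrinking spread is why the larger samples in the table land nearer the population mean.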

Chapter 8.3 Scatter Plot

# Name: Dibyanshi Singh
# Language: R

# Create a sequence of angles from 0 to 360 degrees with an 18-degree interval
ang <- seq(0, 360, by = 18)

# Convert angles to radians
angles_in_radians <- ang * (pi / 180)

# Calculate x-positions and y-positions
x_positions <- cos(angles_in_radians) * 5
y_positions <- sin(angles_in_radians) * 5

# Create a data table
data_tble <- data.frame(Angle_degrees = ang, X_Position = x_positions, Y_Position = y_positions)

# Create scatter plots with connecting lines
# Angle vs. X-Position
plot(data_tble$Angle_degrees, data_tble$X_Position, type = "b",
     main = "Angle vs. X-Position", xlab = "ang (degrees)", ylab = "X-Position")

# Angle vs. Y-Position
plot(data_tble$Angle_degrees, data_tble$Y_Position, type = "b",
     main = "Angle vs. Y-Position", xlab = "ang (degrees)", ylab = "Y-Position")

# X-Position vs. Y-Position


plot(data_tble$X_Position, data_tble$Y_Position, type = "b",
     main = "X-Position vs. Y-Position", xlab = "X-Position", ylab = "Y-Position")

Output: [three scatter plots: Angle vs. X-Position, Angle vs. Y-Position, X-Position vs. Y-Position]
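
As a cross-check on the R code, here is a short Python sketch of the same computation (an illustration, not part of the original submission; the names `RADIUS`, `angles_deg` and `radii` are assumptions). It confirms that every (x, y) pair lies on a circle of radius 5. That is why the X-Position vs. Y-Position plot should trace a circle, while the two angle plots trace one period of a cosine and a sine curve.

```python
import numpy as np

RADIUS = 5  # same radius as the R code above

# 0, 18, ..., 360 degrees inclusive, matching seq(0, 360, by = 18)
angles_deg = np.arange(0, 361, 18)
angles_rad = np.deg2rad(angles_deg)

x_positions = RADIUS * np.cos(angles_rad)
y_positions = RADIUS * np.sin(angles_rad)

# Every point should satisfy x^2 + y^2 = RADIUS^2.
radii = np.hypot(x_positions, y_positions)
print(len(angles_deg), np.allclose(radii, RADIUS))  # prints: 21 True
```

The first and last points coincide (0 and 360 degrees), so connecting the points with lines closes the circle.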
