Stats Project2

School: University of Texas, Dallas
Course: 6313
Subject: Computer Science
Date: Dec 6, 2023

CS6313 STATISTICAL METHODS FOR DATA SCIENCE
PROJECT 2

Problem A: Study Group
Aishwarya Vinod Menon (AXV220062)
Trishala Reddy (TXR220017)
Dibyanshi Singh (DXS210139)

Chapter 8.1 and 8.2 Descriptive Statistics

Citation: Statology.org, 2020; StackOverflow, 2021.

Step 0: Identification and language

# Name: Aishwarya Vinod Menon, Trishala Reddy
# Language: Python 3.x
import numpy as np
import pandas as pd

Step 1: Load Chapter8.txt (As demonstrated by the professor)

def loadData():
    # The file holds a single line of comma-separated integers
    data = list()
    with open("Chapter8.txt", "r") as fp:
        text = fp.readline()
    dataset1 = text.split(",")
    for item in dataset1:
        data.append(int(item))
    return data

data = loadData()

Step 2. The Population

values = np.asarray(data)  # array copy so the comparisons below are element-wise
length = len(data)
min_data = np.min(data)
max_data = np.max(data)
mean = np.mean(data)
variance = np.var(data, ddof=0)  # population variance (divides by N)
std_dev = np.sqrt(variance)
quartile_25 = np.percentile(data, 25)
quartile_75 = np.percentile(data, 75)
interquartile_range = quartile_75 - quartile_25
lower_outlier_limit = quartile_25 - 1.5 * interquartile_range
upper_outlier_limit = quartile_75 + 1.5 * interquartile_range
# np.where returns a tuple of index arrays; take element [0] before counting
outliers = np.where((values < lower_outlier_limit) | (values > upper_outlier_limit))[0]
num_outliers = len(outliers)
print("Length:", length)
print("Minimum :", min_data)
print("Maximum :", max_data)
print("Mean:", mean)
print("Variance: ", variance)
print("Standard Deviation :", std_dev)
print("25% Quartile:", quartile_25)
print("75% Quartile:", quartile_75)
print("Interquartile Range (IQR):", interquartile_range)
print("Number of Outliers:", num_outliers)

Output (population): [not preserved in this copy]

Step 2. 1000 Unit Sample:

# np.random.choice samples with replacement by default
sample = np.random.choice(data, 1000)
Num_datapoints = len(sample)
minimum = min(sample)
maximum = max(sample)
mean = np.mean(sample)
lower_quartile = np.percentile(sample, 25)
upper_quartile = np.percentile(sample, 75)
inter_quartile_range = upper_quartile - lower_quartile
lower_limit = lower_quartile - 1.5 * inter_quartile_range
upper_limit = upper_quartile + 1.5 * inter_quartile_range
# Take element [0] of the np.where tuple so len() counts the outliers
outlier = np.where((sample < lower_limit) | (sample > upper_limit))[0]
outliers_no = len(outlier)
print("Number of datapoints:", Num_datapoints)
print("Minimum Value:", minimum)
print("Maximum Value:", maximum)
print("Mean:", mean)
print("25% Quartile:", lower_quartile)
print("75% Quartile:", upper_quartile)
print("The lower Outlier limit:", lower_limit)
print("The upper Outlier limit:", upper_limit)
print("The number of outliers:", outliers_no)
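
One pitfall when counting outliers with the 1.5 * IQR rule is that np.where returns a tuple of index arrays, so calling len() on its result gives the number of array dimensions (1 here), not the number of outliers. The following is a minimal sketch of a safer count using a boolean mask. The helper name count_outliers and the toy data are hypothetical and are not taken from Chapter8.txt.

```python
import numpy as np

def count_outliers(values):
    """Count points outside the 1.5 * IQR fences (hypothetical helper)."""
    arr = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(arr, [25, 75])
    iqr = q3 - q1
    lower = float(q1 - 1.5 * iqr)
    upper = float(q3 + 1.5 * iqr)
    mask = (arr < lower) | (arr > upper)   # True where a point is an outlier
    return int(np.count_nonzero(mask)), lower, upper, mask

# Toy data, made up for illustration only
toy = [10, 12, 11, 13, 12, 11, 10, 95, -40]
n_out, lower, upper, mask = count_outliers(toy)
print("outliers:", n_out, "fences:", lower, upper)

# np.where on a 1-D mask returns a 1-tuple, so len(idx) is 1 regardless
# of how many outliers exist; len(idx[0]) is the actual count.
idx = np.where(mask)
print("len(idx):", len(idx), "len(idx[0]):", len(idx[0]))
```

Counting with np.count_nonzero on the mask avoids the tuple entirely, which is why the sketch prefers it over len(np.where(...)).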

Output (1000 unit sample): [not preserved in this copy]

Step 3. 10000 Unit Sample

sample_s = np.random.choice(data, 10000)
datapoints = len(sample_s)
min_s = min(sample_s)
max_s = max(sample_s)
mean_s = np.mean(sample_s)
lower_quartile_s = np.percentile(sample_s, 25)
upper_quartile_s = np.percentile(sample_s, 75)
inter_quartile_range_s = upper_quartile_s - lower_quartile_s
lower_limit_s = lower_quartile_s - 1.5 * inter_quartile_range_s
upper_limit_s = upper_quartile_s + 1.5 * inter_quartile_range_s
# Take element [0] of the np.where tuple so len() counts the outliers
outlier_s = np.where((sample_s < lower_limit_s) | (sample_s > upper_limit_s))[0]
outliers_no_s = len(outlier_s)
print("Number of datapoints:", datapoints)
print("Minimum Value:", min_s)
print("Maximum Value:", max_s)
print("Mean:", mean_s)
print("25% Quartile:", lower_quartile_s)
print("75% Quartile:", upper_quartile_s)
print("The lower Outlier limit:", lower_limit_s)
print("The upper Outlier limit:", upper_limit_s)
print("The number of outliers:", outliers_no_s)

Output (10000 unit sample): [not preserved in this copy]

Step 4: 100000 Unit Sample

sample_se = np.random.choice(data, 100000)
datapoints_se = len(sample_se)
min_se = min(sample_se)
max_se = max(sample_se)
mean_se = np.mean(sample_se)
lower_quartile_se = np.percentile(sample_se, 25)
upper_quartile_se = np.percentile(sample_se, 75)
inter_quartile_range_se = upper_quartile_se - lower_quartile_se
lower_limit_se = lower_quartile_se - 1.5 * inter_quartile_range_se
upper_limit_se = upper_quartile_se + 1.5 * inter_quartile_range_se
# Take element [0] of the np.where tuple so len() counts the outliers
outlier_se = np.where((sample_se < lower_limit_se) | (sample_se > upper_limit_se))[0]
outliers_no_se = len(outlier_se)
print("Number of datapoints:", datapoints_se)
print("Minimum Value:", min_se)
print("Maximum Value:", max_se)
print("Mean:", mean_se)
print("25% Quartile:", lower_quartile_se)
print("75% Quartile:", upper_quartile_se)
print("The lower Outlier limit:", lower_limit_se)
print("The upper Outlier limit:", upper_limit_se)
print("The number of outliers:", outliers_no_se)

Output (100000 unit sample): [not preserved in this copy]

Step 5. The Central Limit Theorem

import random  # needed for random.sample below

def calculate_statistics(data):
    num_datapoints = len(data)
    min_value = min(data)
    max_value = max(data)
    mean = np.mean(data)
    quartile_25 = np.percentile(data, 25)
    quartile_75 = np.percentile(data, 75)
    iqr = quartile_75 - quartile_25
    lower_outlier_limit = quartile_25 - 1.5 * iqr
    upper_outlier_limit = quartile_75 + 1.5 * iqr
    # Count the number of outliers in the sample
    outliers = [x for x in data if x < lower_outlier_limit or x > upper_outlier_limit]
    num_outliers = len(outliers)
    return (num_datapoints, min_value, max_value, mean, quartile_25, quartile_75,
            iqr, lower_outlier_limit, upper_outlier_limit, num_outliers)

# Calculate statistics for the population
population_statistics = calculate_statistics(data)

# Randomly sample from the population for different sample sizes.
# random.sample draws without replacement, so each size must not exceed len(data).
sample_sizes = [1000, 10000, 100000]
sample_statistics = []
for sample_size in sample_sizes:
    sample = random.sample(data, sample_size)
    sample_statistics.append(calculate_statistics(sample))

# Create a table of calculations (one row per sample size, then the population)
data1 = {
    'Sample Size': sample_sizes + ['Population'],
    'Number of Datapoints': [stats[0] for stats in sample_statistics] + [population_statistics[0]],
    'Minimum Value': [stats[1] for stats in sample_statistics] + [population_statistics[1]],
    'Maximum Value': [stats[2] for stats in sample_statistics] + [population_statistics[2]],
    'Mean': [stats[3] for stats in sample_statistics] + [population_statistics[3]],
    '25% Quartile': [stats[4] for stats in sample_statistics] + [population_statistics[4]],
    '75% Quartile': [stats[5] for stats in sample_statistics] + [population_statistics[5]],
    'Interquartile Range': [stats[6] for stats in sample_statistics] + [population_statistics[6]],
    'Lower Outlier Limit': [stats[7] for stats in sample_statistics] + [population_statistics[7]],
    'Upper Outlier Limit': [stats[8] for stats in sample_statistics] + [population_statistics[8]],
    'Number of Outliers': [stats[9] for stats in sample_statistics] + [population_statistics[9]]
}
df = pd.DataFrame(data1)

# Print the table
print(df)

# Answer the question about the central limit theorem
print("\nAnswer:")
print("As the sample size increases, the sample statistics converge to the population values. This is consistent with the central limit theorem, which states that the distribution of the sample mean approaches a normal distribution, centered on the population mean, as the sample size increases.")
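
The table above draws one sample per size, whereas the central limit theorem describes how the sample mean behaves across many repeated samples. The sketch below illustrates that with a simulation. It assumes a synthetic, right-skewed stand-in population (exponential, scale 10) instead of Chapter8.txt, and the helper name sample_means and the repeat count are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

# Synthetic, right-skewed stand-in population (assumption: not Chapter8.txt)
population = rng.exponential(scale=10.0, size=50_000)
pop_mean = population.mean()
pop_std = population.std(ddof=0)

def sample_means(pop, n, repeats=1000):
    """Means of `repeats` samples of size n, drawn with replacement."""
    draws = rng.choice(pop, size=(repeats, n), replace=True)
    return draws.mean(axis=1)

print(f"population mean={pop_mean:.3f}  population std={pop_std:.3f}")
for n in (10, 100, 1000):
    means = sample_means(population, n)
    # The CLT predicts the spread of the sample mean is close to sigma / sqrt(n)
    print(f"n={n:5d}  mean of means={means.mean():8.3f}  "
          f"std of means={means.std(ddof=1):7.3f}  "
          f"sigma/sqrt(n)={pop_std / np.sqrt(n):7.3f}")
```

If the theorem applies, the mean of the sample means should stay near the population mean while their spread shrinks roughly in proportion to 1/sqrt(n), even though the population itself is skewed.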

Chapter 8.3 Scatter Plot

# Name: Dibyanshi Singh
# Language: R

# Create a sequence of angles from 0 to 360 degrees with an 18-degree interval
ang <- seq(0, 360, by = 18)

# Convert angles to radians
angles_in_radians <- ang * (pi / 180)

# Calculate x-positions and y-positions
x_positions <- cos(angles_in_radians) * 5
y_positions <- sin(angles_in_radians) * 5

# Create a data table
data_tble <- data.frame(Angle_degrees = ang, X_Position = x_positions, Y_Position = y_positions)

# Create scatter plots with connecting lines
# Angle vs. X-Position
plot(data_tble$Angle_degrees, data_tble$X_Position, type = "b",
     main = "Angle vs. X-Position", xlab = "ang (degrees)", ylab = "X-Position")

# Angle vs. Y-Position
plot(data_tble$Angle_degrees, data_tble$Y_Position, type = "b",
     main = "Angle vs. Y-Position", xlab = "ang (degrees)", ylab = "Y-Position")

# X-Position vs. Y-Position
plot(data_tble$X_Position, data_tble$Y_Position, type = "b",
     main = "X-Position vs. Y-Position", xlab = "X-Position", ylab = "Y-Position")

Output (scatter plots): [not preserved in this copy]
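
The three plots follow from x = 5 cos(angle) and y = 5 sin(angle), so every (x, y) pair should lie on a circle of radius 5, since x^2 + y^2 = 25. As a rough cross-check in Python, the first language used in this report, the sketch below rebuilds the same table with the standard library only. The names RADIUS, rows, and on_circle are hypothetical, and this is an illustration, not part of the submitted R solution.

```python
import math

RADIUS = 5  # same radius as in the R script
angles_deg = list(range(0, 361, 18))  # 0, 18, ..., 360

# Rebuild the (angle, x, y) table from the parametric equations
rows = []
for deg in angles_deg:
    rad = math.radians(deg)
    rows.append((deg, RADIUS * math.cos(rad), RADIUS * math.sin(rad)))

# Every point should satisfy x^2 + y^2 = RADIUS^2 up to rounding error
on_circle = all(math.isclose(x * x + y * y, RADIUS ** 2) for _, x, y in rows)

for deg, x, y in rows[:6]:
    print(f"{deg:3d}  {x:7.3f}  {y:7.3f}")
print("points:", len(rows), "all on the circle:", on_circle)
```

Because the sequence includes both 0 and 360 degrees, the first and last points coincide, which is what closes the loop in the X-Position vs. Y-Position plot.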