Letter to myself
Hi, Rafael
This is past you from Stats 301. I hope that at this point you are a cybersecurity intern at a
company. I am writing to you because I want to remind you of the importance of statistics and what can
be learned from it. While not everything that I say may help, it is important for you to understand
the lessons and learn from the mistakes that were made in this class.
First, I want to remind you of the statistical concepts that we learned through
the data project we completed. I understand that we will not be using everything that we
learned, but I want to at least remind you of the different sampling and survey designs and when to use
them, and of some of the different types of tests that we used. The designs
that I hope you keep track of are simple random, stratified random, and cluster samples, longitudinal surveys, and a census.
In a simple random sample, every individual in a population has an equal chance of being
selected, which gives an unbiased representation of the entire population on average. In a stratified random
sample, the population is split into groups (strata) and a random sample is taken from each group.
This ensures representation from every subgroup in the population. In a cluster sample, you divide
the population into groups (clusters), randomly select a few of them, and then survey every individual in each selected cluster. This is
efficient when it's not practical to sample individuals directly. In a longitudinal survey, you survey
the same individuals over a period to track change over time. Censuses are where you survey
the entire population. This is inefficient, but it makes sure everyone is heard and removes the sampling
error that random samples carry. The tests you should remember are ANOVA, two-way ANOVA,
chi-square, linear regression, hypothesis tests, and the Z, T, and F tests. Both ANOVA and two-way ANOVA are
used to compare the means of different groups, but two-way ANOVA extends the test to study the influence of two
categorical variables and assess the interaction between them. Chi-square is used for
categorical data to determine whether there is an association between two variables. Linear regression
fits a line to data, and the F test checks whether that line explains the data significantly better than no line at all.
Hypothesis tests are procedures used to make inferences from sample data and help evaluate whether
observed differences are likely to reflect real differences in the population. Z-tests and T-tests are both used in hypothesis
testing. Z-tests are used when the sample size is large and you know the population standard
deviation. T-tests, on the other hand, are used with small sample sizes or when the population standard
deviation is unknown.
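In case it helps you remember the difference between the Z and T statistics, here is a minimal Python sketch. The function names and the sample values are made up for illustration, and it only computes the one-sample test statistics, not the p-values that SPSS would also report.

```python
import math
import statistics


def z_statistic(sample, mu0, sigma):
    """One-sample z statistic: the population standard deviation (sigma) is known."""
    n = len(sample)
    return (statistics.mean(sample) - mu0) / (sigma / math.sqrt(n))


def t_statistic(sample, mu0):
    """One-sample t statistic: sigma is unknown, so estimate it from the sample."""
    n = len(sample)
    s = statistics.stdev(sample)  # sample standard deviation (divides by n - 1)
    return (statistics.mean(sample) - mu0) / (s / math.sqrt(n))


# Made-up sample values, used only to show the two calls side by side.
sample = [52, 48, 55, 50, 45]
z = z_statistic(sample, mu0=47, sigma=4)  # pretend sigma = 4 is known
t = t_statistic(sample, mu0=47)           # sigma estimated from the data
```

The only difference between the two formulas is where the standard deviation comes from, which is why the T statistic is the one to use when the population standard deviation is unknown.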
As a cybersecurity intern, the types of surveys and when to use them are particularly important
when you are assessing potential security risks. You need to understand when to use
simple random versus stratified random versus longitudinal designs, and which ones are the most technically and
economically feasible.
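To make the sampling designs concrete, here is a rough Python sketch of how you might pick machines to audit. The department names, host names, and function names are all hypothetical, and a real audit would pull its inventory from an actual asset list.

```python
import random


def simple_random_sample(population, k, rng):
    """Every member has the same chance of being picked."""
    return rng.sample(population, k)


def stratified_sample(strata, k_per_stratum, rng):
    """Take a random sample from every group so each one is represented."""
    return {name: rng.sample(members, k_per_stratum)
            for name, members in strata.items()}


def cluster_sample(clusters, n_clusters, rng):
    """Randomly pick whole groups, then keep every member of the picked groups."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}


rng = random.Random(301)  # fixed seed so the sketch is repeatable

# Hypothetical inventory: department name -> list of host names.
departments = {
    "finance": [f"fin-{i}" for i in range(20)],
    "engineering": [f"eng-{i}" for i in range(50)],
    "sales": [f"sal-{i}" for i in range(30)],
}
all_hosts = [host for hosts in departments.values() for host in hosts]

srs = simple_random_sample(all_hosts, 6, rng)
stratified = stratified_sample(departments, 3, rng)
clustered = cluster_sample(departments, 2, rng)
```

The stratified version guarantees that every department shows up in the audit, while the cluster version is cheaper because you only have to visit the departments that were picked.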
One major connection between this course and my goal of becoming a cybersecurity expert is
being able to use statistical analysis for predictive modeling in threat detection and for analyzing which
threats people are most likely to fall for. Predictive modeling could help with anticipating potential
security threats, and analyzing whether people are falling for particular threats, phishing scams for
example, would help prioritize what to deal with first.
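As one small example of that kind of analysis, the chi-square test from class could check whether falling for a phishing email is associated with a person's department. This is only a sketch: the counts below are invented, the function name is my own, and it computes just the test statistic, which is then compared against 3.841, the standard critical value for one degree of freedom at the 0.05 level.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count we would expect if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


# Invented counts. Rows: two departments. Columns: (clicked, did not click).
phishing_results = [[10, 20], [30, 40]]

stat = chi_square_statistic(phishing_results)

# A 2x2 table has (2 - 1) * (2 - 1) = 1 degree of freedom.
CRITICAL_VALUE_DF1_ALPHA_05 = 3.841
association_detected = stat > CRITICAL_VALUE_DF1_ALPHA_05
```

If the statistic is above the critical value, that would suggest the click rate really does differ by department, which tells you where to focus training first.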
One significant challenge that I faced in stats was understanding how to use the statistical
software (SPSS). I had problems with this because of all the different options available. This software