practice questions
School
Wilfrid Laurier University
Course
487
Subject
Statistics
Date
Apr 3, 2024
Type
docx
Pages
13
Uploaded by girlinlilac
Variable
Anything that can take on more than one value
SAT score
Age
Weight
Gender
Measurement:
The assignment of labels to a variable or an outcome
SAT (variable): 740 (measurement)
Age : 22
Weight : 160
Gender : male
types of variables
discrete (belongs to unique and separate categories, e.g., dogs, cats, rats)
If there are only two categories, then it is a dichotomous variable, e.g., open or closed, male or female.
Continuous variables: What is measured varies along a line scale and can have small or large units of measure
Length
Temperature
Age
Distance
Time
nominal scales
Separated into different categories
NAMES
All categories are equal
- Cats, dogs, rats
NOT: 1st, 2nd, 3rd
Republicans, Democrats, Independents
There is no magnitude within a category
One dog is not more dog than another
No intermediate categories
No dog/cat or cat/fish categories
Membership in only one category, not both
Mutually exclusive properties
ordinal scales
what is measured by ranks, 1st 2nd 3rd,
Although there is a ranking difference between the groups, the actual difference between the group may vary.
Marathon runners classified by finish order
The times for each group will be different
Top ten 4- to 5-hour times
Bottom ten 4- to 5-week times
interval scales
Someone or thing is measured on a scale in which interpretations can be made by knowing the resulting measure.
The difference between units of measure is consistent.
Height
Speed
Four miles really is twice as far as two miles
ratio scale
Just like an interval scale, and there is a definable and reasonable zero point.
Time, weight, length
Seldom used in social sciences
All ratio scales are also interval scales, but not all interval scales are ratio scales
T/F?
The higher the level of measurement the more precise the value.
true
If you can... (types of data) what type do you have?
Assign just names
Then you have Nominal Data
Put things in order
Then you have Ordinal Data
If the distance between things is consistent
Then you have Interval Data
If the scale has an absolute zero
Then you have Ratio Data
why do Psychologists typically treat their measurements as if they are Interval Data
This is done mostly out of tradition
This is done despite the fact that IQ or Personality Scores are actually Ordinal Data
And - despite the fact that the statistical tests we choose demand Interval or Ratio level data
We justify this because our tests are 'robust'
reliability
consistency of the instrument
types of reliability
Internal Consistency (Consistency of the items)
- Test-retest Reliability (Consistency over time)
- Interrater Reliability (Consistency between raters)
- Split-half Methods
- Alternate Forms Methods
reliability is synonymous with?
consistency
....
It is the degree to which test scores for an individual test taker or group of test takers are consistent over repeated applications.
No psychological test is completely consistent; however, a measurement that is unreliable is worthless.
The consistency of test scores is critically important in determining whether a test can provide good measurement.
Because no unit of measurement is exact, any time you measure something (observed score), you are really measuring WHAT two things?
True Score - the amount of observed score that truly represents what you are intending to measure.
Error Component - the amount of other variables that can impact the observed score
Observed Test Score = True Score + Errors of Measurement
Why do test scores vary - Possible Sources of Variability of Scores
General Ability to comprehend instructions
- Stable response sets (e.g., answering "C" option more frequently)
- The element of chance of getting a question right
- Conditions of testing
- Unreliability or bias in grading or rating performance
- Motivation
- Emotional Strain
measurement error
Any fluctuation in test scores that results from factors related to the measurement process that are irrelevant to what is being measured.
The difference between the observed score and the true score is called the error score. S true = S observed - S error
Developing better tests with less random measurement error is better than simply documenting the amount of error.
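A minimal sketch of Observed Test Score = True Score + Errors of Measurement, assuming the error is random noise that averages to zero. The function name, the true score of 100, and the error spread of 5 are made up for illustration and do not come from the notes.

```python
import random
import statistics


def simulate_observed_scores(true_score, error_sd, n, seed=0):
    """Return n observed scores, each equal to the true score plus random error."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [true_score + rng.gauss(0, error_sd) for _ in range(n)]


# Hypothetical test taker with a true score of 100, measured 1000 times.
observed = simulate_observed_scores(true_score=100, error_sd=5, n=1000)

# Error score for each measurement: S error = S observed - S true.
errors = [score - 100 for score in observed]

# Random errors tend to cancel out, so the mean observed score
# lands close to the true score even though single scores do not.
mean_observed = statistics.mean(observed)
mean_error = statistics.mean(errors)
```

Any single observed score can be well off the true score; only the average over many measurements settles near it, which is why reducing error in the test itself matters.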
Measurement Error is Reduced By?
Writing items clearly
- Making instructions easily understood
- Adhering to proper test administration
- Providing consistent scoring
how can we determine reliability?
Internal Consistency
- Test-retest Reliability
- Interrater Reliability
- Split-half Methods
- Odd-even Reliability
- Alternate Forms Methods
internal consistency
Measures the reliability of a test based solely on the number of items on the test and the intercorrelation among the items. Therefore, it compares each item to every other item.
If a scale is measuring a construct, then overall the items on that scale should be highly correlated with one another.
A common way of measuring internal consistency ...
Cronbach's Alpha:
.80 to .95 (Excellent)
.70 to .80 (Very Good)
.60 to .70 (Satisfactory)
<.60 (Suspect)
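The notes name Cronbach's alpha without giving the formula. A minimal sketch using the standard textbook form, alpha = k/(k-1) x (1 - sum of item variances / variance of total scores), where k is the number of items. The function name and the response data are invented for illustration.

```python
import statistics


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])                       # number of items
    items = list(zip(*rows))               # regroup scores item by item
    item_variances = [statistics.variance(item) for item in items]
    total_variance = statistics.variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: 5 respondents answering 3 items on a 1-5 scale.
responses = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [1, 2, 1],
    [3, 3, 3],
]
alpha = cronbach_alpha(responses)
```

In this made-up data the items rise and fall together across respondents, so alpha comes out high; items that did not move together would pull it down.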
Internal consistency estimates are a function of:
The Number of Items - if we think that each test item is an observation of behaviour, high internal consistency strengthens the relationship, i.e., there is more of it to observe.
Average Intercorrelation - the extent to which each item represents the observation of the same thing observed.
The more you observe a construct, and the greater the consistency of those observations, the higher the RELIABILITY.
SPLIT HALF reliability
refers to determining a correlation between the first half of the measurement and the second half of the measurement (i.e., we would expect answers to the first half to be similar to the second half).
ODD EVEN reliability
refers to the correlation between even items and odd items of a measurement tool.
In this sense, we are using a single test to create two tests, eliminating the need for additional items and multiple administrations.
Since in both of these types only 1 administration is needed and the groups are determined by the internal components of the test, they are referred to as internal consistency measures.
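The odd-even idea above can be sketched as follows: score the odd-numbered and even-numbered items separately for each person, then correlate the two half-scores. The helper names and the response data are hypothetical, and the Pearson correlation is written out by hand so the block has no dependencies.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariation / (spread_x * spread_y)


def odd_even_reliability(rows):
    """Correlate each person's odd-item total with their even-item total."""
    odd_totals = [sum(row[0::2]) for row in rows]   # items 1, 3, 5, ...
    even_totals = [sum(row[1::2]) for row in rows]  # items 2, 4, 6, ...
    return pearson_r(odd_totals, even_totals)


# Hypothetical data: 5 respondents, 6 items scored 0 (wrong) or 1 (right).
answers = [
    [1, 1, 1, 1, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0],
]
r_odd_even = odd_even_reliability(answers)
```

A first-half versus second-half split would use `row[:len(row) // 2]` and `row[len(row) // 2:]` instead, which is one concrete way a different split can produce a different estimate from the same answers.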
Advantages of split half and odd-even
Simplest method - easy to perform
Time and Cost Effective
disadvantages of split-half and odd even
Many ways of splitting
Each split yields a somewhat different reliability estimate
Which is the real reliability of the test?
Test retest reliability
Test-retest reliability is usually measured by computing the correlation coefficient between scores of two administrations.
If a scale is measuring a construct consistently, then there should not be radical changes in the scores between administrations, unless something significant happened.
The rationale behind this method is that the difference between the scores of the test and the retest should be due solely to measurement error.
Amount of time in test retest reliability
The amount of time allowed between measures is critical.
The shorter the time gap, the higher the correlation; the longer the time gap, the lower the correlation. This is because the two observations are related over time.
Optimum time between administrations is 2 to 4 weeks.
Min. correlation in test-retest reliability is?
.50
The higher the correlation (in a positive direction) the higher the test-retest reliability
Problem in test-retest
Memory effect: a respondent may recall the answers from the original test, thereby inflating the reliability.
interrater reliability
Interrater reliability means that if two different raters scored the scale using the scoring rules, they should attain the same result.
Interrater reliability is usually measured by computing the correlation coefficient between the scores of two raters.
Here the criterion of acceptability is pretty high (e.g., a correlation of at least .9), but what is considered acceptable will vary from situation to situation
Parallel/Alternate Forms Method
refers to the administration of two alternate forms of the same measurement device and then comparing the scores.
Both forms are administered to the same person and the scores are correlated. If the two produce the same results, then the instrument is considered reliable.
A correlation between these two forms is computed just as the test-retest method.
advantages of parallel/alternate forms method
Eliminates the problem of memory effect.
Reactivity effects (i.e., experience of taking the test) are also partially controlled.
Can address a wider array of issues than the test-retest method.
disadvantages of parallel/alternate forms method
Are the two forms of the test actually measuring the same thing?
More Expensive
Requires additional work to develop two measurement tools.
FACTORS that affect reliability
Administrator Factors
Number of Items on the instrument
The Instrument Taker
Length of Time between Test and Retest
administrator factors
Poor or unclear directions given during administration or inaccurate scoring can affect reliability.
number of items
The larger the number of items, the greater the chance for high reliability.
Use longer tests or accumulate scores from short tests.
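The notes do not give a formula for how test length affects reliability. One classical way to put a number on it is the Spearman-Brown prophecy formula, which assumes the added items are of the same quality as the existing ones. The function name and the example values below are illustrative only.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when a test is made length_factor times as long.

    Spearman-Brown prophecy formula: r_new = n * r / (1 + (n - 1) * r),
    assuming the new items behave like the existing ones.
    """
    n = length_factor
    return n * reliability / (1 + (n - 1) * reliability)


# Hypothetical test with reliability .60, doubled in length.
doubled = spearman_brown(0.60, 2)

# The same test cut to half its length.
halved = spearman_brown(0.60, 0.5)
```

Doubling the hypothetical .60 test gives a predicted reliability of .75, and halving it lowers the estimate, in line with the point that more items give a greater chance of high reliability.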
the test taker
If you took an instrument in August when you had a terrible flu and then in December when you were feeling quite good, we might see a difference in your response consistency. If you were under considerable stress of some sort or if you were interrupted while answering the instrument questions, you might give different responses.
How high should reliability be?
A highly reliable test is always preferable to a test with lower reliability.
.80 or greater (Excellent) .70 to .80 (Very Good) .60 to .70 (Satisfactory) <.60 (Suspect)
A reliability coefficient of .80 indicates that 20% of the variability in test scores is due to measurement error.
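The 20% figure follows if reliability is read, as in classical test theory, as the share of observed-score variance that is true-score variance. The symbols below are assumptions for this worked step and are not used elsewhere in the notes: X is the observed score, T the true score, E the error, and r_xx the reliability coefficient.

```latex
X = T + E, \qquad \sigma_X^2 = \sigma_T^2 + \sigma_E^2
\quad (\text{assuming error is unrelated to the true score})

r_{xx} = \frac{\sigma_T^2}{\sigma_X^2}
\;\;\Longrightarrow\;\;
\frac{\sigma_E^2}{\sigma_X^2} = 1 - r_{xx} = 1 - .80 = .20
```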
Construct Validity
Evaluate the adequacy of the operational definition.
Is the operational definition sufficiently measuring the construct it claims to measure?
Internal Validity
Evaluate the extent that it was the independent variable that caused the changes or differences in the dependent variable.
Are there alternative explanations (confounds)?
External Validity
Evaluate the extent that the results can generalize to other populations and settings.
Can the results be replicated with other participants?
Can the results be replicated in other settings?
Four General Categories of Variables
Situational variables
Response variables
Participant or subject variables
Mediating variables
Situational Variable
Extraneous features of the environment that are present and may influence dependent variable responses.
Response Variables
All potential responses to the manipulation of an independent variable or observed in reaction to the environment.
Participant or Subject Variables
These extraneous variables are related to individual characteristics of each participant that may impact how he or she responds.
Mediating Variables
number of bystanders -> diffusion of responsibility -> helping behaviors
OPERATIONAL DEFINITIONS OF VARIABLES
Variable is an abstract concept that must be translated into concrete forms of observation or manipulation
Studied empirically
Help communicate ideas to others
NONEXPERIMENTAL VERSUS EXPERIMENTAL METHODS
Nonexperimental Method
Direction of Cause and Effect
The Third-Variable or Confounding Variable Problem
Experimental Method
Experimental Control
Randomization
The causal possibilities in a non-experimental study
exercise -> anxiety
anxiety -> exercise
income -> exercise and anxiety
IV
Independent Variables
The variables that are considered to be the "cause"
Usually manipulated by the researcher
DV
Dependent Variables
The variables that are considered to be the "effect"
Usually measured by the researcher
DV is on what axis?
Y axis
IV is on what axis?
X axis
causality - Inferences of Cause and Effect Require Three Elements:
Temporal precedence
Covariation between the two variables
Need to eliminate plausible alternative explanations
ADVANTAGES OF MULTIPLE METHODS
Artificiality of Experiments
Ethical and Practical Considerations
Participant Variables
Description of Behavior
Successful Predictions of Future Behavior
Construct Validity
Adequacy of the operational definition of variables
Internal Validity
Ability to draw conclusions about causal relationships from our data
External Validity
Extent to which the results can be generalized to other populations and settings
Conclusion Validity
Draws reasonable conclusions based upon an analysis of the data
types of questionnaires
Self-administration
Face-to-face
Example of true experiment:
in class
Women in Drug Treatment (Tx)
I.V.'s...
Assessed for pre- post-Tx drug use (D.V.) using surveys
Other D.V.'s measured
two types of self reports
Frequency of occurrence
How many drugs used over time
Prevalence of occurrence
How many women use drugs
Perceptions and attitudes in questionnaires
in general Perceptions and attitudes
Public views on a variety of subjects
Sentencing policies
Gun control
Police performance
Drug abuse
Example: Campus Security: fear > victimization
Survey Instruments can also be developed to assess individual attributes.
EXAMPLES:
Personality
Anxiety
Depression
Happiness
types of questions
open ended: "Tell us about your relationship with your parents."
closed ended: Single worded answer - "Did your parents hit you?"
Is this open or closed?
"About how many times did you get a spanking as a child?"
Never
1 to 20
21 or more
closed.
open ended questions positive points:
Give us a rich source of information
Often used to construct a closed-ended survey for later use
Usually included at the end of every survey as a 'catch all' question
Example: "Is there anything else you want to share with us about your parents?"
open ended difficulties
Hard to summarize across many surveys
Must be 'coded' so that statistics can be generated
Coding can be too subjective
closed ended positive points:
Gives us uniform objective responses
Can be easily input into a data file for analysis
closed ended difficulties:
May fail to include proper response categories
Example: "How many sexual partners have you had in the last year?"
1 to 5
6 or more
closed ended should have categories that are?
Should have response categories that are
Exhaustive
Can use category: "Other"
Mutually Exclusive
Bad Example: "How many times have you hit your children in the last year?"
0 to 5
5 to 10
10 to 20
20 or more
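The exhaustive and mutually exclusive rules above can be checked mechanically. A minimal sketch, assuming each response category is a whole-number range written as (low, high), with None for an open-ended top category; the function name is hypothetical.

```python
def check_categories(bins, lowest, highest):
    """Find answers that fit more than one category (overlaps) or none (gaps).

    bins: list of (low, high) inclusive ranges; high=None means "or more".
    lowest, highest: the range of plausible answers to test.
    """
    overlaps, gaps = [], []
    for value in range(lowest, highest + 1):
        hits = sum(
            1 for low, high in bins
            if low <= value and (high is None or value <= high)
        )
        if hits > 1:
            overlaps.append(value)   # not mutually exclusive
        elif hits == 0:
            gaps.append(value)       # not exhaustive
    return overlaps, gaps


# The bad example from the notes: 0 to 5, 5 to 10, 10 to 20, 20 or more.
bad_overlaps, bad_gaps = check_categories(
    [(0, 5), (5, 10), (10, 20), (20, None)], lowest=0, highest=30
)

# The sexual-partners example: 1 to 5, 6 or more (no place for an answer of 0).
_, partner_gaps = check_categories([(1, 5), (6, None)], lowest=0, highest=30)

# One possible repair of the bad example.
fixed_overlaps, fixed_gaps = check_categories(
    [(0, 4), (5, 9), (10, 19), (20, None)], lowest=0, highest=30
)
```

The bad example fails at 5, 10, and 20, each of which fits two categories, while the partners example fails at 0, which fits none.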
What is the likert scale?
1. Strongly Agree
2. Agree
3. Disagree
4. Strongly Disagree
Sometimes insert a "Neutral" response
open/closed ended should be?
Clear and unambiguous
"I feel fear when I must speak in public."
Measured with Likert Scale
Short items are best
Don't use Negative Items
"I don't think people should be afraid to speak."
Rather use "I like to speak to large groups."
Wording should be examined for potential bias:
"assistance to poor" rather than "welfare"
Results: 63% of respondents said too little was being given for "assistance to poor" vs. only 23% who said too little was being given for "welfare"
"people who are homeless" rather than "bums"
question ordering can have what effect?
Administer the most sensitive questions near the end of the survey
Example: Health; Income; Sexuality
Be aware that questions that appear earlier can influence answers later
Example: Questions about neighborhood crimes before questions about gun control laws
Questionnaire construction
Don't squeeze too much on a page
Leave it open and 'breezy' looking
Reduces errors
Doesn't demoralize the respondent as they move through the pages quickly
Don't use abbreviations like "abbrev." as they can confuse everyone
Use 'contingency' or 'skip-questions'
so only certain people get certain questions
Example: Pregnancy questions for women - otherwise
"If male - skip to question 60"
Use matrix for questions with the same potential responses
Example:              Agree   Disagree
Spanking helps         [ ]      [ ]
Hit with hand only     [ ]      [ ]
Using belt is okay     [ ]      [ ]
Useful for Likert responses:
SA A D SD
In-Person Interviews?
Completion rates are very high (85% +)
Reduces 'don't knows'
Can explain questions to eliminate confusion
Can observe the respondent and make valuable notes on their behavior
Can 'probe' for deeper information
But: people are more likely to give 'socially-acceptable' answers
List some common tests and acronyms
APGAR: appearance, pulse, grimace, activity, respiration (newborn test)
SAT, GRE, FCAT, MCAT
What is the purpose of textbook
To expose us to how assessment instruments are assigned and used in sociological, legal, and political testing. See legal implications.
How did Darwin's work influence (directly and indirectly) tests and measurement
Darwin observed individual differences, which led his cousin Francis Galton to apply the idea to humans through the anthropometric method: metrics for differences between humans, e.g., weight and height.
How did Alfred Binet contribute to tests and measurement
investigated why children didn't do well which led to intelligence testing
Describe how personality tests first got started
In WW1 to place people into different areas in the military . Maturity, leadership, etc.
What is controversial in Kaplan's approach to tutoring
He offered tutoring for standardized tests so people could do better, and built up a tutoring school that still exists today. The assumption remains that we can teach people how to do better on tests. CONTROVERSY = no data to show that tutoring actually helps.
Describe what a professional must do when they notice a particular behavior or set of behaviors
TAKE AN ACTION - they reach conclusion, then take action
The book's authors define a test as:
everything - exams, test, procedure. any way of measuring things
Describe and give examples of how we might assess: Achievement, Personality, Aptitude, Ability
(intelligence and vocational)
achievement: history exam, or exam
Personality: criteria test
Aptitude: how good is someone in sales
ability: skills and competence (intelligence - learn and apply)
Describe and give examples for the five main reasons we do tests: Selection, Placement, Diagnosis, Hypothesis Testing, Classification for career
selection: training, who to train, who to accept
placement: giving a math test to see what course to put them in
diagnoses: mental disorders
hypothesis: did the program work?
classification for career: what will suit you the best?
Explain some pros and cons of different aspects of the subject (p.13): precision, good tools (validity and reliability), take on many forms, interpreted only within context measured, test misuse, achievement tests (p.13-15)
Precision: can measure one's ability, but HOW one solves a problem is difficult to measure.
Tools: sometimes we use the best, sometimes what's convenient. Best = more time and money, but more accurate and reliable.
Take on many forms: paper and pencil, computer, observation. Select the form that best fits the question you're asking.
Interpreted only within the context measured: keep test scores in perspective and understand them within the initial purpose for the testing.
Misuse: know the purpose of the test, who to give it to, its quality, and how to interpret it.
Achievement tests: separate those who know the material from those who don't, but some people are bad test takers.
Give some examples of independent and dependent variables
IV - what's manipulated, e.g., how much alcohol
DV - what's being measured, e.g., how drunk you are
Explain and give examples why as psychologists we act as if ordinal level data is really interval level data
tradition
what are levels of measurement?
nominal, ordinal, interval, ratio
The chapter title is "Welcome to Lake Wobegon..." This comes from:
a fictional town in Minnesota, from a character who reports on the radio: "Well, that's the news from Lake Wobegon, where all the women are strong, all the men are good looking, and all the children are above average."
Why is this relevant to our discussion of percentiles
Because it ignores the fact that there is variability: not everyone can be above average, and scores fall at other places in the distribution.
Define a percentile or percentile rank
Percentage of people who scored lower than you. The point in a distribution below which a given percentage of scores fall.
Define Quartile; Q1, Q2, Q3
Q1 = the 25th percentile, also called first quartile
Q2 = the median, also the 50th percentile
Q3 = the 75th percentile, third quartile.
Anyone at or below the 25th percentile falls in the first quartile, and so on.
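A minimal sketch of the percentile rank definition above (percentage of people who scored lower than you), plus quartile cut points from Python's standard library. The scores are made up, and `statistics.quantiles` uses its own default interpolation rule, so its Q1 and Q3 may differ slightly from a hand method taught in class.

```python
import statistics


def percentile_rank(scores, score):
    """Percentage of scores in the group that fall below the given score."""
    below = sum(1 for s in scores if s < score)
    return 100 * below / len(scores)


# Hypothetical class of 10 test scores.
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Seven of the ten scores are lower than 8.
rank_of_8 = percentile_rank(scores, 8)

# Three cut points that split the scores into four quarters: Q1, Q2 (median), Q3.
q1, q2, q3 = statistics.quantiles(scores, n=4)
```

Here a score of 8 has a percentile rank of 70, and Q2 equals the median of the ten scores.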
Define Stanine
One of nine equal-width segments in a normal distribution, each normally half of one standard deviation wide.
What are some problems with Stanine scores
lumps people together, no individual differences, cannot tell differences in performance, limit to nine categories, assumption that placing in one of nine stanines makes conceptual sense.
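The stanine definition and the lumping problem above can be sketched with the usual conversion from a z-score: multiply by 2, add 5, round to the nearest whole number, and cap the result at 1 and 9. The function name and the example z-scores are hypothetical.

```python
import math


def stanine(z):
    """Convert a z-score to a stanine (1-9), each band half an SD wide."""
    band = math.floor(2 * z + 5.5)   # same as rounding 2z + 5 to the nearest integer
    return max(1, min(9, band))      # the two end stanines are open-ended


# An exactly average score lands in the middle stanine.
average = stanine(0.0)

# Two hypothetical test takers with clearly different z-scores
# end up in the same stanine, so the difference between them is lost.
person_a = stanine(0.30)
person_b = stanine(0.70)
```

Both hypothetical test takers receive a stanine of 6 even though their z-scores differ by 0.4 standard deviations, which is the "lumps people together" problem in miniature.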
Describe the concept of validity
The assessment tool does what it says it does. Validity asks what is being tested; reliability asks how consistently. ADD more items to make it more valid.
Give examples for the different types of validity (p.64)
predictive validity - how well a test outcome is consistent with a criterion that occurs in the future - do high scores in high school predict one doing well in college.
content validity - where the test items sample the universe of items for which the test is designed - achievement tests, certification, licensing. Examine closely to be sure it's accurate.
Criterion validity - when you want to know if test scores are systematically related to other criteria that indicate the test taker is competent in a certain area (correlate scores with another measure which is valid and assess same abilities)
construct validity - if a test measures some underlying psychological construct -how well a test score reflects an underlying construct.
Give example of predictive validity and how the GRE is an example of a test with problems
The GRE doesn't necessarily predict how well one will do in graduate school.
Explain what you would do if your test validity is low in: content validity and construct validity
content - rewrite, consult someone who is in this field and they will rewrite it.
construct - take a better look at the theoretical rationale that underlies the test you developed and
the items you created to reflect that rationale. maybe the definition and theoretical model are underdeveloped.
Describe and give example of how a test score is the combination of a true score and an error score
Never know what a score is, it changes moment to moment, always includes some error.