Intro to ANOVAs
We learned previously that there are 4 things we need to know about
any statistical test:
- What variables can the test handle?
- What statistic does the test generate?
- What distribution does the test use?
- What arguments does the R function require?
Let’s talk about these 4 needs in relation to our next test:
ANOVAs.
- What does ANOVA stand for? Any guesses?
Don’t peek!
ANOVA stands for ANalysis Of VAriance.
Variables
Dependent Variables
Like the t-test, an ANOVA usually takes a single numeric
dependent variable.
Predictor Variables
Predictor/independent variables in ANOVA are called
FACTORS.
In R, these variables must be factors (do you see the
connection?).
In other words, ANOVA can only take categorical predictor
variables. No numbers allowed.
How Many Factors?
An ANOVA can have several different Factors. But try not to go
overboard. ANOVA can handle it - but can your brain?
How Many Levels Per Factor?
This is a good question, since we know that the t-test can
only accept one 2-level ‘factor’.
An ANOVA’s factors can have 2 or more levels. There’s really
no computational limit. But there are practical limits - again, don’t
add too many levels, or you will drive yourself insane.
ANOVA Variants
The rules for ANOVA variables described above don’t always apply;
there are variants of the classic ANOVA.
MANOVA
MANOVA stands for Multivariate analysis of variance. MANOVA is a
variant of ANOVA that can take multiple dependent
variables.
- If you want to know more about this, explore the manova() function
in R
- Once you know how to do a regular ANOVA, MANOVA isn’t too hard
ANCOVA
ANCOVA stands for analysis of covariance. ANCOVA can take
factors as predictors, as well as continuous (numeric) predictors. These
continuous predictors are called covariates.
- A covariate is a variable that you want to control for,
statistically
- You aren’t usually directly interested in analyzing a covariate
BUT
- You want to see if the effect of your other variables is still
significant when this covariate is accounted for
- In other words, covariates are usually potential
confounds.
- We’ll talk more about covariates when we talk about regression
ANCOVAs are pretty easy in R – you don’t even need a special
function.
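For instance, once you've learned the aov() function later in this lesson, a covariate is just one more term in the model formula. Here's a minimal sketch using hypothetical variable names (mydata, score, group, and age) that I've made up purely for illustration:
# A minimal ANCOVA sketch (hypothetical data frame 'mydata' with a factor 'group',
# a numeric outcome 'score', and a numeric covariate 'age')
my_ancova <- aov(score ~ group + age, data = mydata)
summary(my_ancova)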
- An ANOVA needs a dependent variable. What sort of variable
(number, factor, string) should this dependent variable
be?
- How many dependent variables can a regular ANOVA take? If
you want more dependent variables, what sort of ANOVA-like test should
you perform?
- An ANOVA needs predictor/independent variable(s). What sort
of variable (number, factor, string) should these be? Give an example of
a variable that would work and a variable that would NOT work.
- What is the specific term for a predictor/independent
variable in an ANOVA?
- If you want to include a different kind of variable in an ANOVA,
what sort of ANOVA-like test should you perform?
- How many variables can an ANOVA take as predictor/independent
variable(s)?
- What are the levels of the following variables?
- Sex: Male, Female
- Airports: JFK, LaGuardia, Newark
- Car: Mazda, Cadillac, Dodge, Toyota
- How many levels can a given predictor/independent variable
have in ANOVA?
- Describe a hypothetical study where an ANOVA would be the
appropriate test to use and a t-test would not be.
The F Statistic
ANOVA uses the F statistic.
Computing the F statistic takes 4 steps:
- Compute Sum of Squares
- SSbetween = Sum of Squares Between
- SSwithin = Sum of Squares Within
- Figure out degrees of freedom
- dfbetween = degrees of freedom between
- dfwithin = degrees of freedom within
- Compute Mean Squares
- MSbetween = SSb / dfb
- MSwithin = SSw / dfw
- Compute the F statistic
Step 1: Compute Sum of Squares
Total Sum of Squares
Computing total sum of squares (SStot) isn’t necessary for
ANOVA, but you might find this conceptually helpful. Keep in mind
that:
SStot = SSbetween + SSwithin
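In symbols, where \(x_{ij}\) is data point \(i\) in group \(j\), \(\bar{x}_j\) is the mean of group \(j\), \(n_j\) is the number of data points in group \(j\), and \(\bar{x}\) is the grand mean:
\[
SS_{tot} = \sum_j \sum_i (x_{ij} - \bar{x})^2, \qquad
SS_{between} = \sum_j n_j(\bar{x}_j - \bar{x})^2, \qquad
SS_{within} = \sum_j \sum_i (x_{ij} - \bar{x}_j)^2
\]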
if (!require("nycflights13")) install.packages("nycflights13")
library(nycflights13)

In the plot above, the black vertical line represents the Grand Mean,
the average of all data points.
The gray dots represent the individual data points. The blue line is
a violin plot showing the distribution of data.
The thin horizontal black lines represent the distance of some of the
data points (gray dots) from the grand mean (vertical black line). To
compute the total sum of squares (SStot), we square all these
distances, then add them all up. We have summed the
squares of the distances of each data point from the grand
mean. See where the name sum of squares comes from?
SSbetween
To compute SSbetween, we do this same thing, except
instead of using the individual data points, we use the mean values for
the different groups.
Here’s a violin plot of the flights data, but now we’ve
divided it into three groups, based on the three NYC airports.

To compute the SSbetween, we subtract each group’s mean
from the grand mean (the black horizontal lines in the zoomed-in plot
below). Then we square these differences, counting each group’s squared
difference once for every data point in that group, and add them all up (we
sum the squares).
The SSbetween acts as an index of how separated the
different groups are from each other:
- If SSbetween is small, the groups are all packed pretty
tightly together (and closer to the grand mean).
- If SSbetween is big, the groups are spaced far apart from
each other (and far from the grand mean).
In other words, SSbetween is a number that expresses the
variance between groups.

SSwithin
And now on to SSwithin. For this one, we are back to
looking at the individual data points. BUT instead of finding the
distance from each point to the grand mean, like we did for
SStotal, we’ll be finding the distance from each data point
to the mean of its group (note how the black horizontal lines
in the plot below go from a data point to the group mean, NOT
to the grand mean). Then we square these differences and add them all up
(we sum the squares).
The SSwithin acts as an index of how spread out the data
points are inside (within) the groups:
- If SSwithin is small, the data points are all packed
pretty tightly together, so the group’s distribution is narrow.
- If SSwithin is big, the data points are spaced far apart,
so the group’s distribution is broad.
In other words, SSwithin is a number that expresses the
variance within groups.
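If it helps to see the arithmetic, here is a small sketch using a made-up toy data set (invented purely for illustration) that computes all three sums of squares by hand and confirms that they add up:
# A toy example: 3 groups of 4 data points each (made-up numbers)
library(dplyr)

toy_data <- data.frame(
  group = rep(c("A", "B", "C"), each = 4),
  score = c(2, 3, 4, 3,  5, 6, 6, 7,  9, 8, 10, 9)
)

grand_mean <- mean(toy_data$score)   # the mean of all 12 data points

by_group <- toy_data %>%
  group_by(group) %>%
  mutate(group_mean = mean(score)) %>%   # each data point gets its group's mean
  ungroup()

SS_total   <- sum((by_group$score - grand_mean)^2)           # distances from the grand mean
SS_within  <- sum((by_group$score - by_group$group_mean)^2)  # distances from each group's mean
SS_between <- sum((by_group$group_mean - grand_mean)^2)      # group means vs. grand mean,
                                                             # one term per data point
SS_total                  # 78
SS_between + SS_within    # 72 + 6 = 78, the same number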

- What is a ‘Sum of Squares’? What specifically is squared and then
summed?
Below is some made-up data. Use it to answer the next two
questions.
| Observation | Group 1 | Group 2 | Group 3 |
|---|---|---|---|
| 1 | 5 | 7 | 3 |
| 2 | 4 | 6 | 3 |
| 3 | 7 | 6 | 3 |
| 4 | 6 | 7 | 5 |
| 5 | 5 | 7 | 4 |
| 6 | 5 | 5 | 1 |
| 7 | 4 | 8 | 5 |
| 8 | 5 | 3 | 2 |
| 9 | 6 | 6 | 4 |
| 10 | 5 | 8 | 4 |
| Group Means: | 5.2 | 6.3 | 3.4 |
| Grand Mean: | 4.9666667 | | |
- How would you calculate the SSBetween? What numbers would
you use (individual observations, group means, grand mean)?
- How would you calculate the SSwithin? What numbers would
you use (individual observations, group means, grand mean)?

- Make and complete the following table, placing the graph labels (A,
B, C, or D) in the appropriate cell.
| | Bigger SSBetween | Smaller SSBetween |
|---|---|---|
| Bigger SSWithin | | |
| Smaller SSWithin | | |
Step 2. Figure Out Degrees of Freedom
Next, we need the degrees of freedom:
- dfbetween = k - 1, where k is the number of groups (levels of the factor)
- dfwithin = N - k, where N is the total number of data points
Step 3. Compute Mean Squares
Now to compute the mean squares. This part is just math:
- MSbetween = SSb / dfb
- MSwithin = SSw / dfw
Remember that the mean is the sum of a set of data points divided by
the total number of data points. Similarly, the mean square is
the sum of squares divided by the degrees of freedom
(the number of observations that were free to vary).
Some important notes:
- If you have more than one factor in your ANOVA, each will have its
own mean square. You might see these referred to by their factor names,
e.g. ‘mean square airport’ or MSairport.
- MSwithin is also often referred to as ‘mean square error’
(MSe) or ‘mean square residual’ (MSr).
- Complete the following sentence: To compute MSBetween,
divide ______ by ______.
Imagine you are reading a journal article about the Instagram study
described above. Answer the following questions:
- You see the term MSAgeGroup (the mean square for the Age Group factor). What Mean
Square is this referring to? MSBetween or MSWithin?
- You see the term ‘mean square error’ (MSe) or ‘mean
square residual’ (MSr). What Mean Square is this referring
to? MSBetween or MSWithin?
Step 4. Compute the F statistic
Now, finally, we can compute the F statistic. If we
have more than one factor, we do this separately for each one.
F = MSb / MSw
F Statistic
The F statistic will be bigger if MSbetween
(numerator of the fraction) is bigger. In other words,
F is big if the different groups are farther apart.
The F statistic will be smaller if MSwithin
(denominator of the fraction) is bigger. In other words,
F is big when the groups are more compact and don’t overlap as
much.
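To make steps 2–4 concrete, here is a sketch that continues the made-up toy example from earlier (3 groups, 12 data points, SSbetween = 72, SSwithin = 6):
# Degrees of freedom
k <- 3                 # number of groups
N <- 12                # total number of data points
df_between <- k - 1    # 2
df_within  <- N - k    # 9

# Mean squares
MS_between <- 72 / df_between   # SSbetween / dfbetween = 36
MS_within  <- 6 / df_within     # SSwithin / dfwithin = 0.67 (rounded)

# The F statistic
MS_between / MS_within          # F = 54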
- Complete the following sentence: To compute the F
Statistic, divide ______ by ______.
- Place the following things in the appropriate slots in the table
below:
- Bigger difference between the groups (Bigger
MSBetween)
- Larger standard deviation within each group (Bigger
MSWithin)
- Bigger N (sample size; Smaller MSWithin)
The F Distribution
Of course, now that we have our F statistic, we need a
distribution to compare it to. Unsurprisingly, we’ll use the F
distribution. The shape of the F distribution depends on both
the dfbetween and the dfwithin. It looks like this:
Copy the code chunk below into your R markdown document. Run it. Set
the Alpha Level to 0.05, then play around with the Degrees of Freedom
Between and Degrees of Freedom Within. Then answer the questions
below.
if (!require("shiny")) install.packages("shiny")
library(shiny)
shinyApp(
ui = fluidPage(
fluidRow(
column(4, wellPanel(
sliderInput("df1", label = h3("Degrees of Freedom (Between)"), min = 1,
max = 10, value = 1),
)),
column(4, wellPanel(
sliderInput("df2", label = h3("Degrees of Freedom (Within)"), min = 1,
max = 1000, value = 1),
)),
column(4,wellPanel(
sliderInput("alpha", label = h3("Alpha Level"), min = 0,
max = 0.5, value = 0),
))),
fluidRow(
column(12, offset = 0,
plotOutput("plot")
)
)),
server = function(input, output) {
output$plot = renderPlot({
ggplot(data.frame(x = c(-1, 8)), aes(x=x)) + theme_bw() +
stat_function(fun = df, geom = "area", fill = "red1",
xlim = c(qf(1 - input$alpha, df1 = input$df1, df2 = input$df2), 8),
args = list(df1 = input$df1, df2 = input$df2),
color = "red", size = 2) +
stat_function(fun = df, args = list(df1 = input$df1, df2 = input$df2), color = "black", size = 2) +
scale_x_continuous(breaks = -1:8, labels = -1:8) +
labs(
x = expression(italic("F")),
y = expression(paste("P(", italic("F"), ")")),
title = expression(paste("The ", italic("F"), " Distribution"))
) + coord_cartesian(xlim = c(-1, 8)) + theme(text = element_text(size = 30))
})
},
options = list(height = 900)
)
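(By the way, you don't need the app to work with the F distribution; base R's pf() and qf() functions cover it. For example, using the toy F value of 54 with 2 and 9 degrees of freedom from the sketch above:)
# p value: the probability of an F at least this large if the null hypothesis is true
pf(54, df1 = 2, df2 = 9, lower.tail = FALSE)

# critical value: the F you would need to exceed for significance at alpha = .05
# (the left edge of the red area in the app)
qf(0.95, df1 = 2, df2 = 9)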
- What happens to the shape of the F distribution as you
increase DFBetween?
- What happens to the red area (alpha level) as you increase the
DFBetween?
- What happens to the shape of the F distribution as you
increase DFWithin?
- What happens to the red area (alpha level) as you increase the
DFWithin?
- In general, does increasing the degrees of freedom make it easier or
harder to get a statistically significant result? Why?
- If you are in charge of collecting data for a study, what can you do
to increase DFBetween? What about DFWithin? Which
change do you think is more impactful in helping you find a significant
result?
Arguments
Now that we know how ANOVA works, let’s learn about how to implement
them in R. We’ll start simple, using the example from the
flights data.
Flights_ANOVA <- aov(dep_delay ~ origin, data = flights)
Note the following:
- We are using the aov() function to run the ANOVA. There are other
functions we could use, but this works fine for a design with a single
variable.
- We have to specify a formula, just like we did with the
t-test.
- We have to specify what data we are using, just like we did with the
t-test.
Here is what the output looks like. Note we use the summary()
function here; it makes the results more readable.
summary(Flights_ANOVA)
## Df Sum Sq Mean Sq F value Pr(>F)
## origin 2 1280515 640258 396.9 <0.0000000000000002 ***
## Residuals 328518 529886717 1613
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 8255 observations deleted due to missingness
Notice:
- There are 5 columns:
- Df = degrees of freedom
- Sum Sq = Sum of Squares
- Mean Sq = Mean Square
- F value = the F value. I feel like this one was
obvious.
- Pr(>F) = the p value. Remember that 2e-16 means
“a really tiny number”.
- And 2 rows:
- origin. This row reports dfbetween,
SSbetween, MSbetween, and the F statistic
and p values for our origin factor.
- Residuals. This row reports dfwithin,
SSwithin, and MSwithin.
- These numbers were used to compute the F statistic. Check
for yourself. 640257.6064865 / 1612.9609862 = 396.9455008
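If you'd rather not retype those numbers, you can pull them out of the summary object; here's a sketch (assuming, as I believe is the case, that summary() on an aov object returns a list whose first element is the ANOVA table):
# Extract the ANOVA table and recompute F by hand
flights_table <- summary(Flights_ANOVA)[[1]]
flights_table[["Mean Sq"]][1] / flights_table[["Mean Sq"]][2]  # MS origin / MS residuals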
Was the result statistically significant?
ANOVA is an omnibus test.
Notice that even though we have 3 levels in our variable, ANOVA only
reported 1 p-value. What does this mean?
ANOVAs don’t compare every level to every other level. Instead, the
ANOVA looks for evidence that at least one of the levels is
different from at least one of the other levels. It indicates
to you that “one of these things is not like the others”. BUT it doesn’t
tell you which.
Big Bird is an ANOVA
Following up on ANOVA results
If you get a statistically significant result in ANOVA, do the
following:
- Graph your data to see which level of the variable appears to be
different.
- If necessary, do some t-tests to compare pairs of levels
together.
Graph the data and look at the graph
Let’s start with a graph. Armed with the knowledge that there are
differences between at least two of the groups, we can interpret this
graph with confidence.
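(The plotting code isn't shown here, but a sketch along these lines, comparing the mean departure delay for each airport, would produce a graph similar to the one below:)
# One way to graph the group means (a sketch, not necessarily the exact plot shown)
library(dplyr)
library(ggplot2)
flights %>%
  group_by(origin) %>%
  summarise(mean_delay = mean(dep_delay, na.rm = TRUE)) %>%
  ggplot(aes(x = origin, y = mean_delay, fill = origin)) +
  geom_col() +
  labs(x = "Airport of Origin", y = "Mean Departure Delay (minutes)")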

It looks like all the groups are different from each
other.
Follow-up t-tests.
We can confirm our observations using the t_test() function
from the rstatix package:
if (!require("rstatix")) install.packages("rstatix")
library(rstatix)
t.test.results <- flights %>%
t_test(dep_delay ~ origin) %>%
adjust_pvalue(method = "none") %>%
add_significance()
knitr::kable(t.test.results)
| .y. | group1 | group2 | n1 | n2 | statistic | df | p | p.adj | p.adj.signif |
|---|---|---|---|---|---|---|---|---|---|
| dep_delay | EWR | JFK | 117596 | 109416 | 17.76196 | 226958.1 | 0 | 0 | **** |
| dep_delay | EWR | LGA | 117596 | 101509 | 27.36163 | 216266.2 | 0 | 0 | **** |
| dep_delay | JFK | LGA | 109416 | 101509 | 10.24619 | 208866.4 | 0 | 0 | **** |
We’ll use this one later, when we talk about interactions.
Describing ANOVA results
Now that we understand our ANOVA results, we want to communicate them
to others. At a minimum, we need to include the following:
- The F statistic, with both degrees of freedom.
- the Mean Square Residual, usually called the Mean Square Error or
MSe
- The p value.
- A description of any follow-up tests.
- A plain-language description of the interpretation.
You might be asked to include other information, such as
- partial eta squared, a measure of effect size (one way to compute it is sketched below).
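For a one-way design, partial eta squared can be computed directly from the Sum of Squares column of the ANOVA table. A sketch using the numbers from the flights output above:
# partial eta squared = SS_effect / (SS_effect + SS_error)
SS_origin    <- 1280515      # Sum Sq for origin
SS_residuals <- 529886717    # Sum Sq for Residuals
SS_origin / (SS_origin + SS_residuals)   # about 0.0024, the value reported below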
So, our paragraph might look something like this:
There was a significant main effect of Airport (F(2, 328518)
= 396.9, MSe = 1613, p < .001, \(\eta^2_p\) = 0.0024). Post-hoc comparisons
revealed that EWR has the longest delays, JFK the second longest, and
LGA the shortest.
In order to do the ANOVAs that follow, you will need to install and
load the datarium and Stat2Data packages. Add this
code to your setup chunk.
if (!require("datarium")) install.packages("datarium")
library(datarium)
if (!require("Stat2Data")) install.packages("Stat2Data")
library(Stat2Data)
Doing a One-Way ANOVA
A One-Way ANOVA has a single predictor/independent variable. We’ll
start with a data set about Alzheimer’s. Run the following code:
data(Amyloid)
?Amyloid
Amyloid
## Group Abeta
## 1 NCI 114
## 2 NCI 41
## 3 NCI 276
## 4 NCI 0
## 5 NCI 16
## 6 NCI 228
## 7 NCI 927
## 8 NCI 0
## 9 NCI 211
## 10 NCI 829
## 11 NCI 1561
## 12 NCI 0
## 13 NCI 276
## 14 NCI 959
## 15 NCI 16
## 16 NCI 24
## 17 NCI 325
## 18 NCI 49
## 19 NCI 537
## 20 MCI 73
## 21 MCI 33
## 22 MCI 16
## 23 MCI 8
## 24 MCI 276
## 25 MCI 537
## 26 MCI 0
## 27 MCI 569
## 28 MCI 772
## 29 MCI 0
## 30 MCI 260
## 31 MCI 423
## 32 MCI 780
## 33 MCI 1610
## 34 MCI 0
## 35 MCI 309
## 36 MCI 512
## 37 MCI 797
## 38 MCI 24
## 39 MCI 57
## 40 MCI 106
## 41 mAD 407
## 42 mAD 390
## 43 mAD 1154
## 44 mAD 138
## 45 mAD 634
## 46 mAD 919
## 47 mAD 1415
## 48 mAD 390
## 49 mAD 1024
## 50 mAD 1154
## 51 mAD 195
## 52 mAD 715
## 53 mAD 1496
## 54 mAD 407
## 55 mAD 1171
## 56 mAD 439
## 57 mAD 894
Now examine the data and read the help page in the ‘help’ tab. Then
answer the following questions:
- What is the Dependent variable?
- What is the Predictor variable? How many levels does it have, and
what are they?
- Is this data within- or between-subjects?
- Now set up an ANOVA to test whether the levels of the Predictor
variable are different.
- Don’t forget to use the summary() function to show your
results.
- Your output should look like this:
## Df Sum Sq Mean Sq F value Pr(>F)
## Group 2 2129969 1064985 5.971 0.00454 **
## Residuals 54 9632060 178371
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Interpreting ANOVA Output
- What is the DFBetween and DFWithin for this
data?
- Which Mean Squares is MSBetween and which is
MSWithin?
- Which two numbers were divided to get the F value of 5.971?
- Are the levels of the Predictor statistically different? How do you
know?
Following Up
- Explain what the following statement means in your own words: “ANOVA
is an omnibus test”.
- Show how you can get summary statistics for the levels of the
Predictor, like this:
## # A tibble: 3 × 3
## Group mean_Amyloid sd
## <fct> <dbl> <dbl>
## 1 mAD 761. 427.
## 2 MCI 341. 406.
## 3 NCI 336. 436.
- Now copy the following code into your document. Use comments to
explain what each line of code is doing:
Amyloid %>% t_test(Abeta ~ Group) %>%
adjust_pvalue(method = "none") %>%
add_significance()
## # A tibble: 3 × 10
## .y. group1 group2 n1 n2 statistic df p p.adj p.adj.signif
## <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 Abeta mAD MCI 17 21 3.08 33.6 0.004 0.004 **
## 2 Abeta mAD NCI 17 19 2.95 33.7 0.006 0.006 **
## 3 Abeta MCI NCI 21 19 0.0358 36.9 0.972 0.972 ns
- Now write a brief 1-2 sentence APA-style summary of the results.
When you interpret ANOVA results, you want to do 3 things:
- Describe the results numerically. For this ANOVA, you must include
the F statistic, both degrees of freedom, the MSe
(mean square error, AKA MSWithin) and the p value.
Like this: There was a main effect of Group (F(2,12) = 4.938,
MSe = 1.067, p = .027).
- Describe any follow-up t-tests you did. Like this:
Follow-up t-tests indicated that …
- Describe the results in plain language. Like with words that a human
would use to talk to another human. Like this: These results show that
XXX Group had higher Amyloid Plaque than …
Within-subjects vs. Between-subjects Designs
So that’s how you do a basic one-way ANOVA. Before we proceed, we
need to (re)learn the distinction between within-subjects and
between-subjects data.
What is the subject in between- and
within-subjects designs?
The subject is the person (or thing) providing the data.
- All swans are white. The subjects are the specific swans
that we examine.
- Every bald person is smart. The subjects are the specific
bald people that we give IQ tests to.
- Do men, on average, weigh more than women? The subjects are
the specific men and women we weigh.
- Are the mice heavier post-treatment than pre-treatment? The
subjects are the specific mice that we weigh.
Between-Subjects Data
In between-subjects data, each subject is compared to a
different subject.
The data we used above - flights departing from 3 different airports
- is between-subjects data. The subjects - the specific thing
providing data - are the individual flights; each flight had its own
departure delay (dependent variable) and each flight had an airport of
origin (factor). We know that each flight left either EWR, JFK,
or LGA; it is impossible for a given airplane flight to leave
from two different airports! So when we compared the average departure
delay between these three airports, we were comparing different
flights to each other. That’s between-subjects data.
Another way to think about between-subjects data is to ask “How many
data points did each subject give us?”. Since each flight only gave one
data point (i.e. it only had one departure delay value), this is
between-subjects data.
When the study’s purpose is to compare two different groups
(e.g. ADHD vs. control, Americans vs. Europeans, rural vs. urban), the
design and data are usually between-subjects.
Let’s consider some of the examples given above:
- All swans are white. Each swan only gives us one data point
- black or white - so this is between-swans data.
- Every bald person is smart. Each person is either
bald or not; we’re comparing different people when we compared
baldies and hairies. Also, each person’s IQ is only going to be measured
once.
- Do men, on average, weigh more than women? When we are
comparing men to women, we are comparing two groups of different people.
So this is between-subjects.
- Are the mice heavier post-treatment than pre-treatment? In
this case, we are measuring each mouse twice - before and after some
treatment. This means that we are comparing each mouse to
itself pre- vs. post-treatment. This is NOT between-subjects
data.
Within-Subjects Data
The mouse example above represents within-subjects data.
Within-subjects data is often called repeated-measures
data. In this kind of data, each subject gives more than one data
point, so that an individual subject is being compared to
him/herself. Within-subjects data is data where participants
experienced more than one level of a variable. In other words, 2 or more
data points came from the same participant. This is conceptually much
like the paired-samples t-test we saw earlier.
NOTE: More than one data point means more than one
measurement of the same variable. If I take an IQ test
and a personality test, that’s not within-subjects data because those
are different variables. If I take an IQ test before and after I drink a
bunch of Mountain Dew, that’s within-subjects data because the SAME
variable is measured twice.
For example, suppose our participant Tim was part of a study about
listening to music while studying. He came in and studied for an hour
while listening to instrumental classical music. Then he took a test on
the material he studied. The next week he studied for an hour listening
to classical music with lyrics. Then he took a test on that material.
For Tim, Lyrics is a within-subjects variable - he experienced Lyrics
AND No Lyrics. Because of this, 2 different data points both came from
Tim. And because these data points are both from the same person, they
are not truly independent.
Other examples:
- Are my students learning? If I compare the same students at
the beginning and end of the semester, that’s within-subjects data.
- Do I look better in blue? If you try on 6 blue outfits and
6 red ones and get people’s opinion, that’s within-subjects data: since
you wore both the blue and the red, you’re comparing blue you to red
you.
What do we do with within-subjects data?
The big difference between between-subjects and
within-subjects data is that within-subjects data are
not independent; because multiple data points come from the same person,
those data points “belong together” - they are not free to vary. We
already saw this in the last section when we learned about paired
t-tests, where we wanted to link up the data points that
belonged in neat pairs.
So when we do a within-subjects AKA
repeated-measures ANOVA, we have to tell the ANOVA which data
points belong to the same subjects. In the next section, we’ll see how
that is done. But first, I have some questions:
- What is a between-subjects variable? Give a definition AND an
example.
- What is a within-subjects variable? Give a definition AND an
example.
- Define the term repeated-measures. Give an example of a
repeated-measures study. Is a repeated-measures variable
between- or within-subjects?
- Look at this data. What is/are the variables? Is each variable
between- or within-subjects?
## Quiz Johnny_Appleseed Paul_Bunyan John_Henry
## 1 1 61 53 50
## 2 2 60 50 51
## 3 3 60 50 48
## 4 4 64 51 51
## 5 5 61 51 53
## 6 6 61 51 49
## 7 7 58 50 48
## 8 8 62 49 49
## 9 9 62 50 51
## 10 10 61 47 49
- Now look at this data about cars. Is this between- or
within-subjects data?
## mpg cyl disp hp drat wt qsec vs am gear carb hp_cat
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 Low
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 Low
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 Lowest
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 Low
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 Average
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 Low
- Now look at this data about dogsledding. Is this between- or
within-subjects data?
| Jerry Sousa | 1 | 243 |
| Jerry Sousa | 2 | 176 |
| Jerry Sousa | 3 | 304 |
| Jerry Sousa | 4 | 201 |
| Melissa Owens | 1 | 215 |
| Melissa Owens | 2 | 421 |
| Melissa Owens | 3 | 334 |
| Melissa Owens | 4 | 220 |
- Now look at this data about diamonds. Is this between- or
within-subjects data?
head(diamonds)
## # A tibble: 6 × 10
## carat cut color clarity depth table price x y z
## <dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1 0.23 Ideal E SI2 61.5 55 326 3.95 3.98 2.43
## 2 0.21 Premium E SI1 59.8 61 326 3.89 3.84 2.31
## 3 0.23 Good E VS1 56.9 65 327 4.05 4.07 2.31
## 4 0.29 Premium I VS2 62.4 58 334 4.2 4.23 2.63
## 5 0.31 Good J SI2 63.3 58 335 4.34 4.35 2.75
## 6 0.24 Very Good J VVS2 62.8 57 336 3.94 3.96 2.48
- Generally speaking, if a data set is in wide format, is it most
probably between- or within-subjects?
Repeated-Measures One-Way ANOVA
For this analysis, I’ll be using data from a reading study:
Vasilev, M. R., Hitching, L., & Tyrrell, S. (2023). What makes
background music distracting? Investigating the role of song lyrics
using self-paced reading. Journal of Cognitive Psychology,
1-27.
The OSF page is here: https://osf.io/8zw4x/ The specific data we will use is
here: https://osf.io/7y3v9
This study tested people’s reading rates under 3 conditions:
- Silence
- Lyrical Music
- Instrumental Music
The goal of the study was to test whether background music affected
how quickly people read, and whether it matters whether the music had
lyrics or not.
Here’s the data for one subject (the authors conveniently use the
word subject to label the column that indicates which
subject):
| 1 | 13 | silence | 0.7016369 | 53 | 75.53765 | Experiment 1a |
| 1 | 10 | silence | 0.5253559 | 49 | 93.27011 | Experiment 1a |
| 1 | 4 | silence | 0.5805464 | 53 | 91.29331 | Experiment 1a |
| 1 | 7 | silence | 0.5305413 | 52 | 98.01310 | Experiment 1a |
| 1 | 1 | silence | 0.6064904 | 58 | 95.63217 | Experiment 1a |
| 1 | 5 | lyrical | 0.5067230 | 52 | 102.62017 | Experiment 1a |
| 1 | 14 | lyrical | 0.4038367 | 50 | 123.81242 | Experiment 1a |
| 1 | 11 | lyrical | 0.4633005 | 53 | 114.39659 | Experiment 1a |
| 1 | 8 | lyrical | 0.4066892 | 50 | 122.94400 | Experiment 1a |
| 1 | 2 | lyrical | 0.5125489 | 57 | 111.20890 | Experiment 1a |
| 1 | 9 | instrumental | 0.3108060 | 45 | 144.78486 | Experiment 1a |
| 1 | 12 | instrumental | 0.5378115 | 61 | 113.42264 | Experiment 1a |
| 1 | 6 | instrumental | 0.3790384 | 52 | 137.18925 | Experiment 1a |
| 1 | 3 | instrumental | 0.4976833 | 56 | 112.52136 | Experiment 1a |
| 1 | 15 | instrumental | 0.3870599 | 45 | 116.26107 | Experiment 1a |
Notice that this person read some passages in silence, others while
listening to lyrical music, and still others while listening to
instrumental music. So, when we are comparing reading rates for silence,
lyrical, and instrumental music, we will be comparing subject 1 to
subject 1 to subject 1 (and the same for the other subjects). This is
within-subjects data for this reason.
Doing a repeated-measures ANOVA
So let’s do the ANOVA.
NOTE: There were multiple passages read in each sound condition. Some
data points were missing (see “The Horrors of Unbalanced Data”, below)
so I averaged the observations together.
NOTE: We’ve filtered the data because this study contains multiple
experiments, and the last one included an extra level of the sound
factor called “speech”, which I have removed.
NOTE: I’ve created a new subject label called subjectALL, which is a
combination of subject number and experiment number. I did this because
subject 1 in experiment 2 is NOT the same person as subject 1 in
Experiment 3, so they need to be labeled uniquely.
Reading_ANOVA <- aov(wpm ~ sound + Error(subjectALL), data = readingrate)
# NOTE: WPM stands for Words Per Minute, a measure of reading rate.
Before we get to the output, let’s consider the difference here. The
main change that makes this a within-subjects ANOVA is the
addition of the Error() term to the formula. This term says
“use the subjectALL column to group this data, and treat the
sound variable as a within-subjects variable”.
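(As an aside: if the Error() syntax feels cryptic, the rstatix package has a wrapper called anova_test() where you name the subject column and the within-subjects factor explicitly. I believe a sketch like the one below would give an equivalent test, though we'll stick with aov() here.)
# A sketch using rstatix::anova_test() (dv = dependent variable,
# wid = subject identifier, within = within-subjects factor)
library(rstatix)
readingrate %>%
  anova_test(dv = wpm, wid = subjectALL, within = sound)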
Interpreting repeated-measures output
##
## Error: subjectALL
## Df Sum Sq Mean Sq F value Pr(>F)
## Residuals 819 4679985 5714
##
## Error: Within
## Df Sum Sq Mean Sq F value Pr(>F)
## sound 2 5288 2643.8 5.913 0.00276 **
## Residuals 1638 732401 447.1
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Notice that the ANOVA is split into two parts. We are only interested
in the “Error: Within” part. Is there a significant main effect of
sound? What can you conclude from the follow-up t-tests
below?
library(rstatix)
t.test.results <- readingrate %>%
t_test(wpm ~ sound, paired = TRUE) %>% # paired = TRUE because this is within-subjects data
adjust_pvalue(method = "none") %>%
add_significance()
knitr::kable(t.test.results)
| .y. | group1 | group2 | n1 | n2 | statistic | df | p | p.adj | p.adj.signif |
|---|---|---|---|---|---|---|---|---|---|
| wpm | instrumental | lyrical | 820 | 820 | 3.414152 | 819 | 0.000671 | 0.000671 | *** |
| wpm | instrumental | silence | 820 | 820 | 1.670099 | 819 | 0.095000 | 0.095000 | ns |
| wpm | lyrical | silence | 820 | 820 | -1.793848 | 819 | 0.073000 | 0.073000 | ns |
One-Way Within-Subjects (Repeated Measures)
Now let’s do an ANOVA with within-subjects data!
Run the following code, then look at the data and the help page.
data(Fingers)
?Fingers
Fingers
- Can you tell that this data is within-subjects? How?
- Now set up an ANOVA to test whether the levels of the Predictor
variable are different.
- To tell R that the data is within-subjects, add this to the Formula:
Error(as.factor(Subject))
- Don’t forget to use the summary() function to show your
results.
- Your output should look like this:
##
## Error: Subject
## Df Sum Sq Mean Sq F value Pr(>F)
## Residuals 3 5478 1826
##
## Error: Within
## Df Sum Sq Mean Sq F value Pr(>F)
## Drug 2 872 436.0 7.88 0.021 *
## Residuals 6 332 55.3
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
- Get the means and standard deviations for the three types of Drugs.
Show your code and the output.
- Look at the following code. How is it different from the code in
40? More importantly, WHY is it different from the code
in 40?
Fingers %>% t_test(TapRate ~ Drug, paired = TRUE) %>%
adjust_pvalue(method = "none") %>%
add_significance()
## # A tibble: 3 × 10
## .y. group1 group2 n1 n2 statistic df p p.adj p.adj.signif
## <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 TapRate Caffeine Placebo 4 4 4.08 3 0.026 0.026 *
## 2 TapRate Caffeine Theobromine 4 4 -0.289 3 0.791 0.791 ns
## 3 TapRate Placebo Theobromine 4 4 -4.50 3 0.02 0.02 *
- Now write a brief 1-2 sentence APA-style summary of the
results.
Factorial Designs
So you can use an ANOVA with a single factor, as in the example we’ve
already seen:
- Hypothesis: Listening to music while studying will
affect reading rate.
- Dependent Variable: Reading Rate.
- Factor: Sound: Silence, Lyrical Music, Instrumental
Music.
An ANOVA with a single factor is called a one-way ANOVA. This simple
experiment can’t be analyzed using a t-test, because the
predictor variable (Sound) has 3 levels.
However, ANOVAs are ideal for factorial designs, experiments
with more than one factor.
If we were to change our example ANOVA to be a factorial
design, we would add a second factor:
- Hypothesis 1: Listening to music while studying
will affect reading rate.
- Hypothesis 2: Familiar songs will be less
distracting
- Dependent Variable: Reading Rate.
- Factor 1: Sound: Silence, Lyrical Music,
Instrumental Music.
- Factor 2: Familiarity (Levels:
participants know the songs, songs are unknown to the participants)
| | Familiar | Unfamiliar |
|---|---|---|
| Lyrical Music | “Stayin’ Alive” by the BeeGees | Pretty much anything else by the BeeGees |
| Instrumental Music | Theme from Star Wars | Theme from 80’s TV show Airwolf |
readingrate <- readingrate %>%
mutate(MusicFamiliarity = case_when(
Experiment == "Experiment 1a" | Experiment == "Experiment 1b" ~ "Familiar",
TRUE ~ "Unfamiliar"
)) # Here I make the new Factor 'MusicFamiliarity'
Main Effects
The separate effects of the different factors in an ANOVA are called
Main Effects. A main effect is the independent effect
of a factor on the dependent variable, separate from any other
factor.
A one-way ANOVA has only one factor, so only one main effect. A
two-way ANOVA has two factors, so two main effects. And so on.
Interactions
Main effects are fine, but the real reason to do a factorial design
is to look at Interactions. While a main
effect tells us how one factor influences the dependent variable, an
interaction explores how two (or more) factors work together to
influence the dependent variable. For example, we might ask:
- Does the effect of music on reading rate get stronger when the music
is familiar?
We are asking if Sound (Factor 1) affects reading rate (dependent
variable) differently for Familiar vs. Unfamiliar songs (Factor 2).
While main effects test each factor separately, an
Interaction is a test of how two factors work together
to influence the dependent variable.
Arguments in a Factorial Design
How would we set up this factorial design in ANOVA? We can start with
what we know how to do: An ANOVA with a single factor.
Reading_ANOVA_Factorial <- aov(wpm ~ sound + Error(subjectALL), data = readingrate)
Now we add our other factors, separating each with a plus (+).
Reading_ANOVA_Factorial <- aov(wpm ~ sound + MusicFamiliarity + Error(subjectALL), data = readingrate)
And then we tell R to give us the output.
summary(Reading_ANOVA_Factorial)
##
## Error: subjectALL
## Df Sum Sq Mean Sq F value Pr(>F)
## MusicFamiliarity 1 344160 344160 64.93 0.00000000000000274 ***
## Residuals 818 4335825 5301
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Error: Within
## Df Sum Sq Mean Sq F value Pr(>F)
## sound 2 5288 2643.8 5.913 0.00276 **
## Residuals 1638 732401 447.1
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Notice that MusicFamiliarity, which is a between-subjects factor, is
in the top section, while sound, still a within-subjects factor, is in
the bottom section as usual.
What does the Main Effect of MusicFamiliarity mean? Use the graph
below to guide your interpretation.

Adding the Interaction
Let’s see how we would set this ANOVA up to incorporate an
interaction:
Reading_ANOVA_Factorial_wInteraction <- aov(wpm ~ sound * MusicFamiliarity + Error(subjectALL), data = readingrate)
- I’ve linked sound and MusicFamiliarity with an
asterisk (*) instead of a plus (+). This tells R to consider BOTH the
main effects AND the interaction of
these two variables.
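In R's formula language, the asterisk is just shorthand: sound * MusicFamiliarity expands to both main effects plus the interaction term (written sound:MusicFamiliarity), so the model below is equivalent:
# Equivalent to wpm ~ sound * MusicFamiliarity + Error(subjectALL)
Reading_ANOVA_Factorial_wInteraction <- aov(
  wpm ~ sound + MusicFamiliarity + sound:MusicFamiliarity + Error(subjectALL),
  data = readingrate
)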
Here’s the output:
summary(Reading_ANOVA_Factorial_wInteraction)
##
## Error: subjectALL
## Df Sum Sq Mean Sq F value Pr(>F)
## MusicFamiliarity 1 344160 344160 64.93 0.00000000000000274 ***
## Residuals 818 4335825 5301
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Error: Within
## Df Sum Sq Mean Sq F value Pr(>F)
## sound 2 5288 2643.8 5.908 0.00278 **
## sound:MusicFamiliarity 2 258 129.2 0.289 0.74932
## Residuals 1636 732142 447.5
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The interaction is represented as sound:MusicFamiliarity. It
is not statistically significant. This tells us that the effect of sound
(probably) did not depend on music familiarity - or at least we have no
evidence that it does. In other words, whether the music was familiar or
not, the effect of sound on reading rate was the same. See the graph
below. Notice that even though the bars on the left are lower, the
instrumental bar is a bit higher and the lyrical bar a bit lower on both
sides.

I’ve seen that word before vs. that word’s hard to see
Let’s see another example of a factorial design. Hopefully this time
the interaction will be significant!
Here is the original study that explains this data: http://germel.dyndns.org/psyling/pdf/2008_Yap_et_al_SQ_frequency.pdf
Here is the replication study that re-created the original study: https://osf.io/ahpik/
Here is the replication data we are analyzing: https://osf.io/6kaw2
Explanation of the Data
In this study, participants completed a ‘lexical decision task’: they
were shown a string of letters and had to decide, as quickly as
possible, whether it was a real word or not.
Two factors were manipulated by the researchers:
- Word Frequency: How often a given word is used in speech
and writing. ‘House’ is a much more common word than ‘Hobby’, and it
should be recognized faster.
- Clarity: How easy a word is to see. Some of the words
alternated on the screen between the word itself and a string of random
symbols. This flickering made the word hard to see.
The authors of the study wanted to look for:
- Main Effect of Frequency. Are more common words easier
(faster) to recognize? Probably!
- Main Effect of Clarity. Are words that are harder to see
also harder to recognize? I’m betting yes.
- An interaction of frequency and
clarity. If I make a word harder to see, does frequency matter
more or less?
| | High Frequency | Low Frequency |
|---|---|---|
| Clear | HOUSE | HOBBY |
| Degraded | HOUSE vs. @%?&! | HOBBY vs. @#%?&! |
Running the ANOVA
Here’s our ANOVA:
Word_ANOVA <- aov(Mu ~ Frequency * Clarity + Error(subject), data = worddata)
summary(Word_ANOVA)
##
## Error: subject
## Df Sum Sq Mean Sq F value Pr(>F)
## Residuals 70 6384156 91202
##
## Error: Within
## Df Sum Sq Mean Sq F value Pr(>F)
## Frequency 1 1747096 1747096 60.008 0.000000000000397 ***
## Clarity 1 172073 172073 5.910 0.0159 *
## Frequency:Clarity 1 115052 115052 3.952 0.0481 *
## Residuals 210 6114014 29114
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
It looks like both main effects are significant, and the interaction
is too!
Frequency Main Effect
Let’s interpret the main effect of Frequency. Here’s a graph.
library(plotrix) # std.error() comes from the plotrix package (it may already be loaded in your setup chunk)
worddata %>%
group_by(Frequency) %>%
summarise(delay = mean(Mu, na.rm = TRUE), se = std.error(Mu)) %>%
ggplot(aes(x = Frequency, y = delay)) +
geom_bar(stat = "identity", position = position_dodge(), aes(fill = Frequency), color = "black") +
geom_errorbar(aes(ymin=delay-se, ymax=delay+se),
width=.2, # Width of the error bars
position=position_dodge(.9)) + theme(legend.position = "none", text = element_text(size = 20)) + labs(
x = "Frequency",
y = "Response Time"
) + theme_bw() + theme(legend.position = "none") #+ coord_cartesian(ylim = c(140, 190))

And here’s how I would write up the results for this part.
There was a significant main effect of Word Frequency (F(1,
210) = 60, MSe = 29114, p < .001), indicating
that high frequency words were recognized more quickly than low
frequency words.
Clarity Main Effect
Now let’s do the same thing for Clarity.
worddata %>%
group_by(Clarity) %>%
summarise(delay = mean(Mu, na.rm = TRUE), se = std.error(Mu)) %>%
ggplot(aes(x = Clarity, y = delay)) +
geom_bar(stat = "identity", position = position_dodge(), aes(fill = Clarity), color = "black") +
geom_errorbar(aes(ymin=delay-se, ymax=delay+se),
width=.2, # Width of the error bars
position=position_dodge(.9)) + theme(legend.position = "none", text = element_text(size = 20)) + labs(
x = "Clarity",
y = "Response Time"
) + theme_bw() + theme(legend.position = "none") #+ coord_cartesian(ylim = c(140, 190))

There was a significant main effect of Clarity (F(1, 210) =
5.91, MSe = 29114, p = .016), indicating that clear
words were recognized more quickly than degraded words.
Interaction
And finally, let’s interpret the interaction. We’ll start with a
graph.
worddata %>%
group_by(Clarity, Frequency) %>%
summarise(delay = mean(Mu, na.rm = TRUE), se = std.error(Mu)) %>%
ggplot(aes(x = Frequency, y = delay, group= Clarity)) +
geom_bar(stat = "identity", position = position_dodge(), aes(fill = Clarity), color = "black") +
geom_errorbar(aes(ymin=delay-se, ymax=delay+se),
width=.2, # Width of the error bars
position=position_dodge(.9)) + theme(legend.position = "none", text = element_text(size = 20)) + labs(
x = "Clarity",
y = "Response Time"
) + theme_bw() + theme(legend.position = "none") #+ coord_cartesian(ylim = c(140, 190))

And some follow-up t-tests.
library(rstatix)
t.test.results <- worddata %>%
group_by(Frequency) %>%
t_test(Mu ~ Clarity, paired = TRUE) %>% # paired = TRUE because this is within-subjects data
adjust_pvalue(method = "none") %>%
add_significance()
knitr::kable(t.test.results)
| Frequency | .y. | group1 | group2 | n1 | n2 | statistic | df | p | p.adj | p.adj.signif |
|---|---|---|---|---|---|---|---|---|---|---|
| High | Mu | clear | degraded | 71 | 71 | -0.8483092 | 70 | 0.399000 | 0.399000 | ns |
| Low | Mu | clear | degraded | 71 | 71 | -3.6352108 | 70 | 0.000526 | 0.000526 | *** |
And here is what I would conclude:
These two main effects were qualified by a significant interaction
(F(1, 210) = 3.952, MSe = 29114, p =
.048). Follow-up t-tests indicated that the effect of Clarity
was significant for Low Frequency words, but not for High Frequency
Words.
Two-Way ANOVA
A two-way ANOVA has 2 Factors, instead of just one. Two-way ANOVAs
let us test 3 things:
- Does Factor A predict the dependent variable?
- Does Factor B predict the dependent variable?
- Do Factor A and Factor B interact (work together) to predict the dependent variable?
Run the code below:
data(stress)
?stress
stress
## # A tibble: 60 × 5
## id score treatment exercise age
## <int> <dbl> <fct> <fct> <dbl>
## 1 1 95.6 yes low 59
## 2 2 82.2 yes low 65
## 3 3 97.2 yes low 70
## 4 4 96.4 yes low 66
## 5 5 81.4 yes low 61
## 6 6 83.6 yes low 65
## 7 7 89.4 yes low 57
## 8 8 83.8 yes low 61
## 9 9 83.3 yes low 58
## 10 10 85.7 yes low 55
## # ℹ 50 more rows
Now examine the data and read the help page in the ‘help’ tab. Then
answer the following questions:
- What variable would make the best dependent variable?
- What variables would make acceptable ANOVA Predictors? I think there
are two.
- Is this data within- or between-subjects?
- Now set up an ANOVA to test whether your two Predictors affect the dependent variable.
- Don’t forget to include the Interaction!
- Don’t forget to use the summary() function to show your
results.
- Your output should look something like this:
## Df Sum Sq Mean Sq F value Pr(>F)
## treatment 1 351.4 351.4 12.295 0.000923 ***
## exercise 2 1776.3 888.1 31.076 0.00000000104 ***
## treatment:exercise 2 217.3 108.7 3.802 0.028522 *
## Residuals 54 1543.3 28.6
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Notice that there are 3 lines of results instead of just 1. The first
line is the ‘main effect’ of treatment, the second line is the
‘main effect’ of exercise, and the third line is the
interaction of treatment and
exercise.
Main Effects
- In your own words, define ‘main effect’.
- Use the t-test results below to help you write a brief 2-4
sentence APA-style summary of the two main effects.
stress %>% group_by(treatment) %>% t_test(score ~ exercise) %>%
adjust_pvalue(method = "none") %>%
add_significance()
## # A tibble: 6 × 11
## treatment .y. group1 group2 n1 n2 statistic df p p.adj p.adj.signif
## <fct> <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 yes score low moderate 10 10 0.388 17.8 0.703 0.703 ns
## 2 yes score low high 10 10 6.65 16.0 0.00000562 0.00000562 ****
## 3 yes score moderate high 10 10 6.65 16.8 0.00000437 0.00000437 ****
## 4 no score low moderate 10 10 0.0809 17.4 0.936 0.936 ns
## 5 no score low high 10 10 3.36 17.2 0.004 0.004 **
## 6 no score moderate high 10 10 3.01 18.0 0.007 0.007 **
Interaction
- An interaction means that the effect of one Factor depends on the
other Factor. Consider the graph below. How is the effect of the Factor
exercise different for Treatment=yes than for
Treatment=no?

- Now write a brief 1-2 sentence APA-style summary of the interaction
effect. Remember, you need numbers AND words.
Two-Way Repeated Measures ANOVA
Now let’s do a repeated-measures ANOVA!
data(selfesteem2)
?selfesteem2
selfesteem2
## # A tibble: 24 × 5
## id treatment t1 t2 t3
## <fct> <fct> <dbl> <dbl> <dbl>
## 1 1 ctr 83 77 69
## 2 2 ctr 97 95 88
## 3 3 ctr 93 92 89
## 4 4 ctr 92 92 89
## 5 5 ctr 77 73 68
## 6 6 ctr 72 65 63
## 7 7 ctr 92 89 79
## 8 8 ctr 92 87 81
## 9 9 ctr 95 91 84
## 10 10 ctr 92 84 81
## # ℹ 14 more rows
Now examine the data and read the help page in the ‘help’ tab. Then
do the following:
- How do you know that the variables (time or treatment) are
within-subjects?
- Pivot the data so it is in long format.
- Only pivot columns 3:5 (the time columns). Send the column names to
a new column called “time” and the values to “self_esteem_score”
- Now do an ANOVA on the pivoted data. You should get the results
shown below:
- You’ll need to tell R that this is within-subjects data by adding
something like this to your formula:
Error(VariableThatIdentifiesTheSubject)
- Now do the follow-up t-tests.
- Hint: Do you need to group your data before you do the
t-tests? What variable should you group by?
- Hint: Should you use paired=TRUE or paired=FALSE?

- Based on the ANOVA results, follow-up t-tests, and the
graph provided, write an APA-style summary of the results.
- Describe both main effects in numbers and in words. Use the closest
mean square error (the Residuals line in the same part of the output as each effect).
- Describe the interaction in numbers and in words.
When do we NOT use ANOVAs (but we could, if we had to)
Below are two situations where I don’t think you should use
ANOVAs:
- When you have within-subjects data.
- When you have unbalanced data.
First I’ll explain how you COULD do an ANOVA in these situations.
Then we’ll talk about why I don’t think you SHOULD.
Within-subjects data
I can hear you saying, “What do you mean, I shouldn’t use ANOVAs with
within-subjects data? Why’d you teach me about it, then?”
Here are 3 good reasons why I taught you about within-subjects data
here:
- It IS important to understand the difference between between- and
within-subjects data, and now seemed as good a time as any to teach
you.
- You may be called upon to do an ANOVA, or at least to interpret one.
It’s still important to know what a good ANOVA looks like.
- Remember when you were in grade school and they taught you about
drugs? Same reason - so you could stay away.
The Horrors of Unbalanced Data
Since ANOVA was made for factorial designs, which are usually
experimental or quasi-experimental in nature, ANOVA expects that the
data will be balanced: there will be (close to) the same number
of observations in each cell.
There are a couple of reasons why you might have unbalanced data:
- You have a lot of missing data
- Your data are from a naturally occurring data set, so there are more
data points in some cells than others by chance.
If your data is unbalanced, factorial designs require a special
variation of ANOVA. If you are interested in the how and
why, google “Type I and Type II ANOVAs”. If not, just know that
you can do a factorial ANOVA on unbalanced data this way:
if (!require("car")) install.packages("car")
library(car)
Flights_ANOVA_Unbalanced <- Anova(lm(dep_delay ~ origin + carrier, data = flights))
Flights_ANOVA_Unbalanced
## Anova Table (Type II tests)
##
## Response: dep_delay
## Sum Sq Df F value Pr(>F)
## origin 111096 2 34.777 0.000000000000000791 ***
## carrier 5178615 15 216.144 < 0.00000000000000022 ***
## Residuals 524708102 328503
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Why Not ANOVA?
There’s a simple reason why you shouldn’t do an ANOVA in these
situations: a better test is available. Mixed-effects models handle
repeated-measures data better than ANOVAs do, and they deal with
unbalanced designs better, too.
Where do I get these ‘mixed-effects models’, you ask? Stick around -
we’ll get to them soon enough.
Real Data: Do People Want to Be More Moral?
- Test the following research question: Which work-related trait do
People want to change the most: Organization, Productiveness,
Assertiveness, or Responsibility? To do this, do the following:
- Read in ‘moraldatalong.csv’ from Checklist 4.
- Filter the data to include ONLY Organization, Productiveness,
Assertiveness, and Responsibility from the Trait variable.
- Decide if the ANOVA is between-, within-, or mixed.
- Perform the appropriate ANOVA
- Perform follow-up t-tests as needed.
- Make a graph comparing the four Traits.
- Write an APA-style paragraph describing the results of your
analysis.
(Bonus) Do an ANCOVA
An ANCOVA is an ANOVA that includes one or more continuous
predictors. The purpose of including this additional predictor is to
statistically control for this value. In other words, an ANCOVA checks
to see if the Factors are significant even after accounting for some
potential confounding factors.
- Redo the ANOVA of the stress data from item 58 as an ANCOVA, this
time including age as a covariate.
## Df Sum Sq Mean Sq F value Pr(>F)
## treatment 1 351.4 351.4 14.141 0.000425 ***
## exercise 2 1776.3 888.1 35.743 0.000000000149 ***
## age 1 222.7 222.7 8.964 0.004177 **
## treatment:exercise 2 220.9 110.5 4.446 0.016409 *
## Residuals 53 1316.9 24.8
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
- Re-make the graph I made for item 61.
Real Data: Honestly Hot! (Bonus)
For this task, we will be replicating part of this paper:
Niimi, R., & Goto, M. (2023). Good conduct makes your face
attractive: The effect of personality perception on facial
attractiveness judgments. Plos one, 18(2), e0281758.
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0281758
The Open Science Framework Page for this data can be found here:
https://osf.io/rysnm/
Let’s use the data from Experiment 1:
Data: https://osf.io/5qx9j Codebook: https://osf.io/szn2y
- Read in the data. Create a new variable called attractiveness that
is the inverse of Phys1 (see the note in the codebook about this).
- Hint: Google “formula to reverse code Likert scale” or something
similar.
- Once you’ve gotten the data ready, save it as
“honestyhotnessdata.csv” so we can use it later.
- Do an appropriate ANOVA. Include attractiveness as your dependent
variable. Include 3 factors: StimHonesty, StimAtty, and StimGender, as
well as the interactions of all 3 factors and the three-way
interaction.
- Using graphs and follow-up t-tests, interpret the ANOVA
results and write up a paragraph describing what you found.
Real Data: Autism (Bonus! But not easy!)
For this task, we will be replicating part of this paper:
Birmingham, E., Stanley, D., Nair, R., & Adolphs, R. (2015).
Implicit social biases in people with autism. Psychological Science,
26(11), 1693-1705.
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4636978/
The Open Science Framework Page for this data can be found here:
https://osf.io/9tu5r/
- Find the file “iatDataForSPSS.txt”. It’s hidden deep within
a zip file. Move that file into your project folder.
- Read in the data and prep it for analysis:
- Remember, it’s a .txt, NOT a .csv
- Create a new variable called BiasType. If Exp contains the
words “Flower” or “Shoes”, BiasType should be “Non-Social”. If
Exp contains the words “Gender” or “Race”, BiasType should be
“Social”. Otherwise it should be NA.
- Set up and run your ANOVA, using BiasType and Group as your Factors
and D as the dependent variable.
- Hint: One of the Factors is within-subjects. Can you figure out
which one?
##
## Error: Subj
## Df Sum Sq Mean Sq F value Pr(>F)
## BiasType 1 0.147 0.1469 0.797 0.375630
## Group 1 2.723 2.7230 14.768 0.000292 ***
## BiasType:Group 1 0.010 0.0104 0.057 0.812869
## Residuals 61 11.248 0.1844
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Error: Within
## Df Sum Sq Mean Sq F value Pr(>F)
## BiasType 1 7.421 7.421 73.275 0.00000000000000108 ***
## BiasType:Group 1 0.404 0.404 3.992 0.0468 *
## Residuals 255 25.826 0.101
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
- Conduct follow-up t-tests to explore the ANOVA
results.
- Make a graph to explore the ANOVA results
- Write an APA-style paragraph describing the outcome of this
analysis.