Answer Key for Statistics Workshop
ANOVA and Post-hoc tests
results=aov(Mass~Strain,data=datum)
Df Sum
Sq Mean Sq
F value Pr(>F)
Strain
3
1.449
0.4830
5.261 0.0041 **
Residuals 36
3.305
0.0918
TukeyHSD(results)
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = Mass ~ Strain, data = datum)
$`Strain`
diff
lwr
upr p
adj
Strain2-Strain1
-0.02501669 -0.3899625 0.33992915 0.9977248
Strain3-Strain1
-0.40572691 -0.7706728 -0.04078107 0.0244214
Strain4-Strain1
-0.37885197 -0.7437978 -0.01390612 0.0393922
Strain3-Strain2 -0.38071022 -0.7456561 -0.01576438 0.0381385
Strain4-Strain2 -0.35383527 -0.7187811
0.01111057 0.0602106
Strain4-Strain3
0.02687494 -0.3380709 0.39182079 0.9971852
results=lm(Mass~Strain,data=datum)
Coefficients:
Estimate
Std. Error
t value
Pr(>|t|)
(Intercept)
0.81604
0.09582
8.517
3.77e-10 ***
StrainStrain2 -0.02502
0.13551
-0.185
0.85456
StrainStrain3
-0.40573
0.13551
-2.994
0.00495 **
StrainStrain4
-0.37885
0.13551
-2.796
0.00825 **
---
Residual standard error: 0.303 on 36 degrees of freedom
Multiple R-squared: 0.3048, Adjusted R-squared: 0.2469
F-statistic: 5.261 on 3 and 36 DF, p-value: 0.004102
We found that mass of strain 1 was 0.41 (+/- 0.36; +/- 95% C.I.) kg greater than that of strain 3 (p = 0.024). We found that mass of strain 2 was 0.025 (+/- 0.36; +/- 95% C.I.) kg greater than that of strain 1; however our results were not statistically significant (p = 0.998). Running analyses using an lm returned lower p-values (0.0050 and 0.85, respectively) and smaller confidence intervals (0.27 and 0.27, respectively).
When data can be categorical or continuous
results=lm(Biomass~Fertilizer,data=datum)
results2=lm(Biomass~as.factor(Fertilizer),data=datum)
anova(results2,results)
We found that the model where biomass was treated as a categorical model was a significant improvement in the fit to the data, according to an f-drop test (p = 9.067e-08).
Intro to multivariable analyses
Estimate
Std. Error
t value
Pr(>|t|)
(Intercept) 48.9119376
2.4009606 20.372
< 2e-16 ***
Latitude
-1.0061478 0.0857306
-11.736 < 2e-16 ***
Elevation
-0.0016249 0.0001134
-14.327 < 2e-16 ***
CountryNepal -3.5928897
0.2567890 -13.992
< 2e-16 ***
CountryTibet -1.7300551
0.2486237 -6.959
4.39e-10 ***
Residual standard error: 1.022 on 95 degrees of freedom
Multiple R-squared: 0.8687, Adjusted R-squared: 0.8632
F-statistic: 157.1 on 4 and 95 DF, p-value: < 2.2e-16
relevel(Country, ref = "Tibet")Nepal
-1.8628346 0.2510544
-7.420 4.92e-11 ***
Dataset #1
results=lm(SqDens~Pine*RoadDensity,data=datum)
Coefficients:
Estimate
Std. Error
t value
Pr(>|t|)
(Intercept)
9.54211
0.39012
24.460 <
2e-16 ***
RoadDensity
-0.47013
0.06222
-7.556
4.17e-10 ***
ForestPine
1.46694
0.51251
2.862
0.00591 **
RoadDensity:ForestPine -0.52292
0.09076
-5.762
3.70e-07 ***
Residual standard error: 0.9328 on 56 degrees of freedom
Multiple R-squared: 0.8351, Adjusted R-squared: 0.8262
F-statistic: 94.5 on 3 and 56 DF, p-value: < 2.2e-16
results=lm(SqDens~RoadDensity,data=datum,subset=c(Forest=="Hardwood"))
Coefficients:
Estimate
Std. Error
t value
Pr(>|t|)
(Intercept) 9.54211
0.39960
23.879 <
2e-16 ***
RoadDensity -0.47013
0.06373
-7.377
4.94e-08 ***
Residual standard error: 0.9555 on 28 degrees of freedom
Multiple R-squared: 0.6603, Adjusted R-squared: 0.6481
F-statistic: 54.42 on 1 and 28 DF, p-value: 4.938e-08
results=lm(SqDens~RoadDensity,data=datum,subset=c(Forest=="Pine"))
Coefficients:
Estimate
Std. Error
t value
Pr(>|t|)
(Intercept) 11.00905
0.32410
33.97 <
2e-16 ***
RoadDensity -0.99305
0.06442
-15.41
3.3e-15 ***
Residual standard error: 0.9096 on 28 degrees of freedom
Multiple R-squared: 0.8946, Adjusted R-squared: 0.8908
F-statistic: 237.6 on 1 and 28 DF, p-value: 3.297e-15
We found a significant interaction between forest type and road density (p = 3.70e-07), therefore we analyzed the two forest types separately. In pine forest, for each 1 road/sq. km increase in road density, we observed a 0.99 (+/- 0.13; +/- 95% C.I.) squirrel/ha decrease in squirrel density (p = 3.3e-15). However, in hardwood forest, for each 1 road/sq. km increase in road density, we observed a 0.47 (+/- 0.12; +/- 95% C.I.) squirrel/ha decrease in squirrel density (p = 4.94e-08).
Dataset #2 -
results=lm(Size~Sex+Treatment+Sex:Treatment,data=datum)
Coefficients:
Estimate Std. Error
t value Pr(>|t|)
(Intercept)
1.49420 0.03760
39.737 < 2e-16 ***
SexMales
0.54887 0.05318
10.321 < 2e-16 ***
TreatmentHormone
0.47309 0.05318
8.896
3.49e-14 ***
SexMales:TreatmentHormone 0.49144
0.07521 6.535
3.06e-09 ***
Residual standard error: 0.188 on 96 degrees of freedom
Multiple R-squared: 0.899, Adjusted R-squared: 0.8959
F-statistic: 284.9 on 3 and 96 DF, p-value: < 2.2e-16
results=lm(Size~Treatment,data=datum,subset=c(Sex=="Females"))
Coefficients:
Estimate
Std. Error
t value
Pr(>|t|)
(Intercept)
1.49420
0.03624
41.233 <
2e-16 ***
TreatmentHormone 0.47309
0.05125
9.231
3.25e-12 ***
Residual standard error: 0.1812 on 48 degrees of freedom
Multiple R-squared: 0.6397, Adjusted R-squared: 0.6322
F-statistic: 85.22 on 1 and 48 DF, p-value: 3.249e-12
results=lm(Size~Treatment,data=datum,subset=c(Sex=="Males"))
Coefficients:
Estimate
Std. Error
t value
Pr(>|t|)
(Intercept)
2.04308
0.03892
52.49
<2e-16 ***
TreatmentHormone 0.96453
0.05504
17.52
<2e-16 ***
Residual standard error: 0.1946 on 48 degrees of freedom
Multiple R-squared: 0.8648, Adjusted R-squared: 0.862
F-statistic: 307.1 on 1 and 48 DF, p-value: < 2.2e-16
We found a significant interaction between sex and treatment effect (p = 3.06e-09), therefore we analyzed the two sexes separately. In females, we found that treated individuals were 0.47 (+/- 0.10; +/- 95% C.I.) kg larger than the control group (p = 3.25e-12). However, in males, we found that treated individuals were 0.96 (+/- 0.11; +/- 95% C.I.) kg larger than the control group (p < 0.0001).
results=lme(Density~Plot,data=datum,random=~1|Field)
Random effects:
Formula: ~1 | Field
(Intercept) Residual
StdDev: 2.231843
3.052642
Fixed effects: Density ~ Plot
Value
Std.Error
DF
t-value
p-value
(Intercept) 20.521777 1.336962
14
15.349556 0.0000
PlotdBurn 5.322554
1.526321
14
3.487179 0.0036
PlotFertilize 5.179703
1.526321
14
3.393587 0.0044
Number of Observations: 24
Number of Groups: 8
We found that density of grasshoppers in burned plots was 5.32 (+/- 3.05; +/- 95% C.I.) hoppers/ha greater than that in control plots (p = 0.0036). We found that density of grasshoppers in fertilized plots was 5.18 (+/- 3.05; +/- 95% C.I.) hoppers/ha greater than that in control plots (p = 0.0044). We found that density of grasshoppers in burned plots was 0.14 (+/- 3.05; +/- 95% C.I.) hoppers/ha greater than that in fertilized plots; however our results were not statistically significant (p = 0.93). The estimated standard deviation in grasshopper density due to the field effect was 2.23 hoppers/ha.
Split-plot design
results=lme(AlgaeDens~Fish+Plants+Shade,data=datum,random=~1|Pond/Fish/Plants)
Random effects:
Formula: ~1 | Pond
(Intercept)
StdDev: 0.8389336
Formula: ~1 | Fish %in% Pond
(Intercept)
StdDev: 1.04778
Formula: ~1 | Plants %in% Fish %in% Pond
(Intercept) Residual
StdDev: 1.008345
0.8521384
Fixed effects: AlgaeDens ~ Fish + Plants + Shade
Value
Std.Error
DF
t-value
p-value
(Intercept) 29.988255 0.6912284
46
43.38401 0.0000
Fish
9.627482 0.7587809
5
12.68809 0.0001
Plants
10.875258 0.4580404
11
23.74301 0.0000
ShadeLow -10.076803 0.2459912
46
-40.96408 0.0000
ShadeMed -0.093639
0.2459912 46
-0.38066 0.7052
Number of Observations: 72
Number of Groups:
Pond
Fish %in% Pond
6
12
Plants %in% Fish %in% Pond
24
We found that plant algae density in fish plots was 9.63 (+/- 1.51; +/- 95% C.I.) g/ml greater than that in control plots (p = 0.0001; df = 5). We found that algae density in plant plots was 10.88 (+/- 0.91; +/- 95% C.I.) g/ml greater than that in control plots (p < 0.0001; df = 11). We found that algae density in low shade plots was 10.08 (+/- 0.49; +/- 95% C.I.) g/ml lower than that in high shade plots (p < 0.0001; df = 46). We found that plant biomass in medium shade plots was 0.093 (+/- 0.49; +/- 95% C.I.) g/ml lower than that in high water plots (p < 0.0001; df = 46). The estimated standard deviation in biomass due to the pond effect was 0.84 g/ml.
Poisson regression
results=glm(Present~Habitat+Understory,data=datum,family=poisson)
Coefficients:
Estimate Std. Error
z value Pr(>|z|)
(Intercept) 0.982255
0.111574 8.804
< 2e-16 ***
HabitatForest 0.525159
0.093398 5.623
1.88e-08 ***
Habitatgrassland 0.574820
0.104569 5.497
3.86e-08 ***
Understory 0.011416
0.001302 8.765
< 2e-16 ***
We found that forests had 1.69 (1.404 - 2.039; 95% C.L.) times as many bird species as crops (p = 1.88e-08). We also found that grasslands 1.78 (1.44 - 2.19; 95% C.L.) times as many bird species as crops (p = 3.86e-08). Finally, we found that grasslands had 1.05 (0.88 - 1.26; 95% C.L.) times as many bird species as forests; however our results were not statistically significant (p = 0.583). For each 1% increase in understory density, there were 1.011 (1.008 - 1.014; 95% C.L.) times as many bird species (p < 2e-16).
results=glm(Present~Burned+StandAge+TreeDensity,data=datum,family=binomial)
Coefficients:
Estimate
Std. Error
z value
Pr(>|z|)
(Intercept) -3.41174
1.03268
-3.304
0.000954 ***
Burned 1.56501
0.59462
2.632
0.008490 **
StandAge 0.05555
0.01263
4.399
1.09e-05 ***
TreeDensity -0.12780
0.03722
-3.434
0.000595 ***
We found that burned stands were 4.78 (1.58 - 16.83; 95% C.L.) times as likely to be occupied by woodpeckers as control plots (p = 0.0085). For each 1 tree / 100 m2 increase in tree density, forest stands were 0.88 (0.81 - 0.94; 95% C.L.) times as likely to be used by woodpeckers (p = 0.00060). Finally, for each 1 year increase in stand age, forest stands were 1.057 (1.034 - 1.087; 95% C.L.) times as likely to be occupied by woodpeckers (p = 1.09e-0.5).