Answer Key for Statistics Workshop

ANOVA and Post-hoc tests

results=aov(Mass~Strain,data=datum)

                     Df             Sum Sq         Mean Sq         F value         Pr(>F)
Strain          3             1.449             0.4830                5.261         0.0041 **
Residuals     36             3.305             0.0918

TukeyHSD(results)

 Tukey multiple comparisons of means
95% family-wise confidence level

Fit: aov(formula = Mass ~ Strain, data = datum)

$`Strain`
                                        diff                 lwr                 upr             p adj
Strain2-Strain1              -0.02501669 -0.3899625 0.33992915 0.9977248
Strain3-Strain1             -0.40572691 -0.7706728 -0.04078107 0.0244214
Strain4-Strain1              -0.37885197 -0.7437978 -0.01390612 0.0393922
Strain3-Strain2             -0.38071022 -0.7456561 -0.01576438 0.0381385
Strain4-Strain2             -0.35383527 -0.7187811 0.01111057 0.0602106
Strain4-Strain3            0.02687494 -0.3380709 0.39182079 0.9971852

results=lm(Mass~Strain,data=datum)

Coefficients:
                                Estimate             Std. Error             t value             Pr(>|t|)
(Intercept)                 0.81604             0.09582             8.517             3.77e-10 ***
StrainStrain2       -0.02502             0.13551             -0.185             0.85456
StrainStrain3          -0.40573             0.13551             -2.994             0.00495 **
StrainStrain4              -0.37885             0.13551             -2.796             0.00825 **
---


Residual standard error: 0.303 on 36 degrees of freedom
Multiple R-squared: 0.3048, Adjusted R-squared: 0.2469
F-statistic: 5.261 on 3 and 36 DF, p-value: 0.004102

We found that mass of strain 1 was 0.41 (+/- 0.36; +/- 95% C.I.) kg greater than that of strain 3 (p = 0.024). We found that mass of strain 2 was 0.025 (+/- 0.36; +/- 95% C.I.) kg greater than that of strain 1; however our results were not statistically significant (p = 0.998). Running analyses using an lm returned lower p-values (0.0050 and 0.85, respectively) and smaller confidence intervals (0.27 and 0.27, respectively).

When data can be categorical or continuous

results=lm(Biomass~Fertilizer,data=datum)

results2=lm(Biomass~as.factor(Fertilizer),data=datum)

anova(results2,results)

We found that the model where biomass was treated as a categorical model was a significant improvement in the fit to the data, according to an f-drop test (p = 9.067e-08).

Intro to multivariable analyses

results=lm(Size~Latitude+Elevation+Country,data=datum)

Coefficients:
                        Estimate             Std. Error             t value             Pr(>|t|)
(Intercept)         48.9119376         2.4009606         20.372         < 2e-16 ***
Latitude             -1.0061478         0.0857306         -11.736         < 2e-16 ***
Elevation             -0.0016249         0.0001134         -14.327         < 2e-16 ***
CountryNepal        -3.5928897         0.2567890         -13.992         < 2e-16 ***
CountryTibet         -1.7300551             0.2486237         -6.959         4.39e-10 ***


Residual standard error: 1.022 on 95 degrees of freedom
Multiple R-squared: 0.8687, Adjusted R-squared: 0.8632
F-statistic: 157.1 on 4 and 95 DF, p-value: < 2.2e-16

results=lm(Size~Latitude+Elevation+relevel(Country,ref="Tibet"),data=datum)

relevel(Country, ref = "Tibet")India         1.7300551         0.2486237         6.959         4.39e-10 ***
relevel(Country, ref = "Tibet")Nepal         -1.8628346         0.2510544         -7.420         4.92e-11 ***

We found that for each 1 degree increase in latitude, there was a 1.00 (+/- 0.17; +/- 95% C.I.) cm decrease in lotus flower size (p < 0.0001). We found that for each 1000 meter increase in elevation, there was a 1.625 (+/- 0.22; +/- 95% C.I.) cm decrease in lotus flower size (p < 0.0001).We found that lotus flower size in Nepal was 3.60 (+/- 0.50; +/- 95% C.I.) cm smaller than that in India (p < 0.0001). We found that lotus flower size in Tibet was 1.73 (+/- 0.49; +/- 95% C.I.) cm smaller than that in India (p = 4.39e-10). Finally, we found that lotus flower size in Nepal was 1.86 (+/- 0.50; +/- 95% C.I.) cm smaller than that in Tibet (p = 4.39e-10).

Interactions

Dataset #1

results=lm(SqDens~Pine*RoadDensity,data=datum)

Coefficients:
                                        Estimate             Std. Error             t value             Pr(>|t|)
(Intercept)                         9.54211             0.39012             24.460             < 2e-16 ***
RoadDensity                     -0.47013             0.06222             -7.556             4.17e-10 ***
ForestPine                         1.46694             0.51251             2.862             0.00591 **
RoadDensity:ForestPine     -0.52292             0.09076             -5.762             3.70e-07 ***

Residual standard error: 0.9328 on 56 degrees of freedom
Multiple R-squared: 0.8351, Adjusted R-squared: 0.8262
F-statistic: 94.5 on 3 and 56 DF, p-value: < 2.2e-16

results=lm(SqDens~RoadDensity,data=datum,subset=c(Forest=="Hardwood"))

Coefficients:
                        Estimate             Std. Error             t value             Pr(>|t|)
(Intercept)        9.54211             0.39960             23.879             < 2e-16 ***
RoadDensity     -0.47013             0.06373             -7.377             4.94e-08 ***


Residual standard error: 0.9555 on 28 degrees of freedom
Multiple R-squared: 0.6603, Adjusted R-squared: 0.6481
F-statistic: 54.42 on 1 and 28 DF, p-value: 4.938e-08

results=lm(SqDens~RoadDensity,data=datum,subset=c(Forest=="Pine"))

Coefficients:
                        Estimate             Std. Error             t value             Pr(>|t|)
(Intercept)       11.00905             0.32410             33.97             < 2e-16 ***
RoadDensity     -0.99305             0.06442             -15.41             3.3e-15 ***

Residual standard error: 0.9096 on 28 degrees of freedom
Multiple R-squared: 0.8946, Adjusted R-squared: 0.8908
F-statistic: 237.6 on 1 and 28 DF, p-value: 3.297e-15

We found a significant interaction between forest type and road density (p = 3.70e-07), therefore we analyzed the two forest types separately. In pine forest, for each 1 road/sq. km increase in road density, we observed a 0.99 (+/- 0.13; +/- 95% C.I.) squirrel/ha decrease in squirrel density (p = 3.3e-15). However, in hardwood forest, for each 1 road/sq. km increase in road density, we observed a 0.47 (+/- 0.12; +/- 95% C.I.) squirrel/ha decrease in squirrel density (p = 4.94e-08).

Dataset #2 -

results=lm(Size~Sex+Treatment+Sex:Treatment,data=datum)

Coefficients:
                                              Estimate         Std. Error         t value         Pr(>|t|)
(Intercept)                             1.49420         0.03760         39.737         < 2e-16 ***
SexMales                             0.54887         0.05318         10.321         < 2e-16 ***
TreatmentHormone             0.47309         0.05318         8.896             3.49e-14 ***
SexMales:TreatmentHormone 0.49144         0.07521         6.535         3.06e-09 ***


Residual standard error: 0.188 on 96 degrees of freedom
Multiple R-squared: 0.899, Adjusted R-squared: 0.8959
F-statistic: 284.9 on 3 and 96 DF, p-value: < 2.2e-16

results=lm(Size~Treatment,data=datum,subset=c(Sex=="Females"))

Coefficients:
                                Estimate             Std. Error             t value             Pr(>|t|)
(Intercept)                 1.49420             0.03624             41.233             < 2e-16 ***
TreatmentHormone   0.47309             0.05125             9.231             3.25e-12 ***

Residual standard error: 0.1812 on 48 degrees of freedom
Multiple R-squared: 0.6397, Adjusted R-squared: 0.6322
F-statistic: 85.22 on 1 and 48 DF, p-value: 3.249e-12

results=lm(Size~Treatment,data=datum,subset=c(Sex=="Males"))

Coefficients:
                                Estimate             Std. Error             t value             Pr(>|t|)
(Intercept)                 2.04308             0.03892             52.49             <2e-16 ***
TreatmentHormone   0.96453             0.05504             17.52             <2e-16 ***

Residual standard error: 0.1946 on 48 degrees of freedom
Multiple R-squared: 0.8648, Adjusted R-squared: 0.862
F-statistic: 307.1 on 1 and 48 DF, p-value: < 2.2e-16

We found a significant interaction between sex and treatment effect (p = 3.06e-09), therefore we analyzed the two sexes separately. In females, we found that treated individuals were 0.47 (+/- 0.10; +/- 95% C.I.) kg larger than the control group (p = 3.25e-12). However, in males, we found that treated individuals were 0.96 (+/- 0.11; +/- 95% C.I.) kg larger than the control group (p < 0.0001).

Mixed effects models (use lme function)

results=lme(Density~Plot,data=datum,random=~1|Field)

 

Random effects:
Formula: ~1 | Field
                (Intercept)         Residual
StdDev:     2.231843         3.052642

Fixed effects: Density ~ Plot
                    Value             Std.Error             DF             t-value             p-value
(Intercept)     20.521777     1.336962             14             15.349556     0.0000
PlotdBurn     5.322554         1.526321             14             3.487179     0.0036
PlotFertilize     5.179703         1.526321             14             3.393587     0.0044

Number of Observations: 24
Number of Groups: 8

We found that density of grasshoppers in burned plots was 5.32 (+/- 3.05; +/- 95% C.I.) hoppers/ha greater than that in control plots (p = 0.0036). We found that density of grasshoppers in fertilized plots was 5.18 (+/- 3.05; +/- 95% C.I.) hoppers/ha greater than that in control plots (p = 0.0044). We found that density of grasshoppers in burned plots was 0.14 (+/- 3.05; +/- 95% C.I.) hoppers/ha greater than that in fertilized plots; however our results were not statistically significant (p = 0.93). The estimated standard deviation in grasshopper density due to the field effect was 2.23 hoppers/ha.

Split-plot design

results=lme(AlgaeDens~Fish+Plants+Shade,data=datum,random=~1|Pond/Fish/Plants)

Random effects:
Formula: ~1 | Pond
(Intercept)
StdDev: 0.8389336

Formula: ~1 | Fish %in% Pond
(Intercept)
StdDev: 1.04778

Formula: ~1 | Plants %in% Fish %in% Pond
                (Intercept)         Residual
StdDev:     1.008345         0.8521384

Fixed effects: AlgaeDens ~ Fish + Plants + Shade
                    Value             Std.Error             DF             t-value             p-value
(Intercept)     29.988255     0.6912284         46             43.38401         0.0000
Fish             9.627482         0.7587809         5                 12.68809         0.0001
Plants             10.875258     0.4580404         11                 23.74301         0.0000
ShadeLow     -10.076803     0.2459912         46             -40.96408         0.0000
ShadeMed     -0.093639         0.2459912         46             -0.38066         0.7052


Number of Observations: 72
Number of Groups:
Pond                 Fish %in% Pond
6                         12
Plants %in% Fish %in% Pond
24

We found that plant algae density in fish plots was 9.63 (+/- 1.51; +/- 95% C.I.) g/ml greater than that in control plots (p = 0.0001; df = 5). We found that algae density in plant plots was 10.88 (+/- 0.91; +/- 95% C.I.) g/ml greater than that in control plots (p < 0.0001; df = 11). We found that algae density in low shade plots was 10.08 (+/- 0.49; +/- 95% C.I.) g/ml lower than that in high shade plots (p < 0.0001; df = 46). We found that plant biomass in medium shade plots was 0.093 (+/- 0.49; +/- 95% C.I.) g/ml lower than that in high water plots (p < 0.0001; df = 46). The estimated standard deviation in biomass due to the pond effect was 0.84 g/ml.

Poisson regression

results=glm(Present~Habitat+Understory,data=datum,family=poisson)

Coefficients:
                            Estimate         Std. Error         z value         Pr(>|z|)
(Intercept)         0.982255         0.111574         8.804         < 2e-16 ***
HabitatForest     0.525159         0.093398         5.623         1.88e-08 ***
Habitatgrassland 0.574820         0.104569         5.497         3.86e-08 ***
Understory         0.011416         0.001302         8.765         < 2e-16 ***

We found that forests had 1.69 (1.404 - 2.039; 95% C.L.) times as many bird species as crops (p = 1.88e-08). We also found that grasslands 1.78 (1.44 - 2.19; 95% C.L.) times as many bird species as crops (p = 3.86e-08). Finally, we found that grasslands had 1.05 (0.88 - 1.26; 95% C.L.) times as many bird species as forests; however our results were not statistically significant (p = 0.583). For each 1% increase in understory density, there were 1.011 (1.008 - 1.014; 95% C.L.) times as many bird species (p < 2e-16).

results=glm(Present~Burned+StandAge+TreeDensity,data=datum,family=binomial)

Coefficients:
                    Estimate             Std. Error             z value             Pr(>|z|)
(Intercept)     -3.41174             1.03268             -3.304             0.000954 ***
Burned         1.56501             0.59462                 2.632             0.008490 **
StandAge     0.05555             0.01263                 4.399             1.09e-05 ***
TreeDensity -0.12780             0.03722                 -3.434             0.000595 ***

We found that burned stands were 4.78 (1.58 - 16.83; 95% C.L.) times as likely to be occupied by woodpeckers as control plots (p = 0.0085). For each 1 tree / 100 m2 increase in tree density, forest stands were 0.88 (0.81 - 0.94; 95% C.L.) times as likely to be used by woodpeckers (p = 0.00060). Finally, for each 1 year increase in stand age, forest stands were 1.057 (1.034 - 1.087; 95% C.L.) times as likely to be occupied by woodpeckers (p = 1.09e-0.5).