Exercises - Statistics Workshop

Answer Key

ANOVA and Post-hoc tests

Dataset - For this dataset, you are conducting an experiment to see whether mass (Y; kg) varies among strains of tilapia of a given length (X). Run an ANOVA followed by a Tukey post-hoc test. Report the difference between Strain 1 and Strain 3, as well as the difference between Strain 2 and Strain 1. Then run the analysis again without the post-hoc test. How do the Tukey post-hoc results compare with those from the analysis without it? Why?
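A minimal sketch of the arithmetic behind this exercise, written in Python with NumPy rather than R and run on simulated data. The strain means, the 10 fish per strain, and the labels are made up, and strain is assumed to be the only predictor. It computes the one-way ANOVA F statistic, the differences Strain 3 - Strain 1 and Strain 2 - Strain 1, and the Tukey (Tukey-Kramer) q statistic for each of those pairs.

```python
import numpy as np

# Hypothetical simulated data: three tilapia strains, 10 fish each.
# Assumes strain is the only predictor (fish of a given length).
rng = np.random.default_rng(1)
true_means = {"Strain1": 2.0, "Strain2": 2.4, "Strain3": 1.7}  # kg, made up
groups = {s: rng.normal(m, 0.3, size=10) for s, m in true_means.items()}

# One-way ANOVA: split total variation into between- and within-strain parts.
all_y = np.concatenate(list(groups.values()))
grand_mean = all_y.mean()
k, n_total = len(groups), all_y.size
ss_between = sum(y.size * (y.mean() - grand_mean) ** 2 for y in groups.values())
ss_within = sum(((y - y.mean()) ** 2).sum() for y in groups.values())
df_between, df_within = k - 1, n_total - k
ms_within = ss_within / df_within
f_stat = (ss_between / df_between) / ms_within

# Differences in strain means: the same estimates lm's treatment contrasts give.
means = {s: y.mean() for s, y in groups.items()}
sizes = {s: y.size for s, y in groups.items()}
diff_31 = means["Strain3"] - means["Strain1"]
diff_21 = means["Strain2"] - means["Strain1"]

def tukey_q(a, b):
    """Tukey-Kramer studentized-range statistic for the pair (a, b)."""
    se = np.sqrt(ms_within / 2 * (1 / sizes[a] + 1 / sizes[b]))
    return abs(means[a] - means[b]) / se

q_31, q_21 = tukey_q("Strain3", "Strain1"), tukey_q("Strain2", "Strain1")
print(f"F = {f_stat:.2f} on {df_between} and {df_within} df")
print(f"Strain3 - Strain1 = {diff_31:.3f} kg (q = {q_31:.2f})")
print(f"Strain2 - Strain1 = {diff_21:.3f} kg (q = {q_21:.2f})")
```

The difference estimates are the same numbers that the strain coefficients of summary(lm(...)) report relative to Strain 1. What changes under Tukey is the p-value (and confidence interval): it is taken from the studentized range distribution (for example via R's TukeyHSD, or scipy.stats.studentized_range), which accounts for making all three pairwise comparisons at once.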

When data can be categorical or continuous

Dataset - For this dataset, you are examining the effect of fertilizer (grams) on plant biomass. Fertilizer was applied to multiple plants at only a few levels, so it could be treated as either categorical or continuous. Fit two lm models, one treating fertilizer as categorical and one treating it as continuous. Does treating fertilizer as categorical significantly improve the fit to the data?
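To see what this comparison does, here is a sketch in Python with NumPy on simulated data (the four fertilizer levels, six plants per level, and the curved dose-response are assumptions). Any straight line through the fertilizer levels can also be represented by the level means, so the continuous model is nested within the categorical one and the two fits can be compared with a partial F-test, which is what R's anova(m_cont, m_cat) reports for two nested lm fits.

```python
import numpy as np

# Hypothetical simulated data: 4 fertilizer levels (g), 6 plants per level,
# with a curved (made-up) dose-response so the straight line fits imperfectly.
rng = np.random.default_rng(2)
levels = np.array([0.0, 5.0, 10.0, 20.0])
fert = np.repeat(levels, 6)
biomass = 10 + 0.8 * fert - 0.02 * fert**2 + rng.normal(0, 1.5, fert.size)
n = fert.size

def rss(X, y):
    """Residual sum of squares from an ordinary least-squares fit."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    return float(resid @ resid)

# Continuous: intercept + slope, like lm(biomass ~ fert).
X_cont = np.column_stack([np.ones(n), fert])
# Categorical: intercept + a dummy for each non-reference level,
# like lm(biomass ~ factor(fert)).
X_cat = np.column_stack([np.ones(n)] + [(fert == lv).astype(float) for lv in levels[1:]])

rss_cont, rss_cat = rss(X_cont, biomass), rss(X_cat, biomass)
df_cont, df_cat = n - X_cont.shape[1], n - X_cat.shape[1]

# Partial F-test for nested models: do the extra parameters of the
# categorical model reduce the RSS by more than chance would?
f_stat = ((rss_cont - rss_cat) / (df_cont - df_cat)) / (rss_cat / df_cat)
print(f"continuous:  RSS = {rss_cont:.2f} on {df_cont} df")
print(f"categorical: RSS = {rss_cat:.2f} on {df_cat} df")
print(f"F = {f_stat:.2f} on {df_cont - df_cat} and {df_cat} df")
```

The p-value is the upper tail of the F distribution with those degrees of freedom, for example pf(F, df1, df2, lower.tail = FALSE) in R.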

Multivariable analyses

Dataset - In this example, you have two continuous predictors (e.g., elevation [meters] and latitude [degrees north]) and one categorical predictor with three groups (country). The dependent variable is lotus plant size (grams). Analyze all three predictors simultaneously in a single lm.
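A sketch of what lm(size ~ elevation + latitude + country) estimates, written with NumPy least squares on simulated data. The country labels A, B, and C and all effect sizes are placeholders. Each slope is estimated while holding the other predictors fixed, and the two country coefficients are differences from the reference country (A here).

```python
import numpy as np

# Hypothetical simulated data: 20 lotus plants from each of three countries.
rng = np.random.default_rng(3)
country = np.repeat(["A", "B", "C"], 20)            # placeholder labels
n = country.size
elevation = rng.uniform(0, 1500, n)                 # meters
latitude = rng.uniform(20, 45, n)                   # degrees north
offset = np.where(country == "B", 15.0, np.where(country == "C", -10.0, 0.0))
size = 200 - 0.03 * elevation - 1.5 * latitude + offset + rng.normal(0, 5, n)

# Design matrix for lm(size ~ elevation + latitude + country):
# intercept, two slopes, and dummies for countries B and C (A is the reference).
X = np.column_stack([np.ones(n), elevation, latitude,
                     (country == "B").astype(float), (country == "C").astype(float)])
beta = np.linalg.lstsq(X, size, rcond=None)[0]
resid = size - X @ beta
df_resid = n - X.shape[1]
sigma2 = resid @ resid / df_resid                   # residual variance
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))

names = ["(Intercept)", "elevation", "latitude", "countryB", "countryC"]
for name, b, s in zip(names, beta, se):
    print(f"{name:12s} {b:10.4f}  SE {s:.4f}  t {b / s:7.2f}")
```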

Interactions - Pick the dataset more relevant to your research

Dataset #1 - In this example, you have one continuous x-variable (e.g., road density [roads/sq. km]) and one categorical x-variable (hardwood vs. pine forest), and you are examining their effects on squirrel density (squirrels/hectare). In this system, there is an interaction between road density and forest type.
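A NumPy sketch on simulated data (forest type coded 0/1, effect sizes made up) of the model lm(squirrels ~ road * forest). The interaction coefficient is the difference between the pine and hardwood slopes, so the effect of road density has to be reported separately for each forest type.

```python
import numpy as np

# Hypothetical simulated data: 40 plots in each forest type.
rng = np.random.default_rng(4)
n_per = 40
road = rng.uniform(0, 3, 2 * n_per)                 # road density (made-up range)
pine = np.repeat([0.0, 1.0], n_per)                 # 0 = hardwood, 1 = pine
# Made-up truth: density falls with roads, and falls faster in pine.
squirrels = (6 - 0.5 * road - 1.0 * pine - 1.2 * road * pine
             + rng.normal(0, 0.5, road.size))

# Design matrix for lm(squirrels ~ road * forest): the road * pine column
# lets the road-density slope differ between forest types.
X = np.column_stack([np.ones(road.size), road, pine, road * pine])
b0, b_road, b_pine, b_inter = np.linalg.lstsq(X, squirrels, rcond=None)[0]

print(f"hardwood slope: {b_road:.3f} squirrels/ha per unit road density")
print(f"pine slope:     {b_road + b_inter:.3f}")
print(f"interaction (pine slope - hardwood slope): {b_inter:.3f}")
```

A quick check of that interpretation: fitting a separate straight line to each forest type gives the same two slopes.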

Dataset #2 - In this example, you have two categorical factors (e.g., sex and treatment), each with two levels (male and female; control [placebo] and hormone injection). The dependent variable might be something like adult size. In this system, there is an interaction between sex and treatment.
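A NumPy sketch on simulated data (cell sizes and effects are made up) showing what the interaction coefficient in lm(adult_size ~ sex * treatment) means. Here male and control are coded as the reference levels, a choice made for readability (R would default to female, since it sorts first). With that coding, the interaction equals the hormone effect in females minus the hormone effect in males, computed directly from the four cell means.

```python
import numpy as np

# Hypothetical simulated 2 x 2 design, 15 animals per cell.
rng = np.random.default_rng(5)
n_cell = 15
female = np.repeat([0.0, 0.0, 1.0, 1.0], n_cell)     # 0 = male, 1 = female
hormone = np.tile(np.repeat([0.0, 1.0], n_cell), 2)  # 0 = control, 1 = hormone
# Made-up truth: the hormone increases adult size in males only.
adult_size = (50 + 5 * female + 8 * hormone - 8 * female * hormone
              + rng.normal(0, 2, female.size))

# Design matrix for lm(adult_size ~ sex * treatment), coded here with
# male and control as the reference levels.
X = np.column_stack([np.ones(female.size), female, hormone, female * hormone])
coef = np.linalg.lstsq(X, adult_size, rcond=None)[0]

def cell_mean(f, h):
    """Mean adult size for one sex x treatment cell."""
    return adult_size[(female == f) & (hormone == h)].mean()

effect_male = cell_mean(0, 1) - cell_mean(0, 0)
effect_female = cell_mean(1, 1) - cell_mean(1, 0)
print(f"hormone effect in males:   {effect_male:.2f}")
print(f"hormone effect in females: {effect_female:.2f}")
print(f"interaction coefficient:   {coef[3]:.2f}")
```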

Mixed effects models

Dataset - In this experiment, we have 8 fields, each containing three plots. Within each field, the plots are randomly assigned to one of three "treatments": control, fertilize, and burn (called dburn for reasons you should be able to figure out). Note that we can't test for an interaction between burn and fertilizer, because no plot received both treatments (so the treatments are levels of a single variable, NOT different variables). The response variable is grasshopper density (hoppers/hectare). Using an lme, calculate the differences in grasshopper density among the three treatments. What is the standard deviation of the field effect on grasshopper density?
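An lme such as lme(density ~ treatment, random = ~ 1 | field) fits this by REML and reports the field standard deviation as the StdDev of the random intercept. The sketch below instead uses the classical ANOVA (method-of-moments) estimators. For a balanced design like this one (every field has exactly one plot per treatment), these should agree with the REML estimates as long as the field variance estimate is positive, and the treatment differences reduce to differences in treatment means. The data are simulated in Python with NumPy; the field SD, treatment effects, and noise level are made up.

```python
import numpy as np

# Hypothetical simulated data: 8 fields x 3 treatments, one plot each.
rng = np.random.default_rng(6)
treatments = ["control", "dburn", "fertilize"]
n_fields = 8
field_effect = rng.normal(0, 3.0, n_fields)          # made-up field SD = 3
treat_effect = np.array([0.0, -4.0, 2.5])            # made-up effects
# y[i, j] = grasshopper density in field i under treatment j (hoppers/ha)
y = 20 + field_effect[:, None] + treat_effect[None, :] + rng.normal(0, 1.5, (n_fields, 3))

# Fixed effects: in a balanced design, the treatment contrasts from a
# random-intercept model equal differences in treatment means.
treat_means = y.mean(axis=0)
for name, m in zip(treatments[1:], treat_means[1:]):
    print(f"{name} - control = {m - treat_means[0]:.2f} hoppers/ha")

# Field variance component via the ANOVA estimator, using
# E[MS_field] = sigma_resid^2 + n_treat * sigma_field^2.
grand = y.mean()
field_means = y.mean(axis=1)
n_treat = y.shape[1]
ms_field = n_treat * ((field_means - grand) ** 2).sum() / (n_fields - 1)
resid = y - field_means[:, None] - treat_means[None, :] + grand
ms_resid = (resid ** 2).sum() / ((n_fields - 1) * (n_treat - 1))
sigma2_field = max(0.0, (ms_field - ms_resid) / n_treat)  # truncate at zero
print(f"SD of field effect = {np.sqrt(sigma2_field):.2f}")
print(f"residual SD        = {np.sqrt(ms_resid):.2f}")
```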

Poisson Regression

Dataset - In this example, you are conducting a study of bird biodiversity. You have sampled 100 randomly chosen habitat patches for birds. In addition to the number of bird species detected in each patch (Y - 'Present'), you also collected data on habitat type (X1) and understory density (X2 - an index from 0 to 100 generated using a density board: 100 means 100% visual obstruction, so thick you can't see through it; 0 means no visual obstruction or no understory). In your analyses, be sure to consider the differences among all three habitat types!
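A sketch of the Poisson GLM that glm(species ~ habitat + understory, family = poisson) would fit, written out as iteratively reweighted least squares in NumPy so the steps are visible. The habitat names, the split of patches among them, and the coefficients are hypothetical, and the data are simulated. The model estimates only two habitat coefficients (each relative to the reference habitat), so the third comparison comes from their difference, with its standard error taken from the covariance matrix; this is equivalent to refitting with a different reference level (relevel in R).

```python
import numpy as np

# Hypothetical simulated data: 100 patches in three (placeholder) habitat types.
rng = np.random.default_rng(7)
habitat = np.repeat(["forest", "grassland", "shrub"], [34, 33, 33])
n = habitat.size
understory = rng.uniform(0, 100, n)                  # density-board index, 0-100
log_mu = (1.5 - 0.4 * (habitat == "grassland") + 0.3 * (habitat == "shrub")
          + 0.01 * understory)                       # made-up coefficients
species = rng.poisson(np.exp(log_mu))

# Design matrix for glm(species ~ habitat + understory, family = poisson),
# with forest as the reference habitat.
X = np.column_stack([np.ones(n), (habitat == "grassland").astype(float),
                     (habitat == "shrub").astype(float), understory])

# Iteratively reweighted least squares for a log-link Poisson GLM.
beta = np.zeros(X.shape[1])
beta[0] = np.log(species.mean())
converged = False
for _ in range(100):
    eta = X @ beta
    mu = np.exp(eta)
    z = eta + (species - mu) / mu                    # working response
    XtW = X.T * mu                                   # X'W with W = diag(mu)
    beta_new = np.linalg.solve(XtW @ X, XtW @ z)
    converged = np.max(np.abs(beta_new - beta)) < 1e-8
    beta = beta_new
    if converged:
        break

mu_hat = np.exp(X @ beta)
cov = np.linalg.inv((X.T * mu_hat) @ X)              # inverse Fisher information
se = np.sqrt(np.diag(cov))

# Third habitat comparison (shrub vs. grassland) from the coefficient difference.
shrub_vs_grass = beta[2] - beta[1]
se_shrub_vs_grass = np.sqrt(cov[1, 1] + cov[2, 2] - 2 * cov[1, 2])

for name, b, s in zip(["(Intercept)", "grassland", "shrub", "understory"], beta, se):
    print(f"{name:12s} {b:8.4f}  SE {s:.4f}  rate ratio {np.exp(b):.3f}")
print(f"shrub - grassland: {shrub_vs_grass:.4f}  SE {se_shrub_vs_grass:.4f}")
```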

Logistic Regression

Dataset - In this example, you will be doing a habitat analysis. You have sampled 100 longleaf pine stands for red-cockaded woodpeckers. In addition to whether or not the bird was detected (Y - 'Present'; assume detection was done without error), you collected data on stand age (X1 - in years), tree density (X2 - in trees/100 m²), and whether the stand is burned regularly by management personnel (X3 - every 3 years) or burned only when fire occurs naturally.
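A NumPy sketch of the logistic regression that glm(present ~ age + density + burned, family = binomial) fits, written as iteratively reweighted least squares on simulated data. The variable names, ranges, and true coefficients are hypothetical. Exponentiated coefficients are odds ratios; for example, exp of the burned coefficient compares the odds of detection in regularly burned stands with those in stands burned only by natural fire.

```python
import numpy as np

# Hypothetical simulated data: 100 longleaf pine stands.
rng = np.random.default_rng(8)
n = 100
age = rng.uniform(20, 120, n)                     # stand age (years)
density = rng.uniform(5, 40, n)                   # trees / 100 m2
burned = rng.integers(0, 2, n).astype(float)      # 1 = prescribed burn every 3 years
eta_true = -3 + 0.04 * age - 0.08 * density + 1.5 * burned   # made-up truth
present = rng.binomial(1, 1 / (1 + np.exp(-eta_true)))

# Design matrix for glm(present ~ age + density + burned, family = binomial).
X = np.column_stack([np.ones(n), age, density, burned])

# Iteratively reweighted least squares for a logit-link binomial GLM.
beta = np.zeros(X.shape[1])
converged = False
for _ in range(100):
    eta = X @ beta
    p = 1 / (1 + np.exp(-eta))
    w = p * (1 - p)                               # working weights
    z = eta + (present - p) / w                   # working response
    XtW = X.T * w
    beta_new = np.linalg.solve(XtW @ X, XtW @ z)
    converged = np.max(np.abs(beta_new - beta)) < 1e-8
    beta = beta_new
    if converged:
        break

p_hat = 1 / (1 + np.exp(-(X @ beta)))
cov = np.linalg.inv((X.T * (p_hat * (1 - p_hat))) @ X)  # inverse Fisher information
se = np.sqrt(np.diag(cov))
for name, b, s in zip(["(Intercept)", "age", "density", "burned"], beta, se):
    print(f"{name:12s} {b:8.4f}  SE {s:.4f}  odds ratio {np.exp(b):.3f}")
```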