CAIC Problems


I have posted two Helianthus phylogenies and a dataset that goes along with the phylogenies.

1. Use CardInput (and Phylogeny, depending on which version of CAIC you use) to create each of these phylogenies.  You will end up with a pair of .Phyl and .BLen files for each phylogeny at the end of this process.  Be careful how you name your taxa.  I suggest using a three-letter code for each of the specific names.  For example, Helianthus annuus will become "Helianthus ann."  For the species that also have subspecies designations, just add three more letters for the subspecies, e.g., Helianthus debilis vestitus becomes "Helianthus debves."

Note that the 50% majority rule tree has two instances each of H. giganteus and H. debilis cucumerifolius.  For each pair, simply designate one copy with a one and the other with a two, e.g., "Helianthus gig1" and "Helianthus gig2."
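The naming scheme above can be sketched in a few lines of Python. This is only an illustration: the function names `caic_name` and `label_tips` are made up for this example, and it assumes the trailing periods in the quoted names are sentence punctuation rather than part of the taxon name.

```python
from collections import Counter

def caic_name(binomial):
    """Abbreviate 'Genus species [subspecies]' to 'Genus xxx[yyy]'."""
    genus, *epithets = binomial.split()
    # Three letters from the species, plus three more for a subspecies.
    code = "".join(epithet[:3].lower() for epithet in epithets)
    return f"{genus} {code}"

def label_tips(tips):
    """Append 1, 2, ... to any taxon that appears more than once in a tree."""
    totals = Counter(tips)
    seen = Counter()
    labels = []
    for tip in tips:
        name = caic_name(tip)
        if totals[tip] > 1:
            seen[tip] += 1
            name += str(seen[tip])
        labels.append(name)
    return labels

# A few tips for illustration, not the full tree.
tips = [
    "Helianthus annuus",
    "Helianthus giganteus",
    "Helianthus debilis vestitus",
    "Helianthus giganteus",
]
labels = label_tips(tips)
print(labels)
```

Whatever names you settle on, they must match exactly between the phylogeny and the data file.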

2. Use an editor or a spreadsheet to make a data file for each of the phylogenies using the information in the dataset.  You'll need a separate data file for each phylogeny because of the duplicate entries for H. gig and H. debcuc in the 50% majority rule phylogeny.  Just use the data for these taxa twice in that data table.  This isn't really legal, statistically speaking, because it counts the same observations twice, but since this is just a practice assignment, don't worry about it.  The two variables you'll need to put in the table are latitude and % saturated fat.  We won't have any discrete variables in our dataset.
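If you would rather script the table than type it, here is one possible sketch. The numbers are made up for illustration (use the values from the posted dataset), the column names are arbitrary, and the tab-delimited layout is an assumption, so check it against the format your version of CAIC expects.

```python
import csv
import io

# Made-up numbers for illustration; use the values from the posted dataset.
# Each entry maps a taxon code to (latitude, percent saturated fat).
traits = {
    "Helianthus ann": (40.0, 10.0),
    "Helianthus gig": (45.0, 9.0),
    "Helianthus debcuc": (30.0, 12.0),
}

# Tip labels as they appear in the tree, including the numbered duplicates.
tips = [
    "Helianthus ann",
    "Helianthus gig1",
    "Helianthus gig2",
    "Helianthus debcuc1",
    "Helianthus debcuc2",
]

def build_rows(tips, traits):
    """Return one (tip, latitude, sat_fat) row per tip, reusing data for duplicates."""
    rows = []
    for tip in tips:
        key = tip.rstrip("0123456789")  # "Helianthus gig1" -> "Helianthus gig"
        latitude, sat_fat = traits[key]
        rows.append((tip, latitude, sat_fat))
    return rows

rows = build_rows(tips, traits)

# Write a tab-delimited table to a string; write to a file instead if you prefer.
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
writer.writerow(["taxon", "latitude", "sat_fat"])
writer.writerows(rows)
table_text = buffer.getvalue()
print(table_text)
```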

3. After you've created your data files and your phylogenies, run CAIC for each one.  Try them both with equal length branches and with Grafen branches.  The contrasts that you want to specify will have Latitude as the independent variable and % saturated fat as the dependent variable. 
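For reference, Grafen's method assigns each node a height equal to the number of species descended from it minus one (so tips have height zero), and each branch length is the difference between the heights at its two ends. The sketch below shows that calculation on a toy five-tip tree with placeholder names. It rescales the heights so the root is at 1 and takes an optional exponent `rho`; whether CAIC rescales in the same way is an assumption here, and CAIC does this step for you in any case.

```python
def count_tips(node):
    """A tip is a string; an internal node is a tuple of child nodes."""
    if isinstance(node, str):
        return 1
    return sum(count_tips(child) for child in node)

def label(node):
    """Readable label: the tip name, or the children in parentheses."""
    if isinstance(node, str):
        return node
    return "(" + ",".join(label(child) for child in node) + ")"

def grafen_lengths(tree, rho=1.0):
    """Return {node label: length of the branch leading to that node}."""
    n_root = count_tips(tree)

    def height(node):
        # (tips below the node - 1), scaled so that the root has height 1.
        return ((count_tips(node) - 1) / (n_root - 1)) ** rho

    lengths = {}

    def walk(node, parent_height):
        h = height(node)
        if parent_height is not None:
            lengths[label(node)] = parent_height - h
        if not isinstance(node, str):
            for child in node:
                walk(child, h)

    walk(tree, None)
    return lengths

# Toy topology with placeholder tip names, not one of the Helianthus trees.
toy_tree = (("A", "B"), ("C", ("D", "E")))
lengths = grafen_lengths(toy_tree)
for name, length in lengths.items():
    print(name, length)
```

Note that every root-to-tip path sums to the same total, so the resulting tree is ultrametric.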

4.  Use each of your output files in your favorite statistics package to perform a regression on your results and see what you get.  Test whether the slope is significantly different from zero, and plot your results as a scatter plot with the best-fit regression line shown on it.  Remember to specify that your regression model does not have a constant.  If, when you plot your results, the regression line does not go through the origin, you've specified the wrong model.
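If you want to check what your statistics package is doing, a regression without a constant is easy to compute by hand. The constant is dropped because the direction of subtraction in each contrast is arbitrary, so the expected value of a contrast is zero. The sketch below uses made-up contrasts, not CAIC output, and the function name is hypothetical. It returns the slope, its standard error, the t statistic, and the degrees of freedom (n - 1, since only the slope is estimated).

```python
import math

def regression_through_origin(x, y):
    """Fit y = b * x with no constant; return (slope, std_error, t, df)."""
    n = len(x)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_xx = sum(xi * xi for xi in x)
    slope = sum_xy / sum_xx
    # Residual sum of squares around the fitted line through the origin.
    rss = sum((yi - slope * xi) ** 2 for xi, yi in zip(x, y))
    df = n - 1  # one parameter (the slope) is estimated
    std_error = math.sqrt(rss / df / sum_xx)
    t_stat = slope / std_error
    return slope, std_error, t_stat, df

# Made-up contrasts for illustration only.
latitude_contrasts = [1.0, 2.0, 3.0, 4.0]
sat_fat_contrasts = [1.1, 1.9, 3.2, 3.8]

slope, std_error, t_stat, df = regression_through_origin(
    latitude_contrasts, sat_fat_contrasts
)
print(f"slope = {slope:.3f}, SE = {std_error:.4f}, t = {t_stat:.2f}, df = {df}")
```

Compare the t statistic against a t distribution with the reported degrees of freedom to test whether the slope differs from zero.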

5. Just for fun, perform a regression on the raw data and plot it on a scatter plot.  For this regression you will need a constant in your model, because there is no reason to expect the mean to be zero. 
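For comparison with the contrast regression, here is the same kind of hand calculation for an ordinary regression with a constant. The data are again made up for illustration (and deliberately fall on a perfect line), and the function name is hypothetical.

```python
def regression_with_constant(x, y):
    """Ordinary least squares fit of y = a + b * x; return (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Made-up raw values for illustration only.
latitude = [30.0, 35.0, 40.0, 45.0]
sat_fat = [12.0, 11.0, 10.0, 9.0]

intercept, slope_raw = regression_with_constant(latitude, sat_fat)
print(f"intercept = {intercept:.2f}, slope = {slope_raw:.3f}")
```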

Bring your results to class on Monday and we will compare them to see who is allowed to publish without having to retract their results.  Let me know if you have any problems; in other words, don't wait until the last minute to do the assignment.