- Class: MW3-4:30, F92
- TA: Joshua Magarick
- email: statistics.assignments@gmail.com
- Last year's TA, Sathya, kept a web site with several useful files on it: code.

- Office hours: Tuesday at 11 am (Huntsman 472). Or email me for an appointment.
- Course work:
- submit all exercises to statistics.assignments@gmail.com
- Exercises (to introduce you to R. These are not that important--they are more for your benefit.)
- Homeworks (or more accurately, cases)
- Final project
- Note: You are the best students on campus. I have very high expectations for what you will learn this semester. On student evaluations, this class is often listed as taking the most work of any class at Wharton.

- We will be using R (free) as our statistics package. I'll be using R in class. The book is on R. Statistics revolves around R.
- Two useful books on R:
- Introductory Statistics with R by Peter Dalgaard, 2nd edition, ISBN 978-0-387-79053-4, Springer 2008 (paperback).
- Linear Models with R by Julian J. Faraway, ISBN 1-58488-425-8, Chapman & Hall/CRC Press 2005 (hardback).

- But the web and Joshua are your best resources!

- Jan 11: Fitting functions with Taylor
- readings:
- Richard Berk
- Berndt chapter 2
- Dalgaard: 2.4 and 6.1 (in the previous edition: 1.6 and 5.1)

- Lecture notes: (.pdf, .html, and .Rnw)

- Jan 14 @ 12-1:30 (F45): Introduction to R (by Sathyanarayan Anand). Held in F45.
- bring your laptops!
- After seeing his introduction, you should be able to do the first practice R set. If you have questions about doing this, send us an email and we can add more information to the file.

- Jan 17: No class
- Jan 19: Doglegs
- Lecture notes: (.pdf, .html, and .Rnw)
- read Richard Berk

- Jan 21: (5pm) The first Practice R problems due

- Jan 24: The residuals
- Jan 26: standard errors
- Thursday Jan 27 @ 5pm: 2nd introduction to R (in F45)
- video
- R code
- events.csv (data used in the above example)

- Jan 31: Bootstrap
- class notes: html, .Rnw
- Suggested readings for bootstrapping data:
- The Wiki article is very good
- How to bootstrap in R.
- A more theoretical description of bootstrap
- Efron came up with the idea. See the book by Bradley Efron and Robert J. Tibshirani, An Introduction to the Bootstrap.

- You should now have finished reading Berndt chapter 2.
- Finish up the first homework also.
- To confirm you have understood the first 5 lectures, do HW 2 (.html). No detailed write-ups necessary--just R.

- Feb 2: Bootstrapping Berndt
- class notes: (.html, .Rnw, .R)
- Sathya's code
- Apple data
- (Sample R code from 2010 class).
- Submit homework via email: pdf file (putting the R code in a separate file is useful so we can run it more easily)

- Feb 4: HW 1 due

- Feb 7: Ponzironi (dice due)
- Feb 9: Global Warming discussion (led by Sathya)

- Feb 14: HW 2 due (due at 2:45 so you have time to get to class on time. :-)
- Feb 14: Interlude: Darwin
- Feb 16: A Ponzironi:
- Everything in one paper on testing alpha.
- Other information:
- Dice discussion and a dice paper.
- Long run growth rate: notes 1 (optimal investing)

- Feb 21: Introduction to linguistics
- Mitch Marcus and Noam Chomsky
- History of bigger data
- Problem statement: Who wrote the Federalist papers?
- Zipf distribution (see NSF for some curious modern ideas on Zipf.)

- Feb 23: Singularity discussion and Federalist papers

- Feb 28: Blocking data (pdf)
- Paper on blocking by Dongyu Lin

- Mar 2: Federalist papers
- Federalist data
- Sathya's code for Federalist papers
- sample code (from pdf file) for Federalist papers
- Zipf's law for the full Wikipedia
- download the entire wiki.

- Spring break!
- Mar 7: assignment 3 due before 5pm (New due date)

- Mar 14: PCAs

- Mar 16: PCAs
- class notes: (html, .Rnw, .R)
- Reading: Statistics in 1933 (on project Euclid)

- Mar 21: CCA and language
- talk slides
- A short assignment on evolution is due just before class at 2:45.

- Mar 23: predicting the next word
- Mar 24: (4-5pm): Using R in linguistics (for homework 4)
- Room 350 JMHH
- If you are comfortable with the homework, you don't need to attend.
- It will "truly" start at 4:30, but if you have class at 4:30, come at 4:00.
- The R code we will be discussing
- Sathya's code to help you with homework 4.

- Mar 28: Summary of linguistics
- Proposal due (.pdf)
- some housing data that might be a good start on a project
- Some ideas for projects.

- R code
- data

- Mar 30: Risk inflation
- Here is Sathya's code from his help session.
- research paper

- Apr 4: Risk inflation continued
- handout
- research paper
- You can run data through our code online.

- Apr 6: Calibration (quotes due)

- Apr 11: Regression for calibration
- We will look at Boston housing in class
- what the variables mean
- R notes
- R source

- Apr 13: Paranoid finance and paranoid predictions
- data analysis due
- book chapter
- an entire wiki on this topic

- Apr 15: Assignment 4 due (at midnight, so you can spend all of Friday evening on it if you like)

- Apr 18: Loss functions and calibration
- Apr 20: In-class presentations

- Apr 25: In-class presentations
- Apr 27: (Reading days) 3-6, F92: Optional presentation date for those who want the extra two days. If interested, let me know. I might even allow PowerPoint on this date.

- May 3: Final write-up of project due
- May 6: Last homework due

- If you can't access WRDS, here are the files:
- Fama deciles (with VW returns)
- value weighted (aka SP500)
- tbills

- Cleaning crews for practice reading into R
- Federalist papers
- Alice in Wonderland: human readable, and word counts from Google. Here are files that are easier to read into R: one, two.
- Google n-grams

- Richard Berk
- Berndt chapter 2
- Dalgaard: 2.4 and 6.1 (in the previous edition: 1.6 and 5.1)
- Faraway: chapter 7
- Dalgaard: Chapter 6 (whole chapter)
- Dawkins: Handout
- Long term history of stocks / bonds.

- fire up R (first week's practice)
- make doglegs (second week's practice)
- residuals (3rd week's practice)
- heteroskedasticity (3rd week's practice)
- homework one help file.
- homework two help file.
- Federalist help file

- example source file
- What it looks like when processed
- Other references I've found useful:
- powerdot,
- My sample Sweave commands file: (.Rnw and .pdf)
- An introduction
- The Sweave manual itself
- how to customize Sweave

- boston housing data (.txt, .jmp)
- New Hampshire Temp (.jmp) from *Nature*.
- Berndt data (for homework)
- baseball data (.jmp)
- Doing an infinite number of t-tests? (.jmp)
- Here is a simulation of an ideal heteroskedastic dataset.
- Online code for alpha investing (fast large variable selection).
- Loans (.csv)

- World records for marathons by age. Contrast with the New England Journal of Medicine from 1980 (nejm198007173030304).
- paleo global warming
- Display ft vs sales
- amazon data (from compscore)
- Population cohorts 1926 - 1979 (total .txt,.csv)
- DJIA (.txt, .csv)
- gold.jmp
- Homework 1: download Berndt data. (guidelines)
- Nerlove data (.jmp, .txt)
- learning model cars
- Berkshire Hathaway Inc (.JMP)
- info on hedge funds
- See the end of this page for other data sets of interest
- KOPCKE data for homework 4: (raw data) Note the strange file format. You WILL need to edit it! If you read it into a text editor you will see that there are TWO different columns of numbers. Here are some comments on the file.
- Magic forecasting rule to generate excess returns
- IBM (.txt)
- value weighted.cvs
- Monthly T-bills, VW, inflation 1925 - 1995 (.jmp,.txt) suggestion: DJIA for homework. :-)
- Homework evaluations (.txt). Notice that no homework is statistically significantly worse than any other. Too bad!
- mink data (txt, txt, xls) documentation
- International Airline (.jmp)
- Unemployment (.csv)
- Live births in Pennsylvania 1915 - 1997 csv
- Fish (.jmp)
- Hurricane (Splus)

Last modified: Wed Jan 11 12:35:56 EST 2012