- Aug 28 (Wed, the first day of class): Fitting functions with Taylor
- Administrivia: course goals, prerequisites
- readings:
- Richard Berk
- Berndt chapter 2
- Dalgaard: 2.4 and 6.1 (in the previous edition: 1.6 and 5.1)

- Lecture notes: .pdf, .html, and source (.Rnw)

- Aug 30 (Fri): 11am, R introduction in F94
- Come early (at 10:30) if you have trouble installing R on your computer.
- Bring your laptops!
- After the introduction, you should be able to do the first practice R set. If you have questions about it, email us and we can add more information to the file.

- Sept 4 (Wed, the 2nd day of class): Doglegs
- Lecture notes: .pdf, .html, and source (.Rnw)
- Finish reading Richard Berk.

- Sept 5 (Thu) @ 4pm: First practice R problems due
- Sept 6 (Fri): 11am, second day of the R introduction in F94

- Sept 7/8 (weekend!): Work on learning R. Do Josh's second practice set to confirm you are comfortable with R, and turn it in on Wednesday. If you have R down, start on HW 1. Either way, start reading Berndt chapter 2.
- Sept 9: The residuals
- Sept 11: standard errors (practice 2 due)
- Sept 12 @ 4pm: Berndt homework due. (Sorry, this accidentally drifted to the following week, but this is the correct due date.)

- Sept 16: Bootstrap
- Class notes: .html, .Rnw
- Suggested readings for bootstrapping data:
- The Wiki article is very good
- How to bootstrap in R.
- A more theoretical description of bootstrap
- Efron came up with the idea. See the book by Bradley Efron and Robert J. Tibshirani, *An Introduction to the Bootstrap*.

- To confirm you have understood the first five lectures, start HW 2 (.html). No detailed write-ups are necessary, just R.

- Sept 18: Berndt with bootstrap.

- Sept 23: Bootstrapping collinear data
- Sept 25: Experiments and causation
- Read the article by Jim Manzi.
- Discussion of Skinner vs. Piaget

- Sept 30: Applying natural experiments to the origins of sex.
- handout
- wiki on the evolution of sex
- Widowbirds with long tails and the handicap principle

- Oct 2: Homework discussion and power transformations
- Most of this lecture ended up being about the homework and the log transformation, so we'll cover production functions next time.

- Oct 3: HW 2 due (baseball data)

- Oct 7: Learning curves and Cobb-Douglas production functions
- some notes
- Tukey's bulging rule
- Wiki page on learning curves
- Read Berndt chapter 3. A homework based on it will be due soon.
- Richard Waterman's notes

- Oct 9: Singularity and examples of learning curves
- Kurzweil's view of the singularity (or on YouTube)
- Kurzweil and Google working on solving death! (This is the part of his talk I skipped today.) Hat tip to Deepa Mahajan.

- Oct 14: Stepwise regression: Introduction
- Wiki on model selection
- Wiki on Bonferroni
- A fairly readable research paper with Bob Stine

- Oct 16: More on stepwise regression
- Start thinking about the project

- Oct 21: You can mathematize anything!
- Oct 23: Linguistics
- geek vs nerd
- Read the Wikipedia page on naive Bayes.

- Oct 24: HW3: Fitting evolution data

- Oct 28: Using R with words
- R output (R code)
- Federalist data
- Language Log on who wrote *The Cuckoo's Calling*.
- Pretty pictures: words in a cloud using R.
- passing bills in congress

- Oct 30: Naive Bayes implementation

- Nov 4: PCA
- This class will set up the concepts and the next class will show how to do it for linguistics and some R code.
- Bostrom and the not-so-happy future.
- Now this is big data

- Nov 6: More PCA
- Source with latex
- Just R
- Reading: Stephen Stigler's article "Statistics in 1933"

- Nov 7 (Thursday): proposal for project due (.Rnw).
- Nov 8 (Friday): Homework: Stepwise regression

- Nov 11: predicting the next word
- Zipf distribution (see NSF for some curious modern ideas on Zipf.)
- Zipf's law for the full Wikipedia
- Download the entire Wikipedia.

- Nov 13: CCA (canonical correlation analysis)

- Nov 18: Assignment on Alice in Wonderland
- Nov 18: Summary of linguistics
- word types: with background
- nouns vs verbs
- numbers
- handout
- R code for generating above plots
- data for above plots

- Nov 20: Case: real estate
- Nov 21: Lit review due

- Nov 25: Paranoid finance
- book chapter
- an entire wiki on this topic

- Nov 25: Data analysis due
- Nov 27: No class (Friday schedule)

- Dec 2: Calibration
- Dec 4: Presentations without PowerPoint (odd-sized groups, i.e., groups of 1 or 3 people)
- Dec 9 (last day of class): Presentations without PowerPoint (even-sized groups)
- Dec 11 (reading day): Presentations with PowerPoint

- Dec 16: Final write-up due

- Class: MW 10:30-12, F36
- TA: Kory Johnson
- Office hours: Thursdays at 2pm (Huntsman 472), or email me for an appointment.
- Course work:
- Submit all exercises to statistics.assignments@gmail.com
- Exercises (to introduce you to R; these are not that important and are mostly for your benefit)
- Assignments (or, more accurately, cases)
- Final project (individually or in a group of two students)
- Note: You are the best students on campus. I have very high expectations for what you will learn this semester. On student evaluations, this class is often listed as requiring the most work of any class taken at Wharton.

- We will be using R (free) as our statistics package, and I'll be using R in class. The book is on R. Statistics revolves around R.
- Two useful books on R:
- Introductory Statistics with R by Peter Dalgaard, 2nd edition, ISBN 978-0-387-79053-4, Springer 2008 (paperback).
- Linear Models with R by Julian J. Faraway, ISBN 1-58488-425-8, Chapman & Hall/CRC Press 2005, (hardback)
- In future years, I might have the students read *Big Data: A Revolution That Will Transform How We Live, Work, and Think*.

- But the web and Kory are your best resources!

- If you can't access WRDS, here are the files:
- Fama deciles (with VW returns)
- value weighted (aka SP500)
- tbills

- Cleaning crews (for practice reading data into R)
- Federalist papers
- Alice in Wonderland: human-readable text, and word counts from Google. Here are files that are easier to read into R: one, two.
- Google n-grams

- Richard Berk
- Berndt chapter 2
- Dalgaard: 2.4 and 6.1 (in the previous edition: 1.6 and 5.1)
- Faraway: chapter 7
- Dalgaard: Chapter 6 (whole chapter)
- Dawkins: Handout
- Long-term history of stocks/bonds.

- Fire up R (first week's practice)
- make doglegs (second week's practice)
- residuals (3rd week's practice)
- Heteroskedasticity (3rd week's practice)
- homework one help file.
- homework two help file.
- federalist help file

- example source file
- What it looks like when processed
- Other references I've found useful:
- powerdot
- My sample Sweave commands file: (.Rnw and .pdf)
- An introduction
- The Sweave manual itself
- how to customize Sweave

- Obama and the S&P 500
- I fixed the block bootstrap! (See Rweave code)
- Here is one picture I talked about in class: Obama vs. the S&P 500.
- Justin Wolfers' comment at Bloomberg (scroll to the bottom). (Hat tip to Joshua Lynn)
- If you want to play with this data, here are the obama.csv and sp.csv data files.

- boston housing data (.txt, .jmp)
- New Hampshire Temp (.jmp) from *Nature*.
- Berndt data (for homework)
- Doing an infinite number of t-tests? (.jmp)
- Here is a simulation of an ideal heteroskedastic dataset.
- Online code for alpha investing (fast large variable selection).
- Loans (.csv)

- World records for marathons by age. Contrast with the New England Journal of Medicine article from 1980 (nejm198007173030304).
- Paleo global warming
- Display ft vs sales
- Amazon data (from comScore)
- Population cohorts 1926 - 1979 (total .txt,.csv)
- DJIA (.txt, .csv)
- gold.jmp
- Homework 1: download Berndt data. (guidelines)
- Nerlove data (.jmp, .txt)
- learning model cars
- Berkshire Hathaway Inc (.JMP)
- See the end of this page for other data sets of interest
- KOPCKE data for homework 4 (raw data). Note the strange file format: you WILL need to edit it! If you open it in a text editor, you will see that there are TWO different columns of numbers. Here are some comments on the file.
- Magic forecasting rule to generate excess returns
- IBM (.txt)
- Value weighted (.csv)
- Monthly T-bills, VW, inflation 1925 - 1995 (.jmp,.txt) suggestion: DJIA for homework. :-)
- Homework evaluations (.txt). Notice that no homework is statistically significantly worse than any other. Too bad!
- Mink data (.txt, .txt, .xls), documentation
- International Airline (.jmp)
- Unemployment (.csv)
- Live births in Pennsylvania 1915 - 1997 (.csv)
- Fish (.jmp)
- Hurricane (Splus)

Last modified: Tue Jan 28 17:37:53 EST 2014