ACM/ESE 118
Methods in Applied Statistics and Data Analysis
Fall 2008


Instructor
Emmanuel Candes
300 Firestone
emmanuel@acm.caltech.edu
Office Hours: T 2–3 (or by appointment)

  Lectures
TTh 10:30-11:55
Guggenheim, 101

First meeting: September 30

 


Description: Introduction to fundamental ideas and techniques of statistical modeling, with an emphasis on conceptual understanding and on the analysis of real data sets. Assignments will draw on data analysis problems in various science and engineering fields, and will involve some programming.

Prerequisite: Ma 2 or other introductory course in probability and statistics; working knowledge of linear algebra at the level of Ma 2. Programming experience is particularly useful.

Syllabus:

  • Simple linear regression: least squares estimation, analysis of residuals
  • Multiple linear regression: parameter estimation, inference about model parameters
  • Analysis of variance, comparison of models, model selection
  • Assessing goodness-of-fit, outliers, influential observations
  • Collinearity and rank-deficiency, singular value decomposition, regularization
  • Choosing models and fitting parameters: cross-validation, L-curve
  • Principal component analysis, linear discriminant analysis
  • Generalized linear models: models, estimation and examples
  • Resampling methods and the bootstrap

Textbooks (on reserve at the library):

  1. Montgomery, D. C., E. A. Peck, and G. G. Vining, Introduction to Linear Regression Analysis, 4th Edition, Wiley (2006) [required]

  2. Manly, B. F. J., Multivariate Statistical Methods: A Primer, 3rd Edition, Chapman and Hall (2004)

    Great reference for reviewing some elements of linear algebra, and for linear discriminant analysis, principal components analysis and canonical correlation analysis

  3. Efron, B. and R. J. Tibshirani, An Introduction to the Bootstrap, Chapman and Hall (1993)

    Excellent introduction to the bootstrap and its many applications. Also provides fresh insights into many topics in statistics

  4. Johnson, R. A., and D. W. Wichern, Applied Multivariate Statistical Analysis, 5th Edition, Prentice Hall (2002)

    Covers many of the topics we will study in class. The book is more theoretically oriented than our textbook, and should provide a complement for students wishing to go deeper in the theory

  5. Venables, W. N., and B. D. Ripley, Modern Applied Statistics with S, 4th Edition, Springer (2002)

    This reference explains how to use R (S is the same as R except for a few commands). Also reviews a lot of statistical methodology and introduces some nice data sets

Handouts: All handouts will be stored in a binder in 217 Firestone and/or posted online.

Teaching assistants and office hours:

  • Kelly Littlepage (klittlepage@caltech.edu): Tuesdays 1–2, 304 Firestone
  • Peter Stobbe (stobbe@acm.caltech.edu): Wednesdays 10–11:30, SFL Study Group 2–3
  • Yaniv Plan (plan@acm.caltech.edu): Mondays 2–3:30, 212 Firestone

Introduction to statistical computing with R: Wednesdays 5–6, Firestone 306 (except on 10/08, when it will be in Moore 070)

Grading:

  • Homework assignments: 60%
    • Homework assignments will generally be distributed on Thursdays and are due in class the following Thursday.
    • Late homework will NOT be accepted for grading (documented medical emergencies excepted).
    • There will be about 6 or 7 assignments; the lowest score will be dropped from the final grade.
  • Final exam (take-home): 40%.
  • Use of sources without citing them in homework sets or in the final exam will result in a failing grade for the course.
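
As a rough illustration of how the weights above might combine, here is a minimal sketch in Python. It assumes every homework and the final exam are scored on a 0–100 scale and that the retained homework scores are averaged with equal weight; the page does not say how assignments are combined, so that averaging rule, the function name `course_grade`, and the sample scores are all hypothetical.

```python
def course_grade(homework_scores, final_exam_score):
    """Combine scores (each assumed to be on a 0-100 scale) into one grade.

    Assumption: the retained homework scores are averaged with equal weight.
    """
    if len(homework_scores) < 2:
        raise ValueError("need at least two homework scores to drop one")
    # Drop the single lowest homework score, then average the rest.
    kept = sorted(homework_scores)[1:]
    homework_average = sum(kept) / len(kept)
    # Homework counts for 60% and the take-home final exam for 40%.
    return 0.6 * homework_average + 0.4 * final_exam_score


# Made-up scores for six assignments and the final exam.
scores = [90, 80, 70, 100, 60, 85]
grade = course_grade(scores, 75)
print(round(grade, 2))
```

With these made-up numbers the lowest score (60) is dropped, the remaining five average to 85, and the grade works out to 0.6 × 85 + 0.4 × 75 = 81.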