Description:
Introduction to fundamental ideas and techniques of
statistical modeling, with an emphasis on conceptual
understanding and on the analysis of real data
sets. Assignments will draw on data analysis problems in
various science and engineering fields, and will involve
some programming.
Prerequisite:
Ma 2 or another introductory course in probability and
statistics; working knowledge of linear algebra at the
level of Ma 2. Programming experience is particularly
useful.
Syllabus:
- Simple linear regression: least squares estimation,
analysis of residuals
- Multiple linear regression: parameter estimation,
inference about model parameters
- Analysis of variance, comparison of models, model
selection
- Assessing goodness-of-fit, outliers, influential
observations
- Collinearity and rank-deficiency, singular value
decomposition, regularization
- Choosing models and fitting parameters:
cross-validation, L-curve
- Principal component analysis, linear discriminant
analysis
- Generalized linear models: models, estimation and examples
- Resampling methods and the bootstrap
Textbooks: (on reserve at the library)
- Montgomery, D. C., E. A. Peck, and
G. G. Vining, Introduction to Linear Regression
Analysis, 4th Edition,
Wiley (2006) [required]
- Manly, B. F. J., Multivariate
Statistical Methods: A Primer, 3rd Edition, Chapman
and Hall (2004)
Great reference for reviewing some elements of linear
algebra, and for linear discriminant analysis, principal
components analysis and canonical correlation analysis
- Efron, B. and R. J. Tibshirani, An
Introduction to the Bootstrap, Chapman and Hall
(1993)
Excellent introduction to the bootstrap and its many
applications. Also provides fresh insights into many topics in
statistics
- Johnson, R. A., and D. W. Wichern, Applied
Multivariate Statistical Analysis, 5th Edition,
Prentice Hall (2002)
Covers many of the topics we will study in class. The book is more
theoretically oriented than our textbook, and should provide a
complement for students wishing to go deeper in the theory
- Venables, W. N., and
B. D. Ripley, Modern Applied
Statistics with S, 4th Edition, Springer (2002)
This reference explains how to use R (S is the same as R except for
very few commands). Also reviews a lot of statistical methodology and
introduces some nice data sets
Handouts:
All handouts will be stored in a binder in 217 Firestone and/or
posted online.
Teaching assistants and office hours:
- Kelly Littlepage (klittlepage@caltech.edu): Tuesdays
12, 304 Firestone
- Peter Stobbe (stobbe@acm.caltech.edu): Wednesdays
10–11:30, SFL Study Group 23
- Yaniv Plan (plan@acm.caltech.edu): Mondays 2–3:30, 212
Firestone
Introduction to statistical computing with R:
Wednesdays 5–6, Firestone 306
(except on 10/08, when it will be in Moore 070)
Grading:
- Homework assignments: 60%
- Homework assignments will generally be distributed on Thursdays
and are due in class the following Thursday.
- Late homeworks will NOT be accepted for grading
(medical emergencies excepted with proof).
- There will be about 6 or 7 assignments; the
lowest score will be dropped when computing the final grade.
- Final exam (take-home): 40%.
- Use of sources without citing them in homework sets or
in the final exam results in a failing grade for the course.