The first edition of this book was developed from class notes written for an applied regression course taken primarily by undergraduate business majors in their junior year at the University of Oregon. Because the regression methods and techniques covered in the book apply broadly in many fields, not just business, the second edition widens its scope accordingly. Details of the major changes for the second edition are included below.

The book is suitable for any undergraduate statistics course in which regression analysis is the main focus. A recommended prerequisite is an introductory probability and statistics course. It would also be suitable for use in an applied regression course for non-statistics major graduate students, including MBAs, and for vocational, professional, or other non-degree courses. Mathematical details have deliberately been kept to a minimum, and the book does not contain any calculus. Instead, emphasis is placed on applying regression analysis to data using statistical software, and understanding and interpreting results.

Chapter 1 reviews essential introductory statistics material, while Chapter 2 covers simple linear regression. Chapter 3 introduces multiple linear regression, while Chapters 4 and 5 provide guidance on building regression models, including transforming variables, using interactions, incorporating qualitative information, and using regression diagnostics. Each of these chapters includes homework problems, mostly based on analyzing real datasets provided with the book. Chapter 6 contains three in-depth case studies, while Chapter 7 introduces extensions to linear regression and outlines some related topics. The appendices contain a list of statistical software packages that can be used to carry out all the analyses covered in the book (each with detailed instructions available from the book website), a table of critical values for the t-distribution, notation and formulas used throughout the book, a glossary of important terms, a short mathematics refresher, and brief answers to selected homework problems.

The first five chapters of the book have been used successfully in quarter-length courses at a number of institutions. An alternative approach for a quarter-length course would be to skip some of the material in Chapters 4 and 5 and substitute one or more of the case studies in Chapter 6, or briefly introduce some of the topics in Chapter 7. A semester-length course could comfortably cover all the material in the book.

The website for the book contains supplementary material designed to help both the instructor teaching from this book and the student learning from it. There you'll find all the datasets used for examples and homework problems in formats suitable for most statistical software packages, as well as detailed instructions for using the major packages, including SPSS, Minitab, SAS, JMP, Data Desk, EViews, Stata, Statistica, R, and S-PLUS. There is also some information on using the Microsoft Excel spreadsheet package for some of the analyses covered in the book (dedicated statistical software is necessary to carry out all of the analyses). The website also includes information on obtaining a solutions manual containing complete answers to all the homework problems, as well as further ideas for organizing class time around the material in the book.

The book contains the following stylistic conventions:

  • When displaying calculated values, the general approach is to be as accurate as possible when it matters (such as in intermediate calculations for problems with many steps), but to round appropriately when convenient or when reporting final results for real-world questions. Displayed results from statistical software use the default rounding employed in R throughout.
  • In the author's experience, many students find some traditional approaches to notation and terminology a barrier to learning and understanding. Thus, some traditions have been altered to improve ease of understanding. These include: using familiar Roman letters in place of unfamiliar Greek letters [e.g., E(Y) rather than μ and b rather than β]; replacing the nonintuitive ȳ for the sample mean of Y with mY; using NH and AH for null hypothesis and alternative hypothesis, respectively, rather than the usual H0 and Ha.
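
The rounding convention above can be illustrated with a short sketch. This is not an example from the book: the sample values are invented, and the variable names (m_y, se_full, se_rounded) are hypothetical. It computes the half-width of a 95% confidence interval for a population mean twice, once keeping full precision in the intermediate standard error and once rounding it too early, and rounds only for the final display.

```python
import math
import statistics

# Hypothetical sample values, invented purely for illustration.
sample = [12.3, 14.1, 11.8, 13.6, 12.9, 14.4, 13.2]

n = len(sample)
m_y = statistics.mean(sample)    # sample mean (mY in the book's notation)
s_y = statistics.stdev(sample)   # sample standard deviation
se_full = s_y / math.sqrt(n)     # standard error, kept at full precision

# Rounding an intermediate quantity too early loses accuracy.
se_rounded = round(se_full, 1)

# 97.5th percentile of the t-distribution with n - 1 = 6 degrees of freedom,
# taken from a standard t-table.
t_crit = 2.447

half_full = t_crit * se_full      # half-width from the full-precision value
half_early = t_crit * se_rounded  # half-width from the prematurely rounded value

# Round only when reporting the final results.
print(f"sample mean:                {m_y:.2f}")
print(f"half-width, full precision: {half_full:.2f}")
print(f"half-width, rounded early:  {half_early:.2f}")
```

With these made-up numbers, the two half-widths differ in the first decimal place, which is why intermediate values are best carried at full precision and rounded only at the end.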

Major changes for the second edition

  • The first edition of this book was used in a regression analysis course that ran from 2008 to 2012. The course's lively discussion boards provided an invaluable source of suggestions for changes to the book. This edition clarifies and expands on concepts that students found challenging and addresses every question posed in those discussions.
  • The foundational material on interval estimation has been rewritten to clarify the mathematics.
  • There is new material on testing model assumptions, transformations, indicator variables, nonconstant variance, autocorrelation, power and sample size, model building, and model selection.
  • As far as possible, I've replaced outdated data examples with more recent data, and also used more appropriate data examples for particular topics (e.g., autocorrelation). In total, about 40% of the data files have been replaced.
  • Most of the data examples now use descriptive names for variables rather than generic letters such as Y and X.
  • As in the first edition, this edition uses mathematics to explain methods and techniques only where necessary, and formulas are used within the text only when they are instructive. However, this edition also includes additional formulas in optional sections to aid those students who can benefit from more mathematical detail.
  • I've added many more end-of-chapter problems. In total, the number of problems has increased by nearly 25%.
  • I've updated and added new references, nearly doubling the total number of references.
  • I've added a third case study to Chapter 6.
  • The first edition included detailed computer software instructions for five major software packages (SPSS, Minitab, SAS Analyst, R/S-PLUS, and Excel) in an appendix. This appendix has been dropped from this edition; instead, instructions for newer software versions and other packages (e.g., JMP and Stata) are now provided and kept up to date on the book website.