The first edition of this book was developed from class notes written for an applied regression course taken primarily by undergraduate business majors in their junior year at the University of Oregon. Since the regression methods and techniques covered in the book have broad application in many fields, not just business, the second edition widened its scope to reflect this. This third edition refines and improves the text further. Details of the major changes for the third edition are included at the end of this preface.

The book is suitable for any undergraduate or graduate statistics course in which regression analysis is the main focus. A recommended prerequisite is an introductory probability and statistics course. It is also appropriate for use in an applied regression course for MBAs and for vocational, professional, or other non-degree courses. Mathematical details have deliberately been kept to a minimum, and the book does not contain any calculus. Instead, emphasis is placed on applying regression analysis to data using statistical software, and on understanding and interpreting results. Optional formulas are provided for those wishing to see these details, and the book now includes an informal overview of matrices in the context of multiple linear regression.

Chapter 1 reviews essential introductory statistics material, while Chapter 2 covers simple linear regression. Chapter 3 introduces multiple linear regression, while Chapters 4 and 5 provide guidance on building regression models, including transforming variables, using interactions, incorporating qualitative information, and using regression diagnostics. Each of these chapters includes homework problems, mostly based on analyzing real datasets provided with the book. Chapter 6 (online) contains three in-depth case studies, while Chapter 7 (online) introduces extensions to linear regression and outlines some related topics. The appendices contain a list of statistical software packages that can be used to carry out all the analyses covered in the book (each with detailed instructions available on this website), a table of critical values for the t-distribution, notation and formulas used throughout the book, a glossary of important terms, a short mathematics refresher, a tutorial on multiple linear regression using matrices, and brief answers to selected homework problems.

The first five chapters of the book have been used successfully in quarter-length courses at a number of institutions. An alternative approach for a quarter-length course would be to skip some of the material in Chapters 4 and 5 and substitute one or more of the case studies in Chapter 6, or briefly introduce some of the topics in Chapter 7. A semester-length course could comfortably cover all the material in the book.

This website contains supplementary material designed to help both the instructor teaching from this book and the student learning from it. Here you’ll find all the datasets used for examples and homework problems in formats suitable for most statistical software packages, as well as detailed instructions for using the major packages, including SPSS, Minitab, SAS, JMP, Data Desk, EViews, Stata, Statistica, R, and Python. There is also some information on using the Microsoft Excel spreadsheet package for some of the analyses covered in the book (dedicated statistical software is necessary to carry out all of the analyses). This website also includes information on obtaining a solutions manual containing complete answers to all the homework problems, as well as instructional videos, practice quizzes, and further ideas for organizing class time around the material in the book.

The book contains the following stylistic conventions:

  • When displaying calculated values, the general approach is to be as accurate as possible when it matters (such as in intermediate calculations for problems with many steps), but to round appropriately when convenient or when reporting final results for real-world questions. Displayed results from statistical software use the default rounding employed in R throughout.
  • In the author’s experience, many students find some traditional approaches to notation and terminology a barrier to learning and understanding. Thus, some traditions have been altered to improve ease of understanding. These include: using familiar Roman letters in place of unfamiliar Greek letters (e.g., E(Y) rather than μ and b rather than β); replacing the nonintuitive ȳ for the sample mean of Y with mY; and using NH and AH for null hypothesis and alternative hypothesis, respectively, rather than the usual H0 and Ha.

Major changes for the third edition

  • The second edition of this book was used in the regression analysis course run by from 2012 to 2020. The lively discussion boards provided an invaluable source of suggestions for changes to the book. This edition clarifies and expands on concepts that students found challenging and addresses every question posed in those discussions.
  • There is expanded material on assessing model assumptions, analysis of variance, sums of squares, lack of fit testing, hierarchical models, influential observations, weighted least squares, multicollinearity, and logistic regression.
  • A new appendix provides an informal overview of matrices in the context of multiple linear regression.
  • I’ve added learning objectives to the beginning of each chapter and text boxes at the end of each section that summarize the important concepts.
  • As in the first two editions, this edition uses mathematics to explain methods and techniques only where necessary, and formulas are used within the text only when they are instructive. However, the book also includes additional formulas in optional sections to aid those students who can benefit from more mathematical detail.
  • I’ve added many more end-of-chapter problems. In total, the number of problems has increased by nearly 70%.
  • I’ve updated and added new references.
  • This website has been expanded to include instructional videos and practice quizzes.