Software Info – Minitab

These instructions accompany Applied Regression Modeling by Iain Pardoe, 3rd edition, Wiley, 2020. The numbered items cross-reference with the “computer help” references in the book. These instructions are based on Minitab 17 for Windows, but they (or something similar) should also work for other versions. Find instructions for other statistical software packages here.

Getting started and summarizing univariate data

  1. If desired, change Minitab’s default options by selecting Tools > Options.
  2. To open a Minitab data file, select File > Open.
  3. To edit the last dialog box, select Edit > Edit Last Dialog or click the Edit Last Dialog tool (ninth button from the left).
  4. Output appears in the Session Window and can be copied and pasted from Minitab to a word processor like OpenOffice Writer or Microsoft Word. Graphs appear in separate windows and can also easily be copied and pasted to other applications.
  5. You can access help by selecting Help > Help. For example, to find out about “boxplots” click the Index tab, type boxplots in the first box, and select the index entry you want in the second box.
  6. To transform data or compute a new variable, select Calc > Calculator. Type a name (with no spaces) for the new variable in the Store result in variable box, and type a mathematical expression for the variable in the Expression box. Current variables in the dataset can be moved into the Expression box, while the keypad and list of functions can be used to create the expression. Examples are LOGE('X') for the natural logarithm of X and 'X'**2 for X². Click OK to create the new variable, which will be added to the dataset (check it looks correct in the Worksheet Window); it can now be used just like any other variable. If you get the error message “Completion of computation impossible,” this means there is a syntax error in your Expression. A common mistake is to forget the multiplication symbol (*) between a number and a variable (e.g., 2*'X' represents 2X).
  7. To create indicator (dummy) variables from a qualitative variable, select Calc > Make Indicator Variables. Move the qualitative variable into the Indicator variables for box, type a range of columns in which to store the variables (e.g., C5-C6) in the Store results in box, and click OK (check that the correct indicator variables have been added to your spreadsheet in the Worksheet Window).
    • To find a percentile (critical value) for a t-distribution, select Calc > Probability Distributions > T. Select Inverse cumulative probability, type the Degrees of freedom, select Input constant, and type the lower-tail area (i.e., one minus the one-tail significance level). For example, typing 29 for the Degrees of freedom and 0.95 for the Input constant returns the 95th percentile of the t-distribution with 29 degrees of freedom (1.699), which is the critical value for an upper-tail test with a 5% significance level. By contrast, typing 0.975 for the Input constant returns the 97.5th percentile (2.045), which is the critical value for a two-tail test with a 5% significance level.
    • To find a percentile (critical value) for an F-distribution, select Calc > Probability Distributions > F. Select Inverse cumulative probability, type the Numerator degrees of freedom and Denominator degrees of freedom, select Input constant, and type the lower-tail area (i.e., one minus the significance level). For example, typing 2 for the Numerator degrees of freedom, 3 for the Denominator degrees of freedom, and 0.95 for the Input constant returns the 95th percentile of the F-distribution with 2 numerator degrees of freedom and 3 denominator degrees of freedom (9.552).
    • To find a percentile (critical value) for a chi-squared distribution, select Calc > Probability Distributions > Chi-Square. Select Inverse cumulative probability, type the Degrees of freedom, select Input constant, and type the lower-tail area (i.e., one minus the significance level). For example, typing 2 for the Degrees of freedom and 0.95 for the Input constant returns the 95th percentile of the chi-squared distribution with 2 degrees of freedom (5.991).
    • To find an upper-tail area (one-tail p-value) for a t-distribution, select Calc > Probability Distributions > T. Select Cumulative probability, type the Degrees of freedom, select Input constant, and type the t-statistic. For example, typing 29 for the Degrees of freedom and 2.40 for the Input constant returns 0.988, which is one minus the upper-tail area for a t-statistic of 2.40 from the t-distribution with 29 degrees of freedom (i.e., the p-value for an upper-tail test is 1−0.988=0.012). By contrast, the p-value for the corresponding two-tail test is twice the upper-tail area, which is 0.023 when computed from the unrounded cumulative probability (doubling the rounded value 0.012 gives 0.024).
    • To find an upper-tail area (p-value) for an F-distribution, select Calc > Probability Distributions > F. Select Cumulative probability, type the Numerator degrees of freedom and Denominator degrees of freedom, select Input constant, and type the F-statistic. For example, typing 2 for the Numerator degrees of freedom, 3 for the Denominator degrees of freedom, and 51.4 for the Input constant returns 0.995, which is one minus the upper-tail area for an F-statistic of 51.4 from the F-distribution with 2 numerator degrees of freedom and 3 denominator degrees of freedom (i.e., the p-value is 1−0.995=0.005).
    • To find an upper-tail area (p-value) for a chi-squared distribution, select Calc > Probability Distributions > Chi-Square. Select Cumulative probability, type the Degrees of freedom, select Input constant, and type the chi-squared statistic. For example, typing 2 for the Degrees of freedom and 0.38 for the Input constant returns 0.173, which is one minus the upper-tail area for a chi-squared statistic of 0.38 from the chi-squared distribution with 2 degrees of freedom (i.e., the p-value is 1−0.173=0.827).
  8. Calculate descriptive statistics for quantitative variables by selecting Stat > Basic Statistics > Display Descriptive Statistics. Move the variable(s) into the Variable(s) list. Click Statistics to select the summaries, such as the Mean, that you would like.
  9. Create contingency tables or cross-tabulations for qualitative variables by selecting Stat > Tables > Cross Tabulation and Chi-Square. Move one qualitative variable into the rows box and another into the columns box. Cell percentages (within rows, columns, or the whole table) can be calculated by clicking the appropriate boxes under Display.
  10. If you have a quantitative variable and a qualitative variable, you can calculate descriptive statistics for cases grouped in different categories by selecting Stat > Tables > Descriptive Statistics. Move the qualitative variable into the rows box (and another qualitative variable into the columns box if there is more than one). Click Associated Variables to select the quantitative variable for which you would like descriptive statistics, and the descriptive statistics to display; the default is the number of cases, but other statistics such as the Mean and Standard Deviation can also be selected.
  11. To make a stem-and-leaf plot for a quantitative variable, select Graph > Stem-and-Leaf. Move the variable into the Graph variables box.
  12. To make a histogram for a quantitative variable, select Graph > Histogram. Choose Simple and move the variable into the Graph variables box.
  13. To make a scatterplot with two quantitative variables, select Graph > Scatterplot. Choose Simple and move the vertical axis variable into the first row of the Y variables column and the horizontal axis variable into the first row of the X variables column.
  14. All possible scatterplots for more than two variables can be drawn simultaneously (called a scatterplot matrix) by selecting Graph > Matrix Plot, choosing Matrix of plots, Simple, and moving the variables into the Graph variables list.
  15. You can mark or label cases in a scatterplot with different colors/symbols according to categories in a qualitative variable by selecting Graph > Scatterplot and choosing With Groups. After moving the vertical axis variable into the first row of the Y variables column and the horizontal axis variable into the first row of the X variables column, move the grouping variable into the Categorical variables for grouping box. To change the colors/symbols used, select the symbols you want to change by clicking on one of the points with that symbol twice (all the data points should become highlighted on the first click, and just the points in that group should remain highlighted on the second click). Then select Editor > Edit Symbols. Select the color/symbol you want and click OK to see the effect.
  16. You can identify individual cases in a scatterplot by hovering over them.
  17. To remove one or more observations from a dataset, select Data > Subset Worksheet. Select Specify which rows to exclude and select one of the subsequent options.
  18. To make a bar chart for cases in different categories, select Graph > Bar Chart.
    • For frequency bar charts of one qualitative variable, choose Simple with Bars represent: Counts of unique values and move the variable into the Categorical variables box.
    • For frequency bar charts of two qualitative variables, choose Cluster with Bars represent: Counts of unique values and move the variables into the Categorical variables box.
    • The bars can also represent various summary functions for a quantitative variable. For example, to represent means, select Bars represent: A function of a variable and select Mean for the function.
  19. To make boxplots for cases in different categories, select Graph > Boxplot. Choose One Y, With Groups, move the quantitative variable into the Graph variables box, and move the qualitative variable(s) into the Categorical variables box.
  20. To make a QQ-plot (also known as a normal probability plot) for a quantitative variable, select Graph > Probability Plot. Choose Single and move the variable into the Graph variables box.
  21. To compute a confidence interval for a univariate population mean, select Stat > Basic Statistics > 1-Sample t. Move the variable for which you want to calculate the confidence interval into the Samples in columns box. Then click the Options button to bring up another dialog box in which you can specify the confidence level for the interval. Clicking OK will take you back to the previous dialog box, where you can now click OK.
  22. To do a hypothesis test for a univariate population mean, select Stat > Basic Statistics > 1-Sample t. Move the variable for which you want to do the test into the Samples in columns box, check Perform hypothesis test, and type the (null) hypothesized value into the Hypothesized mean box. Then click the Options button to bring up another dialog box in which you can specify a lower-tailed (“less than”), upper-tailed (“greater than”), or two-tailed (“not equal”) alternative hypothesis. Clicking OK will take you back to the previous dialog box, where you can now click OK.
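The chi-squared and F examples in the probability distribution bullets above can be cross-checked outside Minitab, because the chi-squared distribution with 2 degrees of freedom and the F-distribution with 2 numerator degrees of freedom have closed-form cumulative probabilities. The following is a minimal Python sketch under that assumption; it is not part of Minitab or the book, the function names are hypothetical, and the formulas do not apply to other degrees of freedom.

```python
import math


def chi2_cdf_2df(x):
    """Lower-tail area for a chi-squared distribution with exactly 2 df."""
    return 1.0 - math.exp(-x / 2.0)


def chi2_percentile_2df(p):
    """Inverse of chi2_cdf_2df: the value with lower-tail area p."""
    return -2.0 * math.log(1.0 - p)


def f_cdf_2num(x, den_df):
    """Lower-tail area for an F-distribution with exactly 2 numerator df."""
    return 1.0 - (1.0 + 2.0 * x / den_df) ** (-den_df / 2.0)


def f_percentile_2num(p, den_df):
    """Inverse of f_cdf_2num: the value with lower-tail area p."""
    return (den_df / 2.0) * ((1.0 - p) ** (-2.0 / den_df) - 1.0)


# Percentiles (compare with Inverse cumulative probability in Minitab).
print(round(chi2_percentile_2df(0.95), 3))   # chi-squared, 2 df
print(round(f_percentile_2num(0.95, 3), 3))  # F, 2 and 3 df

# Upper-tail areas (p-values) are one minus the cumulative probability.
print(round(1.0 - chi2_cdf_2df(0.38), 3))    # chi-squared statistic 0.38
print(round(1.0 - f_cdf_2num(51.4, 3), 3))   # F-statistic 51.4
```

The four printed values should agree with the 5.991, 9.552, 0.827, and 0.005 quoted in the bullets. For a t-distribution, or for other degrees of freedom, use Minitab's Calc > Probability Distributions menu as described above.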

Simple linear regression

  1. To fit a simple linear regression model (i.e., find a least squares line), select Stat > Regression > Regression > Fit Regression Model. Move the response variable into the Response box and the predictor variable into the Predictors box. Just click OK for now—the other items in the dialog box are addressed below. In the rare circumstance that you wish to fit a model without an intercept term (regression through the origin), click the Model button and deselect Include the constant term in the model before clicking OK.
  2. To add a regression line or least squares line to a scatterplot, select Editor > Add > Regression Fit, and Linear for the Model Order. You can create a scatterplot with a regression line superimposed by selecting Graph > Scatterplot. Choose With Regression and move the response variable into the first row of the Y variables column and the predictor variable into the first row of the X variables column.
  3. To find 95% confidence intervals for the regression parameters in a linear regression model, select Stat > Regression > Regression > Fit Regression Model. Move the response variable into the Response box and the predictor variable into the Predictors box. Before clicking OK, click the Results button, select Expanded Table and check Coefficients. The confidence intervals are displayed as the final two columns of the “Coefficients” output. This applies more generally to multiple linear regression also.
    • To find a fitted value or predicted value of Y (the response variable) at a particular value of X (the predictor variable), select Stat > Regression > Regression > Fit Regression Model. Move the response variable into the Response box and the predictor variable into the Predictors box. Before clicking OK, click the Storage button, check Fits, then click OK to return to the main Regression dialog box, and then click OK. The predicted or fitted values of Y at each of the X-values in the dataset are displayed in a column headed FITS in the Worksheet Window. Each time you ask Minitab to calculate predicted or fitted values like this, it will add a new column to the dataset and increment an end digit by one. For example, the second time you calculate a predicted or fitted value of Y it will be called FITS_1.
    • You can also obtain a predicted or fitted value of Y at an X-value that is not in the dataset by selecting Stat > Regression > Regression > Predict after fitting a model. Type the X-value into the first space beneath the predictor variable label. In this case, the predicted or fitted value of Y at this X-value is displayed in the Session Window as “Fit” (not in the Worksheet Window).
    • This applies more generally to multiple linear regression also.
    • To find a confidence interval for the mean of Y at a particular value of X, select Stat > Regression > Regression > Predict after fitting a model. In the pull-down menu below where it says “Response,” change the option to “Enter columns of values” and select the predictor variable to go in the box labeled with the name of the predictor. The confidence intervals for the mean of Y at each of the X-values in the dataset are displayed as two columns headed CLIM and CLIM_1 in the Worksheet Window. Each time you ask Minitab to calculate confidence intervals like this, it will add new columns to the dataset and increment the end digit by one. For example, the second time you calculate confidence intervals for the mean of Y the end points will be called CLIM_2 and CLIM_3.
    • You can also obtain a confidence interval for the mean of Y at an X-value that is not in the dataset by selecting Stat > Regression > Regression > Predict after fitting a model. Type the X-value into the first space beneath the predictor variable label. In this case, the confidence interval for the mean of Y at this X-value is displayed only in the Session Window (and not in the Worksheet Window).
    • This applies more generally to multiple linear regression also.
    • To find a prediction interval for an individual value of Y at a particular value of X, select Stat > Regression > Regression > Predict after fitting a model. In the pull-down menu below where it says “Response,” change the option to “Enter columns of values” and select the predictor variable to go in the box labeled with the name of the predictor. The prediction intervals for an individual Y-value at each of the X-values in the dataset are displayed as two columns headed PLIM and PLIM_1 in the Worksheet Window. Each time you ask Minitab to calculate prediction intervals like this, it will add new columns to the dataset and increment the end digit by one. For example, the second time you calculate prediction intervals for an individual Y-value the end points will be called PLIM_2 and PLIM_3.
    • You can also obtain a prediction interval for an individual value of Y at an X-value that is not in the dataset by selecting Stat > Regression > Regression > Predict after fitting a model. Type the X-value into the first space beneath the predictor variable label. In this case, the prediction interval for an individual Y-value at this X-value is displayed only in the Session Window (and not in the Worksheet Window).
    • This applies more generally to multiple linear regression also.
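To make the quantities in the bullets above concrete (fitted values, confidence intervals for the mean of Y, and prediction intervals for an individual Y-value), here is a minimal Python sketch of the standard simple linear regression formulas. It is not Minitab's implementation; the function names and the five-case dataset are made up for illustration, and the t-percentile is supplied by hand (for example, from Calc > Probability Distributions > T).

```python
import math


def fit_line(x, y):
    """Least squares line for one predictor, plus quantities needed for intervals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual sum of squares and regression standard error (n - 2 df).
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(rss / (n - 2))
    return {"intercept": intercept, "slope": slope, "s": s,
            "n": n, "x_bar": x_bar, "sxx": sxx}


def intervals_at(model, x0, t_crit):
    """Fitted value, interval for the mean of Y, and interval for an individual Y at x0."""
    fit = model["intercept"] + model["slope"] * x0
    spread = 1.0 / model["n"] + (x0 - model["x_bar"]) ** 2 / model["sxx"]
    half_mean = t_crit * model["s"] * math.sqrt(spread)        # mean of Y
    half_pred = t_crit * model["s"] * math.sqrt(1.0 + spread)  # individual Y
    return fit, (fit - half_mean, fit + half_mean), (fit - half_pred, fit + half_pred)


# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
model = fit_line(x, y)

# 97.5th percentile of the t-distribution with n - 2 = 3 degrees of freedom.
t_crit = 3.182
fit, mean_ci, pred_int = intervals_at(model, 6, t_crit)
print(fit, mean_ci, pred_int)
```

The prediction interval is always wider than the confidence interval for the mean at the same X-value, because it also allows for the random error of an individual observation.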

Multiple linear regression

  1. To fit a multiple linear regression model, select Stat > Regression > Regression > Fit Regression Model. Move the response variable into the Response box and the predictor variables into the Predictors box. In the rare circumstance that you wish to fit a model without an intercept term (regression through the origin), click the Options button and deselect Fit intercept before clicking OK.
  2. To add a quadratic regression line to a scatterplot, select Editor > Add > Regression Fit, and Quadratic for the Model Order. You can create a scatterplot with a quadratic regression line superimposed by selecting Graph > Scatterplot. Choose With Regression and move the vertical axis variable into the first row of the Y variables column and the horizontal axis variable into the first row of the X variables column. Before clicking OK, click the Data View button, click the Regression tab in the subsequent Scatterplot - Data View dialog box, and change the Model Order from Linear to Quadratic. Click OK to return to the Scatterplot - With Regression dialog box, and OK again to create the graph.
  3. Categories of a qualitative variable can be thought of as defining subsets of the sample. If there is also a quantitative response and a quantitative predictor variable in the dataset, a regression model can be fit to the data to represent separate regression lines for each subset. To display a regression line for each subset in a scatterplot, select Graph > Scatterplot and choose With Regression and Groups. After moving the vertical axis variable into the first row of the Y variables column and the horizontal axis variable into the first row of the X variables column, move the grouping variable into the Categorical variables for grouping box. Click OK to create the graph.
  4. Minitab does not appear to offer an automatic way to find the F-statistic and associated p-value for a nested model F-test in multiple linear regression. It is possible to calculate these quantities by hand using Minitab regression output and appropriate percentiles from an F-distribution.
  5. To save residuals in a multiple linear regression model, select Stat > Regression > Regression > Fit Regression Model. Move the response variable into the Response box and the predictor variables into the Predictors box. Before clicking OK, click the Storage button and check Residuals under Diagnostic Measures in the subsequent Regression: Storage dialog box. Click OK to return to the main Regression dialog box, and then click OK. The residuals are saved as a variable called RESI in the Worksheet Window; they can now be used just like any other variable, for example, to construct residual plots. Each time you ask Minitab to save residuals like this, it will add a new variable to the dataset and increment an end digit by one; for example, the second time you save residuals they will be called RESI_1. To save what Pardoe (2020) calls standardized residuals, check Standardized residuals under Diagnostic Measures in the Regression: Storage dialog box; they will be saved as a variable called SRES in the Worksheet Window. To save what Pardoe (2020) calls studentized residuals, check Deleted t residuals under Diagnostic Measures in the Regression: Storage dialog box; they will be saved as a variable called TRES in the Worksheet Window.
  6. To add a loess fitted line to a scatterplot (useful for checking the zero mean regression assumption in a residual plot), select Editor > Add > Smoother. The default value of 0.5 for Degree of smoothing tends to be a little on the low side: I would change it to 0.75. You can create a scatterplot with a loess fitted line superimposed by selecting Graph > Scatterplot. Choose With Regression and move the vertical axis variable into the first row of the Y variables column and the horizontal axis variable into the first row of the X variables column. Before hitting OK, click the Data View button, click the Smoother tab in the subsequent Scatterplot - Data View dialog box, and change the Smoother from None to Lowess. Hit OK to return to the Scatterplot - With Regression dialog box, and OK again to create the graph.
  7. To save leverages in a multiple linear regression model, select Stat > Regression > Regression > Fit Regression Model. Move the response variable into the Response box and the predictor variables into the Predictors box. Before clicking OK, click the Storage button and check Hi (leverages) under Diagnostic Measures in the subsequent Regression: Storage dialog box. Click OK to return to the main Regression dialog box, and then hit OK. The leverages are saved as a variable called HI in the Worksheet Window; they can now be used just like any other variable, for example, to construct scatterplots. Each time you ask Minitab to save leverages like this, it will add a new variable to the dataset and increment an end digit by one; for example, the second time you save leverages they will be called HI_1.
  8. To save Cook’s distances in a multiple linear regression model, select Stat > Regression > Regression > Fit Regression Model. Move the response variable into the Response box and the predictor variables into the Predictors box. Before clicking OK, click the Storage button and check Cook's distance under Diagnostic Measures in the subsequent Regression: Storage dialog box. Click OK to return to the main Regression dialog box, and then hit OK. Cook’s distances are saved as a variable called COOK in the Worksheet Window; they can now be used just like any other variable, for example, to construct scatterplots. Each time you ask Minitab to save Cook’s distances like this, it will add a new variable to the dataset and increment an end digit by one; for example, the second time you save Cook’s distances they will be called COOK_1.
  9. To create some residual plots automatically in a multiple linear regression model, select Stat > Regression > Regression > Fit Regression Model. Move the response variable into the Response box and the predictor variables into the Predictors box. Before clicking OK, click the Graphs button and select Deleted under Residuals for Plots in the subsequent Regression - Graphs dialog box. Check Residuals versus fits under Individual plots to create a scatterplot of the studentized residuals on the vertical axis versus the predicted values on the horizontal axis. You could also move individual predictor variables into the Residuals versus the variables box to create residual plots with each predictor variable on the horizontal axis. Click OK to return to the main Regression dialog box, and then hit OK. To create residual plots manually, first create studentized residuals (see help #35), and then construct scatterplots with these studentized residuals on the vertical axis.
  10. To create a correlation matrix of quantitative variables (useful for checking potential multicollinearity problems), select Stat > Basic Statistics > Correlation. Move the variables into the Variables box and hit OK.
  11. Minitab now displays variance inflation factors by default in multiple linear regression. The variance inflation factors are in the last column of the main regression output under “VIF.”
  12. To draw a predictor effect plot for graphically displaying the effects of transformed quantitative predictors and/or interactions between quantitative and qualitative predictors in multiple linear regression, first create a variable representing the effect, say, “X1effect” (see computer help #6). Then select Graph > Scatterplot. Choose With Connect and Groups and move the “X1effect” variable into the first row of the Y variables column and X1 into the first row of the X variables column.
    • If the “X1effect” variable just involves X1 (e.g., 1 + 3X1 + 4X1²), you can click OK at this point.
    • If the “X1effect” variable also involves a qualitative variable (e.g., 1 − 2X1 + 3D2X1, where D2 is an indicator variable), you should move the qualitative variable into the Categorical variables for grouping box before clicking OK.
    See Section 5.5 in Pardoe (2020) for an example.
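For the nested model F-test in item 4 of this list, which has to be calculated by hand, the arithmetic can be sketched as follows. This is a minimal Python illustration of the usual nested model F-statistic rather than anything Minitab provides; the function name and the numbers in the example are hypothetical. The inputs are the residual (error) sums of squares and their degrees of freedom, read from the Minitab regression output for the reduced and complete models.

```python
def nested_f_statistic(rss_reduced, rss_complete, df_reduced, df_complete):
    """F-statistic for comparing a reduced model nested within a complete model.

    rss_* are residual (error) sums of squares and df_* are the matching
    error degrees of freedom, so df_reduced > df_complete.
    """
    if df_reduced <= df_complete:
        raise ValueError("the reduced model must have more error df")
    numerator_df = df_reduced - df_complete
    numerator = (rss_reduced - rss_complete) / numerator_df
    denominator = rss_complete / df_complete
    return numerator / denominator


# Hypothetical values, for illustration only.
f_stat = nested_f_statistic(rss_reduced=120.0, rss_complete=80.0,
                            df_reduced=27, df_complete=25)
print(f_stat)  # approximately 6.25, with 2 numerator and 25 denominator df
```

The p-value is then the upper-tail area beyond this statistic for an F-distribution with df_reduced − df_complete numerator degrees of freedom and df_complete denominator degrees of freedom, which can be found with Calc > Probability Distributions > F as described in the first section.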