JMP

These instructions accompany Applied Regression Modeling by Iain Pardoe, 2nd edition published by Wiley in 2012. The numbered items cross-reference with the "computer help" references in the book. These instructions are based on SAS JMP 10 for Mac OS, but they (or something similar) should also work for other versions. Find instructions for other statistical software packages here.

Getting started and summarizing univariate data

  1. If desired, change JMP's default options by selecting JMP > Preferences (Mac) or File > Preferences (Windows).
  2. To open a JMP data file, select File > Open. You can also use File > Open to open text data files or Excel spreadsheets. For Excel spreadsheets, check the box labeled Always enforce Excel Row 1 as labels if the spreadsheet has the variable labels in the first row.
  3. To relaunch an analysis or recall a dialog after running an analysis, click the red triangle next to the analysis name at the top of the output window, and select Script > Relaunch Analysis or Model dialog.
  4. Output appears in a separate window each time you run an analysis. If you click on the "selection tool" (the third button from the left at the top of the window that looks like a "+"), you can select the output by clicking on it, and then right-click to Copy so that you can then paste it to a word processor like OpenOffice Writer or Microsoft Word.
  5. You can access help by selecting Help > Statistics Index, then selecting the topic that you would like help with. There is also a Help button in each analysis dialog box.
  6. To transform data or compute a new variable, select Cols > New Column, type the new variable name in the Column Name box, and select Formula under Column Properties. In the resulting dialog box, select the variable to be transformed under Table Columns and build the formula using the various operations and functions. Examples are Transcendental > Log for the natural logarithm and x^y for powers such as 2 ("squared"). The new variable should appear in the data spreadsheet (check that it looks correct) and can now be used just like any other variable.
  7. To create indicator (dummy) variables from a qualitative variable, select the qualitative variable and select Cols > Recode. Type the values 0 and 1 under New Value for the appropriate categories and change In Place to New Column. Check that the correct indicator variable has been created in the spreadsheet. Change the name and data/modeling type of the created variable by double-clicking the column heading (Data Type should be Numeric rather than Character and Modeling Type should be Continuous rather than Nominal). Repeat for other indicator variables (if necessary).
    • To find a percentile (critical value) for a t-distribution, select View > Log (Windows) or Window > Log (Mac), type and highlight t Quantile(p, df), then click Run Script. Here p is the lower-tail area (i.e., one minus the one-tail significance level) and df is the degrees of freedom. For example, t Quantile(0.95, 29) returns the 95th percentile of the t-distribution with 29 degrees of freedom (1.699), which is the critical value for an upper-tail test with a 5% significance level. By contrast, t Quantile(0.975, 29) returns the 97.5th percentile of the t-distribution with 29 degrees of freedom (2.045), which is the critical value for a two-tail test with a 5% significance level.
    • To find a percentile (critical value) for an F-distribution, select View > Log (Windows) or Window > Log (Mac), type and highlight F Quantile(p, df1, df2), then click Run Script. Here p is the lower-tail area (i.e., one minus the significance level), df1 is the numerator degrees of freedom, and df2 is the denominator degrees of freedom. For example, F Quantile(0.95, 2, 3) returns the 95th percentile of the F-distribution with 2 numerator degrees of freedom and 3 denominator degrees of freedom (9.552).
    • To find a percentile (critical value) for a chi-squared distribution, select View > Log (Windows) or Window > Log (Mac), type and highlight ChiSquare Quantile(p, df), then click Run Script. Here p is the lower-tail area (i.e., one minus the significance level) and df is the degrees of freedom. For example, ChiSquare Quantile(0.95, 2) returns the 95th percentile of the chi-squared distribution with 2 degrees of freedom (5.991).
    • To find an upper-tail area (one-tail p-value) for a t-distribution, select View > Log (Windows) or Window > Log (Mac), type and highlight 1 - t Distribution(t, df), then click Run Script. Here t is the absolute value of the t-statistic and df is the degrees of freedom. For example, 1 - t Distribution(2.40, 29) returns the upper-tail area for a t-statistic of 2.40 from the t-distribution with 29 degrees of freedom (0.012), which is the p-value for an upper-tail test. By contrast, 2*(1 - t Distribution(2.40, 29)) returns the two-tail area for a t-statistic of 2.40 from the t-distribution with 29 degrees of freedom (0.023), which is the p-value for a two-tail test.
    • To find an upper-tail area (p-value) for an F-distribution, select View > Log (Windows) or Window > Log (Mac), type and highlight 1 - F Distribution(f, df1, df2), then click Run Script. Here f is the value of the F-statistic, df1 is the numerator degrees of freedom, and df2 is the denominator degrees of freedom. For example, 1 - F Distribution(51.4, 2, 3) returns the upper-tail area (p-value) for an F-statistic of 51.4 for the F-distribution with 2 numerator degrees of freedom and 3 denominator degrees of freedom (0.005).
    • To find an upper-tail area (p-value) for a chi-squared distribution, select View > Log (Windows) or Window > Log (Mac), type and highlight 1 - ChiSquare Distribution(chisq, df), then click Run Script. Here chisq is the value of the chi-squared statistic and df is the degrees of freedom. For example, 1 - ChiSquare Distribution(0.38, 2) returns the upper-tail area (p-value) for a chi-squared statistic of 0.38 for the chi-squared distribution with 2 degrees of freedom (0.827).
  8. Calculate descriptive statistics for quantitative variables by selecting Analyze > Distribution. Move the variable(s) into the Y, Columns list and click OK. In the resulting output window, you can select additional output by clicking on the red triangle next to each variable name.
  9. Create contingency tables or cross-tabulations for qualitative variables by selecting Analyze > Fit Y by X. Move one qualitative variable into the Y, Response list and another into the X, Factor list. Cell percentages (within rows, columns, or the whole table) are displayed automatically in the resulting table.
  10. If you have quantitative variables and qualitative variables, you can calculate descriptive statistics for cases grouped in different categories by selecting Tables > Summary. Select the quantitative variable(s) and then select the summaries that you would like from the Statistics menu. Move the qualitative variable(s) into the Group list.
  11. To make a stem-and-leaf plot for a quantitative variable, select Analyze > Distribution. Move the variable(s) into the Y, Columns list and click OK. In the resulting output window, you can select Stem and Leaf by clicking on the red triangle next to each variable name.
  12. To make a histogram for a quantitative variable, select Analyze > Distribution. Move the variable(s) into the Y, Columns list and click OK. In the resulting output window, you can select various Histogram Options by clicking on the red triangle next to each variable name.
  13. To make a scatterplot with two quantitative variables, select Analyze > Fit Y by X. Move the vertical axis variable into the Y, Response box and the horizontal axis variable into the X, Factor box.
  14. All possible scatterplots for more than two variables can be drawn simultaneously (called a scatterplot matrix) by selecting Graph > Scatterplot Matrix. Move all the variables into the Y, Columns box.
  15. You can mark or label cases in a scatterplot with different colors/symbols according to categories in a qualitative variable by selecting Rows > Color or Mark by Column... before drawing the plot. Select the column containing the variable you wish to mark by.
  16. You can identify individual cases in a scatterplot by hovering over individual points in the scatterplot. If you double-click a point, the corresponding row in the spreadsheet will be highlighted.
  17. To remove one or more observations from a dataset, right-click on the row number(s) in the data spreadsheet and select Exclude/Unexclude.
  18. To make a bar chart for cases in different categories, select Graph > Chart.
    • For frequency bar charts of one or two qualitative variables, move the variable(s) into the Categories, X, Levels box.
    • The bars can also represent various summary functions for a quantitative variable. For example, to represent group means, select the quantitative variable and then select Mean from the Statistics menu.
  19. To make boxplots for cases in different categories, select Analyze > Fit Y by X.
    • Move the quantitative variable into the Y, Response box and the qualitative variable into the X, Factor box. In the resulting Oneway Analysis output window, click on the red triangle and select Quantiles.
    • To create clustered boxplots for two qualitative variables, first create a new qualitative variable consisting of all category combinations (using computer help #6 and the Character > Concat function). Then use this new variable as the X, Factor variable.
  20. To make a QQ-plot (also known as a normal probability plot) for a quantitative variable, select Analyze > Distribution. Move the variable into the Y, Columns list and click OK. In the resulting output window, you can select Normal Quantile Plot by clicking on the red triangle next to the variable name.
  21. To compute a confidence interval for a univariate population mean, select Analyze > Distribution. Move the variable into the Y, Columns list and click OK. In the resulting output window, you can select Confidence Interval by clicking on the red triangle next to the variable name. Enter the confidence level in the resulting Confidence Intervals dialog box and click OK.
  22. To do a hypothesis test for a univariate population mean, select Analyze > Distribution. Move the variable into the Y, Columns list and click OK. In the resulting output window, you can select Test Mean by clicking on the red triangle next to the variable name. Enter the (null) hypothesized mean in the resulting Test Mean dialog box and click OK.
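The JSL quantile and tail-area calls illustrated in the bullets above can be cross-checked outside JMP. This sketch uses Python with scipy.stats (an assumption, not part of the book's materials), where .ppf plays the role of JSL's Quantile functions and .sf returns the upper-tail area directly:

```python
# Reproduce the worked JSL examples above with scipy.stats
# (scipy is an assumed stand-in; the book itself uses JMP/JSL).
from scipy import stats

# Percentiles (critical values): JSL "Quantile" corresponds to .ppf
t_crit_one_tail = stats.t.ppf(0.95, 29)    # ~1.699, upper-tail test, 5% level
t_crit_two_tail = stats.t.ppf(0.975, 29)   # ~2.045, two-tail test, 5% level
f_crit = stats.f.ppf(0.95, 2, 3)           # ~9.552
chi2_crit = stats.chi2.ppf(0.95, 2)        # ~5.991

# Upper-tail areas (p-values): JSL "1 - Distribution" corresponds to .sf
p_t_upper = stats.t.sf(2.40, 29)           # ~0.012, upper-tail p-value
p_t_two = 2 * stats.t.sf(2.40, 29)         # ~0.023, two-tail p-value
p_f = stats.f.sf(51.4, 2, 3)               # ~0.005
p_chi2 = stats.chi2.sf(0.38, 2)            # ~0.827
```

The .sf (survival function) form avoids the 1 - CDF subtraction written out in the JSL snippets.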

Simple linear regression

  1. To fit a simple linear regression model (i.e., find a least squares line), select Analyze > Fit Model. Move the response variable into the Y box, select the predictor variable and Add it to the Construct Model Effects box, and click Run. In the rare circumstance that you wish to fit a model without an intercept term (regression through the origin), click No Intercept before clicking Run.
  2. To add a regression line or least squares line to a scatterplot, select Analyze > Fit Y by X. Move the response variable into the Y, Response box, move the predictor variable into the X, Factor box, and click OK. Click on the red triangle in the resulting Fit Y by X output window, and select Fit Line.
  3. To find 95% confidence intervals for the regression parameters in a simple or multiple linear regression model, fit the model using computer help #25 or #31, right-click in the body of the Parameter Estimates table in the resulting Fit Least Squares output window, and select Columns > Lower 95% and Columns > Upper 95%.
    • To find a fitted value or predicted value of Y (the response variable) at a particular value of X (the predictor variable) in a linear regression model, fit the model using computer help #25 or #31, click on the red triangle next to Response in the resulting Fit Least Squares output window, and select Save Columns > Predicted Values. This will produce fitted or predicted values of Y for each of the X-values in the dataset by default (in a column labeled Predicted *, where the star represents the response variable name). Each time you ask JMP to calculate fitted or predicted values of Y like this it will add a new column to the dataset and append a number to the column header (e.g., "2" for the second time).
    • You can also obtain a fitted or predicted value of Y at an X-value that is not in the dataset by doing the following. Before fitting the regression model, add the X-value to the dataset (go down to the bottom of the spreadsheet and type the X-value in the appropriate cell of the next blank row). Then fit the regression model and follow the steps above. JMP will ignore the X-value you typed when fitting the model (since there is no corresponding Y-value), so all the regression output (such as the estimated regression parameters) will be the same. But JMP will calculate a fitted or predicted value of Y at this new X-value based on the results of the regression. Again, look for it in the dataset in the column labeled Predicted *.
    • This applies more generally to multiple linear regression also.
    • To find a confidence interval for the mean of Y at a particular value of X in a linear regression model, fit the model using computer help #25 or #31, click on the red triangle next to Response in the resulting Fit Least Squares output window, and select Save Columns > Mean Confidence Interval. This will produce 95% intervals for each of the X-values in the dataset by default (in columns labeled Lower 95% Mean * and Upper 95% Mean *). Each time you ask JMP to calculate confidence intervals like this it will add new columns to the dataset and append a number to the column headers (e.g., "2" for the second time). If you hold down the Shift key and then select Save Columns > Mean Confidence Interval you'll be prompted to enter a significance level (e.g., enter 0.10 for 90% intervals).
    • You can also obtain a confidence interval for the mean of Y at an X-value that is not in the dataset by doing the following. Before fitting the regression model, add the X-value to the dataset (go down to the bottom of the spreadsheet and type the X-value in the appropriate cell of the next blank row). Then fit the regression model and follow the steps above. JMP will ignore the X-value you typed when fitting the model (since there is no corresponding Y-value), so all the regression output (such as the estimated regression parameters) will be the same. But JMP will calculate a confidence interval for the mean of Y at this new X-value based on the results of the regression. Again, look for it in the dataset in the columns labeled Lower 95% Mean * and Upper 95% Mean *.
    • This applies more generally to multiple linear regression also.
    • To find a prediction interval for an individual value of Y at a particular value of X in a linear regression model, fit the model using computer help #25 or #31, click on the red triangle next to Response in the resulting Fit Least Squares output window, and select Save Columns > Indiv Confidence Interval. This will produce 95% intervals for each of the X-values in the dataset by default (in columns labeled Lower 95% Indiv * and Upper 95% Indiv *). Each time you ask JMP to calculate confidence intervals like this it will add new columns to the dataset and append a number to the column headers (e.g., "2" for the second time). If you hold down the Shift key and then select Save Columns > Indiv Confidence Interval you'll be prompted to enter a significance level (e.g., enter 0.10 for 90% intervals).
    • You can also obtain a prediction interval for an individual Y-value at an X-value that is not in the dataset by doing the following. Before fitting the regression model, add the X-value to the dataset (go down to the bottom of the spreadsheet and type the X-value in the appropriate cell of the next blank row). Then fit the regression model and follow the steps above. JMP will ignore the X-value you typed when fitting the model (since there is no corresponding Y-value), so all the regression output (such as the estimated regression parameters) will be the same. But JMP will calculate a prediction interval for an individual Y at this new X-value based on the results of the regression. Again, look for it in the dataset in the columns labeled Lower 95% Indiv * and Upper 95% Indiv *.
    • This applies more generally to multiple linear regression also.
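The quantities JMP saves in the steps above, the fitted value, the confidence interval for the mean of Y, and the prediction interval for an individual Y, follow the standard simple linear regression formulas. A minimal numeric sketch in Python (toy data assumed; scipy is used only for the t critical value):

```python
# Hand-compute what JMP's Save Columns options produce, on toy data.
import math
from scipy import stats

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)

xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

b1 = sxy / sxx              # least squares slope
b0 = ybar - b1 * xbar       # least squares intercept

# Residual standard error, s, on n - 2 degrees of freedom
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))

# Fitted value, 95% CI for the mean of Y, and 95% PI for an
# individual Y, all at a new X-value (cf. Save Columns in JMP)
x_new = 3.5
y_hat = b0 + b1 * x_new
t_crit = stats.t.ppf(0.975, n - 2)
se_mean = s * math.sqrt(1 / n + (x_new - xbar) ** 2 / sxx)
se_pred = s * math.sqrt(1 + 1 / n + (x_new - xbar) ** 2 / sxx)
ci = (y_hat - t_crit * se_mean, y_hat + t_crit * se_mean)
pi = (y_hat - t_crit * se_pred, y_hat + t_crit * se_pred)
```

Note that the prediction interval is always wider than the confidence interval at the same X-value, since its standard error carries the extra "1 +" term for individual variability.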

Multiple linear regression

  1. To fit a multiple linear regression model, select Analyze > Fit Model. Move the response variable into the Y box, select the predictor variables and Add them to the Construct Model Effects box, and click Run. In the rare circumstance that you wish to fit a model without an intercept term (regression through the origin), click No Intercept before clicking Run.
  2. To add a quadratic regression line to a scatterplot, select Analyze > Fit Y by X. Move the response variable into the Y, Response box, move the predictor variable into the X, Factor box, and click OK. Click on the red triangle in the resulting Fit Y by X output window, and select Fit Polynomial > 2, quadratic.
  3. Categories of a qualitative variable can be thought of as defining subsets of the sample. If there are also a quantitative response and a quantitative predictor variable in the dataset, a regression model can be fit to the data to represent separate regression lines for each subset. First use computer help #15 and #17 to make a scatterplot with the response variable on the vertical axis, the quantitative predictor variable on the horizontal axis, and the cases marked with different colors according to the categories in the qualitative predictor variable. To add a regression line for each subset to this scatterplot first click on the red triangle in the resulting Fit Y by X output window, select Group By ..., select the qualitative predictor variable, and click OK. Then click on the red triangle again and select Fit Line.
  4. To find the F-statistic and associated p-value for a nested model F-test in multiple linear regression, fit the model using computer help #31, click on the red triangle next to Response in the resulting Fit Least Squares output window, and select Custom Test.... The resulting Custom Test output will have a list of regression parameters with a column of zeroes next to it; click the zero next to the first parameter in the nested F-test null hypothesis and change the value to "1." Then click Add Column and repeat for each remaining parameter in the null hypothesis. When all the parameters in the null hypothesis have been added, click Done.
  5. To save residuals in a multiple linear regression model, fit the model using computer help #31, click on the red triangle next to Response in the resulting Fit Least Squares output window, and select Save Columns > Residuals. The residuals are saved as a variable called Residual *, where the star represents the response variable name; they can now be used just like any other variable, for example, to construct residual plots. To save what Pardoe (2012) calls standardized residuals, select Save Columns > Studentized Residuals; they will be saved as a variable called Studentized Resid *. JMP does not appear to offer a way to save what Pardoe (2012) calls studentized residuals.
  6. JMP does not appear to offer a way to add a loess fitted line to a scatterplot, but it can add a similar smoothing spline fitted line (useful for checking the zero mean regression assumption in a residual plot). To do so, select Analyze > Fit Y by X. Move the vertical axis variable (e.g., the studentized residuals) into the Y, Response box, move the horizontal axis variable into the X, Factor box, and click OK. Click on the red triangle in the resulting Fit Y by X output window, and select Fit Spline; you can experiment to find a value for the smoothing parameter "lambda" that captures the major trends in the scatterplot without being overly "wiggly," but typically a value of 1 or 10 should work well.
  7. To save leverages in a multiple linear regression model, fit the model using computer help #31, click on the red triangle next to Response in the resulting Fit Least Squares output window, and select Save Columns > Hats. The leverages are saved as a variable called h *, where the star represents the response variable name; they can now be used just like any other variable, for example, to construct scatterplots.
  8. To save Cook's distances in a multiple linear regression model, fit the model using computer help #31, click on the red triangle next to Response in the resulting Fit Least Squares output window, and select Save Columns > Cook's D Influence. The Cook's distances are saved as a variable called Cook's D Influence *, where the star represents the response variable name; they can now be used just like any other variable, for example, to construct scatterplots.
  9. JMP will automatically create a residual plot in a multiple linear regression model, specifically one with the (ordinary) residuals on the vertical axis versus the predicted values on the horizontal axis. To create residual plots manually, first create standardized residuals (see computer help #35), and then construct scatterplots with these standardized residuals on the vertical axis.
  10. To create a correlation matrix of quantitative variables (useful for checking potential multicollinearity problems), select Analyze > Multivariate Methods > Multivariate. Move all the variables into the Y, Columns box and click OK.
  11. To find variance inflation factors in multiple linear regression, fit the model using computer help #31, right-click in the body of the Parameter Estimates table in the resulting Fit Least Squares output window, and select Columns > VIF.
  12. To draw a predictor effect plot for graphically displaying the effects of transformed quantitative predictors and/or interactions between quantitative and qualitative predictors in multiple linear regression, first create a variable representing the effect, say, "X1effect" (see computer help #6).
    • If the "X1effect" variable just involves X1 (e.g., 1 + 3X1 + 4X12), then use computer help #26 to create the line plot.
    • If the "X1effect" variable also involves a qualitative variable (e.g., 1 − 2X1 + 3D2X1, where D2 is an indicator variable), you should then use computer help #33 to create the line plot.

    See Section 5.5 in Pardoe (2012) for an example.
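Several of the diagnostics JMP saves in the steps above (leverages, Cook's distances, and variance inflation factors) can also be computed directly from the design matrix. A minimal sketch in Python with numpy (an assumption, with toy data, not part of the book's materials):

```python
# Compute by hand the quantities that Save Columns > Hats,
# Save Columns > Cook's D Influence, and Columns > VIF report in JMP.
import numpy as np

# Toy data: n = 6 cases, two quantitative predictors and a response
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = np.array([3.0, 4.0, 8.0, 9.0, 13.0, 14.0])

X = np.column_stack([np.ones_like(x1), x1, x2])  # design matrix with intercept
beta, *_ = np.linalg.lstsq(X, y, rcond=None)     # least squares estimates

# Leverages: diagonal of the hat matrix H = X (X'X)^-1 X'
H = X @ np.linalg.inv(X.T @ X) @ X.T
leverage = np.diag(H)

# Residuals and Cook's distances (k = number of estimated parameters)
resid = y - X @ beta
n, k = X.shape
s2 = resid @ resid / (n - k)
cooks_d = (resid ** 2 / (k * s2)) * leverage / (1 - leverage) ** 2

# VIF for x1: regress x1 on the other predictor(s), then 1 / (1 - R^2)
Z = np.column_stack([np.ones_like(x2), x2])
g, *_ = np.linalg.lstsq(Z, x1, rcond=None)
r2 = 1 - np.sum((x1 - Z @ g) ** 2) / np.sum((x1 - x1.mean()) ** 2)
vif_x1 = 1 / (1 - r2)
```

A useful check on the leverages is that they sum to the number of estimated regression parameters (the trace of the hat matrix), and each lies between 0 and 1.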