These instructions were kindly prepared by Tom Kari to accompany Applied Regression Modeling by Iain Pardoe, 2nd edition published by Wiley in 2012. The numbered items cross-reference with the "computer help" references in the book. These instructions are based on SAS Studio Release 3.6, but they should also work for other versions. Find instructions for other statistical software packages here.
- SAS On Demand for Academics (SODA) is a facility that lets you use SAS Studio to run SAS on the cloud. You can read more about SODA, and set up an account, here.
- SAS Studio is a point-and-click interface that lets you use SAS to process and analyze data. You can read more about SAS Studio here.
Note that in SAS Studio you can also create and submit SAS code, so any of the SAS code instructions can also be used here. Some of the instructions below require writing SAS code, because not every analysis can be done with the point-and-click interface. Some of the SAS code examples use data from the SASHELP library, which is automatically available for examples and for testing code. You can paste the code from an instruction into a SAS program window and run it as-is.
Getting started and summarizing univariate data
- SAS Studio preferences and options can be set using the options icon.
- Use the Utilities > Import Data task to import your data.
(If you need to upload it first, you can do this using the Upload icon on the Server Files and Folders navigation pane.)
- To undo an action, use the Undo icon.
- Output of tables and graphs appears in the Results window.
You can choose HTML, PDF, or RTF results in the Preferences > Results menu under the options icon.
RTF output can be copied and pasted into a word processor such as OpenOffice Writer or Microsoft Word.
- You can access help by selecting the Help icon.
- To transform data or compute a new variable:
Select the Data > Transform Data task.
On the DATA tab:
Select your dataset in the DATA menu item.
Specify up to three transformations in the TRANSFORM 1, TRANSFORM 2, and TRANSFORM 3 menu items:
- Select the variable in the Variable pick list;
- Select the transformation type in the Transform menu item.
If desired, change the output dataset name in the OUTPUT DATA SET menu item.
The following transformations are pre-programmed in SAS Studio:
inverse square, inverse, inverse square root, natural log, square root, square
You can also use the Specify Custom Transformation option to create the SAS code for a different transformation.
Once you've specified the transformations the way you want, click the Run icon to run the task. If you need an option not covered by the above, you can also do it with SAS programming. Create a new SAS program, and type, for example,
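data mydata2;
set work.mydata;
/* b0-b5 stand for numbers (e.g., estimated regression coefficients): replace them */
/* with numeric values, since SAS treats undefined names as missing variables      */
E_Y = b0 + (b1*X1) + (b2*(X1**2)) + (b3*X2) + (b4*X3) + (b5*(X3**2));
run;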
to reproduce the equation shown on page 146.
- There isn't a SAS Studio task to create indicator (dummy) variables from a qualitative variable. Create a new SAS program, and type, for example,
data mydata2;
set work.mydata;
if X='level' then D1=1;
else D1=0;
run;
where X is the qualitative variable and "level" is the name of one of the categories in X (the comparison is case-sensitive, so "level" must match the stored value, including capitalization). Repeat for other indicator variables if necessary.
- There isn't a SAS Studio task to find a percentile (critical value) for a t-distribution. Create a new SAS program, and type, for example,
data mydata;
p = 0.95;   /* lower-tail area */
df = 29;    /* degrees of freedom */
cvt = quantile('t', p, df);
run;
where p is the lower-tail area (i.e., one minus the one-tail significance level) and df is the degrees of freedom. When you run the program, the result will be in variable cvt in the output dataset. For example, quantile('t', .95, 29) returns the 95th percentile of the t-distribution with 29 degrees of freedom (1.699), which is the critical value for an upper-tail test with a 5% significance level. By contrast, quantile('t', .975, 29) returns the 97.5th percentile of the t-distribution with 29 degrees of freedom (2.045), which is the critical value for a two-tail test with a 5% significance level.
- There isn't a SAS Studio task to find a percentile (critical value) for an F-distribution. Create a new SAS program, and type, for example,
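data mydata;
p = 0.95;   /* lower-tail area */
df1 = 2;    /* numerator degrees of freedom */
df2 = 3;    /* denominator degrees of freedom */
cvt = quantile('f', p, df1, df2);
run;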
where p is the lower-tail area (i.e., one minus the significance level), df1 is the numerator degrees of freedom, and df2 is the denominator degrees of freedom. When you run the program, the result will be in variable cvt in the output dataset. For example, quantile('f', .95, 2, 3) returns the 95th percentile of the F-distribution with 2 numerator degrees of freedom and 3 denominator degrees of freedom (9.552).
- There isn't a SAS Studio task to find a percentile (critical value) for a chi-squared distribution. Create a new SAS program, and type, for example,
data mydata;
p = 0.95;   /* lower-tail area */
df = 2;     /* degrees of freedom */
cvt = quantile('chisq', p, df);
run;
where p is the lower-tail area (i.e., one minus the significance level) and df is the degrees of freedom. When you run the program, the result will be in variable cvt in the output dataset. For example, quantile('chisq', 0.95, 2) returns the 95th percentile of the chi-squared distribution with 2 degrees of freedom (5.991).
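If you want to see critical values without opening the output dataset, one option is to compute several in a single DATA step and display them with PROC PRINT. The sketch below reproduces the four examples above; the dataset name crit and the variable names are arbitrary choices for illustration, not anything SAS requires.

```sas
/* Compute the example critical values quoted above in one DATA step. */
/* "crit" and the variable names are hypothetical, chosen for clarity. */
data crit;
  t_upper = quantile('t', 0.95, 29);      /* upper-tail t test, 5% level   */
  t_two   = quantile('t', 0.975, 29);     /* two-tail t test, 5% level     */
  f_crit  = quantile('f', 0.95, 2, 3);    /* F with 2 and 3 df, 5% level   */
  c_crit  = quantile('chisq', 0.95, 2);   /* chi-squared with 2 df, 5%     */
run;

/* Display the values in the Results window, rounded to 3 decimals */
proc print data=crit;
  format t_upper t_two f_crit c_crit 8.3;
run;
```

The printed values should match those quoted above (1.699, 2.045, 9.552, and 5.991).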
- There isn't a SAS Studio task to find an upper-tail area (one-tail p-value) for a t-distribution. Create a new SAS program, and type, for example,
data mydata;
t = 2.40;   /* value of the t-statistic */
df = 29;    /* degrees of freedom */
pt = 1 - probt(t, df);
run;
where t is the value of the t-statistic and df is the degrees of freedom.
When you run the program, the result will be in variable pt in the output dataset. For example, pt = 1 - probt(2.40, 29); returns the upper-tail area for a t-statistic of 2.40 from the t-distribution with 29 degrees of freedom (0.012), which is the p-value for an upper-tail test. By contrast, pt = 2 * (1 - probt(2.40, 29)); returns the two-tail area for a t-statistic of 2.40 from the t-distribution with 29 degrees of freedom (0.023), which is the p-value for a two-tail test (for a negative t-statistic, use its absolute value in this two-tail formula).
- There isn't a SAS Studio task to find an upper-tail area (p-value) for an F-distribution. Create a new SAS program, and type, for example,
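data mydata;
f = 51.4;   /* value of the F-statistic */
df1 = 2;    /* numerator degrees of freedom */
df2 = 3;    /* denominator degrees of freedom */
pf = 1 - probf(f, df1, df2);
run;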
where f is the value of the F-statistic, df1 is the numerator degrees of freedom, and df2 is the denominator degrees of freedom. When you run the program, the result will be in variable pf in the output dataset. For example, pf = 1 - probf(51.4, 2, 3); returns the upper-tail area (p-value) for an F-statistic of 51.4 for the F-distribution with 2 numerator degrees of freedom and 3 denominator degrees of freedom (0.005).
- There isn't a SAS Studio task to find an upper-tail area (p-value) for a chi-squared distribution. Create a new SAS program, and type, for example,
data mydata;
chisq = 0.38;   /* value of the chi-squared statistic */
df = 2;         /* degrees of freedom */
pchisq = 1 - probchi(chisq, df);
run;
where chisq is the value of the chi-squared statistic and df is the degrees of freedom. When you run the program, the result will be in variable pchisq in the output dataset. For example, pchisq = 1 - probchi(0.38, 2); returns the upper-tail area (p-value) for a chi-squared statistic of 0.38 for the chi-squared distribution with 2 degrees of freedom (0.827).
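As an alternative to writing 1 - probt(...), 1 - probf(...), and 1 - probchi(...), SAS also has an SDF (survival distribution function) function that returns the upper-tail area directly; for very small p-values this avoids subtracting a number close to 1 from 1. A minimal sketch using the same example statistics as above; the dataset name pvals and the variable names are arbitrary, and the two-tail line assumes a positive t-statistic.

```sas
/* Upper-tail areas (p-values) for the example statistics above, using SDF */
data pvals;
  pt_upper = sdf('t', 2.40, 29);        /* upper-tail t test                 */
  pt_two   = 2 * sdf('t', 2.40, 29);    /* two-tail t test (t assumed > 0)   */
  pf       = sdf('f', 51.4, 2, 3);      /* F-test                            */
  pchisq   = sdf('chisq', 0.38, 2);     /* chi-squared test                  */
run;

/* Display the p-values in the Results window */
proc print data=pvals;
  format pt_upper pt_two pf pchisq 8.3;
run;
```

The printed values should agree with the examples above (0.012, 0.023, 0.005, and 0.827).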
- There are two good options to calculate descriptive statistics for quantitative variables:
Method 1: Using a SAS Studio task
Select the Statistics > Summary Statistics task.
On the DATA tab:
Select your dataset in the DATA menu item.
Select the quantitative variable(s) to be analyzed in the Analysis variables: menu item.
On the OPTIONS tab:
On the STATISTICS menu item, select the desired statistics from the submenus.
Click the Run icon to run the task.
Method 2: Using SAS code
Create a new SAS program, and type, for example,
proc univariate data=sashelp.class;
var Height;
run;
where Height is the quantitative variable. Specify an output statement to calculate other statistics beyond those calculated by default (see SAS Help for specific details on how to do this).
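As far as we know, the Summary Statistics task in Method 1 is built on PROC MEANS (you can check the generated code on the task's CODE tab). If you prefer code but only want a few selected statistics, a sketch along these lines may be more compact than PROC UNIVARIATE; the choice of statistics here is just an example.

```sas
/* Selected descriptive statistics for Height, rounded to 2 decimals */
proc means data=sashelp.class n mean std min q1 median q3 max maxdec=2;
  var Height;
run;
```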
Which approach you will use depends on which statistics you require.
- To create contingency tables or cross-tabulations for qualitative variables:
Select the Statistics > Table Analysis task.
On the DATA tab:
Select your dataset in the DATA menu item.
Select your Row and Column variables on the Row variables: and Column variables: menu items.
Click the Run icon to run the task. Note that there are many additional options available on the OPTIONS tab. To obtain percentages, tick the Cell, Row, and Column boxes in the FREQUENCY TABLE > Percentages menu item.
- There isn't a SAS Studio task to calculate descriptive statistics for cases grouped in different categories when you have a quantitative variable and a qualitative variable. Create a new SAS program, and type, for example,
proc sort data=sashelp.shoes out=mydata;
by Region;
run;
proc univariate data=mydata;
var Sales;
by Region;
run;
where Sales is the quantitative variable and Region is the qualitative variable. Specify an output statement to calculate other statistics beyond those calculated by default (see SAS Help for specific details on how to do this).
- There isn't a SAS Studio task to make a stem-and-leaf plot for a quantitative variable. Create a new SAS program, and type, for example,
ods graphics off;
proc univariate data=sashelp.class plot;
var Age;
run;
where Age is the quantitative variable (for large sample sizes, SAS will create a horizontal bar chart instead of a stem-and-leaf plot). If you leave out the ODS GRAPHICS OFF statement, you'll always get a bar chart.
- To make a histogram for a quantitative variable:
Select the Graph > Histogram task.
On the DATA tab:
Select your dataset in the DATA menu item.
Select your quantitative variable on the Analysis variable menu item.
The number of bins can be selected with the Specify number of bins option on the Horizontal Axis menu item on the OPTIONS tab.
Click the Run icon to run the task.
- To make a scatterplot with two quantitative variables:
Select the Graph > Scatter Plot task.
On the DATA tab:
Select your dataset in the DATA menu item.
Select your X and Y variables on the X axis: and Y axis: menu items
where Y is the vertical axis variable and X is the horizontal axis variable.
Click the Run icon to run the task.
- There isn't a SAS Studio task to draw all possible scatterplots for more than two variables (called a scatterplot matrix).
To do this, create a new SAS program, and type, for example,
proc sgscatter data=sashelp.shoes;
matrix Sales Inventory Returns;
run;
where Sales, Inventory, and Returns are quantitative variables.
- To mark or label cases in a scatterplot with different colors/symbols according to categories in a qualitative variable:
Suppose a qualitative variable contains values 1-4 to represent four categories.
First follow computer help #15 to specify a scatter plot.
Then, on the DATA tab, on the Group: item, select the qualitative variable.
Click the Run icon to run the task.
The colors and symbols can be customised, but not in the SAS Studio task. See item 17 in the SAS code instructions for a program that does this.
- To identify individual cases in a scatterplot:
Select the Graph > Scatter Plot task.
On the DATA tab:
Select your dataset in the DATA menu item.
Select your X and Y variables on the X axis: and Y axis: menu items
where Y is the vertical axis variable and X is the horizontal axis variable.
On the APPEARANCE tab:
Click on the MARKERS menu item to expand it, and select your label variable in the Marker label: box.
You can change the characteristics of the label with the other options in the MARKERS menu item.
Click the Run icon to run the task.
- To remove one or more observations from a dataset, first identify the value(s) of a particular variable that distinguish those observations.
Then use one of these methods:
The easiest option: if the task you are using has a Filter: menu item under the DATA menu item (where you select your dataset), click it and enter a logical comparison that keeps only the observations you want, for example Sales <= 500000 to remove observations where the value for Sales is greater than 500,000.
If the task doesn't have a Filter: option:
Select the Data > Filter Data task.
On the DATA tab:
Select your dataset in the DATA menu item.
On the FILTER 1 menu item:
Select the variable that forms the basis for removing the observations in the Variable 1: menu item.
Set up the comparison that keeps the observations you want and excludes the others, for example Less than or equal in the Comparison menu item.
Select Enter a value in the Value type: menu item, and enter the comparison amount, for example 500000 in the Value: menu item.
You can also use the Logical: menu item to extend your selection with more comparisons.
If necessary, change the output dataset name in the OUTPUT DATA SET menu item.
Click the Run icon to run the task.
The task will create a copy of your dataset with the name specified in the OUTPUT DATA SET menu item. You will then need to run any further tasks on this dataset.
- To make a bar chart for cases in different categories:
- For frequency bar charts of one qualitative variable:
Select the Graph > Bar Chart task.
On the DATA tab:
Select your dataset in the DATA menu item.
Select your category variable on the Category: menu item of the ROLES section.
Make sure the Measure: menu item of the ROLES section is set to Frequency Count (default).
Click the Run icon to run the task.
- For frequency bar charts of two qualitative variables:
Select the Graph > Bar Chart task.
On the DATA tab:
Select your dataset in the DATA menu item.
Select your first category variable on the Category: menu item of the ROLES section.
Select your second category variable on the Subcategory: menu item of the ROLES section.
Make sure the Measure: menu item of the ROLES section is set to Frequency Count (default).
Click the Run icon to run the task.
- The bars can also represent various summary functions for a quantitative variable. These can be selected using the Measure: menu item of the ROLES section.
- There are two good options to create boxplots for cases in different categories:
Method 1: Using a SAS Studio task
- For just one qualitative variable:
Select the Graph > Box Plot task.
On the DATA tab:
Select your dataset in the DATA menu item.
Select your quantitative variable on the Analysis variable: menu item.
Select your qualitative variable on the Category: menu item.
Click the Run icon to run the task.
- For two qualitative variables:
Select the Graph > Box Plot task.
On the DATA tab:
Select your dataset in the DATA menu item.
Select your quantitative variable on the Analysis variable: menu item.
Select your major qualitative variable on the Category: menu item.
Select your minor qualitative variable on the Subcategory: menu item.
Click the Run icon to run the task.
Method 2: Using SAS code
- For just one qualitative variable, create a new SAS program, and type, for example,
proc sgpanel data=sashelp.shoes;
panelby Region;
vbox Sales;
run;
where Sales is a quantitative variable and Region is the qualitative variable.
- For two qualitative variables, create a new SAS program, and type, for example,
proc sgpanel data=sashelp.shoes;
panelby Region Product;
vbox Sales;
run;
where Sales is a quantitative variable, Region is the major qualitative variable, and Product is the minor qualitative variable.
Which approach you will use depends on which graphical presentation you prefer.
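PROC SGPANEL, used in Method 2, draws a separate panel for each category. If you would rather have all the boxes side by side on one set of axes, one alternative is the VBOX statement of PROC SGPLOT with the CATEGORY= option, adding GROUP= for a second qualitative variable. A sketch using the same SASHELP.SHOES variables as above:

```sas
/* One qualitative variable: side-by-side boxplots of Sales for each Region */
proc sgplot data=sashelp.shoes;
  vbox Sales / category=Region;
run;

/* Two qualitative variables: a box for each Product within each Region */
proc sgplot data=sashelp.shoes;
  vbox Sales / category=Region group=Product;
run;
```

With many Product levels the second plot can get crowded, which is one reason the panel layout of Method 2 may be preferable.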
- To create a QQ-plot (also known as a normal probability plot) for a quantitative variable:
Select the Statistics > Distribution Analysis task.
On the DATA tab:
Select your dataset in the DATA menu item.
Select your quantitative variable(s) in the Analysis Variables: menu item.
On the OPTIONS tab:
Click the Normal quantile-quantile plot box.
Click the Run icon to run the task.
- There isn't a SAS Studio task to compute a confidence interval for a univariate population mean.
To do this, create a new SAS program, and type, for example,
proc univariate data=sashelp.class cibasic(alpha=0.05);
var Weight;
run;
where Weight is the variable for which you want to calculate the confidence interval, and alpha is one minus the confidence level (so alpha=0.05 gives a 95% confidence interval).
- There isn't a SAS Studio task to do a hypothesis test for a univariate population mean. Create a new SAS program, and type, for example,
proc univariate data=sashelp.class mu0=90;
var Weight;
run;
where Weight is the variable for which you want to do the test and mu0 is the (null) hypothesized value. The Student's t p-value in the Tests for Location table (Pr > |t|) is for a two-tail test.
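PROC TTEST is another way to get both the confidence interval and the hypothesis test for a univariate mean in one step. A minimal sketch, assuming a test of whether the population mean of Weight equals 90 and a 95% confidence interval (ALPHA=0.05):

```sas
/* One-sample t-test of H0: mean Weight = 90, with a 95% CI for the mean */
proc ttest data=sashelp.class h0=90 alpha=0.05;
  var Weight;
run;
```

By default the test is two-tailed; for an upper-tail test, adding SIDES=U to the PROC TTEST statement should give the one-tail p-value.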
Simple linear regression
- To fit a simple linear regression model (i.e., find a least squares line):
Select the Statistics > Linear Regression task.
On the DATA tab:
Select your dataset in the DATA menu item.
Select your response variable in the Dependent Variable menu item.
Select your predictor variable in the Continuous Variables menu item.
On the MODEL tab:
Click the Edit box on Model Effects.
Select your predictor variable, and click the Add button.
Click OK to return from the Model Effects.
Click the Run icon to run the task.
In the rare circumstance that you wish to fit a model without an intercept term (regression through the origin), untick the Intercept tick box in the Model Effects dialogue.
Several of the instructions below use SAS code for features that aren't available in SAS Studio tasks. They are all based on the following SAS code, which implements the regression model described above:
proc reg data=sashelp.class;
model Weight=Height;
run;
where Weight is the response variable and Height is the predictor variable.
- To add a regression line or least squares line to a scatterplot, first set up the scatterplot (see help #15).
Then, on the APPEARANCE tab, on the FIT CURVES menu item, tick the Regression box.
Click the Run icon to run the task.
- To find confidence intervals for the regression parameters in a simple linear regression model, first set up the regression model (see help #25).
Then, on the OPTIONS tab, on the Statistics menu item, change Default statistics to Default and selected statistics. Tick the Confidence limits for estimates box.
Click the Run icon to run the task.
The confidence intervals are displayed as two columns headed 95% Confidence Limits.
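In code, the same intervals can be requested by adding the CLB option to the MODEL statement of the PROC REG program above; ALPHA= on the MODEL statement sets the level (0.05 gives 95% intervals). A sketch:

```sas
proc reg data=sashelp.class;
  /* CLB requests confidence limits for the parameter estimates */
  model Weight=Height / clb alpha=0.05;
run;
quit;   /* PROC REG is interactive; QUIT ends it */
```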
This applies more generally to multiple linear regression also.
- There are two options to find a fitted value or predicted value of Y (the response variable) at a particular value of X (the predictor variable).
- In a Linear Regression report:
First set up the regression model (see help #25).
Then, on the OPTIONS tab, on the Statistics menu item, change Default statistics to Default and selected statistics. Tick the Predicted values box.
Click the Run icon to run the task.
The predicted values are displayed with the original data in a column headed Predicted Value.
- In a Linear Regression output dataset:
First set up the regression model (see help #25).
Then, on the OUTPUT tab (you may have to scroll to the right to see it), on the OUTPUT DATA SETS menu item, tick the Create observationwise statistics data set box, and under the Predicted Values item tick the Predicted value box.
Click the Run icon to run the task.
A SAS dataset will be created, with the original variables and values from the input dataset, and a new variable containing the predicted value.
- You can also obtain a fitted or predicted value of Y at an X-value that is not in the dataset as follows. Before fitting the regression model, add the X-value to the dataset using code such as:
data mydata;
input Height;
datalines;
75
;
run;
data mydata2;
set sashelp.class mydata;
run;
where Height is the predictor variable. Then fit the regression model using this new dataset and the instructions above. SAS will ignore the predictor value you added when fitting the model (since there is no corresponding response variable), so all the regression output (such as the estimated regression parameters) will be the same. But SAS will calculate a fitted or predicted value of Y at this new X-value based on the results of the regression.
This applies more generally to multiple linear regression also.
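The whole procedure can also be done in code: the P option on the MODEL statement prints the fitted values, and the OUTPUT statement saves them to a dataset. In this sketch the dataset names newx, combined, and fitted and the variable name yhat are hypothetical choices for illustration.

```sas
/* Add a new Height value (75) with no Weight value */
data newx;
  input Height;
  datalines;
75
;
run;

data combined;
  set sashelp.class newx;
run;

/* The row with missing Weight is not used to fit the model, */
/* but it still receives a fitted value                       */
proc reg data=combined;
  model Weight=Height / p;
  output out=fitted p=yhat;
run;
quit;

/* Show the fitted value at the new X-value */
proc print data=fitted;
  where Height=75;
  var Height yhat;
run;
```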
- To find a confidence interval for the mean of Y at a particular value of X:
- First set up the regression model (see help #25).
Then, on the OUTPUT tab (you may have to scroll to the right to see it), on the OUTPUT DATA SETS menu item, tick the Create observationwise statistics data set box,
and under the Predicted Values item tick the Confidence intervals for mean predicted value box.
Click the Run icon to run the task.
A SAS dataset will be created, with the original variables and values from the input dataset, and two new variables containing the confidence interval.
- You can also obtain a confidence interval for the mean of Y at an X-value that is not in the dataset by doing the following. Before fitting the regression model, add the X-value to the dataset using code such as:
data mydata;
input Height;
datalines;
75
;
run;
data mydata2;
set sashelp.class mydata;
run;
where Height is the predictor variable. Then fit the regression model using this new dataset and the instructions above. SAS will ignore the predictor value you added when fitting the model (since there is no corresponding response variable), so all the regression output (such as the estimated regression parameters) will be the same. But SAS will calculate a confidence interval for the mean of Y at this new X-value based on the results of the regression.
This applies more generally to multiple linear regression also.
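In code, the CLM option on the MODEL statement prints the confidence limits for the mean, and the LCLM= and UCLM= keywords of the OUTPUT statement save them. A sketch using sashelp.class; to include an added X-value, replace sashelp.class with mydata2 from the code above. The names ci_mean, yhat, lower_mean, and upper_mean are hypothetical.

```sas
/* 95% confidence interval for the mean of Weight at each Height */
proc reg data=sashelp.class;
  model Weight=Height / clm;
  output out=ci_mean p=yhat lclm=lower_mean uclm=upper_mean;
run;
quit;
```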
- To find a prediction interval for an individual value of Y at a particular value of X:
- First set up the regression model (see help #25).
Then, on the OUTPUT tab (you may have to scroll to the right to see it), on the OUTPUT DATA SETS menu item, tick the Create observationwise statistics data set box,
and under the Predicted Values item tick the Confidence intervals for individual predicted value box.
Click the Run icon to run the task.
A SAS dataset will be created, with the original variables and values from the input dataset, and two new variables containing the prediction interval.
- You can also obtain a prediction interval for an individual value of Y at an X-value that is not in the dataset by doing the following. Before fitting the regression model, add the X-value to the dataset using code such as:
data mydata;
input Height;
datalines;
75
;
run;
data mydata2;
set sashelp.class mydata;
run;
where Height is the predictor variable. Then fit the regression model using this new dataset and the instructions above. SAS will ignore the predictor value you added when fitting the model (since there is no corresponding response variable), so all the regression output (such as the estimated regression parameters) will be the same. But SAS will calculate a prediction interval for an individual value of Y at this new X-value based on the results of the regression.
This applies more generally to multiple linear regression also.
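In code, the CLI option on the MODEL statement prints the prediction limits for individual values, and the LCL= and UCL= keywords of the OUTPUT statement save them. A sketch using sashelp.class; to include an added X-value, replace sashelp.class with mydata2 from the code above. The names pi_indiv, yhat, lower_pi, and upper_pi are hypothetical.

```sas
/* 95% prediction interval for an individual Weight at each Height */
proc reg data=sashelp.class;
  model Weight=Height / cli;
  output out=pi_indiv p=yhat lcl=lower_pi ucl=upper_pi;
run;
quit;
```

Each prediction interval is wider than the corresponding confidence interval for the mean, because it also accounts for the variation of an individual response around the mean.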
Multiple linear regression
- To fit a multiple linear regression model:
Select the Statistics > Linear Regression task.
On the DATA tab:
Select your dataset in the DATA menu item.
Select your response variable in the Dependent Variable menu item.
Select your predictor variables in the Continuous Variables menu item.
On the MODEL tab:
Click the Edit box on Model Effects.
Select your predictor variables, using Ctrl+Click if necessary, and click the Add button.
Click OK to return from the Model Effects.
Click the Run icon to run the task.
In the rare circumstance that you wish to fit a model without an intercept term (regression through the origin), untick the Intercept tick box in the Model Effects dialogue.
Several of the instructions below use SAS code for features that aren't available in SAS Studio tasks. They are all based on the following SAS code, which implements the regression model described above:
proc reg data=sashelp.class;
model Weight=Age Height;
run;
where Weight is the response variable and Age and Height are the predictor variables.
- To add a quadratic regression line to a scatterplot, first follow computer help #15 to specify a scatter plot.
Then, on the APPEARANCE tab, on the FIT CURVES menu item, tick the Regression option, and set the Degree to 2 to specify a quadratic line.
Click the Run icon to run the task.
- Categories of a qualitative variable can be thought of as defining subsets of the sample. If there is also a quantitative response and a quantitative predictor variable in the dataset, a regression model can be fit to the data to represent separate regression lines for each subset. First follow computer help #17 to specify a scatter plot with different markings for values of a qualitative variable.
Then, on the APPEARANCE tab, on the FIT CURVES menu item, tick the Regression box.
Click the Run icon to run the task.
- To find the F-statistic and associated p-value for a nested model F-test in multiple linear regression, create a new SAS program, and type, for example,
proc reg data=sashelp.cars;
model MPG_Highway={EngineSize Horsepower} {MSRP Invoice} / selection=forward
groupnames='EngineSize Horsepower' 'MSRP Invoice'
slentry=0.99;
run;
Here, EngineSize and Horsepower are in the reduced model, while EngineSize, Horsepower, MSRP, and Invoice are in the complete model. The F-statistic is in the second row of the "Summary of Forward Selection Table" in the column headed F Value, while the associated p-value is in the column headed Pr > F.
- There are two options to find the different types of residuals in a multiple linear regression model.
- In a Linear Regression report:
First set up the regression model (see help #31).
Then, on the OPTIONS tab, on the Statistics menu item, change Default statistics to Default and selected statistics.
Under the Diagnostics menu item, tick the Analysis of influence and the Analysis of residuals boxes.
Click the Run icon to run the task.
The residuals are displayed in the column headed Residual, what Pardoe (2012) calls standardized residuals are displayed in the column headed Student Residual, and what Pardoe (2012) calls studentized residuals are displayed in the column headed RStudent.
- In a Linear Regression output dataset:
First set up the regression model (see help #31).
Then, on the OUTPUT tab (you may have to scroll to the right to see it), on the OUTPUT DATA SETS menu item, tick the Create observationwise statistics data set box,
and under the Residuals item tick the Residual, Studentized residual, and Studentized residual with current observation removed boxes.
Click the Run icon to run the task.
A SAS dataset will be created, with the original variables and new variables (r_, student_, and rstudent_) containing the residual values.
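The output-dataset approach can also be written in code with the OUTPUT statement of PROC REG. Following the naming above, the STUDENT= keyword gives what Pardoe (2012) calls standardized residuals and RSTUDENT= gives what Pardoe (2012) calls studentized residuals. The dataset and variable names in this sketch (resids, resid, std_resid, stud_resid) are hypothetical.

```sas
proc reg data=sashelp.class;
  /* R prints a residual analysis table in the report */
  model Weight=Age Height / r;
  /* Save ordinary, standardized, and studentized residuals */
  output out=resids r=resid student=std_resid rstudent=stud_resid;
run;
quit;
```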
- To add a loess fitted line to a scatterplot:
First follow computer help #15 to specify a scatter plot.
Then, on the APPEARANCE tab, on the FIT CURVES item, tick the Loess box.
Click the Run icon to run the task.
If you wish to adjust the smoothness of the line, see SAS Help for information about specifying a smoothing value.
- There are two options to find leverages in a multiple linear regression model.
- In a Linear Regression report:
First set up the regression model (see help #31).
Then, on the OPTIONS tab, on the Statistics menu item, change Default statistics to Default and selected statistics.
Under the Diagnostics menu item, tick the Analysis of influence box.
Click the Run icon to run the task.
The leverages are displayed in the column headed Hat Diag H.
- In a Linear Regression output dataset:
First set up the regression model (see help #31).
Then, on the OUTPUT tab (you may have to scroll to the right to see it), on the OUTPUT DATA SETS menu item, tick the Create observationwise statistics data set box,
and under the Influence Statistics item tick the Leverage box.
Click the Run icon to run the task.
A SAS dataset will be created, with the original variables and values from the input dataset, and a new variable containing the leverage.
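In code, the H= keyword of the OUTPUT statement saves the leverages. As a quick check, leverages always sum to the number of regression parameters, which is 3 here (the intercept plus Age and Height). The names lev and leverage in this sketch are hypothetical.

```sas
proc reg data=sashelp.class;
  model Weight=Age Height;
  output out=lev h=leverage;   /* save leverages (hat diagonals) */
run;
quit;

/* The leverages should sum to 3, the number of parameters */
proc means data=lev sum;
  var leverage;
run;
```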
- There are two options to find Cook's distances in a multiple linear regression model.
- In a Linear Regression report:
First set up the regression model (see help #31).
Then, on the OPTIONS tab, on the Statistics menu item, change Default statistics to Default and selected statistics.
Under the Diagnostics menu item, tick the Analysis of residuals box.
Click the Run icon to run the task.
The Cook's distances are displayed in the column headed Cook's D.
- In a Linear Regression output dataset:
First set up the regression model (see help #31).
Then, on the OUTPUT tab (you may have to scroll to the right to see it), on the OUTPUT DATA SETS menu item, tick the Create observationwise statistics data set box,
and under the Influence Statistics item tick the Cook's D box.
Click the Run icon to run the task.
A SAS dataset will be created, with the original variables and values from the input dataset, and a new variable containing the Cook's distance.
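In code, the COOKD= keyword of the OUTPUT statement saves the Cook's distances; sorting them in descending order makes the most influential cases easy to spot. The names cooks and cooks_d and the choice of listing five cases are hypothetical.

```sas
proc reg data=sashelp.class;
  model Weight=Age Height;
  output out=cooks cookd=cooks_d;   /* save Cook's distances */
run;
quit;

/* List the cases with the largest Cook's distances first */
proc sort data=cooks;
  by descending cooks_d;
run;

proc print data=cooks(obs=5);
  var Name Age Height Weight cooks_d;
run;
```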
- To create some residual plots automatically in a multiple linear regression model, modify the SAS code in computer help #31 by adding a PLOT statement, as in the example below:
proc reg data=sashelp.class;
model Weight=Age Height;
plot rstudent.*predicted. rstudent.*nqq. rstudent.*cookd.;
run;
This produces a plot of studentized residuals versus fitted values, a QQ-plot of the studentized residuals, and a plot of studentized residuals versus Cook's distances. To create residual plots manually, first create studentized residuals (see help #35), and then construct scatterplots with these studentized residuals on the vertical axis.
- To create a correlation matrix of quantitative variables (useful for checking potential multicollinearity problems):
Select the Statistics > Correlation Analysis task.
On the DATA tab:
Select your dataset in the DATA menu item.
Select your quantitative variables to be analysed in the Analysis Variables: menu item.
Click the Run icon to run the task.
If you would like to print the simple statistics for the quantitative variables, on the OPTIONS tab, on the STATISTICS menu item, set Display Statistics: to Selected Statistics and tick the Descriptive Statistics box.
- To find variance inflation factors in multiple linear regression, first set up the regression model (see help #31).
Then, on the OPTIONS tab, on the Statistics menu item, change Default statistics to Default and selected statistics. Under the Collinearity menu item, tick the Variance inflation factors box.
Click the Run icon to run the task.
- To draw a predictor effect plot for graphically displaying the effects of transformed quantitative predictors and/or interactions between quantitative and qualitative predictors in multiple linear regression, first create an "X1 effect" variable representing the effect (see help #6). As an example, we'll create WeightEffect1 and WeightEffect2 using the following, as well as an indicator variable for a qualitative variable (see help #7):
data mydata;
set sashelp.class;
if Sex='M' then D1=1;
else D1=0;
WeightEffect1 = 1 + (3 * Weight) + (4 * (Weight **2));
WeightEffect2 = 1 - (2 * Weight) + (3 * D1 * Weight);
run;
Next, use the Data > Sort Data task to sort the dataset by the values of the X variable (if necessary). Then, select the Graph > Series Plot task.
On the DATA tab:
Select your new dataset, or the sorted dataset if a sort was necessary, in the DATA menu item.
Select your X and Y variables on the X axis: and Y axis: menu items
where Y is the vertical axis variable and X is the horizontal axis variable.
- If the "X1 effect" just involves the X variable (e.g., 1 + 3X + 4X^2), that's all you need to do. (In our example above, X would be Weight and Y would be WeightEffect1.)
- If the "X1 effect" variable involves a qualitative variable (e.g., 1 − 2X + 3D1X, where D1 is an indicator variable based on the qualitative variable), on the DATA tab, on the Group: item, select the qualitative variable that D1 is based on. (In our example above, X would be Weight, Y would be WeightEffect2, and Sex would be the qualitative variable.)
Click the Run icon to run the task.
See Section 5.5 in Pardoe (2012) for an example.
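If you prefer code to the Series Plot task, the same plots can be sketched with PROC SGPLOT and its SERIES statement, using GROUP= for the qualitative variable. This sketch repeats the DATA step above (writing D1 as a logical expression, which SAS evaluates to 1 or 0); the sorted dataset name mysorted is hypothetical.

```sas
/* Recreate the example effect variables */
data mydata;
  set sashelp.class;
  D1 = (Sex='M');                                   /* 1 if male, 0 otherwise */
  WeightEffect1 = 1 + (3 * Weight) + (4 * (Weight**2));
  WeightEffect2 = 1 - (2 * Weight) + (3 * D1 * Weight);
run;

/* Series plots join points in data order, so sort by the X variable */
proc sort data=mydata out=mysorted;
  by Weight;
run;

/* Effect that involves only Weight */
proc sgplot data=mysorted;
  series x=Weight y=WeightEffect1;
run;

/* Effect that interacts with Sex: one line per level of Sex */
proc sgplot data=mysorted;
  series x=Weight y=WeightEffect2 / group=Sex;
run;
```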