These instructions accompany Applied Regression Modeling by Iain Pardoe, 2nd edition, published by Wiley in 2012. The numbered items cross-reference with the "computer help" references in the book. These instructions are based on EViews 7 for Windows, but they (or something similar) should also work for other versions. Instructions for other statistical software packages are also available.
Getting started and summarizing univariate data
- If desired, change EViews' default options by selecting Options > General Options.
- To open an EViews data file, select File > Open > EViews Workfile. You can also open other files, such as Excel spreadsheets or SPSS data files.
- EViews does not appear to offer a way to recall a previously used dialog box.
- Output appears in separate windows, from which it can be copied and pasted into a word processor such as OpenOffice Writer or Microsoft Word.
- You can access help by selecting Help > EViews Help Topics. For example, to find out about "boxplots", click the Index tab, type boxplots in the first box, and select the index entry you want in the second box.
- To transform data or compute a new variable, select Quick > Generate Series. Type a name (with no spaces) for the new variable in the Enter equation box, then =, then a mathematical expression for the variable. Examples are logx=log(x) for the natural logarithm of X and xsq=x^2 for X squared. Click OK to create the new variable, which will be added to the dataset (check that it looks correct in the Workfile); it can now be used just like any other variable. If you get an error message, the equation probably contains a syntax error; a common mistake is to forget the multiplication symbol (*) between a number and a variable (e.g., 2*x represents 2x).
- To create indicator (dummy) variables from a qualitative variable, select Quick > Generate Series. Type, for example, D1=@recode(x="level", 1, 0), where x is the qualitative variable and level is the name of one of the categories in x. Repeat for other indicator variables (if necessary).
- To find a percentile (critical value) for a t-distribution, type =@qtdist(p,df) into the Command window, where p is the lower-tail area (i.e., one minus the one-tail significance level) and df is the degrees of freedom. Press the return/enter key to see the result at the bottom of the screen. For example, =@qtdist(.95,29) returns the 95th percentile of the t-distribution with 29 degrees of freedom (1.699), which is the critical value for an upper-tail test with a 5% significance level. By contrast, =@qtdist(.975,29) returns the 97.5th percentile of the t-distribution with 29 degrees of freedom (2.045), which is the critical value for a two-tail test with a 5% significance level.
- To find a percentile (critical value) for an F-distribution, type =@qfdist(p,df1,df2) into the Command window, where p is the lower-tail area (i.e., one minus the significance level), df1 is the numerator degrees of freedom, and df2 is the denominator degrees of freedom. For example, =@qfdist(0.95,2,3) returns the 95th percentile of the F-distribution with 2 numerator degrees of freedom and 3 denominator degrees of freedom (9.552).
- To find a percentile (critical value) for a chi-squared distribution, type =@qchisq(p,df) into the Command window, where p is the lower-tail area (i.e., one minus the significance level) and df is the degrees of freedom. For example, =@qchisq(0.95,2) returns the 95th percentile of the chi-squared distribution with 2 degrees of freedom (5.991).
- To find an upper-tail area (one-tail p-value) for a t-distribution, type =1-@ctdist(t,df) into the Command window, where t is the value of the t-statistic and df is the degrees of freedom. For example, =1-@ctdist(2.40,29) returns the upper-tail area for a t-statistic of 2.40 from the t-distribution with 29 degrees of freedom (0.012), which is the p-value for an upper-tail test. By contrast, =2*(1-@ctdist(2.40,29)) returns the two-tail area for a t-statistic of 2.40 from the t-distribution with 29 degrees of freedom (0.023), which is the p-value for a two-tail test.
- To find an upper-tail area (p-value) for an F-distribution, type =1-@cfdist(f,df1,df2) into the Command window, where f is the value of the F-statistic, df1 is the numerator degrees of freedom, and df2 is the denominator degrees of freedom. For example, =1-@cfdist(51.4,2,3) returns the upper-tail area (p-value) for an F-statistic of 51.4 for the F-distribution with 2 numerator degrees of freedom and 3 denominator degrees of freedom (0.005).
- To find an upper-tail area (p-value) for a chi-squared distribution, type =1-@cchisq(chisq,df) into the Command window, where chisq is the value of the chi-squared statistic and df is the degrees of freedom. For example, =1-@cchisq(0.38,2) returns the upper-tail area (p-value) for a chi-squared statistic of 0.38 for the chi-squared distribution with 2 degrees of freedom (0.827).
- Calculate descriptive statistics for a quantitative variable by selecting Quick > Show. Type the name of the quantitative variable into the Objects to display in a single window box and click OK. Click View > Descriptive Statistics & Tests > Stats Table and click OK to display the results.
- Create contingency tables or cross-tabulations for qualitative variables by selecting Quick > Show. Type the names of two qualitative variables separated by spaces into the Objects to display in a single window box and click OK. Click View > N-Way Tabulation and click OK to display the table. Cell percentages (within rows, columns, or the whole table) can be calculated by selecting the appropriate options in the Crosstabulation dialog box.
- If you have a quantitative variable and a qualitative variable, you can calculate descriptive statistics for cases grouped in different categories by selecting Quick > Show. Type the name of the quantitative variable into the Objects to display in a single window box and click OK. Click View > Descriptive Statistics & Tests > Stats by Classification and click OK. Type the name of the qualitative variable that defines the categories into the Series/Group for classify box, select the statistics to display, and click OK to display the results.
- EViews does not appear to offer a way to create a stem-and-leaf plot for a quantitative variable.
- To make a histogram for a quantitative variable, select Quick > Show. Type the name of the quantitative variable into the Objects to display in a single window box and click OK. Click View > Descriptive Statistics & Tests > Histogram and Stats and click OK.
- To make a scatterplot with two quantitative variables, select Quick > Show. Type the name of the horizontal axis variable followed by a space and then the name of the vertical axis variable into the Objects to display in a single window box and click OK. Click View > Graph, select Basic graph for the General graph type, select Scatter for the Specific Graph type and click OK.
- All possible scatterplots for more than two variables can be drawn simultaneously (called a scatterplot matrix) by selecting Quick > Show. Type the names of the variables separated by spaces into the Objects to display in a single window box and click OK. Click View > Graph, select Basic graph for the General graph type, select Scatter for the Specific Graph type, then select Scatterplot matrix for Multiple series and click OK.
- You can mark or label cases in a scatterplot with different colors/symbols according to categories in a qualitative variable by following Help #15, but selecting Categorical graph for the General Graph type and typing the name of the qualitative variable into the Within graph box under Factors - series defining categories. Click OK to display the graph.
- You can identify individual cases in a scatterplot by hovering over individual points.
- To remove one or more observations from a dataset, select Quick > Sample and type appropriate values into the Sample range pairs box. For example, type 1 9 11 100 to remove observation #10 from a dataset containing 100 observations. To return the dataset to its original state, type @all into the Sample range pairs box.
- To make a bar chart for cases in different categories, select Quick > Show. Type the name of a quantitative variable into the Objects to display in a single window box and click OK. Click View > Graph, select Categorical graph for the General graph type, and select Bar for the Specific Graph type.
- For frequency bar charts of one qualitative variable, type the name of the qualitative variable into the Within graph box under Factors - series defining categories and choose Numbers of observations for Graph data. Click OK to display the graph.
- For frequency bar charts of two qualitative variables, type the names of the qualitative variables separated by a space into the Within graph box under Factors - series defining categories and choose Numbers of observations for Graph data. Click OK to display the graph.
- The bars can also represent various summary functions for the quantitative variable. For example, to produce a bar chart of means select Means for Graph data. Click OK to display the graph.
- To make boxplots for cases in different categories, select Quick > Show. Type the name of the quantitative variable into the Objects to display in a single window box and click OK. Click View > Graph, select Categorical graph for the General graph type, and select Boxplot for the Specific Graph type.
- For just one qualitative variable, type the name of the qualitative variable into the Within graph box under Factors - series defining categories. Click OK to display the graph.
- For two qualitative variables, type the names of the qualitative variables separated by a space into the Within graph box under Factors - series defining categories. Click OK to display the graph.
- To make a QQ-plot (also known as a normal probability plot) for a quantitative variable, select Quick > Show. Type the name of the quantitative variable into the Objects to display in a single window box and click OK. Click View > Graph, select Basic graph for the General graph type, and select Quantile - Quantile for the Specific Graph type. Click OK to display the graph.
- EViews does not appear to offer an automatic way to compute a confidence interval for a univariate population mean. It is possible to calculate such an interval by hand using EViews output.
- To do a hypothesis test for a univariate population mean, select Quick > Show. Type the name of the quantitative variable into the Objects to display in a single window box and click OK. Click View > Descriptive Statistics & Tests > Simple Hypothesis Tests, type the (null) hypothesized value into the Mean box, and click OK to display the results. The p-value calculated (displayed as "Probability") is a two-tailed p-value; to obtain a one-tailed p-value you will need either to divide this value by two, or to divide it by two and then subtract the result from one (draw a picture to figure out which).
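The two by-hand calculations mentioned at the end of this list (a confidence interval for a univariate population mean, and converting the two-tailed "Probability" into a one-tailed p-value) can be sketched outside EViews. The following is a minimal Python illustration rather than an EViews feature: the function names and the sample data are hypothetical, and the t-percentile is assumed to have been looked up separately, for example with @qtdist.

```python
import math
import statistics


def mean_confidence_interval(values, t_percentile):
    """Return (lower, upper) for: sample mean +/- t-percentile * s / sqrt(n).

    t_percentile is assumed to be supplied by the user, e.g. the 97.5th
    percentile with n - 1 degrees of freedom for a 95% interval.
    """
    n = len(values)
    mean = statistics.mean(values)
    std_error = statistics.stdev(values) / math.sqrt(n)  # s / sqrt(n)
    half_width = t_percentile * std_error
    return mean - half_width, mean + half_width


def one_tail_p_value(two_tail_p, t_statistic, alternative):
    """Convert a two-tailed p-value into a one-tailed one.

    alternative is "upper" or "lower" (hypothetical labels). If the
    t-statistic points in the direction of the alternative, halve the
    two-tailed value; otherwise use one minus half of it.
    """
    if alternative not in ("upper", "lower"):
        raise ValueError("alternative must be 'upper' or 'lower'")
    points_toward_alternative = (
        t_statistic > 0 if alternative == "upper" else t_statistic < 0
    )
    half = two_tail_p / 2
    return half if points_toward_alternative else 1 - half


# Hypothetical sample of 30 values; 2.045 is the 97.5th percentile of the
# t-distribution with 29 degrees of freedom quoted earlier in these notes.
sample = [10 + (i % 5) for i in range(30)]
lower, upper = mean_confidence_interval(sample, 2.045)
print(f"95% confidence interval: ({lower:.3f}, {upper:.3f})")

# Two-tailed p-value of 0.023 for a t-statistic of 2.40.
print(one_tail_p_value(0.023, 2.40, "upper"))
print(one_tail_p_value(0.023, 2.40, "lower"))
```

The "draw a picture" advice corresponds to the sign check in one_tail_p_value: the halved value applies only when the t-statistic falls on the side of the hypothesized mean named by the alternative hypothesis.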
Simple linear regression
- To fit a simple linear regression model (i.e., find a least squares line), select Quick > Estimate Equation. Type, for example, y c x into the Equation specification box, where y is the name of the response variable, c stands for the "constant" that represents the intercept term, and x is the name of the predictor variable. Click OK to see the results. In the rare circumstance that you wish to fit a model without an intercept term (regression through the origin), omit the c in the Equation specification.
- To add a regression line or least squares line to a scatterplot, follow Help #15, but select Regression Line for Fit lines in the Graph Options dialog box.
- To find confidence intervals for the regression parameters in a simple linear regression model, follow Help #25 (or #31), then select View > Coefficient Diagnostics > Confidence Intervals. Default values for the confidence levels are 90%, 95%, and 99%, but you can change these if you want. This applies more generally to multiple linear regression also.
- To find a fitted value or predicted value of Y (the response variable) at each value of X (the predictor variable) in the dataset, follow Help #25 (or #31), then select View > Actual,Fitted,Residual. The fitted or predicted values of Y at each of the X-values in the dataset are displayed in the column headed Fitted. This applies more generally to multiple linear regression also.
- EViews does not appear to offer an automatic way to find a confidence interval for the mean of Y at each value of X in the dataset. It is possible to calculate such intervals by hand using EViews output. This applies more generally to multiple linear regression also.
- To find a prediction interval for an individual value of Y at each value of X in the dataset, follow Help #25 (or #31) then select Proc > Forecast. Type names into the Forecast name and S.E. (optional) boxes and click OK. The forecasts (fitted or predicted values) and S.E.'s (prediction standard errors) will appear under these names in the Workfile. The prediction intervals for an individual Y-value at each of the X-values in the dataset can then be calculated by hand using these values and appropriate t-percentiles. This applies more generally to multiple linear regression also.
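The by-hand interval calculations described in the last two items can be sketched as follows. This is a minimal Python illustration, not an EViews feature, with hypothetical function names and made-up numbers. It assumes that the S.E. series saved by Proc > Forecast is the prediction standard error for an individual Y-value (so that it includes coefficient uncertainty), that the regression standard error is read from the "S.E. of regression" line of the regression output, and that the t-percentile uses n − k − 1 degrees of freedom, where k is the number of predictors.

```python
import math


def prediction_interval(forecast, prediction_se, t_percentile):
    """Interval for an individual Y-value: forecast +/- t * prediction S.E."""
    half_width = t_percentile * prediction_se
    return forecast - half_width, forecast + half_width


def mean_response_interval(forecast, prediction_se, regression_se, t_percentile):
    """Interval for the mean of Y at the same predictor values.

    Assumes prediction_se**2 = regression_se**2 + (S.E. of fitted value)**2,
    so the fitted-value S.E. is recovered by subtraction.
    """
    fit_variance = prediction_se ** 2 - regression_se ** 2
    if fit_variance < 0:
        raise ValueError("prediction S.E. should not be below the regression S.E.")
    half_width = t_percentile * math.sqrt(fit_variance)
    return forecast - half_width, forecast + half_width


# Made-up values for one observation: forecast 50.0, prediction S.E. 5.2,
# regression S.E. 5.0, and 2.048 as the 97.5th t-percentile with 28 degrees
# of freedom (n = 30 in a simple linear regression).
print(prediction_interval(50.0, 5.2, 2.048))
print(mean_response_interval(50.0, 5.2, 5.0, 2.048))
```

The interval for the mean is always narrower than the prediction interval at the same point, which makes a quick sanity check on the subtraction step.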
Multiple linear regression
- To fit a multiple linear regression model, select Quick > Estimate Equation. Type, for example, y c x1 x2 into the Equation specification box, where y is the name of the response variable, c stands for the "constant" that represents the intercept term, and x1 and x2 are the names of the predictor variables. Click OK to see the results. In the rare circumstance that you wish to fit a model without an intercept term (regression through the origin), omit the c in the Equation specification.
- To add a quadratic regression line to a scatterplot, follow Help #15, but select Regression Line for Fit lines in the Graph Options dialog box, then click Options and select a Polynomial of order 2 under X transformations.
- Categories of a qualitative variable can be thought of as defining subsets of the sample. If there is also a quantitative response and a quantitative predictor variable in the dataset, a regression model can be fit to the data to represent separate regression lines for each subset. First use help #15 and #17 to make a scatterplot with the response variable on the vertical axis, the quantitative predictor variable on the horizontal axis, and the cases marked with different colors/symbols according to the categories in the qualitative predictor variable. To add a regression line for each subset to this scatterplot, select Regression Line for Fit lines in the Graph Options dialog box.
- EViews does not appear to offer an automatic way to find the F-statistic and associated p-value for a nested model F-test in multiple linear regression. It is possible to calculate these values by hand using EViews output.
- To save residuals in a multiple linear regression model, follow Help #31 and the residuals are saved by default as variable resid in the Workfile; they can now be used just like any other variable, for example, to construct residual plots. EViews does not appear to offer a way to automatically save what Pardoe (2012) calls standardized residuals. To save what Pardoe (2012) calls studentized residuals, follow Help #31, select View > Stability Diagnostics > Influence Statistics, check RStudent, and type a name into the adjoining box to store them in the Workfile.
- EViews does not appear to offer an automatic way to add a loess fitted line to a scatterplot (useful for checking the zero mean regression assumption in a residual plot). However, it is possible to add a similar fitted line based on a "kernel fit" by following Help #15 but selecting Kernel Fit for Fit lines in the Graph Options dialog box.
- To save leverages in a multiple linear regression model, follow Help #31, select View > Stability Diagnostics > Influence Statistics, check Hat Matrix, and type a name into the adjoining box to store them in the Workfile.
- EViews does not appear to offer an automatic way to save Cook's distances in a multiple linear regression model.
- To create a histogram of residuals automatically in a multiple linear regression model, follow Help #31 then select View > Residual Diagnostics > Histogram - Normality Test. To create residual plots manually, first create residuals (see help #35), and then construct scatterplots with these residuals on the vertical axis.
- To create a correlation matrix of quantitative variables (useful for checking potential multicollinearity problems), select Quick > Show. Type the names of the quantitative variables separated by spaces into the Objects to display in a single window box and click OK. Click View > Covariance Analysis, deselect Covariance, select Correlation, and click OK to display the matrix.
- To find variance inflation factors in multiple linear regression, follow Help #31 then select View > Coefficient Diagnostics > Variance Inflation Factors. The variance inflation factors are in the column labeled "Centered VIF."
- To draw a predictor effect plot for graphically displaying the effects of transformed quantitative predictors and/or interactions between quantitative and qualitative predictors in multiple linear regression, first create a variable representing the effect, say, "x1effect" (see computer help #6). Then select Quick > Show, type x1 x1effect into the Objects to display in a single window box, and click OK. Click View > Graph, select Basic graph for the General graph type and select Scatter for the Specific Graph type.
- If the "x1effect" variable just involves x1 (e.g., 1 + 3*x1 + 4*x1^2), you can click OK at this point.
- If the "x1effect" variable also involves a qualitative variable (e.g., 1 − 2*x1 + 3*d2*x1, where d2 is an indicator variable), you should select Categorical graph for the General Graph type and type the name of the qualitative variable into the Within graph box under Factors - series defining categories. Click OK to display the graph.
See Section 5.5 in Pardoe (2012) for an example. The instructions here create scatterplots rather than line plots, but lines can be added to the plots with an appropriate choice of Fit lines in the Graph Options dialog box.
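Several items in this section note that EViews does not appear to offer an automatic way to obtain the nested model F-statistic, standardized residuals, or Cook's distances. A minimal Python sketch of those calculations follows; it is an illustration, not an EViews feature. The function names and example numbers are hypothetical, and the formulas are the usual textbook definitions, which are assumed here to match those in Pardoe (2012). The inputs are assumed to come from EViews output: residual sums of squares from the "Sum squared resid" line, the regression standard error from the "S.E. of regression" line, residuals from resid, and leverages from the saved Hat Matrix values.

```python
import math


def nested_f_statistic(rss_reduced, rss_complete, df_reduced, df_complete):
    """F = ((RSS_R - RSS_C) / (df_R - df_C)) / (RSS_C / df_C).

    df_* are the error degrees of freedom (n - k - 1) of each model.
    """
    numerator = (rss_reduced - rss_complete) / (df_reduced - df_complete)
    denominator = rss_complete / df_complete
    return numerator / denominator


def standardized_residual(residual, leverage, regression_se):
    """Residual divided by its estimated standard error, s * sqrt(1 - h)."""
    return residual / (regression_se * math.sqrt(1 - leverage))


def cooks_distance(residual, leverage, regression_se, n_coefficients):
    """Cook's distance: r^2 * h / ((k + 1) * (1 - h)).

    r is the standardized residual and n_coefficients is k + 1, the number
    of regression parameters including the intercept.
    """
    r = standardized_residual(residual, leverage, regression_se)
    return r ** 2 * leverage / (n_coefficients * (1 - leverage))


# Made-up numbers: a reduced model with RSS 120 on 22 error degrees of
# freedom against a complete model with RSS 100 on 20.
f_stat = nested_f_statistic(120.0, 100.0, 22, 20)
print(f_stat)  # numerator df = 2, denominator df = 20

# Made-up case: residual 3.0, leverage 0.75, regression S.E. 2.0, and a
# model with two predictors plus an intercept.
print(standardized_residual(3.0, 0.75, 2.0))
print(cooks_distance(3.0, 0.75, 2.0, 3))
```

The p-value for the F-statistic can then be found in EViews with =1-@cfdist(f,df1,df2), where df1 is the difference in error degrees of freedom between the two models and df2 is the error degrees of freedom of the complete model.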