Software Info – Statistica
These instructions accompany Applied Regression Modeling by Iain Pardoe, 3rd edition, Wiley, 2020. The numbered items cross-reference with the “computer help” references in the book. These instructions are based on the “Classic Menus” interface of Statistica 10 for Windows, but they (or something similar) should also work for other versions. Find instructions for other statistical software packages here.
Getting started and summarizing univariate data
- If desired, change Statistica’s default options by selecting Tools > Options.
- To open a Statistica data file, select File > Open. You can also open Excel, SPSS, JMP, and Minitab files.
- To resume an analysis, which allows you to re-run an analysis with different options selected, right-click on the appropriate analysis in the tree structure in the left-hand pane of the active Workbook and select Resume Analysis.
- Output can be viewed in the active Workbook. Click Add to Report to add individual pieces of output (including tables and graphs) to a report, from where they can also be added to a Microsoft Word document.
- You can access help by selecting Help > Statistica Help. For example, to find out about “boxplots” click the Index tab, type boxplot in the keyword box, click Display, and select the index entry you want in the main window.
- To transform data or compute a new variable, select the Data window and then select Data > Variables > Add. Type a name (with no spaces) for the new variable in the Name box, and type a mathematical expression for the variable in the Long name box. You can click the Functions button to help you to create the expression. Examples are =Log(x) for the natural logarithm of x and x**2 for x². Click OK to create the new variable, which will be added to the dataset (check it looks correct in the Data window); it can now be used just like any other variable. If the new variable has blank values, this probably means there is a syntax error in your expression; a common mistake is to forget the multiplication symbol (*) between a number and a variable (e.g., 2*x represents 2x).
- To create indicator (dummy) variables from a qualitative variable, select the Data window and then select Data > Variables > Add. Type a name (with no spaces) for the indicator variable in the Name box, and type an expression like the following in the Long name box: =iif(x="level", 1, 0), where x is the qualitative variable and level is the name of one of the categories in x. Click OK and check that the correct indicator variable has been added to your spreadsheet in the Data window. Repeat for other indicator variables (if necessary).
-
- To find a percentile (critical value) for a t-distribution, select Statistics > Probability Calculator > Distributions. Select t (Student) for the Distribution, check Inverse and 1-Cumulative p, type the upper-tail area (i.e., the one-tail significance level) into the box labeled p, and the degrees of freedom into the box labeled df. Click Compute to see the result in the box labeled t. For example, typing .05 for p and 29 for df returns the 95th percentile of the t-distribution with 29 degrees of freedom (1.699), which is the critical value for an upper-tail test with a 5% significance level. By contrast, also checking Two-tailed returns the 97.5th percentile of the t-distribution with 29 degrees of freedom (2.045), which is the critical value for a two-tail test with a 5% significance level.
- To find a percentile (critical value) for an F-distribution, select Statistics > Probability Calculator > Distributions. Select F (Fisher) for the Distribution, check Inverse and 1-Cumulative p, type the upper-tail area (i.e., the significance level) into the box labeled p, the numerator degrees of freedom into the box labelled df1, and the denominator degrees of freedom into the box labelled df2. Click Compute to see the result in the box labelled F. For example, typing .05 for p, 2 for df1, and 3 for df2 returns the 95th percentile of the F-distribution with 2 numerator degrees of freedom and 3 denominator degrees of freedom (9.552).
- To find a percentile (critical value) for a chi-squared distribution, select Statistics > Probability Calculator > Distributions. Select Chi2 for the Distribution, check Inverse and 1-Cumulative p, type the upper-tail area (i.e., the significance level) into the box labeled p and the degrees of freedom into the box labeled df. Click Compute to see the result in the box labeled Chi2. For example, typing .05 for p and 2 for df returns the 95th percentile of the chi-squared distribution with 2 degrees of freedom (5.991).
-
- To find an upper-tail area (one-tail p-value) for a t-distribution, select Statistics > Probability Calculator > Distributions. Select t (Student) for the Distribution, check 1-Cumulative p, and type the value of the t-statistic into the box labeled t and the degrees of freedom into the box labelled df. Click Compute to see the result in the box labelled p. For example, typing 2.40 for t and 29 for df returns the upper-tail area for a t-statistic of 2.40 from the t-distribution with 29 degrees of freedom (0.012), which is the p-value for an upper-tail test. By contrast, also checking Two-tailed returns the two-tail area for a t-statistic of 2.40 from the t-distribution with 29 degrees of freedom (0.023), which is the p-value for a two-tail test.
- To find an upper-tail area (p-value) for an F-distribution, select Statistics > Probability Calculator > Distributions. Select F (Fisher) for the Distribution, check 1-Cumulative p, and type the value of the F-statistic into the box labeled F, the numerator degrees of freedom into the box labeled df1, and the denominator degrees of freedom into the box labeled df2. Click Compute to see the result in the box labeled p. For example, typing 51.4 for F, 2 for df1, and 3 for df2 returns the upper-tail area (p-value) for an F-statistic of 51.4 for the F-distribution with 2 numerator degrees of freedom and 3 denominator degrees of freedom (0.005).
- To find an upper-tail area (p-value) for a chi-squared distribution, select Statistics > Probability Calculator > Distributions. Select Chi2 for the Distribution, check 1-Cumulative p, and type the value of the chi-squared statistic into the box labeled Chi2 and the degrees of freedom into the box labelled df. Click Compute to see the result in the box labelled p. For example, typing 0.38 for Chi2 and 2 for df returns the upper-tail area (p-value) for a chi-squared statistic of 0.38 for the chi-squared distribution with 2 degrees of freedom (0.827).
- Calculate descriptive statistics for quantitative variables by selecting Statistics > Basic Statistics and Tables. Leave Descriptive statistics selected under the Quick tab and click OK. Click Variables to select the variable(s) for analysis and click OK. Click the Advanced tab to select the summaries, such as the Mean, that you would like and click Summary: Statistics to view the results.
- Create contingency tables or cross-tabulations for qualitative variables by selecting Statistics > Basic Statistics and Tables. Select Tables and banners and click OK. Click Specify tables (select variables), move one qualitative variable into List 1 and another into List 2, and click OK twice. Cell percentages (within rows, columns, or the whole table) can be calculated by clicking Options before clicking Summary to display the results.
- If you have a quantitative variable and a qualitative variable, you can calculate descriptive statistics for cases grouped in different categories by following the instructions in Help #10, but clicking By Group to select the qualitative grouping variable before clicking Summary to display the results. Check Accumulate tabular results in a single spreadsheet in the By Group dialog box to display the results for all the groups in a single table.
- To make a stem-and-leaf plot for a quantitative variable, select Statistics > Basic Statistics and Tables. Leave Descriptive statistics selected under the Quick tab and click OK. Click Variables to select the variable(s) for analysis and click OK. Click the Normality tab and click Stem & leaf plot.
- To make a histogram for a quantitative variable, select Graphs > Histograms. Click Variables to select the variable(s) for analysis and click OK.
- To make a scatterplot with two quantitative variables, select Graphs > Scatterplots. Click Variables and move the horizontal axis variable into the X: box and the vertical axis variable into the Y: box, then click OK twice.
- All possible scatterplots for more than two variables can be drawn simultaneously (called a scatterplot matrix) by selecting Graphs > Matrix Plots. Click Variables to select the variable(s) for analysis and click OK twice.
- You can mark or label cases in a scatterplot with different colors/symbols according to categories in a qualitative variable by following Help #15, then clicking the Categorized tab in the 2D Scatterplots dialog box, checking On under X-Categories, and selecting Overlaid under Layout. Click Change Variable to select the qualitative categorization variable, then click OK twice to display the plot.
- You can identify individual cases in a scatterplot by hovering over individual points.
- To remove one or more observations from a dataset, right-click the appropriate observation(s) in the Data window and select Selection Conditions > Remove Selected Cases. Removed cases will have a different background color in the spreadsheet. To add removed cases back, right-click the appropriate observation(s) in the Data window and select Selection Conditions > Add Selected Cases.
-
- To make a frequency bar chart of one qualitative variable, select Statistics > Basic Statistics and Tables. Select Frequency tables and click OK. Click Variables to select the qualitative variable and click OK. Then click Histograms to display the bar chart.
- For frequency bar charts of two qualitative variables, select Statistics > Basic Statistics and Tables. Select Tables and banners and click OK. Click Specify tables (select variables) to select the two qualitative variables and click OK. Then click Categorized histograms to display the bar charts.
- To produce a bar chart of means of a quantitative variable for cases in different categories, select Graphs > 2D Graphs > Means w/Errors Plots. Select Columns for Graph type, then click Variables and move the quantitative variable into the Dependent variable: box and the qualitative variable(s) representing the categories into the Grouping variable: box, then click OK twice.
- To make boxplots of a quantitative variable for cases in different categories, select Graphs > 2D Graphs > Boxplots. Click Variables and move the quantitative variable into the Dependent variable: box and the qualitative variable representing the categories into the Grouping variable: box, then click OK twice.
- To make a QQ-plot (also known as a normal probability plot) for a quantitative variable, select Statistics > Basic Statistics and Tables. Leave Descriptive statistics selected under the Quick tab and click OK. Click Variables to select the variable(s) for analysis and click OK. Click the Prob. & Scatterplots tab and click Normal probability plot.
- To compute a confidence interval for a univariate population mean, select Statistics > Basic Statistics and Tables. Select t-test, single sample and click OK. Click Variables to select the quantitative variable for analysis and click OK. Click the Options tab and check Compute conf. limits before clicking Summary to display the results.
- To do a hypothesis test for a univariate population mean, select Statistics > Basic Statistics and Tables. Select t-test, single sample and click OK. Click Variables to select the quantitative variable for analysis and click OK. Type the (null) hypothesised value into the Test all means against: box before clicking Summary to display the results. The p-value calculated is a two-tailed p-value; to obtain a one-tailed p-value you will need either to divide this value by two, or to divide it by two and then subtract the result from one (draw a picture to figure out which).
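The conversion from a two-tailed to a one-tailed p-value can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not something Statistica provides: the function name `one_tailed_p` and its arguments are hypothetical. The one-tailed p-value is half the two-tailed value when the t-statistic lies in the tail named by the alternative hypothesis, and one minus that half otherwise.

```python
def one_tailed_p(two_tailed_p: float, t_stat: float, alternative: str) -> float:
    """Convert a two-tailed p-value into a one-tailed p-value.

    alternative is "greater" for an upper-tail test or "less" for a
    lower-tail test. (Hypothetical helper, for illustration only.)
    """
    if alternative not in ("greater", "less"):
        raise ValueError("alternative must be 'greater' or 'less'")
    half = two_tailed_p / 2
    # The statistic is in the hypothesized tail when its sign matches
    # the direction of the alternative hypothesis.
    in_hypothesized_tail = (t_stat > 0) == (alternative == "greater")
    return half if in_hypothesized_tail else 1 - half


# Values from the t-distribution example above: t = 2.40 with 29 degrees
# of freedom has a two-tail area of 0.023.
print(one_tailed_p(0.023, 2.40, "greater"))  # upper-tail test
print(one_tailed_p(0.023, 2.40, "less"))     # lower-tail test
```

For the upper-tail test this gives roughly 0.012, in line with the upper-tail area reported by the Probability Calculator for the same t-statistic.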
Simple linear regression
- To fit a simple linear regression model (i.e., find a least squares line), select Statistics > Multiple Regression. Click the Variable button and move the response variable into the Dependent var. box and the predictor variable into the Independent variable list box. Click OK twice to see the basic results, from where you can select further results to display. In the rare circumstance that you wish to fit a model without an intercept term (regression through the origin), select the Advanced tab in the Multiple Linear Regression dialog box, check Advanced options (stepwise or ridge regression), and click OK. Then select the Advanced tab in the Model Definition dialog box and change the Intercept setting to Set to zero before clicking OK.
- To add a regression line or least squares line to a scatterplot, follow Help #15, which includes a regression line in the plot by default.
- Statistica does not appear to offer an automatic way to find 95% confidence intervals for the regression parameters in a simple linear regression model using the Multiple Regression routine (although it is possible to do this using Statistica’s General Linear Models routine). It is possible to calculate these intervals by hand using Statistica regression output and appropriate percentiles from a t-distribution. This applies more generally to multiple linear regression also.
- To find a fitted value or predicted value of Y (the response variable) at a particular value of X (the predictor variable), follow Help #25 (or #31) to fit a linear regression model, then click the Residuals/assumptions/prediction tab. Click Predict dependent variable to specify the value(s) for the predictor term(s) and click OK. The fitted or predicted value of Y at the X-value(s) that you specified is displayed in the row of the results labeled “Predicted.” This applies more generally to multiple linear regression also.
- To find a confidence interval for the mean of Y at a particular value of X, follow Help #25 (or #31) to fit a linear regression model, then click the Residuals/assumptions/prediction tab. Select Compute confidence limits, specify the significance level, Alpha (the default is 5% for a 95% interval), click Predict dependent variable to specify the value(s) for the predictor term(s), and click OK. The confidence interval for the mean of Y at the X-value(s) that you specified is displayed in the rows of the results labeled, for example, “-95.0%CL” and “+95.0%CL.” This applies more generally to multiple linear regression also.
- To find a prediction interval for an individual value of Y at a particular value of X, follow Help #25 (or #31) to fit a linear regression model, then click the Residuals/assumptions/prediction tab. Select Compute prediction limits, specify the significance level, Alpha (the default is 5% for a 95% interval), click Predict dependent variable to specify the value(s) for the predictor term(s), and click OK. The prediction interval for an individual Y-value at the X-value(s) that you specified is displayed in the rows of the results labeled, for example, “-95.0%PL” and “+95.0%PL.” This applies more generally to multiple linear regression also.
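The hand calculation of a confidence interval for a regression parameter, mentioned earlier in this section, amounts to "estimate plus or minus t-percentile times standard error." A minimal Python sketch follows. The function name and the numeric estimate and standard error are hypothetical, chosen only for illustration; in practice you would read the estimate and its standard error from the Statistica regression output and obtain the t-percentile from the Probability Calculator. The percentile 2.045 is the 97.5th percentile of the t-distribution with 29 degrees of freedom, which assumes a simple linear regression with 31 observations (degrees of freedom = n − 2).

```python
def parameter_confidence_interval(estimate: float,
                                  std_error: float,
                                  t_percentile: float) -> tuple[float, float]:
    """Return (lower, upper) limits: estimate -/+ t_percentile * std_error."""
    margin = t_percentile * std_error
    return estimate - margin, estimate + margin


# Hypothetical regression output, for illustration only.
slope_estimate = 0.5   # estimated slope from the regression results
slope_std_error = 0.1  # its standard error from the regression results
t_975_df29 = 2.045     # 97.5th percentile, t-distribution, 29 df

lower, upper = parameter_confidence_interval(
    slope_estimate, slope_std_error, t_975_df29
)
print(f"95% confidence interval for the slope: ({lower:.4f}, {upper:.4f})")
```

For a different confidence level or sample size, only the t-percentile changes; the estimate and standard error come straight from the fitted model.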
Multiple linear regression
- To fit a multiple linear regression model, select Statistics > Multiple Regression. Click the Variable button and move the response variable into the Dependent var. box and the predictor variables into the Independent variable list box. Click OK twice to see the basic results, from where you can select further results to display. In the rare circumstance that you wish to fit a model without an intercept term (regression through the origin), select the Advanced tab in the Multiple Linear Regression dialog box, check Advanced options (stepwise or ridge regression), and click OK. Then select the Advanced tab in the Model Definition dialog box and change the Intercept setting to Set to zero before clicking OK.
- To add a quadratic regression line to a scatterplot, follow Help #15, but before clicking OK select the Advanced tab in the 2D Scatterplots dialog box and select Polynomial for Fit.
- Categories of a qualitative variable can be thought of as defining subsets of the sample. If there is also a quantitative response and a quantitative predictor variable in the dataset, a regression model can be fit to the data to represent separate regression lines for each subset. To do this follow Help #17, which includes separate regression lines in the plot by default.
- Statistica does not appear to offer an automatic way to find the F-statistic and associated p-value for a nested model F-test in multiple linear regression. It is possible to calculate these quantities by hand using Statistica regression output and appropriate percentiles from an F-distribution.
- To save residuals in a multiple linear regression model, follow Help #31 to fit a multiple linear regression model, then click the Residuals/assumptions/prediction tab. Click Perform residual analysis, click the Save tab in the Residual Analysis dialog box, and click Save residuals & predicted. The residuals will be saved as a variable called Residuals in a new spreadsheet; they can now be used just like any other variable, for example, to construct residual plots. Note that Statistica will also save what it calls StandardResidual, but these are different to what Pardoe (2012) calls standardized residuals. Similarly, Statistica’s DeletedResidual is different to what Pardoe (2012) calls studentized residuals.
- To add a loess fitted line to a scatterplot (useful for checking the zero mean regression assumption in a residual plot), follow Help #15, but before clicking OK select the Advanced tab in the 2D Scatterplots dialog box and select Lowess for Fit.
- Statistica does not appear to offer an automatic way to save leverages in a multiple linear regression model.
- To save Cook’s distances in a multiple linear regression model, follow Help #31 to fit a multiple linear regression model, then click the Residuals/assumptions/prediction tab. Click Perform residual analysis, click the Save tab in the Residual Analysis dialog box, and click Save residuals & predicted. The Cook’s distances will be saved as a variable called CookDistance in a new spreadsheet; they can now be used just like any other variable, for example, to construct scatterplots.
- To create some residual plots automatically in a multiple linear regression model, follow Help #31 to fit a multiple linear regression model, click the Residuals/assumptions/prediction tab, and click Perform residual analysis. Try any of the following:
- Select the Residuals tab and select Residuals vs. independent var..
- Select the Scatterplots tab and select Predicted vs. residuals.
- Select the Residuals tab and select Histogram of residuals.
- Select the Probability plots tab and select Normal plot of residuals.
- To create a correlation matrix of quantitative variables (useful for checking potential multicollinearity problems), select Statistics > Basic Statistics and Tables. Select Correlation matrices and click OK. Click One variable list to select the variable(s) for analysis and click OK. Click Summary to view the results.
- To find variance inflation factors in multiple linear regression, follow Help #31 to fit a multiple linear regression model, click the Advanced tab, and click Current sweep matrix. The variance inflation factors are the negatives of the diagonal elements for the predictor terms in this matrix.
- To draw a predictor effect plot for graphically displaying the effects of transformed quantitative predictors and/or interactions between quantitative and qualitative predictors in multiple linear regression, first create a variable representing the effect, say, “X1effect” (see computer help #6). Then select Graphs > Scatterplots. Click Variables and move X1 into the X: box and the “X1effect” variable into the Y: box.
- If the “X1effect” variable just involves X1 (e.g., 1 + 3X1 + 4X1²), you can click OK twice at this point.
- If the “X1effect” variable also involves a qualitative variable (e.g., 1 − 2X1 + 3D2X1, where D2 is an indicator variable), you should click the Categorized tab in the 2D Scatterplots dialog box, check On under X-Categories, and select Overlaid under Layout. Click Change Variable to select the qualitative categorization variable, then click OK twice to display the plot.
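The nested model F-test that this section describes as a hand calculation can be sketched in Python as follows. This uses the standard textbook form of the statistic, the drop in residual sum of squares per extra parameter divided by the residual mean square of the complete model; the function name and the numeric inputs are hypothetical and chosen only for illustration. In practice the residual sums of squares and residual degrees of freedom would be read from the Statistica output for the reduced and complete models.

```python
def nested_f_statistic(rss_reduced: float, rss_complete: float,
                       df_reduced: int, df_complete: int) -> float:
    """F-statistic comparing a reduced model with a complete model.

    rss_* are residual sums of squares and df_* are residual degrees of
    freedom. The reduced model has fewer predictor terms, so it has the
    larger residual degrees of freedom.
    """
    if df_reduced <= df_complete:
        raise ValueError("the reduced model must have more residual df")
    numerator = (rss_reduced - rss_complete) / (df_reduced - df_complete)
    denominator = rss_complete / df_complete
    return numerator / denominator


# Hypothetical values, for illustration only: the complete model adds two
# predictor terms to the reduced model.
f_stat = nested_f_statistic(rss_reduced=50.0, rss_complete=10.0,
                            df_reduced=5, df_complete=3)

# 95th percentile of the F-distribution with 2 numerator and 3 denominator
# degrees of freedom, as returned by the Probability Calculator.
critical_value = 9.552

print(f"F-statistic: {f_stat:.3f}")
print("Reject the reduced model" if f_stat > critical_value
      else "Do not reject the reduced model")
```

To obtain the p-value rather than a reject/do-not-reject decision, type the computed F-statistic and the two degrees of freedom into the Probability Calculator with 1-Cumulative p checked.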