SPSS
SPSS (originally an acronym for Statistical Package for the Social Sciences) is a statistical software package that was originally used only in social science and applied science research and is now also used, as IBM SPSS, in market research.
It is one of the best-known statistical programs because of its ability to work with large databases and its simple interface for most analyses. In SPSS version 12, analyses could be performed with two million records and 250,000 variables. The program consists of a base module and add-on modules that have been constantly updated with new statistical procedures. Each of these modules is purchased separately.
For example, SPSS can be used to assess educational issues.
Currently, it competes not only with licensed programs such as SAS, MATLAB, Statistica and Stata, but also with free and open source software, the most prominent being the R and Python languages. A free package called PSPP has recently been developed, with an interface called PSPPire that has been compiled for various operating systems, including Linux, Windows and macOS. PSPP aims to be an open source clone that emulates all the capabilities of SPSS.
History
It was created in 1968 by Norman H. Nie, C. Hadlai (Tex) Hull, and Dale H. Bent. Between 1969 and 1975 the University of Chicago, through its National Opinion Research Center, was in charge of the development, distribution and sale of the program. From 1975 onward, this responsibility passed to SPSS Inc.
Originally the program was created for mainframe computers. In 1970 the first SPSS user manual was published by Nie and Hull. This manual popularized the program among higher education institutions in the United States. In 1984 the first version for personal computers came out.
Starting with version 14, and more fully with version 15, the SPSS object libraries can be used from various programming languages. Although this has been implemented mainly for Python, it is also possible to work from Visual Basic, C++ and other languages.
On July 28, 2009, it was announced that IBM, months after its failed attempt to buy Sun Microsystems, was acquiring SPSS for $1.2 billion.
SPSS versions
SPSS Inc. developed a base module of the SPSS statistical package, of which the following versions have been released:
- SPSS 1 - 1968
- SPSSx release 2 - 1983 (for large UNIX servers)
- SPSS 5.0 - December 1993
- SPSS 6.1 - February 1995
- SPSS 7.5 - January 1997
- SPSS 8.0 - 1998
- SPSS 9.0 - March 1999
- SPSS 10.0.5 - December 1999
- SPSS 10.0.7 - July 2000
- SPSS 10.1.4 - January 2002
- SPSS 11.0.1 - April 2002
- SPSS 11.5.1 - April 2003
- SPSS 12.0.1 - July 2004
- SPSS 13.0.1 - March 2005 (for the first time, allows working with multiple databases at the same time)
- SPSS 14.0.1 - January 2006
- SPSS 15.0.1 - November 2006
- SPSS 16.0.1 - November 2007
- SPSS 17.0.1 - November 2008 (on the SPSS user list "SPSSX(r) Discussion [SPSSX-L@LISTSERV.UGA.EDU]", several company officials had previously announced the release of version 16 of this software, which incorporated a Java-based interface that allowed some usability improvements)
- SPSS 16.0.2 - April 2008
- SPSS Statistics 17.0.1 - December 2008 (adds important features such as multilanguage support, with the interface language changeable in the options at any time. It also modifies the syntax editor so that it highlights keywords and commands and makes suggestions while typing, which brings it closer to the IDEs used in programming)
- SPSS Statistics 17.0.2 - March 2009
- PASW Statistics 17.0.3 - September 2009 (IBM acquires the rights and the SPSS name is changed to PASW 18)
- PASW Statistics 18.0 - August 2009
- PASW Statistics 18.0.1 - December 2009
- PASW Statistics 18.0.2 - April 2010
- PASW Statistics 18.0.3 - September 2010
- IBM SPSS Statistics 19.0 - August 2010 (see IBM SPSS)
- IBM SPSS Statistics 19.0.1 - December 2010
- IBM SPSS Statistics 20.0 - August 2011
- IBM SPSS Statistics 20.0.1 - March 2012
- IBM SPSS Statistics 21.0 - August 2012
- IBM SPSS Statistics 22.0 - August 2013
- IBM SPSS Statistics 23.0 - August 2014
- IBM SPSS Statistics 24.0 - June 2016
- IBM SPSS Statistics 25.0 - March 2017
- IBM SPSS Statistics 26.0 - 2019
- IBM SPSS Statistics 27.0 - June 2020
SPSS Modules
Like the module systems of other programs and of some programming languages, the SPSS module system provides a series of capabilities in addition to those of the base system. Some of the available modules are:
- Regression models
- Advanced models
- Data reduction: Creates synthetic variables from collinear variables through factor analysis.
- Classification: Groups observations or variables (cluster analysis) using three different algorithms.
- Non-parametric tests: Performs statistical tests designed for non-normal distributions.
- Tables: Lets the user give a special format to the data output for further use. There is a trend among users and software developers to set aside the original TABLES system in favor of more extensive use of the so-called CUSTOM TABLES.
- Trends
- Categories: Performs multivariate analysis of variables that are normally categorical. Metric variables can also be used as long as they are properly recoded first.
- Conjoint analysis: Analyzes data collected specifically for this type of statistical test.
- Maps: Represents geographically the information contained in a file (discontinued as of SPSS 16).
- Exact tests: Performs statistical tests on small samples.
- Missing value analysis: Simple regression-based imputation of missing values.
- Complex samples: Supports the creation of stratified samples, cluster samples, and other types of samples.
- SamplePower: Calculation of sample sizes.
- Classification trees: Builds classification and/or decision trees that identify how groups are formed and predict the behavior of their members.
- Data Validation: Lets the user run logical checks on the information contained in a ".sav" file and obtain reports of values considered atypical. It is similar to using syntax or scripts to review files and, like those mechanisms, it is applied after data entry.
- SPSS Programmability Extension (SPSS 14 onwards): Allows the Python programming language to be used for better control of various processes within the program that until now were mainly handled by scripts (in the SAX Basic language). Microsoft's .NET technologies can also be used to access the SPSS libraries. Some users have raised the question of including other languages, but the company does not count this among its immediate objectives.
Since SPSS/PC there has been a companion version called SPSS Student, which is a complete program of the corresponding version but limited in the number of records and variables it can process. This version is intended for teaching how to use the program.
Usage
SPSS works with several file types, the main one being data files (extension .SAV). Apart from this type, two others are in frequent use:
- Output files (extension .SPO): These display all the results of the data manipulation that users carry out through the command windows. They can be exported in various formats (originally HTML, RTF or TXT; version 15 adds PDF export alongside the XLS and DOC formats already available in version 12).
- Syntax files (extension .SPS): Almost all SPSS windows have a button that pastes the procedure you want to run. This generates a syntax file in which all the instructions carried out by the SPSS commands are saved. The user can modify this file. Many early SPSS users write these files by hand instead of using the program's paste system.
There is a third type of file: the script file (extension .SBS). This file is used by the most advanced users of the software to write routines that automate very long and/or complex processes. Many of these processes are not part of the standard output of SPSS commands, although they do start from that output. Much of the functionality of script files has now been taken over by the embedding of the Python programming language in SPSS syntax routines. Procedures that previously could only be done through scripts can now be done from the syntax itself.
The installed program includes a number of example files of almost all these types, which are used to illustrate how the program can be used.
Here is a small list of things that can be done using this program:
1. Data entry:
Go to the data view and enter the values from top to bottom, with each variable in a DIFFERENT column.
2. Basic calculations:
- For frequency tables: ANALYZE>>descriptive statistics>>frequencies. Move the variable of interest to the other side and click Statistics, then mark everything you want to know (mean, mode, median, quartiles). A new screen appears with the results. Percentiles such as P2.5 or P97.5 are also requested here.
- ANALYZE>>descriptive statistics>>explore: enter the variable in the first field (dependent list) and click OK. This gives all the previous information and, IN ADDITION, the confidence interval and sample estimate as well as the standard error of the mean. On top of that, it gives the stem-and-leaf plot and the box plot.
- For SKEWNESS and KURTOSIS: Skewness: if it is negative the distribution is skewed to the LEFT, if it is 0 it is symmetric, and if it is positive it is skewed to the RIGHT. Kurtosis: if it is around 0 the distribution is mesokurtic, if it is negative it is platykurtic, and if it is positive it is leptokurtic.
- ANALYZE>>descriptive statistics>>frequencies>>charts: this is useful to see the SHAPE OF THE DISTRIBUTION, since the normal curve can be superimposed. If the curve resembles the histogram, the distribution can be considered symmetric.
- To make a scatter plot of two quantitative variables: Graphs>>legacy dialogs>>scatter/dot>>simple scatter>>define. NOTE: you have to know which variable is dependent and which is independent. The dependent variable goes on the Y axis and the independent variable on the X axis (age, for example, would be independent in most cases).
- For Pearson's linear correlation coefficient: ANALYZE>>correlate>>bivariate. A table will appear. One diagonal always contains 1 (ignore it); the other cells contain the value of interest.
- For the regression coefficient and the coefficient of determination: ANALYZE>>regression>>linear. Among the output tables, look at the one titled MODEL SUMMARY for R2 (the coefficient of determination). For the regression coefficient (b), look at the table titled COEFFICIENTS. There are two numbers under the B column. The first is the constant (also called a) and the second is the regression coefficient b. In short, take the SECOND.
- To compare two means: ANALYZE>>compare means>>independent-samples t test>>define groups.
- To select cases by the value of a variable: DATA>>Select cases>>If condition is satisfied>>enter the variable, then = and the value to compare. Then go to ANALYZE>>explore.
- For row percentages in a contingency table: ANALYZE>>Descriptive statistics>>Crosstabs>>Cells>>Row percentages>>OK.
- ANALYZE>>Descriptive statistics>>Crosstabs>>Display clustered bar charts AND Statistics>>(any statistic you want).
- ANALYZE>>Compare means>>One-sample t test>>(enter the value in Test Value)>>OK, then read the Sig. column.
- To rename variables for convenience, go to VARIABLE VIEW (the right-hand tab) and click on the name.
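The correlation and regression readings in the list above can be checked by hand. The following is a minimal sketch in plain Python, not SPSS code: the function name `simple_linear_regression` and the sample values are made up for illustration, and it assumes a single independent variable. It computes the constant a, the regression coefficient b, Pearson's r, and R2.

```python
from math import sqrt


def simple_linear_regression(x, y):
    """Return (a, b, r, r2) for the least-squares line y = a + b*x.

    Hypothetical helper for illustration; it is not part of SPSS.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products around the means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each take more than one value")
    b = sxy / sxx              # regression coefficient (slope)
    a = mean_y - b * mean_x    # constant (intercept)
    r = sxy / sqrt(sxx * syy)  # Pearson's linear correlation coefficient
    return a, b, r, r * r      # r squared is the coefficient of determination


# Made-up sample: x is the independent variable, y the dependent one.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b, r, r2 = simple_linear_regression(x, y)
print(f"a = {a:.3f}, b = {b:.3f}, r = {r:.3f}, R2 = {r2:.3f}")
```

With one independent variable, R2 is simply the square of Pearson's r, which is why the two readings agree.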
SPSS data file
Data files in SPSS format have the extension .SAV. When a data file is opened with SPSS, we see the data view, a table in which the rows are the cases and the columns are the variables. Each cell holds the value that a given variable takes in a given case.
In addition to this data view, recent versions of the program have a variable view that describes the characteristics of each variable. In this view, each row corresponds to a variable and the columns give access to its characteristics:
- Name, limited to 8 characters.
- Type of variable (compare this list of options with existing statistical variables):
  - Numeric: number in standard format
  - Comma: number with commas every three positions and a point as decimal delimiter
  - Dot: number with points every three positions and a comma as decimal delimiter
  - Scientific notation: a number followed by an E and a number expressing the power of 10 by which the preceding numeric part is multiplied
  - Date
  - Dollar: numeric format in which amounts are expressed in dollars
  - Custom currency: numeric format in which amounts are expressed in the currency defined in the Currency tab of the "Options" dialog box
  - String: alphanumeric characters
- Width (total size)
- Number of decimal places
- Variable label
- Value labels
- Missing values
- Column width in the data view
- Alignment of the variable in the data view
- Measurement scale.
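The three numeric display formats in the list above differ only in how the same value is written. As a rough sketch, the following Python imitates them with standard string formatting; the function names are hypothetical and this is not how SPSS itself implements the formats.

```python
def comma_format(value, decimals=2):
    """Commas every three positions, point as decimal delimiter."""
    return f"{value:,.{decimals}f}"


def dot_format(value, decimals=2):
    """Points every three positions, comma as decimal delimiter."""
    # Start from the comma format and swap the two separators.
    return comma_format(value, decimals).translate(str.maketrans(",.", ".,"))


def scientific_format(value, decimals=2):
    """Numeric part, then E, then the power of 10 that multiplies it."""
    return f"{value:.{decimals}E}"


amount = 1234567.891  # made-up value
print(comma_format(amount))       # 1,234,567.89
print(dot_format(amount))         # 1.234.567,89
print(scientific_format(amount))  # 1.23E+06
```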
Some users overlook the characteristics of variables when working in the database. However, when scripts or Python are used, the characteristics of the variables can take on great relevance in the construction of ad-hoc procedures.
SPSS Syntax File
Syntax files can be generated with the help of the program itself, since almost every window in which tasks are performed in SPSS has a "Paste" button. This button closes the window in question and saves the syntax of the actions selected in that window. Once saved, the file can be modified.
The syntax presented below was produced directly with SPSS. The program gives the syntax a readable layout, which in some cases the software does not require in order to run correctly.
Another peculiarity of SPSS syntax is that it is not case sensitive, so it is common to see syntax written entirely in uppercase, entirely in lowercase, or in each user's own mix of the two. This changes for those who use Python within their syntax, since Python is sensitive to differences between upper and lower case letters. This forces those users to write syntax more carefully.
The following example illustrates how to open a data file through syntax and how to produce a frequency table and a crosstabulation with data from one of the example files that the program installs.
* This is a comment; it must begin with an asterisk and end with a period.

* Open the Tomato.sav file.
GET FILE='C:\Program Files\SPSS\Tomato.sav'.

* Generate a frequency table for the fertilizer variable.
FREQUENCIES VARIABLES=fert
  /ORDER= ANALYSIS.

* Make a contingency table with the initial height and fertilizer variables.
CROSSTABS
  /TABLES=initial BY fert
  /FORMAT= AVALUE TABLES
  /CELLS= COUNT
  /COUNT ROUND CELL.
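To make clear what the two commands above count, here is a small sketch in plain Python rather than SPSS syntax. The cases are made up and do not reproduce the contents of Tomato.sav; the helper names `frequencies` and `crosstab` are hypothetical, and only the variable names `fert` and `initial` come from the example.

```python
from collections import Counter

# Made-up cases standing in for rows of a data file.
cases = [
    {"fert": "A", "initial": 10},
    {"fert": "A", "initial": 12},
    {"fert": "B", "initial": 10},
    {"fert": "B", "initial": 10},
    {"fert": "C", "initial": 12},
]


def frequencies(rows, variable):
    """Count how many cases take each value of one variable."""
    return Counter(row[variable] for row in rows)


def crosstab(rows, row_var, col_var):
    """Count cases for each (row value, column value) pair."""
    return Counter((row[row_var], row[col_var]) for row in rows)


freq = frequencies(cases, "fert")
table = crosstab(cases, "initial", "fert")

# Frequency table, in ascending order of values.
for value in sorted(freq):
    print(value, freq[value])

# Contingency table: one printed line per value of the row variable.
col_values = sorted({col for _, col in table})
for row_value in sorted({row for row, _ in table}):
    print(row_value, [table[(row_value, col)] for col in col_values])
```

Each cell of the contingency table is a plain count, which corresponds to the COUNT keyword in the CELLS subcommand.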
Observations
Always take the width into account, since it determines the maximum number of characters that the variable can contain.