Using Python for data analysis
If you want to start using Python but are used to SPSS for data analysis, you might find the files below useful. Switching from SPSS to Python can be somewhat difficult, and not only because you have to learn Python. Most online tutorials on using Python for data analysis focus mostly on prediction. For example, most linear regression tutorials do not include beta (the standardized coefficient) or mention the incremental change in R², and many other things we are used to seeing in SPSS output never come up. In my opinion, much of what is useful for interpreting regression results is not typically included in the output of existing Python modules.
The files below use existing Python modules to produce the information you would find in SPSS output, while keeping the amount of coding to a minimum.
If you want to switch to Python for data analysis, I hope you find these files helpful. My goal in compiling them was to make Python's statistical output easy to understand for those of us who are used to conducting data analysis in SPSS.
Basic Regression
This file runs a regression analysis and produces basic output: an R² table, an F-value table and a coefficients table.
Link to the file: Download Basic Regression Python Notebook
How to use this file:
- Run the 1st cell to import the relevant libraries.
- Run the 2nd cell to define the regression class needed for the analysis.
- In the 3rd cell, write the name of the file inside the quotation marks.
  This line is written for CSV files. If you have an Excel file, use df = pd.read_excel("nameoffile.xlsx") instead, and write the name of the file, including its extension, inside the quotation marks. The file extension matters: check how the file is saved, include the correct extension, and make sure it matches the read command. Optionally, you could also save the file as CSV from Excel.
- In the 4th cell, write the names of the independent variables inside the square brackets; this forms a list of all independent variables to be included in the analysis. Put each variable name inside quotation marks and separate the names with commas. If you have only one independent variable, put its name in quotation marks inside the square brackets, and delete the comma and the other variable.
  Next, write the name of the dependent variable, also inside quotation marks.
- Run cells 5 through 7 to get the regression output.
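The steps in cells 3 and 4 can be sketched roughly as follows, assuming pandas. The helpers load_data and find_missing_variables, the file name and the column names are all hypothetical. load_data picks read_csv or read_excel from the file extension, which is the extension-matching rule from step 3, and find_missing_variables catches misspelled variable names before the regression runs. Reading .xlsx files with read_excel also requires the openpyxl package to be installed.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_data(filename):
    """Choose the pandas reader that matches the file extension."""
    ext = Path(filename).suffix.lower()
    if ext == ".csv":
        return pd.read_csv(filename)
    if ext == ".xlsx":
        return pd.read_excel(filename)  # needs the openpyxl package
    raise ValueError(f"Unexpected extension {ext!r}: use a .csv or .xlsx file")


def find_missing_variables(df, independent_vars, dependent_var):
    """List any variable names that are not columns in the data (e.g. typos)."""
    return [v for v in independent_vars + [dependent_var] if v not in df.columns]


# Cell 4 equivalents: a list of quoted names and one quoted name (hypothetical columns).
# With a single predictor, keep the square brackets: ["age"]
independent_vars = ["age", "income"]
dependent_var = "satisfaction"

# Demo: write a tiny made-up CSV to a temporary folder, then read it back as in cell 3
demo_file = Path(tempfile.mkdtemp()) / "nameoffile.csv"
pd.DataFrame({"age": [25, 32, 47], "income": [30, 45, 60],
              "satisfaction": [3, 4, 5]}).to_csv(demo_file, index=False)
df = load_data(demo_file)

missing = find_missing_variables(df, independent_vars, dependent_var)
if missing:
    raise KeyError(f"Not found in the data file: {missing}")
```

Checking the names up front tends to give a clearer message than the error pandas would raise later, inside the model-fitting code.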