SciPy stats.zscore Function

The scipy.stats.zscore Function
Calculating the zscore for a One-Dimensional Array in Python
Calculating the zscore for a Multi-Dimensional Array in Python
Calculating the zscore for a Pandas DataFrame in Python
The z-score is a statistical measure that tells how many standard deviations a particular value lies from the mean. It is calculated with the following formula.
z = (X – μ) / σ
In which,
 X is a particular value from the data
 μ is the mean value
 σ is the standard deviation
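As a quick check of the formula, the following sketch computes z-scores by hand with NumPy. The sample values and variable names are purely illustrative, and it assumes the population standard deviation (dividing by N).

```python
import numpy as np

# Illustrative sample values (not taken from any real dataset)
values = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

mu = values.mean()     # mean of the data
sigma = values.std()   # population standard deviation (divides by N)

# Apply z = (X - mu) / sigma to every element at once
z = (values - mu) / sigma

print(mu, sigma)
print(z)
```

Here the mean is 5 and the standard deviation is 2, so the value 9 gets a z-score of (9 - 5) / 2 = 2.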
This tutorial will show how to calculate the z-score of any data in Python using the SciPy library.
The scipy.stats.zscore Function
The scipy.stats.zscore function of the SciPy library computes the z-score of each value in the input data, relative to the data's mean and standard deviation. Its signature is scipy.stats.zscore(a, axis=0, ddof=0, nan_policy='propagate').
Following are the parameters of the scipy.stats.zscore function.
a (array)
An array-like object containing the raw input data.
axis (int or None)
The axis along which the function computes the z-score. The default value is 0, i.e., the function works along the first axis, which for a two-dimensional array means each column is standardized separately. If axis is None, the function computes over the whole array.
ddof (int)
The degrees of freedom correction used when computing the standard deviation. The default value is 0, which divides by the number of values N; ddof=1 divides by N - 1.
nan_policy
This parameter decides how the function handles NaN values in the input data. It accepts one of three string options: 'propagate', 'raise', and 'omit'. 'propagate' (the default) returns NaN, 'raise' throws an error, and 'omit' ignores the NaN values when computing the mean and standard deviation. With 'omit', the NaN values still appear as NaN in the output, but they do not affect the z-scores calculated for the other values in the input data.
All the parameters except a are optional, so it is not necessary to specify them every time you use the scipy.stats.zscore function.
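To make the optional parameters more concrete, here is a minimal sketch of how ddof and nan_policy change the result. The arrays and variable names are illustrative assumptions, not part of the original tutorial.

```python
import numpy as np
import scipy.stats as stats

sample = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # illustrative values

# ddof=0 (default) divides by N; ddof=1 divides by N - 1
z_population = stats.zscore(sample)
z_sample = stats.zscore(sample, ddof=1)

with_nan = np.array([1.0, 2.0, np.nan, 4.0, 5.0])

# 'propagate' (default): the NaN makes the mean NaN, so every result is NaN
z_propagate = stats.zscore(with_nan)

# 'omit': the NaN is skipped when computing the mean and standard deviation
z_omit = stats.zscore(with_nan, nan_policy='omit')

# 'raise': a ValueError is raised when a NaN is found
try:
    stats.zscore(with_nan, nan_policy='raise')
    raised = False
except ValueError:
    raised = True

print(z_population)
print(z_sample)
print(z_propagate)
print(z_omit)
print(raised)
```

With ddof=1 the standard deviation is slightly larger, so the z-scores shrink toward zero compared with the default.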
Now, let us use the scipy.stats.zscore function on a one-dimensional array, a multi-dimensional array, and a Pandas DataFrame.
Calculating the zscore for a One-Dimensional Array in Python
import numpy as np
import scipy.stats as stats
input_data = np.array([5, 10, 20, 35, 25, 22, 19, 19, 50, 45, 62])
stats.zscore(input_data)
Output:
array([-1.3916106 , -1.09379511, -0.49816411,  0.39528239, -0.20034861,
       -0.37903791, -0.55772721, -0.55772721,  1.28872889,  0.99091339,
        2.00348608])
Note that each z-score tells how many standard deviations its corresponding value lies from the mean. A negative sign means the value is that many standard deviations below the mean, and a positive sign means it is that many standard deviations above the mean. If a z-score comes out to be 0, the value is equal to the mean.
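The interpretation above can be checked directly. The following sketch reuses the array from this section, compares the result of stats.zscore with a manual calculation, and confirms the sign convention; the helper variable names are illustrative.

```python
import numpy as np
import scipy.stats as stats

input_data = np.array([5, 10, 20, 35, 25, 22, 19, 19, 50, 45, 62])
z = stats.zscore(input_data)

# Manual calculation with the formula z = (X - mean) / std
manual = (input_data - input_data.mean()) / input_data.std()
print(np.allclose(z, manual))

# Values below the mean get negative scores, values above get positive ones
below_mean = input_data < input_data.mean()
print(np.all(z[below_mean] < 0), np.all(z[~below_mean] > 0))

# The standardized data has mean 0 and standard deviation 1 (up to rounding)
print(np.isclose(z.mean(), 0.0), np.isclose(z.std(), 1.0))
```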
Calculating the zscore for a Multi-Dimensional Array in Python
import numpy as np
import scipy.stats as stats
data = np.array([[5, 10, 20, 35],
                 [25, 22, 19, 19],
                 [50, 45, 62, 28],
                 [24, 45, 15, 30]])
stats.zscore(data)
Output (rounded to four decimal places; each column is standardized separately because axis defaults to 0):
array([[-1.3138, -1.3569, -0.4701,  1.2094],
       [-0.0626, -0.5626, -0.5224, -1.555 ],
       [ 1.5015,  0.9598,  1.7238,  0.    ],
       [-0.1251,  0.9598, -0.7313,  0.3455]])
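For a multi-dimensional array, the axis parameter decides which values are grouped together when the mean and standard deviation are computed. The sketch below, using the array from this section, compares the three common choices; the result variable names are illustrative.

```python
import numpy as np
import scipy.stats as stats

data = np.array([[5, 10, 20, 35],
                 [25, 22, 19, 19],
                 [50, 45, 62, 28],
                 [24, 45, 15, 30]])

z_columns = stats.zscore(data, axis=0)  # default: standardize each column
z_rows = stats.zscore(data, axis=1)     # standardize each row
z_all = stats.zscore(data, axis=None)   # standardize over all 16 values

# Each result matches the manual formula applied along the same axis
print(np.allclose(z_columns, (data - data.mean(axis=0)) / data.std(axis=0)))
print(np.allclose(z_rows, (data - data.mean(axis=1, keepdims=True))
                  / data.std(axis=1, keepdims=True)))
print(np.allclose(z_all, (data - data.mean()) / data.std()))
```

The same value can receive quite different z-scores depending on the axis, because it is compared against a different mean and standard deviation in each case.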
Calculating the zscore for a Pandas DataFrame in Python
In this section, we will use the randint() function from NumPy's random module. This function generates random integers and returns them as a NumPy array. After creating the array, we will wrap it in a Pandas DataFrame. Because the values are random, the numbers you get will differ from the ones shown here.
import pandas as pd
import numpy as np
import scipy.stats as stats
input_data = pd.DataFrame(np.random.randint(0, 30, size=(4, 4)), columns=['W', 'X', 'Y', 'Z'])
print(input_data)
Output:
    W   X   Y   Z
0   7   9   2  15
1  11  23  15  28
2  28  11  25   2
3  11  19  14  15
input_data.apply(stats.zscore)
Output:
          W         X         Y         Z
0 -0.894534 -1.135815 -1.471534  0.000000
1 -0.400998  1.310556  0.122628  1.414214
2  1.696529 -0.786334  1.348907 -1.414214
3 -0.400998  0.611593  0.000000  0.000000
Note that the apply() function of the Pandas library is used to calculate the z-score for each value in the given DataFrame. When called on a DataFrame, apply() passes each column (by default) to the function given as its argument, so here the z-scores are computed separately for each column, using that column's mean and standard deviation.
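To see what apply() does with stats.zscore, the sketch below rebuilds the DataFrame from the values printed in this section (instead of random numbers, so the result is reproducible) and compares the outcome with an equivalent pandas-only calculation. The variable names are illustrative.

```python
import numpy as np
import pandas as pd
import scipy.stats as stats

# Fixed values copied from the printed DataFrame, so the run is reproducible
df = pd.DataFrame({'W': [7, 11, 28, 11],
                   'X': [9, 23, 11, 19],
                   'Y': [2, 15, 25, 14],
                   'Z': [15, 28, 2, 15]})

# apply() passes each column to stats.zscore in turn
z_apply = df.apply(stats.zscore)

# Equivalent pandas-only calculation; ddof=0 matches the zscore default
z_manual = (df - df.mean()) / df.std(ddof=0)

print(z_apply)
print(np.allclose(z_apply.to_numpy(), z_manual.to_numpy()))
```

Note that pandas' own std() defaults to ddof=1, so ddof=0 has to be passed explicitly to reproduce the SciPy result.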