How to Calculate z-score in Python

Harsh Pandey

Software Developer

Published on Mon Apr 01 2024

Introduction

In statistics and data analysis, the z-score shows how a data point relates to the mean and standard deviation of a dataset. Also called the standard score, it expresses how far a data point is from the mean in units of standard deviations. In this beginner-friendly guide, we will calculate z-scores in Python using SciPy, Pandas, and a manual calculation from the mean and standard deviation.

What is a Z-Score?

Before diving into the code, let's grasp the concept of a Z-score. Imagine you have a dataset, and you want to know how a specific data point compares to the rest of the data. The Z-score quantifies this comparison by telling you the number of standard deviations a particular data point is away from the mean. A positive Z-score indicates that the data point is above the mean, while a negative Z-score signifies that it's below the mean. A Z-score of 0 implies that the data point is right at the mean.
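
As a minimal sketch of that definition, the helper below computes how many standard deviations a single value lies from the mean. The function name z_score and the exam numbers are illustrative assumptions, not part of any library:

```python
def z_score(x, mean, std_dev):
    """Return how many standard deviations x lies from mean."""
    if std_dev == 0:
        raise ValueError("standard deviation must be non-zero")
    return (x - mean) / std_dev

# Hypothetical exam with mean 70 and standard deviation 10
print(z_score(85, 70, 10))  # 1.5  (above the mean)
print(z_score(70, 70, 10))  # 0.0  (exactly at the mean)
print(z_score(55, 70, 10))  # -1.5 (below the mean)
```

The sign gives the direction relative to the mean, and the magnitude gives the distance in standard deviations.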

Calculating Z-Scores in Python Using SciPy

SciPy is a Python library that offers a wide range of scientific and mathematical functions. Its built-in scipy.stats.zscore function calculates Z-scores in a single call and, by default, uses the population standard deviation (ddof=0). Let's see how it works:

import numpy as np
from scipy import stats

# Sample dataset
data = np.array([12, 15, 18, 21, 24, 27, 30, 33, 36, 39])

# Calculate Z-scores
z_scores = stats.zscore(data)

# Print Z-scores
print("Z-Scores:", z_scores)
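
If the data is a sample rather than a full population, the ddof parameter of scipy.stats.zscore can switch to the sample standard deviation. The sketch below reuses the same dataset and compares the two choices; the variable names are illustrative:

```python
import numpy as np
from scipy import stats

data = np.array([12, 15, 18, 21, 24, 27, 30, 33, 36, 39])

# Default: population standard deviation (ddof=0)
z_population = stats.zscore(data)

# Sample standard deviation (ddof=1), which matches the pandas default for std()
z_sample = stats.zscore(data, ddof=1)

print("Population z-score of first value:", round(z_population[0], 4))
print("Sample z-score of first value:", round(z_sample[0], 4))
```

The sample version divides by a slightly larger standard deviation, so its Z-scores are slightly smaller in magnitude. The gap shrinks as the dataset grows.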

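Calculating Z-Scores with Pandas

Pandas is a widely used library for data manipulation and analysis. It has no dedicated Z-score function, but the arithmetic supported by its Series data structure makes the calculation concise. Note that Series.std() computes the sample standard deviation (ddof=1) by default, whereas scipy.stats.zscore uses the population standard deviation (ddof=0); pass ddof=0 to std() if the results should match SciPy's. Let's see how it's done: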

import pandas as pd

# Sample dataset
data = pd.Series([12, 15, 18, 21, 24, 27, 30, 33, 36, 39])

# Calculate mean and population standard deviation
# (ddof=0 matches scipy.stats.zscore; the pandas default is ddof=1)
mean = data.mean()
std_dev = data.std(ddof=0)

# Calculate Z-scores
z_scores = (data - mean) / std_dev

# Print Z-scores
print("Z-Scores:", z_scores)
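
The same arithmetic extends to a whole DataFrame, where each column is standardized against its own mean and standard deviation. This is a sketch under assumptions: the column names and values are hypothetical, and ddof=0 is chosen to match SciPy's default:

```python
import pandas as pd

# Hypothetical two-column dataset; names and values are illustrative
df = pd.DataFrame({
    "height_cm": [150, 160, 170, 180, 190],
    "weight_kg": [50, 60, 70, 80, 90],
})

# Column-wise z-scores: each column uses its own mean and standard deviation
z_df = (df - df.mean()) / df.std(ddof=0)

print(z_df)
```

After standardization every column has mean 0 and standard deviation 1, which puts columns with different units on a comparable scale.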

Calculating Z-Scores Manually with Mean and Standard Deviation

Calculating Z-scores manually with the formula (x - mean) / standard_deviation needs nothing beyond basic arithmetic. The version below divides the summed squared differences by len(data), so it computes the population standard deviation. Let's calculate Z-scores using this method:

# Sample dataset
data = [12, 15, 18, 21, 24, 27, 30, 33, 36, 39]

# Calculate mean and standard deviation
mean = sum(data) / len(data)
differences = [(x - mean) for x in data]
std_dev = (sum([diff ** 2 for diff in differences]) / len(data)) ** 0.5

# Calculate Z-scores
z_scores = [(x - mean) / std_dev for x in data]

# Print Z-scores
print("Z-Scores:", z_scores)
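
If you prefer not to write the standard deviation by hand, Python's standard-library statistics module offers an alternative. The sketch below swaps in statistics.fmean and statistics.pstdev (population standard deviation) for the manual steps above:

```python
import statistics

data = [12, 15, 18, 21, 24, 27, 30, 33, 36, 39]

mean = statistics.fmean(data)      # arithmetic mean as a float
std_dev = statistics.pstdev(data)  # population standard deviation

z_scores = [(x - mean) / std_dev for x in data]

print("Z-Scores:", [round(z, 4) for z in z_scores])
```

Use statistics.stdev instead of statistics.pstdev if you want the sample standard deviation.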

Interpreting Z-Scores

The magnitude of a Z-score tells you how far a data point is from the mean. A Z-score close to 0 indicates that the data point is close to the mean, while a larger absolute Z-score indicates a greater deviation from the mean.

For instance, if a student's test score has a Z-score of -2, the score is 2 standard deviations below the mean. Conversely, a Z-score of +1.5 indicates a score 1.5 standard deviations above the mean, which is clearly above average.
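
One common way to act on this interpretation is to flag values whose absolute Z-score exceeds a cutoff. The sketch below assumes a hypothetical set of test scores and a cutoff of 2; both are illustrative choices, and the right cutoff depends on the data:

```python
import numpy as np

# Hypothetical test scores; 95 is deliberately far from the rest
scores = np.array([70, 72, 68, 71, 69, 73, 70, 95])

# Population z-scores (NumPy's std() uses ddof=0 by default)
z_scores = (scores - scores.mean()) / scores.std()

threshold = 2  # assumed cutoff; 2 and 3 are common rules of thumb
outliers = scores[np.abs(z_scores) > threshold]

print("Outliers:", outliers)
```

Here only the score of 95 lies more than 2 standard deviations from the mean, so it is the only value flagged.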

Conclusion

In this blog, we covered what a Z-score is and three ways to calculate it in Python: SciPy's zscore function, Pandas Series arithmetic, and a manual calculation from the mean and standard deviation. Whichever method you choose, be explicit about whether you are using the population or the sample standard deviation, since the two give slightly different Z-scores. Z-scores are useful for comparing values measured on different scales and for identifying outliers in a dataset.
