
Jaccard similarity in Python


Harsh Pandey

Software Developer

Published on Thu Mar 28 2024

Introduction

Welcome to our blog on Python Jaccard similarity! If you've ever wondered how to measure how similar two sets are, you're in the right place. Jaccard similarity compares two sets by dividing the size of their intersection by the size of their union. In this blog, we'll explain what Jaccard similarity is and how to calculate it step by step in Python. Let's dive in and unlock the power of Jaccard similarity with Python!

What is Jaccard Similarity?

Jaccard similarity is a measure of how similar two sets are. It is particularly useful for categorical data or any elements that can be represented as sets. The Jaccard similarity coefficient, also known as the Jaccard index, is the size of the intersection of two sets divided by the size of their union. The result lies between 0 and 1, where 0 means the sets share no elements and 1 means the sets are identical. If both sets are empty, the union is empty and the ratio is undefined, so any code that computes it has to pick a convention for that case.
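As a quick sketch of the definition, the snippet below computes the ratio directly with Python's `&` (intersection) and `|` (union) set operators and checks the two extremes of the range. The helper name `jaccard` and the color sets are our own illustrative choices, and the helper assumes at least one of the sets is non-empty (an empty union would divide by zero).

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B| (assumes the union is non-empty)."""
    return len(a & b) / len(a | b)


identical = jaccard({"red", "green"}, {"red", "green"})   # same elements
disjoint = jaccard({"red", "green"}, {"blue", "yellow"})  # nothing shared
partial = jaccard({"red", "green"}, {"green", "blue"})    # 1 shared out of 3

print(identical, disjoint, partial)  # 1.0 0.0 0.3333333333333333
```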

To better understand this, let's consider two sets, A = {1, 2, 3, 4} and B = {3, 4, 5}. The intersection of these sets is {3, 4}, and their union is {1, 2, 3, 4, 5}. By dividing the size of the intersection (2) by the size of the union (5), we get a Jaccard similarity of 0.4 or 40%, indicating a moderate similarity between the two sets.
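Written out with the formula, the worked example for A = {1, 2, 3, 4} and B = {3, 4, 5} is:

```latex
J(A, B) = \frac{|A \cap B|}{|A \cup B|}
        = \frac{|\{3, 4\}|}{|\{1, 2, 3, 4, 5\}|}
        = \frac{2}{5} = 0.4
```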

Calculating Jaccard Similarity in Python

Calculating Jaccard similarity in Python involves three simple steps: finding the intersection of the two sets, finding their union, and then dividing the size of the intersection by the size of the union. Let's walk through the process step by step with explanations and examples.

Finding the Intersection of Two Sets

The intersection of two sets contains the elements that are common to both sets. In Python, you can find the intersection of two sets using the intersection() method or the & operator.

set_A = {1, 2, 3, 4}
set_B = {3, 4, 5}
intersection = set_A.intersection(set_B)
print(intersection)  # Output: {3, 4}
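The `&` operator mentioned above gives the same result. One practical difference, shown in this sketch: the `intersection()` method accepts any iterable (such as a list), while `&` requires both operands to be sets, so mixing a set with a list raises a `TypeError`.

```python
set_A = {1, 2, 3, 4}
set_B = {3, 4, 5}

# Operator form: both operands must be sets (or frozensets)
op_result = set_A & set_B
print(op_result)  # {3, 4}

# Method form: the argument can be any iterable, e.g. a list
method_result = set_A.intersection([3, 4, 5])
print(method_result)  # {3, 4}

# The operator form rejects a plain list operand
try:
    set_A & [3, 4, 5]
    list_operand_ok = True
except TypeError:
    list_operand_ok = False  # unsupported operand type(s) for &: 'set' and 'list'
print(list_operand_ok)  # False
```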

Finding the Union of Two Sets

The union of two sets contains all the unique elements from both sets. In Python, you can find the union of two sets using the union() method or the | operator.

set_A = {1, 2, 3, 4}
set_B = {3, 4, 5}
union = set_A.union(set_B)
print(union)  # Output: {1, 2, 3, 4, 5}
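Likewise, the `|` operator produces the union. Because a set stores each element only once, the shared elements 3 and 4 are counted a single time, which is why the union has five elements rather than seven. The `union()` method can also combine several iterables in one call; the extra list `[6]` below is just an illustrative value.

```python
set_A = {1, 2, 3, 4}
set_B = {3, 4, 5}

# Operator form: both operands must be sets
op_union = set_A | set_B
print(op_union)       # {1, 2, 3, 4, 5}
print(len(op_union))  # 5 (3 and 4 appear only once)

# Method form: union() accepts any number of iterables
multi_union = set_A.union(set_B, [6])
print(multi_union)    # {1, 2, 3, 4, 5, 6}
```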

Calculating the Jaccard Similarity

To calculate the Jaccard similarity, we need to divide the size of the intersection by the size of the union.

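set_A = {1, 2, 3, 4}
set_B = {3, 4, 5}

intersection = set_A.intersection(set_B)  # {3, 4}
union = set_A.union(set_B)                # {1, 2, 3, 4, 5}

# Jaccard similarity = |intersection| / |union| = 2 / 5
jaccard_similarity = len(intersection) / len(union)
print(f"Jaccard similarity: {jaccard_similarity}")  # Output: Jaccard similarity: 0.4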

In this example, the size of the intersection is 2 (as {3, 4} has two elements), and the size of the union is 5 (as {1, 2, 3, 4, 5} has five elements). Therefore, the Jaccard similarity between sets A and B is 2/5, which equals 0.4 or 40%.
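If you need this calculation in more than one place, it can be wrapped in a small function. The sketch below is a hypothetical helper of our own (the name `jaccard_similarity` and the `empty_value` parameter are illustrative, not part of any library). It converts its inputs to sets, so duplicates in lists are ignored, and it returns a caller-chosen value when both inputs are empty instead of dividing by zero. Treating two empty collections as identical (1.0) is one possible convention; other applications may prefer 0.0.

```python
from collections.abc import Hashable, Iterable


def jaccard_similarity(a: Iterable[Hashable], b: Iterable[Hashable],
                       empty_value: float = 1.0) -> float:
    """Return |A ∩ B| / |A ∪ B| computed over the distinct elements of a and b."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        # Both inputs are empty: 0/0 is undefined, so use the chosen convention
        return empty_value
    return len(set_a & set_b) / len(union)


print(jaccard_similarity({1, 2, 3, 4}, {3, 4, 5}))  # 0.4
print(jaccard_similarity([1, 1, 2, 2], [2, 3]))     # 0.3333333333333333 (duplicates ignored)
print(jaccard_similarity([], []))                   # 1.0 (empty_value)
```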

Conclusion

Now you know how to calculate the Jaccard similarity between two sets in Python. Because it needs nothing more than set operations, the same technique applies wherever your data can be represented as sets: for example, comparing the words in two documents in natural language processing, or the items two users have interacted with in a recommendation system. The code examples above should serve as a foundation for using Jaccard similarity in your own Python projects.
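As a small illustration of the natural language processing use case, the sketch below compares two sentences by the sets of words they contain. The helper `word_jaccard` is hypothetical, and its tokenization (lowercasing and splitting on whitespace, with no punctuation handling) is a deliberately naive assumption; real text pipelines usually tokenize more carefully.

```python
def word_jaccard(text_a: str, text_b: str) -> float:
    """Jaccard similarity of the word sets of two strings (naive tokenization)."""
    # Assumption: words are separated by whitespace; punctuation is not stripped
    words_a = set(text_a.lower().split())
    words_b = set(text_b.lower().split())
    union = words_a | words_b
    if not union:
        return 1.0  # both strings empty: treated as identical (a convention)
    return len(words_a & words_b) / len(union)


s1 = "the cat sat on the mat"
s2 = "the dog sat on the log"
score = word_jaccard(s1, s2)
print(score)  # 3 shared words ("the", "sat", "on") out of 7 distinct words: 3/7
```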
