Democratizing Math for ML — Eigenvectors, Eigenvalues and Examples
The starting point for many machine learning algorithms involves vectors. Vectors have both direction and magnitude, and transformations can change both.
However, if a vector is an Eigenvector of a particular transformation (or matrix), its direction remains unchanged after the transformation, though its magnitude may change.
But what is an Eigenvector?
According to Wikipedia —
In linear algebra, an eigenvector or characteristic vector is a vector that has its direction unchanged by a given linear transformation.
Let’s understand this with an example
This is a Matrix, A —
And this is a 1D matrix (a Vector), v —
I will multiply Matrix A with Vector v —
And I will get a 1D matrix (another Vector), v′ —
[1 0] — This is Vector v. It can be plotted on the X axis by drawing a line from 0 to 1.
[2 0] — This is the resulting Vector v′. It can be plotted on the X axis by drawing a line from 0 to 2.
So Vector v′ is actually twice the length of Vector v, but it still points in the same direction as Vector v (towards the positive side of the X axis).
In this context, 2 is the Eigenvalue: it stretches Vector v along the positive X axis without changing its direction.
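A minimal NumPy sketch of this idea (the matrix from the original figure isn't reproduced here, so A = [[2, 0], [0, 1]] is an assumed example that stretches the X axis by a factor of 2):
import numpy as np
A = np.array([[2, 0],
              [0, 1]])             # assumed example matrix: stretches the X axis by 2
v = np.array([1, 0])               # Vector v
v_new = A @ v                      # matrix-vector multiplication
print(v_new)                       # [2 0]: same direction as v, twice the length
print(np.allclose(v_new, 2 * v))   # True: v is an eigenvector of A with eigenvalue 2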
Eigenvalues and eigenvectors are used in a variety of mathematical operations, especially when dealing with matrices. They help simplify complex matrix operations, solve systems of linear equations, analyze stability in systems, and perform data transformations.
Let’s see an example of Eigenvectors for Matrix operations.
Matrix Operation (Diagonalization)
A diagonal matrix is one in which only the entries on the main diagonal can be non-zero; all off-diagonal elements are zero.
Operations with diagonal matrices are much simpler (and less compute-intensive) than with general matrices.
E.g. —
⚫️Raising a matrix to a power
⚫️Matrix multiplication
⚫️Solving systems of linear equations or differential equations
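For instance, raising a diagonal matrix to a power only requires raising its diagonal entries to that power. A quick NumPy check:
import numpy as np
D = np.diag([2.0, 3.0])                  # a diagonal matrix
print(np.linalg.matrix_power(D, 5))      # full matrix power
print(np.diag([2.0 ** 5, 3.0 ** 5]))     # same result: just element-wise powers of the diagonal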
Eigenvalues and eigenvectors allow us to easily diagonalize a matrix, making matrix calculations simpler.
When you diagonalize a matrix A, you express it as A = PDP^(-1), where:
▪️ A is the original matrix.
▪️ P is a matrix whose columns are the eigenvectors of A.
▪️ D is a diagonal matrix whose diagonal elements are the eigenvalues of A.
▪️ P^(-1) is the inverse of the matrix P.
Assuming you have a matrix A:
We want to find its eigenvalues and eigenvectors, and then diagonalize it.
To find the eigenvalues, we solve the characteristic equation det(A - λI) = 0, where I is the identity matrix.
det is the Determinant (a special number that can be calculated from a square matrix), which can tell you several things about a Matrix, such as
▪️ Is it invertible?
▪️ Are 2 Vectors parallel to each other?
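For example, for a 2×2 matrix with rows [a b] and [c d], the determinant is simply ad - bc; it is zero exactly when the matrix is not invertible, which happens when its rows (or columns) are parallel, i.e. linearly dependent.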
To diagonalize matrix A, we need to find the Eigenvalues and Eigenvectors as follows —
We want to find the values of λ (the Eigenvalues), because we need them to diagonalize A.
Solving the equation, we get
These are the values of λ that make the determinant zero.
Now, to find the Eigenvectors, we substitute each value of λ into the equation (A - λI)x = 0 and solve for x
[x1 x2] is the target Eigenvector we are trying to determine
For the first value of λ, this gives us Eigenvector v1 = [1 1]
For λ = 2
Eigenvector v2 = [1 -2]
The Eigenvectors are now combined as columns to form the 2×2 matrix P, as follows
Applying the equation D = P^(-1)AP, we get D as a diagonal matrix with the eigenvalues of A on its diagonal.
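The original matrix A isn't reproduced in the figures above, but here is a minimal NumPy sketch of the same diagonalization, assuming A = [[4, 1], [2, 3]] (a matrix consistent with the eigenvectors v1 = [1 1], v2 = [1 -2] and the eigenvalue 2 shown above):
import numpy as np
A = np.array([[4, 1],
              [2, 3]])            # assumed example matrix
P = np.array([[1, 1],
              [1, -2]])           # columns are the eigenvectors v1 = [1 1] and v2 = [1 -2]
D = np.linalg.inv(P) @ A @ P      # D = P^(-1) A P
print(np.round(D, 10))            # diagonal matrix: the eigenvalues (5 and 2 for this assumed A) on the diagonal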
Eigenvalues and Eigenvectors in ML
Because of the equation A = PDP^(-1) —
If we have the diagonal matrix D (which we can derive from the Eigenvectors and Eigenvalues), we can carry out otherwise expensive matrix computations efficiently.
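For example, powers of A become cheap: from A = PDP^(-1) it follows that A^k = PD^kP^(-1), and D^k only requires raising the diagonal entries (the eigenvalues) to the k-th power. A quick sketch, reusing the assumed example matrix from above:
import numpy as np
A = np.array([[4, 1], [2, 3]])     # assumed example matrix from the diagonalization sketch
eigenvalues, P = np.linalg.eig(A)  # eigenvalues, and eigenvectors as the columns of P
k = 10
D_k = np.diag(eigenvalues ** k)    # D^k: just element-wise powers of the eigenvalues
A_k = P @ D_k @ np.linalg.inv(P)   # A^k = P D^k P^(-1)
print(np.allclose(A_k, np.linalg.matrix_power(A, k)))  # True: matches the direct computation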
One very important application of Eigenvalues and Eigenvectors in Machine learning is Principal Component analysis (PCA).
Principal component analysis (PCA) is a dimensionality reduction and machine learning method used to simplify a large data set into a smaller set while still maintaining significant patterns and trends.
PCA focuses on reducing the number of variables (or dimensions) in the dataset. Although it reduces the number of features, it still retains the most ‘important information’, helping to simplify the problem and make it computationally more efficient.
We are now going to apply our knowledge of Eigenvectors to write some code to do PCA for a matrix of values
To do PCA, we need to take the following steps —
▪️ Center the dataset around the mean (mean-centering; full standardization would also scale by the standard deviation, which we skip here).
▪️ Compute the covariance matrix (shows the covariance between each pair of features in a dataset).
▪️ Find the eigenvalues and eigenvectors of the covariance matrix.
▪️ Sort the eigenvectors by the largest eigenvalues (the principal components).
▪️ Project the data onto the new feature space.
Step 1 —
import numpy as np
import matplotlib.pyplot as plt
# Our dataset: 10 samples with 2 features
X = np.array([[2.5, 2.4],
[0.5, 0.7],
[2.2, 2.9],
[1.9, 2.2],
[3.1, 3.0],
[2.3, 2.7],
[2.0, 1.6],
[1.0, 1.1],
[1.5, 1.6],
[1.1, 0.9]])
Step 2 —
# Center the dataset around the mean
X_mean = np.mean(X, axis=0)
X_centered = X - X_mean
Step 3 —
# Compute the covariance matrix
cov_matrix = np.cov(X_centered.T)
# Find the eigenvalues and eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(cov_matrix)
As you can see, NumPy already has a utility function (np.linalg.eig) to compute the Eigenvalues and Eigenvectors
Step 4 —
# Sort the eigenvectors by the largest eigenvalues
sorted_indices = np.argsort(eigenvalues)[::-1]
sorted_eigenvalues = eigenvalues[sorted_indices]
sorted_eigenvectors = eigenvectors[:, sorted_indices]
# Project the data onto the new principal components (1D)
principal_component = sorted_eigenvectors[:, 0] # Eigenvector corresponding to the largest eigenvalue
X_pca = X_centered.dot(principal_component)
Step 5 —
# Plot the original data and the principal component direction
plt.figure(figsize=(8, 6))
# Plot the centered data points
plt.scatter(X_centered[:, 0], X_centered[:, 1], label='Centered Data')
# Plot the principal component
origin = np.zeros_like(principal_component)
plt.quiver(*origin, *principal_component, color='r', scale=3, label='Principal Component', width=0.02)
plt.title('PCA - Principal Component Analysis')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.axhline(0, color='black',linewidth=0.5)
plt.axvline(0, color='black',linewidth=0.5)
plt.grid(True)
plt.legend()
plt.show()
# Print results
print("Eigenvalues:\n", sorted_eigenvalues)
print("Eigenvectors (Principal Components):\n", sorted_eigenvectors)
print("Projected Data (1D PCA):\n", X_pca)
The Result —
🔵The eigenvectors of the covariance matrix are the directions of the principal components, and the eigenvalues represent the amount of variance in those directions.
🔵The eigenvector with the highest eigenvalue is the first principal component (it captures the most variance), which is why we sorted the eigenvectors by the largest eigenvalues in Step 4.
🔵The plot shows the original data (centered) and the principal component (shown in red). The projected data is in 1D, giving a reduced version of the dataset while preserving most of the variance.
So essentially, with PCA, we have successfully reduced the 2D dataset to a 1D representation. To explain —
▪️ The algorithm identifies the principal component, which is the direction in which the data varies the most.
▪️ The data is then projected onto this principal component, essentially reducing the number of features (or dimensions) from 2 to 1.
▪️ After applying PCA, each data point is represented by a single value (1D), instead of two values (2D). This reduces the dimensionality while trying to preserve as much information as possible.
⏺ Before PCA: The dataset is spread across two axes (dimensions), x and y.
⏺ After PCA: The data is projected onto the principal component, which is now treated as the new axis, and every point is described by a single value along this axis.
In PCA, the eigenvectors of the covariance matrix are the principal components, and the eigenvalues tell you how much variance is captured by each component.
Other Applications in Machine Learning
- Spectral Clustering — groups similar data points (clustering) based on the eigenvalues and eigenvectors of a similarity matrix. Spectral clustering finds clusters of points that are connected or similar to each other. These clusters might not be easily detected by traditional methods like K-means.
- Latent Semantic Analysis — Applies Singular Value Decomposition (SVD) to a term-document matrix, which involves finding the eigenvalues and eigenvectors of the matrix. This reduces the dimensionality of the matrix while preserving the most important patterns, allowing the model to discover latent (hidden) relationships between words and documents. Eigenvectors help represent documents and words in a lower-dimensional space, making it easier to find similarities between them.
- Markov Chains — Eigenvalues and Eigenvectors are used to analyze the long-term behavior of systems that move from one state to another, like web surfing, where you move from one webpage to another. By identifying the dominant eigenvector of the transition matrix (the web graph), we can derive a steady-state distribution that reflects the importance of each page (as in PageRank); see the sketch below.
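A minimal sketch of that idea (the 3-page transition matrix below is a made-up example, not taken from a real web graph): the eigenvector associated with the dominant eigenvalue, which is 1 for a column-stochastic transition matrix, gives the steady-state distribution.
import numpy as np
# Made-up transition matrix: entry [i, j] is the probability of moving from page j to page i
T = np.array([[0.1, 0.5, 0.4],
              [0.6, 0.2, 0.3],
              [0.3, 0.3, 0.3]])
eigenvalues, eigenvectors = np.linalg.eig(T)
dominant = np.argmax(eigenvalues.real)             # index of the dominant eigenvalue (1.0)
steady_state = eigenvectors[:, dominant].real
steady_state = steady_state / steady_state.sum()   # normalize so the probabilities sum to 1
print(steady_state)                                # long-run importance of each page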
Summary
Eigenvalues and eigenvectors are fundamental in many machine learning algorithms because they allow efficient manipulation of data, especially when performing vector transformations and matrix operations. While machine learning often deals with vectors, these concepts are particularly useful for dimensionality reduction techniques like PCA, which simplify data and improve computational efficiency. Techniques like PCA don’t just trade accuracy for speed — they also enhance model performance by removing irrelevant or noisy features.
The use of eigenvalues and eigenvectors extends beyond machine learning. In fields like finance and business analytics, they are valuable for tasks like portfolio optimization, risk management, and principal component regression, where they help uncover underlying patterns and relationships in large datasets.
Key Takeaways:
▪️ Eigenvalues/eigenvectors simplify complex matrix operations.
▪️ PCA reduces dimensionality, improving efficiency without necessarily trading accuracy.
▪️ Applications extend beyond machine learning to areas like finance and business analytics.
Follow me Ritesh Shergill
for more articles on
🤖AI/ML
👨💻 Tech
👩🎓 Career advice
📲 User Experience
🏆 Leadership
I also do
✅ Career Guidance counselling — https://topmate.io/ritesh_shergill/149890
✅ Mentor Startups as a Fractional CTO — https://topmate.io/ritesh_shergill/193786