Jason Liu's answer to What is an intuitive explanation of singular value decomposition (SVD)?

What is an intuitive explanation of singular value decomposition (SVD)?

Studies Computational Math and Data Analysis.. · Upvoted by

, studied Machine Learning & Magnetic Resonance Imaging at Auburn University (2016) and

, Ph.D Computational Linguistics & Machine Learning, University of Lethbridge (2013) · Author has 61 answers and 557.9K answer views · Updated 7y ·

There is a bit of math in the beginning of this post but I also wrote a quick MATLAB program that visualizes what SVD can do to an image.

In the context off data analysis, the idea is to use a rank reduced approximation of a dataset to generalize some of the properties/structures.

What is Singular Value Decomposition?

To find a SVD of A, we must find V, \Sigma and U such that:

[math]A = U\Sigma V^T[/math]

1. [math]V[/math] must diagonalize [math]A^TA[/math]
1.1. [math]v_i[/math] are eigenvectors of [math]A^TA[/math].

2. [math]\Sigma[/math] where [math]\Sigma_{ii} [/math] are singular values of [math]A[/math].
3. [math]U[/math] must diagonalize [math]AA^T[/math]
3.1 [math]u_i[/math] are eigenvectors of [math]AA^T[/math].

If [math]A[/math] has rank [math]r[/math] then:

1. [math]v_i \cdots v_r[/math] forms an orthonormal basis for the range of [math]A^T [/math]
2. [math]u_i \cdots u_r[/math] form an orthonormal basis for the range of [math]A[/math]
3. Rank of [math]A[/math] is equal to the number of nonzero entries of S. From the form of this factorization

We see that we can express [math]A[/math] another way, it can be shown that [math]A[/math] can be written as a sum of Rank = 1 matrixes.

[math]A = \sum_{i=1}^r\sigma_iu_iv_i^T[/math]

We know that by construction [math]\sigma_i[/math] monotonic decreasing, the significance/weight of the [math]n^{th}[/math] term decreases. This means that the summation to [math]k < r[/math] is an approximation [math]\hat{A}[/math] of rank [math]k[/math] for the matrix [math]A[/math].

tl;dr

Singular value decomposition is essentially trying to reduce a rank [math]R[/math] matrix to a rank K matrix.

But what does this mean?

It means that we can take a list of [math]R[/math] unique vectors, and approximate them as a linear combination of [math]K[/math] unique vectors.

Take this example, the image below is an image made of 400 unique row vectors.

SVD is my favourite algorithm, and Feynman is my favourite scientist.

What happens if I take the first singular vector?

Notice that each row of pixels is the same... just different 'brightness'

Essentially, each row can now be written as as [math]R_i = c_i \cdot SV_1[/math] where [math]c_i[/math] is the scale factor and [math]SV_1[/math] is the singular vector with the highest singular value (biggest contribution to data)

What happens if I take the first two singular vectors?

Now, each row can be written as the sum of two vectors

[math]R_i = c_i \cdot SV_1 + b_i \cdot SV_2[/math]

Where [math]SV_2[/math] is the second most influential singular vector.

k=10

Can you believe it? With only 10 unique vectors I can almost make out the image!

k=50

There you have it.

Using 50 unique values and you get a decent representation of what 400 unique values look like.

Vector Space of things.

Think of another example. Say I'm netflix. And I have 100 million users and each one has watched the same 500 movies.

What I can say is that each person can be characterized in the basis of [math]R^n[/math] where [math]n=100[/math]. So each user is in the vector space of movies.

How do I reason about this data? We have no way to 'reduce' the dimension of the vector space of movies. How do I get insight?

Maybe I'll take the SVD truncated at ... k=50.

What does this mean? This means that I'm going to find the best 50 vectors that might be a good representation of the matrix. I'm going to find the singular vectors, what I do is mathematically the following statement.

Its a change of basis from the movie basis to the first k singular vector basis. The interpretation could be that each user can be equal to

[math]user = \Sigma^{50} c_i \cdot SV_i[/math]

Where SV is the singular vectors.

We can consider those 50 singular values to be 'types of users'. We may discover that [math]SV_3[/math] has high ratings on sports movies while [math]SV_2[/math] has very high ratings on horror movies.

This allows us to consider only 50 types of users and use those 50 to generalize the 500 (tada, you have dimensionality reduction)

SV_1 = documentaries
SV_2 = horror
SV_3 = sports
...

When we look at the singular values (instead of the singular vectors) we notice that a user has the values [0,4,2,0..]

This means that user = [math]4 \cdot SV_2 + 2\cdot SV_3 + ... [/math]

Translation : This user likes horror movies and maybe watchs sports too.

We get to rewrite people from the movie basis to the movie genre basis. Thats mathematical, it works pretty well too.

Other examples could be from word basis to the topic basis or cloths basis to fashion style basis.

154.1K views ·

View upvotes

View 10 shares

1 of 16 answers

Something went wrong. Wait a moment and try again.

View 15 other answers to this question

About · Careers · Privacy · Terms · Contact · Languages · Your Ad Choices · Press ·