Past is prologue, data is the story. · 3y ·
Two simple reasons:
The slope of the logarithm decreases as N/df increases. This means that beyond a point, even a dramatic increase in N/df will not change the TF-IDF score by much, which mimics real life: beyond a point, additional dissimilarity does not matter much.
The log of 1 is 0. Hence when term “i” is contained in all documents, df equals N, N/df is 1, and the weight w is zero. In other words, with respect to that term the documents are completely similar, so its inverse similarity is zero.
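Here is a quick sketch of both points in Python. It assumes the weight is simply log(N/df) with a natural log (the base only rescales the numbers); the function name `idf` and the corpus size of 1000 are made up for illustration.

```python
import math

def idf(n_docs: int, doc_freq: int) -> float:
    """Inverse document frequency as log(N / df); natural log is an assumption."""
    return math.log(n_docs / doc_freq)

N = 1000  # hypothetical corpus size

# A term found in every document: N/df = 1, so the weight is exactly zero.
print(idf(N, N))  # 0.0

# Each tenfold jump in N/df adds the same fixed amount to the weight,
# so the score grows far more slowly than the ratio itself.
for df in (100, 10, 1):
    print(f"N/df = {N // df:>4}  idf = {idf(N, df):.3f}")
```

Going from N/df = 10 to N/df = 1000 multiplies the ratio by 100 but only triples the weight, which is the flattening described in the first reason.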