What is latent semantic indexing?

 
Ranch Hand
Hi all,

Can someone explain what latent semantic indexing is? Where can it be applied?

Sorry, I'm still new to this.

Thanks.
 
Author
Hi Randy,

Hobson is the real expert on this point of NLP, but in a nutshell... LSI is one method to create a dense vector/mathematical representation of the content of a document in a particular corpus.

For the entire corpus, find the list of unique words.  
For each document in the corpus count up the occurrences of each of those words.  Some will be zero.  
Then use a dimensionality reduction technique such as SVD to get a vector that represents the meaning of that document 'in relation to' all the other docs.
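
In case a concrete version helps, here is a minimal sketch of those three steps in Python with NumPy. The toy corpus, the variable names and the choice of k = 2 are all assumptions made up for illustration, and real systems usually weight the counts (TF-IDF, say) before the SVD step.

```python
from collections import Counter

import numpy as np

# Toy corpus, made up purely for illustration.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "stocks fell as markets closed",
    "markets rallied and stocks rose",
]

# Step 1: the list of unique words across the entire corpus.
vocab = sorted({word for doc in corpus for word in doc.split()})


# Step 2: for each document, count the occurrences of each vocabulary word.
def count_vector(doc):
    counts = Counter(doc.split())
    return [counts[word] for word in vocab]  # missing words count as zero


X = np.array([count_vector(doc) for doc in corpus], dtype=float)  # docs x words

# Step 3: SVD, keeping only the k largest singular values (k is an assumption).
k = 2
U, s, Vt = np.linalg.svd(X, full_matrices=False)
doc_vectors = U[:, :k] * s[:k]  # one dense k-dimensional vector per document

print(doc_vectors.shape)
```

Each row of doc_vectors is the reduced representation of one document; the rows of Vt[:k] are the word combinations ("topics") that the SVD found.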

You can then apply the same steps to new documents and use a similarity metric such as cosine distance to find similar documents in the corpus and surface them (in a search query, say).
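
A rough sketch of that lookup step, again in Python with NumPy and with everything (corpus, query text, the embed and cosine_similarity helpers, k = 2) invented for the example. It assumes the new document is projected onto the top k right singular vectors, which is one common way to "fold in" unseen text, and it ranks by cosine similarity (one minus the cosine distance).

```python
from collections import Counter

import numpy as np

# Same made-up toy corpus as a stand-in for a real document collection.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "stocks fell as markets closed",
    "markets rallied and stocks rose",
]
vocab = sorted({word for doc in corpus for word in doc.split()})


def count_vector(text):
    """Counts of each vocabulary word in text; unknown words are ignored."""
    counts = Counter(text.split())
    return np.array([counts[word] for word in vocab], dtype=float)


X = np.vstack([count_vector(doc) for doc in corpus])
k = 2  # number of dimensions to keep (an assumption for this toy example)
U, s, Vt = np.linalg.svd(X, full_matrices=False)
doc_vectors = U[:, :k] * s[:k]


def embed(text):
    """Project a new document into the same k-dimensional space."""
    return count_vector(text) @ Vt[:k].T


def cosine_similarity(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


query = "markets fell"  # hypothetical search query
q = embed(query)
scores = [cosine_similarity(q, d) for d in doc_vectors]

# Surface the corpus documents, most similar first.
ranked = sorted(zip(scores, corpus), reverse=True)
for score, doc in ranked:
    print(f"{score:.2f}  {doc}")
```

On this tiny corpus the two finance sentences should come out on top, including the one that does not contain the word "fell", which is the sort of indirect match LSI is meant to give you.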

Hope that helps,
Cole
 