Text Summarization API

 
Greenhorn
Posts: 25
Hi friends,
I am developing a Java application in which I need to extract the text content from web pages and then summarize it based on a keyword given by the user. I have already extracted the text content, but I still need to summarize it based on the given keyword. Are there any Java tools that can help me solve this problem, or can someone send me some code that reduces the text to short snippets?
Thanks in advance,
Pradeep
 
Ranch Hand
Posts: 456
There is no easy way to achieve this. It sounds like you need a search engine that indexes the text for you.

BTW: if it is not a must, you can save yourself the detour of extracting the text from the web page...

Try Lucene - lucene.apache.org


regards,
jan
 
Rancher
Posts: 43028
I'm not aware of a text summarization API in Java. Lucene lets you index and search text, but it does not address summarization. I'm also not sure what you mean by "summarize it based on a keyword" - do you want to extract those parts of the text that deal with that particular keyword?
 
Author and all-around good cowpoke
Posts: 13078
You need to parse the text into units that make sense to humans - phrases, sentences and paragraphs. Next, score those units according to the presence of the keyword(s). Then select the best of the units that are "hits", according to typical writing principles and the size of the summary you are aiming for.

What do I mean by writing principles? Think about how you yourself scan text.
For example, you expect the first sentence of a paragraph to be meaningful in terms of the content of that paragraph. You also expect a good chance that the last paragraph of an article will summarize the article.
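
A rough sketch of the scoring idea, not a finished summarizer: the Java below splits the text into paragraphs and sentences, counts keyword hits per sentence, adds a bonus for the first sentence of a paragraph and for sentences in the last paragraph, and returns the best hits in their original order. The class name KeywordSummarizer, the method names and the bonus weights (1.0 and 0.5) are assumptions chosen for illustration, and paragraphs are assumed to be separated by blank lines. Only JDK classes are used.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

final class KeywordSummarizer {

    /** A sentence that contains the keyword, with its position and score. */
    private static final class Scored {
        final int index;        // position of the sentence in the whole text
        final String sentence;
        final double score;
        Scored(int index, String sentence, double score) {
            this.index = index;
            this.sentence = sentence;
            this.score = score;
        }
    }

    /** Splits one paragraph into sentences with the JDK's BreakIterator. */
    static List<String> splitSentences(String paragraph) {
        List<String> sentences = new ArrayList<>();
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        it.setText(paragraph);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String s = paragraph.substring(start, end).trim();
            if (!s.isEmpty()) sentences.add(s);
        }
        return sentences;
    }

    /** Counts whole-word, case-insensitive occurrences of the keyword. */
    static int countHits(String sentence, String keyword) {
        String key = keyword.toLowerCase(Locale.ENGLISH);
        int hits = 0;
        for (String word : sentence.toLowerCase(Locale.ENGLISH).split("[^\\p{L}\\p{Nd}]+")) {
            if (word.equals(key)) hits++;
        }
        return hits;
    }

    /** Returns up to maxSentences keyword "hits", in document order. */
    static List<String> summarize(String text, String keyword, int maxSentences) {
        List<Scored> hits = new ArrayList<>();
        String[] paragraphs = text.trim().split("\\R\\s*\\R"); // assumed: blank line = paragraph break
        int index = 0;
        for (int p = 0; p < paragraphs.length; p++) {
            List<String> sentences = splitSentences(paragraphs[p]);
            for (int s = 0; s < sentences.size(); s++, index++) {
                int count = countHits(sentences.get(s), keyword);
                if (count == 0) continue;                      // not a hit
                double score = count;
                if (s == 0) score += 1.0;                      // first sentence of a paragraph (assumed weight)
                if (p == paragraphs.length - 1) score += 0.5;  // last paragraph (assumed weight)
                hits.add(new Scored(index, sentences.get(s), score));
            }
        }
        // Highest score first; the sort is stable, so ties keep document order.
        hits.sort(Comparator.comparingDouble((Scored h) -> h.score).reversed());
        int keep = Math.max(0, Math.min(maxSentences, hits.size()));
        List<Scored> best = new ArrayList<>(hits.subList(0, keep));
        best.sort(Comparator.comparingInt((Scored h) -> h.index)); // restore reading order
        List<String> summary = new ArrayList<>();
        for (Scored h : best) summary.add(h.sentence);
        return summary;
    }
}
```

Calling KeywordSummarizer.summarize(text, "lucene", 3) would return at most three sentences that mention the keyword. The weights are the part worth experimenting with, since they encode the writing principles described above.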

In the prehistoric era of computers (showing my age now) there was an indexing technique called KWIC - Key Word In Context. It created a listing with the n words preceding a key word plus the n words following it. This put a burden on the reader to distinguish a significant context from a trivial one.
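
For illustration, a KWIC-style listing can be sketched in a few lines of Java. This is a minimal version under my own assumptions, not a reconstruction of any historical system: words are whatever is separated by whitespace, matching ignores case and punctuation, and the names Kwic, kwic and normalize are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class Kwic {

    /** Returns one line per keyword occurrence: n words before, the keyword, n words after. */
    static List<String> kwic(String text, String keyword, int n) {
        List<String> lines = new ArrayList<>();
        String trimmed = text.trim();
        String key = normalize(keyword);
        if (trimmed.isEmpty() || key.isEmpty() || n < 0) return lines;

        String[] words = trimmed.split("\\s+");
        for (int i = 0; i < words.length; i++) {
            if (!normalize(words[i]).equals(key)) continue;
            int from = Math.max(0, i - n);                 // clamp at the start of the text
            int to = Math.min(words.length, i + n + 1);    // clamp at the end (exclusive)
            lines.add(String.join(" ", Arrays.copyOfRange(words, from, to)));
        }
        return lines;
    }

    /** Lower-cases a word and strips everything except letters and digits. */
    static String normalize(String word) {
        return word.toLowerCase(Locale.ENGLISH).replaceAll("[^\\p{L}\\p{Nd}]", "");
    }
}
```

For example, Kwic.kwic("The quick brown fox jumps over the lazy dog", "fox", 2) yields the single line "quick brown fox jumps over". Judging whether that context is significant is still left to the reader.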

This is a topic of continued interest to me, so let us know what you come up with.
Bill
 