Reading info from an English dictionary

 
Shrinath M Aithal
Ranch Hand
Posts: 82
Hi all,

Here is what I am trying to do: natural language processing in Java.
The first step is to classify words into their parts of speech. For that, I need to either refer to an existing dictionary or build a database myself. Building a noun/verb database by hand seems impractical, so I was wondering whether the program could go online when it meets a word that is not in its local database, look it up in an online dictionary, and add the result to the local database.

If anything in the question is unclear, please ask.
If you think there is a better way of doing this, please share your ideas.
If you know how to do it, please guide me.
Thanks to all.
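
To make the idea concrete, here is a minimal sketch of the "check the local database first, go online only on a miss" flow. Everything in it is hypothetical: the class name `CachingPosLookup`, the in-memory `HashMap` standing in for the local database, and the `remote` function standing in for whatever online dictionary ends up being used.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical sketch: a local part-of-speech store that falls back to a
// remote lookup on a miss and remembers the answer for next time.
class CachingPosLookup {
    // Stand-in for the local database (a real one would be persistent).
    private final Map<String, String> local = new HashMap<>();
    // Stand-in for the online dictionary; returns empty if the word is unknown.
    private final Function<String, Optional<String>> remote;

    CachingPosLookup(Function<String, Optional<String>> remote) {
        this.remote = remote;
    }

    Optional<String> lookup(String word) {
        String key = word.toLowerCase(Locale.ROOT);
        String cached = local.get(key);
        if (cached != null) {
            return Optional.of(cached);   // local hit, no network needed
        }
        Optional<String> fetched = remote.apply(key);
        // Only successful lookups are stored; misses are asked again next time.
        fetched.ifPresent(pos -> local.put(key, pos));
        return fetched;
    }

    int size() {
        return local.size();
    }
}
```

Note that this stores a single tag per word, which is the naive version of the problem; it is only meant to show the lookup order.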
 
David Newton
Author
Rancher
Posts: 12617
Seems reasonable, although if the dictionary in question doesn't have an API it'll be a lot of work. You should probably check for existing word classification work since NLP isn't a new field.

Be aware that classification depends on context, and NLP in general is a non-trivial problem.
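
To illustrate the context point, here is a small sketch showing why a plain word-to-tag table is not enough: many words legitimately carry several parts of speech, and only the surrounding sentence decides which applies. The entries are a tiny hand-written sample, not taken from any real dictionary, and `AmbiguousLexicon` is a made-up name.

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: a lexicon that maps each word to a SET of possible tags.
class AmbiguousLexicon {
    enum Pos { NOUN, VERB, ADJECTIVE }

    // Hand-written sample entries, for illustration only.
    private static final Map<String, Set<Pos>> ENTRIES = new HashMap<>();
    static {
        ENTRIES.put("book", EnumSet.of(Pos.NOUN, Pos.VERB));        // "a book" / "book a room"
        ENTRIES.put("light", EnumSet.of(Pos.NOUN, Pos.VERB, Pos.ADJECTIVE));
        ENTRIES.put("dictionary", EnumSet.of(Pos.NOUN));
    }

    // Returns every tag the word can take; empty if the word is not listed.
    static Set<Pos> tagsFor(String word) {
        Set<Pos> tags = ENTRIES.get(word.toLowerCase(Locale.ROOT));
        return tags == null
                ? Collections.<Pos>emptySet()
                : Collections.unmodifiableSet(tags);
    }

    // A word is ambiguous when the lexicon alone cannot pick one tag.
    static boolean isAmbiguous(String word) {
        return tagsFor(word).size() > 1;
    }
}
```

For "book" the lexicon can only narrow things down to noun-or-verb; choosing between them needs the neighbouring words, which is what a real tagger does.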
 
Campbell Ritchie
Sheriff
Posts: 50258
Not a "beginning" question. Moving.

As David Newton says, natural language processing is a major problem; it is really a science in its own right.
 
Shrinath M Aithal
Ranch Hand
Posts: 82
OK, thank you, guys.
But how do you read a page on the web and extract only the information you want? For example, looking up a word in an online thesaurus and finding out whether it is a verb, a noun, or some other part of speech?
I googled a bit and couldn't find much Java source code that does what I want. Any help would be enlightening and appreciated.
 
David Newton
Author
Rancher
Posts: 12617
Without an API you'd have to screen-scrape.
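
Roughly, screen-scraping means downloading the page as text and pulling the interesting bit out of the markup. Here is a minimal sketch using only the standard library. The markup it looks for, `<span class="pos">noun</span>`, is an assumption made up for the example; a real dictionary site will use its own structure, so the pattern would have to be adapted after inspecting the actual page source. `PosScraper` and its method names are hypothetical as well.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of screen-scraping a part of speech out of an HTML page.
class PosScraper {
    // ASSUMED markup: <span class="pos">noun</span>. Real sites will differ.
    private static final Pattern POS_SPAN = Pattern.compile(
            "<span\\s+class=\"pos\"\\s*>\\s*([^<]+?)\\s*</span>",
            Pattern.CASE_INSENSITIVE);

    // Downloads the page at the given URL and returns it as one string.
    static String fetch(String url) throws IOException {
        StringBuilder page = new StringBuilder();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                URI.create(url).toURL().openStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                page.append(line).append('\n');
            }
        }
        return page.toString();
    }

    // Returns the distinct parts of speech found in the HTML, in page order.
    static List<String> extractPartsOfSpeech(String html) {
        List<String> found = new ArrayList<>();
        Matcher m = POS_SPAN.matcher(html);
        while (m.find()) {
            String pos = m.group(1).toLowerCase(Locale.ROOT);
            if (!found.contains(pos)) {
                found.add(pos);
            }
        }
        return found;
    }
}
```

Usage would be along the lines of `PosScraper.extractPartsOfSpeech(PosScraper.fetch(someUrl))`. A regex like this breaks as soon as the site changes its layout, which is part of why scraping is a lot of work; a proper HTML parser is more robust, and the site's terms of use should be checked first.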

As I said--I'd seriously consider looking for existing datasets, although naive, non-contextual usage may not be what you want.

I'd probably join the ACM (if you're not already a member) and start reading papers---a ton of dissertations and theses have been written on what you're trying to accomplish.
 
Shrinath M Aithal
Ranch Hand
Posts: 82
OK. So you are suggesting I use existing datasets. What do you think about WordNet? Would it be easier?
By the way, thanks for the ACM tip, I wasn't aware of it. Now I have found loads of what I was looking for.
 
Shrinath M Aithal
Ranch Hand
Posts: 82
I found a good part-of-speech tagger that offers both an API and a command-line interface: the Stanford POS Tagger. I thought I would mention it in case anyone else is looking for one. Thank you, guys.
 