
OpenNLP and Apache commons.lang

Marcus Hirschbine
I will be implementing these two libraries in a text-based game engine! (Since the primary focus is on these APIs, I decided to use the API board instead of the game development board.)

Specifically, from the OpenNLP (Open Natural Language Processing) project, I am focused on the SentenceDetector (and related) as well as the Parser (and related).

In my constructor I initialize these two tools with their trained models (what I have been calling "Trainers"), which give them the 'brains' to break down and recognize natural language in English.

And now the reason I created this thread: is that an appropriate way to initialize these objects/tools?

Their purpose will be to analyze and parse various kinds of input from the user to determine what the user is doing. By text-based, I don't just mean choose-your-own-adventure; I'm talking about implementing a full-on complex control structure.

> Go west, grab that sword and swing at the orc with it.

The sentence detector would do its thing to that and tell me, "Yup, that is one sentence."


The parser does something more involved that I have a hard time describing, but the result comes out looking like this:

Input: The quick brown fox jumps over the lazy dog .

Output: (TOP (NP (NP (DT The) (JJ quick) (JJ brown) (NN fox) (NNS jumps)) (PP (IN over) (NP (DT the) (JJ lazy) (NN dog))) (. .)))

I have yet to memorize the glossary for the generated tags (NN = noun, and so on). They are Penn Treebank style part-of-speech and phrase labels.

I'm assuming the parentheses are the framework for a tree mapping of the sentence structure, but I have yet to visualize how to apply it.
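To make the tree idea concrete, here is a minimal sketch that reads the bracketed output as plain text and builds a tree from it. It does not use OpenNLP at all; the class `ParseTreeSketch` and its `Node` type are hypothetical names made up for this illustration, and the sketch assumes the input is well-formed in the format shown above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of OpenNLP): turns the bracketed parser
// output into a simple tree so the nesting can be walked or printed.
class ParseTreeSketch {

    /** One tree node: a label such as NP or DT, plus a word (for leaves) or child nodes. */
    static final class Node {
        final String label;
        String word; // only set for leaves such as (DT The)
        final List<Node> children = new ArrayList<>();

        Node(String label) {
            this.label = label;
        }
    }

    private final String text;
    private int pos = 0;

    private ParseTreeSketch(String text) {
        this.text = text;
    }

    /** Parses one bracketed string such as "(TOP (NP (DT The) (NN fox)))". */
    static Node parse(String bracketed) {
        return new ParseTreeSketch(bracketed).readNode();
    }

    // Reads "(LABEL ...)" where "..." is either a single word or nested nodes.
    private Node readNode() {
        expect('(');
        Node node = new Node(readToken());
        skipSpaces();
        while (pos < text.length() && text.charAt(pos) != ')') {
            if (text.charAt(pos) == '(') {
                node.children.add(readNode());
            } else {
                node.word = readToken();
            }
            skipSpaces();
        }
        expect(')');
        return node;
    }

    // Reads characters up to the next parenthesis or whitespace.
    private String readToken() {
        skipSpaces();
        int start = pos;
        while (pos < text.length() && "() \t\r\n".indexOf(text.charAt(pos)) < 0) {
            pos++;
        }
        return text.substring(start, pos);
    }

    private void skipSpaces() {
        while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) {
            pos++;
        }
    }

    private void expect(char c) {
        skipSpaces();
        if (pos >= text.length() || text.charAt(pos) != c) {
            throw new IllegalArgumentException("Expected '" + c + "' at position " + pos);
        }
        pos++;
    }

    /** Collects the leaf words left to right, which gives back the original tokens. */
    static void collectWords(Node node, List<String> out) {
        if (node.word != null) {
            out.add(node.word);
        }
        for (Node child : node.children) {
            collectWords(child, out);
        }
    }

    /** Prints the tree with one node per line, indented by depth. */
    static void print(Node node, String indent) {
        System.out.println(indent + node.label + (node.word != null ? " " + node.word : ""));
        for (Node child : node.children) {
            print(child, indent + "  ");
        }
    }

    public static void main(String[] args) {
        Node root = parse("(TOP (NP (NP (DT The) (JJ quick) (JJ brown) (NN fox) (NNS jumps)) "
                + "(PP (IN over) (NP (DT the) (JJ lazy) (NN dog))) (. .)))");
        print(root, "");
    }
}
```

Printed this way, each level of parentheses becomes one level of indentation, so it is easier to see that the outer NP holds a noun phrase, a prepositional phrase, and the final period.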


I am wondering if anyone is familiar with OpenNLP and Apache commons.lang, and if someone might be available to reply on this thread going forward, because I will be working heavily with these libraries and implementing their interfaces in my engine.



Edit: Turns out I scrapped the parser tool! All I needed were the other tools: the POSTagger (part-of-speech tagger) and the tokenizer!

Process of using these tools:

1. The input is split into sentences, and each sentence is stored as a String element of an array.
2. Each sentence is tokenized, and the tokenized sentences are stored in an ArrayList<String[]>. (Each String array contains the tokens of one sentence.)
3. Then the POSTagger iterates through each sentence's tokens and generates an ArrayList<String[]> of tags! (Each element of each array corresponds to the matching token in the sentence.)
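The two parallel lists from steps 2 and 3 can be combined into token_TAG lines with a few lines of plain Java. This is only a sketch of the data handling: the class `TaggedSentenceFormatter` is a hypothetical name, and the tokens and tags are typed in by hand here rather than produced by the OpenNLP tokenizer and POSTagger.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: pairs each token with the tag at the same index.
class TaggedSentenceFormatter {

    /** Joins one sentence's tokens and tags as "token_TAG token_TAG ...". */
    static String join(String[] tokens, String[] tags) {
        if (tokens.length != tags.length) {
            throw new IllegalArgumentException("Each token needs exactly one tag");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < tokens.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(tokens[i]).append('_').append(tags[i]);
        }
        return sb.toString();
    }

    /** Applies join() sentence by sentence; both lists must be in the same order. */
    static List<String> joinAll(List<String[]> tokenized, List<String[]> tagged) {
        if (tokenized.size() != tagged.size()) {
            throw new IllegalArgumentException("Each sentence needs one tag array");
        }
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < tokenized.size(); i++) {
            lines.add(join(tokenized.get(i), tagged.get(i)));
        }
        return lines;
    }

    public static void main(String[] args) {
        // Hand-written stand-ins for the tokenizer and POSTagger results.
        List<String[]> tokenized = new ArrayList<>();
        tokenized.add(new String[] {"Crypto", ",", "please", "execute", "command", "zero", "."});
        List<String[]> tagged = new ArrayList<>();
        tagged.add(new String[] {"PRP", ",", "VB", "VB", "NN", "NN", "."});

        for (String line : joinAll(tokenized, tagged)) {
            System.out.println("Sentence: " + line);
        }
    }
}
```

Keeping tokens and tags in separate arrays with matching indexes means either one can be looked up by position without re-parsing a combined string.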

The results yield:

Sentence: Crypto_PRP ,_, please_VB execute_VB command_NN zero_NN ._.
> Crypto, please execute command zero.
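Going from a tagged line like this toward a game command, one possible next step is to pull out the words by tag. The sketch below is an assumption about how that could look, not something the libraries provide: `CommandWordExtractor` is a hypothetical class that splits each token_TAG pair on its last underscore and keeps the tokens whose tag starts with a given prefix (VB for verbs, NN for nouns).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: picks words out of a "token_TAG token_TAG ..." line by tag.
class CommandWordExtractor {

    /** Returns the tokens whose tag starts with tagPrefix, in sentence order. */
    static List<String> wordsWithTagPrefix(String taggedSentence, String tagPrefix) {
        List<String> words = new ArrayList<>();
        for (String pair : taggedSentence.trim().split("\\s+")) {
            // Split on the LAST underscore so tokens containing '_' stay intact.
            int sep = pair.lastIndexOf('_');
            if (sep <= 0) {
                continue; // not in token_TAG form, skip it
            }
            String token = pair.substring(0, sep);
            String tag = pair.substring(sep + 1);
            if (tag.startsWith(tagPrefix)) {
                words.add(token);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        String tagged = "Crypto_PRP ,_, please_VB execute_VB command_NN zero_NN ._.";
        System.out.println("Verbs: " + wordsWithTagPrefix(tagged, "VB"));
        System.out.println("Nouns: " + wordsWithTagPrefix(tagged, "NN"));
    }
}
```

For the sample line this gives the verbs "please" and "execute" and the nouns "command" and "zero". Since "please" was tagged VB, a real command interpreter would still need its own list of filler words to ignore.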