
Part of Speech tagging solutions in Java

 
vin Hari
Ranch Hand
Posts: 189
Hi Ranchers,

I need some expert suggestions for solving this problem.

Problem:
I have a few text files containing nouns, pronouns, verbs, etc.
I need to read them (around 8 files) into memory only once.
The process should be able to tag multiple sentences from the input text simultaneously.
The output should return the tagged text in the format word/tag, in the order of the original text.

Example: if I have this sentence in a file:
My aunt’s can opener can open a drum

I should be able to read the file and write its sentences or paragraphs to an output file in the following format.
The output should be:
My/PRP$ aunt/NN ’s/POS can/NN opener/NN can/MD open/VB a/DT drum/NN

Can someone help me achieve this? Thanks.


 
Campbell Ritchie
Marshal
Posts: 56536
Probably: No.

You have got yourself a very difficult task there. Determining whether the can in “can opener” is a noun or a verb is just as difficult as determining whether the can in “can open” is a noun or a verb. In fact I think the answer is neither in the latter case; it is an auxiliary verb, and its present meaning cannot exist without open following it. You should regard both “can opener” and “can open” as phrases: one is a noun comprising two words and the other a verb comprising two words, what grammarians call a verb phrase. I would suggest you can do several things:
  • Find some details about natural language processing, but that is cutting edge research which has taken sixty‑plus years even to get to the level of Google Translate.
  • Simplify your vocabulary and your grammar. Avoid words with two meanings, e.g. bear, can. Restrict yourself to one‑word terms. Restrict yourself to sentences in the form subject→verb→object.
  • Find a book like Mason, Brown and Levine, Lex and Yacc (O'Reilly), which shows an example using lex to print verb, noun, etc., but that restricts itself to the kind of simplified vocabulary I described above.
     
    Campbell Ritchie
    Marshal
    Posts: 56536
    vin Hari wrote:. . . My aunt’s can opener can open a drum . . .
    Had you written
    My aunt’s can opener can open a can
    that would have been easier because you can create a rule that verbs cannot follow “a”.
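    A minimal sketch of such a rule, under stated assumptions: the class name `DeterminerRule`, the tiny candidate table, and the `UNK` tag for unknown words are all hypothetical and only illustrate the idea. Each word has a list of possible tags, the first one is taken by default, and verb tags are skipped when the previous word is “a”.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: a one-rule disambiguator, not a real tagger. */
class DeterminerRule {
    // Candidate tags per word, preferred tag first. Toy data, assumed for this sketch.
    static final Map<String, List<String>> CANDIDATES = new HashMap<>();
    static {
        CANDIDATES.put("a", Arrays.asList("DT"));
        CANDIDATES.put("can", Arrays.asList("MD", "NN"));
        CANDIDATES.put("open", Arrays.asList("VB"));
        CANDIDATES.put("opener", Arrays.asList("NN"));
    }

    // Treat modal and VB* tags as verbs for the purpose of the rule.
    static boolean isVerbTag(String tag) {
        return tag.equals("MD") || tag.startsWith("VB");
    }

    static String tag(String sentence) {
        StringBuilder out = new StringBuilder();
        String previousWord = "";
        for (String word : sentence.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // blank input yields an empty result
            }
            List<String> options = CANDIDATES.getOrDefault(
                    word.toLowerCase(), Collections.singletonList("UNK"));
            String chosen = options.get(0);
            // The rule: directly after "a", pick the first non-verb candidate if there is one.
            if (previousWord.equalsIgnoreCase("a")) {
                for (String option : options) {
                    if (!isVerbTag(option)) {
                        chosen = option;
                        break;
                    }
                }
            }
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(word).append('/').append(chosen);
            previousWord = word;
        }
        return out.toString();
    }
}
```

    With this table, `DeterminerRule.tag("can open a can")` tags the last can as a noun. The first can in “can opener” would still come out as MD, so one rule like this is nowhere near enough on its own.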
     
    vin Hari
    Ranch Hand
    Posts: 189
    Thank you, Campbell, for your reply. Ranchers, any other suggestions? How about reading the files into a database, scanning each line, splitting it on spaces, and attaching a tag to each word?

    Please let me know if there are any suggestions. Thanks.
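    The split-and-look-up idea could be sketched roughly as follows, using an in-memory map rather than a database. Everything here is an assumption for illustration: the class name `LexiconTagger`, the file layout (one word per line, one file per tag), and the `UNK` tag for words not found. The word lists are loaded once, and a parallel stream tags several sentences at the same time while keeping the original order.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Illustrative lookup tagger; names and file layout are assumptions. */
class LexiconTagger {
    // word -> tag, built once and then only read, so threads can share it safely
    private final Map<String, String> lexicon;

    LexiconTagger(Map<String, String> lexicon) {
        this.lexicon = Collections.unmodifiableMap(new HashMap<>(lexicon));
    }

    // Assumed layout: each file lists one word per line and maps to a single tag.
    // If a word appears in several files, the first file iterated wins
    // (pass a LinkedHashMap to make that order predictable).
    static LexiconTagger fromFiles(Map<Path, String> tagByFile) throws IOException {
        Map<String, String> words = new HashMap<>();
        for (Map.Entry<Path, String> entry : tagByFile.entrySet()) {
            for (String line : Files.readAllLines(entry.getKey())) {
                String word = line.trim().toLowerCase();
                if (!word.isEmpty()) {
                    words.putIfAbsent(word, entry.getValue());
                }
            }
        }
        return new LexiconTagger(words);
    }

    // Split on whitespace and attach the looked-up tag to every word.
    String tagSentence(String sentence) {
        return Arrays.stream(sentence.trim().split("\\s+"))
                .filter(word -> !word.isEmpty())
                .map(word -> word + "/" + lexicon.getOrDefault(word.toLowerCase(), "UNK"))
                .collect(Collectors.joining(" "));
    }

    // Sentences are tagged in parallel; collecting from a List keeps input order.
    List<String> tagAll(List<String> sentences) {
        return sentences.parallelStream()
                .map(this::tagSentence)
                .collect(Collectors.toList());
    }
}
```

    A lookup like this gives every occurrence of a word the same tag, so both uses of can in “can opener can open” get whatever single tag the lexicon holds, and a token such as “aunt’s” is not split into aunt and ’s. That is the limitation Campbell describes above.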
     
    Tim Cooke
    Marshal
    Posts: 4043
    Natural Language Processing is a whole research area in and of itself. I suggest that trying to write a tagger yourself from scratch would be a frustrating and ultimately fruitless exercise. You might make more progress if you use an existing library to process the text, such as the Apache OpenNLP project.
     