Resume Parsing

 
Michael Malley
Greenhorn
Posts: 20
So my company has a website where staff upload resumes (.doc, .docx) and manually enter data from each resume, such as name, telephone number, and address. The site uses PHP and MySQL and is hosted on an Apache server. They want to automate the process. At first I was thinking of doing some PHP and parsing the file on the website, but I decided against that. I feel the best way to do this would be to use Java EE with a few EJBs and some relational mapping to the database that the website already uses. So here I am.

My questions range from simple to complex:
- Is it a good idea to use Java EE for this? (I think it's the most powerful way to do it with an Apache server running MySQL, and more robust than PHP.)
- Are there some parsing algorithms that someone could point me to as a starting point? I've done recursive descent parsing with J2SE back in school, but I think this is a different situation. The part I'm having difficulty with is predicting where information will be, given the many possibilities for labels, titles, and formatting (job history versus work history versus professional experience, headed sections versus bolded sections versus indented sections, etc.).
- Additionally, the solution I'm envisioning will involve a lot of looping and looking up words in an enumeration ("the first word is a name, so let's see if it matches those criteria; if not, try all the other criteria; and if none of them match, move on"). I feel that would be very inefficient. Are there any conceptual algorithms anyone could lend me?
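
To make the looping concern concrete, here is a minimal sketch of one possible alternative: normalize each line and look it up in a hash map of heading synonyms, so each line costs one lookup instead of a pass over every known label. The class name `SectionHeadings`, the `Section` enum, and the synonym list are all hypothetical; only the three "history/experience" labels come from the question above.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper: maps many heading spellings to one canonical section. */
final class SectionHeadings {

    /** Canonical section kinds; illustrative only, not an exhaustive list. */
    enum Section { EXPERIENCE, EDUCATION, SKILLS }

    // Synonym table: one hash lookup per line instead of looping over labels.
    private static final Map<String, Section> SYNONYMS = new HashMap<>();
    static {
        SYNONYMS.put("job history", Section.EXPERIENCE);
        SYNONYMS.put("work history", Section.EXPERIENCE);
        SYNONYMS.put("professional experience", Section.EXPERIENCE);
        SYNONYMS.put("education", Section.EDUCATION);   // assumed label
        SYNONYMS.put("skills", Section.SKILLS);         // assumed label
    }

    /** Trims, lower-cases, drops a trailing colon, and collapses whitespace. */
    static String normalize(String line) {
        String s = line.trim().toLowerCase(Locale.ROOT);
        if (s.endsWith(":")) {
            s = s.substring(0, s.length() - 1).trim();
        }
        return s.replaceAll("\\s+", " ");
    }

    /** Returns the section a line names, or empty if it is not a known heading. */
    static Optional<Section> classify(String line) {
        return Optional.ofNullable(SYNONYMS.get(normalize(line)));
    }

    public static void main(String[] args) {
        System.out.println(classify("  Work  History: "));
        System.out.println(classify("Acme Corp, 2009-2011"));
    }
}
```

This only handles headings that appear as a line of their own; bolded or indented headings would need formatting information from whatever library reads the Word file.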

After reviewing my questions it's obvious to me that I have no idea what I'm doing, and a starting point would be much appreciated.

Oh, skill level: I've done a lot of academic work with Java and I'm strong in OOP concepts. I've been developing little programs here and there for my company up until now. I wouldn't say I'm an "expert" but I'm competent.
 
Shahzad Latif
Greenhorn
Posts: 28
Hi Michael,

After reading your question, it looks like you want to process .doc/.docx files in your Java application. If you're planning to do it from scratch, I would say it's going to be very tough and complicated. However, you may want to try a Java-based API for processing Word documents, such as Aspose.Words for Java. It is a commercial product, but you'll be able to process your documents quite easily with it. You can try it at your end to see if it helps. If you need further assistance with this, please write back.
 
Michael Malley
Greenhorn
Posts: 20
Thanks Shahzad.

Actually, I've used Aspose.Words for other projects before. The processing isn't really the issue. What I'm really looking for is some sort of algorithm. I would like to hear from anyone who has done something like this before: what type of parsing they used and how they went about doing it efficiently (e.g. did they make up an enumeration of common words to search for, did they use recursion and, if so, how?). So I guess I'm really just looking for an in-depth discussion and some brainstorming partners.
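
For fields with a recognizable shape, one common starting point is pattern matching rather than word lists. The following is only a sketch under assumptions: the text has already been extracted from the Word file, the class name `ContactFields` is hypothetical, and both regular expressions are deliberately simplified (the phone pattern assumes a ten-digit, North American style number).

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: pulls shape-based fields out of plain resume text. */
final class ContactFields {

    // Simplified patterns (assumptions): real emails and phone formats vary more.
    private static final Pattern EMAIL =
            Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");
    private static final Pattern PHONE =
            Pattern.compile("\\(?\\d{3}\\)?[-. ]?\\d{3}[-. ]?\\d{4}");

    /** Returns the first substring matching the pattern, if any. */
    static Optional<String> firstMatch(Pattern pattern, String text) {
        Matcher m = pattern.matcher(text);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    static Optional<String> email(String text) {
        return firstMatch(EMAIL, text);
    }

    static Optional<String> phone(String text) {
        return firstMatch(PHONE, text);
    }

    public static void main(String[] args) {
        String text = "Jane Doe\n123 Main St\nTel: (555) 123-4567\njane.doe@example.com";
        System.out.println(email(text).orElse("no email found"));
        System.out.println(phone(text).orElse("no phone found"));
    }
}
```

Names and addresses have no such fixed shape, so they would still need a different approach, such as position in the document or a lookup table.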
 
Shahzad Latif
Greenhorn
Posts: 28
Hi Michael,

I get your point. In that case, I think you should post the query in an algorithm-related forum; "Java in General" is not that specific. I wish you good luck in your endeavour.

By the way, I have also tweeted this, so maybe someone who is good with algorithms will come across it and help you: https://twitter.com/#!/shahzad_latif/status/149861663842115586.
 
Michael Malley
Greenhorn
Posts: 20
Do you know the correct forum? I have looked and don't see any sub-forums about algorithms in the main forums.
 
Shahzad Latif
Greenhorn
Posts: 28
If I search for "algorithm" on this site, I find most of the algorithm-related discussions in the General Computing forum. So I suppose that's the forum where you should discuss this.
 
Michael Malley
Greenhorn
Posts: 20
Thanks.

Rather than re-posting a topic, could a Mod please move this thread to the General Computing forum?
 
Michael Malley
Greenhorn
Posts: 20
As a follow-up question, could I use JavaCC to generate a parser for this project? I know it's not parsing lines of code and expressions, but is there a way I could define a grammar for a resume?
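
Whether JavaCC suits this is left open here. As a point of comparison, a resume can be viewed as a very loose line-level grammar, roughly `resume := preamble section*` and `section := heading line*`, which a small hand-written loop can recognize. The sketch below assumes headings sit on their own line; the class name `ResumeSplitter` and the heading vocabulary are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical splitter: groups resume lines under the heading that precedes them. */
final class ResumeSplitter {

    // Assumed heading vocabulary; a real one would be much larger.
    private static final Set<String> HEADINGS = new HashSet<>(
            Arrays.asList("work history", "education", "skills"));

    /** Returns sections in document order; lines before any heading go to "preamble". */
    static Map<String, List<String>> split(List<String> lines) {
        Map<String, List<String>> sections = new LinkedHashMap<>();
        String current = "preamble";
        sections.put(current, new ArrayList<>());
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue; // blank lines carry no content in this sketch
            }
            String key = line.toLowerCase(Locale.ROOT);
            if (HEADINGS.contains(key)) {
                current = key; // a heading line starts (or resumes) a section
                sections.putIfAbsent(current, new ArrayList<>());
            } else {
                sections.get(current).add(line);
            }
        }
        return sections;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "Jane Doe", "Education", "BSc, 2008", "", "Work History", "Acme Corp");
        System.out.println(split(lines));
    }
}
```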
 
Michael Malley
Greenhorn
Posts: 20
This was sort of a brainstorming topic. I've since started this project and would like to thank all those who participated.
 
g Melle
Greenhorn
Posts: 4
Can you share the source code, please, or send it to me?
My email: amal.ghrab@esprit.tn
Thank you.
 
g Melle
Greenhorn
Posts: 4

Thank you.
 