I am able to search the content of a PDF using Apache Lucene, but my problem starts when the PDF contains images: the text inside those images is not searched. Does anybody know how to search the content of images embedded in a PDF file?
What do you mean by "searching the content of an image"? Do the images contain text that you'd like to search? If so, that's a hard thing to do, and Lucene can't do it for you. You'd need to extract the images (maybe using a library like PDFBox) and then run optical character recognition (OCR) on each one. That may give you text that you can index with Lucene.
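A minimal sketch of the merging step, assuming you already have, per page, the PDF's text layer and the raw bytes of its embedded images (in a real program PDFBox's `PDFTextStripper` and `PDImageXObject` could supply these). The OCR engine sits behind a hypothetical `OcrEngine` interface that you would back with a real OCR library such as Tesseract; `OcrEngine`, `ExtractedPage` and `PdfOcrTextMerger` are illustrative names, not part of Lucene or PDFBox.

```java
import java.util.List;

/** Sketch: merge each page's text layer with OCR output from its images. */
public class PdfOcrTextMerger {

    /** Hypothetical abstraction over an OCR engine (e.g. a Tesseract wrapper). */
    public interface OcrEngine {
        String recognize(byte[] imageBytes);
    }

    /** Hypothetical holder for what a PDF library extracted from one page. */
    public record ExtractedPage(String textLayer, List<byte[]> images) {}

    private final OcrEngine ocr;

    public PdfOcrTextMerger(OcrEngine ocr) {
        this.ocr = ocr;
    }

    /** Returns one string combining page text and OCR'd image text, in page order. */
    public String searchableText(List<ExtractedPage> pages) {
        StringBuilder sb = new StringBuilder();
        for (ExtractedPage page : pages) {
            appendNonBlank(sb, page.textLayer());
            // Text that exists only inside images is recovered here via OCR.
            for (byte[] image : page.images()) {
                appendNonBlank(sb, ocr.recognize(image));
            }
        }
        return sb.toString();
    }

    /** Appends trimmed text on its own line, skipping null or blank input. */
    private static void appendNonBlank(StringBuilder sb, String s) {
        if (s == null || s.isBlank()) {
            return;
        }
        if (sb.length() > 0) {
            sb.append('\n');
        }
        sb.append(s.strip());
    }
}
```

The returned string can then be added to a Lucene `Document` as a `TextField`, so a query matches words whether they came from the PDF's text layer or from an image. OCR output is often noisy, so expect some misses on low-quality scans.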
Aspose may help here: Aspose.OCR for .NET is a character recognition component that lets developers add OCR functionality to their ASP.NET web applications, web services and Windows applications. It provides a simple set of classes for controlling character recognition tasks and works with image files (BMP, TIFF) from within your own applications. It lets you extract text from images quickly and easily, saving the time and effort of developing an OCR solution from scratch. View more details at: http://www.aspose.com/categories/.net-components/aspose.ocr-for-.net/default.aspx