In chapter 13 we discuss using third-party libraries to extract text from various formats (MS documents, XML via SAX and DOM, plain text, PDF) and place that text into indexes using Hibernate Search. It is very easy to do with the constructs Hibernate Search provides, and the chapter is filled with examples of how to do it.
I forgot to add that I'm not quite sure what you mean by non-text, but you have to remember this is a full TEXT search engine. If you wanted to search for, let's say, a particular jpg file, the searchable data would not be the jpg file itself but the text-based metadata entered about that jpg file.
As John said, we have examples in the book that index MS documents and PDFs, and you can download the book source code at book.emmanuelbernard.com/hsia
Generally, Hibernate Search provides the concept of a bridge, which lets you index an otherwise unknown data type into Lucene. Bridges are pretty much like Hibernate user types, but for Hibernate Search.
Here are a few bridge examples people can implement:
- read a URL (on your entity) that points to a PDF, extract the data from the PDF and index it in the Lucene Document
- read the bytes (on your entity) representing an MS document, extract the data and index it in the Lucene Document
- store and index a Map in a particular way not natively supported by Hibernate Search
Also, Hibernate Search can index all the basic JDK types (URL, Date, numbers, etc.).
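To make the Map bridge idea a bit more concrete, here is a minimal sketch of what such a bridge might look like. It is not taken from the book. The `StringBridge` interface declared below is a local stand-in that mirrors, as far as I recall, the shape of `org.hibernate.search.bridge.StringBridge`, so the snippet compiles without Hibernate Search on the classpath. `MapToStringBridge`, `MapBridgeDemo` and the `key=value` layout are hypothetical choices made for illustration only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Local stand-in (an assumption): mirrors the single-method shape of
// org.hibernate.search.bridge.StringBridge so this sketch compiles on its own.
interface StringBridge {
    String objectToString(Object object);
}

// Hypothetical bridge: flattens a Map into one string of "key=value" tokens
// that could then be stored in a single indexed field.
class MapToStringBridge implements StringBridge {
    @Override
    public String objectToString(Object object) {
        if (object == null) {
            return null; // nothing to index
        }
        if (!(object instanceof Map)) {
            throw new IllegalArgumentException("Expected a Map but got " + object.getClass());
        }
        List<String> tokens = new ArrayList<>();
        for (Map.Entry<?, ?> entry : ((Map<?, ?>) object).entrySet()) {
            tokens.add(entry.getKey() + "=" + entry.getValue());
        }
        // Sort so the output does not depend on the Map's iteration order.
        Collections.sort(tokens);
        return String.join(" ", tokens);
    }
}

class MapBridgeDemo {
    public static void main(String[] args) {
        Map<String, String> attributes = new HashMap<>();
        attributes.put("size", "large");
        attributes.put("color", "red");
        // prints: color=red size=large
        System.out.println(new MapToStringBridge().objectToString(attributes));
    }
}
```

In a real project you would implement the library's own bridge interface instead of the stand-in and attach the bridge to the entity property through the field bridge annotation. How the resulting tokens are split at index time depends on the analyzer you configure.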