Thanks, Stan. I have a reasonable amount of experience with Java and OOP, but not much with web frameworks. I also have substantial experience with RDBMS and SQL. My thought was therefore the same as yours: certainly no EJB, and, after some internal struggle, I have decided not to use a framework (e.g. Spring), at least at the outset. I have used in-house frameworks before, and although I know Spring is supposed to be very good, I am nevertheless leery of debugging through framework code that I might not need. Some kind of controller servlet/command pattern would likely do the trick for a while. Another internal debate was over the database/persistence layer, and I have decided, for the time being, to go either with home-grown JDBC or at most with iBatis. Hibernate indeed looks great, but since I have not done much coding for a couple of years, I have decided not to worry about persisting object graphs until it seems I need to. With iBatis I would retain control over the SQL sent to the database.
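For illustration, here is a minimal sketch of the controller/command idea mentioned above, written without the servlet API so that it runs on its own. The names (CommandDispatcher, Command, the "showText" action, the JSP file names) are all hypothetical; in a real application a single controller servlet would read the action from the request and forward to the view name that comes back.

```java
import java.util.HashMap;
import java.util.Map;

public class CommandDispatcher {

    /** One command per user action; returns the name of the view to render. */
    public interface Command {
        String execute(Map<String, String> params);
    }

    private final Map<String, Command> commands = new HashMap<>();
    private final Command fallback;

    /** The fallback command handles any action that has not been registered. */
    public CommandDispatcher(Command fallback) {
        this.fallback = fallback;
    }

    public void register(String action, Command command) {
        commands.put(action, command);
    }

    /** Look up the command for the action and run it with the request parameters. */
    public String dispatch(String action, Map<String, String> params) {
        Command command = commands.getOrDefault(action, fallback);
        return command.execute(params);
    }

    public static void main(String[] args) {
        CommandDispatcher dispatcher = new CommandDispatcher(p -> "notFound.jsp");
        // Hypothetical action: show one text by its id.
        dispatcher.register("showText", p -> "text.jsp?id=" + p.get("id"));

        Map<String, String> params = new HashMap<>();
        params.put("id", "42");
        System.out.println(dispatcher.dispatch("showText", params));
        System.out.println(dispatcher.dispatch("unknown", params));
    }
}
```

Adding a new page then means registering one more command, which is roughly the part a framework would otherwise take care of.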
In a prototype of the display formatting I used some initial XSLT complemented by JSTL XML processing. This seemed to work pretty well: the prototype is up at http://163.1.169.41/prototypeApp (to view it properly you need a Unicode font that covers classical Greek, such as Code2000). The point of the text markup is to represent a Greek text, originally written on a papyrus manuscript, and to show where letters are missing or obscure, where certain marks (called paragraphoi) are written on the papyrus, etc. The markup conventions (brackets etc.) are the standard conventions used in academic hardcopy publications.
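As a rough sketch of the XSLT side of this, the following uses only the JDK's built-in javax.xml.transform classes. The element names (line, supplied) and the stylesheet itself are assumptions made for the example, not the prototype's actual markup; it assumes square brackets mark letters that are missing from the papyrus. The sample text is transliterated into Latin letters only to keep the source file ASCII.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class PapyrusMarkup {

    // Hypothetical stylesheet: wrap the content of <supplied> in square brackets.
    // All other text is copied through by XSLT's built-in template rules.
    private static final String STYLESHEET =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='supplied'>[<xsl:apply-templates/>]</xsl:template>"
        + "</xsl:stylesheet>";

    /** Render one marked-up line of text as plain text with editorial brackets. */
    public static String render(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            // Wrapped so that callers need not declare a checked exception.
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical input: the letters "eide" are lost and supplied by the editor.
        System.out.println(render("<line>menin a<supplied>eide</supplied> thea</line>"));
    }
}
```

Run as is, this should print menin a[eide] thea. Other conventions, such as the paragraphoi, could be handled by adding further templates in the same way.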
And thanks for the hint about Lucene. In my Lucene prototype I had simply read the Unicode in from text files, since some demo code was available and since this was an easy way to ascertain exactly what Unicode text I was processing. But you are right that in a real application I might well want to feed the text into Lucene right before inserting it into the RDBMS.
The RDBMS would likely be PostgreSQL (I would like Oracle, but PostgreSQL is free). Here again, though, I am haunted by the Agile maxims: perhaps I should just use MS Access and be done with it. Of course that would bind me to a Windows platform, so maybe we will use Access only as a front-end reporting tool. I also don't know whether Access can handle a CLOB datatype.
This is all just a sketch of my thinking, meant to invite further discussion. Any comments, questions, suggestions or rebuttals on these ideas are welcome!