We have a project coming up at my financial firm that requires us to generate 700,000 PDFs in one night. Each PDF will be 7-10 pages and will contain charts and graphs.
We are leaning towards XSL-FO and some type of SVG graphics package. Is there anyone with experience with this kind of thing who could give some advice or provide some links to assist my search for a solution?
My impression is that FOP (the standard Java implementation of XSL-FO) is not the speediest package, and combining it with SVG is probably not going to make it any faster. You might want to take a look at the iText library as well. It can include images that you have created elsewhere, which may or may not suffice for your purposes.
You'd need to create well over 2 files each second (700,000 files in even a full 24 hours works out to about 8 per second), so performance is an issue. I'm not sure either FOP or iText can be that fast for documents of the size you mention that include graphics/images, so I'd do some timings first.
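For the "do some timings first" step, something along these lines might be a starting point. It is only a rough sketch: RenderTiming and averageSecondsPerDocument are names I made up, and the renderer in main is a placeholder workload, not a real FOP or iText call.

```java
import java.util.function.IntConsumer;

/** Rough timing harness: average wall-clock seconds per document. */
final class RenderTiming {

    /**
     * Runs the renderer for `warmup` untimed documents, then times
     * `samples` documents and returns the average seconds per document.
     */
    static double averageSecondsPerDocument(IntConsumer renderer, int warmup, int samples) {
        for (int i = 0; i < warmup; i++) {
            renderer.accept(i); // untimed runs so the JIT can warm up
        }
        long start = System.nanoTime();
        for (int i = 0; i < samples; i++) {
            renderer.accept(warmup + i);
        }
        long elapsedNanos = System.nanoTime() - start;
        return elapsedNanos / 1e9 / samples;
    }

    public static void main(String[] args) {
        // Placeholder workload (assumption): swap in the real
        // XML-to-PDF call you want to measure.
        IntConsumer fakeRenderer = id -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 100_000; i++) {
                sb.append(i % 10);
            }
        };
        double secs = averageSecondsPerDocument(fakeRenderer, 5, 20);
        System.out.println("average seconds per document: " + secs);
    }
}
```

Timing a few dozen representative documents, charts included, should say more than any general claim about which library is faster.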
I did a project for a client transforming XML to PDF. The documents ended up being 200-300 pages in some cases, taking 20-30 seconds on my slow machine. There were some embedded graphics in JPG or GIF format; I didn't try SVG. PDF file sizes were over 1.5 MB.
If your data can easily be created in XML format, I would suggest giving XSL-FO a try.
Since this runs overnight, you may have spare processing power sitting on people's desktops that could be used for a "render farm" type of operation if it proves too much for one machine. Bill [ July 18, 2006: Message edited by: William Brogden ]
Originally posted by William Brogden: if it proves too much for one machine.
As indeed it probably will. Even if we define "overnight" as between 5 PM and 9 AM, that's 57,600 seconds. To do 700,000 documents in that time, you'd have to do over 12 documents per second for that entire time, which sounds like a very ambitious goal for a single machine.
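To make that arithmetic easy to redo for other batch windows, here is a small sketch. The class and method names (ThroughputCheck, windowSeconds, requiredDocsPerSecond) are just ones I picked for illustration.

```java
/** Back-of-the-envelope throughput check for an overnight batch. */
final class ThroughputCheck {

    /** Length in seconds of a window from startHour to endHour (24-hour clock, may wrap past midnight). */
    static long windowSeconds(int startHour, int endHour) {
        int hours = ((endHour - startHour) % 24 + 24) % 24;
        return hours * 3600L;
    }

    /** Documents per second needed to finish `documents` within the window. */
    static double requiredDocsPerSecond(long documents, long windowSeconds) {
        return (double) documents / windowSeconds;
    }

    public static void main(String[] args) {
        long window = windowSeconds(17, 9); // 5 PM to 9 AM = 16 hours = 57,600 s
        double rate = requiredDocsPerSecond(700_000, window);
        System.out.println("window seconds: " + window);
        System.out.println("required documents per second: " + rate);
    }
}
```

A shorter window makes it worse: the same 700,000 documents in 8 hours would need roughly twice that rate.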
If the documents are mostly static data, a lot can be pulled into the XSL file and precompiled, speeding up the generation process enormously. But it still sounds like a job that will require at least a multi-CPU machine with heaps of RAM and fast connections to your database (or EJB server farm or whatever).
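On a multi-CPU box, the simplest way to keep all the processors busy is probably a fixed thread pool from java.util.concurrent. The following is a sketch under assumptions: ParallelRender and renderAll are made-up names, and the renderer passed in stands for whatever per-document FOP or iText call you end up with (which must be safe to run from several threads at once).

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntConsumer;

/** Renders a batch of documents on a fixed-size thread pool. */
final class ParallelRender {

    /**
     * Renders documents 0..count-1 using `threads` workers and returns
     * how many finished without throwing.
     */
    static int renderAll(int count, int threads, IntConsumer renderer) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger done = new AtomicInteger();
        for (int id = 0; id < count; id++) {
            final int docId = id;
            pool.submit(() -> {
                renderer.accept(docId); // a failure here leaves `done` unchanged
                done.incrementAndGet();
            });
        }
        pool.shutdown(); // accept no new tasks, let queued ones run
        try {
            if (!pool.awaitTermination(16, TimeUnit.HOURS)) {
                pool.shutdownNow(); // ran out of night
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return done.get();
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        // Placeholder renderer (assumption): does nothing.
        int finished = renderAll(1_000, cpus, id -> { });
        System.out.println("finished " + finished + " documents on " + cpus + " threads");
    }
}
```

Comparing the returned count with the requested count gives a cheap check for failed documents. Whether the pool scales linearly depends on the database and disk keeping up, so it is worth measuring with a few thread counts.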
I'm reading data from a CSV file, filtering it, doing some calculations, and writing to PDF with iText. For a 25-page output with about 60 sections, 60 tables and diagrams, I need 3 s per document. The diagrams are very simple and are made with pngencoder: http://catcode.com/pngencoder/
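To put that 3 s figure next to the 700,000-document target, here is a quick estimate of how many parallel workers it would imply. It is only a sketch: WorkerEstimate and workersNeeded are made-up names, it assumes the work scales perfectly across workers, and the 57,600-second window is the 5 PM to 9 AM example from earlier in the thread. The original poster's 7-10 page documents may well render faster than my 25-page ones.

```java
/** Estimates how many parallel workers a batch needs at a given per-document time. */
final class WorkerEstimate {

    /** Workers needed, rounded up, assuming perfect linear scaling. */
    static long workersNeeded(long documents, double secondsPerDocument, long windowSeconds) {
        double totalSeconds = documents * secondsPerDocument;
        return (long) Math.ceil(totalSeconds / windowSeconds);
    }

    public static void main(String[] args) {
        // 700,000 documents at 3 s each in a 57,600 s window
        System.out.println(workersNeeded(700_000, 3.0, 57_600)); // prints 37
    }
}
```

So at my speed it would take a few dozen CPU cores or machines working all night, which is in "render farm" territory.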