Konstantinos Vasileiou

Greenhorn
since Jul 20, 2009

Recent posts by Konstantinos Vasileiou

I am trying to package my application as a JAR file. The problem is that when it reads XML files, the program picks up the relevant DTD files from the local folder when run from within Eclipse, but looks for the DTDs in home\user\ when run as a JAR from anywhere else.
Can I change this somehow, so that the DTDs only need to sit in the same folder as the JAR, no matter where I run the program from?

Thank you in advance
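
One possible approach, sketched here under assumptions, is to install a SAX `EntityResolver` that redirects every DTD lookup to a fixed directory, such as the one containing the JAR. The class name `SiblingDtdResolver` and the helper `jarDirectory()` are hypothetical names for illustration; only `EntityResolver`, `InputSource` and the `ProtectionDomain`/`CodeSource` calls are standard JDK API.

```java
import java.io.File;
import java.net.URISyntaxException;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

/** Resolves every DTD reference to a file of the same name in one fixed directory. */
final class SiblingDtdResolver implements EntityResolver {
    private final File baseDir;

    SiblingDtdResolver(File baseDir) {
        this.baseDir = baseDir;
    }

    /**
     * Directory holding the JAR this class was loaded from, or the class
     * folder itself when running unpacked (for example from an IDE).
     * Assumes the class loader exposes a code source, which is usual for
     * application classes but not guaranteed.
     */
    static File jarDirectory() throws URISyntaxException {
        File location = new File(SiblingDtdResolver.class
                .getProtectionDomain().getCodeSource().getLocation().toURI());
        return location.isDirectory() ? location : location.getParentFile();
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        if (systemId == null || !systemId.endsWith(".dtd")) {
            return null; // not a DTD: let the parser resolve it as usual
        }
        // Keep only the file name and look for it in the fixed directory.
        String fileName = systemId.substring(systemId.lastIndexOf('/') + 1);
        File dtd = new File(baseDir, fileName);
        return dtd.isFile() ? new InputSource(dtd.toURI().toString()) : null;
    }
}
```

With an `XMLReader` this would be registered as `reader.setEntityResolver(new SiblingDtdResolver(SiblingDtdResolver.jarDirectory()))`. A handler that extends `DefaultHandler` could instead override `resolveEntity` with the same logic.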
12 years ago

Paul Clapham wrote:

Konstantinos Vasileiou wrote:Hmmm. Maybe some code will be clarifying.



Yes. But the clarifying code would be the code where you pass a File or an InputStream or something like that into the parser.


Sorry.... Here it is:


I tested some more things and it is probably not the SAX parser's fault after all: I print the list of names the parsing module returns and the name is printed correctly. If I also print the String "André Gonçalves" to the GUI text components, it appears correctly as well. So does the String somehow lose its encoding information somewhere later in the process? Is that possible?
12 years ago

Paul Clapham wrote:SAX parsers work perfectly well with all Unicode characters.

However your problem description is now confusing. It appears that you tested by outputting data from SAX to your console from XML, and had a problem there. Then you tested displaying a constant value into a GUI component, and that worked successfully. I don't see the test where you output data from SAX to a GUI component, and so it's still possible that your console is not a good testing tool for non-ASCII characters.

It's also possible that you are doing something like passing a Reader with the wrong encoding to the SAX parser, but you haven't posted any code so that's just speculation too.


First, I did not output the problematic data from SAX to the console - I created my objects first, using a SAX parser, and then printed the suspicious field of the object under discussion - and it appeared as I describe above. Outputting the data from the object to a TextArea still has the same problem for this name!

Hmmm. Maybe some code will be clarifying.
I have an XML file that contains data about some objects of my system. I deserialise the file into object instances with a class that uses the SAX parser. I use a CharArrayWriter to collect the text read from the XML

and then


If the problem is not there, then the initialisation might be wrong?...
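
For comparison, here is a minimal sketch of the setup discussed above: the parser is given the raw byte stream, so it can pick up the encoding from the XML declaration itself, and a `CharArrayWriter` accumulates the text of each element. The class `NameReader` and the element name `name` are assumptions made for the example, not taken from the original code.

```java
import java.io.ByteArrayInputStream;
import java.io.CharArrayWriter;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

final class NameReader {

    /** Collects the text of every <name> element (a hypothetical element name). */
    static List<String> readNames(InputStream xml) throws Exception {
        final List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private final CharArrayWriter text = new CharArrayWriter();

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                text.reset(); // start collecting fresh text for this element
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.write(ch, start, length); // may be called several times per element
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                if ("name".equals(qName)) {
                    names.add(text.toString().trim());
                }
            }
        };
        // A byte stream (not a Reader) lets the parser honour encoding="ISO-8859-1".
        SAXParserFactory.newInstance().newSAXParser().parse(xml, handler);
        return names;
    }

    public static void main(String[] args) throws Exception {
        String doc = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"
                + "<people><name>Andr\u00e9 Gon\u00e7alves</name></people>";
        byte[] bytes = doc.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(readNames(new ByteArrayInputStream(bytes)));
    }
}
```

If a `Reader` is passed instead, the characters are decoded before the parser sees them, so the charset given to that `Reader` has to match the file.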
12 years ago

Campbell Ritchie wrote:Please check what happens if you give the output to a Java object. Try javax.swing.JOptionPane.showMessageDialog(null, "André Gonçalves"); and see what happens. The Windows console is bad at displaying non-ASCII characters.



I tested it and it appears perfectly well!
By the way, I am using Ubuntu 9.04, so it is not related to the Windows console.

I really believe it has to do with the SAX parser that deserialises the entities... Is there some option I should have set to do it? It cannot be that difficult but still it is a very annoying little bug!
12 years ago

Rob Prime wrote:Do you get the same problem (or other problems) when you open the file in Internet Explorer? Perhaps the encoding is simply incorrect.


The encoding is set to ISO-8859-1 and the XML is displayed correctly from Firefox.
12 years ago
I am using SAX to deserialise some objects from an XML input. The problem is that one of the names, André Gonçalves, is not read correctly by the parser (or at least that is the point where I first notice the problem).
In fact, when I print the output either to the console, or to a GUI text component, it appears like this: Marcos Andr� Gon�alves

Can I do something to correct this? It is really annoying...
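
One common way to end up with this kind of output, shown here only as an illustration and not as a diagnosis of this particular program, is decoding ISO-8859-1 bytes as UTF-8. The class name `WrongCharsetDemo` is made up for the sketch.

```java
import java.nio.charset.StandardCharsets;

final class WrongCharsetDemo {

    /** Encodes the text as ISO-8859-1, then decodes those bytes as UTF-8. */
    static String misdecode(String original) {
        byte[] latin1 = original.getBytes(StandardCharsets.ISO_8859_1);
        // The bytes for the accented letters are not valid UTF-8 here, so each
        // one is replaced by U+FFFD, the replacement character.
        return new String(latin1, StandardCharsets.UTF_8);
    }

    /** Encodes and decodes with the same charset, which round-trips cleanly. */
    static String roundTrip(String original) {
        byte[] latin1 = original.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String name = "Andr\u00e9 Gon\u00e7alves";
        System.out.println(misdecode(name));
        System.out.println(roundTrip(name));
    }
}
```

The replacement character is what many consoles and text components render as a question-mark glyph, so any place where bytes are turned into characters with a different charset than the file's is worth checking.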
12 years ago
Thank you - very helpful and detailed reply!
12 years ago
I have implemented a program that invokes some Linux commands to complete a number of tasks. My problem is that I want to add a GUI to the application, so I need to somehow redirect the output of the commands from the error and output streams to a graphical component, such as a JTextArea. However, I am not sure how to do it. The code I currently have looks like this:
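
The poster's listing is not preserved here. Independently of it, one way to approach the redirection is a small helper that reads a stream line by line on a background thread and hands each line to a callback. This is only a sketch: `StreamPump` and `pump` are hypothetical names, and the reader uses the platform default charset.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.util.function.Consumer;

final class StreamPump {

    /** Starts a daemon thread that forwards each line of 'in' to 'sink'. */
    static Thread pump(final InputStream in, final Consumer<String> sink) {
        Thread thread = new Thread(() -> {
            try (BufferedReader reader = new BufferedReader(new InputStreamReader(in))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    sink.accept(line); // the sink decides where the text ends up
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        thread.setDaemon(true); // do not keep the JVM alive just for this reader
        thread.start();
        return thread;
    }
}
```

For a process `p`, this could be called once with `p.getInputStream()` and once with `p.getErrorStream()`. To show the lines in a `JTextArea`, the sink could be `line -> SwingUtilities.invokeLater(() -> textArea.append(line + "\n"))`, so that the component is only touched on the event dispatch thread.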


12 years ago

Jarred Olson wrote:Again, I've never used PDFBox so I'm not sure if you can do this or not (I know you can do it with java.io.*) but you might want to try reading it in line by line to try and keep your heap size down.


I am not sure you can do this... at least I do not know of such a method in PDFBox. The getText() method extracts all the text at once, but as far as I can guess from the error message, PDFBox also uses the structure of the PDF document, so I do not know whether line-by-line parsing is possible in the way it is for a "flat" I/O stream.
12 years ago
The PDF is not very big (1.8 MB) and PDFBox works fine with much larger files.
12 years ago
To add some more details to my previous post:
Increasing the heap size does not help at all - I tried up to 512 MB and got the same error with this particular PDF.
I made a simple prototype application like this:



and the system throws exactly the same error at this damned file.
12 years ago
Hi guys,
I am using PDFBox to parse a large number of PDF files sequentially, extract their text, and write it to another document. What I have done is create a method


that accepts as arguments the name of the file and the Writer to the new file that contains the concatenation of the textual data.

The error I get, when the parsing reaches one specific file, is:





Any ideas what is going wrong?
12 years ago

Ulf Dittmer wrote:

org/fontbox/afm/AFMParser


Do you have that class on your classpath? Maybe PDFBox comes in several jar files.




Yes, you are right. I needed to add the FontBox jar to my build path in order to make it work... Thanks!
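
A quick way to confirm this kind of problem before digging further is to ask the class loader directly whether the class named in the error can be found. The sketch below uses only `Class.forName`; the class `ClasspathCheck` is a hypothetical helper, and the FontBox class name is the one quoted in the thread.

```java
final class ClasspathCheck {

    /** Returns true if the named class can be loaded, without initialising it. */
    static boolean isAvailable(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false; // missing, or present but unusable
        }
    }

    public static void main(String[] args) {
        // Class name taken from the error quoted in the thread.
        System.out.println("FontBox present: " + isAvailable("org.fontbox.afm.AFMParser"));
    }
}
```

Running this with the same classpath as the application shows whether the extra jar is actually being picked up.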
12 years ago
Hi all,

I am trying to extract the textual content of PDF files from my Java code. I (am trying to) use PDFBox 0.7.3 and the examples I have found online so far are rather limited. Basically, I did something like this:


and I get the following exception:


Any suggestions from the more PDFBox-experienced users?
12 years ago
Well, I figured it out and it may be useful for other newbies as well:
First, read the great tutorial on the Use of Runtime.exec() before playing with it... My final code looks somewhat like the one in the tutorial (I print only the error stream, as the output stream of wget does not *seem* to give anything useful as info - so there is no need for a second thread.)

The problem was, as the 2nd page of the article indicates: "...failure to promptly write the input stream or read the output stream of the subprocess may cause the subprocess to block, and even deadlock."
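
A variation on the same idea, sketched here as an assumption rather than as the code from the tutorial or from my program: `ProcessBuilder.redirectErrorStream(true)` merges the error stream into the output stream, so a single loop on the calling thread can drain everything without a second thread. The class name `DrainedExec` is made up for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

final class DrainedExec {

    /** Runs a command and returns its merged stdout/stderr text. */
    static String run(String... command) throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        builder.redirectErrorStream(true); // stderr is folded into stdout
        Process process = builder.start();

        StringBuilder output = new StringBuilder();
        try (BufferedReader reader =
                new BufferedReader(new InputStreamReader(process.getInputStream()))) {
            String line;
            // Reading until end of stream keeps the pipe buffer from filling up,
            // which is what can make the subprocess block.
            while ((line = reader.readLine()) != null) {
                output.append(line).append('\n');
            }
        }
        process.waitFor(); // the stream is closed, so this returns once the process exits
        return output.toString();
    }
}
```

This trades the ability to tell the two streams apart for simplicity. When they must stay separate, each one needs its own reader thread, as in the tutorial.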
12 years ago