PDF to HTML conversion in JSP using Java

Nazeer Ahammad
Ranch Hand
Posts: 43
Hi All,
I'm using the code below to convert a PDF file to HTML, but it prints the table content as a plain string.
For example, suppose the PDF contains a table like this:
| Header |
| TD1 | TD2 | TD3 | TD4 |

With the JSP code at the end of this post, I get the following output:

Header TD1 TD2 TD2 TD3 TD4

<%@ page language="java" contentType="text/html; charset=ISO-8859-1" pageEncoding="ISO-8859-1"%>
<%@ page import="com.itextpdf.text.pdf.PdfReader"%>
<%@ page import="com.itextpdf.text.pdf.parser.PdfTextExtractor"%>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<title>View page</title>
</head>
<body>
<%
    // Local variables rather than JSP declarations, so nothing is shared between requests.
    String pages = "Nazeer\nAhammad\nDudekula"; // sample text, not used below

    PdfReader reader = new PdfReader("D:/tablecontent.pdf");
    System.out.println("This PDF has " + reader.getNumberOfPages() + " pages.");
    // getTextFromPage returns plain text only; table structure is not preserved.
    String page1 = PdfTextExtractor.getTextFromPage(reader, 1);
    reader.close();

    // Assumed intent: one entry per extracted line (the posted code never assigned pagescon).
    String[] pagescon = page1.split("\\r?\\n");
    for (int i = 0; i < pagescon.length; i++) {
%>
<br> <%= pagescon[i] %>
<% } %>
</body>
</html>

Could anyone please suggest a solution?

Thank you,
William Brogden
Author and all-around good cowpoke
Posts: 13078
Seems to me that if you want extracted strings to be presented in an HTML table, you will have to write the HTML formatting yourself.

I would never try to do this with embedded code in a JSP. Instead I would create a class that could be tested outside the JSP/servlet environment. Once you get it producing well-formatted HTML, then see about using it in the JSP.
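
As a rough illustration of that approach, here is a minimal sketch of a standalone class that builds an HTML table from extracted text. The class name TextTableFormatter and its methods are hypothetical and not part of iText. It assumes each line of extracted text is one table row and that cells are separated by whitespace, which will not hold for cells that contain spaces. A row with fewer cells than the widest row stretches its last cell with colspan.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of iText): turns extracted plain text into an HTML table. */
public class TextTableFormatter {

    /** Escapes the characters that are special in HTML text content. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Assumption: one non-empty line per row, cells separated by whitespace. */
    static List<String[]> parseRows(String text) {
        List<String[]> rows = new ArrayList<>();
        for (String line : text.split("\\r?\\n")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                rows.add(trimmed.split("\\s+"));
            }
        }
        return rows;
    }

    /** Builds the table; the last cell of a short row spans the remaining columns. */
    public static String toHtmlTable(String text) {
        List<String[]> rows = parseRows(text);
        int width = 0;
        for (String[] row : rows) {
            width = Math.max(width, row.length);
        }
        StringBuilder sb = new StringBuilder("<table border=\"1\">\n");
        for (String[] row : rows) {
            sb.append("  <tr>");
            for (int i = 0; i < row.length; i++) {
                boolean last = (i == row.length - 1);
                int span = last ? width - row.length + 1 : 1;
                sb.append(span > 1 ? "<td colspan=\"" + span + "\">" : "<td>");
                sb.append(escape(row[i])).append("</td>");
            }
            sb.append("</tr>\n");
        }
        return sb.append("</table>").toString();
    }

    public static void main(String[] args) {
        // Sample text shaped like the table described in the question.
        System.out.println(toHtmlTable("Header\nTD1 TD2 TD3 TD4"));
    }
}
```

Because the class has no servlet or iText dependency, it can be run and tested from a plain main method or a unit test. In the JSP it would be given the page text with its line breaks intact, and it would need to live in a named package so the page can import it.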

Paul Clapham
Posts: 22185
And it seems to me that if you use a class named PdfTextExtractor, it's only going to extract the text from the PDF. If the PDF contains formatting such as tables, it isn't going to tell you anything about that.