Compare two files PDF and summaries in Java

 
Greenhorn
Posts: 7
I am new to Java. I have written code to compare two data files in Java, but it only works for one line, not for all lines or the whole file. Below is my code:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class CompareTextFiles {

   public static void main(String[] args) throws IOException {
       BufferedReader reader1 = new BufferedReader(new FileReader("D:/Data/file1.docx"));
       BufferedReader reader2 = new BufferedReader(new FileReader("D:/Data/file2.docx"));
       String line1 = reader1.readLine();
       String line2 = reader2.readLine();
       boolean areEqual = true;
       int lineNum = 1;
       while (line1 != null || line2 != null) {
           if (line1 == null || line2 == null) {
               areEqual = false;
               break;
           } else if (!line1.equalsIgnoreCase(line2)) {
               areEqual = false;
               break;
           }
           line1 = reader1.readLine();
           line2 = reader2.readLine();
           lineNum++;
       }
       if (areEqual) {
           System.out.println("Two files have same content.");
       } else {
           System.out.println("Two files have different content. They differ at line " + lineNum);
           System.out.println("File1 has " + line1 + " and File2 has " + line2 + " at line " + lineNum);
       }
       reader1.close();
       reader2.close();
   }
}
 
Saloon Keeper
Posts: 5038
134
PDF files with a .docx file extension?

But regardless, you can't compare structured text formats like this. You'll need to use APIs that can access those parts of the file you wish to compare. Apache POI may be able to do this for DOCX. For PDFs this is going to be very hard, if not impossible. You can use Apache PDFBox to extract all text in a PDF, but there's no guarantee that the results will look similar for similar files.
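
To illustrate why a line-by-line read of a .docx goes wrong: a .docx file is a ZIP archive, and the main document text normally sits in an entry named word/document.xml, wrapped in XML tags. The following is only a rough sketch using the JDK's own java.util.zip classes (Java 9 or later), not a replacement for Apache POI. The class name DocxPeek, the helper toyDocx and the regular expression are made up for this illustration, and the toy archive it builds contains just that one entry, which is far less than a real Word file has.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical class for illustration only; a real project should prefer a library such as Apache POI.
class DocxPeek {
    // Crude match for text runs (<w:t>...</w:t>); XML entities such as &amp; are NOT decoded.
    private static final Pattern TEXT_RUN = Pattern.compile("<w:t(?:\\s[^>]*)?>([^<]*)</w:t>");

    /** Returns the concatenated text runs of word/document.xml, or null if that entry is missing. */
    static String extractText(InputStream docx) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.getName().equals("word/document.xml")) {
                    // readAllBytes() stops at the end of the current ZIP entry.
                    String xml = new String(zip.readAllBytes(), StandardCharsets.UTF_8);
                    StringBuilder text = new StringBuilder();
                    Matcher m = TEXT_RUN.matcher(xml);
                    while (m.find()) {
                        text.append(m.group(1)); // paragraph boundaries are ignored in this sketch
                    }
                    return text.toString();
                }
            }
        }
        return null;
    }

    /** Builds a toy in-memory archive with a single word/document.xml entry (not a valid Word file). */
    static byte[] toyDocx(String bodyXml) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("word/document.xml"));
            zip.write(bodyXml.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] docx = toyDocx("<w:document><w:body><w:p><w:r><w:t>ABCD DEFG</w:t></w:r></w:p></w:body></w:document>");
        System.out.println(extractText(new ByteArrayInputStream(docx)));
    }
}
```

The point of the sketch is that the readable text is compressed inside the archive, so BufferedReader.readLine() on the raw file never sees it. PDF uses a different, more complicated internal structure, so this trick does not carry over to PDFs.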
 
nalini shrma
Greenhorn
Posts: 7
Can you please provide Java API code for Word?
 
Norm Radder
Rancher
Posts: 3316
33
Look at the Apache site for packages that will be useful for Word and other types of document files, like PDF and XLS.
 
nalini shrma
Greenhorn
Posts: 7
I don't have any idea how to do this and I am new to Java; that's why I asked here. If you have code and can provide it, that would be really good for me.
 
Norm Radder
Rancher
Posts: 3316
33
  • Likes 1
There are probably examples of code here on this forum that use different Apache packages.  Try doing a search here or on the internet for sample code that uses Apache.

You will need to read the API doc on the Apache site and download some jar files that contain their code.  
You will need to put those jar files on the classpath for the javac and java programs.
Your code will need to have import statements for the packages and classes that are used in the code.


nalini shrma wrote: i am new in java

I suggest that you postpone this project and work on a simpler one until you have more experience.
 
nalini shrma
Greenhorn
Posts: 7
I did this for .txt files and it works fine for them. I am struggling with PDF because I am new. Below is the output I get for the .txt files from my Java program:

Both files have different content at line 1
File1 has ABCD DEFG and File2 has ABCD DEF at line 1
Both files have different content at line 3
File1 has NNN and File2 has NN at line 3
Both files have different content at line 4
File1 has 6y7u and File2 has 6y at line 4


I was expecting the same for PDF.
 
Tim Moores
Saloon Keeper
Posts: 5038
134
  • Likes 1
You seem to have missed what I wrote earlier. There are fundamental reasons why that sort of approach doesn't work with PDFs (or other structured file formats).

Someone else asked recently about extracting text from a PDF; see my reply at https://coderanch.com/t/691561/Task-write-code-reading-fields. Maybe those tools give you something to work with.

But overall I'll have to agree with Norm that this particular project seems a bit out of reach for someone just starting with Java.
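
As a rough sketch of how the pieces could fit together: if some extraction tool does hand back the text of each document as a list of lines, the comparison itself no longer has to know anything about files or formats. The class LineDiff and its compare method below are hypothetical names for this illustration. It ignores case like the posted code, reports every differing line instead of stopping at the first, and also reports lines that exist in only one input. The sample data reuses the differing lines from the posted output; the second line is invented because the thread does not show it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; it works on lines that have already been extracted.
class LineDiff {
    /** Compares two lists of lines (ignoring case) and returns one report entry per differing line. */
    static List<String> compare(List<String> lines1, List<String> lines2) {
        List<String> report = new ArrayList<>();
        int max = Math.max(lines1.size(), lines2.size());
        for (int i = 0; i < max; i++) {
            // A null marks "this input has no such line".
            String a = i < lines1.size() ? lines1.get(i) : null;
            String b = i < lines2.size() ? lines2.get(i) : null;
            int lineNum = i + 1;
            if (a == null || b == null) {
                report.add("Line " + lineNum + " exists only in " + (a == null ? "File2" : "File1"));
            } else if (!a.equalsIgnoreCase(b)) {
                report.add("File1 has " + a + " and File2 has " + b + " at line " + lineNum);
            }
        }
        return report;
    }

    public static void main(String[] args) {
        // Lines 1, 3 and 4 come from the output posted in the thread; line 2 is made up.
        List<String> file1 = List.of("ABCD DEFG", "same line", "NNN", "6y7u");
        List<String> file2 = List.of("ABCD DEF", "same line", "NN", "6y");
        for (String entry : compare(file1, file2)) {
            System.out.println(entry);
        }
    }
}
```

Keep in mind the caveat already given in this thread: text extracted from two similar-looking PDFs is not guaranteed to break into lines the same way, so a clean line-by-line report like the one for .txt files may not be achievable.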
 
nalini shrma
Greenhorn
Posts: 7
Hi, I have written the code below to compare two .txt files and summarize the differences in a report. My code works for .txt files, but I am unable to do the same for PDF. Can you please provide code:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class CompareTextFiles {
   public static void main(String[] args) throws IOException {
       // A backslash must be escaped in a Java string literal; "\v" alone does not compile.
       BufferedReader reader1 = new BufferedReader(new FileReader("Data\\v1.txt"));
       BufferedReader reader2 = new BufferedReader(new FileReader("Data\\v2.txt"));
       String line1 = reader1.readLine();
       String line2 = reader2.readLine();
       boolean areEqual = true;
       int lineNum = 1;
       while (line1 != null || line2 != null) {
           if (line1 == null || line2 == null) {
               areEqual = false;
               break;
           } else if (!line1.equalsIgnoreCase(line2)) {
               System.out.println("Two files have different content at line " + lineNum);
               System.out.println("File1 has " + line1 + " and File2 has " + line2 + " at line " + lineNum);

           }
           line1 = reader1.readLine();
           line2 = reader2.readLine();
           lineNum++;
       }

       reader1.close();
       reader2.close();
   }
}


Below is my output:

Both files have different content at line 1
File1 has ABCD DEFG and File2 has ABCD DEF at line 1
Both files have different content at line 3
File1 has NNN and File2 has NN at line 3
Both files have different content at line 4
File1 has 6y7u and File2 has 6y at line 4
 
Tim Moores
Saloon Keeper
Posts: 5038
134
That's pretty much what you had posted earlier - we've already seen that, so there's no point in repeating it. But you seem to ignore what I and others are telling you.

nalini shrma wrote: Can you please provide code

Nope, nobody here is going to do your work for you, especially if you don't listen to the advice you're getting.
 
nalini shrma
Greenhorn
Posts: 7
I am unable to do this for PDF; that's why I am asking.
 
Norm Radder
Rancher
Posts: 3316
33
  • Likes 1
Start by looking at the programs that Tim's post linked to. They are supposed to read and extract text from PDF files.
 
Tim Moores
Saloon Keeper
Posts: 5038
134
  • Likes 1

nalini shrma wrote: I am unable to do for PDF.


Yes, we got that. But since we're not going to do the work for you, we give you hints where you can start. That's what this site is all about. I'm not getting the impression that you're trying hard to tackle this yourself, though.
 