
Best Way to Edit a Byte Array

 
Mitch Robinson
Ranch Hand
Posts: 30
Hi Guys,

Another byte array question, I'm afraid. First, a bit of context: I'm trying to compare a large number of AFP files, but the product I use to generate them writes a timestamp at the top of each document. This causes my comparison software to show differences. On top of that, if I produce the AFPs before 10am the files are all one byte smaller, because the hour has no leading zero (thanks for that).

So my question is: what would be the best way of taking a 5 MB file, reading in only the first 150 bytes, editing these and rewriting the file? Reading only the first 150 bytes is the part I'm struggling with most, and I'm not sure it can be done.

For the edit I will probably replace the timestamp bytes with a time of 00:00:00 to ensure there are no differences.

Is this possible? If so, how would you suggest going about it?

Thanks in advance
Mitch
 
William Brogden
Author and all-around good cowpoke
Rancher
Posts: 13074
6
That sounds like a job for java.io.RandomAccessFile because it can write into the middle of an existing file.

Consult the JavaDocs for read(byte[]) and write(byte[]) and similar methods. You will also need seek( long ) to position for writing at the start of the file.

Bill
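
A minimal sketch of what Bill describes, assuming the timestamp sits at a known offset inside the first 150 bytes. The class name HeaderPatcher and both method names are made up for illustration; only the RandomAccessFile calls (readFully, seek, write) are real API. Opening the file this way does not load it into memory: only the bytes you ask for are read.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical helper, not from the thread.
final class HeaderPatcher {

    /** Reads up to count bytes from the start of the file; the rest of the file is not touched. */
    static byte[] readHeader(String path, int count) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            int len = (int) Math.min(count, raf.length());
            byte[] header = new byte[len];
            raf.readFully(header); // file pointer starts at 0, so this reads bytes 0..len-1
            return header;
        }
    }

    /** Overwrites bytes in place from offset; the length is unchanged if the write stays inside the file. */
    static void overwrite(String path, long offset, byte[] replacement) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "rw")) {
            raf.seek(offset);       // position for writing
            raf.write(replacement); // replaces existing bytes, does not insert
        }
    }
}
```

The offset and the replacement bytes depend on the AFP layout and its character encoding, which the thread does not give, so both are left as parameters here.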
 
Mitch Robinson
Ranch Hand
Posts: 30
Hi,

Thanks for the quick reply. A quick question regarding RandomAccessFile: can it be used to insert a byte (i.e. add to the file)? I didn't see this when I was reading through the I/O documentation.

Also, does it read the whole file into memory, or just open it as a random access file?

At the moment I use the RandomAccessFile to overwrite the bytes, which seems to work OK. But in the case of a timestamp from before 10am, I would need to insert an extra byte to bring the file to the correct size for the comparison.

Thanks again,
Mitch
 
Ulf Dittmer
Rancher
Posts: 42968
73
No, RAF can't insert bytes - only change existing ones.
 
Mitch Robinson
Ranch Hand
Posts: 30
Thanks for the reply Ulf,

What would be the best way of inserting a byte, again trying to avoid reading the whole file in, as some files can be very large?

I now have my RAF working, so any files that are created after 10:00 am are now corrected. It's just that if any of the files are created before this, I get the problem. I've raised it as a bug with the supplier, but as it's pretty minor it won't be changed anytime soon...

Thanks again
Mitch
 
Rob Spoor
Sheriff
Pie
Posts: 20669
65
Ulf Dittmer wrote: No, RAF can't insert bytes - only change existing ones.

Technically it can, but you have to do the hard work yourself. The protocol to insert n bytes at position m in pseudocode:
The shifting is the hardest part, but that can be done in blocks of n bytes at a time, starting at the end:
It is important to start at the end because the shift will overwrite bytes; if you start at m then you will overwrite bytes you will need to shift later on.
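
The approach described above can be sketched roughly as follows. This is an illustration under stated assumptions, not Rob's original pseudocode: the class name RafInsert and the separate blockSize parameter are made up here (Rob uses n for both the insert size and the block size), and the file is grown with setLength before any bytes are moved.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical helper, not from the thread.
final class RafInsert {

    /** Inserts data at pos by shifting the tail of the file up in blocks, last block first. */
    static void insert(RandomAccessFile raf, long pos, byte[] data, int blockSize) throws IOException {
        long oldLength = raf.length();
        if (pos < 0 || pos > oldLength || blockSize <= 0) {
            throw new IllegalArgumentException("bad position or block size");
        }
        int shift = data.length;
        raf.setLength(oldLength + shift);  // grow the file to make room

        byte[] buffer = new byte[blockSize];
        long remaining = oldLength - pos;  // bytes that still have to move
        long index = oldLength;            // end (exclusive) of the part not yet moved
        while (remaining > 0) {
            int len = (int) Math.min(remaining, buffer.length);
            index -= len;                  // start of the block to move
            raf.seek(index);
            raf.readFully(buffer, 0, len); // read the whole block before overwriting anything
            raf.seek(index + shift);
            raf.write(buffer, 0, len);
            remaining -= len;
        }
        raf.seek(pos);
        raf.write(data);                   // finally drop the new bytes into the gap
    }
}
```

Because each block is read completely before it is written back, the shift distance may be smaller than the block size without corrupting data; working from the end guarantees that a write never lands on bytes that have not been read yet.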
 
Mitch Robinson
Ranch Hand
Posts: 30
Rob,

Thanks for the reply, would this be the way you suggest to do it?

Also, do you know how this will affect performance? The byte I need to add is within the first 73 bytes of the file, so it looks like nearly every byte will be both read from the file and written back to it.

Thanks,
Mitch
 
Mitch Robinson
Ranch Hand
Posts: 30


Example: a file of 5000 bytes, m is 73, buffer of 1000.
In this code, is my logic correct?

Seek to position 4000 --> Read fully from 4000-5000 --> Seek to position 4001 --> Write to 4001-5001
Seek to position 3000 --> Read fully from 3000-4000 --> Seek to position 3001 --> Write to 3001-4001
Seek to position 2000 --> Read fully from 2000-3000 --> Seek to position 2001 --> Write to 2001-3001
Seek to position 1000 --> Read fully from 1000-2000 --> Seek to position 1001 --> Write to 1001-2001
Seek to position 73 --> Read fully from 73-1000 --> Seek to position 74 --> Write to 74-1001

Mitch
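
One way to sanity-check a schedule like the one above is to compute it without touching a file. The sketch below is only an illustration; ShiftPlan and plan are made-up names. It walks backwards from the end of the file in buffer-sized steps and stops at m, which is the same order of blocks as in Mitch's example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not from the thread.
final class ShiftPlan {

    /** Returns one {readFrom, readToExclusive, writeFrom} triple per block, last block first. */
    static List<long[]> plan(long fileLength, long m, int bufferSize, int shift) {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        List<long[]> steps = new ArrayList<>();
        long end = fileLength;
        while (end > m) {
            long start = Math.max(m, end - bufferSize); // never go below m
            steps.add(new long[] {start, end, start + shift});
            end = start;
        }
        return steps;
    }

    public static void main(String[] args) {
        // 5000-byte file, insert position 73, buffer of 1000, shift by 1 byte
        for (long[] s : plan(5000, 73, 1000, 1)) {
            System.out.printf("read %d-%d, write to %d-%d%n",
                    s[0], s[1], s[2], s[2] + (s[1] - s[0]));
        }
    }
}
```

For the example values this produces five blocks, starting with 4000-5000 and ending with the shorter block 73-1000, so the logic in the post matches.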


 
William Brogden
Author and all-around good cowpoke
Rancher
Posts: 13074
6
I am guessing you are better off just using the stream methods for inserting or removing bytes, writing to a new file and then deleting the old.

My reasoning is that the operating system disk cache and buffers are probably better organized for reading and writing in sequence whereas using random access and working backwards through the file would involve lots more disk seeks.

Let us know if you do time trials with both methods.

Bill
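
For comparison, the stream-based alternative Bill suggests might look something like this. It is a sketch only: StreamInsert is a made-up name, the temporary file is assumed to be creatable next to the original, and a failed run would leave that temporary file behind.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not from the thread.
final class StreamInsert {

    /** Copies file to a temp file with data inserted at pos, then replaces the original. */
    static void insert(Path file, long pos, byte[] data) throws IOException {
        Path dir = file.toAbsolutePath().getParent();
        Path temp = Files.createTempFile(dir, "insert", ".tmp");
        try (InputStream in = new BufferedInputStream(Files.newInputStream(file));
             OutputStream out = new BufferedOutputStream(Files.newOutputStream(temp))) {
            byte[] buffer = new byte[8192];
            long toCopy = pos;
            while (toCopy > 0) {               // copy the first pos bytes unchanged
                int read = in.read(buffer, 0, (int) Math.min(buffer.length, toCopy));
                if (read < 0) {
                    throw new EOFException("pos is beyond the end of the file");
                }
                out.write(buffer, 0, read);
                toCopy -= read;
            }
            out.write(data);                   // the inserted bytes
            int read;
            while ((read = in.read(buffer)) != -1) { // copy the rest sequentially
                out.write(buffer, 0, read);
            }
        }
        Files.move(temp, file, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

This reads and writes strictly in sequence, which is the access pattern Bill expects the OS cache to handle well, at the cost of needing disk space for a second copy of the file while it runs.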
 
Mitch Robinson
Ranch Hand
Posts: 30
Rob,

I'm getting really confused by the logic needed to fill in your pseudocode.



Currently I've got code which gets me down to the byte less than my buffer size by using AFPFile.seek((AFPFile.length()-(i*n)-1));

Do I need an int which increments, or do I literally just fill in the sections in your pseudocode?

For int len = Math.min(remaining bytes, n); I'm understanding remaining bytes as:
Total no. of bytes in file - m / no. of bytes written
(raf.length() - m) / (raf.length() - m) - raf.getFilePointer()??

Am I on the right track?

Thanks
Mitch



 
Satya Maheshwari
Ranch Hand
Posts: 368
Is the replacement string always shorter than the original one (since the timestamp is set to 0)? If so, you could take advantage of that by padding it instead of moving all the bytes.
 
Rob Spoor
Sheriff
Pie
Posts: 20669
65
Mitch Robinson wrote: Do I need an int which increments, or do I literally just fill in the sections in your pseudocode?

You'll need at least two ints (or perhaps longs):
1 for the remaining bytes. Let's call it remaining. It is initially raf.length() - m - n, since all bytes after m + n must be shifted up. After each shift you decrease it by the number of bytes shifted, usually n.
1 for the index to start shifting from. It is initially raf.length() - n and gets decreased by n each time (but make sure not to go below m). You have mimicked this behaviour with (AFPFile.length()-(i*n)-1), but simple addition / subtraction is faster than multiplication. The index to shift to can be calculated from this index, and is the index + n for simple insertions.
 
Mitch Robinson
Ranch Hand
Posts: 30


Hello again,

I've got the code above, which seems to work as expected when I step through it, BUT when I view the modified AFP it seems to have blanked (not shifted) all of the bytes after position m.

What would be the reason for this? As for the speed testing, once I've got past this problem I will run some speed comparisons and report back with the results.

Thanks,
Mitch

 
Mitch Robinson
Ranch Hand
Posts: 30
Thanks for the help guys, I've finally sorted it.

I will try to get some speed comparison tests going, but initial testing shows it's taking ~20 seconds to shift all bytes from position 73 in a 70 MB file, so speed seems OK to me, especially for my needs.

Again thanks for the help

Mitch
Making Progress....
 
Rob Spoor
Sheriff
Pie
Posts: 20669
65
Mitch Robinson wrote:

Here's the code I used for something similar. Instead of inserting one byte it was replacing several bytes. length is the number of bytes to replace, 0 in your case, and data is a byte[] to replace with.
When applying to your example, size == raf.length(), length == 0, diff == data.length == 1, offset == m, index == posToShift and COPY_BUFFER_SIZE == n (but can be anything).

As you see it is nearly the same; the only difference is the calculation of index / posToShift, but that seems to be just fine in your code as well.
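
A replace routine along the lines Rob describes could be sketched as below. This is not Rob's code; it is an illustration that reuses the names from his post (size, length, diff, offset, index, COPY_BUFFER_SIZE) and assumes that a shrinking replacement shifts the tail down from the front and then truncates the file. RafReplace and move are made-up names.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical helper, not from the thread.
final class RafReplace {
    private static final int COPY_BUFFER_SIZE = 8192; // can be anything positive

    /** Replaces length bytes at offset with data, growing or shrinking the file as needed. */
    static void replace(RandomAccessFile raf, long offset, int length, byte[] data) throws IOException {
        long size = raf.length();
        if (offset < 0 || length < 0 || offset + length > size) {
            throw new IllegalArgumentException("range outside file");
        }
        int diff = data.length - length;   // > 0 grows the file, < 0 shrinks it
        long tailStart = offset + length;  // first byte after the replaced range
        byte[] buffer = new byte[COPY_BUFFER_SIZE];

        if (diff > 0) {                    // shift the tail up, last block first
            raf.setLength(size + diff);
            long index = size;
            while (index > tailStart) {
                int len = (int) Math.min(buffer.length, index - tailStart);
                index -= len;
                move(raf, buffer, index, index + diff, len);
            }
        } else if (diff < 0) {             // shift the tail down, first block first
            long index = tailStart;
            while (index < size) {
                int len = (int) Math.min(buffer.length, size - index);
                move(raf, buffer, index, index + diff, len);
                index += len;
            }
            raf.setLength(size + diff);    // cut off the now unused end
        }
        raf.seek(offset);
        raf.write(data);
    }

    private static void move(RandomAccessFile raf, byte[] buffer, long from, long to, int len) throws IOException {
        raf.seek(from);
        raf.readFully(buffer, 0, len);
        raf.seek(to);
        raf.write(buffer, 0, len);
    }
}
```

With length == 0 and a one-byte data array this reduces to the single-byte insert discussed earlier in the thread.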
 
Mitch Robinson
Ranch Hand
Posts: 30
Thanks for that Rob,

It seems yours is more reusable than mine, as mine was built for the specific purpose of inserting a single byte, but it wouldn't take too much change to accommodate additional bytes being inserted. I'm also surprised by the speed; I thought it would be slower than what my tests are showing....

Mitch