port from C# to Java - explanation needed

 
Ranch Hand
Hi there,

@Mods: maybe cross-reference this to Java - general

I'm trying to port some old code I found online to Java. I understand very little C# - basically just enough to work out what each line means with the help of the docs.

What the code does: it takes the path to a file and scans the file for data by matching hashes. Doing this with the first hash and iterating over the whole file at least gets me something. I don't have any output from the original C# program, so I can't check whether my result is correct beyond the SHA-1 hashes matching, which should be collision-resistant enough for the purpose of this project. However, the original code uses some C# features I just don't know, and even many hours of Googling helped very little. So maybe someone with a bit more experience in C# (a bit of Java wouldn't hurt) could help me at least understand what this does, so I can work out how to implement it in Java. Or maybe someone knows both languages well enough to give me some ideas.
I know many threads like this end up as "please give me some code", so to be clear: I'm not asking for code or demanding that anyone write it. If someone is bored and willing to help, a few lines would be highly appreciated, but an explanation is good enough (at least I hope).

I cut some of the original code - if more information is needed, feel free to ask:

Some hashes, either one single hash or an array of hashes:


Methods (functions?) to search either for one hash or for an array of hashes:
note: I didn't write any of this - I just copied it


BLOCK_LENGTH is hardcoded to 1 MB (1048576 bytes).
The passed "stream" is created like this:


So, where do I need help?

First: Although for me as a Java guy the name "IList" looks "out-of-spec", I guess it's an interface type for at least some sort of list. As the methods (functions? what are they called in C#?) are called either with the single array or with the array of arrays (there is no such thing as 2D arrays - at least not in Java - I guess it's the same in C#), I guess IList also covers array types - am I right? Otherwise I can't see how it would work when called like this:

If so, this would be easy to port to Java - I just need confirmation that my assumption is correct.

Second: Although I can at least imagine what this "parallel" thing does (with some help from the docs), I'm not really sure about it. As far as I understand, it parallelizes the workload: instead of going through the file in sequence, it splits the work across many worker threads, each scanning only one block (1 MB in this case). This should also be easy to implement, but it can be omitted since it's a one-time job and can take all the time it needs. What I'm not sure about is how this code handles edge cases where the data is split between two blocks ...

Third: I know this from Java, where the conventions say not to do it, but I guess it's the same in C#: an if or for with only a single statement behaves the same as one with added braces - am I right?

Any help appreciated. If something's needed just ask - I'll try to provide.
Thanks in advance

Matt
 
Bartender

Matt Wong wrote: First: Although for me as a Java guy the naming of "IList" looks "out-of-spec", I guess this is a type-interface for at least some sort of List.


In .NET, it's convention to prefix interface names with a capital letter 'I'. I'm not a fan, but it's better to follow convention than your own rules.

As the methods (functions? what's it called in C#?) are either called with the single array, or the array of array (there is no such thing as 2D-arrays - at least not in Java - guess the same in C#), I guess IList also covers array types - am I right?


"Method" is correct, as is "array of arrays".

In .NET, System.Array implements System.Collections.IList, so yes: you can assign array references to variables of type IList.
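
For the Java side of the port: Java arrays do not implement java.util.List, so the closest equivalent is to declare the parameter as a List and wrap the array at the call site. A minimal sketch, assuming the hashes are byte arrays; the names HashLists and countHashes are hypothetical and not taken from the original C# code:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class HashLists {
    private HashLists() {}

    // Hypothetical stand-in for a method that takes an IList in the C# code.
    static int countHashes(List<byte[]> hashes) {
        return hashes.size();
    }

    public static void main(String[] args) {
        byte[] single = {0x01, 0x02};
        byte[][] many = {{0x01}, {0x02}, {0x03}};

        // A single hash becomes a one-element list.
        System.out.println(countHashes(Collections.singletonList(single)));

        // Arrays.asList returns a fixed-size List view backed by the array.
        System.out.println(countHashes(Arrays.asList(many)));
    }
}
```

Arrays.asList does not copy the array, so changes to the array elements are visible through the list and vice versa.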

Second: Although I can at least imagine what this "parallel" thing does (with some help of the docs) I'm not really sure about it.


It's similar to writing
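
A minimal Java sketch of a parallel loop over block indices, using a parallel IntStream. It is an illustration under assumptions, not a translation of the original C# code: the class ParallelBlocks, the method blockSums, and the per-block work (summing unsigned byte values) are hypothetical placeholders for whatever each block is actually scanned for.

```java
import java.util.stream.IntStream;

final class ParallelBlocks {
    private ParallelBlocks() {}

    // Processes every block of blockLength bytes in parallel and returns one result per block.
    static int[] blockSums(byte[] data, int blockLength) {
        int blockCount = (data.length + blockLength - 1) / blockLength; // round up
        int[] sums = new int[blockCount];

        IntStream.range(0, blockCount).parallel().forEach(i -> {
            int start = i * blockLength;
            int end = Math.min(start + blockLength, data.length);
            int sum = 0;
            for (int p = start; p < end; p++) {
                sum += data[p] & 0xFF; // placeholder work on this block
            }
            sums[i] = sum; // each task writes only to its own slot
        });
        return sums;
    }
}
```

Removing the call to parallel() gives the same result sequentially, which makes it easy to compare both variants.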

At least I'm not sure how this code handles edge cases where the data is split between two blocks ...


You'll have to consider it even if you process all blocks sequentially, because the algorithm operates on blocks regardless.
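
One common way to deal with block boundaries is to let each block own every candidate window that starts inside it and to read up to one window length minus one byte past the block's end. The sketch below shows the idea with a plain byte search in an in-memory array. It is an assumption about how the problem could be handled, not what the original C# code does, and the names BlockScan and indexOf are hypothetical.

```java
import java.util.Arrays;

final class BlockScan {
    private BlockScan() {}

    // Returns the first index of needle in data, scanning block by block, or -1 if absent.
    static int indexOf(byte[] data, byte[] needle, int blockLength) {
        int lastStart = data.length - needle.length; // last offset where a full window still fits
        for (int blockStart = 0; blockStart <= lastStart; blockStart += blockLength) {
            // Windows that start in this block may extend into the next one.
            int blockEnd = Math.min(blockStart + blockLength, lastStart + 1);
            for (int s = blockStart; s < blockEnd; s++) {
                byte[] window = Arrays.copyOfRange(data, s, s + needle.length);
                if (Arrays.equals(window, needle)) {
                    return s;
                }
            }
        }
        return -1;
    }
}
```

With a block length of 4, the bytes 3, 4, 5 in the sequence 0 to 9 straddle the first block boundary and are still found, because the window starting at offset 3 belongs to the first block.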

Third: I know it from Java - and the conventions state not to do so - but I guess it's the same in C#: the IF and FOR with only one line are the same as with added braces - am I right?


If you mean whether you can substitute a compound statement for a simple statement, then yes. However, as in Java, not all keywords accept a simple statement. An example is the try-statement, where each of the try-, catch-, and finally-clauses MUST be a compound statement.
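
A short Java illustration of both points; the class and method names are made up for the example. The first two methods behave identically, while the try statement in the third would not compile without braces.

```java
final class BraceDemo {
    private BraceDemo() {}

    // for and if each control a single simple statement, no braces needed.
    static int sumPositiveWithoutBraces(int[] values) {
        int sum = 0;
        for (int v : values)
            if (v > 0)
                sum += v;
        return sum;
    }

    // The same logic with compound statements.
    static int sumPositiveWithBraces(int[] values) {
        int sum = 0;
        for (int v : values) {
            if (v > 0) {
                sum += v;
            }
        }
        return sum;
    }

    // try and catch always require a block, even for a single statement.
    static int parseOrDefault(String text, int fallback) {
        try {
            return Integer.parseInt(text);
        } catch (NumberFormatException e) {
            return fallback;
        }
    }
}
```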
 
Matt Wong
Ranch Hand
Thanks for the explanation. About the parallel part - I'm not familiar with Java's Stream API - but the Java syntax helped a bit in trying to understand this:

In the code I'm trying to port, there is this array of byte arrays containing hashes. The code looks a bit strange: is there any guarantee that each of the roughly 100 hashes in the returned array is populated?
As far as I can see, it's a simple check whether the calculated hash for the current sub-block is part of the hash array, and only then is the data added to the result. Otherwise - well - the data for a hash that wasn't found would just be empty / zero?

I already started to implement a few lines and let it run. The first hash can be found, but none of the hashes in the hash array. Maybe that data is not contained in the input, or maybe I've got some length wrong. I tried another tool which does basically the same thing, but I can't tell whether it is able to find the data, as its only output is the data for the single hash - which is identical to the data I'm able to find. But this is related to the file and its data, which I'll have to investigate on other, more specialized forums.
 
Stephan van Hulst
Bartender

Matt Wong wrote:is there any guarantee that each of the about 100 hashes in the return array are populated?


No. The algorithm appears to reverse a list of hashes with the help of candidate byte strings from an input file. If the entire file is processed and fewer candidates were found to correspond to a hash than there are hashes in the list, then obviously the result array will contain gaps.

- otherwise - well - the data for the hash not found would just be empty / zero ?


No. The result is an array of arrays, and arrays are reference types. Gaps in the array (hashes that weren't found and reversed) are represented by the default value for reference types: null.
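
The same holds in Java: a freshly allocated byte[][] contains only null references. A minimal sketch of such a reverse lookup, assuming SHA-1 hashes and a list of candidate byte strings already extracted from the file; the names HashReverser, sha1, and reverse are hypothetical and not taken from the original code.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.List;

final class HashReverser {
    private HashReverser() {}

    // Computes the SHA-1 digest of data.
    static byte[] sha1(byte[] data) {
        try {
            return MessageDigest.getInstance("SHA-1").digest(data);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is not available", e);
        }
    }

    // result[i] holds the candidate whose SHA-1 equals hashes[i], or null if none was found.
    static byte[][] reverse(byte[][] hashes, List<byte[]> candidates) {
        byte[][] result = new byte[hashes.length][]; // every slot starts as null
        for (byte[] candidate : candidates) {
            byte[] digest = sha1(candidate);
            for (int i = 0; i < hashes.length; i++) {
                if (result[i] == null && Arrays.equals(hashes[i], digest)) {
                    result[i] = candidate;
                }
            }
        }
        return result;
    }
}
```

Callers have to check each slot for null before using it. Note that arrays must be compared with Arrays.equals, since == and equals on arrays only compare references.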

I already started to implement a few lines and let it run. The first hash can be found, but none of the hashes in the hash-array. Maybe these data are not contained in the input-data, maybe I've got some length wrong. I tried another tool which does basically the same - but I can't tell if it is able to find the data as its only output is the data from the single hash - which is identical to the data I'm able to find. But this is somewhat related to the file and its data I have to investigate on other more specialized forums.


We can't help you with this because we don't have the data, and even if we had it, we wouldn't be able to tell you whether the input file is corrupt or the algorithm is wrongly implemented.

The original code contains some sloppy coding that doesn't make me very hopeful for the correctness.
 