Finding all physical or logical drives, reading raw data

 
Michael Katich
Greenhorn
Posts: 10
Hi all,

I'm looking for a little guidance on something. Let me give some background.

I have developed some tools with Java that I use in projects that read data from input and look for various things based on heuristic signatures, patterns, all kinds of things. These tools have been a huge help since I work on these kinds of data processing projects often.

For input, it is often a big file, and it can be an image file of an entire drive made with separate off-the-shelf software. My projects often sift through raw data, not just current valid files in a file system. My tools use BufferedInputStream (wrapping FileInputStream) to do the actual reading of bytes.

At one point I learned I could read raw data directly from a local physical drive in Windows using a path such as \\.\PhysicalDrive0. Opened through a Java File object, it reads the same way as a file. The big difference is that with a physical drive, exists() returns false, canRead() returns false, length() returns 0, and getTotalSpace() returns 0. With an actual file, all of those return true or report proper numbers.

So I have adapted my tools to also work on physical drives. As far as I know, there isn't a way to know the size (except using JNI), so I have to take that into account and processing just has to end when it runs out of data. That works fine if I just let my program know whether it's a physical drive or a file.
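A minimal sketch of the "process until it runs out of data" loop described above. It assumes only what the post states: the device path opens like a file through FileInputStream, and read() eventually reports end of data. Since File.length() returns 0 for such a path, the byte count this returns is one way to learn the drive's size after a full pass. The class name and the 1 MiB buffer size are illustrative choices, and opening a raw device generally requires administrator (Windows) or root (Linux) rights.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class RawDriveReader {

    /** Reads until read() reports end of data and returns the total number of bytes seen. */
    static long countBytes(InputStream in) throws IOException {
        byte[] buf = new byte[1 << 20]; // 1 MiB: a whole multiple of 512- and 4096-byte sectors
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            // A real tool would scan buf[0..n) here; this sketch only counts.
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            // No path given: demonstrate on an in-memory stream instead of a device.
            System.out.println(countBytes(new ByteArrayInputStream(new byte[5000])));
            return;
        }
        // e.g. java RawDriveReader \\.\PhysicalDrive0  (Windows)  or  /dev/sda  (Linux)
        try (InputStream in = new BufferedInputStream(new FileInputStream(args[0]))) {
            System.out.println(countBytes(in) + " bytes read from " + args[0]);
        }
    }
}
```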

Now here is what I want to do next. I'd like to make my current project run across all drives in the computer to process their raw data, and the user won't have to specify the drive.

I am aware of:

That returns File objects for each logical drive (aka partition) in the computer, which is great. Actually it doesn't matter to me whether my project processes data for all physical drives or all logical drives. However, it seems those are treated as directories - isDirectory() returns true - and you can't actually read data directly from those. In Java, I know directories can be Files too. So it seems that doesn't help with reading raw data. When I try, it says "access is denied". And when I search on that, people say it's because it is a directory.
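One standard call that returns a File for each drive root on Windows is File.listRoots(); the sketch below assumes that is the kind of method meant. It prints each root's isDirectory() result and also rewrites a root such as C:\ into the Win32 device form \\.\C:, which the Windows CreateFile documentation describes as the way to open a volume itself rather than its root directory. Whether FileInputStream reads such a path the same way it reads \\.\PhysicalDrive0 is an assumption to verify (it will normally need administrator rights), and the helper name toVolumeDevicePath is hypothetical.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class DriveRoots {

    /** Hypothetical helper: turns a root like "C:\" into the Win32 volume device path "\\.\C:". */
    static String toVolumeDevicePath(String root) {
        if (root.length() < 2 || root.charAt(1) != ':') {
            throw new IllegalArgumentException("Not a drive-letter root: " + root);
        }
        return "\\\\.\\" + root.substring(0, 2);
    }

    public static void main(String[] args) {
        List<String> devices = new ArrayList<>();
        for (File root : File.listRoots()) {
            // On Windows each root is a directory such as C:\ ; on Unix there is a single root "/".
            System.out.println(root.getPath() + " isDirectory=" + root.isDirectory());
            String p = root.getPath();
            if (p.length() >= 2 && p.charAt(1) == ':') {
                devices.add(toVolumeDevicePath(p));
            }
        }
        // Candidate raw-volume paths; readability is an assumption, not checked here.
        System.out.println(devices);
    }
}
```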

Is there any way to find all physical drives in the local machine?

One method definitely crossed my mind. I could just try reading data on \\.\PhysicalDrive0, \\.\PhysicalDrive1, \\.\PhysicalDrive2, .... \\.\PhysicalDrive55 or something like that to detect drives. If some bytes are actually read, then those physical drives exist. As far as I know, attempting to read data is the only way to know whether a drive exists (apart from JNI).

But that's really not a good way to do it and probably a last resort. Is there a better way?
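A minimal sketch of that brute-force probe, for comparison with whatever better approach turns up. The upper bound, the 4096-byte test read (a whole number of sectors for both 512-byte and 4 KB sector drives), and the class name are arbitrary choices for illustration. An IOException covers both a missing drive and "Access is denied", so the probe cannot tell those apart; the JVM generally has to run as Administrator for the result to mean anything. Passing the check in as a Predicate lets the loop be exercised without real hardware.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class PhysicalDriveProbe {

    /** Builds the Windows device path for physical drive number n, e.g. \\.\PhysicalDrive0. */
    static String drivePath(int n) {
        return "\\\\.\\PhysicalDrive" + n;
    }

    /** Default check: the drive "exists" if at least one byte can be read from it. */
    static boolean canReadSomething(String path) {
        try (InputStream in = new FileInputStream(path)) {
            byte[] block = new byte[4096];
            return in.read(block) > 0;
        } catch (IOException e) {
            // Missing drive and "Access is denied" both land here; they are not distinguished.
            return false;
        }
    }

    /** Tries drive numbers 0..maxDrive and keeps every path the check accepts (gaps are allowed). */
    static List<String> probe(int maxDrive, Predicate<String> readable) {
        List<String> found = new ArrayList<>();
        for (int n = 0; n <= maxDrive; n++) {
            String path = drivePath(n);
            if (readable.test(path)) {
                found.add(path);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // 63 is an arbitrary upper bound for this sketch.
        System.out.println(probe(63, PhysicalDriveProbe::canReadSomething));
    }
}
```

Probing every number up to the bound, rather than stopping at the first failure, matters because drive numbers can have gaps, for example after a USB drive is removed.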

I'll add that I haven't really used JNI, and it's been a long time since I did anything in C++. But if that is the best way to go, I'll work on it.

I appreciate any help!
 
Tim Holloway
Saloon Keeper
Posts: 25653
Welcome to the Ranch, Michael!

The first thing to determine is what a "physical drive" really is. DOS drive letters are assigned to partitions, so it's common for multiple DOS "drives" to be on a single physical disk. There's also the question of how to deal with the metadata on a physical drive, relating both to partitioning and to the various types of filesystems. And then there's UEFI.

In Java, the universal filesystem model is based on Unix. In Unix there is always one root drive, and the other filesystems are linked (mounted) as directories under it. In the case of Windows, then, there should potentially be a "/A", "/C", "/D", and so forth. And there may be filesystem share aliases like "/M$". Anything directly under "/" that is a directory is thus a potential DOS disk. Unmounted partitions, however, will not show up there, and there may be other files and directories directly under "/" that are not filesystem mounts.

You have discovered some of that yourself, I can see.

The proper Java syntax to reference filesystem objects via an absolute path under the C drive would be "/C:". I think an alternative may exist - something like "/C|" - but memory is hazy. Of course, any filesystem object that the Java app doesn't have filesystem rights to is not going to be accessible.

That's about as far as I can take you though. I don't have Windows these days. After TurboTax went web-based, my last need for Windows died.
 
Campbell Ritchie
Marshal
Posts: 75874
As far as I can remember, under Windows® the drive letters aren't preceded by a /, and they go something like
  • A=1st floppy drive (!),
  • B=2nd floppy drive (did they ever use B?),
  • C=1st hard disc, or 1st partition on hard disc,
  • and removable USB drives and similar are G H I etc. I am not certain because I have hardly used Windows® either, but Wikipedia should prove me wrong.
     
    Tim Holloway
    Saloon Keeper
    Posts: 25653
    Yes, there were B drives. The standard floppy controller chip had an A/B drive select line.

    Now whether or not any MS/DOS machines were ever built with 2 floppies, I can't recall, but they were not uncommon on CP/M — the predecessor to MS/DOS-PC/DOS. Back before hard disks were an option.

    The hard disk controller was separate hardware from the floppy controller. The floppy controller was generally pre-installed on the motherboard, whereas the earliest hard drives generally came with a very individualistic controller card. Oh, the days before SATA and IDE! It was for this reason that hard drive assignments started at C.

    The "/C" in a Java file path is specifically for Java. It normalises the syntax to be Unix-like as well as establishes "C:" to be an actual disk mount. You can name directories or even files "C:" under Linux, so skipping the "/" can be hazardous. Consider a relative filesystem path of C:/program files. Now imagine you've created an absolute path of "/windows/backups/C:/Program Files". I can do exactly that.

    Note that while A and B are pretty firmly attached to floppy drives, everything from C on up is fair game. Depending on how you set up the system, C and F might be hard disk partitions, but D could be a CD-ROM and E might be a network share. The actual assignments were originally made by BIOS detection at boot time and were relatively static, but over the years they became more and more configurable.
     
    Campbell Ritchie
    Marshal
    Posts: 75874

    Tim Holloway wrote: . . . The "/C" in a Java file path is specifically for Java. . . .

    A bit like using / rather than \\ in paths and having the JVM convert them to \ if on Windows®.
    I don't know whether this part of the Java™ Tutorials will help. Or this part.
     
    Tim Holloway
    Saloon Keeper
    Posts: 25653

    Campbell Ritchie wrote:

    Tim Holloway wrote: . . . The "/C" in a Java file path is specifically for Java. . . .

    A bit like using / rather than \\ in paths and having the JVM convert them to \ if on Windows®.
    I don't know whether this part of the Java™ Tutorials will help. Or this part.



    Actually, there are places in Windows where a forward (real) slash is accepted in filename paths. But I never learned the rules, so I don't know when it applies.

    The whole reason why CP/M, DOS, and Windows, successively, use backslashes in filepaths comes from the fact that a lot of CP/M was lifted from the DEC PDP operating systems. Unlike Unix, which uses dashes to indicate a command switch, DEC used the forward slash (as do most legacy DOS utilities). That results in ambiguities when used with filenames on the command line, so backslashes became the CP/M standard. Note that similar issues are responsible for the differences in how ":" and ";" work on JVMs for command lines on different run platforms.

    There are other filesystem path syntaxes. I believe that early Apple filesystems used ":" as a path separator. The PR1ME minicomputer system used a "<" to indicate a volume root, and ">" as the directory separator (example: <SYSRES>CMDNC0>GREP). It is sometimes convenient to translate IBM mainframe filenames that way, also, but IBM did not historically support hierarchical filesystems, and only occasionally used common name elements (SYS1.LINKLIB, SYS1.LPA). IBM's VM/370 OS is actually more like DEC PDP disks, thus similar to CP/M and MS-DOS.

    The JVM's built-in libraries have conversion rules to take a Unix-format name and translate it to the local filesystem syntax, however. That is why, for maximum portability, I always recommend using Unix-format paths and employing the Java file pathname building and dissection services over brute-force string operations on file paths.
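As a small illustration of building and dissecting paths with java.nio.file instead of string concatenation (the directory and file names are made up):

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class PathBuilding {
    public static void main(String[] args) {
        // Each name element is passed separately; the platform's separator is inserted for you.
        Path report = Paths.get("projects", "scans", "results.csv");

        System.out.println(report);                // projects/scans/results.csv on Unix, projects\scans\results.csv on Windows
        System.out.println(report.getFileName());  // results.csv
        System.out.println(report.getParent());    // projects/scans (separator is platform-specific)
        System.out.println(report.getNameCount()); // 3

        // resolveSibling swaps the last element without any manual string surgery.
        System.out.println(report.resolveSibling("summary.txt").getFileName()); // summary.txt
    }
}
```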
     
    Michael Katich
    Greenhorn
    Posts: 10
    Hey guys, thanks for the replies! And thanks for the Ranch welcome. I've been here a bunch of times but never asked anything until now.

    That is some good information, though it's kind of peripheral to what I was looking for.

    I would like my program to cover all data areas - meaning not necessarily within a file system. That way deleted data is processed as well. My program does this now, but just for one user-specified drive at a time. I'd like it to work across all drives in the local machine.

    Tim Holloway wrote:The first thing to determine is what a "physical drive" really is.


    Well, for my purposes, I'm talking about running a Java program in Windows that has a File object with path \\.\PhysicalDrive0 for example, and the program reads that with BufferedInputStream/FileInputStream. That's what I'm doing and it works well. I don't want to read only current files that are visible to the user.

    Something like this-

    Tim Holloway wrote: The "/C" in a Java file path is specifically for Java. It normalises the syntax to be Unix-like as well as establishes "C:" to be an actual disk mount.


    I just experimented with this, using the following drive-letter Strings to create the File object:
  • C: -> FileNotFoundException (Access is denied) when performing FileInputStream.open(). File.isDirectory() returns true.
  • C:\\ -> FileNotFoundException (The system cannot find the path specified) when performing FileInputStream.open(). File.isDirectory() returns true.
  • /C -> FileNotFoundException (The system cannot find the path specified) when performing FileInputStream.open(). File.exists() returns false, and so do isFile() and isDirectory().
  • /C: -> FileNotFoundException (Access is denied) when performing FileInputStream.open(). File.isDirectory() returns true.

  • By the way, I also noticed that / or \\ can be used. It's good to know the JVM converts them.

    But in summary, I would like to read all drives and all data - not just files currently visible to users. Currently it can read individually specified drives such as \\.\PhysicalDrive0, all data. If I knew what drives existed in the local machine running the program, then I could do the same over all drives. Is there a good way to determine this?
     
    Campbell Ritchie
    Marshal
    Posts: 75874
    There must be a way to walk your filesystem using the more modern datatypes in this Java™ Tutorials section. Maybe the subsections about walking the file tree, and creating and reading directories will help you.
    Many people prefer always to use / as a path separator and let the JVM convert it to \ on Windows®.
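For reference, a minimal sketch of a file-tree walk using Files.walk from java.nio.file (the tutorial's walkFileTree is the visitor-based alternative). It only sees files the file system currently knows about, so deleted or unallocated data is never visited. The method name totalFileBytes is illustrative.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class TreeWalk {

    /** Sums the sizes of all regular files under root. Files.walk does not descend into linked directories by default. */
    static long totalFileBytes(Path root) throws IOException {
        // On a whole drive, an unreadable directory makes the stream throw UncheckedIOException;
        // Files.walkFileTree with a visitFileFailed override is the usual way to skip such entries.
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                        .mapToLong(p -> {
                            try {
                                return Files.size(p);
                            } catch (IOException e) {
                                throw new UncheckedIOException(e);
                            }
                        })
                        .sum();
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println(totalFileBytes(root) + " bytes in regular files under " + root.toAbsolutePath());
    }
}
```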
     
    Michael Katich
    Greenhorn
    Posts: 10
    I could walk the file system, but that would disregard data that's not in current files, such as deleted data. When I say "raw data", I'm referring to data that could be in a current file that the file system is aware of, but also data that could be outside the file system. They are treated the same for this purpose. The files don't matter, just the underlying data does.
     
    Tim Holloway
    Saloon Keeper
    Posts: 25653
    You have your work cut out for you.

    Originally, a physical disk had one or more boot sectors, a partition table, and partitions. Now they can alternatively have boot code, UEFI partition information (the GUID Partition Table, or GPT), and partitions. If memory serves, unlike the original DOS partition table, UEFI partition information isn't all in one place (and I'm not sure that extended DOS partition definitions were either).

    Then there's the matter of filesystems. There are TONs of filesystems and they're all different. For DOS/Windows, you had FAT, FAT32, NTFS, HPFS - the OS/2 version of NTFS, because Microsoft changed NTFS just to screw IBM. For Linux/Unix, filesystem options include ext2/ext3/ext4, xfs, btrfs, ReiserFS, overlay filesystems (popular for containerization and virtualization), and those are just the ones likely to be seen today. Apple has its own filesystem.

    And swap partitions. Which usually don't contain a formal filesystem of any kind.

    And every one of them has different ways of deleting files. Some nuke the deleted file data, some simply erase the directory entry or inode (Unix/Linux), and all of them have different ways to allocate data space.

    Oh yes, and then there are the low-level block managers like LVM that remap the physical blocks on a drive to make it easier to resize and mirror disk partitions. And journals that log disk activity.

    If you intended to do this as a general-purpose utility you could make a career of it.

    However, let's assume that you only care about Windows. Still, there are the legacy FAT file systems, the partition table (legacy or UEFI) and NTFS. And the icing on the cake. NTFS is NOT an open-source filesystem. Microsoft can and does (note present tense!) change it on a whim and not tell anyone what they did. Many foreign OS filesystem handlers that have NTFS drivers only support NTFS as read-only/use at own risk.

    So much fun awaits! Good Luck!
     
    Campbell Ritchie
    Marshal
    Posts: 75874

    Michael Katich wrote:. . . that would disregard data that's not in current files, such as deleted data. . . . .

    Yes, that does make it more complicated.
     
    Michael Katich
    Greenhorn
    Posts: 10
    Let's try taking a step back. You guys are answering a question that I did not ask.

    Thankfully I do not need to reconstruct file system/partition table/boot sector structures. Those can be completely disregarded as long as there is access to the raw data. I just want to find, for example, all instances of a particular 70 byte sequence that matches a set of criteria. That is it. It does not matter if it's in a current file or not, if it was deleted, if it's in a partition. I don't have a question about that part.

    I'm currently using File("\\.\PhysicalDrive0") to do just that on a single drive and it's working as needed under Windows. I've used it on drives that have both NTFS and FAT32 partitions and unallocated space. My program runs through all the data of a drive in one shot. It works great.

    This may work currently subject to the whim of Microsoft. I understand that, but I know this (meaning File("\\.\PhysicalDrive0")) has worked for quite a long time. I see it mentioned other places. I believe in Linux I could use File("/dev/sda") for the same effect, so there is another option if needed.

    What do I want some help with? Well I'd like to run my program over all drives and currently it only works on one user-specified drive. How do I do this?

    By the way, I do have a career related to this kind of thing. I haven't yet had to recreate the reading operations of a file system, but that could happen at some point. I do have other projects that walk the file system.
     
    Sheriff
    Posts: 27235

    Tim Holloway wrote:Now whether or not any MS/DOS machines were ever built with 2 floppies, I can't recall, but they were not uncommon on CP/M — the predecessor to MS/DOS-PC/DOS. Back before hard disks were an option.



    I'm fairly sure I worked on an MS-DOS box with two floppies. And I never worked with CP/M. But that was rarely a useful configuration for businesses; the hard drive was what made the system practical, even if it was only 5 MB.
     
    Tim Holloway
    Saloon Keeper
    Posts: 25653
    For Windows, you probably need to be looking at its Storage Management Subsystem. It may have a ReST API, at least in Windows 11.

    Yes, in Linux you can access any physical disk at the raw byte level if you know its /dev filename. There are a number of utilities that can assist in finding that, in addition to pseudo-filesystems like /proc.
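As one concrete example of those pseudo-filesystems, Linux's /proc/partitions lists every block device and partition the kernel knows about, one per line, under a header with the columns major, minor, #blocks and name. The sketch below parses that layout and turns each name into a /dev path. Separating whole disks from partitions (for example via /sys/block) is left out, and the class name is illustrative.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class ProcPartitions {

    /** Extracts device names from /proc/partitions-style lines, skipping the header and blank lines. */
    static List<String> deviceNames(List<String> lines) {
        List<String> names = new ArrayList<>();
        for (String line : lines) {
            String[] cols = line.trim().split("\\s+");
            // Data rows have four columns and start with a numeric major number; the header starts with "major".
            if (cols.length == 4 && cols[0].matches("\\d+")) {
                names.add(cols[3]);
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Linux only: the file does not exist on other systems.
        for (String name : deviceNames(Files.readAllLines(Paths.get("/proc/partitions")))) {
            System.out.println("/dev/" + name);
        }
    }
}
```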

    One caveat, however. You are not guaranteed a hit for your match sequence if you do raw access. Pretty much every filesystem - even the venerable FAT - allocates files in terms of logical sectors. All of the physical sectors in a logical sector will be contiguous, but the set of logical sectors for a given file can be splattered in random - often dynamically relocated - regions within the filesystem's partition. In the event that your magic byte sequence ends up crossing logical sector boundaries, you may not get a match.
     
    Matthew Bendford
    Master Rancher
    Posts: 245

    Michael Katich wrote:Let's try taking a step back. You guys are answering a question that I did not ask.



    How do I put this politely?

    Because you're trying to abuse Java for something it's just not designed for!

    Just because it is possible to access a device in a RAW way like /dev/sda or its windows equivalent doesn't mean it should be done in this way. There's a reason file systems were invented: to at least some degree, they represent data in a somewhat unified way to an OS through the various drivers, no matter the actual on-disk format.

    Sure - you MAY be able to piece together a file split across multiple physical and logical entities - but the way it's meant to be read is by its proper driver like ZFS (just as an example).

    So, your question in simple terms is: how do I iterate over the list of devices locally connected to the system? Well, as there are no such pseudo-paths as /proc, /dev or /sys on Windows, you have to use the Win32 API, which you can only do via JNI.

    Java is just not meant to do what you'd like it to do - so to get it done you have to use additional external endpoints, which on Windows always means interacting with the Win32 API in some way. So, unless you want to dive into JNI and write your own native code - or you're lucky enough to find what you need within JNA - you're pretty much out of luck, other than to rely on user input or try brute force.

    You could also exploit inter-process communication by calling "wmic diskdrive list brief" and reading back its output - but if you go this route you're doing it way wrong.
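For reference, the inter-process route described here (and advised against) could look roughly like the sketch below. It assumes wmic is installed, which is not guaranteed since Microsoft has deprecated it in recent Windows releases. Rather than depending on the exact column layout, it pulls out anything shaped like \\.\PHYSICALDRIVEn from the output; the regex approach and the NUL-stripping step (in case the output arrives as UTF-16) are defensive choices for this illustration, not documented behaviour.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WmicDrives {

    // Literal "\\.\PHYSICALDRIVE" followed by digits, in any letter case.
    private static final Pattern DRIVE_ID = Pattern.compile(
            Pattern.quote("\\\\.\\PHYSICALDRIVE") + "\\d+", Pattern.CASE_INSENSITIVE);

    /** Returns every physical-drive device ID found in the given command output. */
    static List<String> extractDriveIds(String output) {
        // Defensive: if UTF-16 output was decoded byte-wise, drop the interleaved NUL characters.
        String cleaned = output.replace("\0", "");
        List<String> ids = new ArrayList<>();
        Matcher m = DRIVE_ID.matcher(cleaned);
        while (m.find()) {
            ids.add(m.group());
        }
        return ids;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        Process p = new ProcessBuilder("wmic", "diskdrive", "list", "brief")
                .redirectErrorStream(true)
                .start();
        p.getOutputStream().close(); // wmic is reported to wait on stdin otherwise
        StringBuilder out = new StringBuilder();
        try (BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) {
                out.append(line).append('\n');
            }
        }
        p.waitFor();
        System.out.println(extractDriveIds(out.toString()));
    }
}
```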

    In addition to that: I don't think the way you want to scan a drive supports random access - so you rely on the sequential speed of the drive, which for HDDs tops out at around 250 MB/s ... reading multiple TB will take quite some time. It also heavily stresses the drive and relies on firmware error detection and correction - you waste any benefits of modern filesystems like integrity checks or fault resilience through redundant shadow copies.
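To put that figure in perspective, a rough estimate for one full sequential pass over a 4 TB drive, in decimal units and assuming the drive sustains 250 MB/s the whole way (HDDs usually slow down toward their inner tracks, so treat this as a lower bound):

```latex
t = \frac{4 \times 10^{12}\,\text{B}}{250 \times 10^{6}\,\text{B/s}} = 16\,000\,\text{s} \approx 4.4\,\text{h}
```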

    There're many tools out there way better suited for this task - Java just isn't one of them.
     
    Michael Katich
    Greenhorn
    Posts: 10

    Tim Holloway wrote:One caveat, however. You are not guaranteed a hit for your match sequence if you do raw access. Pretty much every filesystem - even the venerable FAT - allocates files in terms of logical sectors. All of the physical sectors in a logical sector will be contiguous, but the set of logical sectors for a given file can be splattered in random - often dynamically relocated - regions within the filesystem's partition. In the event that your magic byte sequence ends up crossing logical sector boundaries, you may not get a match.


    Thank you, good point, and yes I am aware of that. Modern hard drives have a sector size of 4 KB now. Before that it was 512 bytes. I believe that's true for SSDs also. So they technically do not read or write smaller chunks of data. It's definitely possible for such a small byte sequence to be split by fragmentation, though on average it wouldn't be. If it is, that's the way it goes. I am looking at unallocated/deleted data as well, so definitely a part of this process is unpredictable and depends on factors beyond my control.
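A quick check of "on average it wouldn't be": assuming every starting offset within a 4096-byte sector is equally likely, a 70-byte sequence straddles a sector boundary only when it starts in the last 69 bytes of the sector, roughly 1.7% of positions. The same arithmetic applies to whatever allocation unit (cluster) the file system actually uses, by substituting that size. The class and method names are made up for this sketch.

```java
public class BoundaryOdds {

    /** True if a run of len bytes starting at offset spills past the end of its sector. */
    static boolean crossesBoundary(long offset, int len, int sectorSize) {
        return (offset % sectorSize) + len > sectorSize;
    }

    public static void main(String[] args) {
        int sector = 4096, len = 70, crossing = 0;
        for (int start = 0; start < sector; start++) {
            if (crossesBoundary(start, len, sector)) {
                crossing++;
            }
        }
        // Prints: 69 of 4096 start positions cross (1.68%)
        System.out.printf("%d of %d start positions cross (%.2f%%)%n",
                crossing, sector, 100.0 * crossing / sector);
    }
}
```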

    Matthew Bendford wrote:Just because it is possible to access a device in a RAW way like /dev/sda or its windows equivalent doesn't mean it should be done in this way.


    Matthew Bendford wrote:So, your question in simple terms is: how do I iterate over the list of devices locally connected to the system? Well, as there are no such pseudo-paths as /proc, /dev or /sys on Windows, you have to use the Win32 API, which you can only do via JNI.


    Matthew, thanks for the reply. I appreciate the insight.

    When I asked "is there any way to do this", I definitely considered that "no", or "not very well" could be the answer. I honestly didn't know and didn't find other info out there so that's why I was asking. This is the discussion I wanted to have, and what I wanted to learn more about.

    Matthew Bendford wrote:So, unless you want to dive into JNI and write your own native code - or you're lucky enough to find what you need within JNA - you're pretty much out of luck, other than to rely on user input or try brute force.


    I do think I saw some software before, I think it was written in Java, and it detected drives before performing other operations. It had a displayed log and it was going down a list of like 100 drives or whatever it was it tried. So it could have been using the trial-and-error method, or brute force like you say. This isn't a problem and I can fall back on that for my project. If using JNI, native code, etc is a big barrier just for this one purpose then I will avoid it. I am unfamiliar with that, so this is part of my question. Doing trial-and-error on drives will not take much time, both for coding and for the program to actually do the checks. For something that will run over the entirety of large drives, drive detection is not a significant chunk of time (which brings me to your other point). But trial-and-error drive detection is definitely not elegant, and that's why I posted here.

    Matthew Bendford wrote:In addition to that: I don't think the way you want to scan a drive supports random access - so you rely on the sequential speed of the drive, which for HDDs tops out at around 250 MB/s ... reading multiple TB will take quite some time.


    Yes, random access is not actually useful for this project's purpose. The bytes I'd like to find could be located anywhere, so I wouldn't need to jump to a specific location. And from what I understand, random access is somewhat slower than sequential access. This program has to do checks that move only one byte at a time to look for a matching byte sequence. I read larger chunks for better performance, but the checks are done with a "window" that moves one byte at a time. If I want, I can also add a way to specify a beginning and ending offset to optionally limit the data. That would use skip() on the BufferedInputStream, which seems quick enough.
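The chunked-read, byte-at-a-time window described above has one subtle point: a match can straddle two chunks. A minimal sketch, assuming an exact byte pattern stands in for the real matching criteria, carries the last pattern.length - 1 bytes of each chunk into the next round so nothing is missed or counted twice. It also loops around skip(), because InputStream.skip may skip fewer bytes than requested. Class and method names are illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class ChunkedWindowSearch {

    /** Skips exactly n bytes unless the stream ends first; returns how many were skipped. */
    static long skipFully(InputStream in, long n) throws IOException {
        long done = 0;
        while (done < n) {
            long s = in.skip(n - done);
            if (s <= 0) {
                // skip() made no progress; read one byte to tell a slow stream from end of stream.
                if (in.read() == -1) {
                    break;
                }
                s = 1;
            }
            done += s;
        }
        return done;
    }

    /** Returns the stream offsets of every occurrence of pattern, reading chunkSize bytes at a time. */
    static List<Long> find(InputStream in, byte[] pattern, int chunkSize) throws IOException {
        if (pattern.length == 0 || chunkSize <= 0) {
            throw new IllegalArgumentException("pattern and chunkSize must be non-empty");
        }
        List<Long> hits = new ArrayList<>();
        int keep = pattern.length - 1;          // bytes carried over so straddling matches are seen
        byte[] buf = new byte[keep + chunkSize];
        int filled = 0;                         // number of valid bytes in buf
        long bufStart = 0;                      // stream offset of buf[0]
        int n;
        while ((n = in.read(buf, filled, chunkSize)) != -1) {
            filled += n;
            // The window moves one byte at a time over every position with a full pattern's worth of data.
            for (int i = 0; i + pattern.length <= filled; i++) {
                if (matchesAt(buf, i, pattern)) {
                    hits.add(bufStart + i);
                }
            }
            // The last `keep` positions were not fully checked; move them to the front for the next round.
            int tail = Math.min(keep, filled);
            System.arraycopy(buf, filled - tail, buf, 0, tail);
            bufStart += filled - tail;
            filled = tail;
        }
        return hits;
    }

    static boolean matchesAt(byte[] buf, int pos, byte[] pattern) {
        for (int j = 0; j < pattern.length; j++) {
            if (buf[pos + j] != pattern[j]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "xxABCxxxxABCx".getBytes(StandardCharsets.US_ASCII);
        byte[] pattern = "ABC".getBytes(StandardCharsets.US_ASCII);
        // A 4-byte chunk splits the first match across two reads ("xxAB" | "Cxxx").
        System.out.println(find(new ByteArrayInputStream(data), pattern, 4)); // [2, 9]
    }
}
```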

    Lots of things that read or copy entire drives can take a long time. That is the way it is and accepted.

    Matthew Bendford wrote:There're many tools out there way better suited for this task - Java just isn't one of them.


    Well it's working great for what I have. I guess it's just not suited well for dealing with physical drives and detecting them. That is completely understandable because of the nature of Java, the JVM running on top of the operating system and all that. I understand. Java doesn't have nuts and bolts access to hardware. If it turned out there was no way to access physical drives (due to a change from MS or something) then I would just make a drive image file first (with a separate off-the-shelf tool) before running my program on that file, or use it in Linux instead. But it's nice there is a way to do that.
     
    Tim Holloway
    Saloon Keeper
    Posts: 25653
    As of Windows 8, so I understand, the administration of persistent storage is done by the Storage Management Subsystem. Apparently, storage providers (disk drives, network drives, RAID arrays, possibly USB sticks) all hook into it to be given a standard management API.

    As of Windows 11, there is the hint that in addition to whatever Windows DLL or COM service interfaces with Storage Management it might be possible to talk to Storage Management via a ReST HTTP call. That is, as a local web service. A lot of products favor ReST for such things these days.

    So in that event you could talk to it using basic Java web client code and not need JNI or other third-party libraries. Not true "write-once/run-anywhere", since it only works when the target is a Windows machine, but at least avoiding anything exotic.
     
    Matthew Bendford
    Master Rancher
    Posts: 245
    @OP
    My point is this: because of file system drivers, data may not end up on disk the way it's represented through the driver.

    Example: A plain text file containing "Hello World" MAY end up with exactly that binary sequence on disk - but thanks to character encoding, big endian vs little endian and metadata, I highly doubt that is actually the case.

    In other words: You mentioned the example of finding a specific 70 byte sequence. Question: Is this 70 byte sequence the actual raw binary on-disk-format data - or is it the logical information as transformed by the driver and encodings and such?

    Unless you search for the actual on-disk format data - which already requires knowledge of how the filesystem driver does its conversion and maybe interlaces it with metadata, error detection and all that - you won't find "Hello World" in the RAW stream.

    I don't say it's impossible - but it depends on what kind of data you're looking for and if it's the raw on-disk format or how the several encodings involved represent it to the OS.

    As for random access: most partition tools set up GPT disks using sectors 0-33 for the GPT and start the first partition at sector 2048. So there's a huge gap between sectors 34 and 2047 that is simply expected to be empty. Yes, this area can be used to store data like a 2nd-stage bootloader or such - but that's not for the average user.
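In byte terms, assuming the common 512-byte logical sector size (LBA 0 holds the protective MBR, LBA 1 the GPT header, LBAs 2 to 33 the partition entries; drives with 4096-byte logical sectors lay this out differently):

```latex
\begin{aligned}
\text{first partition offset} &= 2048 \times 512\,\text{B} = 1\,048\,576\,\text{B} = 1\,\text{MiB} \\
\text{gap (LBA 34 to 2047)} &= (2047 - 34 + 1) \times 512\,\text{B} = 2014 \times 512\,\text{B} = 1\,031\,168\,\text{B}
\end{aligned}
```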

    Just out of pure personal curiosity: what kind of data are you trying to find in the raw binary stream? Some crypto key material? Some "deleted" data? At least for the latter there are countless recovery tools out there. As for the crypto stuff: I don't see that working.
     
    Tim Holloway
    Saloon Keeper
    Posts: 25653

    Matthew Bendford wrote:@OP
    My point is this: because of file system drivers, data may not end up on disk the way it's represented through the driver.

    Example: A plain text file containing "Hello World" MAY end up with exactly that binary sequence on disk - but thanks to character encoding, big endian vs little endian and metadata, I highly doubt that is actually the case.



    I don't think so. File system drivers tend to copy data verbatim. Stuff like byte-swapping usually happens at a higher level and almost never on strings and binary (BYTE) sequences. About the worst that could happen would be that the data would be compressed and/or encrypted. Or in extreme cases, encoded for an unexpected code page (e.g., EBCDIC instead of ASCII). But as far as I know, even Unicode isn't byte-swapped. And in any event, as I said, that stuff happens before the data gets sent to the disk driver.
     
    Michael Katich
    Greenhorn
    Posts: 10
    Believe it or not, I look at raw data in a hex editor quite often. This can be done. Like I said, my program already works. I’ve been using example data when I test. My question wasn’t about that, but I’m happy to talk about it a little more, so I’ll do that.

    Save “Hello World” to a plain ASCII text file (Windows notepad ANSI or UTF-8), and “Hello World” shows up in the raw data. Save it as a UTF-16 text file and then it’s still in raw data except now it uses 2 bytes per character instead of 1. If your digital camera model is a “Hello World 900”, then you’ll see “Hello World 900” in the EXIF data of digital photo files taken by that camera. Not everything is ASCII, but that gives you an idea. I have seen EBCDIC before but not much. If you have an interest, give it a try. Now if there is encryption or compression, that’s a different story, and the data would truly be stored differently.
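    The encoding point translates directly into a search: encode the target text in each plausible charset and look for each resulting byte pattern. A minimal sketch, assuming the data is already in memory as a byte array (a real scanner would work chunk by chunk); `MultiEncodingSearch.findAll` is a hypothetical name, and the naive comparison is chosen for clarity, not speed.

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: finds a text string in raw bytes under several encodings. */
final class MultiEncodingSearch {
    /** Returns, per charset name, every offset where the encoded text occurs. */
    static Map<String, List<Integer>> findAll(byte[] data, String text, List<Charset> charsets) {
        Map<String, List<Integer>> hits = new LinkedHashMap<>();
        for (Charset cs : charsets) {
            byte[] pattern = text.getBytes(cs); // e.g. 1 byte/char for ASCII, 2 for UTF-16
            List<Integer> offsets = new ArrayList<>();
            // Naive scan: compare the pattern at every offset.
            for (int i = 0; i + pattern.length <= data.length; i++) {
                int j = 0;
                while (j < pattern.length && data[i + j] == pattern[j]) j++;
                if (j == pattern.length) offsets.add(i);
            }
            hits.put(cs.name(), offsets);
        }
        return hits;
    }
}
```

    For example, findAll(data, "Hello World", List.of(StandardCharsets.US_ASCII, StandardCharsets.UTF_16LE, StandardCharsets.UTF_16BE)). Passing UTF_16LE/UTF_16BE rather than UTF_16 matters, because String.getBytes with UTF_16 prepends a byte-order mark that would then become part of the pattern.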

    You guessed it, encryption keys are the main target in my example - a few different kinds which can be in just raw binary data or ASCII text. But it would work on any other types of small sequences of data provided there is a restrictive enough definition to limit false positives. The ability to find physical drives would be useful for other projects as well.
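    One way to make a definition "restrictive enough" is to demand an exact length and character class. A minimal sketch, assuming a hypothetical target of a 256-bit key written as exactly 64 ASCII hex digits; longer runs (for example, part of a hex dump) are rejected rather than sliced. The names are illustrative, and a real definition would probably add further checks.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: offsets of ASCII hex-digit runs of exactly the given length. */
final class HexRunFinder {
    static boolean isHex(byte b) {
        return (b >= '0' && b <= '9') || (b >= 'a' && b <= 'f') || (b >= 'A' && b <= 'F');
    }

    static List<Integer> find(byte[] data, int length) {
        List<Integer> hits = new ArrayList<>();
        int runStart = -1; // -1 means "not inside a hex run"
        // Loop one past the end so a run that reaches the end of data is also closed.
        for (int i = 0; i <= data.length; i++) {
            boolean hex = i < data.length && isHex(data[i]);
            if (hex && runStart < 0) runStart = i;   // a run begins
            if (!hex && runStart >= 0) {             // a run ended just before i
                if (i - runStart == length) hits.add(runStart); // exact length only
                runStart = -1;
            }
        }
        return hits;
    }
}
```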

    I am aware of existing deleted file recovery tools, how they work, their effectiveness. I won’t try to recreate something if there is an off-the-shelf tool that already does it well. They are quite useful, but they don’t do everything. They only work on some file types in which there is a good heuristic of identification and the beginning and end of the file (known as file “carving”). What about corruption and partial files? When all I need is a 70 byte sequence that matches some criteria, requiring whole intact files (deleted or not) is unnecessary.

    Your mention of big-endian vs little-endian reminds me, I wrote software (also in Java) to recover data from corrupt SQLite databases - holding text messages from Android phones. The output was a CSV file. I did have to deal with multiple data formats for the various table columns, timestamps, ASCII text and numbers, etc. The timestamps were actually little-endian seconds since the epoch IIRC. I believe this was just what SQLite did, nothing more. I wrote code to convert either big or little-endian seconds or milliseconds to human-readable Strings. But back to SQLite - there might be a tool out there that can recover whole deleted SQLite database files, but if it’s corrupt or just a fragment, then what? You can’t open it in SQLite. It’s a case where some data was better than none.
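    A conversion like that mostly comes down to picking the byte order and the unit. A minimal sketch, assuming the field is a signed 64-bit count of seconds or milliseconds since the Unix epoch (1970-01-01T00:00:00Z); the 8-byte width, the `RawTimestamp` name and the UTC output are assumptions for illustration, not a description of SQLite's on-disk format.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.time.Instant;

/** Hypothetical helper: turns 8 raw bytes into an ISO-8601 UTC date-time string. */
final class RawTimestamp {
    static String toIso(byte[] raw, int offset, ByteOrder order, boolean milliseconds) {
        // Read exactly 8 bytes starting at offset, in the requested byte order.
        long value = ByteBuffer.wrap(raw, offset, 8).order(order).getLong();
        // Instant throws DateTimeException for values outside its range,
        // which random bytes can easily produce.
        Instant instant = milliseconds ? Instant.ofEpochMilli(value) : Instant.ofEpochSecond(value);
        return instant.toString(); // ISO-8601 in UTC, e.g. 2021-01-01T00:00:00Z
    }
}
```

    Since random bytes decode to nonsense dates, a plausibility window on the result is a cheap way to filter candidates.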

    I am fully aware that the people that designed the systems on which we store data did not have this in mind. Particularly for deleted or corrupt files - stuff is broken. Once it gets to that point, the file system does not have an answer. You said that I try to abuse Java for something it’s not designed for. Well it works well enough processing data and that’s mainly what I use it for. The detection of physical drives would have been a nice extra that may not be possible but I didn’t know enough and didn’t find much info… so I asked.

    “Yes, in Linux you can access any physical disk at the raw byte level if you know its /dev filename.”

    This is great. So if MS ever removes that ability, I could always move to Linux. Chances are that wouldn’t be removed from Linux, right?
     
    Tim Holloway
    Saloon Keeper
    Posts: 25653
    The main thing about attempting to search by raw blocks rather than with knowledge of the filesystem is indeed the possibility that you'll miss something because it spanned non-contiguous logical sectors. Technically, you could potentially also get false positives when two contiguous logical sectors happen to match up, but that's a pretty low probability.
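    Fragmentation can't be repaired at the raw-block level, but a related and avoidable miss is a match that straddles two read buffers in a chunked scan. A minimal sketch, assuming a plain byte pattern and any InputStream (a BufferedInputStream over a file or device would do): the last pattern.length - 1 bytes of each chunk are carried into the next one so such matches are still seen, and each match is reported once. `ChunkedSearch` is a hypothetical name.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: streaming byte-pattern search that also catches matches across buffer edges. */
final class ChunkedSearch {
    static List<Long> find(InputStream in, byte[] pattern, int chunkSize) throws IOException {
        if (pattern.length == 0 || chunkSize < 1) throw new IllegalArgumentException();
        List<Long> hits = new ArrayList<>();
        int keep = pattern.length - 1;            // bytes carried over from the previous chunk
        byte[] window = new byte[keep + chunkSize];
        int carried = 0;                           // valid carried bytes at the start of window
        long windowStart = 0;                      // stream offset of window[0]
        int n;
        while ((n = in.read(window, carried, chunkSize)) != -1) {
            int valid = carried + n;
            for (int i = 0; i + pattern.length <= valid; i++) {
                int j = 0;
                while (j < pattern.length && window[i + j] == pattern[j]) j++;
                if (j == pattern.length) hits.add(windowStart + i);
            }
            // Keep the tail: it cannot hold a complete match yet, so nothing is counted twice.
            int tail = Math.min(keep, valid);
            System.arraycopy(window, valid - tail, window, 0, tail);
            windowStart += valid - tail;
            carried = tail;
        }
        return hits;
    }
}
```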

    You can, of course, do what you want for your own personal benefit. Just expect that if you attempt to commercialize it, the "gotchas" will result in a LOT of customer support calls.

    As far as SQLite goes, it's tantamount to recovering FoxPro database files. SQLite, as its name suggests, doesn't resort to fancy data optimization and compression tricks, so a file undelete combined with some creative repair (or at least extraction) will probably do well. Something like IBM's DB2 is another matter. It's not open source, and the binary structure of DB2 databases is such that DB2 database files on z/OS, iSeries and eSeries (Windows/Linux) are all completely unintelligible to each other. And, of course, DB2 is far more likely to use elaborate data-distorting schemes to compact data and optimize access speed. IBM has been doing that for a very long time, and over the years I've seen some of the tricks they used in pre-DB2 products.

    There are certain things that are very unlikely to change in Linux, or indeed in Unix (including macOS). The /proc and /dev filesystems are among them. Probably also /etc, but you never know what Lennart Poettering may spring next. Knowing where to look under /dev is much less predictable. Linux doesn't have an official Storage Manager subsystem, but one of the places that commands looking for offline disks and disk-like resources consult is the dynamic device manager (udev). Other places include /proc (exact location may vary) and /sys.

    Just as an aside, the /etc directory serves in much the same capacity for Unix as the Windows Registry does. With the welcome difference that you can easily do partial backup and restore operations using simple file utilities. Originally in Unix, some /etc files contained binary data. Linux has always preferred text only, with a few rogue exceptions.
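    For getting a device list up front on Linux, /sys is one of the places mentioned above. A minimal sketch, assuming sysfs is mounted at /sys: each entry under /sys/block is a whole block device (its partitions appear as subdirectories), and its `size` file holds the capacity in 512-byte units regardless of the device's real sector size, which also addresses the unknown-size problem. The directory is a parameter so the logic can be tried against a fake tree; loop, RAM and optical devices are included and would need filtering, and actually opening the matching /dev node usually requires root. `SysBlockDevices` is a hypothetical name.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

/** Hypothetical helper: maps block device names (sda, nvme0n1, ...) to sizes in bytes. */
final class SysBlockDevices {
    static Map<String, Long> list(Path sysBlock) {
        Map<String, Long> sizes = new TreeMap<>();
        try (Stream<Path> entries = Files.list(sysBlock)) {
            for (Path dev : (Iterable<Path>) entries::iterator) {
                Path sizeFile = dev.resolve("size");
                if (!Files.isReadable(sizeFile)) continue; // not a block device entry
                long sectors = Long.parseLong(Files.readString(sizeFile).trim());
                sizes.put(dev.getFileName().toString(), sectors * 512L); // always 512-byte units
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sizes;
    }
}
```

    On a real system, SysBlockDevices.list(Path.of("/sys/block")) would be the call, with "/dev/" + name as the path to open.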
     
    Michael Katich
    Greenhorn
    Posts: 10

    Tim Holloway wrote:The main thing about attempting to search by raw blocks rather than with knowledge of the filesystem is indeed the possibility that you'll miss something because it spanned non-contiguous logical sectors. Technically, you could potentially also get false positives when two contiguous logical sectors happen to match up, but that's a pretty low probability.


    Good points. I guess I could just have it also search through current files in the file system if I wanted and that would prevent the misses you described. It could be optional maybe. That would just cost a bit more time developing and a bit more run-time. I don't think it's needed, but I could do a lot more testing to see if any "planted" matches get missed for these reasons. As for false positives from that, we'll see how that goes in usage but I also suspect it won't be much of a problem.

    Tim Holloway wrote:You can, of course, do what you want for your own personal benefit. Just expect that if you attempt to commercialize it, the "gotchas" will result in a LOT of customer support calls.


    This definitely wouldn't be put out for consumer use and no guarantees will be given. The nature of raw data and deleted data is that there is uncertainty and unpredictability.

    Tim Holloway wrote:As far as SQLite goes, it's tantamount to recovering FoxPro database files. SQLite, as its name suggests, doesn't resort to fancy data optimization and compression tricks, so a file undelete combined with some creative repair (or at least extraction) will probably do well. Something like IBM's DB2 is another matter. It's not open source, and the binary structure of DB2 databases is such that DB2 database files on z/OS, iSeries and eSeries (Windows/Linux) are all completely unintelligible to each other. And, of course, DB2 is far more likely to use elaborate data-distorting schemes to compact data and optimize access speed. IBM has been doing that for a very long time, and over the years I've seen some of the tricks they used in pre-DB2 products.


    That SQLite work ended up finding ways to carve table rows based on the most easily identified and reliable markers in the raw data. I didn't use a deleted file recovery tool first; who knows how well it would have done, or whether it would have cut something off. The end result produced the largest possible output (with a subset of useful columns) from input of varying quality. The key here is that useful data was returned even if there wasn't much left of the database or it was badly corrupted. I don't know DB2 well, but of course this process would not work for many other types of databases.

    Tim Holloway wrote:There are certain things that are very unlikely to change in Linux, or indeed in Unix (including macOS). The /proc and /dev filesystems are among them. Probably also /etc, but you never know what Lennart Poettering may spring next. Knowing where to look under /dev is much less predictable. Linux doesn't have an official Storage Manager subsystem, but one of the places that commands looking for offline disks and disk-like resources consult is the dynamic device manager (udev). Other places include /proc (exact location may vary) and /sys.


    Thanks for the insight about this, and from your previous reply about Windows.
     
    Tim Holloway
    Saloon Keeper
    Posts: 25653
    One of the big benefits of un-delete tools is that they restore the deleted data back into the filesystem, which protects it from further corruption. In multi-tasking OS's with lots of activity there's always the risk that some task might demand disk space that gets resolved from the deleted data area.

    So un-deleting helps prevent that. Along with that comes the strategy that one should immediately take the damaged filesystem offline or lock it for allocation (depending on what the filesystem allows) if at all possible.

    FAT filesystems basically just nuked the filename directory entry as I recall, and a single character replacement could restore the file (again, presuming no one had demanded any of the freed file space). In Unix/Linux, typically the inode entry was deleted, leaving the data, so you could simply create a new inode. Some file systems have a journalling system that can help. And of course many systems have some sort of "trashcan" or other system such as a "time machine" that keeps a file easily recoverable for greater or lesser periods of time.
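    The FAT case can be seen at the byte level. In FAT12/16/32, a short (8.3) directory entry is 32 bytes; its first byte is 0xE5 when the entry has been deleted and 0x00 when no further entries follow, while the starting cluster and file size are left in place. A minimal sketch, assuming a directory's raw bytes are already in memory; long-file-name entries are skipped, the names are hypothetical, and whether the data is still intact depends on those clusters not having been reused.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: lists deleted 8.3 entries found in a raw FAT directory buffer. */
final class FatDeletedEntries {
    record Entry(String name, long firstCluster, long size) {}

    static List<Entry> scan(byte[] dir) {
        List<Entry> deleted = new ArrayList<>();
        ByteBuffer buf = ByteBuffer.wrap(dir).order(ByteOrder.LITTLE_ENDIAN);
        for (int off = 0; off + 32 <= dir.length; off += 32) {
            int first = dir[off] & 0xFF;
            if (first == 0x00) break;                       // no further entries
            if (first != 0xE5) continue;                    // only deleted entries
            if ((dir[off + 11] & 0x3F) == 0x0F) continue;   // long-file-name fragment
            // The first character was overwritten by 0xE5, so show it as '?'.
            String base = "?" + new String(dir, off + 1, 7, StandardCharsets.US_ASCII).trim();
            String ext = new String(dir, off + 8, 3, StandardCharsets.US_ASCII).trim();
            // Cluster high word (offset 20, FAT32 only) and low word (offset 26).
            long cluster = ((long) (buf.getShort(off + 20) & 0xFFFF) << 16)
                    | (buf.getShort(off + 26) & 0xFFFF);
            long size = buf.getInt(off + 28) & 0xFFFFFFFFL; // unsigned 32-bit file size
            deleted.add(new Entry(ext.isEmpty() ? base : base + "." + ext, cluster, size));
        }
        return deleted;
    }
}
```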
     
    Michael Katich
    Greenhorn
    Posts: 10

    Campbell Ritchie wrote:MK: Please don't edit posts after they have been replied to; any changes won't be reflected in the answers and that causes confusion about what we read. I am refusing the changes. Please post the new information as a new post.


    Ok sure thing. They were very small edits by the way. I just changed a quote to use the actual forum quote formatting, and changed "human-readable Strings" to "human-readable date-time Strings". I figured those didn't warrant mention in a new post.

    Campbell Ritchie wrote:One of the big benefits of un-delete tools is that they restore the deleted data back into the filesystem, which protects it from further corruption. In multi-tasking OS's with lots of activity there's always the risk that some task might demand disk space that gets resolved from the deleted data area.


    That is the whole point of an un-delete tool, right? There is no point leaving data you want in "free space". You are definitely right about continued activity overwriting data in "free space": the next write could land on top of data you want, and then it's gone for good. Also, SSDs, for example, erase "free space" blocks in the background once the OS has marked them as unused with the TRIM command, so you'll find relatively little latent data compared to a hard disk. They do that because NAND flash has to be erased before it can be rewritten, so writing to an already-erased block is faster than writing to a "non-empty" one.

    Campbell Ritchie wrote:So un-deleting helps prevent that. Along with that comes the strategy that one should immediately take the damaged filesystem offline or lock it for allocation (depending on what the filesystem allows) if at all possible.


    I agree. I recommend people shut off and not use storage devices that may have needed data outside the file system. But of course most people don't know this until they start learning about data loss and data recovery. Making a clone as soon as possible and working from that is the way to go. You always have the clone to fall back on then. If you use a deleted file recovery tool, don't install it on the storage in question, and don't set output to there either. There are also tools that include write-blockers for accessing storage devices while preventing writes. And if verifiability is required, a hash can be made of the entire storage. That way it can be redone at a later date and matched to verify nothing was altered in between. For best chances of recovery, use a professional service.
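    The hashing step fits the stream-based reading the tools already use. A minimal sketch using the JDK's MessageDigest; SHA-256 is an assumption here (any algorithm MessageDigest supports would work the same way), and the input can be a FileInputStream on an image file or, with enough privileges, on a raw device path. The 1 MiB buffer is a multiple of common sector sizes; how the end of a raw device is signalled (end of stream or an exception) is worth checking on the target platform. `StreamHasher` is a hypothetical name.

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper: hashes everything an InputStream delivers until end of data. */
final class StreamHasher {
    static String sha256Hex(InputStream in) throws IOException {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every Java platform", e);
        }
        byte[] buffer = new byte[1 << 20]; // 1 MiB, a multiple of 512 and 4096
        int n;
        while ((n = in.read(buffer)) != -1) {
            md.update(buffer, 0, n);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) hex.append(String.format("%02x", b)); // lowercase hex
        return hex.toString();
    }
}
```

    Recording the hex string alongside the clone lets the same call be repeated later and the two values compared.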
     
    Tim Holloway
    Saloon Keeper
    Posts: 25653
    Incidentally, I back up all critical files at 4 AM each morning, and my fileshare drives are mirrored on 3 separate machines.

    Recovery is all well and good, but not having to recover is even better.

    In addition to nightly backups, really volatile stuff like programming projects get local git commits, which themselves are a type of backup.

    Some databases, incidentally, keep logs that can be used to roll back or recover transactions.
     
    Michael Katich
    Greenhorn
    Posts: 10
    Absolutely – you should back up if you value your data.
     
    Tim Holloway
    Saloon Keeper
    Posts: 25653

    Michael Katich wrote:Absolutely – you should back up if you value your data.

    My server farm is not very large, but I can count on one or 2 drives to fail each year. Backups aren't just a luxury or a "bother".

    If it's important, back it up. If you don't, sooner or later you'll wish you had.
     
    Matthew Bendford
    Master Rancher
    Posts: 245
    Back to the main question: Is there a smarter way to discover storage devices from Java on Windows other than brute force?
    One idea that comes to mind from the ZFSonWindows quickstart: wmic diskdrive list brief
    There may be other ways, like directly getting some sort of low-level FileDescriptor, but as I said: that's locked away behind the Win32 API and requires native code, at least in some form like JNA.
    I've used JNA before to play around with some basics of the Windows API. It's even powerful enough to get around security and install a system-wide keylogger, so you might be able to interact with the storage subsystem with it. But even on Linux it would just be a simple iteration over /dev/sdX, or using the metadata in places like /sys or /proc.
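    The wmic idea can be wired up with ProcessBuilder. A minimal sketch, assuming `wmic diskdrive get DeviceID` is available and prints one \\.\PHYSICALDRIVEn per line; Microsoft has deprecated WMIC in recent Windows versions, so a PowerShell query such as Get-CimInstance Win32_DiskDrive may be needed instead. The parsing is split out because it is the only platform-independent part, and the output encoding when piped is assumed to be the platform default. `WmicDrives` is a hypothetical name.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper: asks WMIC for physical drive paths (Windows only). */
final class WmicDrives {
    private static final String PREFIX = "\\\\.\\PHYSICALDRIVE"; // i.e. \\.\PHYSICALDRIVE

    /** Platform-independent part: pull drive paths out of WMIC's text output. */
    static List<String> parse(String output) {
        List<String> drives = new ArrayList<>();
        for (String line : output.split("\\R")) { // \R matches any line break
            String token = line.trim();
            if (token.toUpperCase(Locale.ROOT).startsWith(PREFIX)
                    && token.length() > PREFIX.length()
                    && token.substring(PREFIX.length()).chars().allMatch(Character::isDigit)) {
                drives.add(token);
            }
        }
        return drives;
    }

    /** Windows-only part: runs the command and parses what it prints. */
    static List<String> query() {
        try {
            Process p = new ProcessBuilder("wmic", "diskdrive", "get", "DeviceID")
                    .redirectErrorStream(true).start();
            String out = new String(p.getInputStream().readAllBytes(), Charset.defaultCharset());
            p.waitFor();
            return parse(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return List.of();
        }
    }
}
```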
     
    Tim Holloway
    Saloon Keeper
    Posts: 25653
    "wmic" is not a Java function, it's an external command. Technically cheating, and if you're going to use wmic, might as well use ALL available external commands.

    Java absolutely cannot see any non-mounted storage via API methods. The best you can do is look for pseudo-files, like /proc and /dev.

    Not all storage in Linux is /dev/sdX, though. That name originally indicated SCSI drives and was later re-purposed for SATA. IDE disks (and some SATA) were /dev/hdX, and CD/DVD devices typically show up as /dev/srX (with SCSI-generic /dev/sgX nodes as well, and often aliased to things like /dev/cdrom, /dev/dvd0, etc.). And of course floppies: /dev/fdX.

    And then there are USB storage devices. Not everything under /dev/usb is a storage device; some are mice, keyboards and other things.

    Java is my first-choice programming language for most projects. But for something like this, I'd be looking elsewhere.
     
    Campbell Ritchie
    Marshal
    Posts: 75874

    Michael Katich wrote:. . .

    Campbell Ritchie wrote:One of the big benefits of un-delete tools . . .

    . . .

    Campbell Ritchie wrote:So un-deleting helps prevent that. . . .

    . . . .

    Careful, please. I don't think I said that; I think it was somebody else.
     
    Matthew Bendford
    Master Rancher
    Posts: 245

    Tim Holloway wrote:"wmic" is not a Java function, it's an external command. Technically cheating, and if you're going to use wmic, might as well use ALL available external commands.

    Java absolutely cannot see any non-mounted storage via API methods. The best you can do is look for pseudo-files, like /proc and /dev.

    Not all storage in Linux is /dev/sdX, though. That name originally indicated SCSI drives and was later re-purposed for SATA. IDE disks (and some SATA) were /dev/hdX, and CD/DVD devices typically show up as /dev/srX (with SCSI-generic /dev/sgX nodes as well, and often aliased to things like /dev/cdrom, /dev/dvd0, etc.). And of course floppies: /dev/fdX.

    And then there are USB storage devices. Not everything under /dev/usb is a storage device; some are mice, keyboards and other things.

    Java is my first-choice programming language for most projects. But for something like this, I'd be looking elsewhere.


    I'm well aware that using any external command is kind of "cheating", but so is the "raw" access OP already mentioned, using the rather cryptic "\\.\PHYSICALDRIVEn" "path", which just leverages the underlying drivers.
    In fact, using external commands has two other sides to it: a) it's somewhat the Linux way: "a tool is meant to do one thing only, but do it well" (a bit like how classes should be designed), and b) it's just another way of accessing the Win32 API, but instead of some "internal kernel magic" it's calling a sub-process and reading its output.
    As for accessing devices on Linux: I'm also aware of that, no need to point it out, but it's the same "brute force" or "cheating" that OP is trying to avoid: use some pattern and iterate through what it returns until it causes an error. There's no difference between going through /dev/sd[a-z] and \\.\PHYSICALDRIVE[0-9]; it's just brute force until the next iteration throws an error because there are only so many devices.
    Using wmic is about the same as calling (new File("/dev")).listFiles((f,s) -> s.matches("sd[a-z]")); - it gets a list upfront to iterate over instead of brute-forcing until an exception is raised.
    So, to wrap back around: there are exactly these two ways:
    1) use \\.\PHYSICALDRIVEn as a template and scan upward from 0 until n+1 no longer exists
    2) use SOME of the many available methods to get a list containing every target device and iterate over it, avoiding the exception for n+1 because we already know what n is
    Which of these two ways OP wants to go comes from the original question:

    One method definitely crossed my mind. I could just try reading data on \\.\PhysicalDrive0, \\.\PhysicalDrive1, \\.\PhysicalDrive2, .... \\.\PhysicalDrive55 or something like that to detect drives. If some bytes are actually read, then those physical drives exist. As far as I know, attempting to read data is the only way to know whether a drive exists (apart from JNI).

    But that's really not a good way to do it and probably a last resort. Is there a better way?


    So OP seems to be looking for ANY possible solution to 2, of which WMIC is just one; using JNI/JNA to ask the Win32 API is another, and using REST to talk to the storage subsystem is a third (which, by the way, doesn't differ much from just calling WMIC) ... anything else I mentioned is either a comparison to Linux or questioning the reason behind it. Will OP find the crypto stuff they're looking for? I wouldn't go as far as scanning the entire disk, since there's obviously at least partial knowledge of what's being searched for.

    To get off-topic: In the GTA5 modding scene it's common to share a list of hashes and some information on how to extract the crypto keys from the executable to access and modify the assets, as sharing the keys themselves would be copyright infringement. So scanning a bunch of bytes of known length and comparing their hash to a known value to find crypto keys and related information does seem to have its use case, but I'm sure OP is not after such information, if only to avoid possible legal action. I personally still doubt that Java is a good tool for this problem, but others have shown me how they used PHP to build rather complex programs I would have struggled to implement in other languages; the issue here is just that Java doesn't have the capabilities out of the box for OP's specific question.
     
    Tim Holloway
    Saloon Keeper
    Posts: 25653
    Actually, invoking an external program in a Linux app is sort of a last resort. The preferred way would be to pipe between programs.

    But Java also discourages invoking external programs because it's a violation of "write once/run anywhere".

    Then again, as noted, Java isn't the ideal platform for this sort of shenanigans anywhere.

    On detecting storage devices in Linux, I've already mentioned the hazards of just randomly running through /dev. Some of those things can bite you and as time progresses, I'll not warrant that you'll find everything under /dev anyway. After all, network devices don't live there.

    There are several native Linux commands that are better for sussing out storage devices, just as wmic is better for looking for Windows devices. I do recommend piping their output rather than "exec"ing them, but that's a personal preference.
     
    Michael Katich
    Greenhorn
    Posts: 10

    Campbell Ritchie wrote:Careful, please. I don't think I said that; I think it was somebody else.


    Ah, sorry about that! I guess I was copying the quote brackets when writing my response so I could multi-quote and got careless and forgot to go back and adjust those. I would edit it to fix that, but will the change be allowed?

    Matthew and Tim, I appreciate the continued discussion. I read through your latest comments and they have summarized things well.

    I think I will go forward with the trial-and-error method with \\.\PhysicalDrive1, etc. That way the whole thing can stay as plain-old Java. It can be write once, run anywhere. I could end up making an executable, but even so, it's less complicated in terms of maintenance and distribution. It could check the OS and use /dev/sda for Linux (the whole disk; /dev/sda1 would be only its first partition), or the equivalent for macOS. There basically isn't any performance penalty relative to what this program will be doing, which is scanning entire drives and image files.
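    The trial-and-error plan could look roughly like this. A minimal sketch under stated assumptions: it reads a full 512-byte buffer per candidate, because raw Windows device reads are generally expected to be sector-aligned (probably also why BufferedInputStream's 8 KiB reads already work); it keeps probing past a failure, because drive numbers may have gaps; and the Linux template only covers /dev/sda through /dev/sdz (maxIndex at most 25), missing NVMe (/dev/nvme0n1) and other naming schemes. Both platforms normally require administrator or root rights, and `DriveProbe` is a hypothetical name.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.IntFunction;

/** Hypothetical helper: finds readable raw devices by trial and error. */
final class DriveProbe {
    /** Path template per OS; on Linux only sdX whole-disk names are covered. */
    static IntFunction<String> defaultTemplate() {
        String os = System.getProperty("os.name", "").toLowerCase(Locale.ROOT);
        if (os.contains("win")) {
            return n -> "\\\\.\\PhysicalDrive" + n;   // \\.\PhysicalDrive0, 1, ...
        }
        return n -> "/dev/sd" + (char) ('a' + n);     // /dev/sda, /dev/sdb, ... (whole disks)
    }

    /** Tries indices 0..maxIndex and keeps every path that yields data. */
    static List<String> probe(IntFunction<String> template, int maxIndex) {
        List<String> found = new ArrayList<>();
        byte[] sector = new byte[512]; // whole-sector read; single-byte raw reads may be rejected
        for (int n = 0; n <= maxIndex; n++) {
            String path = template.apply(n);
            try (InputStream in = new FileInputStream(path)) {
                if (in.read(sector) > 0) found.add(path);
            } catch (IOException e) {
                // Missing device or no permission: skip it but keep going, since numbering may have gaps.
            }
        }
        return found;
    }
}
```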

    Many thanks! I was just looking for possible alternatives. I had not seen this question explored before, and now I know a lot more perspective around it.

     
    Campbell Ritchie
    Marshal
    Posts: 75874

    Michael Katich wrote:

    Campbell Ritchie wrote:Careful, please. . . . .

    Ah, sorry about that! . . . .

    Apology accepted

    Sorry for delay in replying; I have been away.
     