I have a legacy COBOL system that generates some variable blocked files (DSORG=VBA), which I need to process. Can this even be done in native Java (11), or is there a 3rd-party tool that I can purchase to satisfy my requirements? I am completely lost. Any insight would be greatly appreciated.
You probably can, but I don't know myself. Please supply more information. Details of the format of the file would be useful.
We haven't got a dedicated COBOL forum, so I moved you to “other languages”. If you can write COBOL, you are unusual; I have heard there is a shortage of COBOL programmers who can maintain old bank systems, which pushes up the salaries they can earn.
I don't think I understand what other details of the file format you want. The file can have 2 types of records; one could be 800 bytes while the other is 400. The first 4 bytes of each record contain the length of the record in binary format. What else would you like to know?
Do you have a layout definition for what is in the blocks? The fields could be text, packed, binary or other.
It shouldn't be too hard to write Java code to read the blocks and break out the records as byte arrays for starters.
NB My wife and I both coded COBOL in the early 70s.
The COBOL system is riddled with them. I was asking in general. Other than the binary field at the beginning of each record, the fields are, for the most part, text, with some packed decimal to add grief to an already depressing problem. I could get you a real layout if you think it would help, but I'm not asking someone to code a solution, just to tell me how to do it.
Read the records into a byte array and convert the different fields as per their expected datatypes. The 4-byte length field at the front is a binary value; use that length to pick out the record's bytes.
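A rough sketch of that idea, assuming the standard IBM RDW layout: the first 2 bytes are a big-endian length that includes the 4-byte descriptor itself, and the next 2 bytes are reserved. If your file really stores the length across all 4 bytes, adjust the parsing accordingly.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Sketch: split a stream of length-prefixed records into byte arrays.
// Assumes each record starts with a standard IBM RDW: a 2-byte big-endian
// length (which includes the 4-byte RDW itself) followed by 2 reserved bytes.
public class RecordReader {

    public static List<byte[]> readRecords(InputStream in) {
        List<byte[]> records = new ArrayList<>();
        try {
            DataInputStream din = new DataInputStream(in);
            while (true) {
                int hi = din.read();
                if (hi < 0) break;                   // clean end of stream
                int lo = din.readUnsignedByte();
                int recLen = (hi << 8) | lo;         // big-endian, includes the 4-byte RDW
                din.skipBytes(2);                    // reserved bytes of the RDW
                byte[] payload = new byte[recLen - 4];
                din.readFully(payload);              // record bytes without the RDW
                records.add(payload);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }
}
```

From there each payload can be carved into fields by offset, per the copybook layout.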
You can do all this via brute force in hand-written Java code. However, you'll probably prefer a ready-made solution, and that's an ETL (Extract, Transform, Load) tool. There are several to choose from, both free and commercially supported. One of the most popular historically has been Talend. Another is Pentaho DI from Hitachi. Pentaho is the one I'm most familiar with; in fact, I have contributed source code modifications to it. It's a high-performance, very flexible app written in Java. Talend is (I think!) also a Java system, but I'm not very familiar with it or with any of the other products that might be out there.
You have 2 issues.
First, you have to pull down the VBA data records. Pentaho can actually FTP into a mainframe and do that itself as part of the ETL pipeline. VBA as stored on disk/tape consists of a binary block-length header field followed by 0 or more records, each of which is headed by its own binary logical record length (LRECL) field. So de-blocking may be an essential first step. It depends on what the IBM FTP server will do for you.
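If the transfer does leave the block headers in place, de-blocking is mechanical. A sketch, assuming the standard IBM V-format layout (a 4-byte BDW per block and a 4-byte RDW per record, each carrying a 2-byte big-endian length that includes the descriptor itself, plus 2 reserved bytes):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: de-block a V/VB byte stream into logical records.
// Assumes standard IBM descriptors: each block starts with a 4-byte BDW
// (2-byte big-endian block length that includes the BDW, 2 reserved bytes),
// and each record inside it starts with a 4-byte RDW of the same shape.
public class Deblocker {

    private static int len(byte[] buf, int off) {
        return ((buf[off] & 0xFF) << 8) | (buf[off + 1] & 0xFF);
    }

    public static List<byte[]> deblock(byte[] file) {
        List<byte[]> records = new ArrayList<>();
        int pos = 0;
        while (pos < file.length) {
            int blockEnd = pos + len(file, pos);   // BDW length includes itself
            pos += 4;                              // skip the BDW
            while (pos < blockEnd) {
                int recLen = len(file, pos);       // RDW length includes itself
                byte[] rec = new byte[recLen - 4];
                System.arraycopy(file, pos + 4, rec, 0, recLen - 4);
                records.add(rec);
                pos += recLen;
            }
        }
        return records;
    }
}
```

If the FTP server strips the BDWs and leaves only RDWs (a common site option), you'd skip the outer loop.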
Once you have the data broken out into records, the real fun begins, because chances are the record in question has text AND binary data in it, and both require further processing. Fields defined as CHARACTER have to be converted from EBCDIC to ASCII, and thence to Java String or Character objects. Unless they're intended to be binary indicators (for example, "Y" for yes/true, "N" for no/false). You may have a binary length prefix on CHAR VARYING fields to deal with as well.
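The EBCDIC part at least is built into the JDK. A sketch; the code page here is an assumption (IBM1047 and IBM037 are both common, so check which one your shop actually uses), and these charsets live in the jdk.charsets module of a full JDK:

```java
import java.nio.charset.Charset;

// Sketch: decode an EBCDIC text field into a Java String.
// "IBM1047" is an assumed code page; IBM037 is another common choice.
// Requires the jdk.charsets module (present in a standard full JDK 11).
public class EbcdicField {

    private static final Charset EBCDIC = Charset.forName("IBM1047");

    public static String text(byte[] record, int offset, int length) {
        // trim() strips the blank padding COBOL uses to fill fixed-width fields
        return new String(record, offset, length, EBCDIC).trim();
    }
}
```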
Then there's the numeric stuff, which may be characters but is more likely to be COMPUTATIONAL, COMP-2 or COMP-3. COMPUTATIONAL is straight binary, but it's big-endian (most significant byte first), not little-endian as Intel processors use. COMP-2 is, of course, floating-point, but in addition to byte-order considerations, the original IBM floating-point binary representation was completely different from the IEEE format used by Java, although newer IBM mainframes added IEEE as an option.
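For the plain COMP fields, Java is actually on your side: ByteBuffer's default byte order is big-endian, same as the mainframe, so no byte swapping is needed. A sketch for the common sizes (PIC S9(4) COMP is 2 bytes, S9(9) COMP is 4 bytes):

```java
import java.nio.ByteBuffer;

// Sketch: pull COMP (big-endian binary) fields out of a record.
// ByteBuffer defaults to big-endian order, matching the mainframe,
// so getShort/getInt read the bytes as-is.
public class CompField {

    public static short halfword(byte[] record, int offset) {  // PIC S9(4) COMP
        return ByteBuffer.wrap(record, offset, 2).getShort();
    }

    public static int fullword(byte[] record, int offset) {    // PIC S9(9) COMP
        return ByteBuffer.wrap(record, offset, 4).getInt();
    }
}
```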
And then there's good old COMP-3. This is actually pretty easy, since it's just BCD: 2 binary-coded decimal digits per byte, except for the last byte, which holds the final digit plus the sign nibble.
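A decoding sketch; the sign nibble conventions (0xD negative, 0xC or 0xF positive) are the usual ones, and the scale (implied decimal places) has to come from the copybook PIC clause, e.g. PIC S9(5)V99 means scale 2:

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Sketch: decode a COMP-3 (packed decimal) field. Two BCD digits per byte,
// except the last byte, which holds one digit in its high nibble and the
// sign in its low nibble (0xD = negative; 0xC and 0xF = positive).
public class PackedDecimal {

    public static BigDecimal decode(byte[] field, int scale) {
        StringBuilder digits = new StringBuilder();
        for (int i = 0; i < field.length; i++) {
            digits.append((field[i] >> 4) & 0x0F);    // high nibble: always a digit
            if (i < field.length - 1) {
                digits.append(field[i] & 0x0F);       // low nibble: digit, except last byte
            }
        }
        int sign = (field[field.length - 1] & 0x0F) == 0x0D ? -1 : 1;
        BigInteger unscaled = new BigInteger(digits.toString())
                .multiply(BigInteger.valueOf(sign));
        return new BigDecimal(unscaled, scale);       // apply the implied decimal point
    }
}
```

So the bytes 0x12 0x34 0x5C with scale 2 come out as 123.45.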
Which is why an ETL tool can be so handy. Rather than hand-coding custom field transformations for a multitude of complex records you can use the GUI editor to string together processing blocks and build an easily-maintainable transformation profile.
One thing to note, however: the processing model for Pentaho DI (Kettle), at least, involves a parallel set of extracted data columns running down the pipe. Conversions don't actually transform in place; they create new data columns in the desired form while the original column also remains in the stream. There is a certain mindset to it.