
COMP-3 data unpacking in Java (Embedded in Pentaho)

 
Greenhorn
Posts: 3
We are facing a challenge reading COMP-3 data in Java embedded inside a Pentaho ETL job. A few float values are stored as packed decimals in a flat file alongside other plain text. The plain-text fields read properly, but for the packed fields we tried Charset.forName("CP500"), and it never worked; we still get junk characters. Since Pentaho scripting doesn't support COMP-3, their forums suggested going with a User Defined Java Class. Has anyone come across and solved this?

Regards
 
Tim Holloway
Bartender
Posts: 20940
Welcome to the JavaRanch, Rust! I'm not sure how you managed to post this question to the Wiki, but I've moved it to a more appropriate forum.

I'm intimately familiar with both Pentaho DI - if you look at the source code, you'll find my name on some of the comments - and IBM mainframes and yes, I was dealing with COMP-3 data in Java years before I'd even heard of Pentaho.

An IBM COBOL COMPUTATIONAL-3 number is a signed binary-coded decimal entity. It consists of BCD digits packed 2 per byte, with the exception of the final byte, which has a digit in its left nybble and the sign in its right nybble.

The sign will usually be 0x0F, 0x0C or 0x0D, where F and C signify positive numbers and D and B signify negative numbers. 0x0A should also be positive, I think, but it's not normally seen. So anyway, the packed representation of -2 would be 0x2D, and the packed representation of 1234 would be 0x01 0x23 0x4C.

The "F" sign is what you get when an unsigned number is converted from EBCDIC text. The "C" sign is what arithmetic and compiler constant declarations produce, so you'll often see both in the same set of data.

Because this is a BINARY data format, your Pentaho DI data source must be instructed to read those fields as BYTES, not as characters/strings. Code page settings don't apply here (although they do to ordinary EBCDIC text). You would then run them through a field converter to normalize them into Java numeric or string form.
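Such a converter is quick to sketch. The class below is just an illustration (the names are made up, and it's not Pentaho API code); it assumes you already have the field's raw bytes and know its implied decimal scale from the COBOL copybook (e.g. PIC S9(4)V99 has a scale of 2):

```java
import java.math.BigDecimal;
import java.math.BigInteger;

public class Comp3 {
    /**
     * Decode a COMP-3 (packed decimal) field into a BigDecimal.
     * scale = number of implied digits to the right of the decimal point,
     * taken from the COBOL copybook.
     */
    public static BigDecimal decode(byte[] field, int scale) {
        StringBuilder digits = new StringBuilder(field.length * 2);
        for (int i = 0; i < field.length; i++) {
            int hi = (field[i] >> 4) & 0x0F;  // left nybble: always a digit
            int lo = field[i] & 0x0F;         // right nybble: digit, or sign in last byte
            digits.append(hi);
            if (i < field.length - 1) {
                digits.append(lo);
            } else if (lo == 0x0D || lo == 0x0B) { // D/B = negative
                digits.insert(0, '-');
            }                                      // A/C/F = positive
        }
        return new BigDecimal(new BigInteger(digits.toString()), scale);
    }

    public static void main(String[] args) {
        System.out.println(decode(new byte[]{0x01, 0x23, 0x4C}, 2)); // 12.34
        System.out.println(decode(new byte[]{0x2D}, 0));             // -2
    }
}
```

Note that the decimal point never appears in the data itself; it's purely a matter of the declared scale, which is why you can't decode a COMP-3 field correctly without the copybook.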

I would have thought that there'd be a Pentaho DI plugin for that format, since it's so common, but I'll admit I don't know of one.
 
Rust Cohle
Greenhorn
Posts: 3
Thanks for your reply.
I tried some conversion code based on several examples, but the resulting values did not match the expected ones. The Data Junction tool (which we used for the data extract) supports COMP-3 encoding, so it converts the data by default, and we use that output for validation. What the Data Junction tool does is what we are trying to achieve in Java; that is the requirement. – Guru
 
Sheriff
Posts: 24594
So you have some sample code fragments which are given an array of bytes and fail to decipher them from COMP-3 format? Do you have some which work sometimes, or do you have some which have systematic errors? Why don't we start with code which is alleged to work rather than trying to start from scratch?

However... before that you should ensure that whatever you're using to extract the bytes from your Pentaho data actually produces the correct bytes. If that part fails then there's no point in trying to carry on.
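A simple hex dump of the extracted bytes will tell you that right away: if a packed +1234 doesn't come out as exactly 01 23 4C, the extraction step is already broken. Something like this (just a debugging sketch; the names are made up):

```java
public class HexDump {
    // Print raw bytes as hex so you can eyeball the packed fields
    // before trying to decode them.
    public static String toHex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // A packed +1234 should show up as exactly these three bytes:
        System.out.println(toHex(new byte[]{0x01, 0x23, 0x4C})); // 01 23 4C
    }
}
```

If the dump shows 0x1A, 0x3F or other substituted bytes, a character-set conversion is mangling the data upstream, and no amount of decoding logic will fix it.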
 
Rust Cohle
Greenhorn
Posts: 3
I haven't found any concrete, proven code that is certified to work, especially for values with floating point.
 
Tim Holloway
Bartender
Posts: 20940
COMP-3 is not floating-point. It works with an assumed fixed decimal point. COMPUTATIONAL-2 is floating point, and I think COMPUTATIONAL-4 is also, for double-precision floating point (the exact formats of the COMPUTATIONAL types depend on the compiler vendor).

IBM legacy floating-point is very different from the IEEE floating-point built into Java. So not only would attempting to use a COMP-3 converter fail horribly, but an attempted read as binary floating-point would also fail to produce the proper results. zSeries mainframes have the option of working with IEEE floating-point (partly to give IBM hardware-level support for Java!), but I would hope that they defined a distinct COMPUTATIONAL data type for newer COBOL systems to allow dealing with both IEEE and legacy forms. It's been a while since I consulted a COBOL manual, so I don't know for certain.

I could provide detailed information on COMP-2 conversion, but not for free, since it's too much like work. I've never had to deal with it when doing ETL. If you're interested, though, the sordid details are covered in the IBM Principles of Operation manual.
 
Greenhorn
Posts: 1
Hello, can anyone explain to me the logic of this code, which I found on the net, for unpacking the data?

import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class PackedDecimalToComp {

   public static void main(String[] args) {

       try {
           Path path = Paths.get("C:\\Users\\AV00499269\\Desktop\\Comp3 data file\\Comp3Test.txt");
           byte[] data = Files.readAllBytes(path);
           unpackData(data);
       } catch (Exception ex) {
           System.out.println("Exception is :" + ex.getMessage());
       }
   }

   private static String unpackData(byte[] packedData) {
       StringBuilder unpackedData = new StringBuilder();

       final int negativeSign = 13; // 0x0D
       for (int currentCharIndex = 0; currentCharIndex < packedData.length; currentCharIndex++) {
           // High nybble is always a digit.
           byte firstDigit = (byte) ((packedData[currentCharIndex] >>> 4) & 0x0F);
           // Low nybble is a digit, except in the last byte, where it's the sign.
           byte secondDigit = (byte) (packedData[currentCharIndex] & 0x0F);
           unpackedData.append(firstDigit);
           if (currentCharIndex == (packedData.length - 1)) {
               if (secondDigit == negativeSign) {
                   unpackedData.insert(0, '-');
               }
           } else {
               unpackedData.append(secondDigit);
           }
       }
       System.out.println("Unpacked data is: " + unpackedData);

       return unpackedData.toString();
   }
}
 
Tim Holloway
Bartender
Posts: 20940
Welcome to the JavaRanch, Ashwini!

You could have started your own thread. We don't charge extra. Generally we would rather you do that than resume a thread that's been untouched for 2 years. But your question is in the same vein, so I'll leave it here.

IBM COBOL COMPUTATIONAL-3 (COMP-3) is a packed decimal number format. Just for information, the exact internal forms of the different COBOL COMPUTATIONAL number formats are vendor-specific, but hardly anyone does COBOL in non-IBM environments anymore. So we think of COMP-3 as being packed decimal as a matter of course.

Packed Decimal is a variant of the Binary Coded Decimal (BCD) format. BCD is a compact way of storing decimal number values. A BCD number packs 2 decimal digits per byte (one per nybble), so pure BCD is always an even number of decimal digits. The BCD digits have the binary values 0000-1001, which is 0-9 when converted to ASCII (or EBCDIC). So a number whose hex value is 05 21 43 corresponds to an ASCII string "052143".
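Packing in the other direction is a one-liner per byte: two decimal digits shifted into the high and low nybbles. A toy sketch of pure unsigned BCD (not COMP-3's signed form; the names are made up):

```java
public class BcdDemo {
    // Pack an even-length digit string into pure BCD, two digits per byte.
    public static byte[] toBcd(String digits) {
        byte[] out = new byte[digits.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = digits.charAt(2 * i) - '0';     // left nybble
            int lo = digits.charAt(2 * i + 1) - '0'; // right nybble
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] b = toBcd("052143");
        // Matches the example above: 0x05 0x21 0x43
        System.out.printf("%02X %02X %02X%n", b[0], b[1], b[2]); // 05 21 43
    }
}
```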

However, COMP-3 is slightly different. For one thing, the total number of bytes in a COMP-3 number cannot exceed 16, since that was a hardware limitation of the IBM System/360 computer architecture and the System/360 and its descendants have actual machine language instructions for working directly with COMP-3 numbers. In contrast, Intel-compatible CPUs have some instructions that work with BCD, but not with COMP-3. And unlike the S/360, the Intel instructions only operate on one byte at a time, but gain the advantage that you can deal with more than 16 bytes in a BCD number.

The critical difference between BCD and COMP-3 is in the final nybble. Note, incidentally, that IBM does NOT use byte-swapped memory, so a number always goes from low address to high, read left to right. So the rightmost nybble of a COMP-3 number is not a digit, it's a sign. The values for this nybble are A, C or F for positive numbers, and B or D for negative numbers. The "F" is what happens when you take an EBCDIC number and run it through the IBM "PACK" machine instruction (the EBCDIC hex value for "052143" is F0 F5 F2 F1 F4 F3, and the PACK instruction simply swaps the nybbles on the final byte, putting the F zone into the sign position). Once the number has been operated on arithmetically, the sign nybble will be C or D.

Your code example manages this by extracting nybbles and converting them to their numeric string equivalents. The conditional statement for the final byte looks at the sign, and if it's 0x0D (13), then it marks the resulting number as being negative. The code isn't as robust or as efficient as how I do it, but it's sufficient for most purposes.

Incidentally, packed decimal was designed back when Hollerith (punched) cards were standard. The IBM 80-column Hollerith format consisted of 10 rows, numbered 0-9, plus 2 rows above (known as "X" and "Y"). Because there were only 80 columns on the card, signed numbers were often punched by backspacing and punching an X or Y hole (a "zone punch") over the final digit, saving 1 column's worth of space. When a number that had been overpunched this way was run through the IBM PACK instruction, the zone punch would be reflected in the sign nybble (since the EBCDIC code that was read was simply nybble-swapped on the final character). That meant that negative numbers could be stored and processed efficiently.
 