
Best way to split a single large XML file into multiple XML files with Java

 
Karthik Karunanithi
Greenhorn
Posts: 5
Hi guys,

I need your help. I need to split large XML files into multiple XML files. Can you please suggest the best way to do it?

1) Performance should be good.
2) It must work with multiple threads, because we will receive multiple large XML files.
3) It should not run out of memory.
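
For the multithreading requirement, one common arrangement is a fixed thread pool with one task per input file. The sketch below assumes that arrangement. `ParallelSplit`, `splitAll`, and `splitFile` are hypothetical names, and `splitFile` is only a placeholder for the real per-file splitting work.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelSplit {

    // Hypothetical placeholder: a real version would stream the file and
    // write one output per transaction. Here it only reports the file name.
    static String splitFile(Path input) {
        return "done:" + input.getFileName();
    }

    // Submits one task per input file to a fixed-size pool and waits for all of them.
    static List<String> splitAll(List<Path> inputs, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (Path p : inputs) {
                futures.add(pool.submit(() -> splitFile(p)));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get()); // a failed task surfaces here as ExecutionException
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Path> inputs = Arrays.asList(Paths.get("a.xml"), Paths.get("b.xml"));
        System.out.println(splitAll(inputs, 2));
    }
}
```

Keeping each file on a single thread means the parser for that file never has to be shared, and the pool size caps how many large files are open at once.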

For example, my large XML file looks like this:

<Document xmlns="urn:iso:std:iso:20022:tech:xsd:pain.001.001.05">
<CstmrCdtTrfInitn>
<GrpHdr>
...............
</GrpHdr>
<PmtInf>
............
<CdtTrfTxInf>
................
</CdtTrfTxInf>
<CdtTrfTxInf>
................
</CdtTrfTxInf>
<CdtTrfTxInf>
................
</CdtTrfTxInf>

</PmtInf>
</CstmrCdtTrfInitn>
</Document>

I want three parts from the above example:

part 1:

<Document xmlns="urn:iso:std:iso:20022:tech:xsd:pain.001.001.05">
<CstmrCdtTrfInitn>
<GrpHdr>
...............
</GrpHdr>
<PmtInf>
............
<CdtTrfTxInf>
................
</CdtTrfTxInf>
</PmtInf>
</CstmrCdtTrfInitn>
</Document>

part 2:

<Document xmlns="urn:iso:std:iso:20022:tech:xsd:pain.001.001.05">
<CstmrCdtTrfInitn>
<GrpHdr>
...............
</GrpHdr>
<PmtInf>
............
<CdtTrfTxInf>
................
</CdtTrfTxInf>
</PmtInf>
</CstmrCdtTrfInitn>
</Document>

part 3:

<Document xmlns="urn:iso:std:iso:20022:tech:xsd:pain.001.001.05">
<CstmrCdtTrfInitn>
<GrpHdr>
...............
</GrpHdr>
<PmtInf>
............
<CdtTrfTxInf>
................
</CdtTrfTxInf>
</PmtInf>
</CstmrCdtTrfInitn>
</Document>


This is my requirement, so can you please suggest the best way to do this?

Also, if there is a book on this, please let me know.

Thanks in advance.

Regards,
Karthik K
 
K. Tsang
Bartender
Posts: 3648
I see that your desired parts 1, 2, and 3 all have the same tags. Does only the content inside the <CdtTrfTxInf> tag determine which part it goes to?

If so, you can write a function or class that parses the <CdtTrfTxInf> tag and writes to the appropriate file.
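
One possible shape for such a class is sketched below. It uses the JDK's StAX event API (`javax.xml.stream`), which is my assumption and is not named in the thread. `TxnSplitter`, `PartSink`, `split`, and `writePart` are hypothetical names. The sketch also assumes that the <CdtTrfTxInf> elements are the last children of each <PmtInf>, as in the examples here. Only the shared header and one transaction are held in memory at a time.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.events.XMLEvent;

class TxnSplitter {
    // Hypothetical callback that opens the destination for part number `index`.
    interface PartSink { Writer open(int index) throws IOException; }

    static boolean isStart(XMLEvent e, String local) {
        return e.isStartElement() && e.asStartElement().getName().getLocalPart().equals(local);
    }
    static boolean isEnd(XMLEvent e, String local) {
        return e.isEndElement() && e.asEndElement().getName().getLocalPart().equals(local);
    }

    // Writes one part per CdtTrfTxInf and returns the number of parts written.
    static int split(Reader in, PartSink sink) throws Exception {
        XMLInputFactory inFactory = XMLInputFactory.newInstance();
        inFactory.setProperty(XMLInputFactory.SUPPORT_DTD, false); // no DTD is expected
        XMLEventReader reader = inFactory.createXMLEventReader(in);
        List<XMLEvent> header = new ArrayList<>(); // Document start up to the first PmtInf
        List<XMLEvent> bulk = new ArrayList<>();   // PmtInf start up to its first transaction
        List<XMLEvent> txn = new ArrayList<>();    // the transaction currently being read
        boolean inHeader = true, inBulkHead = false, inTxn = false;
        int parts = 0;
        while (reader.hasNext()) {
            XMLEvent e = reader.nextEvent();
            if (e.isStartDocument() || e.isEndDocument()) continue;
            if (inTxn) {
                txn.add(e);
                if (isEnd(e, "CdtTrfTxInf")) {
                    List<XMLEvent> part = new ArrayList<>(header);
                    part.addAll(bulk);
                    part.addAll(txn);
                    try (Writer out = sink.open(parts++)) {
                        writePart(out, part);
                    }
                    txn.clear();
                    inTxn = false;
                }
            } else if (isStart(e, "CdtTrfTxInf")) {
                inTxn = true; inBulkHead = false;
                txn.add(e);
            } else if (isStart(e, "PmtInf")) {
                inHeader = false; inBulkHead = true;
                bulk.clear();
                bulk.add(e);
            } else if (inBulkHead) {
                bulk.add(e);
            } else if (inHeader) {
                header.add(e);
            } // everything else (whitespace and closing tags after transactions) is dropped
        }
        reader.close();
        return parts;
    }

    // Replays the events, then closes every element that is still open.
    static void writePart(Writer out, List<XMLEvent> events) throws Exception {
        XMLEventWriter w = XMLOutputFactory.newInstance().createXMLEventWriter(out);
        XMLEventFactory ef = XMLEventFactory.newInstance();
        Deque<QName> open = new ArrayDeque<>();
        for (XMLEvent e : events) {
            w.add(e);
            if (e.isStartElement()) open.push(e.asStartElement().getName());
            else if (e.isEndElement()) open.pop();
        }
        while (!open.isEmpty()) {
            QName q = open.pop();
            w.add(ef.createEndElement(q.getPrefix(), q.getNamespaceURI(), q.getLocalPart()));
        }
        w.close();
    }

    public static void main(String[] args) throws Exception {
        // Tiny made-up sample; the element contents are placeholders.
        String xml = "<Document xmlns=\"urn:iso:std:iso:20022:tech:xsd:pain.001.001.05\"><CstmrCdtTrfInitn>"
                + "<GrpHdr><MsgId>M1</MsgId></GrpHdr><PmtInf><PmtInfId>B1</PmtInfId>"
                + "<CdtTrfTxInf>T1</CdtTrfTxInf><CdtTrfTxInf>T2</CdtTrfTxInf></PmtInf></CstmrCdtTrfInitn></Document>";
        List<StringWriter> outs = new ArrayList<>();
        split(new StringReader(xml), i -> { StringWriter w = new StringWriter(); outs.add(w); return w; });
        for (StringWriter w : outs) System.out.println(w);
    }
}
```

To write real files, the sink could be something like `i -> Files.newBufferedWriter(outDir.resolve("part-" + i + ".xml"))`, where `outDir` is a hypothetical output directory.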



 
Karthik Karunanithi
Greenhorn
Posts: 5
Thank you for your quick response, K. Tsang.

Each <CdtTrfTxInf> tag contains the details of one unique transaction.

To make it clearer, the original file looks like this:

<Document xmlns="urn:iso:std:iso:20022:tech:xsd:pain.001.001.05">
<CstmrCdtTrfInitn>
<GrpHdr>
...............
</GrpHdr>
<PmtInf>
Bulk 1....
<CdtTrfTxInf>
Txn 1.1....
</CdtTrfTxInf>
<CdtTrfTxInf>
Txn 1.2....
</CdtTrfTxInf>
<CdtTrfTxInf>
Txn 1.3....
</CdtTrfTxInf>

</PmtInf>
<PmtInf>
Bulk 2....
<CdtTrfTxInf>
Txn 2.1....
</CdtTrfTxInf>
<CdtTrfTxInf>
Txn 2.2....
</CdtTrfTxInf>
<CdtTrfTxInf>
Txn 2.3....
</CdtTrfTxInf>

</PmtInf>
</CstmrCdtTrfInitn>
</Document>

I want 6 files from the one original file:

file 1:
<Document xmlns="urn:iso:std:iso:20022:tech:xsd:pain.001.001.05">
<CstmrCdtTrfInitn>
<GrpHdr>
...............
</GrpHdr>
<PmtInf>
Bulk 1
<CdtTrfTxInf>
Txn 1.1....
</CdtTrfTxInf>
</PmtInf>
</CstmrCdtTrfInitn>
</Document>

file 2:

<Document xmlns="urn:iso:std:iso:20022:tech:xsd:pain.001.001.05">
<CstmrCdtTrfInitn>
<GrpHdr>
...............
</GrpHdr>
<PmtInf>
Bulk 1
<CdtTrfTxInf>
Txn 1.2....
</CdtTrfTxInf>
</PmtInf>
</CstmrCdtTrfInitn>
</Document>

file 3:

<Document xmlns="urn:iso:std:iso:20022:tech:xsd:pain.001.001.05">
<CstmrCdtTrfInitn>
<GrpHdr>
...............
</GrpHdr>
<PmtInf>
Bulk 1
<CdtTrfTxInf>
Txn 1.3....
</CdtTrfTxInf>
</PmtInf>
</CstmrCdtTrfInitn>
</Document>

file 4:

<Document xmlns="urn:iso:std:iso:20022:tech:xsd:pain.001.001.05">
<CstmrCdtTrfInitn>
<GrpHdr>
...............
</GrpHdr>
<PmtInf>
Bulk 2
<CdtTrfTxInf>
Txn 2.1....
</CdtTrfTxInf>
</PmtInf>
</CstmrCdtTrfInitn>
</Document>

file 5:

<Document xmlns="urn:iso:std:iso:20022:tech:xsd:pain.001.001.05">
<CstmrCdtTrfInitn>
<GrpHdr>
...............
</GrpHdr>
<PmtInf>
Bulk 2
<CdtTrfTxInf>
Txn 2.2....
</CdtTrfTxInf>
</PmtInf>
</CstmrCdtTrfInitn>
</Document>

file 6:
<Document xmlns="urn:iso:std:iso:20022:tech:xsd:pain.001.001.05">
<CstmrCdtTrfInitn>
<GrpHdr>
...............
</GrpHdr>
<PmtInf>
Bulk 2
<CdtTrfTxInf>
Txn 2.3....
</CdtTrfTxInf>
</PmtInf>
</CstmrCdtTrfInitn>
</Document>


I hope this makes my requirement clear.

One more thing: I am trying to do this because the XML transformation took more than an hour for 50K transactions.

That's why I am splitting the file into single-transaction files.

If possible, could you please give me a sample program for this? I have attached a sample file as well. (I am very new to this.)


Thank you.
 
Karthik Karunanithi
Greenhorn
Posts: 5
Hi guys,

This is urgent. Could you please help with this?

Thank you
 
Liutauras Vilda
Sheriff
Posts: 4927
Judging by the subject, it seems you need a solution in Java.

Do you know how to read a file? How to write a new file?

Do you know the exact XML structure up front? Repetition of the same tags doesn't matter, but do they always appear in the same sequence, and are they always the same tags?
If you can answer yes to the questions above, it seems you could write a fairly simple parser to accomplish that job.

Now, if you have never had any experience with Java yourself, you'll probably have a hard time.

As an aside: people here don't work on an urgent basis and don't provide complete solutions, but they are more than happy to help you work through the struggle of finding a solution.

How much Java experience do you have?
 
Liutauras Vilda
Sheriff
Posts: 4927
And welcome to the Ranch, Karthik!
 
Karthik Karunanithi
Greenhorn
Posts: 5
Hi Liutauras Vilda,

Thank you so much for your response, and sorry for asking on an urgent basis.

I have been working with jbase technology for the past 3 years, and I have been learning Java for the past year.

I learnt how to read and write XML files from a website. I tried dom4j, but it took too long for 100K transactions.

I read that W3C DOM will also take time, since it loads the entire XML file into memory. That's why I am confused about this.

Can you please tell me which parser would be good for this?

Or if you know any websites, please refer me to them. I will check and get back to you.

Thank you so much for doing a wonderful job.





 
Mark Spencers
Ranch Hand
Posts: 51
You need to use a SAX parser. It doesn't read the whole XML file into memory.

See this tutorial https://www.mkyong.com/java/how-to-read-xml-file-in-java-sax-parser/ and this answer https://stackoverflow.com/questions/26310595/how-to-parse-big-50-gb-xml-files-in-java
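
As a first step with SAX, here is a minimal sketch of a handler that streams through the document and only counts the <PmtInf> and <CdtTrfTxInf> elements, so nothing but two counters is kept in memory. `TxnCounter` and `count` are hypothetical names. A real splitter would collect and write out each transaction instead of counting it.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class TxnCounter extends DefaultHandler {
    int bulks; // number of PmtInf elements seen
    int txns;  // number of CdtTrfTxInf elements seen

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // With a namespace-aware parser, localName is the tag name without any prefix.
        if ("PmtInf".equals(localName)) {
            bulks++;
        } else if ("CdtTrfTxInf".equals(localName)) {
            txns++;
        }
    }

    // Parses the source as a stream of callbacks; no document tree is built.
    static TxnCounter count(InputSource source) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        SAXParser parser = factory.newSAXParser();
        TxnCounter handler = new TxnCounter();
        parser.parse(source, handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        // Tiny made-up sample; a real run would pass a file-backed InputSource.
        String xml = "<Document xmlns=\"urn:iso:std:iso:20022:tech:xsd:pain.001.001.05\">"
                + "<CstmrCdtTrfInitn><GrpHdr/><PmtInf>"
                + "<CdtTrfTxInf/><CdtTrfTxInf/>"
                + "</PmtInf></CstmrCdtTrfInitn></Document>";
        TxnCounter c = count(new InputSource(new StringReader(xml)));
        System.out.println("PmtInf: " + c.bulks + ", CdtTrfTxInf: " + c.txns);
    }
}
```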
 
Karthik Karunanithi
Greenhorn
Posts: 5
Thank you, Mark Spencers.

I will try it and get back if there is any issue.
 