
Transferring large amounts of records from one J2EE system to another: how?

 
Steve Buck
Ranch Hand
Posts: 45
I'm trying to figure out what the best way to transfer large amounts of data from one J2EE system to another J2EE system would be.

Both systems would be at different physical locations (and owned by different businesses). Say that both use different databases: one uses MS SQL and another uses Oracle.

You don't want to do a direct database-to-database copy of the records (and I'm not sure, but perhaps you can't when it's not the same vendor anyhow).

You must have a J2EE component (I don't know which, or how?) scrub and validate the incoming data, determine which records are useful and which aren't, and then add them to its own system.

Data transfers can be upwards of 1GB. I've thought about using an XML web service, but the overhead would be way too much. How else can you do it? I've never even heard of anyone using sockets with J2EE... should you?

On .NET I would have used Windows services, compressed the data, and sent it over a binary protocol using a TCP socket. I'm trying to find the best or nearly equal solution in the J2EE world.
[ January 14, 2005: Message edited by: Steve Buck ]
 
Paul Sturrock
Bartender
Posts: 10336
Eclipse IDE Hibernate Java

Trying to find the best or nearly equal solution in the J2EE world

For what you are trying to do, I don't see a reason to involve J2EE at all. Think about the array of technologies in J2EE and try to think of one which gives you any tangible benefit when moving data from one DB to another. The only one I can think of is JTA, but that's really questionable. You could do it through JDBC, but again, is it a benefit to do this through JDBC, which is more complex than (for example) MSSQL's DTS and Oracle's imp/exp utilities?

If you have business rules involved in the transfer, then yes, you should probably involve some sort of application which can transform or validate data as it passes between the two DBs. But if the only differences are vendor-specific differences between the two DBs, then just use the tools which come with the DBs, since this is a relatively common requirement.
 
Steve Buck
Ranch Hand
Posts: 45
Originally posted by Paul Sturrock:

For what you are trying to do, I don't see a reason to involve J2EE at all. Think about the array of technologies in J2EE and try to think of one which gives you any tangible benefit when moving data from one DB to another. The only one I can think of is JTA, but that's really questionable. You could do it through JDBC, but again, is it a benefit to do this through JDBC, which is more complex than (for example) MSSQL's DTS and Oracle's imp/exp utilities?

If you have business rules involved in the transfer, then yes, you should probably involve some sort of application which can transform or validate data as it passes between the two DBs. But if the only differences are vendor-specific differences between the two DBs, then just use the tools which come with the DBs, since this is a relatively common requirement.


Thanks. I'm really not familiar with MSSQL's "DTS" and Oracle's "imp/exp" utilities. I just wanted something self-contained within a J2EE project so that I wouldn't have to worry about what every deployer has installed when they run the J2EE project. I'd hate to do something like schedule a Windows Scheduler occurrence, or a cron job on *nix, running some external tool I've written to manipulate the data.

I'll have a look at what JTA can do. A large portion of (if not all) the data will be recorded transactions. I'm still learning how to do all of this in J2EE, so forgive my lack of knowledge. Since I would like to keep a record of the exchanges and make them reliable and accountable... perhaps JTA facilitates this.
 
David Harkness
Ranch Hand
Posts: 1646
JTA is simply Java's Transaction API, providing classes to abstract transactions for different types of resources (database, message queue, etc.) and provide support for single transactions to span multiple resources simultaneously (commit one transaction to two databases as a single unit of work).

While you'll use transactions in the process of transferring data, it's really a minor issue if you limit yourself to read-from-one-write-to-the-other. This can be handled with the basic JDBC (Java DataBase Connectivity) API. If you were updating the records in the source database as they were being written to the destination, JTA would become more important.
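A minimal sketch of such a read-from-one-write-to-the-other copy with plain JDBC might look like the following; the connections, table name, and columns are all invented for illustration, and a real 1GB transfer would also set a fetch size and flush the batch periodically:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical copy loop between two databases over plain JDBC.
// The "orders" table and its columns are placeholders.
class RecordCopier {

    // Builds a parameterized INSERT for the given table and columns,
    // e.g. "INSERT INTO orders (id, amount) VALUES (?, ?)".
    static String buildInsert(String table, String... columns) {
        StringBuilder sql = new StringBuilder("INSERT INTO " + table + " (");
        StringBuilder marks = new StringBuilder();
        for (int i = 0; i < columns.length; i++) {
            if (i > 0) { sql.append(", "); marks.append(", "); }
            sql.append(columns[i]);
            marks.append("?");
        }
        return sql.append(") VALUES (").append(marks).append(")").toString();
    }

    // Reads every row from the source and batch-inserts it into the
    // destination as a single local transaction on the destination side.
    static void copy(Connection src, Connection dst) throws SQLException {
        try (Statement read = src.createStatement();
             ResultSet rs = read.executeQuery("SELECT id, amount FROM orders");
             PreparedStatement write =
                 dst.prepareStatement(buildInsert("orders", "id", "amount"))) {
            dst.setAutoCommit(false);
            while (rs.next()) {
                write.setLong(1, rs.getLong("id"));
                write.setBigDecimal(2, rs.getBigDecimal("amount"));
                write.addBatch();
            }
            write.executeBatch();
            dst.commit();
        }
    }
}
```

Validation or business rules would slot into the loop between the read and the `addBatch()` call.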

From my experience, given that you want to run these processes at multiple business sites (I assume different businesses, not just offices of the same business), you'll have a much easier time selling them on a simple DB-specific export script scheduled with cron. Managing a J2EE installation, especially for something this straightforward, is overkill in my view. Their DBAs will already know how to manage the script+cron; J2EE is another story.

The application I'm currently working on has a nightly incremental (just new/modified rows) data export from our live system to a few other systems (reporting, newsletter mailing and CRM). Each script controls what data gets exported, and another shell script tars it up and sends it to the appropriate system, in some cases to other physical locations. Finally, the systems group on the receiving end is free to decide how to import the data into their database.

Granted, a turn-key network of J2EE applications would be far cooler and possibly more advanced (business rules, varying schedules, etc.), but this can all be done relatively easily with shell scripts and export/import rules. Of course, in the end it depends on the requirements and the existing technology in place. At first blush, J2EE smells like over-engineering.

Feel free to post more details of the trickier parts you feel may warrant a more robust solution, and people here will gladly chime in.
 
Steve Buck
Ranch Hand
Posts: 45
David Harkness:
Unfortunately I'm still bent on finding a way to do it within the J2EE system so it can be self-contained (and apply business rules to the data).

On another note:
Should sockets be used from within EJBs?

You could schedule the EJB Timer Service (say, in New York) to invoke a method that opens a listening socket. Simultaneously, the other end of the wire (Tokyo) could use its own EJB Timer Service (within a narrow window of the first) to invoke a method that creates a client socket, connects to the listening one in New York, and performs a large data transfer (1GB).

I'm a beginner with EJB (and J2EE), so please tell me if this is a big "no-no".
 
Brian Tinnel
Ranch Hand
Posts: 69
Listening on a socket is a big no-no within a bean. Creating an outgoing socket is okay.

Are you sending all 1GB worth of data at one time, or is it multiple pieces of smaller-sized data? If it is smaller data, then you could maybe look at using JMS. The clients can just put the data on a queue and you write an MDB to process it. It is really hard to say if this is workable.
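If the data can be broken into smaller pieces for JMS, the sender's side might use a plain helper like this to slice the payload into message-sized chunks. This is illustrative only: the chunk size is arbitrary, and the actual JMS send and the receiving MDB (which depend on the container) are omitted.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-JMS step: slice a large payload into pieces small
// enough to travel as individual queue messages.
class PayloadSplitter {
    static List<byte[]> split(byte[] data, int chunkSize) {
        List<byte[]> chunks = new ArrayList<byte[]>();
        for (int off = 0; off < data.length; off += chunkSize) {
            // The last chunk may be shorter than chunkSize.
            int len = Math.min(chunkSize, data.length - off);
            byte[] chunk = new byte[len];
            System.arraycopy(data, off, chunk, 0, len);
            chunks.add(chunk);
        }
        return chunks;
    }
}
```

An MDB on the receiving side would then reassemble the chunks in order, which means each message also needs a sequence number in its headers.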

I wouldn't give up on Web Services so quickly (unless you are really sending the 1GB worth of data). For communication between different companies it has a lot of benefits. For one, you don't have to require that they have the jars needed to communicate with your app server.
 
Steve Buck
Ranch Hand
Posts: 45
Unfortunately there will be times when 1GB is needed.

I'm looking into compressing web service responses and any way to tune it so that binary data can be sent as efficiently as possible over the web service.

Perhaps I will use it. But if the overhead is monumental then it will definitely be avoided.
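As a sketch of the compression idea, assuming nothing about any particular web service stack: gzip the textual payload before it crosses the wire and gunzip it on the other side, using only `java.util.zip`. Textual data of this sort usually compresses well.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Minimal gzip round-trip helpers for a textual payload.
class Gzip {
    static byte[] compress(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        // Closing the stream flushes and finishes the gzip trailer.
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        }
        return bos.toByteArray();
    }

    static byte[] decompress(byte[] gzipped) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gz =
                 new GZIPInputStream(new ByteArrayInputStream(gzipped))) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = gz.read(buf)) != -1) out.write(buf, 0, n);
        }
        return out.toByteArray();
    }
}
```

Whether the web service layer applies this transparently (HTTP 1.1 content encoding) or the application does it by hand depends on the stack in use.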

As far as JMS and MDBs go: it was my understanding that you should only use those within a single J2EE container, rather than between multiple containers (at different physical sites)?
 
Steve Buck
Ranch Hand
Posts: 45
I have a new idea! I'd like some feedback on it.

When site B requests that A give it the 1GB update (the request coming in through a web service call), site A will take the 1GB of data, tar it up, then drop it into a new FTP directory with permissions for that specific site account.

The site A web service that was called will respond with the filename of the tar and a CRC32 checksum. The client system at site B, which gets the response, will then automatically FTP in, grab the file, apply its business rules to the data, and use it however it likes.
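The checksum step in this idea might be sketched with `java.util.zip.CRC32` like so; the class and method names are invented, and for a real 1GB tar you would stream the file through the checksum rather than hold it all in memory:

```java
import java.util.zip.CRC32;

// Hypothetical helper: site A computes a CRC32 over the tar's bytes
// before publishing it, and returns the value alongside the filename
// in the web service response so site B can verify its download.
class TarChecksum {
    static long crc32Of(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data);
        return crc.getValue();
    }
}
```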

Make it self-managing and have it remove the tar once a transfer is done.

This keeps all the code inside the J2EE container, with no reliance on outside scripts. Maximum speed over the wire using FTP, and all that is required from both ends is that FTP and HTTP be allowed.

Sound like a decent idea? Please tell me whether you think this is good, bad, ugly, etc.
[ January 22, 2005: Message edited by: Steve Buck ]
 
Paul Sturrock
Bartender
Posts: 10336
Eclipse IDE Hibernate Java
Sounds like it would work. But of course file operations from within an EJB container are a no-no, unless you use JCA. Which is just a complication, rather than a help.
 
Steve Buck
Ranch Hand
Posts: 45
Originally posted by Paul Sturrock:
Sounds like it would work. But of course file operations from within an EJB container are a no-no, unless you use JCA. Which is just a complication, rather than a help.


Darn it! What can I possibly do? :/

Sigh...why do things always have to be much more difficult than necessary

I don't even know what JCA is...

Surely there must be -some- way to transfer large amounts of data between J2EE systems that is contained within a J2EE application.

Would this be "good": on a Timer Service-triggered callback, the EJB polls the web service and gets the URL and checksum. Then the EJB passes those two off to a POJO (Plain Old Java Object), which can do the file I/O?

Somewhere in my J2EE application (the deployment descriptor, I figure) it would store the path to a directory where the POJO should store the files and use as its workspace (preferably a folder under my J2EE application deployment tree; is this okay? e.g. ..\webapp\workspace, right beside WEB-INF and images). When the POJO is instantiated it will be passed the URL, the necessary credentials, the checksum, and the filepath on the system. Or is this crappy design (in J2EE terms)?
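The kind of POJO helper described here might look like the following sketch; the name and shape are hypothetical. It streams the downloaded bytes to their destination while computing a CRC32, so the caller can compare the result against the checksum reported by the web service:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.CRC32;
import java.util.zip.CheckedInputStream;

// Hypothetical transfer helper: copies 'in' (e.g. the FTP download
// stream) to 'out' (e.g. a file in the workspace directory) and
// returns the CRC32 of everything copied.
class TransferHelper {
    static long copyWithChecksum(InputStream in, OutputStream out)
            throws IOException {
        CheckedInputStream checked = new CheckedInputStream(in, new CRC32());
        byte[] buf = new byte[8192];
        int n;
        while ((n = checked.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return checked.getChecksum().getValue();
    }
}
```

If the returned value does not match the advertised checksum, the caller would discard the file and retry the transfer.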

Thanks again. I really appreciate the feedback here

Edit: It looks like this approach is using "helper classes", and that is still not permitted, since the helper may be inlined and then violate the EJB spec. Any use of file I/O by the EJB, whether directly or through methods anywhere on its call graph, is prohibited? *sigh*
[ January 24, 2005: Message edited by: Steve Buck ]
 
Steve Buck
Ranch Hand
Posts: 45
I'm starting to think an XML web service may be the only way to transfer this information. Is there any limit on how large an XML web service response can be?

I could always try to tune it as much as possible to offset the XML overhead (eliminating it entirely is impossible, but some reduction must be possible)... HTTP 1.1 compression and so forth.

I expect most transfers to be around 10MB, but sometimes it will hit 1GB without a doubt.

The data will be textual, btw.

Are EJBs allowed to connect to web services and use the required APIs to take advantage of them?
[ January 24, 2005: Message edited by: Steve Buck ]
 
David Harkness
Ranch Hand
Posts: 1646
Originally posted by Steve Buck:
Darn it! What can I possibly do? :/
Keep in mind that all these things that are "not allowed" in a J2EE container (not just in an EJB bean, so calling a POJO makes no difference) are "not allowed" in order to make portability possible. However, if you do them (access a file, open a socket, etc), most containers will not stop you.

You need to decide how portable and standards-compliant you want your system to be. If you can specify that all locations use the same container, you can likely get away with breaking the rules. Even with multiple containers you simply need to test whether or not they allow that particular rule to be broken.

In my current WebLogic application I merrily open a file on disk to read an encrypted passphrase to use when decrypting some data in the database. This is "not allowed" by EJB, but it works on Windows and Linux and Solaris with no modifications to the code.

Besides, if you don't break a rule every now and then, you're just not a serious coder.
 
Steve Buck
Ranch Hand
Posts: 45
David Harkness:
I read up on that last evening. I realize you can still do this in most of the major containers, but... I still want it to be compliant so that -any- container, now or in the future, will be guaranteed to work with it.

Such is the reason I chose J2EE in the first place (as a standard... not a product, with no lock-in or vendor preference).

bah
Might have to avoid J2EE for this particular project *cry*
 
David Harkness
Ranch Hand
Posts: 1646
Originally posted by Steve Buck:
Might have to avoid J2EE for this particular project *cry*
Then I'd suggest you start reading up on the Spring Framework and Hibernate. I've just completed porting our EJB application running in WebLogic to Spring + Hibernate running in Tomcat.

First thing we noticed: much better performance. And we haven't even enabled the second-level Hibernate cache (the equivalent of the entity bean cache) yet; every time we need to touch a database row, we hit the database to get the data first.

There are lots of happy faces around here now.
 
Steve Buck
Ranch Hand
Posts: 45
Originally posted by David Harkness:
Then I'd suggest you start reading up on the Spring Framework and Hibernate. I've just completed porting our EJB application running in WebLogic to Spring + Hibernate running in Tomcat.

First thing we noticed: much better performance. And we haven't even enabled the second-level Hibernate cache (the equivalent of the entity bean cache) yet; every time we need to touch a database row, we hit the database to get the data first.

There are lots of happy faces around here now.


Well... I kind of wanted a large framework with the backing that J2EE has, rather than some open source frameworks :/ I do plan to build some lightweight Java systems, but not for this particular task.

The deciding factor for J2EE over .NET for this project was J2EE's maturity, support, and available documentation, and particularly the vendor independence (the largest part)... being limited by these technical aspects makes it impossible for me to go further with J2EE, I am afraid.
 
David Harkness
Ranch Hand
Posts: 1646
Originally posted by Steve Buck:
Well... I kind of wanted a large framework with the backing that J2EE has, rather than some open source frameworks :/
You may be surprised at the size and complexity of the projects using "some open source frameworks" as well as the names of the companies making use of them, but I certainly understand your hesitation and am not here to evangelize . . . too much.
being limited by technical aspects makes it impossible for me to go further with J2EE I am afraid
You could still use web services and not have to do any trickery. Simply break up the 1GB files into more palatable sizes and base-64 encode them. Sure, the size will grow by 33%, but once you're committed to sending 1GB over the network, what's 0.33GB more going to matter?
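The chunk-and-encode step might look like this. Note the hedge: `java.util.Base64` arrived in Java 8, long after this thread; in 2005 a third-party encoder would have filled the same role. The helper name and the idea of encoding fixed-size slices are illustrative assumptions.

```java
import java.util.Base64;

// Hypothetical helper: base64-encodes one slice of the payload for
// transport inside an XML web service response. Every 3 input bytes
// become 4 output characters, hence the roughly 33% size growth.
class Chunks {
    static String encodeChunk(byte[] data, int offset, int length) {
        byte[] piece = new byte[length];
        System.arraycopy(data, offset, piece, 0, length);
        return Base64.getEncoder().encodeToString(piece);
    }
}
```

The receiving side would decode each chunk with `Base64.getDecoder()` and append it to the output file in sequence.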
 
Steve Buck
Ranch Hand
Posts: 45
David,

I realized more recently that the data being transferred will be completely plaintext. Therefore I won't even need to change the encoding.

There will be a little overhead in breaking it up into separate transmissions, but that's no biggie... an insignificant amount IMO. So this is probably how I would do it, too.

The problem now lies in whether or not an EJB method can legally call a web service method (as described in my other thread). If it is allowable under EJB 2.1 and the J2EE container spec, then I'm all for it.

I want to set up a Timer Service for an EJB on site A to call back a method, e.g. getUpdates(). getUpdates() will have in its code the call to an XML web service at site B. Site A then gets the XML response, applies some business logic, and then pops the remaining data into the database using JDBC.

My only question now is whether it's allowed or not. I have no experience with web services up to this point, and I don't know if it calls upon any java.io (which the EJB container prohibits?) or how exactly it works with the parsers and so forth.

If someone could tell me concretely whether it can be done (without breaking the spec) or not, I'd appreciate it.

thanks again.
[ January 26, 2005: Message edited by: Steve Buck ]
 