Merging databases---architecture?

 
Ranch Hand
Posts: 74
Hi,

I've been reading Scott Ambler for several years now, and I appreciate the notion
of incremental refactoring of databases. Now I have a non-incremental problem.

Where I work, scientists independently generate data, often binary. We
can help them organize their data, put their metadata into an RDBMS, and build web
interfaces to it. This creates database "islands" in the organization. These are
usually not very complex as far as RDBs go. The data is mostly read-only, and
the ERDs are pretty standard. Rows are appended, but not often altered.

Inevitably, we would like to start browsing and querying across these islands in an
ad hoc sort of way. My question is: what is the best way to do this? The current
idea is to identify some common keys, and then write scripts to pull the desired data
out of the islands and build merged RDBs. These would typically be "throw-away"
databases lasting only a few months, created as needed for the scientists to do
analysis on the merged data. (A subsequent problem is what to do with the analysis
data.) The analyses would be used for publication.
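
To make the idea concrete, here is a minimal sketch of what such a merge script might look like. It is only an illustration under stated assumptions: the islands are stood in for by in-memory SQLite databases, and the table names (samples, merged_samples), the common key (sample_id), the island names (lab_a, lab_b), and the provenance columns (source_island, extracted_at) are all hypothetical, not taken from any real system.

```python
import sqlite3
from datetime import datetime, timezone


def make_island(rows):
    """Build a hypothetical in-memory island with a 'samples' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE samples (sample_id TEXT PRIMARY KEY, value REAL)")
    conn.executemany("INSERT INTO samples VALUES (?, ?)", rows)
    conn.commit()
    return conn


def build_merged(islands):
    """Pull rows out of each island and load them into one throw-away database.

    Every merged row records which island it came from and when it was
    extracted, which is a very simple form of provenance.
    """
    merged = sqlite3.connect(":memory:")
    merged.execute(
        "CREATE TABLE merged_samples ("
        " sample_id TEXT, value REAL, source_island TEXT, extracted_at TEXT,"
        " PRIMARY KEY (sample_id, source_island))"
    )
    extracted_at = datetime.now(timezone.utc).isoformat()
    for name, conn in islands.items():
        rows = conn.execute("SELECT sample_id, value FROM samples").fetchall()
        merged.executemany(
            "INSERT INTO merged_samples VALUES (?, ?, ?, ?)",
            [(sample_id, value, name, extracted_at) for sample_id, value in rows],
        )
    merged.commit()
    return merged


def shared_keys(merged):
    """Return the sample_ids that appear in more than one island."""
    cur = merged.execute(
        "SELECT sample_id FROM merged_samples "
        "GROUP BY sample_id HAVING COUNT(DISTINCT source_island) > 1 "
        "ORDER BY sample_id"
    )
    return [row[0] for row in cur]


# Two hypothetical islands that overlap on the key "S2".
islands = {
    "lab_a": make_island([("S1", 1.5), ("S2", 2.0)]),
    "lab_b": make_island([("S2", 2.1), ("S3", 3.7)]),
}
merged = build_merged(islands)
print(shared_keys(merged))  # prints ['S2']
```

Keeping the source island and extraction time on every row means a throw-away database can later be traced back to the islands it was built from, or rebuilt, though real provenance and versioning would likely need more than this.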

This scenario seems to address some issues not covered in the literature I've read,
especially regarding data provenance and even versioning. I would really appreciate
the two cents' worth of some real experts in the field.

Thanks!
Glenn
 
author
Posts: 608
This sounds like a classic Extract-Transform-Load (ETL) issue, if I'm reading you right. Have you looked into this at all?

- Scott
 
Glenn Murray
Ranch Hand
Posts: 74
Hi Scott,

Thanks, no, I haven't looked into this as I had never even heard of it---that's why
I asked. And you are right, ETL is what I am talking about---thanks for the right
nomenclature.

Can you suggest somewhere to go to learn more about ETL tools, patterns, designs, etc.?

Thanks,
Glenn
 
Scott Ambler
author
Posts: 608
Ralph Kimball has a data warehouse ETL book which I think is pretty good. I'd start there, although there could be a better book out there. A good first step might be Kimball's site, which has a lot of good articles.

- Scott
 