Is this a good application for Hadoop?

 
Greenhorn
Posts: 17
Can you tell me if this is an appropriate situation in which to apply Hadoop:

We are a group of toll road operators. Each of us operates one or more toll roads and has its own set of customers. Every customer has "arrangements to pay" information stored with the toll road operator that owns the customer account. We want any toll road customer to be able to use any toll road seamlessly, so that all charges incurred on any toll road end up on the customer's account with their home toll road operator: a true interoperability scenario. To make this possible, we currently exchange large flat files (several GB) containing "arrangements to pay" data every day.
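To make the interoperability requirement concrete, here is a minimal Python sketch of what the daily exchange ultimately enables: each charge is routed to the operator that owns the customer's account. The `Charge` type, the account IDs and the `home_operator_of` mapping are hypothetical; the mapping stands in for whatever the "arrangements to pay" files actually contain.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Charge:
    account_id: str    # hypothetical customer account identifier
    road: str          # toll road on which the charge was incurred
    amount_cents: int


def route_charges(charges, home_operator_of):
    """Group charges by the operator that owns each customer's account.

    home_operator_of maps account_id -> home operator name; today this is
    the information the daily flat-file exchange distributes to everyone.
    """
    routed = defaultdict(list)
    unmatched = []
    for charge in charges:
        operator = home_operator_of.get(charge.account_id)
        if operator is None:
            unmatched.append(charge)  # no arrangement to pay on file
        else:
            routed[operator].append(charge)
    return dict(routed), unmatched


if __name__ == "__main__":
    home = {"A-100": "OperatorA", "B-200": "OperatorB"}
    charges = [
        Charge("A-100", "RoadOfB", 250),
        Charge("B-200", "RoadOfA", 400),
        Charge("C-300", "RoadOfA", 150),
    ]
    routed, unmatched = route_charges(charges, home)
    for operator, items in routed.items():
        print(operator, sum(c.amount_cents for c in items))
    print("unmatched:", len(unmatched))
```

The lookup in `home_operator_of` is the piece every operator currently needs a full copy of, which is why the files are so large.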

This "arrangements to pay" data is held within specific database tables within our own tolling systems. Some of these systems are custom-built, some are based on SAP, some on Oracle applications, some use SQL Server and some use Oracle Database.

Is it practical to think that we could build a Hadoop cluster holding this "arrangements to pay" data, so that any toll road operator could query the cluster at any time and determine the status of a particular customer's "arrangements to pay", for example whether their account is active and has a sufficient balance of funds?

I would be very interested to hear your views on how this might be possible.

Regards, Rupert
 
Greenhorn
Posts: 3
This is definitely Hadoop-able.
 
author
Posts: 15
That's an interesting problem. If I understand you correctly, the key gain here is a single source of "arrangements to pay" data that any of the operators can query. That would remove the need to keep this data in multiple different RDBMSs and, along the way, ease the data-sharing problem, because all files would no longer have to be pushed to all partners. Is that right?

If so, this would be a great fit for Hive. You could take the existing data files, load them into Hive and then use its SQL-like query language (HiveQL) to run reports against the data.
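As a rough illustration of the "load the existing files, then query with SQL" idea, the sketch below uses Python's built-in sqlite3 purely as a stand-in for Hive; it is not Hive code. The pipe-delimited file layout, column names and sample rows are all hypothetical. The point is the shape: every operator's file lands in one shared table, and a report-style query selects one date and a batch of customers in a single pass.

```python
import csv
import io
import sqlite3

# Hypothetical pipe-delimited extract; the real file layouts are not known.
SAMPLE_FILE = """customer_id|operator|txn_date|amount_cents
A-100|OperatorA|2013-05-01|250
B-200|OperatorB|2013-05-01|400
A-101|OperatorA|2013-05-02|150
"""


def create_schema(conn):
    """One shared table instead of a copy per operator (Hive stand-in)."""
    conn.execute(
        "CREATE TABLE payments ("
        "customer_id TEXT, operator TEXT, txn_date TEXT, amount_cents INTEGER)"
    )


def load_flat_file(conn, text):
    """Append the rows of one operator's flat file to the shared table."""
    reader = csv.DictReader(io.StringIO(text), delimiter="|")
    rows = [
        (r["customer_id"], r["operator"], r["txn_date"], int(r["amount_cents"]))
        for r in reader
    ]
    conn.executemany("INSERT INTO payments VALUES (?, ?, ?, ?)", rows)


def report(conn, txn_date, my_customers):
    """Report-style query: one date, a batch of customers, one pass."""
    if not my_customers:
        return []
    placeholders = ",".join("?" for _ in my_customers)
    sql = (
        "SELECT customer_id, txn_date, amount_cents FROM payments "
        f"WHERE txn_date = ? AND customer_id IN ({placeholders}) "
        "ORDER BY customer_id"
    )
    return conn.execute(sql, [txn_date, *my_customers]).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    load_flat_file(conn, SAMPLE_FILE)
    print(report(conn, "2013-05-01", ["A-100", "A-101"]))
    # prints [('A-100', '2013-05-01', 250)]
```

In a real Hive deployment the table would sit over files in HDFS, and the batch query is the kind of workload where Hive's latency is acceptable.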

I see two possible wrinkles that would need more detailed thought:

1. If the query load consists of lots of small queries (e.g. one query per customer), then Hive, which has higher latency than a transactional RDBMS, will perform poorly. But if the workload is more report-type queries like "select <payment records> from <table> where date = <date> and customer_id in <my customers>", then it would work well.
2. If you also wanted to hold the customer data and account info in Hadoop, that's more of an HBase use case, where ease of updates and low-latency query responses are more important. So you could potentially hold customer data in HBase and payment data in Hive, and you'd still have the benefit of a single shared system.
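The HBase side of that split is about fetching or updating one row by its key rather than scanning files, which is exactly the per-customer check in the original question (is the account active, and does it have enough funds?). The Python sketch below is only an in-memory stand-in for a keyed store such as HBase; no HBase client API is used, and the `Arrangement` fields and account-ID row keys are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Arrangement:
    home_operator: str
    active: bool
    balance_cents: int


class AccountStore:
    """In-memory stand-in for a keyed store such as an HBase table.

    Each row is addressed directly by its key (here a hypothetical account
    ID), so checking one account does not require scanning a whole file.
    """

    def __init__(self) -> None:
        self._rows: Dict[str, Arrangement] = {}

    def put(self, account_id: str, arrangement: Arrangement) -> None:
        self._rows[account_id] = arrangement  # an update overwrites the row

    def get(self, account_id: str) -> Optional[Arrangement]:
        return self._rows.get(account_id)


def can_charge(store: AccountStore, account_id: str, amount_cents: int) -> bool:
    """True if the account is known, active and its balance covers the charge."""
    arrangement = store.get(account_id)
    return (
        arrangement is not None
        and arrangement.active
        and arrangement.balance_cents >= amount_cents
    )


if __name__ == "__main__":
    store = AccountStore()
    store.put("A-100", Arrangement("OperatorA", True, 1000))
    store.put("B-200", Arrangement("OperatorB", False, 5000))
    print(can_charge(store, "A-100", 250))  # True: active, enough balance
    print(can_charge(store, "B-200", 250))  # False: account not active
    print(can_charge(store, "C-300", 250))  # False: unknown account
```

Each home operator would keep its own customers' rows current with `put`, and any other operator could run the point check without receiving a full file.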

In other words, my knee-jerk response is that it could be a good fit, and it's certainly worth some exploration if you are looking to do some rationalization/process streamlining.

Garry
 