Streaming Data: Understanding the real-time pipeline - Beyond real-time data?

Ranch Hand
Posts: 238
Hi Andrew,

This looks like a very interesting book. Looking at the table of contents, I was wondering if you'd considered using a streaming approach to analyse not only real-time (or close to real-time) data, but also very large, finite, stored data. This seems to be the approach behind the Apache Beam project, founded by the guys at Google (I don't work for them, by the way, I am just a researcher). Basically, the idea is that you could use the same abstractions from streaming, such as windowing, to analyse very large, but finite, data. The conventional dichotomy between batch and stream thus ceases to exist, since all big data, finite or infinite, can be treated as a stream.

Fascinating stuff, and it really expands the scope of streaming methods and techniques beyond near-real-time or streaming data.

Congratulations on the publication of your book. I have added it to my "to read" list!
Posts: 14
Hi T,
You are certainly on to something here. The approach with Apache Beam is fantastic and something I firmly believe we need as a community. It lays the foundation for how we think about and talk about streaming systems regardless of the underlying stream processing engine. You are correct in the approach to finite (historically called batch) and infinite data streams. When you think about this, you come to the conclusion that a batch is really a dataset with a finite start/end time, and thus you can treat it like a stream.

When you start to think about things this way, then you start to think streaming first. If you look at Apache Flink you will see a very similar approach -- from day one Flink had the perspective that everything is a stream and batch is just a stream with a fixed start / stop time.
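
To make the "batch is just a bounded stream" idea concrete, here is a minimal sketch in plain Python. It is not the Beam or Flink API; the function name `tumbling_window_sums`, the sample events, and the assumption that events arrive in timestamp order are all made up for illustration. The point is only that one windowing function can run over a stored, finite dataset and over an endless source without changes.

```python
from itertools import count, islice

def tumbling_window_sums(events, size):
    """Yield (window_start, total) for each fixed-size (tumbling) window.

    Hypothetical helper. Assumes events arrive in timestamp order, so a
    window can be emitted as soon as an event from a later window shows up.
    """
    current_start = None
    total = 0
    for timestamp, value in events:
        start = timestamp - timestamp % size
        if current_start is not None and start != current_start:
            # An event from a new window arrived: the old window is complete.
            yield current_start, total
            total = 0
        current_start = start
        total += value
    # Only reached when the input is finite: flush the last open window.
    if current_start is not None:
        yield current_start, total

# Bounded ("batch") input: a stored list with a fixed start and end.
stored = [(0, 1), (3, 2), (10, 5), (12, 1), (25, 4)]
batch_result = list(tumbling_window_sums(stored, size=10))

# Unbounded input: an endless generator emitting one event per tick.
def endless():
    for t in count():
        yield t, 1

# We can never drain an infinite stream, so take only the first three windows.
stream_result = list(islice(tumbling_window_sums(endless(), size=10), 3))

print(batch_result)
print(stream_result)
```

The only difference between the two calls is whether the source ever ends. Real engines also have to handle out-of-order events, which this sketch deliberately ignores.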

Thank you for the kind words and for adding my book to your "to read" list.