Andrew Psaltis

Author
since Sep 04, 2017

Recent posts by Andrew Psaltis

Mohammed Sardar. wrote:
Thanks, and very interesting to hear this book is suitable for that use case as well. Your work on this book is really commendable. Many thanks for introducing this book here.

Thanks for the kind words Mohammed!
4 years ago
Hi Sathya,
For certain industries, there are certainly governing bodies that standardize the data formats used to exchange data. However, between industries, or in many others, there are no general governing bodies. This cartoon (http://xkcd.com/927/) is a very accurate description of the world we live in. A great tool to look at to help deal with this problem is Apache NiFi (nifi.apache.org).
4 years ago
This book is very useful in that use case as well. The book is not geared toward any one industry; it is written to teach people how to think about and build streaming systems. For example, an aeronautical engineer might be interested in the data coming from the turbine system of an aircraft. They will need tools to collect, analyze and present this data.
4 years ago
Hi Divya,
The goal was to make this topic approachable for a developer who does not have any prior knowledge of streaming systems.
4 years ago
That really depends on the source of the data. For example, the data produced by a sensor, perhaps a water flow meter in a factory or in your house (see this project for an example: https://github.com/ericmaicon/water-sensor), is quite small. On the other hand, you may be consuming data from a CCTV camera that you want to analyze.

In both cases the data may be flowing continuously; however, the amount of data that makes up a single logical message may differ greatly in size.
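
To put rough numbers on that difference, here is a small sketch. The message layout is an assumption for illustration only: a hypothetical flow-meter reading packed as a timestamp plus a flow rate, compared with a single uncompressed 1080p RGB frame. Real cameras normally compress their output, so actual frame messages would be smaller, but still far larger than a sensor reading.

```python
import struct

# Hypothetical flow-meter message: timestamp (8-byte double) and flow rate
# in litres per minute (4-byte float), little-endian with no padding.
SENSOR_FORMAT = "<df"


def sensor_message(timestamp: float, flow_lpm: float) -> bytes:
    """Serialize one sensor reading into a compact binary message."""
    return struct.pack(SENSOR_FORMAT, timestamp, flow_lpm)


def raw_frame_size(width: int, height: int, bytes_per_pixel: int = 3) -> int:
    """Size in bytes of one uncompressed frame (8-bit RGB by default)."""
    return width * height * bytes_per_pixel


reading = sensor_message(1_700_000_000.0, 12.5)
frame = raw_frame_size(1920, 1080)

print(len(reading))           # 12 bytes per sensor message
print(frame)                  # 6220800 bytes per uncompressed 1080p frame
print(frame // len(reading))  # 518400 sensor messages fit in one frame
```

Both sources may produce a continuous flow, but a pipeline sized for 12-byte messages would need very different buffering and transport choices for multi-megabyte ones.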
4 years ago
Hi T,
You are certainly on to something here. The approach with Apache Beam is fantastic and something I firmly believe we need as a community. It lays the foundation for how we think about and talk about streaming systems regardless of the underlying stream processing engine. You are correct about the approach to finite (historically called batch) and infinite data streams. When you think about this, you come to the conclusion that a batch is really a dataset with a finite start/end time, and thus you can treat it like a stream.

When you start to think about things this way, then you start to think streaming first. If you look at Apache Flink you will see a very similar approach -- from day one Flink had the perspective that everything is a stream and batch is just a stream with a fixed start / stop time.
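
As a rough illustration of "batch is just a bounded stream", the sketch below writes the processing logic once and applies it to both a finite list and a never-ending generator. It uses plain Python generators, not the Beam or Flink APIs; the function names and the random sensor feed are hypothetical.

```python
import itertools
import random
from typing import Iterable, Iterator


def readings_above(stream: Iterable[float], threshold: float) -> Iterator[float]:
    """One piece of processing logic, written once for any stream."""
    for value in stream:
        if value > threshold:
            yield value


# A bounded ("batch") dataset: it has a fixed start and end.
batch = [3.0, 7.5, 1.2, 9.9, 4.4]


def sensor_feed(seed: int = 42) -> Iterator[float]:
    """An unbounded stream: this generator never ends on its own."""
    rng = random.Random(seed)
    while True:
        yield rng.uniform(0.0, 10.0)


print(list(readings_above(batch, 5.0)))  # [7.5, 9.9]

# The unbounded case needs a stopping rule supplied by the caller;
# here we take only the first three matching readings.
print(list(itertools.islice(readings_above(sensor_feed(), 5.0), 3)))
```

The only difference between the two cases is whether the input ends by itself, which is the "fixed start/stop time" distinction rather than a difference in the processing code.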

Thank you for the kind words and for adding my book to your "to read" list.
4 years ago
Hi Will,
The book lays the foundation for how to think about these systems when building them. Architects/CTOs will certainly feel at home. However, it also covers some algorithms in mid-level detail and, at the end, how to tie the whole thing together in code.

The book is certainly not a "how to" sort of book, because when building a streaming data pipeline there is no one-size-fits-all technology stack, language, or deployment platform. However, in all cases there are core distributed software engineering principles that a developer needs to understand to build these systems. The aim here is to find the happy medium where developers feel armed and ready to go and Architects/CTOs feel comfortable with this space.

A natural next book would be more of a practitioner's guide that goes deep on a particular implementation.

Hope that helps.
4 years ago

satya Priya Sundar wrote: Thanks for clearing the confusion.

Then what constitutes a stream? Is it the processing of raw data into some specific format that another system can consume in real time, or can it also be just relaying data from one system to another?

Any real-time examples?

Regards,
Sathya

Hi Sathya,
It can be either case. For example, if you use a Twitter client, it is getting a data stream (in this case just tweets) from Twitter. On the other hand, Twitter is doing processing between when you publish a tweet and when it gets sent to a Twitter client. However, from the standpoint of your Twitter client, there is a stream of data being pushed to it.

In the most general case, a stream can be identified as the flow of data from a producer.
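
Here is a small sketch of both cases, relaying and processing, using Python generators. Everything in it is hypothetical: the posts, the stage names, and the `length` field added by the processing stage. Generators are used only for brevity; a real client would receive messages pushed over a network connection.

```python
from typing import Callable, Iterable, Iterator


def tweet_source() -> Iterator[dict]:
    """Hypothetical producer standing in for a service publishing posts."""
    posts = [
        {"user": "alice", "text": "Reading about streaming systems"},
        {"user": "bob", "text": "hello world"},
        {"user": "carol", "text": "Streams everywhere!"},
    ]
    yield from posts


def relay(stream: Iterable[dict]) -> Iterator[dict]:
    """Pass each message through unchanged: this is still a stream."""
    for message in stream:
        yield message


def enrich(stream: Iterable[dict]) -> Iterator[dict]:
    """Transform each message on its way to the consumer."""
    for message in stream:
        yield {**message, "length": len(message["text"])}


def client(stream: Iterable[dict], on_message: Callable[[dict], None]) -> None:
    """From the client's point of view, messages simply arrive one by one."""
    for message in stream:
        on_message(message)


received = []
client(relay(tweet_source()), received.append)   # relaying only
client(enrich(tweet_source()), received.append)  # processing in between
print(len(received))  # 6
```

In both runs the client sees the same thing: a flow of messages from a producer. Whether anything happened to them along the way is invisible from its side.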

Hope that helps.

Thanks,
Andrew
4 years ago

Narendran Sridharan wrote: Welcome, Andrew. Congrats on your book publication.

Thanks Narendran!
4 years ago
Hi Jacek,
This book is not about a particular framework. It is a book about the things you need to consider and think through as you build an entire system. Naturally, frameworks like Spark Streaming, Flink, Storm, etc. are mentioned, and their commonalities are drawn out in one of the chapters. But those tools are always just one piece of the puzzle; there are many other parts, such as how you acquire, move, process, possibly store, and deliver the resulting data to a client.
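
To make those five concerns concrete, here is a minimal sketch, not a design from the book. A `queue.Queue` stands in for the "move" tier (what a message queue would do in a real system), the doubling step is a placeholder for real processing, and a plain list plays the role of storage. All function names are hypothetical.

```python
import queue
import threading
from typing import Callable, List

END = object()  # sentinel marking the end of this finite demo stream


def acquire(out_q: queue.Queue, readings: List[int]) -> None:
    """Acquire: collect raw events from a source and hand them off."""
    for reading in readings:
        out_q.put(reading)
    out_q.put(END)


def process_store_deliver(
    in_q: queue.Queue, store: List[int], deliver: Callable[[int], None]
) -> None:
    """Process each event, optionally store it, then deliver it to a client."""
    while True:
        item = in_q.get()
        if item is END:
            break
        result = item * 2       # process: placeholder transformation
        store.append(result)    # possibly store
        deliver(result)         # deliver to the client


move_q: queue.Queue = queue.Queue()  # move: buffer between the two stages
stored: List[int] = []
delivered: List[int] = []

producer = threading.Thread(target=acquire, args=(move_q, [1, 2, 3]))
consumer = threading.Thread(
    target=process_store_deliver, args=(move_q, stored, delivered.append)
)
producer.start()
consumer.start()
producer.join()
consumer.join()

print(stored)     # [2, 4, 6]
print(delivered)  # [2, 4, 6]
```

A stream processing framework would typically replace only the middle function; acquiring, moving, storing and delivering remain separate decisions.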
4 years ago
Hi Satya,
This is actually something that is addressed early in the book. Today we do a great job of creating jargon soup and making things quite confusing, as these terms are heavily overloaded. I would argue that "data streaming" may be the more accurate term these days. However, people still very often use "real time" and "streaming" interchangeably.

Thanks,
Andrew
4 years ago
Hi Scott,
I have looked at Drill; it is quite amazing how far that project has come. It is a really nice way to access data through a single interface regardless of the source, as if it were all one.
4 years ago
Hi Paul,
The right mindset, from my perspective, is two-fold. First, it is very important to think about what "real time" means in your domain; it is not always the same. Second, think about what insight you can glean from the data as it is passing by.
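
As one small example of gleaning insight "as the data is passing by", the sketch below keeps a running mean and flags readings that jump well above the recent average, without ever storing the full stream. The window size, the 1.5x rule and the sample readings are all hypothetical choices for illustration.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple


def watch(
    stream: Iterable[float], window: int = 3, limit: float = 1.5
) -> Iterator[Tuple[float, float, bool]]:
    """Yield (value, running mean, alert) for each value as it passes by.

    Only a running total, a count and the last `window` values are kept,
    so memory stays bounded no matter how long the stream runs.
    """
    total, count = 0.0, 0
    recent: Deque[float] = deque(maxlen=window)
    for value in stream:
        total += value
        count += 1
        # Compare against the recent values seen *before* this one.
        baseline = sum(recent) / len(recent) if recent else value
        alert = value > limit * baseline  # hypothetical alerting rule
        recent.append(value)
        yield value, total / count, alert


readings = [10.0, 11.0, 9.0, 10.0, 30.0, 10.0]
results = list(watch(readings))
print([v for v, _, alert in results if alert])  # [30.0]
```

Because each result is produced as soon as its reading arrives, how quickly an alert fires depends on the pipeline's latency, which ties back to what "real time" means in your particular domain.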
4 years ago
Thanks for all the welcomes. Honored to be part of this!
4 years ago