Issues Storm - Kafka

 
Greenhorn
Posts: 10
Hi All,

We are using the Storm-Kafka integration, where a spout reads from a Kafka topic.

Following are the versions of Storm, Kafka, and ZooKeeper we are using:
Storm: apache-storm-0.9.2-incubating
Kafka: kafka_2.8.0-0.8.1.1
ZooKeeper: zookeeper-3.4.6

I am facing the following issues at the spout.
1) Messages get marked as failed even though the average processing time is below the topology message timeout (topology.message.timeout.secs), and we aren't seeing any exceptions in any of the bolts.
2) The topology ultimately emits to a Kafka producer, i.e. writes to another topic, but the messages are getting duplicated because of replays.
3) The consumer group isn't working properly for the Storm-Kafka integration (see the configuration sketch below this list).
a. When we give the same group id to the Kafka consumers of two different topologies, both still read the same messages.
b. If we have two consumers with different consumer group ids in two different topologies, it works fine when both topologies are deployed at the same time, but not when we deploy one of them after some messages have already been loaded into the topic and read by the first topology.
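For reference, the spout is configured along the lines of the sketch below. This is only an illustrative setup for the storm-kafka module at these versions; the topic name, ZooKeeper address, offset root and spout id are placeholders, not our actual values.

import backtype.storm.spout.SchemeAsMultiScheme;
import storm.kafka.BrokerHosts;
import storm.kafka.KafkaSpout;
import storm.kafka.SpoutConfig;
import storm.kafka.StringScheme;
import storm.kafka.ZkHosts;

public class SpoutSetup {
    public static KafkaSpout buildSpout() {
        // ZooKeeper ensemble used to look up Kafka brokers and to store the spout's offsets
        BrokerHosts hosts = new ZkHosts("zkhost:2181");

        // topic, ZooKeeper root for offsets, and spout id (all placeholders);
        // the zkRoot + id pair identifies where this spout keeps its offset state
        SpoutConfig spoutConfig = new SpoutConfig(hosts, "my-topic", "/kafka-offsets", "topology-a-spout");
        spoutConfig.scheme = new SchemeAsMultiScheme(new StringScheme());

        return new KafkaSpout(spoutConfig);
    }
}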

Kindly help me with the above points, as this is affecting the overall scope and timelines of the project.

Regards,
Nilesh Chhapru.
 
Author
Posts: 14
We never ran 0.9.2-incubating in production as it didn't pass our tests. I don't recall what problems we hit. I do know we didn't end up using it.


1. Average time isn't useful in your scenario. Your average can be below your timeout while plenty of individual tuples still take longer than that timeout. You are dealing with processing latency, and that is going to vary widely. (The first sketch after this list shows where that timeout is set.)
2. You need to work out your issues with #1 above. Then figure out how to make your system work with at-least-once processing. You can't have exactly-once processing in a distributed system; you can get "usually once", but that is the best you can do. Chapter 4 discusses this towards the end. Ideally, you want a system where a duplicated message costs you nothing other than wasted processing. (The second sketch below shows one way to get there.)
3. That isn't something I'm capable of helping you hammer out via this forum. I'd suggest creating the simplest topology possible (something like the third sketch below) and figuring out what in your code/configuration is causing the issue.
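On the timeout in #1: a minimal sketch, assuming the standard backtype.storm.Config API in 0.9.x. The 120-second value is only an example; pick something above your worst-case latency rather than your average.

import backtype.storm.Config;

public class TimeoutConfig {
    public static Config build() {
        Config conf = new Config();
        // topology.message.timeout.secs: a tuple is failed and replayed if it is not
        // fully acked within this many seconds, regardless of the average latency
        conf.setMessageTimeoutSecs(120);
        return conf;
    }
}

The number to compare against that setting is the per-spout complete latency in the Storm UI, not an average across the whole run.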
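On making duplicates harmless for #2: one common approach is to deduplicate on a message id in the bolt that writes to the output topic. A rough sketch, assuming each tuple carries a unique "msgId" field; the in-memory set is illustrative only, a real version needs bounded or external state.

import java.util.HashSet;
import java.util.Map;
import java.util.Set;

import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Tuple;

public class DedupBolt extends BaseRichBolt {
    private OutputCollector collector;
    private Set<String> seen; // illustrative only: unbounded in-memory state

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
        this.seen = new HashSet<String>();
    }

    @Override
    public void execute(Tuple tuple) {
        String msgId = tuple.getStringByField("msgId"); // assumed unique id on each message
        if (seen.add(msgId)) {
            // first time we see this id: produce it to the output topic here
        }
        // ack either way, so a replayed duplicate doesn't fail and get replayed yet again
        collector.ack(tuple);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // nothing emitted downstream in this sketch
    }
}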
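And for #3, this is the sort of stripped-down topology I have in mind: just the Kafka spout plus a bolt that logs and acks, run in a LocalCluster. The topic, ZooKeeper address and ids are placeholders.

import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.spout.SchemeAsMultiScheme;
import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Tuple;
import storm.kafka.KafkaSpout;
import storm.kafka.SpoutConfig;
import storm.kafka.StringScheme;
import storm.kafka.ZkHosts;

public class MinimalKafkaTopology {

    // trivial bolt: print the tuple; BaseBasicBolt acks automatically
    public static class PrinterBolt extends BaseBasicBolt {
        @Override
        public void execute(Tuple tuple, BasicOutputCollector collector) {
            System.out.println("got: " + tuple);
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
        }
    }

    public static void main(String[] args) throws Exception {
        SpoutConfig spoutConfig = new SpoutConfig(
                new ZkHosts("zkhost:2181"), "my-topic", "/kafka-offsets", "debug-spout");
        spoutConfig.scheme = new SchemeAsMultiScheme(new StringScheme());

        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("kafka-spout", new KafkaSpout(spoutConfig), 1);
        builder.setBolt("printer", new PrinterBolt(), 1).shuffleGrouping("kafka-spout");

        Config conf = new Config();
        conf.setDebug(true);

        // run in-process so acks/fails are easy to watch without a full cluster
        new LocalCluster().submitTopology("kafka-debug", conf, builder.createTopology());
    }
}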



 