Is this the correct approach to reprocess failed records with Spark for streaming data?

 
A Spark Streaming application reads streaming data from Kafka using the Spark-Kafka integration. If an exception occurs while processing a record, that record is not processed. But Spark has already received it, so it will never be processed again. How can such records be reprocessed?

What I can think of is to have a table (say RecordsBeforeProcessing) where each record is stored before it is processed. The table has an IsProcessed flag column. Once a record is processed, the flag is updated to 'Y'; otherwise it remains 'N'.

If a failure happens, take the program into a local Eclipse, debug it, fix the code, and deploy the fix. Then schedule a batch job that reads the records whose IsProcessed flag is 'N' and hands them to Spark for reprocessing.

But this may add performance overhead. Is this a correct approach? Thanks
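To make the idea concrete, here is a minimal sketch of the flag-table flow in plain Java, with no Spark or Kafka involved. Everything in it is an assumption for illustration: the in-memory `RecordsBeforeProcessing` class stands in for the real table, `streamingPass` stands in for the streaming job, `reprocess` stands in for the scheduled batch job, and the "bug" is simply a parser that rejects non-numeric payloads.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical in-memory stand-in for the RecordsBeforeProcessing table.
class RecordsBeforeProcessing {
    private final Map<String, String> payloads = new LinkedHashMap<>();
    // Record id -> IsProcessed flag ('Y' or 'N').
    private final Map<String, Character> isProcessed = new LinkedHashMap<>();

    // Store the record before processing; a new record starts with flag 'N'.
    void insert(String id, String payload) {
        payloads.put(id, payload);
        isProcessed.putIfAbsent(id, 'N');
    }

    void markProcessed(String id) {
        isProcessed.put(id, 'Y');
    }

    String payload(String id) {
        return payloads.get(id);
    }

    // Returns a snapshot of the ids whose flag is still 'N'.
    List<String> unprocessedIds() {
        List<String> ids = new ArrayList<>();
        for (Map.Entry<String, Character> e : isProcessed.entrySet()) {
            if (e.getValue() == 'N') {
                ids.add(e.getKey());
            }
        }
        return ids;
    }
}

class ReprocessSketch {
    // Stand-in for the streaming job: save first, then process, then flip the flag.
    static void streamingPass(RecordsBeforeProcessing table,
                              Map<String, String> batch,
                              Consumer<String> processor) {
        for (Map.Entry<String, String> e : batch.entrySet()) {
            table.insert(e.getKey(), e.getValue());
            try {
                processor.accept(e.getValue());
                table.markProcessed(e.getKey());
            } catch (RuntimeException ex) {
                // Processing failed: the flag stays 'N' so the record can be retried.
            }
        }
    }

    // Stand-in for the scheduled batch job, run after the fixed code is deployed.
    static int reprocess(RecordsBeforeProcessing table, Consumer<String> fixedProcessor) {
        int recovered = 0;
        for (String id : table.unprocessedIds()) {
            try {
                fixedProcessor.accept(table.payload(id));
                table.markProcessed(id);
                recovered++;
            } catch (RuntimeException ex) {
                // Still failing: leave the flag at 'N' for the next run.
            }
        }
        return recovered;
    }

    public static void main(String[] args) {
        RecordsBeforeProcessing table = new RecordsBeforeProcessing();
        Map<String, String> batch = new LinkedHashMap<>();
        batch.put("1", "10");
        batch.put("2", "abc"); // the buggy processor cannot handle this payload
        batch.put("3", "30");

        // Buggy version: throws NumberFormatException on non-numeric payloads.
        streamingPass(table, batch, s -> Integer.parseInt(s));
        System.out.println("Unprocessed after streaming: " + table.unprocessedIds());

        // "Fixed" version: accepts any payload.
        int recovered = reprocess(table, s -> s.trim());
        System.out.println("Recovered: " + recovered
                + ", still unprocessed: " + table.unprocessedIds());
    }
}
```

Note that in this sketch every record is written to the table once before processing and updated once afterwards, so each record costs two extra writes. That is where the performance overhead I am worried about would come from in a real setup.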
 