Is this a correct approach to reprocessing failed error records with Spark for streaming data?

 
Ranch Hand
Posts: 2601
A Spark streaming application reads streaming data from Kafka using the Spark-Kafka integration. If an exception occurs while a record is being processed, that record is not processed. But it has already been consumed from Kafka, so it will never be processed this way. How can such records be reprocessed?

What I can think of for dealing with this is to stage records in a table (say RecordsBeforeProcessing) before processing, with an IsProcessed flag column. Once a record is processed successfully, update the flag to 'Y'; otherwise it stays 'N'. When such a failure happens, take the program to local Eclipse, debug it, fix the code, and redeploy. Then schedule a batch job that reads the records with IsProcessed = 'N' and hands them back to Spark for reprocessing. But this may have performance overhead. Is this a correct approach? Thanks
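To make the staging idea concrete, here is a minimal sketch of the capture side in Scala, using Structured Streaming's foreachBatch. One common variant of your plan, which avoids updating flag values in place (row updates are awkward in Spark-managed tables), is to write only the failed records to the staging table, i.e. a dead-letter table. The broker address, topic, the table name records_before_processing, and the process() body are all placeholders for this sketch, not anything Spark itself defines:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import scala.util.Try

object CaptureFailedRecords extends Serializable {

  // Stand-in for the real per-record business logic; anything that
  // throws here counts as a failed record.
  def process(payload: String): Unit =
    if (payload.isEmpty) throw new IllegalArgumentException("bad record")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("capture-failed-records")
      .getOrCreate()
    import spark.implicits._

    // Broker and topic are placeholders.
    val stream = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")
      .option("subscribe", "events")
      .load()
      .selectExpr("CAST(value AS STRING) AS payload")

    stream.writeStream
      .foreachBatch { (batch: DataFrame, _: Long) =>
        // Try each record instead of letting one exception fail the
        // whole micro-batch, and keep only the failures.
        val failed = batch.as[String]
          .filter(payload => Try(process(payload)).isFailure)
          .toDF("payload")

        // Failed payloads land in the staging table for reprocessing.
        failed.write.mode("append").saveAsTable("records_before_processing")
      }
      .option("checkpointLocation", "/tmp/checkpoints/capture")
      .start()
      .awaitTermination()
  }
}
```

With this in place the streaming job keeps running when individual records are bad; only the offending payloads accumulate in the staging table, so the extra write cost is paid only on failures rather than on every record.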
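The reprocessing side can then be an ordinary scheduled batch job over the staging table, run after the fixed code is deployed. This sketch assumes the same placeholder table and process() logic as above; with your IsProcessed column you would filter on 'N' instead of reading the whole table:

```scala
import org.apache.spark.sql.SparkSession
import scala.util.Try

object ReprocessFailedRecords extends Serializable {

  // The same (now fixed) business logic the streaming job uses.
  def process(payload: String): Unit =
    if (payload.isEmpty) throw new IllegalArgumentException("bad record")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("reprocess-failed-records")
      .getOrCreate()
    import spark.implicits._

    // With a dead-letter table, everything in it still needs work.
    // (With an IsProcessed column you would add
    // .filter($"is_processed" === "N") here.)
    val failed = spark.read
      .table("records_before_processing")
      .select($"payload".as[String])

    // Re-run the fixed logic; records that still fail are kept for the
    // next run instead of being dropped again.
    val stillFailing = failed.filter(p => Try(process(p)).isFailure)

    // Written to a separate table because Spark cannot overwrite a
    // table while the same job is reading from it.
    stillFailing.toDF("payload")
      .write.mode("overwrite")
      .saveAsTable("records_before_processing_retry")
  }
}
```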
 