Is this a correct approach to reprocessing failed records in a Spark streaming application?

 
Ranch Foreman
Posts: 2059
A Spark Streaming application reads data from Kafka using the Spark-Kafka integration. If an exception occurs while a record is being processed, that record is not processed, yet Spark has already consumed it from Kafka, so it will never be processed this way. How can such records be reprocessed?

What I can think of is a staging table (say RecordsBeforeProcessing) into which records are inserted before processing, with an IsProcessed flag column. Once a record is processed successfully, the flag is updated to 'Y'; otherwise it stays 'N'. When such failures occur, reproduce the problem in Eclipse locally, debug and fix the code, and redeploy it. Then schedule a batch job that reads the records with IsProcessed = 'N' and hands them back to Spark for reprocessing, as in the sketch below. This may carry a performance overhead, though. Is this a correct approach? Thanks
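For illustration, here is a minimal Scala sketch of that flag-table idea using Structured Streaming, foreachBatch and plain JDBC. Everything specific in it is an assumption made up for the example, not a tested design: the RecordsBeforeProcessing schema (record_id, payload, is_processed), the Kafka topic and broker, the JDBC URL and credentials, and the process() method standing in for the business logic.

import java.sql.DriverManager
import org.apache.spark.sql.{DataFrame, SparkSession}

object FlaggedReprocessing {
  // Assumed connection details and schema -- adjust to your environment.
  val jdbcUrl = "jdbc:postgresql://dbhost:5432/appdb"

  def process(payload: String): Unit = {
    // Business logic goes here; throwing leaves the record flagged 'N'.
  }

  // Stage every record with is_processed = 'N', then flip the flag to 'Y'
  // only after process() returns without an exception.
  def handleBatch(batch: DataFrame, batchId: Long): Unit = {
    batch.rdd.foreachPartition { rows =>
      val conn = DriverManager.getConnection(jdbcUrl, "appuser", "secret")
      try {
        val insert = conn.prepareStatement(
          "INSERT INTO RecordsBeforeProcessing (record_id, payload, is_processed) VALUES (?, ?, 'N')")
        val markDone = conn.prepareStatement(
          "UPDATE RecordsBeforeProcessing SET is_processed = 'Y' WHERE record_id = ?")
        rows.foreach { row =>
          val id = row.getString(0)
          val payload = row.getString(1)
          insert.setString(1, id)
          insert.setString(2, payload)
          insert.executeUpdate()
          try {
            process(payload)
            markDone.setString(1, id)
            markDone.executeUpdate()
          } catch {
            case _: Exception => // swallow per-record failures; flag stays 'N' for the retry job
          }
        }
      } finally {
        conn.close()
      }
    }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("FlaggedReprocessing").getOrCreate()

    // Requires the spark-sql-kafka-0-10 package on the classpath.
    spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")
      .option("subscribe", "events")
      .load()
      // topic/partition/offset gives a naturally unique record_id.
      .selectExpr(
        "CONCAT(topic, '-', CAST(partition AS STRING), '-', CAST(offset AS STRING)) AS record_id",
        "CAST(value AS STRING) AS payload")
      .writeStream
      .option("checkpointLocation", "/tmp/flagged-reprocessing-chk")
      .foreachBatch(handleBatch _)
      .start()
      .awaitTermination()
  }
}

After the fixed code is deployed, the scheduled batch job can read the still-flagged rows back through Spark's JDBC source and push them through the same process() logic:

// Reads only the rows the streaming job failed on (same assumed table and columns).
val failed = spark.read.format("jdbc")
  .option("url", jdbcUrl)
  .option("user", "appuser")
  .option("password", "secret")
  .option("dbtable",
    "(SELECT record_id, payload FROM RecordsBeforeProcessing WHERE is_processed = 'N') AS t")
  .load()

One thing to watch with this design: if a micro-batch is retried after a crash, the same record can be staged twice, so the insert should be an upsert (or tolerate duplicate keys) to keep the staging step idempotent.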