Most Critical Issue Experiences

Ranch Hand
Posts: 441
I thought it would be worth discussing the most critical, hard-to-debug issues you have run into in production. It should give all of us something to learn from for future projects.
Saloon Keeper
Posts: 9344
Posts: 27234
I don't think this will teach you anything, but the hardest-to-debug situation I ever encountered was never solved.

It was one batch of orders that didn't come out right. It looked as if one if-statement in one program was always taking the wrong branch, so naturally the results were wrong. Other batches of orders that night worked fine, the problem had never happened before, and in fact it never occurred again. This was a program we had run for at least 20 years in 20 different warehouses, several times every day. There was no reason why that if-statement should have misbehaved in that one batch of orders. I just happened to be the one in the office that evening (I was working late for some reason), but my main task at that point was to reconstruct the orders and get them re-entered so the batch could be run again and the orders could go out.

As I said, we never solved the problem. Our best guess was this: memory hardware contains error-checking and error-correcting mechanisms, because there's always the possibility that a very tiny electrical fluctuation can flip a bit in memory. Those mechanisms catch and correct something like 99.999999% of those errors. (I don't know how many 9's there actually were in the statistics for our systems.) But not 100%. So on any given run there's only a tiny, tiny probability that an uncorrected error gets through, but if you run your machines for enough years, sooner or later one will. Of course, we could never prove that was what actually happened.
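
To put rough numbers on that guess, here is a minimal Java sketch of the arithmetic. It assumes each raw bit error independently escapes correction with probability p, so the chance that at least one escapes out of n raw errors is 1 - (1 - p)^n. The value of p is taken from the illustrative 99.999999% figure above; the raw error counts are purely hypothetical and are not measurements from any real system. The class and method names are made up for this example.

```java
public class BitFlipOdds {

    // Probability that at least one of n independent raw bit errors
    // slips past error correction, when each one slips past with
    // probability p: 1 - (1 - p)^n.
    // log1p and expm1 are used so the result stays accurate when p is tiny.
    static double atLeastOne(double p, double n) {
        return -Math.expm1(n * Math.log1p(-p));
    }

    public static void main(String[] args) {
        // Assumption: 99.999999% of raw errors are caught, so 1e-8 are not.
        double p = 1e-8;

        // Hypothetical counts of raw bit errors over a machine's lifetime.
        double[] rawErrors = {1e3, 1e6, 1e8, 1e9};

        for (double n : rawErrors) {
            System.out.printf("raw errors: %.0e  P(at least one uncorrected) = %.6f%n",
                    n, atLeastOne(p, n));
        }
    }
}
```

The point of the sketch is only the shape of the curve: for small n the probability is close to n times p, and it climbs toward 1 as the number of raw errors grows, which is the "run long enough and it will happen" argument in numbers.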