Micrometrics to forecast Application Performance

 

Even unpredictable weather can be forecast these days. But with all these technological advancements, can we forecast our application's performance and availability? Can we forecast even the next 20 minutes? Will you be able to say that in the next 20 minutes the application is going to experience an OutOfMemoryError, CPU spikes, or crashes? Most likely not. That's because we focus only on macro-metrics:

  • Memory utilization
  • Response time
  • CPU utilization


These are great metrics, but they can't act as leading indicators to forecast the performance/availability characteristics of your application. Now let's discuss a few micrometrics that can.

    EXAMPLE 1



              Fig: You can notice repeated full GCs triggered (graph from GCeasy.io)


    Let's start with an example. This application experienced an OutOfMemoryError. Look at the heap usage graph (generated by parsing the garbage collection logs): you can see heap usage climbing higher and higher despite full GCs running repeatedly. The application hit the OutOfMemoryError around 10:00 am, whereas the repeated full GCs started around 08:00 am. From 08:00 am until 10:00 am the application was doing little besides repeated full GCs. Had the DevOps team been monitoring garbage collection activity, they could have forecast the OutOfMemoryError a couple of hours in advance.

    Memory related micrometrics

    There are 4 memory/garbage collection related micrometrics that you can monitor:

  • Garbage collection Throughput
  • Garbage collection Pause time
  • Object creation rate
  • Peak heap size


    Let's discuss them in this section.

    # 1. GARBAGE COLLECTION THROUGHPUT

    Garbage collection throughput is the proportion of time your application spends processing customer transactions versus the time it spends doing garbage collection.

    Let's say your application has been running for 60 minutes and, in those 60 minutes, 2 minutes are spent on GC activities.

    That means the application has spent 3.33% of its time on GC activities (i.e. (2 / 60) * 100).

    So its garbage collection throughput is 96.67% (i.e. 100 – 3.33).

    When GC throughput degrades, it's an indication that some sort of memory problem is brewing in the application.
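
    For illustration, here is a minimal sketch of how you could compute the same percentage at runtime from the standard GarbageCollectorMXBean API (the class and variable names are just examples; GC log analysis tools report this number for you):

        import java.lang.management.GarbageCollectorMXBean;
        import java.lang.management.ManagementFactory;

        public class GcThroughput {
            public static void main(String[] args) {
                long gcTimeMs = 0;
                // Sum the cumulative GC time reported by every collector (young + old generation).
                for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                    long t = gc.getCollectionTime(); // -1 if this collector does not report time
                    if (t > 0) {
                        gcTimeMs += t;
                    }
                }
                long uptimeMs = ManagementFactory.getRuntimeMXBean().getUptime();

                double gcPercentage = (gcTimeMs * 100.0) / uptimeMs; // e.g. 3.33
                double gcThroughput = 100.0 - gcPercentage;          // e.g. 96.67
                System.out.printf("Time in GC: %.2f%%, GC throughput: %.2f%%%n",
                        gcPercentage, gcThroughput);
            }
        }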

    # 2. GARBAGE COLLECTION LATENCY


          Fig: GC Throughput & GC Latency micrometrics

    When certain phases of a garbage collection event run, the entire application pauses. This pause is what is referred to as latency. Some garbage collection events take only a few milliseconds, whereas others can take several seconds to minutes. You need to monitor GC pause times: if they start to climb, the user experience will suffer.
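
    If you want to observe individual GC durations from inside the JVM (rather than from the GC log), here is a rough sketch using the HotSpot-specific GarbageCollectionNotificationInfo API. Note that for concurrent collectors the reported duration covers the whole GC cycle, not just the stop-the-world portion, so the GC log remains the authoritative source for pause times:

        import com.sun.management.GarbageCollectionNotificationInfo;
        import java.lang.management.GarbageCollectorMXBean;
        import java.lang.management.ManagementFactory;
        import javax.management.NotificationEmitter;
        import javax.management.openmbean.CompositeData;

        public class GcPauseMonitor {
            static volatile byte[] sink; // keeps the demo allocations from being optimized away

            public static void main(String[] args) {
                // Register a listener on every collector; it fires after each GC event.
                for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                    NotificationEmitter emitter = (NotificationEmitter) gc;
                    emitter.addNotificationListener((notification, handback) -> {
                        if (GarbageCollectionNotificationInfo.GARBAGE_COLLECTION_NOTIFICATION
                                .equals(notification.getType())) {
                            GarbageCollectionNotificationInfo info = GarbageCollectionNotificationInfo
                                    .from((CompositeData) notification.getUserData());
                            // Duration of the GC event in milliseconds.
                            System.out.printf("%s / %s: %d ms%n", info.getGcName(),
                                    info.getGcAction(), info.getGcInfo().getDuration());
                        }
                    }, null, null);
                }

                // Allocate some garbage so GC events actually occur in this demo.
                for (int i = 0; i < 1_000_000; i++) {
                    sink = new byte[10_240];
                }
            }
        }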

    # 3. OBJECT CREATION RATE



            Fig: Object creation rate micrometric

    Object creation rate is the average rate at which your application creates objects. Suppose your application was creating objects at 100 MB/sec and it recently started creating 150 MB/sec without any increase in traffic volume – that's an indication of a problem brewing in the application. The additional object creation has the potential to trigger more GC activity, increase CPU consumption, and degrade response time.

    You can use this same metric in your CI/CD pipeline to measure the quality of a code commit. Say that before the latest commit your application was creating objects at 50 MB/sec, and starting from that commit it creates 75 MB/sec for the same traffic volume – that's an indication that an inefficient change has been committed to your repository.
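
    GC log analysis tools report the object creation rate for you. As a rough in-process approximation, here is a sketch using the HotSpot-specific com.sun.management.ThreadMXBean, which sums the bytes allocated by all live threads over a sampling interval (threads that die during the interval are missed, so treat the number as an estimate):

        import java.lang.management.ManagementFactory;

        public class AllocationRate {
            // Sum the bytes allocated so far by all live threads (HotSpot-specific API).
            private static long allocatedBytes(com.sun.management.ThreadMXBean tmx) {
                long total = 0;
                for (long id : tmx.getAllThreadIds()) {
                    long bytes = tmx.getThreadAllocatedBytes(id); // -1 if unsupported or thread is gone
                    if (bytes > 0) {
                        total += bytes;
                    }
                }
                return total;
            }

            public static void main(String[] args) throws InterruptedException {
                com.sun.management.ThreadMXBean tmx =
                        (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();

                long before = allocatedBytes(tmx);
                Thread.sleep(10_000); // sampling interval: 10 seconds
                long after = allocatedBytes(tmx);

                double mbPerSec = (after - before) / (1024.0 * 1024.0) / 10.0;
                System.out.printf("Approximate object creation rate: %.2f MB/sec%n", mbPerSec);
            }
        }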

    # 4. PEAK HEAP SIZE


                Fig: Peak Heap size micrometric

    Peak heap size is the maximum amount of heap memory consumed by your application. If the peak heap size goes beyond an expected limit, you must investigate it: there may be a memory leak in the application, or newly introduced code (or third-party libraries/frameworks) may be consuming a lot of memory.
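
    Peak heap size is also reported by GC log analysis tools. For a quick look from inside the JVM, here is a sketch that adds up the peak usage of every heap memory pool via the standard MemoryPoolMXBean API (the pools may peak at different times, so the sum is an upper bound on the true simultaneous peak):

        import java.lang.management.ManagementFactory;
        import java.lang.management.MemoryPoolMXBean;
        import java.lang.management.MemoryType;

        public class PeakHeap {
            public static void main(String[] args) {
                long peakBytes = 0;
                // Sum the peak usage of every heap pool (eden, survivor, old generation, ...).
                for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
                    if (pool.isValid() && pool.getType() == MemoryType.HEAP) {
                        peakBytes += pool.getPeakUsage().getUsed();
                    }
                }
                System.out.printf("Peak heap usage so far: %.1f MB%n", peakBytes / (1024.0 * 1024.0));
            }
        }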

    How to generate memory related micrometrics?
    All the memory related micrometrics can be sourced from garbage collection logs.

    (1). You can enable garbage collection logging by passing the following JVM arguments:

    Up to Java 8:

    -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<gc-log-file-path>

    From Java 9:

    -Xlog:gc*:file=<gc-log-file-path>
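
    For example, a launch command with GC logging enabled on Java 9 or later might look like the following (gc.log and myapp.jar are placeholder names):

        java -Xlog:gc*:file=gc.log -jar myapp.jar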

    (2). Once the garbage collection logs are generated, you can upload them to a GC log analysis tool such as GCeasy.io either manually or through its programmatic REST API. The REST API is useful when you want to automate the report generation process; it can be used in a CI/CD pipeline as well.

    EXAMPLE 2
    A few hours after launch, a major financial application started to experience ‘OutOfMemoryError: unable to create new native thread’. This application had turned on a new feature in its JDBC (Java Database Connectivity) driver. Apparently this feature had a bug, due to which the JDBC driver started to spawn new threads repeatedly (instead of reusing the same threads). Thus, within a short period of time, the application started to experience ‘OutOfMemoryError: unable to create new native thread’. Had the team been monitoring the thread count and thread states, they could have caught the problem early on and prevented the outage. Here are the actual thread dumps captured from the application; you can see the RUNNABLE thread count growing from one thread dump to the next over time.


            Fig: Growing RUNNABLE state Thread count (graph from fastThread.io)
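
    Thread dump analysis tools such as fastThread.io chart these counts for you. If you just want a quick in-process snapshot of the thread count by state, here is a minimal sketch using the standard ThreadMXBean API; taking such a snapshot periodically and comparing the numbers over time would reveal a growing RUNNABLE count like the one in the figure:

        import java.lang.management.ManagementFactory;
        import java.lang.management.ThreadInfo;
        import java.lang.management.ThreadMXBean;
        import java.util.EnumMap;
        import java.util.Map;

        public class ThreadStateCount {
            public static void main(String[] args) {
                ThreadMXBean tmx = ManagementFactory.getThreadMXBean();
                Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);

                // dumpAllThreads(false, false): skip lock details, we only need the states.
                for (ThreadInfo info : tmx.dumpAllThreads(false, false)) {
                    counts.merge(info.getThreadState(), 1, Integer::sum);
                }

                System.out.println("Total threads: " + tmx.getThreadCount());
                counts.forEach((state, count) -> System.out.println(state + ": " + count));
            }
        }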

    See the blog for more details about memory-related micrometrics.
         
     
     