Exporting the stuck thread count from WebLogic

Published on: Category: Oracle

This post expands on an article by Frank Munz on stuck thread handling. I advise reading his post first, especially if you are having trouble dealing with stuck threads.

What prompted me to write this was a question from one of our customers: “So stuck threads are a signal that some backend system is misbehaving. How can we monitor the number of stuck threads in WebLogic Server?”

Searching online, the first few pages of Google results show administrators asking the same question. Some of the solutions offered:

  • The WLDF can trigger on stuck threads, so you can have it send you a mail or an SNMP trap, but it’s very limited in functionality.
  • Stuck threads can be hogging threads as well, and in WLST the total count of hogging threads is an attribute of a server runtime. That’s easily found and manipulated in a WLST script. But it’s not a trustworthy indicator, because a hogging thread is not necessarily stuck!
  • Use jstack, then count the number of “[STUCK]” instances.

Only that last one can be used to build historic data, but even then it triggers on any stuck thread. To count only the threads you are interested in, you have to do a lot of string parsing.
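To give an idea of the string parsing involved, here is a minimal sketch in plain Python (run outside WLST) that counts stuck threads in a saved jstack dump. It assumes that thread entries are separated by blank lines and that WebLogic puts "[STUCK]" in the thread name on the header line. The `frame_filter` parameter, the sample dump and the `com.example.backend` package are made up for illustration.

```python
import re

def count_stuck_threads(dump_text, frame_filter=None):
    """Count thread entries marked [STUCK]; optionally only those whose
    stack contains frame_filter (for example a package name)."""
    count = 0
    # Assumption: jstack separates thread entries with blank lines
    for block in re.split(r'\n\s*\n', dump_text):
        lines = block.strip().splitlines()
        # A thread entry starts with the quoted thread name
        if not lines or not lines[0].startswith('"'):
            continue
        if "[STUCK]" not in lines[0]:
            continue
        if frame_filter is None or any(frame_filter in l for l in lines[1:]):
            count += 1
    return count

# Hypothetical, shortened dump used only to demonstrate the function
SAMPLE_DUMP = '''"[STUCK] ExecuteThread: '3' for queue: 'weblogic.kernel.Default (self-tuning)'" daemon prio=10 tid=0x01 nid=0x1a waiting on condition
   java.lang.Thread.State: TIMED_WAITING (sleeping)
    at com.example.backend.SlowClient.call(SlowClient.java:42)

"[ACTIVE] ExecuteThread: '4' for queue: 'weblogic.kernel.Default (self-tuning)'" daemon prio=10 tid=0x02 nid=0x1b runnable
   java.lang.Thread.State: RUNNABLE

"[STUCK] ExecuteThread: '7' for queue: 'weblogic.kernel.Default (self-tuning)'" daemon prio=10 tid=0x03 nid=0x1c waiting on condition
   java.lang.Thread.State: WAITING (parking)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
'''

print(count_stuck_threads(SAMPLE_DUMP))
print(count_stuck_threads(SAMPLE_DUMP, "com.example.backend"))
```

Even this small version has to make assumptions about the dump layout, which is exactly why I went looking for something better.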

Strangely enough, nobody suggested Oracle Enterprise Manager. OEM can monitor stuck threads and saves historic data. But if OEM has a Stuck Threads metric, there must also be an MBean behind it. And there is, but it takes some digging through the runtime MBean tree. Browsing with WLST shows that every deployment gets a work manager; if no specific work manager is configured, a copy of the default work manager configuration is used. Each of those work manager runtimes has a StuckThreadCount attribute. Here’s a script that sums these per server:

    '''
    The basic tree structure we're walking looks like this:
    ServerRuntimes
    |--< Server (all servers)
         |--< Application runtimes (all deployments)
              |--< Work manager runtimes (all work managers)

    We could probably enumerate all threads and see which are stuck, but this
    way the output is much easier to manage. As the tree shows, you know,
    for example, which deployment has stuck threads.
    '''
    ## Prevent printing output to the screen
    redirect('/dev/null', 'false')

    ## Insert your own password here
    connect("weblogic", "<your password>", "t3://localhost:7001")

    domainRuntime()
    servers = ls('/ServerRuntimes', 'true', 'c')

    # We'll store all results in here, using the server name for a key
    result = dict()
    for server in servers:
        deployments = ls('/ServerRuntimes/' + server + '/ApplicationRuntimes', 'true', 'c')
        result[server] = 0
        for deployment in deployments:
            ## If you are only interested in a single deployment, run that check here, like
            ## if deployment == "MyApplication":

            ## There could be multiple work managers per deployment, so iterate over them
            wms = ls('/ServerRuntimes/' + server + '/ApplicationRuntimes/' \
                     + deployment + '/WorkManagerRuntimes', 'true', 'c')
            for wm in wms:
                cd('/ServerRuntimes/' + server + '/ApplicationRuntimes/' \
                   + deployment + '/WorkManagerRuntimes/' + wm)
                result[server] = result[server] + get('StuckThreadCount')

    ## Reenable printing output
    redirect('/dev/null', 'true')

    ## Print all server names and the number of stuck threads we counted per server
    ## Format for Nagios output etc. from here
    for key in result:
        print(key + " has " + str(result[key]) + " stuck threads.")
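The last comment in the script mentions formatting for Nagios. As a rough sketch of what that could look like, the function below turns the `result` dictionary into a single plugin line plus an exit code, following the usual Nagios plugin convention (0 = OK, 1 = WARNING, 2 = CRITICAL, performance data after the pipe). The function name `nagios_status` and the thresholds `warn=1` and `crit=5` are my own placeholders; pick values that fit your environment.

```python
# Standard Nagios plugin exit codes
OK, WARNING, CRITICAL = 0, 1, 2

def nagios_status(result, warn=1, crit=5):
    """Return (exit_code, status_line) for a dict of server name -> stuck thread count."""
    worst = 0
    if result:
        worst = max(result.values())
    if worst >= crit:
        code, label = CRITICAL, "CRITICAL"
    elif worst >= warn:
        code, label = WARNING, "WARNING"
    else:
        code, label = OK, "OK"
    # Sort the server names so the output is stable between runs
    servers = list(result.keys())
    servers.sort()
    summary = ", ".join(["%s=%d" % (name, result[name]) for name in servers])
    perfdata = " ".join(["%s=%d;%d;%d" % (name, result[name], warn, crit)
                         for name in servers])
    return code, "%s - stuck threads: %s | %s" % (label, summary, perfdata)

# Example with made-up counts; in the script you would pass the real result dict
code, line = nagios_status({"Backoffice-0": 0, "Backoffice-1": 3,
                            "Selfservice": 0, "AdminServer": 0})
print(line)
# The script should then end with this exit code so Nagios picks up the status
```

The worst server decides the overall status, so one server with many stuck threads is enough to raise an alert.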

Using the StuckThreadForFree application from Frank Munz’ page to generate some stuck threads, I can now count them:

    [oracle@machine]$ wlst getstuckthreads.wlst
    Backoffice-0 has 0 stuck threads.
    Backoffice-1 has 3 stuck threads.
    Selfservice has 0 stuck threads.
    AdminServer has 0 stuck threads.

It works! But the code is a hassle. From WebLogic 12c onwards, WebLogic exposes an aggregate stuck thread count on the thread pool runtime of each server. Changing my script to use this attribute makes for much cleaner code:

    ## Prevent printing output to the screen
    redirect('/dev/null', 'false')

    ## Insert your own password here
    connect("weblogic", "<your password>", "t3://localhost:7001")

    domainRuntime()
    servers = ls('/ServerRuntimes', 'true', 'c')
    result = dict()
    for server in servers:
        cd('/ServerRuntimes/' + server + '/ThreadPoolRuntime/ThreadPoolRuntime')
        result[server] = get('StuckThreadCount')

    ## Reenable printing output
    redirect('/dev/null', 'true')
    for key in result:
        print(key + " has " + str(result[key]) + " stuck threads.")

There, that should do it. Of course, you can adapt these scripts in many ways, for example to save the results and inspect the behaviour of those threads over time.
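As one possible way to save the results, this sketch appends a timestamped line per server to a CSV file each time it is called. The function name `append_history`, the file name and the column layout (timestamp, server, count) are assumptions of mine, not something WebLogic prescribes. It sticks to basic Python so that it should also be usable from within a WLST script, though I have only treated it as plain Python here.

```python
import os
import tempfile
import time

def append_history(path, result, timestamp=None):
    """Append one 'timestamp,server,count' line per server to a CSV file."""
    if timestamp is None:
        timestamp = time.strftime("%Y-%m-%dT%H:%M:%S")
    servers = list(result.keys())
    servers.sort()
    out = open(path, "a")
    try:
        for name in servers:
            out.write("%s,%s,%d\n" % (timestamp, name, result[name]))
    finally:
        # Always close the file, even if a write fails
        out.close()

# Hypothetical location; in practice, point this at a directory you keep
history_file = os.path.join(tempfile.gettempdir(), "stuck_threads_history.csv")

# Example with made-up counts; in the script you would pass the real result dict
append_history(history_file, {"Backoffice-1": 3, "AdminServer": 0})
```

Run from cron every few minutes, this builds up a simple history that you can load into a spreadsheet or graphing tool.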

Mark Otting

Mark is an administrator with more than 10 years of experience in WebLogic Server, the Oracle Service Bus and Oracle SOA Suite. Coming from a background as a developer and having a broad spectrum of technical interests, he is often found in the role of linking pin and troubleshooter between departments. His specialties include optimisation and system administration, on both the technical and the governance side.
