This documentation is for older WSO2 products. View documentation for the latest release.
Page Comparison - Troubleshooting in Production Environments (v.11 vs v.12) - Clustering Guide 4.2.0 - WSO2 Documentation

...

  1. Open a command line in Solaris.
  2. Run prstat and look at the last column, labeled PROCESS/NLWP. NLWP is the number of lightweight processes (LWPs) in the process. Because Solaris maps lightweight processes one-to-one to user threads, this is also the number of threads the process is currently using. A single-threaded process shows 1 there, while a multi-threaded process shows a larger number. See the following code block for an example.

      PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP       
    ...
    12905 root     4472K 3640K cpu0    59    0   0:00:01 0.4% prstat/1
    18403 monitor   474M  245M run     59   17   1:01:28 9.1% java/103
     4102 oracle     12G   12G run     59    0   0:00:12 4.5% oracle/1

    The PROCESS/NLWP values in the example above show that prstat and oracle are single-threaded processes, while java is a multi-threaded process.

  3. Alternatively, you can analyze the individual thread activity of a multi-threaded process by using the -L and -p options, as in prstat -L -p pid. This displays one line for each thread, sorted by CPU activity. In that case, the last column is labeled PROCESS/LWPID, where LWPID is the thread ID. If more than one thread shows significant activity, your process is actively taking advantage of multi-threading.
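The check in step 2 can also be scripted. The following is a minimal Python sketch, assuming prstat output in the default column layout shown above; the function name parse_nlwp is hypothetical and not part of any Solaris or WSO2 tooling. It extracts the PID, process name, and NLWP value from each row, so that multi-threaded processes can be filtered out.

```python
def parse_nlwp(prstat_output):
    """Return a list of (pid, process_name, nlwp) tuples from prstat output.

    Assumes the default layout, where the first column is the PID and the
    last column has the form PROCESS/NLWP.
    """
    rows = []
    for line in prstat_output.splitlines():
        fields = line.split()
        # Skip the header, blank lines, and anything that is not a data row.
        if not fields or not fields[0].isdigit() or "/" not in fields[-1]:
            continue
        name, _, nlwp = fields[-1].rpartition("/")
        if nlwp.isdigit():
            rows.append((int(fields[0]), name, int(nlwp)))
    return rows


# Sample rows copied from the example in step 2.
sample = """\
   PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP
 12905 root     4472K 3640K cpu0    59    0   0:00:01 0.4% prstat/1
 18403 monitor   474M  245M run     59   17   1:01:28 9.1% java/103
  4102 oracle     12G   12G run     59    0   0:00:12 4.5% oracle/1
"""

rows = parse_nlwp(sample)
multi_threaded = [name for _pid, name, nlwp in rows if nlwp > 1]
print(multi_threaded)  # ['java']
```

On a live system, the sample string would be replaced by captured prstat output.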

...

Checking the health of a cluster

In Hazelcast, the health of a member in the cluster is determined by the heartbeats the member sends. If the well-known member does not receive a heartbeat from a node within a configurable amount of time, the node is assumed dead. By default, this time is 600 seconds (10 minutes), which might be too long for some scenarios.

...

If a heartbeat message is not received within a given amount of time, Hazelcast assumes the node is dead. This time is configured via the hazelcast.max.no.heartbeat.seconds property. The optimum value for this property depends on the system. Although the default is 600 seconds, it might be necessary to reduce the value if nodes are to be declared dead in a shorter time frame. However, you must verify this in your system and adjust the value as necessary for your scenario.

Warning: Reducing the value of this property can result in nodes being considered dead even if they are not. This results in multiple messages indicating that a node is leaving and rejoining the cluster.
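The trade-off described above can be illustrated with a simplified model. The following Python sketch is not Hazelcast's actual failure detector; it only assumes that a node is declared dead once the time since its last heartbeat exceeds the configured maximum. The node names and timestamps are hypothetical.

```python
def dead_members(last_heartbeat, now, max_no_heartbeat_seconds=600):
    """Return the members whose last heartbeat is older than the timeout.

    last_heartbeat maps a member name to the time (in seconds) at which
    its most recent heartbeat was received.
    """
    return sorted(
        member
        for member, seen in last_heartbeat.items()
        if now - seen > max_no_heartbeat_seconds
    )


# Hypothetical state: node-b is alive but has been silent for 45 seconds,
# for example because it was briefly paused.
last_heartbeat = {"node-a": 995, "node-b": 955}
now = 1000

print(dead_members(last_heartbeat, now))      # [] with the 600-second default
print(dead_members(last_heartbeat, now, 30))  # ['node-b'] with a 30-second limit
```

With the default, the short silence is tolerated. With a 30-second limit, the same node is treated as dead although it is still running, which is the situation the warning describes.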

Follow the steps below to configure the maximum time between heartbeats.

  1. Create a property file called hazelcast.properties, and add the following property to it.
    hazelcast.max.no.heartbeat.seconds=300
  2. Place this file in the <PRODUCT_HOME>/repository/conf/ directory in all the nodes in your cluster.
  3. Restart the servers.
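Steps 1 and 2 can be scripted when many nodes have to be configured. The following Python sketch is one possible approach, not part of the WSO2 tooling: the helper names are hypothetical, the value 300 is only an illustration, and a temporary directory stands in for <PRODUCT_HOME>/repository/conf/. It writes the property file and reads the value back to verify it.

```python
import tempfile
from pathlib import Path

PROPERTY = "hazelcast.max.no.heartbeat.seconds"


def write_heartbeat_property(conf_dir, seconds):
    """Create hazelcast.properties in conf_dir, replacing any existing file."""
    if seconds <= 0:
        raise ValueError("the heartbeat timeout must be a positive number of seconds")
    path = Path(conf_dir) / "hazelcast.properties"
    path.write_text(f"{PROPERTY}={seconds}\n")
    return path


def read_heartbeat_property(path):
    """Return the configured timeout in seconds, or None if it is not set."""
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if sep and key.strip() == PROPERTY:
            return int(value.strip())
    return None


# The temporary directory is a stand-in for the conf directory of one node.
with tempfile.TemporaryDirectory() as conf_dir:
    path = write_heartbeat_property(conf_dir, 300)
    configured = read_heartbeat_property(path)

print(configured)  # 300
```

The servers still have to be restarted afterwards, as described in step 3.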

...