This section describes some recommended performance tuning configurations to optimize WSO2 ML. It assumes that you have set up WSO2 ML on Unix/Linux, which is recommended for a production deployment.
- Performance tuning requires you to modify important system files, which affect all programs running on the server. Therefore, it is recommended that you familiarize yourself with these files using the Unix/Linux documentation before editing them.
- The parameter values discussed below are just examples. They might not be the optimal values for the specific hardware configurations in your environment. Therefore, it is recommended that you carry out load tests in your environment to tune the product accordingly.
To optimize network and OS performance, configure the following settings in the /etc/sysctl.conf file of Linux. These settings specify a larger port range, a more effective TCP connection timeout value, and a number of other important OS-level parameters.
If the lower bound of the local port range is set to 1024, some processes may pick ports that are already used by WSO2 servers. Therefore, increase the lower bound to a value that is sufficient for production (e.g., 10000).
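The lower-bound check above can be scripted. The sketch below is read-only: it assumes a Linux host, where the local port range (the net.ipv4.ip_local_port_range setting) is exposed as two integers in /proc/sys/net/ipv4/ip_local_port_range, and it uses the example lower bound of 10000 from the text as its threshold. The function names are illustrative.

```python
from pathlib import Path

# Linux exposes the local (ephemeral) port range as two whitespace-separated integers.
PORT_RANGE_FILE = Path("/proc/sys/net/ipv4/ip_local_port_range")

# Example lower bound from the recommendation above; tune it for your environment.
RECOMMENDED_LOWER_BOUND = 10000


def parse_port_range(text: str) -> tuple[int, int]:
    """Parse the 'low high' contents of ip_local_port_range."""
    low, high = (int(part) for part in text.split())
    return low, high


def lower_bound_is_safe(low: int, minimum: int = RECOMMENDED_LOWER_BOUND) -> bool:
    """True if local ports start at or above `minimum`."""
    return low >= minimum


if __name__ == "__main__":
    # Only reads the current value; it never changes the kernel setting.
    if PORT_RANGE_FILE.exists():
        low, high = parse_port_range(PORT_RANGE_FILE.read_text())
        verdict = "OK" if lower_bound_is_safe(low) else "consider raising the lower bound"
        print(f"local port range {low}-{high}: {verdict}")
```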
To alter the number of allowed open files for system users, configure the following settings in the /etc/security/limits.conf file of Linux.
Optimal values for these parameters depend on the environment.
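To confirm which open-file limits a process actually receives, you can query them from Python's standard resource module (Unix only). Entries in limits.conf are typically applied when a login session starts, so this sketch assumes you run it in a fresh session as the user that starts the server; the helper names are illustrative.

```python
import resource


def open_file_limits() -> tuple[int, int]:
    """Return the (soft, hard) open-file limits of the current process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft, hard


def describe(limit: int) -> str:
    """Render a limit, treating RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


if __name__ == "__main__":
    soft, hard = open_file_limits()
    print(f"nofile: soft={describe(soft)} hard={describe(hard)}")
```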
To alter the maximum number of processes your user is allowed to run at a given time, configure the following settings in the /etc/security/limits.conf file of Linux (be sure to include the leading * character). Each Carbon server instance you run may require up to 1024 threads (with the default thread pool configuration). Therefore, increase the nproc value (both hard and soft) by 1024 for each Carbon server.
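The nproc arithmetic above can be written out as a small helper. This is a sketch: the baseline nproc value and the number of Carbon servers are hypothetical inputs you supply, and the output uses the standard limits.conf layout of domain, type, item and value.

```python
# Upper bound from the text: threads per Carbon server with the default thread pool.
THREADS_PER_CARBON_SERVER = 1024


def required_nproc(base_nproc: int, carbon_servers: int,
                   threads_per_server: int = THREADS_PER_CARBON_SERVER) -> int:
    """Raise a baseline nproc value by threads_per_server for every Carbon server."""
    if base_nproc < 0 or carbon_servers < 0:
        raise ValueError("base_nproc and carbon_servers must be non-negative")
    return base_nproc + carbon_servers * threads_per_server


def limits_conf_lines(nproc: int, domain: str = "*") -> list[str]:
    """Render matching soft and hard nproc entries in limits.conf syntax."""
    return [f"{domain} {kind} nproc {nproc}" for kind in ("soft", "hard")]


if __name__ == "__main__":
    # Hypothetical baseline of 20000 with two Carbon servers on the same host.
    for line in limits_conf_lines(required_nproc(20000, 2)):
        print(line)
```

With a hypothetical baseline of 20000 and two Carbon servers, this prints "* soft nproc 22048" and "* hard nproc 22048", keeping the soft and hard values equal as the text recommends.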
When an XML element has a large number of sub-elements and the system tries to process all of them, the system can become unstable due to memory overhead. This is a security risk.
To avoid this issue, you can define the maximum number of entity substitutions that the XML parser allows in the system. You do this using the entity expansion limit attribute in the <ML_HOME>/bin/wso2server.bat file (for Windows) or the <ML_HOME>/bin/wso2server.sh file (for Linux/Solaris). The default entity expansion limit is 64000.
In a clustered environment, the entity expansion limit has no dependency on the number of worker nodes.
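A small calculation shows why a bounded number of entity substitutions matters: nested entity definitions grow exponentially. The model below is a simplification and an assumption: the document references a top-level entity once, each entity references the next level refs_per_entity times, and every expanded reference counts toward the limit. Real parsers may count expansions differently.

```python
DEFAULT_ENTITY_EXPANSION_LIMIT = 64000  # default limit stated above


def total_expansions(levels: int, refs_per_entity: int) -> int:
    """Entity references expanded when a document references the top entity once.

    Nesting depth d contributes refs_per_entity ** d expansions, for d = 0..levels.
    """
    return sum(refs_per_entity ** depth for depth in range(levels + 1))


def first_depth_over_limit(refs_per_entity: int,
                           limit: int = DEFAULT_ENTITY_EXPANSION_LIMIT) -> int:
    """Smallest nesting depth whose total expansions exceed `limit`."""
    if refs_per_entity < 2:
        raise ValueError("use at least 2 references per entity for exponential growth")
    depth = 0
    while total_expansions(depth, refs_per_entity) <= limit:
        depth += 1
    return depth


if __name__ == "__main__":
    print(total_expansions(5, 10))     # 111111 expansions
    print(first_depth_over_limit(10))  # 5 levels of nesting exceed 64000
```

With 10 references per entity, only 5 levels of nesting already require 111111 expansions under this model, so a parser enforcing the default limit rejects the document long before it exhausts memory.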
WSO2 Carbon platform-level settings
In multitenant mode, the WSO2 Carbon runtime limits the thread execution time. That is, if a thread is stuck or takes a long time to process, Carbon detects the thread, then interrupts and stops it. Note that Carbon prints the current stack trace before interrupting the thread. This mechanism is implemented as an Apache Tomcat valve. Therefore, it should be configured in the <PRODUCT_HOME>/repository/conf/tomcat/catalina-server.xml file as shown below.
className is the Java class used for the implementation. Set it to
threshold gives the minimum duration in seconds after which a thread is considered stuck. The default value is 600 seconds.
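The detection rule behind the threshold attribute can be modeled as a simple comparison of elapsed time against the threshold. This sketch is only a simplified model with hypothetical names: it reports which threads would count as stuck, but unlike the Carbon valve it does not print stack traces or interrupt anything.

```python
from dataclasses import dataclass

DEFAULT_THRESHOLD_SECONDS = 600  # default threshold stated above


@dataclass
class RequestThread:
    name: str
    started_at: float  # when the thread began its current request, in seconds


def find_stuck_threads(threads: list[RequestThread], now: float,
                       threshold: float = DEFAULT_THRESHOLD_SECONDS) -> list[str]:
    """Names of threads that have spent at least `threshold` seconds on one request."""
    return [t.name for t in threads if now - t.started_at >= threshold]


if __name__ == "__main__":
    threads = [RequestThread("worker-1", started_at=0.0),
               RequestThread("worker-2", started_at=500.0)]
    # At t=700 s, worker-1 has run for 700 s (stuck); worker-2 for only 200 s.
    print(find_stuck_threads(threads, now=700.0))
```

Lowering the threshold catches slow requests sooner but risks flagging legitimately long-running operations, which is why the default is comparatively generous.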
JDBC Pool configurations
Within the WSO2 platform, we use Tomcat JDBC pooling as the default pooling framework due to its production-ready stability and high performance. The recommendations below describe how to configure the JDBC pool using the <PRODUCT_HOME>/repository/conf/datasources/master-datasources.xml file. For more details about recommended JDBC configurations, see The Tomcat JDBC Connection Pool.
maxActive: The maximum number of active connections that can be allocated from the connection pool at the same time. The default value is
Recommendation: This value should match the maximum number of requests that can be expected at a time in your production environment. This ensures that, whenever there is a sudden increase in the number of requests to the server, all of them can be connected successfully without delays. Note that this value should not exceed the maximum number of connections allowed by your database.
minIdle: The minimum number of connections that can remain idle in the pool without extra ones being created. The connection pool can shrink below this number if validation queries fail. The default value is 0.
Recommendation: This value should be close to the average number of concurrent requests received by the server. This avoids opening and closing new connections every time the server receives a request.
testOnBorrow: Indicates whether connection objects are validated before they are borrowed from the pool. If validation fails, the connection is dropped from the pool, and the pool attempts to borrow another connection.
Recommendation: Setting this property to 'true' is recommended, as it prevents connection requests from failing. The
validationInterval: To avoid excess validation, validation runs at most at this frequency (time in milliseconds). If a connection is due for validation but has already been validated within this interval, it is not validated again. The default value is
Recommendation: This interval can be as high as the time it takes for your DBMS to declare a connection as stale. For example, MySQL keeps a connection open for as long as 8 hours, which requires the validation interval to be within that range. However, a low validation interval does not incur a big performance penalty, especially when database requests have a high throughput. For example, a single extra validation query run every 30 seconds is usually negligible.
validationQuery: The SQL query used to validate connections from this pool before returning them to the caller. If specified, this query does not have to return any data; it just must not throw an SQLException. The default value is null. Example values are SELECT 1 (MySQL), select 1 from dual (Oracle), and SELECT 1 (MS SQL Server).
Recommendation: Specify an SQL query that validates the availability of a connection in the pool. This query is necessary when
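The interplay of borrow-time validation, the validation interval and the validation query can be illustrated with a toy pool. This is a sketch under stated assumptions, not Tomcat JDBC's implementation: the class and function names are hypothetical, Python's built-in sqlite3 stands in for the production database so the example runs without a server, and the 30000 ms interval is only an example value.

```python
import sqlite3
import time
from collections import deque
from typing import Optional

# Example interval in milliseconds (hypothetical; tune it as described above).
VALIDATION_INTERVAL_MS = 30_000


def is_connection_valid(conn: sqlite3.Connection, query: str = "SELECT 1") -> bool:
    """Validation-query check: the query only has to run without an error."""
    try:
        conn.execute(query).fetchall()
        return True
    except sqlite3.Error:
        return False


def needs_validation(last_validated_ms: float, now_ms: float,
                     interval_ms: float = VALIDATION_INTERVAL_MS) -> bool:
    """Interval check: skip validation if the last one happened within the interval."""
    return now_ms - last_validated_ms >= interval_ms


class ToyPool:
    """Hands out idle connections, validating them on borrow."""

    def __init__(self, connections: list[sqlite3.Connection], query: str = "SELECT 1",
                 interval_ms: float = VALIDATION_INTERVAL_MS):
        self._idle = deque(connections)
        self._last_validated_ms: dict[sqlite3.Connection, float] = {}
        self._query = query
        self._interval_ms = interval_ms

    def borrow(self, now_ms: Optional[float] = None) -> sqlite3.Connection:
        if now_ms is None:
            now_ms = time.monotonic() * 1000
        while self._idle:
            conn = self._idle.popleft()
            last_ms = self._last_validated_ms.get(conn, float("-inf"))
            if not needs_validation(last_ms, now_ms, self._interval_ms):
                return conn  # validated recently: no query is run
            if is_connection_valid(conn, self._query):
                self._last_validated_ms[conn] = now_ms
                return conn
            # Validation failed: drop this connection and try the next idle one.
            self._last_validated_ms.pop(conn, None)
            conn.close()
        raise RuntimeError("no valid idle connection (a real pool would open a new one)")

    def give_back(self, conn: sqlite3.Connection) -> None:
        self._idle.append(conn)


if __name__ == "__main__":
    stale = sqlite3.connect(":memory:")
    stale.close()  # simulate a connection the database has already dropped
    pool = ToyPool([stale, sqlite3.connect(":memory:")])
    conn = pool.borrow(now_ms=0)  # the stale connection fails validation and is dropped
    pool.give_back(conn)
    print(pool.borrow(now_ms=1_000) is conn)  # within the interval: reused without a query
    print(is_connection_valid(sqlite3.connect(":memory:"), "select 1 from dual"))
```

The last line prints False because SQLite has no dual table, which shows why the validation query must match your DBMS. The sketch also exposes the interval's trade-off: a connection that dies within the interval is handed out without being checked.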
WSO2 ML-specific settings
The values discussed below are general recommendations for standalone applications and are not optimal for large-scale applications. These configurations mainly depend on the size and structure of the datasets you want to process and on the underlying infrastructure of the system.
Each entry below gives an improvement area followed by its performance recommendation.
ML JVM options: Change the following values in the
These options largely determine the performance of ML jobs, especially when WSO2 ML runs in standalone mode, because the Spark contexts are created within the JVM.
Concurrency in ML: Change the number of threads in the thread pool and the length of the thread queue in the
Data compression and serialization (Spark): Change the following I/O compression and serialization properties in the
Executor memory (Spark): Change the properties that define the amount of memory allocated for executors in a worker node for the application in the
Executor cores (Spark): Change the number of cores allocated for executors in a worker node in the
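The concurrency entry involves two knobs: the thread pool size bounds how much work runs at once, and the queue length bounds how much work may wait. The sketch below illustrates that trade-off with Python's ThreadPoolExecutor and a semaphore. It is a generic illustration with hypothetical names, not how WSO2 ML implements its thread pool.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable


class BoundedExecutor:
    """Runs at most pool_size tasks at once and queues at most queue_length more."""

    def __init__(self, pool_size: int, queue_length: int):
        self._executor = ThreadPoolExecutor(max_workers=pool_size)
        # One permit per task that is either running or waiting in the queue.
        self._slots = threading.BoundedSemaphore(pool_size + queue_length)

    def submit(self, fn: Callable[..., Any], *args: Any) -> Future:
        if not self._slots.acquire(blocking=False):
            raise RuntimeError("thread pool and queue are full; task rejected")
        try:
            future = self._executor.submit(fn, *args)
        except BaseException:
            self._slots.release()
            raise
        # Free the permit once the task finishes, successfully or not.
        future.add_done_callback(lambda _f: self._slots.release())
        return future

    def shutdown(self) -> None:
        self._executor.shutdown(wait=True)


if __name__ == "__main__":
    gate = threading.Event()
    pool = BoundedExecutor(pool_size=1, queue_length=1)
    running = pool.submit(gate.wait)      # occupies the only worker thread
    queued = pool.submit(lambda: "done")  # waits in the queue
    try:
        pool.submit(lambda: "extra")      # no permit left, so it is rejected
    except RuntimeError as err:
        print(err)
    gate.set()
    pool.shutdown()
    print(queued.result())
```

More threads raise throughput until CPU or memory saturates; a longer queue absorbs bursts at the cost of latency and memory held by waiting jobs.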
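Executor memory and executor cores interact: on a worker node, the number of executors that fit is capped by whichever resource runs out first. In Spark, these sizes correspond to properties such as spark.executor.memory and spark.executor.cores. The sketch below is a simplified model under stated assumptions: it ignores memory overhead and resources reserved for other processes, and the worker sizes are hypothetical.

```python
def executors_per_worker(worker_cores: int, worker_memory_mb: int,
                         executor_cores: int, executor_memory_mb: int) -> int:
    """How many equally sized executors fit on a worker, limited by cores and memory.

    Simplified: ignores memory overhead and resources reserved for other processes.
    """
    if executor_cores <= 0 or executor_memory_mb <= 0:
        raise ValueError("executor cores and memory must be positive")
    by_cores = worker_cores // executor_cores
    by_memory = worker_memory_mb // executor_memory_mb
    return min(by_cores, by_memory)


if __name__ == "__main__":
    # Hypothetical worker: 8 cores and 16384 MB available to executors.
    print(executors_per_worker(8, 16384, 2, 4096))  # cores and memory both allow 4
    print(executors_per_worker(8, 16384, 2, 6144))  # memory allows only 2
```

In the second case, raising executor memory to 6144 MB leaves four of the eight cores idle, so executor memory and cores are best tuned together.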