This documentation is work in progress and will be released with the next WSO2 EI version.
General Data Protection Regulation (GDPR) for WSO2 EI - WSO2 Enterprise Integrator 6.x.x - WSO2 Documentation


WSO2 Enterprise Integrator (WSO2 EI) consists of four profiles (ESB, Message Broker, Business Process Server, and Analytics) that can persist a user's personally identifiable information (PII) in various sources, namely log files and RDBMSs. Organizations that use WSO2 EI have a legal obligation to remove all instances of a user's PII from the system if the user requests it. For example, consider an employee who resigns from the organization and asks the organization to remove all instances of their PII from its system. You can fulfill this requirement by anonymizing the user's PII in the system, or (in some cases) by completely removing such PII from the system.

See the topics given below for instructions on how to remove PII from each profile of WSO2 EI.

What is GDPR?

The General Data Protection Regulation (GDPR) is a legal framework that the European Union (EU) formalized in 2016. It came into effect on 25 May 2018. GDPR requires any organization that processes the Personally Identifiable Information (PII) of individuals who live in the European Union to comply with the regulation. Organizations that fail to demonstrate GDPR compliance are subject to financial penalties.

Do you want to learn more about GDPR?

If you are new to GDPR, we recommend that you take a look at our tutorial series on Creating a Winning GDPR Strategy.

For more resources on GDPR, see the white papers, case studies, solution briefs, webinars, and talks published on our WSO2 GDPR homepage. You can also find the original GDPR legal text here.

How WSO2 EI persists a user's PII

Each profile of WSO2 EI persists user information in various sources as explained below.

ESB Profile

The ESB profile can persist PII in various log files (carbon logs, audit logs, API logs, and service-specific logs) depending on the mediation logic defined. The ESB does not persist a user's PII in any RDBMS by default.

BPS Profile

The BPS profile of WSO2 EI contains three main components: BPMN, BPEL, and Human Tasks. These components will persist a user's PII in various ways as explained below.

  • If you have workflows defined in any one of the above components or all of them, PII of users will get stored in the relevant RDBMS.

    What are the RDBMSs used by the BPS profile?

    The BPS profile uses two separate databases dedicated to BPMN data and BPEL/Human Task data respectively. By default, these are two H2 databases that are shipped with the product. You can find out more about these databases from here.

  • All three components will also persist user information in log files (carbon logs and audit logs).

Message Broker Profile

The Message Broker profile does not persist any PII in any way.

Analytics Profile

The Analytics profile of WSO2 EI uses event streams, which contain user information (PII) in its schemas. This data is stored in two separate RDBMSs dedicated for the Analytics profile.

By default, when you start the Analytics profile, two H2 databases will be created for this purpose. However, when you move to a production environment, we recommend that you change these to industry-grade RDBMSs. You need to configure the connection to the new databases in the analytics-datasources.xml file (stored in the <EI_HOME>/wso2/analytics/conf/datasources/ directory).

Tools for removing PII in WSO2 EI

The following tools are shipped with WSO2 EI:

  • WSO2 EI is shipped with the Forget-Me Tool, which can anonymize a user's PII in log files and RDBMSs by replacing all occurrences of the deleted user with either a randomly generated UUID value or a specified pseudonym. This tool is stored in the <EI_HOME>/wso2/tools/forget-me/ directory of WSO2 EI. Find out about all the capabilities of the Forget-Me tool from here.
    Important! In the case of log files, note that the Forget-Me Tool does not replace PII values in the actual log files. Instead, the tool will create a new set of log files with anonymized PII values. The organization can then remove the original log files.

    If you want to use the Forget-Me tool to remove PII in multiple WSO2 products at the same time, you can use the standalone version of the tool.
    For information on how to build and run the Forget-Me tool in standalone mode, see Removing References to Deleted User Identities in WSO2 Products in the WSO2 Administration Guide.

  • In addition to this tool, the BPS profile also contains a set of SQL scripts that can remove PII by completely removing any process instances associated with a particular user. This method is only applicable to BPEL and Human Task processes.

Prerequisites for removing PII

As explained in How WSO2 EI persists a user's PII, the ESB profile and the BPS profile will store user information in log files. Note that we can only remove a deleted user's PII from archived log files, and not the live log files that are connected to the system.

Therefore, before you start removing PII stored by the ESB profile and the BPMN component, be sure that the relevant user has been inactive in the system for a sufficient amount of time. This will ensure that all of the user's PII contained in log files is successfully archived. You can then follow the instructions given below to remove the user's PII references from the archived log files.
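
One informal way to confirm that a user no longer appears in a live log file is to search that file for the username before you run the tool. The following Python sketch shows the idea. The function name lines_mentioning is hypothetical, and the sketch assumes that the username appears in the log as plain text; it is not part of WSO2 EI.

```python
from pathlib import Path


def lines_mentioning(log_path, username):
    """Return the 1-based line numbers in log_path that contain username."""
    matches = []
    # errors="replace" keeps the scan going if the log has undecodable bytes.
    with Path(log_path).open(encoding="utf-8", errors="replace") as log_file:
        for number, line in enumerate(log_file, start=1):
            if username in line:
                matches.append(number)
    return matches
```

An empty list for the live log files suggests that the remaining references to the user are in archived files only.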

Removing PII from the ESB profile

Before you begin, see the prerequisites for removing PII.

Anonymizing PII references

You can use the Forget-Me Tool to remove references to personally identifiable information (PII) from logs in the ESB profile. For example, consider a proxy service that logs the username that is sent through a payload. A Log mediator can be used for this as shown below.

<log level="custom">
    <property expression="//Authentication/username" name="USER_NAME"/>
</log>

The username that is sent when you invoke this proxy service will be logged in the following log files: wso2carbon.log, audit.log, warn.log, and the service-specific log file that is enabled for the proxy service.

[EI-Core]  INFO - LogMediator USER_NAME = Sam

Let's look at how to anonymize the username value in log files.

  1. Every log statement follows the same pattern where the "USER_NAME" keyword is followed by an actual username (in this example it is "Sam"). The regex pattern of this log statement will be as shown below. The Forget-Me Tool will use the below regex pattern to anonymize the username. 

    This pattern should be added to the ei-patterns.xml file (stored in the <EI_HOME>/wso2/tools/forget-me/conf/log-config/ directory).

    <pattern key="pattern3">
           <detectPattern>(.)*(USER_NAME)(.)*${username}(.)*</detectPattern>
           <replacePattern>${username}</replacePattern>
    </pattern>
  2. Update the config.json file (stored in the <EI_HOME>/wso2/tools/forget-me/conf/ directory) as shown below. This file contains references to all the log files (except any service-specific log file) in the system that store the above user information. If you have enabled a service-specific log file, you need to add that file name (see the element descriptions given below).

    {
     "processors" : [ 
       "log-file"
     ],
     "directories": [
       {
         "dir": "log-config",
         "type": "log-file",
         "processor" : "log-file",
         "log-file-path" : "<EI_HOME>/repository/logs",
         "log-file-name-regex" : "(audit.log|warn.log|wso2carbon.log)(.)*"
       }
     ]
    }

    The elements in the above configuration are explained below.

    • "processors": The processors listed for this element specify whether the tool will run on log files, RDBMSs, or analytics streams. In the case of the ESB profile, we only need to remove PII from log files, and therefore, the processor is set to "log-file".
    • "directories": This element lists the directories that correspond to the processors. In the case of the ESB profile, we need to specify the directories that store log files.
    • "log-file-path": This specifies the directory path to the log files. Note that all the relevant log files are stored in the <EI_HOME>/repository/logs/ directory.

      Be sure to replace the "log-file-path" value with the correct absolute path to the location where the log files are stored. If you are on Windows, be sure to use the forward slash ("/") instead of the back slash ("\"). For example: C:/Users/Administrator/Desktop/wso2ei-6.2.0/repository/logs.

    • "log-file-name-regex": This gives the list of log files (stored in the log-file-path) that will persist the user's PII. Note that the above log-file-name-regex includes the audit.log, warn.log, and wso2carbon.log files, as well as the archived files of the same logs. If you have enabled a service-specific log file, be sure to add the file name to this list.

  3. Open a command prompt and navigate to the <EI_HOME>/bin directory.

  4. Execute the following command to anonymize the user information that matches the pattern you added to the ei-patterns.xml file.
    • On Linux:

      ./forgetme.sh -U Sam
    • On Windows:

      forgetme.bat -U Sam

    This will result in the following:

    1. Copies will be created of all the log files specified in the config.json file. The following is the format of the log copy: anon-<time_stamp>-<original_log_name>.log. For example: anon-1520946791793-warn.log.

    2. The PII will be anonymized in the copies. The log files will display the user information as a pseudonym.

      [EI-Core]  INFO - LogMediator USER_NAME = 86c3bfd9-f97c-4b08-9f15-772dcb0c1c

    For the list of commands you can run using the Forget-Me tool, see this link.
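
To see how the detect and replace patterns from step 1 behave, the following Python sketch applies them to the sample log line. It is only an illustration and not the Forget-Me Tool's implementation: the function name anonymize_line and the way the ${username} placeholder is substituted are assumptions made for this example, and the tool itself may process lines differently.

```python
import re
import uuid

# Patterns copied from the ei-patterns.xml example above.
DETECT_TEMPLATE = "(.)*(USER_NAME)(.)*${username}(.)*"
REPLACE_TEMPLATE = "${username}"


def anonymize_line(line, username, pseudonym):
    """Replace username with pseudonym if the line matches the detect pattern.

    Assumption: the ${username} placeholder is filled with the literal
    username before the pattern is applied.
    """
    literal = re.escape(username)
    detect = re.compile(DETECT_TEMPLATE.replace("${username}", literal))
    if detect.fullmatch(line) is None:
        return line  # Lines that do not match the detect pattern are left as-is.
    replace = re.compile(REPLACE_TEMPLATE.replace("${username}", literal))
    # A function is used as the replacement so the pseudonym is inserted verbatim.
    return replace.sub(lambda _match: pseudonym, line)


if __name__ == "__main__":
    sample = "[EI-Core]  INFO - LogMediator USER_NAME = Sam"
    print(anonymize_line(sample, "Sam", str(uuid.uuid4())))
```

Note that this sketch replaces every occurrence of the username on a matching line, including occurrences inside longer words.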

Deleting original (archived) log files

Note that the PII is not removed from the original log files. It is the responsibility of the organization to remove the original log files that contain the user's PII.
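
Before you delete anything, it can help to list the original log files that the log-file-name-regex covers. The following Python sketch is one way to do that. The function name original_log_files is hypothetical, and the sketch assumes that the regex must match the whole file name; the Forget-Me Tool may apply the regex differently.

```python
import re
from pathlib import Path

# The "log-file-name-regex" value from the config.json example above.
LOG_FILE_NAME_REGEX = r"(audit.log|warn.log|wso2carbon.log)(.)*"


def original_log_files(log_dir):
    """Return the sorted names of original log files in log_dir.

    The anonymized copies (named anon-<time_stamp>-<original_log_name>.log)
    are skipped explicitly so that only the originals are listed.
    """
    pattern = re.compile(LOG_FILE_NAME_REGEX)
    names = []
    for path in Path(log_dir).iterdir():
        if not path.is_file() or path.name.startswith("anon-"):
            continue
        if pattern.fullmatch(path.name):
            names.append(path.name)
    return sorted(names)
```

The sketch only lists files. Deleting them remains a manual decision for the organization.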

Removing PII from the BPS profile

Let's look at how to anonymize/remove personally identifiable information (PII) stored by the three main components of the BPS profile (BPMN component, BPEL component, and the Human Task component).

Anonymizing PII in the BPMN (activiti) component

The PII references stored by the BPMN component can be removed from log files as well as the BPMN-specific database by using the Forget-Me Tool.

Follow the steps given below.

  1. Add the relevant drivers for your BPMN-specific database to the <EI_HOME>/wso2/tools/forget-me/lib directory. For example, if you have changed your BPMN database from the default H2 database to MySQL, copy the MySQL driver to this directory.
  2. Open the activiti-datasources.xml file (stored in the <EI_HOME>/wso2/tools/forget-me/conf/datasources/ directory), and specify the details of the RDBMS that stores the metadata from BPMN workflows.
  3. Update the config.json file (stored in the <EI_HOME>/wso2/tools/forget-me/conf/ directory) as shown below. This file contains references to all the log files in the system, and the RDBMS that stores the user information from BPMN workflows.

    {
     "processors" : [
       "log-file", "rdbms"
     ],
     "directories": [
       {
         "dir": "log-config",
         "type": "log-file",
         "processor" : "log-file",
         "log-file-path" : "<EI_HOME>/wso2/business-process/repository/logs",
         "log-file-name-regex" : "(audit.log|warn.log|wso2carbon.log)(.)*"
       },
       {
        "dir": "sql",
        "type": "rdbms",
        "processor" : "rdbms"
        }
     ],
     "extensions": [
       {
         "dir": "datasources",
         "type": "datasource",
         "processor" : "rdbms"
       }
     ]
    }

    The elements in the above configuration are explained below.

    • "processors": The processors listed for this element specify whether the tool will run on log files, RDBMSs, or analytics streams. In the case of the BPMN component of the BPS profile, we need to remove PII from log files, as well as the BPMN-specific database. Therefore, the processor is set to "log-file","rdbms".
    • "directories": This element lists the directories that correspond to the processors. In the case of the BPMN component, we need to specify the directories that store log files, as well as the directory of the SQL scripts for the BPMN database. Therefore, the above configuration contains two directories: "log-config" and "sql".
    • "log-file-path": This specifies the directory path to the logs. In this example, all the relevant log files for BPS are stored in the <EI_HOME>/wso2/business-process/repository/logs/ directory. 

      Be sure to replace the "log-file-path" value with the correct absolute path to the location where the log files are stored. If you are on Windows, be sure to use the forward slash ("/") instead of the back slash ("\"). For example: C:/Users/Administrator/Desktop/wso2ei-6.2.0/wso2/business-process/repository/logs.

    • "log-file-name-regex": This gives the list of log files (stored in the log-file-path) that will persist the user's PII. Note that the above log-file-name-regex includes the audit.log, warn.log, and wso2carbon.log files, as well as the archived files of the same logs.

  4. Open a command prompt and navigate to the <EI_HOME>/bin directory.

  5. Run the tool using the following command:

    • On Linux:

      ./forgetme.sh -U <USERNAME>
    • On Windows:

      forgetme.bat -U <USERNAME>

    This will result in the following:

    1. Copies will be created of all the log files specified in the config.json file. The following is the format of the log copy: anon-<time_stamp>-<original_log_name>.log. For example: anon-1520946791793-warn.log.

    2. The PII will be anonymized in the copies. The log files will display the user information as a pseudonym.

    3. The user's PII will be removed from the BPMN database.

    For the list of commands you can run using the Forget-Me tool, see this link.
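
Because the config.json file is edited by hand, a quick consistency check before you run the tool can catch simple mistakes. The following Python sketch checks three things that the element descriptions above call for: each processor has a matching entry under "directories", the <EI_HOME> placeholder has been replaced, and the path does not use back slashes. The function name check_config is hypothetical, and these checks are assumptions derived from the descriptions above; they do not reproduce any validation that the Forget-Me Tool performs.

```python
import json


def check_config(config_text):
    """Return a list of human-readable problems found in a config.json text."""
    config = json.loads(config_text)
    directories = config.get("directories", [])
    problems = []

    # Every processor should have at least one directory entry that uses it.
    configured = {entry.get("processor") for entry in directories}
    for processor in config.get("processors", []):
        if processor not in configured:
            problems.append(f"no directories entry for processor '{processor}'")

    # The log path should be an absolute path written with forward slashes.
    for entry in directories:
        path = entry.get("log-file-path", "")
        if "<EI_HOME>" in path:
            problems.append("log-file-path still contains the <EI_HOME> placeholder")
        if "\\" in path:
            problems.append("log-file-path uses back slashes")
    return problems
```

An empty list means that none of these three problems was found. It does not guarantee that the tool will accept the file.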

Deleting original (archived) log files

Note that the PII is not removed from the original log files. It is the responsibility of the organization to remove the original log files that contain the user's PII.

Removing Human Task and BPEL process instances

If you are using Human Tasks and BPEL workflows in your BPS profile, you can remove a user's personally identifiable information (PII) from the BPS instance by removing all process instances and task instances (associated with message exchanges) from the server.

WSO2 EI is shipped with a set of SQL scripts (stored in the bpel and humantask folders in the <EI_HOME>/wso2/business-process/repository/resources/cleanup-scripts directory) that you can use for removing process instances and task instances from the BPS profile. There are two ways of doing this:

  • Remove all completed tasks/processes. This can be configured to a particular period.
  • Identify the processes/tasks that are specific to a given user ID, and remove them individually.

For instructions, see BPS database cleanup.

Removing PII from the Analytics profile

Before you begin, find out how the Analytics profile stores a user's PII.

Shown below is an example data stream (used by the ESB profile) for product analytics. Note that the username, email and the date of birth are personally identifiable information (PII) of the user.

Stream Name: org.wso2.gdpr.students
Attribute List:
  • username
  • email
  • dateOfBirth

Stream Name: org.wso2.gdpr.students.marks
Attribute List:
  • username
  • marks

These PII references can be removed from the Analytics database by using the Forget-Me Tool. Follow the steps given below.

  1. Add the relevant drivers for your Analytics-specific databases to the <EI_HOME>/wso2/tools/forget-me/lib directory. For example, if you have changed your Analytics databases from the default H2 instances to MySQL, copy the MySQL driver to this directory.
  2. Create a folder named 'streams' in the <EI_HOME>/wso2/tools/forget-me/conf/ directory. 
  3. Create a new file named streams.json with the content shown below, and store it in the /streams directory that you created in the previous step. This file holds the details of the streams and the attributes with PII that we need to remove from the database.

    {
        "streams": [
            {
                "streamName": "org.wso2.gdpr.students",
                "attributes": ["username", "email", "dateOfBirth"],
                "id": "username"
            },
            {
                "streamName": "org.wso2.gdpr.students.marks",
                "attributes": ["username"],
                "id": "username"
            }
        ]
    }

    The above configuration includes the following:

    • streamName: The name of the stream.
    • attributes: The list of attributes that contain PII.
    • id: The ID attribute, which holds the value that needs to be anonymized (replaced with a pseudonym).
  4. Update the config.json file (stored in the <EI_HOME>/wso2/tools/forget-me/conf/ directory) as shown below.

    {
        "processors": [
            "analytics-streams"
        ],
        "directories": [
            {
                "dir": "analytics-streams",
                "type": "analytics-streams",
                "processor": "analytics-streams"
            }
        ]
    }
  5. Open a command prompt and navigate to the <EI_HOME>/bin directory.

  6. Run the tool using the following command:

    • On Linux:

      ./forgetme.sh -U <USERNAME> -carbon <EI_ANALYTICS_HOME>
    • On Windows:

      forgetme.bat -U <USERNAME> -carbon <EI_ANALYTICS_HOME>

    For the list of commands you can run using the Forget-Me tool, see this link.
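
If you have many streams, you may prefer to generate the streams.json content instead of writing it by hand. The following Python sketch builds the same structure as the example in step 3. The function name build_streams_json is hypothetical, and the check that the "id" attribute is also listed under "attributes" is an assumption based on the example above, not a documented requirement of the tool.

```python
import json


def build_streams_json(streams):
    """Build streams.json text from {stream_name: (attributes, id_attribute)}."""
    entries = []
    for stream_name, (attributes, id_attribute) in streams.items():
        # Assumption: the ID attribute is one of the listed PII attributes,
        # as it is in both streams of the documented example.
        if id_attribute not in attributes:
            raise ValueError(
                f"id '{id_attribute}' is not an attribute of stream '{stream_name}'"
            )
        entries.append(
            {
                "streamName": stream_name,
                "attributes": list(attributes),
                "id": id_attribute,
            }
        )
    return json.dumps({"streams": entries}, indent=4)


if __name__ == "__main__":
    print(
        build_streams_json(
            {
                "org.wso2.gdpr.students": (["username", "email", "dateOfBirth"], "username"),
                "org.wso2.gdpr.students.marks": (["username"], "username"),
            }
        )
    )
```

You can save the printed output as the streams.json file described in step 3.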
