WSO2 Data Analytics Server is succeeded by WSO2 Stream Processor. To view the latest documentation for WSO2 SP, see WSO2 Stream Processor Documentation.

The General Data Protection Regulation (GDPR) is a legal framework formalized by the European Union (EU) in 2016. This regulation came into effect on 25 May 2018, and can affect any organization that processes the Personally Identifiable Information (PII) of individuals who live in Europe. Organizations that fail to demonstrate GDPR compliance are subject to financial penalties.

Do you want to learn more about GDPR?

If you are new to GDPR, we recommend that you take a look at our article series on Creating a Winning GDPR Strategy.

For more resources on GDPR, see the white papers, case studies, solution briefs, webinars, and talks published on our WSO2 GDPR homepage. You can also find the original GDPR legal text here.

The Forget-me tool packaged with WSO2 DAS enables you to hide any PII sent to the product for processing, and to remove references to deleted user IDs. See the following sections for details about the GDPR compliance of WSO2 DAS.

Forget-me tool overview

The Forget-me tool is shipped with WSO2 DAS by default in the <DAS_HOME>/repository/components/tools/identity-anonymization-tool-x.x.x directory. If required, you can change the default location of the configurations of this tool or make changes to the default configurations. You can also run the Forget-me tool in the standalone mode.

Changing the default configurations location

You can change the default location of the tool configurations if desired. You may want to do this if you are working with a multi-product environment where you want to manage configurations in a single location for ease of use. Note that this is optional.

To change the default configurations location for the embedded tool, do the following:

  1. Open the forgetme.sh file found inside the <DAS_HOME>/bin directory.

  2. The location path is the value given after -d in the following line. Modify this value to change the location.

    The default location path is $CARBON_HOME/repository/components/tools/identity-anonymization-tool/conf.

    sh $CARBON_HOME/repository/components/tools/identity-anonymization-tool/bin/forget-me -d $CARBON_HOME/repository/components/tools/identity-anonymization-tool/conf -carbon $CARBON_HOME $@
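
As an illustration of step 2, the following sketch applies the change to a copy of the line held in a shell variable rather than to the forgetme.sh file itself. The /opt/wso2/shared/forget-me/conf directory is a hypothetical shared location assumed for this example; substitute the directory you actually use.

```shell
# Hypothetical shared configuration directory (an assumption for this example).
NEW_CONF_DIR='/opt/wso2/shared/forget-me/conf'

# The line from forgetme.sh, kept in single quotes so that $CARBON_HOME and $@
# stay literal text instead of being expanded by this shell.
line='sh $CARBON_HOME/repository/components/tools/identity-anonymization-tool/bin/forget-me -d $CARBON_HOME/repository/components/tools/identity-anonymization-tool/conf -carbon $CARBON_HOME $@'

# Replace only the path that follows "-d "; the rest of the line is untouched.
updated=$(printf '%s\n' "$line" | sed "s|-d [^ ]*|-d $NEW_CONF_DIR|")

printf '%s\n' "$updated"
```

The printed line shows what the corresponding line in forgetme.sh would look like after the edit.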

Changing the default configurations of the tool

All configurations related to this tool can be found inside the <DAS_HOME>/repository/components/tools/identity-anonymization-tool/conf directory. The default configurations are set up as follows:

  • Read Logs: <DAS_HOME>/repository/logs
  • Read Datasource: <DAS_HOME>/repository/conf/datasources/
  • Default datasources: WSO2_ANALYTICS_EVENT_STORE_DB, WSO2_ANALYTICS_PROCESSED_DATA_STORE_DB, WSO2_CARBON_DB, WSO2_METRICS_DB, WSO2ML_DB
  • Log file name regex: The regex patterns defined in all the files in the <DAS_HOME>/repository/components/tools/identity-anonymization-tool/conf/log-config directory are considered.

For information on changing these configurations, see Configuring the config.json file in the Product Administration Guide.

Running the Forget-me tool in the standalone mode

This tool can run standalone and therefore cater to multiple products. This means that if you are using multiple WSO2 products and need to delete the user's identity from all products at once, you can do so by running the tool in standalone mode.
For information on how to build and run the Forget-me tool, see Removing References to Deleted User Identities in WSO2 Products in the WSO2 Administration Guide.

Removing PII via the Forget-me tool

In WSO2 DAS, event streams specify the schema of the events that are selected into the DAS event flow for processing. This schema can include user IDs and other information that you want to hide when DAS persists events for batch analytics. You can do this via the Forget-me tool.

To demonstrate this, consider an example where there are two streams as given below.

Stream Name                     Attribute List
org.wso2.gdpr.students          username, email, dateOfBirth
org.wso2.gdpr.students.marks    username, marks

In the above streams, the username, email, and date of birth are considered PII that needs to be hidden. To do this, follow the steps given below.

Step 1: Configure the streams.json file

To identify the streams and the stream attributes with PII, you need to create a streams.json file that defines the relevant streams and each attribute that contains PII. This file must be placed in the <DAS_HOME>/repository/components/tools/identity-anonymization-tool-x.x.x/conf/streams directory.

The following is the sample streams.json file for this scenario.

{
    "streams": [
        {
            "streamName": "org.wso2.gdpr.students",
            "attributes": ["username", "email", "dateOfBirth"],
            "id": "username"
        },
        {
            "streamName": "org.wso2.gdpr.students.marks",
            "attributes": ["username"],
            "id": "username"
        }
    ]
}

This file must include the following information for each stream, as shown in the sample above:

  • streamName: The name of the stream.
  • attributes: The list of attributes that contain PII.
  • id: The ID attribute that is replaced with the value of the pseudonym argument when the tool is executed.

Step 2: Configure the config.json file

The config.json file specifies the processors that the Forget-me tool runs and the configuration directory that each processor reads. This file must be placed in the <DAS_HOME>/repository/components/tools/identity-anonymization-tool-x.x.x/conf directory.

The analytics-streams processor needs to be added to the configuration file of the Forget-me tool as shown in the sample below.

{
  "processors" : [
    "log-file",
    "analytics-streams"
  ],
  "directories": [
    {
      "dir": "log-config",
      "type": "log-file",
      "processor" : "log-file",
      "log-file-path" : "logs",
      "log-file-name-regex" : "(.)*"
    },
    {
      "dir": "streams",
      "type": "analytics-streams",
      "processor" : "analytics-streams"
    }
  ]
}
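
Before running the tool, it can help to confirm that the files from Steps 1 and 2 are where the tool expects them. The following is a minimal sketch of such a check; the check_conf_layout function is a hypothetical helper written for this example and is not part of the Forget-me tool. It only tests that config.json, streams/streams.json, and the log-config directory exist under the configuration directory you pass to it.

```shell
# Hypothetical helper (not shipped with the tool): returns 0 when the expected
# entries exist under the given configuration directory, and 1 otherwise.
#   $1 = configuration directory of the Forget-me tool
check_conf_layout() {
    conf_dir=$1
    status=0
    for entry in config.json streams/streams.json log-config; do
        if [ ! -e "$conf_dir/$entry" ]; then
            # Report every missing entry instead of stopping at the first one.
            printf 'missing: %s\n' "$conf_dir/$entry" >&2
            status=1
        fi
    done
    return "$status"
}
```

For example, check_conf_layout "<DAS_HOME>/repository/components/tools/identity-anonymization-tool-x.x.x/conf" prints one line per missing entry and returns a non-zero status if anything is absent.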

Currently, you can apply the Forget-me tool to remove PII from the following locations:

  • log-file: The log files in the <DAS_HOME>/repository/logs directory, which is the default location for logs.
  • analytics-streams: The information persisted in the event store for the streams that you specified in the <DAS_HOME>/repository/components/tools/identity-anonymization-tool-x.x.x/conf/streams/streams.json file. For more information about persistence of data, see Configuring Data Persistence.



Step 3: Execute the Forget-me tool

To execute the Forget-me tool, issue the following command pointing to the <DAS_HOME> directory.

forget-me -U <USERNAME> -d <CONF_DIR> -carbon <DAS_HOME>
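
The -U option takes a single username, so handling several users means issuing the command once per user. The following sketch prints one command per username read from standard input so that the commands can be reviewed before they are run. The forget_me_commands function, the usernames, and the paths in the usage example are hypothetical values assumed for illustration.

```shell
# Hypothetical helper: prints one forget-me command per username read from
# standard input (one username per line). Nothing is executed.
#   $1 = configuration directory, $2 = DAS home directory
forget_me_commands() {
    conf_dir=$1
    das_home=$2
    while IFS= read -r user; do
        # Skip blank lines in the input.
        [ -n "$user" ] || continue
        printf 'forget-me -U %s -d %s -carbon %s\n' "$user" "$conf_dir" "$das_home"
    done
}

# Example with assumed usernames and paths.
printf 'john.doe\njane.roe\n' | forget_me_commands /opt/wso2das/forget-me-conf /opt/wso2das
```

Printing the commands first keeps the sketch free of side effects; each printed line can then be run as shown in the command above.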

Removing references to deleted user identities

In addition to the above, you can also use the Forget-me tool to remove references to deleted users from WSO2 DAS.

Before you begin

  • Note that this tool is designed to run in offline mode (i.e., the server should be shut down or the tool should be run on another machine) in order to prevent unnecessary load on the server. If this tool runs in online mode (i.e., when the server is running), DB lock situations on the H2 databases may occur.
  • If you have configured any JDBC database other than the H2 database provided by default, copy the relevant JDBC driver to the <DAS_HOME>/repository/components/tools/identity-anonymization-tool/lib directory.
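
The driver copy described in the last point above can be sketched as follows. The install_jdbc_driver function is a hypothetical helper written for this example, and the driver file name in the usage note is an assumption; use the driver JAR that matches your database.

```shell
# Hypothetical helper: copies a JDBC driver JAR into the lib directory of the
# Forget-me tool. It refuses to run if either path does not exist.
#   $1 = path to the driver JAR, $2 = DAS home directory
install_jdbc_driver() {
    driver_jar=$1
    lib_dir="$2/repository/components/tools/identity-anonymization-tool/lib"
    if [ ! -f "$driver_jar" ]; then
        printf 'driver not found: %s\n' "$driver_jar" >&2
        return 1
    fi
    if [ ! -d "$lib_dir" ]; then
        printf 'lib directory not found: %s\n' "$lib_dir" >&2
        return 1
    fi
    cp "$driver_jar" "$lib_dir/"
}
```

For example, install_jdbc_driver ./mysql-connector-java.jar "$DAS_HOME" copies the (assumed) MySQL driver file into the tool's lib directory.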

To remove references to a deleted user identity, do the following:

  1. Open a new terminal window and navigate to the <DAS_HOME>/bin directory.
  2. Execute one of the following commands depending on your operating system:

    • On Linux/Mac OS: ./forgetme.sh -U <username>
    • On Windows: forgetme.bat -U <username>

    Note

    The commands specified above use only the -U <username> option, which is the only required option to run the tool. There are several other optional command line options that you can specify based on your requirement. The supported options are described in detail below.

    Command line options:

    • U
      Description: The name of the user whose identity references you want to remove.
      Required: Yes
      Sample value: -U john.doe

    • d
      Description: The configuration directory to use when the tool is run. If you do not specify a value for this option, the <DAS_HOME>/repository/components/tools/identity-anonymization-tool-x.x.x/conf directory (which is the default configuration directory of the tool) is used.
      Required: No
      Sample value: -d <TOOL_HOME>/conf

    • T
      Description: The tenant domain of the user whose identity references you want to remove. If you specify a tenant domain via this option, use the TID option to specify the ID of that tenant. The default value is carbon.super.
      Required: No
      Sample value: -T acme-company

    • TID
      Description: The tenant ID of the user whose identity references you want to remove. A tenant ID is required if you have specified a tenant domain via the T option.
      Required: No
      Sample value: -TID 2346

    • D
      Description: The user store domain name of the user whose identity references you want to remove. The default value is PRIMARY.
      Required: No
      Sample value: -D Finance-Domain

    • pu
      Description: The pseudonym that replaces the username of the user whose identity references you want to remove. If you do not specify a pseudonym when you run the tool, a random UUID value is generated as the pseudonym by default.
      Required: No
      Sample value: -pu "123-343-435-545-dfd-4"

    • carbon
      Description: The CARBON HOME directory. This value replaces the $CARBON_HOME variable in the directories configured in the main configuration file.
      Required: No
      Sample value: -carbon "/usr/bin/wso2das/wso2das3.2.0"
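
To show how the options combine, the following sketch assembles a forgetme.sh command line and enforces the one dependency described above: a tenant domain (-T) must be accompanied by a tenant ID (-TID). The forgetme_command function is a hypothetical helper written for this example, it covers only the U, T, TID, and pu options, and it prints the command instead of running it.

```shell
# Hypothetical helper: prints a forgetme.sh command line. Nothing is executed.
#   $1 = username (required)
#   $2 = tenant domain (optional), $3 = tenant ID (optional)
#   $4 = pseudonym (optional)
forgetme_command() {
    user=${1:-}
    tenant_domain=${2:-}
    tenant_id=${3:-}
    pseudonym=${4:-}

    if [ -z "$user" ]; then
        printf 'a username (-U) is required\n' >&2
        return 1
    fi
    # A tenant domain must be paired with its tenant ID.
    if [ -n "$tenant_domain" ] && [ -z "$tenant_id" ]; then
        printf 'a tenant ID (-TID) is required with a tenant domain (-T)\n' >&2
        return 1
    fi

    cmd="./forgetme.sh -U $user"
    if [ -n "$tenant_domain" ]; then
        cmd="$cmd -T $tenant_domain -TID $tenant_id"
    fi
    if [ -n "$pseudonym" ]; then
        cmd="$cmd -pu \"$pseudonym\""
    fi
    printf '%s\n' "$cmd"
}
```

For example, forgetme_command john.doe acme-company 2346 prints ./forgetme.sh -U john.doe -T acme-company -TID 2346, while forgetme_command john.doe acme-company fails because the tenant ID is missing.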