History Cleanup

To satisfy data protection laws, or simply to manage storage, Optimize provides an automated history cleanup.

Please note the following:

  • By default, the history cleanup is disabled in Optimize. Before enabling it, consider which type of cleanup and which time to live period fit your needs. Otherwise, historic data intended for analysis might be lost irreversibly.
  • The default engine history cleanup works differently from the one in Optimize because the Engine offers different cleanup strategies. The current implementation in Optimize is equivalent to the end time strategy of the Engine.

Setup

The most central settings are cronTrigger and ttl. Their global default configuration is the following:

historyCleanup:
  cronTrigger: '0 1 * * *'
  ttl: 'P2Y'

cronTrigger - defines when and at what interval the history cleanup is performed, in the format of a cron expression. The default is 1AM every day. To avoid any impact on daily business, it is recommended to schedule the cleanup outside of business hours. See the Configuration Description for further insights into this property and its format.

ttl - is the global time to live period of data contained in Optimize. The field that defines the age of a particular entity differs between process, decision and event data; refer to the corresponding subsection for details. The default value is 'P2Y', which means that data older than 2 years at the time the cleanup is executed gets cleaned up. For details on the notation, see the Configuration Description of the ttl property.
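
As an illustration of these two properties, the following sketch moves the cleanup to the weekend and shortens the global time to live. The chosen values are assumptions made for this example, not recommendations; they reuse the cron and period notation of the defaults shown above.

```yaml
historyCleanup:
  # Run once a week, on Sunday at 3AM, outside of business hours
  cronTrigger: '0 3 * * 0'
  # Data older than 6 months at cleanup time is eligible for cleanup
  ttl: 'P6M'
```

Note that these global settings alone do not delete anything; the cleanup still has to be enabled per entity type.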

All the remaining settings are entity type specific and will be explained in the following subsections.

Process Data Cleanup

The age of process instance data is determined by the endTime field of each process instance. Running instances are never cleaned up.

To enable the cleanup of process instance data, the historyCleanup.processDataCleanup.enabled property needs to be set to true.

Another important configuration parameter for process instance cleanup is historyCleanup.processDataCleanup.cleanupMode. It determines what in particular gets deleted when a process instance is cleaned up. The default value, all, results in the whole process instance being deleted. For other options, check out the Configuration Description of the historyCleanup.processDataCleanup.cleanupMode property.

To set up a process definition specific ttl or a different cleanupMode, you can also provide process specific settings using the perProcessDefinitionConfig list, which overrides the global settings for the corresponding definition key. In the following sample, process instances of the key MyProcessDefinitionKey would get cleaned up after 2 months instead of 2 years, and when the cleanup is performed only their variables would get deleted instead of the whole process instance.
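
For example, a global setup that keeps the process instances but clears their variable data once the ttl has passed might look like the following sketch. It assumes the 'variables' mode that also appears in the other samples on this page.

```yaml
historyCleanup:
  ttl: 'P2Y'
  processDataCleanup:
    enabled: true
    # Only variable data is removed; the instance itself is kept
    cleanupMode: 'variables'
```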

historyCleanup:
  ttl: 'P2Y'
  processDataCleanup:
    enabled: true
    cleanupMode: 'all'
    perProcessDefinitionConfig:
      'MyProcessDefinitionKey':
        ttl: 'P2M'
        processDataCleanupMode: 'variables'

Decision Data Cleanup

The age of decision instance data is determined by the evaluationTime field of each decision instance.

To enable the cleanup of decision instance data, the historyCleanup.decisionDataCleanup.enabled property needs to be set to true.

As with the Process Data Cleanup, it is possible to configure a decision definition specific ttl using the perDecisionDefinitionConfig list.

historyCleanup:
  ttl: 'P2Y'
  decisionDataCleanup:
    enabled: true
    perDecisionDefinitionConfig:
      'myDecisionDefinitionKey':
        ttl: 'P3M'

Ingested Event Cleanup

The age of ingested event data is determined by the time field provided for each event at the time of ingestion.

To enable the cleanup of ingested event data, the historyCleanup.ingestedEventCleanup.enabled property needs to be set to true.

historyCleanup:
  ttl: 'P2Y'
  ingestedEventCleanup:
    enabled: true

Please note that the ingested event cleanup does not cascade down to potentially existing Event Based Processes that may contain data originating from ingested events. To make sure data of ingested events is also removed from Event Based Processes, you need to enable the Process Data Cleanup as well.
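
A minimal sketch of such a combined setup, using only the properties introduced above, could look like this:

```yaml
historyCleanup:
  ttl: 'P2Y'
  # Removes process instance data, including that of Event Based Processes
  processDataCleanup:
    enabled: true
  # Removes the ingested events themselves
  ingestedEventCleanup:
    enabled: true
```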

Example

Here is a sample of what a complete cleanup configuration might look like:

historyCleanup:
  cronTrigger: '0 1 * * 0'
  ttl: 'P1Y'
  processDataCleanup:
    enabled: true
    cleanupMode: 'variables'
    perProcessDefinitionConfig:
      'VeryConfidentProcess':
        ttl: 'P1M'
        processDataCleanupMode: 'all'
      'KeepTwoMonthsProcess':
        ttl: 'P2M'
  decisionDataCleanup:
    enabled: true
    perDecisionDefinitionConfig:
      'myDecisionDefinitionKey':
        ttl: 'P3M'
  ingestedEventCleanup:
    enabled: true

The following is a brief summary of what the presented configuration does:

  • The cleanup is scheduled to run every Sunday at 1AM.
  • The global ttl of any data is 1 year.
  • The process data cleanup is enabled.
  • The cleanupMode applied to all process instances that have passed the ttl period only clears their variable data but keeps the overall instance data like activityInstances.
  • There is a process specific setup for the process definition key 'VeryConfidentProcess' that has a special ttl of 1 month; those instances will be deleted completely due to the cleanup mode 'all' configured specifically for them.
  • There is another process specific setup for the process definition key 'KeepTwoMonthsProcess' that has a special ttl of 2 months.
  • The decision data cleanup is enabled.
  • There is a decision definition specific setup for the definition key myDecisionDefinitionKey that has a special ttl of 3 months.
  • The ingested event cleanup is enabled.
