Import Overview

Architecture Overview

In general, the import assumes the following setup:

  • A Camunda engine, from which the data is imported.
  • The Optimize back-end, where the data is transformed into an appropriate format for efficient data analysis.
  • Elasticsearch, which is the database of Optimize and where the formatted data is persisted.

The following depicts the setup and how the components communicate with each other:

The main idea is that Optimize queries the engine data through the REST API and transforms it so that it can be queried quickly and easily by Optimize. To prevent the Optimize queries from putting too much load on the engine while still keeping the import fast, Optimize automatically adapts the number of queries to the engine's response time.
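
To make this adaptive behavior more concrete, here is a minimal sketch (in Java, assuming Java 11+ for the built-in HTTP client) of an import loop that slows down its queries when the engine responds slowly and speeds up again when it responds quickly. The REST endpoint, page size, thresholds, and doubling/halving strategy are illustrative assumptions and do not reflect Optimize's actual implementation:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.time.Duration;
    import java.time.Instant;

    // Minimal sketch of an adaptive import loop. The endpoint, page size,
    // thresholds, and back-off strategy below are illustrative assumptions,
    // not Optimize's actual implementation.
    public class AdaptiveImportSketch {

        private static final Duration MIN_DELAY = Duration.ofMillis(50);
        private static final Duration MAX_DELAY = Duration.ofSeconds(10);
        private static final Duration SLOW_RESPONSE = Duration.ofSeconds(2);

        public static void main(String[] args) throws Exception {
            HttpClient client = HttpClient.newHttpClient();
            Duration delay = MIN_DELAY;      // pause between two engine queries
            int firstResult = 0;
            int pageSize = 1_000;            // assumed page size per REST call

            while (true) {
                HttpRequest request = HttpRequest.newBuilder()
                        // hypothetical engine REST endpoint; the real import fetches
                        // several entity types (process instances, variables, ...)
                        .uri(URI.create("http://localhost:8080/engine-rest/history/activity-instance"
                                + "?firstResult=" + firstResult + "&maxResults=" + pageSize))
                        .GET()
                        .build();

                Instant start = Instant.now();
                HttpResponse<String> response =
                        client.send(request, HttpResponse.BodyHandlers.ofString());
                Duration responseTime = Duration.between(start, Instant.now());

                if ("[]".equals(response.body())) {
                    break; // no more data to import
                }
                // ... transform the page and persist it to Elasticsearch here ...

                // Adapt the query rate to the engine's response time: back off
                // when the engine answers slowly, speed up when it answers quickly.
                if (responseTime.compareTo(SLOW_RESPONSE) > 0) {
                    delay = min(delay.multipliedBy(2), MAX_DELAY);
                } else {
                    delay = max(delay.dividedBy(2), MIN_DELAY);
                }

                firstResult += pageSize;
                Thread.sleep(delay.toMillis());
            }
        }

        private static Duration min(Duration a, Duration b) {
            return a.compareTo(b) < 0 ? a : b;
        }

        private static Duration max(Duration a, Duration b) {
            return a.compareTo(b) > 0 ? a : b;
        }
    }

Doubling and halving the delay is only one possible strategy; the essential point is that a slow engine automatically receives fewer Optimize requests, while a fast engine is queried more aggressively.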

Furthermore, you should be aware of the following general characteristics of the data in Optimize:

  • Optimize does not own the engine data. The Optimize dataset can always be removed and reimported, or adapted to the needs of Optimize.
  • The data is only a near real-time representation of the engine database. That means Elasticsearch may not contain the data of the most recent time frame, e.g. the last two minutes, but all previous data should be synchronized.

If you are interested in the details of the import, have a look at the Import Procedure section.

Import performance overview

This section gives an overview of how fast Optimize imports certain datasets, so you can get a feeling for Optimize's import speed and whether it meets your demands.

Keep in mind that the import speed is likely to vary between datasets; for example, it depends on how the data is distributed. The setup of the involved components also has an impact on the import. For instance, if you deploy the Camunda Platform on a different machine than Optimize and Elasticsearch to provide both applications with more computation resources, the import is likely to speed up; conversely, if the Camunda Platform and Optimize are physically far apart, network latency might slow down the import.

Setup

The following components were used within the import:

  • Camunda Platform: 7.8.0 on a Tomcat 8.0.47
  • Camunda Platform Database: PostgreSQL 9.6
  • Elasticsearch: 6.0.0
  • Optimize: 2.0.0

Optimize was run with the default configuration settings, as described in detail in the configuration overview.

All of these components were running on a single laptop with the following specifications:

  • Processor: Intel® Core™ i5-6440HQ (6th generation), 4x 2.60 GHz
  • Memory: 16 GB (DDR4)
  • Storage: 192 GB SSD

The time was measured from the start of Optimize until the import of the whole dataset into Optimize was finished.

Large size data set

This dataset contains the following number of instances:

  • Number of Process Definitions: 20
  • Number of Activity Instances: 21 932 786
  • Number of Process Instances: 2 000 000
  • Number of Variable Instances: 6 913 889

Here you can see how the data is distributed over the different process definitions:

Results:

  • Duration of importing the whole data set: ~ 40 minutes
  • Speed of the import: 10 000 - 14 000 database rows per second during the import process
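
As a rough cross-check, assuming each imported instance corresponds to roughly one database row: 21 932 786 + 2 000 000 + 6 913 889 ≈ 30.8 million rows imported in about 40 minutes (2 400 seconds) works out to roughly 12 800 rows per second, which is consistent with the stated range.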

Medium size data set

This dataset contains the following number of instances:

  • Number of Process Definitions: 46
  • Number of Activity Instances: 1 427 384
  • Number of Process Instances: 261 106
  • Number of Variable Instances: 1 273 324

Here you can see how the data is distributed over the different process definitions:

Results:

  • Duration of importing the whole data set: ~5 minutes
  • Speed of the import: 8 000 - 14 000 database rows per second during the import process

Small data set

This dataset contains the following number of instances:

  • Number of Process Definitions: 10
  • Number of Activity Instances: 777 340
  • Number of Process Instances: 73 487
  • Number of Variable Instances: 2 387 146

Here you can see how the data is distributed over the different process definitions:

Results:

  • Duration of importing the whole data set: ~3-4 minutes
  • Speed of the import: 5 000 - 8 000 database rows per second during the import process
