Import Overview
Architecture Overview
In general, the import assumes the following setup:
- A Camunda engine, from which the data is imported.
- The Optimize back-end, where the data is transformed into an appropriate format for efficient data analysis.
- Elasticsearch, the database of Optimize, where the formatted data is persisted.
Figure: the setup and how the components communicate with each other.
The main idea is that Optimize queries the engine data through the REST API and transforms it so that it can be quickly and easily queried by Optimize. To keep the import fast without putting too much load on the engine, Optimize automatically adapts the number of its requests to the engine's response time.
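To make this adaptation concrete, here is a minimal sketch of the idea in Java. It assumes a Camunda 7 engine REST API at http://localhost:8080/engine-rest and polls the historic activity instance endpoint; the page size, the throttling rule, and the class itself are illustrative assumptions, not Optimize's actual import code.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/**
 * Minimal sketch of adaptive polling: fetch historic activity instances
 * page by page and throttle the polling rate based on how long the engine
 * takes to answer. Illustrative only, not Optimize's implementation.
 */
public class AdaptiveImportSketch {

    private static final String ENGINE_REST = "http://localhost:8080/engine-rest"; // assumed engine location
    private static final int PAGE_SIZE = 1000; // illustrative page size

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        int firstResult = 0;

        while (true) {
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create(ENGINE_REST + "/history/activity-instance"
                            + "?firstResult=" + firstResult + "&maxResults=" + PAGE_SIZE))
                    .GET()
                    .build();

            long start = System.nanoTime();
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            long responseMillis = (System.nanoTime() - start) / 1_000_000;

            // An empty JSON array ("[]") means everything has been fetched.
            if (response.body().length() <= 2) {
                break;
            }

            // ... transform the JSON page and persist it to Elasticsearch here ...

            firstResult += PAGE_SIZE;

            // Throttle proportionally to the engine's response time, so a busy
            // engine is queried less aggressively (hypothetical backpressure rule).
            Thread.sleep(responseMillis);
        }
    }
}
```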
Furthermore, one should be aware of the general requirements for the data in Optimize:
- Optimize does not own the data of the engine. The Optimize dataset can always be removed and reimported (see the sketch after this list) or adapted to the needs of Optimize.
- The data is only a near real-time representation of the engine database. That means Elasticsearch may not contain the data from the most recent time frame, e.g. the last two minutes, but all previous data should be synchronized.
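Because Optimize never owns the engine data, a full reimport can be forced by simply dropping Optimize's Elasticsearch indices and restarting Optimize. A minimal sketch, assuming an unsecured Elasticsearch on localhost:9200 and that the Optimize indices use an `optimize-` name prefix (both are assumptions; adapt them to your setup):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/**
 * Sketch: delete all Optimize indices so that the next start of Optimize
 * triggers a full reimport from the engine. Assumes an "optimize-" index
 * prefix and an unsecured local Elasticsearch; adapt both to your setup.
 */
public class ResetOptimizeData {
    public static void main(String[] args) throws Exception {
        HttpRequest delete = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:9200/optimize-*")) // wildcard index delete
                .DELETE()
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(delete, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```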
If you are interested in the details of the import, have a look at the dedicated Import Procedure section.
Import performance overview
This section gives an overview of how fast Optimize imports certain datasets, so you can get a feeling for Optimize's import speed and whether it meets your demands.
The numbers are likely to differ for other datasets; for example, the import speed depends on how the data is distributed. The setup of the involved components also has an impact: deploying the Camunda Platform on a different machine than Optimize and Elasticsearch provides both applications with more computation resources and is likely to speed up the process, whereas a large physical distance between the Camunda Platform and Optimize adds network latency that might slow the import down.
Setup
The following components were used within the import:
Component | Version |
---|---|
Camunda Platform | 7.8.0 on a Tomcat 8.0.47 |
Camunda Platform Database | PostgreSQL 9.6 |
Elasticsearch | 6.0.0 |
Optimize | 2.0.0 |
The Optimize configuration with the default settings was used, as described in detail in the configuration overview.
All three components were running on a single laptop with the following specifications:
- Processor: Intel® Core™ i5-6440HQ (6th generation), 4x 2.60 GHz
- Memory: 16 GB (DDR4)
- Storage: 192 GB SSD
The time was measured from the start of Optimize until the import of the whole dataset into Optimize was finished.
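One way to reproduce such a measurement is to poll Elasticsearch's document count for the Optimize indices until it stops growing and treat that point as the end of the import. Below is a minimal sketch; the `optimize-*` index pattern, the 10-second polling interval, and the "unchanged for three polls" stop rule are all assumptions for illustration:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/**
 * Sketch: estimate the import duration by polling the document count of
 * the Optimize indices until it stops growing. Start together with Optimize.
 */
public class ImportDurationProbe {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest countRequest = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:9200/optimize-*/_count")) // assumed index pattern
                .GET()
                .build();

        long start = System.currentTimeMillis();
        long lastCount = -1;
        int unchangedPolls = 0;

        while (unchangedPolls < 3) { // stop once the count is stable for three polls
            Thread.sleep(10_000); // poll every 10 seconds
            String body = client.send(countRequest, HttpResponse.BodyHandlers.ofString()).body();
            // Crude extraction of the "count" field; a JSON library would be cleaner.
            long count = Long.parseLong(body.replaceAll(".*\"count\":(\\d+).*", "$1"));
            unchangedPolls = (count == lastCount) ? unchangedPolls + 1 : 0;
            lastCount = count;
        }

        long seconds = (System.currentTimeMillis() - start) / 1000;
        System.out.printf("~%d documents imported, observed over %d seconds%n", lastCount, seconds);
    }
}
```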
Large size data set
This dataset contains the following numbers of instances:
Number of Process Definitions | Number of Activity Instances | Number of Process Instances | Number of Variable Instances |
---|---|---|---|
20 | 21 932 786 | 2 000 000 | 6 913 889 |
Figure: distribution of the data over the different process definitions.
Results:
- Duration of importing the whole data set: ~40 minutes
- Speed of the import: 10 000 - 14 000 database rows per second during the import process
Medium size data set
This dataset contains the following numbers of instances:
Number of Process Definitions | Number of Activity Instances | Number of Process Instances | Number of Variable Instances |
---|---|---|---|
46 | 1 427 384 | 261 106 | 1 273 324 |
Figure: distribution of the data over the different process definitions.
Results:
- Duration of importing the whole data set: ~5 minutes
- Speed of the import: 8 000 - 14 000 database rows per second during the import process
Small size data set
This dataset contains the following numbers of instances:
Number of Process Definitions | Number of Activity Instances | Number of Process Instances | Number of Variable Instances |
---|---|---|---|
10 | 777 340 | 73 487 | 2 387 146 |
Figure: distribution of the data over the different process definitions.
Results:
- Duration of importing the whole data set: ~3-4 minutes
- Speed of the import: 5 000 - 8 000 database rows per second during the import process