Data Historian Performance Test: Pushing Canary to the Limit

In 2014 Gary Stern and the engineering team at Canary Labs sought to test the full capabilities of the Canary Data Historian. Published below are the complete findings from their study. As this was originally intended for use as an internal document, please excuse any grammatical or spelling errors.

Scaling and Capacity

Scaling and capacity are complex, multi-layered issues. When scalability and total capacity are questioned, the underlying concern is usually whether the system has the capacity and performance to meet the needs and demands of the end user's system.

Answering this basic question requires a complete system analysis to determine whether the data historian can store data as fast as needed and then provide the stored data to clients at an acceptable rate.

At the heart of overall system performance is the Canary Historian, so testing started with our proprietary database.

For testing purposes, the following benchmark machine was used:

System Specifications

Dell Precision T7600 Workstation
OS: Windows 7 Pro
CPU: Xeon 2 GHz processor
Cores: 6 physical (12 logical)
Memory: 32 GB
Disk Storage: RAID 320 GB – SATA 7200 RPM Drives
Cost to Buy: Less than $3000 in 2014

Canary Historian: Version 10.1 (64-bit Version)

Historian Raw Write Performance:
2.80 Million TVQs/second

Benchmark Parameters: 100 Million TVQs across 10,000 tags in 8 Datasets, Data type: R4

Historian CPU usage was around 35%.

Based on our analysis, we believe that for this test the limiting factor is disk performance and better performance may be achieved by using solid state drives.

Question: Does the total number of tags affect performance?

Answer: Yes, but not by much. Because the historian stores data on a per-tag basis, a higher tag count means writing to more locations within the file, so writes take longer. For instance, test results show that writing to 100,000 tags takes only about 18% longer than writing to 10,000 tags.

When 100 Million TVQs are sent to the Historian:

Tag Count | Total .hdb File Size | Average Bytes/TVQ | Throughput
10,000    | 524.2 MB             | 5.36              | 2.80M TVQs/sec
100,000   | 524.2 MB             | 5.36              | 2.37M TVQs/sec
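
The 18% figure can be reproduced from the throughput numbers in the table. The short sketch below is only a worked calculation; the function name is made up for illustration. It shows that an 18% longer write time corresponds to roughly 15% lower throughput, two ways of describing the same result.

```python
# Throughput figures taken from the table above (TVQs per second).
TOTAL_TVQS = 100_000_000
THROUGHPUT = {10_000: 2.80e6, 100_000: 2.37e6}


def write_time_seconds(total_tvqs: int, tvqs_per_second: float) -> float:
    """Time needed to write total_tvqs at a steady rate."""
    return total_tvqs / tvqs_per_second


t_10k = write_time_seconds(TOTAL_TVQS, THROUGHPUT[10_000])
t_100k = write_time_seconds(TOTAL_TVQS, THROUGHPUT[100_000])

# How much longer the 100,000-tag run takes, and how much lower its throughput is.
extra_time_pct = (t_100k / t_10k - 1) * 100
throughput_drop_pct = (1 - THROUGHPUT[100_000] / THROUGHPUT[10_000]) * 100

print(f"10,000 tags:  {t_10k:.1f} s")
print(f"100,000 tags: {t_100k:.1f} s")
print(f"Write time increase: {extra_time_pct:.1f}%")
print(f"Throughput decrease: {throughput_drop_pct:.1f}%")
```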

Noteworthy:
Canary has tested a single historian with one million tags successfully updating at one million TVQs per second.

Question: Does the data type affect performance?

Answer: Yes, slightly. An R8 value takes twice as many bytes as an R4 value (8 versus 4), so throughput in this test was approximately 3.6% lower (2.70M versus 2.80M TVQs/sec).

When 100 Million TVQs are sent to the Historian:

Data Type | Total .hdb File Size | Average Bytes/TVQ | Throughput
R4        | 524.2 MB             | 5.36              | 2.80M TVQs/sec
R8        | 914.8 MB             | 9.36              | 2.70M TVQs/sec

Question: Does the amount of Historical data already stored in the Historian affect performance?

Answer: No. The Canary Historian is designed so that the amount of data already stored does not affect write performance. For instance, a historian holding ten years of data performs the same as one holding a single day. This is because the historian “rolls over” the .hdb file on a regular basis (usually daily). Since live TVQ updates go only to the last, currently open .hdb file, write performance is very consistent.
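
The rollover idea can be illustrated with a toy model. This is a minimal sketch and not the Canary implementation: the `RollingStore` class, the date-based segment names, and the in-memory lists are all assumptions made for illustration, since the actual .hdb format is not described here. It shows only that a live write touches the current day's segment, however many older segments exist.

```python
from collections import defaultdict
from datetime import datetime


class RollingStore:
    """Toy append-only store that keeps one segment per calendar day.

    Illustrative only: names and layout are hypothetical, not the .hdb format.
    """

    def __init__(self) -> None:
        # Maps a segment name to the list of (tag, timestamp, value, quality).
        self.segments = defaultdict(list)

    @staticmethod
    def segment_name(timestamp: datetime) -> str:
        # Hypothetical naming scheme: one segment per day.
        return timestamp.strftime("%Y-%m-%d") + ".hdb"

    def write(self, tag: str, timestamp: datetime, value: float, quality: int) -> None:
        # Only the segment for the sample's own day is appended to,
        # so the cost of a write does not depend on older segments.
        self.segments[self.segment_name(timestamp)].append(
            (tag, timestamp, value, quality)
        )


store = RollingStore()
store.write("Tank1.Temp", datetime(2014, 1, 1, 23, 59, 59), 70.1, 192)
store.write("Tank1.Temp", datetime(2014, 1, 2, 0, 0, 0), 70.2, 192)
print(sorted(store.segments))
```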

Question: How does the Data Profile affect performance?

Answer: For the purposes of this benchmark test, every TVQ written to the historian had a different value. In real-world operations, we typically see change rates in process control systems of 20-30%. For example, if you are monitoring the temperature of a tank, the reading changes slowly, so a different reading may occur every minute instead of every second. The Canary Historian is optimized with "update by exception" logic that stores only changed values, which reduces disk space and thus improves performance.
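
As a rough illustration of the "update by exception" idea, the sketch below keeps a sample only when its value differs from the last stored value. It is a generic version of the technique under simple assumptions (one tag, exact value comparison). The function name and data layout are made up for this example and do not describe the historian's internal logic.

```python
from typing import Iterable, Iterator, Optional, Tuple

Sample = Tuple[int, float]  # (timestamp in seconds, value)


def update_by_exception(samples: Iterable[Sample]) -> Iterator[Sample]:
    """Yield only the samples whose value differs from the last stored value."""
    last_stored: Optional[float] = None
    for timestamp, value in samples:
        if last_stored is None or value != last_stored:
            last_stored = value
            yield timestamp, value


# Tank temperature polled once per second for five minutes;
# the reading itself only changes once per minute.
readings = [(t, 70.0 + (t // 60) * 0.1) for t in range(300)]
stored = list(update_by_exception(readings))

print(f"polled: {len(readings)}, stored: {len(stored)}")
```

In this example only one sample per minute is stored, which matches the tank temperature case described above.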

Question: What is the maximum number of tags the Canary Historian can address?

Answer: A single Canary Historian was tested with 25 million tags without issue. More points could have been tested; however, since we have never needed more than a five million tag capacity, we stopped there. A single, stand-alone historian with 25 million tags would be a feasible solution for a “smart meter” electric utility where the data rate is at fifteen-minute intervals and the data is loaded in blocks three or four times per day. It is not recommended for a one-second interval system; that would require several historian servers.
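
A quick calculation shows why the update interval matters so much at this tag count. The sketch below compares the average TVQ rate of 25 million tags at fifteen-minute and one-second intervals against the 2.80M TVQs/sec raw write benchmark reported earlier. Treat the resulting server count as a rough lower bound only: it assumes every server sustains the benchmark rate and every value changes on every update, and block loading makes peak rates higher than the average.

```python
import math

TAGS = 25_000_000
BENCHMARK_WRITE_RATE = 2.80e6  # TVQs/sec, raw write benchmark on the test machine


def average_tvqs_per_second(tag_count: int, interval_seconds: float) -> float:
    """Average update rate if every tag produces one TVQ per interval."""
    return tag_count / interval_seconds


smart_meter_rate = average_tvqs_per_second(TAGS, 15 * 60)
one_second_rate = average_tvqs_per_second(TAGS, 1)

# Rough lower bound on servers needed at one-second intervals.
min_servers = math.ceil(one_second_rate / BENCHMARK_WRITE_RATE)

print(f"15-minute interval: {smart_meter_rate:,.0f} TVQs/sec on average")
print(f"1-second interval:  {one_second_rate:,.0f} TVQs/sec on average")
print(f"Servers needed at benchmark rate: at least {min_servers}")
```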

For larger systems with over 250,000 tags or higher TVQ data volumes, we recommend configuring a “Server Class” machine for the Canary Historian with the appropriate number of cores, memory, and disk storage, and taking the overall data collection system and client requirements into account.

·        The historian performance will improve with the addition of more cores.  The Canary Historian is multi-threaded, and takes advantage of multiple processor cores.  Each Historian DataSet (group of tags within a set of .hdb files) can simultaneously execute on different threads/cores.
·        With larger tag counts, more memory improves performance as more tag indexing information can utilize main memory without disk swapping.
·        For approximate disk space, estimate the total number of TVQs per day and multiply by 6 bytes (data type R4) or by 10 bytes (data type R8).
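
The disk space rule of thumb above can be written as a small helper. This is a sketch: the function name and the `change_rate` parameter are assumptions for illustration, with `change_rate` standing in for the fraction of polled values actually stored (the 20-30% real-world figure mentioned earlier).

```python
# Rule-of-thumb bytes per stored TVQ, from the sizing guidance above.
BYTES_PER_TVQ = {"R4": 6, "R8": 10}


def daily_disk_bytes(
    tag_count: int,
    updates_per_tag_per_day: int,
    data_type: str = "R4",
    change_rate: float = 1.0,
) -> float:
    """Approximate bytes written per day.

    change_rate is the fraction of updates that are actually stored
    (1.0 means every update is a new value).
    """
    tvqs_per_day = tag_count * updates_per_tag_per_day * change_rate
    return tvqs_per_day * BYTES_PER_TVQ[data_type]


# Example: 10,000 R4 tags polled every second, with 25% of values changing.
example = daily_disk_bytes(10_000, 86_400, "R4", change_rate=0.25)
print(f"{example / 1e9:.2f} GB per day")
```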

The Canary Historian has often been deployed as a VM (Virtual Machine). When run in a VM environment, the historian's performance was approximately 25% lower than on a dedicated server machine.

Data Historian Raw Read Performance:
4.6 Million TVQs/second

Benchmark Parameters: 100 Million Total TVQs across 1,000 tags in 8 Datasets, Data type: R4

Most client applications do not connect directly to the Canary Historian; they connect to the Historian Web Service, which provides additional functionality and communicates with the client via WCF instead of COM/DCOM. This allows access to historian data through firewalls and, because it uses a single port, is IT friendly.

When performance is measured at the client side and data is accessed through the Historian Web Service, two components influence overall performance. The first is the interaction time between the Web Service and the Historian; the second is the marshaling and transmission of the results from the Web Service to the client via WCF.

The Historian Web Service is multi-threaded, so multiple clients may access the underlying historian simultaneously. Since clients communicate with the Web Service via WCF, the speed of the communication channel and the protocol overhead of WCF must also be considered. The fastest WCF communication is via the “net.tcp” protocol. Higher security and encryption levels add more overhead to WCF.

A function in the WebServiceHelper called “CommunicationSpeedCheck” returns an indicator of the raw communication speed between the client application and the Historian Web Service.

 

Reading Historian data through the Historian Web Service by Client application

                                    | Total TVQs | Client Local to Web Service | Client on LAN | Client with Wireless Access | Client with Internet Connection
Read Raw                            | 432,000    | 2.1 sec                     | 2.4 sec       | 4.1 sec                     | 28.2 sec
Read Processed (1 Minute Aggregate) | 7,200      | 0.94 sec                    | 0.96 sec      | 1.2 sec                     | 2.6 sec
Client/Web Service ComSpeed         |            | 4.5M bps                    | 1.2M bps      | 324K bps                    | 16K bps

5 Tags – 1 second data – 24 hours – Total of 86,400 TVQs per Tag
Using “Net.TCP” communication protocol, no encryption.
 
As shown in these tests, the amount of data passed between the data historian and the Web Service is identical in both cases. However, since the aggregated result is much smaller (7,200 versus 432,000 TVQs), less data is transmitted from the Web Service to the client, and the overall time is shorter (0.94 versus 2.1 seconds for a local client). The communication channel speed between the Historian Web Service and the client has a significant impact on performance.
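
The effect of the channel can be made concrete by turning the timings above into an effective client-side read rate, and into the speedup gained by requesting the one-minute aggregate instead of raw data. The sketch below uses only the measured numbers from the table; the variable names are illustrative.

```python
# Measured read times in seconds from the table above: (raw, 1-minute aggregate).
TIMINGS = {
    "local": (2.1, 0.94),
    "LAN": (2.4, 0.96),
    "wireless": (4.1, 1.2),
    "internet": (28.2, 2.6),
}

RAW_TVQS = 5 * 86_400   # 5 tags, 1-second data, 24 hours
AGG_TVQS = 5 * 24 * 60  # 5 tags, one value per minute, 24 hours

for connection, (raw_seconds, agg_seconds) in TIMINGS.items():
    raw_rate = RAW_TVQS / raw_seconds  # effective TVQs/sec seen by the client
    speedup = raw_seconds / agg_seconds
    print(
        f"{connection:>8}: {raw_rate:>9,.0f} raw TVQs/sec, "
        f"aggregate read is {speedup:.1f}x faster"
    )
```

The slower the connection, the more the aggregate read helps, because a larger share of the total time is spent transmitting results rather than reading them from the historian.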
 
Under very heavy client usage, there are several possible architectures that can improve overall system performance.
 
·        The Historian and the Historian Web Service can run on separate servers
·        Multiple Historian Web Services can receive data from a single Historian
·        The load can be split across multiple Historians, with multiple Historian Web Services
·        A single Web Service can access multiple Historians
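
As one way to picture splitting the load across multiple Historians, the sketch below assigns each tag to a server with a stable hash, so the same tag always lands on the same Historian. This is a generic partitioning technique, not a Canary feature: the function name, tag names, and host names are all hypothetical.

```python
import hashlib
from collections import Counter
from typing import List


def assign_historian(tag_name: str, historians: List[str]) -> str:
    """Pick a historian for a tag using a stable hash of the tag name."""
    digest = hashlib.sha256(tag_name.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(historians)
    return historians[index]


# Hypothetical host names and tag names, for illustration only.
historians = ["historian-a", "historian-b", "historian-c"]
tags = [f"Plant1.Line{i % 10}.Sensor{i}" for i in range(3000)]

distribution = Counter(assign_historian(tag, historians) for tag in tags)
for host, count in sorted(distribution.items()):
    print(f"{host}: {count} tags")
```

A stable hash is used instead of Python's built-in `hash()`, which is randomized per process for strings and would assign tags differently on each run.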

Have unique system requirements?  We would love to help you solve your process data problems.

 

 