Understanding How To Calculate Historian Disk Space

Sep 1, 2020 10:19:39 AM / by Jeff Knepper

"What type of disk space will the Canary Historian need?" is a common question (and a valid one!), but the answer is not necessarily as simple as it might sound.

Data Historian Storage Needs


The Canary Historian stores tags in groups referred to as DataSets. By default, each DataSet archives data into daily Historical Database files, also referred to as HDB files. Each HDB file contains all of the tag names, properties, timestamps, values, and qualities for an entire day.




HDB files are optimized for storage. At the end of each day, the historian automatically validates and compresses them using a lossless compression algorithm. Because the compression is lossless, all original data values are retained without any interpolation, while still achieving a 3:1 compression ratio.




Several factors contribute to the overall size of each HDB file:

Number of tags

Rather obviously, the higher the tag count, the more data is stored. However, this is often the only factor considered when someone asks the "how much storage is required" question!


Length of tag names

Canary writes each tag name only once per HDB file, but unnecessarily long tag names combined with high tag counts (100,000+) can increase the footprint.


Sample rate

How often you ask the Canary Data Collector to read your tags determines how many timestamps, values, and quality scores are written to an HDB file. Canary is efficient here: it ignores unchanged values and updates the timestamp rather than duplicating the entry. Even so, fast sample rates on some analog tags can create a lot of values. Imagine a temperature sensor with a resolution of one hundredth of a degree. If polled every second, you would nearly always see a value change.

Using the deadbanding features when setting up logging sessions with the Canary Data Collectors can help avoid this issue, allowing faster scan classes without the worry of unnecessary data storage.
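To illustrate the idea, here is a minimal Python sketch of an absolute deadband filter. It is not the Canary Data Collector's implementation; the function name `apply_deadband`, the sample readings, and the 0.1 degree threshold are all assumptions made for the example.

```python
def apply_deadband(samples, deadband):
    """Keep a sample only when it differs from the last kept value by more than `deadband`.

    `samples` is a list of (timestamp, value) pairs. Hypothetical helper,
    not part of any Canary API.
    """
    kept = []
    last_value = None
    for timestamp, value in samples:
        if last_value is None or abs(value - last_value) > deadband:
            kept.append((timestamp, value))
            last_value = value
    return kept


# One reading per second from an imaginary sensor with 0.01 degree resolution.
readings = [(0, 72.00), (1, 72.01), (2, 72.03), (3, 72.02), (4, 72.30), (5, 72.31)]

# With a 0.1 degree deadband, only the first reading and the jump at t=4 are kept.
filtered = apply_deadband(readings, deadband=0.1)
print(filtered)
```

In this made-up series, six raw readings shrink to two stored values, which is the kind of reduction a deadband is meant to provide on noisy analog tags.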


Data change rate

Just because tags are scanned often doesn't mean they change often. As mentioned above, Canary writes by exception, ignoring unchanged values and simply updating the timestamp of the current value. Understanding how often your tags typically change therefore goes a long way toward predicting your storage footprint. Some tags, like production values, alarms, or high/low set points, may change only a few times a minute, or only once or twice a day. At the other extreme are tags like hydraulic pressures, whose values might change a hundred times a second, with every change valuable to record. Canary has found that, on average, fewer than 40% of the tags in most continuous processes change each time they are sampled.
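The write-by-exception idea, and its effect on how many values get stored, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout (`first_seen`, `last_seen`, `value`) and both function names are hypothetical and say nothing about how HDB files are actually structured.

```python
def write_by_exception(samples):
    """Store a new record only when the value changes; otherwise extend the last record.

    `samples` is a list of (timestamp, value) pairs. The record layout is an
    assumption for illustration, not the HDB file format.
    """
    records = []
    for timestamp, value in samples:
        if records and records[-1]["value"] == value:
            # Unchanged value: update the timestamp instead of duplicating the entry.
            records[-1]["last_seen"] = timestamp
        else:
            records.append({"first_seen": timestamp, "last_seen": timestamp, "value": value})
    return records


def stored_values_per_day(tag_count, sample_seconds, change_rate):
    """Rough count of values written per day, given an average change rate."""
    samples_per_tag = 86_400 / sample_seconds  # seconds in a day / scan interval
    return tag_count * samples_per_tag * change_rate


records = write_by_exception([(0, 5), (1, 5), (2, 5), (3, 6), (4, 6)])
print(len(records))  # five samples collapse into two records

# 10,000 tags scanned every 5 seconds with a 30% change rate: about 51.8 million values per day.
print(round(stored_values_per_day(10_000, 5, 0.30)))
```

The second function shows why the change rate matters as much as the scan rate: halving either one halves the number of values that have to be written.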


Data type

The type of data you collect will impact the overall storage requirements. Values could range from single-byte booleans to 8-byte floats. Strings can also be stored; Canary stores each unique string only a single time within an HDB file.
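Storing each unique string once is a familiar technique, often implemented as a lookup table plus small integer references. The Python sketch below shows that general approach; it is an assumption about the concept only, and the function `encode_strings` and the sample state names are hypothetical rather than anything taken from Canary.

```python
def encode_strings(values):
    """Replace repeated strings with indexes into a table of unique strings."""
    table = []      # unique strings, in first-seen order
    index_of = {}   # string -> position in `table`
    encoded = []
    for text in values:
        if text not in index_of:
            index_of[text] = len(table)
            table.append(text)
        encoded.append(index_of[text])
    return table, encoded


# A made-up string tag reporting machine state over time.
states = ["RUNNING", "STOPPED", "RUNNING", "RUNNING", "FAULT", "STOPPED"]
table, encoded = encode_strings(states)
print(table)    # each distinct string appears once
print(encoded)  # every sample is just a small integer reference
```

With this kind of scheme, a string tag that cycles through a handful of states costs little more than an integer tag, no matter how long the state names are.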


Timestamp resolution

If you don't need millisecond resolution in your timestamps, use the 'Timestamp Normalization' feature of the Canary Data Collectors to reduce your timestamps to an appropriate length.  Long timestamps written tens of thousands of times per day per tag can add up.
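As a rough picture of what reducing timestamp resolution means, the Python sketch below truncates millisecond timestamps to a coarser resolution. Truncation (rather than rounding) and the function name `normalize_timestamp_ms` are assumptions for illustration; the actual behavior of the 'Timestamp Normalization' feature should be checked in the Canary documentation.

```python
def normalize_timestamp_ms(timestamp_ms, resolution_ms):
    """Truncate a millisecond timestamp down to a multiple of `resolution_ms`."""
    return timestamp_ms - (timestamp_ms % resolution_ms)


raw = 1_598_955_579_123                          # a Unix timestamp in milliseconds
whole_second = normalize_timestamp_ms(raw, 1_000)
print(whole_second)                              # the trailing 123 ms are dropped
```

Timestamps that share trailing zeros like this carry less information, so fewer digits need to be stored for each value.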


Tag metadata

Depending on the type of Data Collector being used, you have the ability to record metadata, or properties, along with your tag values and quality scores. As with tag names, the length of the strings used will impact overall HDB file size.




Estimating File Size

To estimate the hard drive sizing for your project, consider the following two scenarios, both representative of typical complex systems.


7 Year Archive - 664 GB

  • 10,000 tags
  • 30% boolean, 20% analog integer, 50% analog float
  • Average sample rate - 5 seconds
  • Average change rate - 30%


7 Year Archive - 1.30 TB

  • 10,000 tags
  • 30% boolean, 20% analog integer, 50% analog float
  • Average sample rate - 3 seconds
  • Average change rate - 35%


Here are additional details of storage needs, based solely on data type.


1,000 two byte boolean tags

Scan rate   Change rate   Daily file size   7 years
1 sec       20%           64 MB             163.2 GB
5 sec       20%           12.7 MB           32.4 GB
30 sec      20%           2.15 MB           5.4 GB
5 min       20%           0.21 MB           0.5 GB
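The 7 year column is essentially the daily file size multiplied out over the retention period. A small Python sketch of that arithmetic follows, assuming 365 days per year and 1 GB = 1,000 MB (both assumptions on my part; the helper name `project_archive_gb` is hypothetical). Because the published figures are rounded, results may differ from the table in the last digit.

```python
DAYS_PER_YEAR = 365  # assumption: leap days ignored


def project_archive_gb(daily_mb, years=7):
    """Project a daily HDB file size in MB to a multi-year archive size in GB."""
    return daily_mb * DAYS_PER_YEAR * years / 1000  # assumption: 1 GB = 1,000 MB


# 5 second scan rate row of the boolean table: 12.7 MB per day.
five_second_archive = project_archive_gb(12.7)
print(round(five_second_archive, 1))
```

For the 5 second row this gives roughly 32.4 GB, in line with the table.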


1,000 four byte analog float tags

Scan rate   Change rate   Daily file size   7 years
1 sec       70%           338 MB            863.48 GB
5 sec       70%           67.6 MB           172.7 GB
30 sec      70%           11.2 MB           28.7 GB
5 min       70%           1.1 MB            2.8 GB


1,000 four byte analog integer tags

Scan rate   Change rate   Daily file size   7 years
1 sec       40%           193 MB            493.4 GB
5 sec       40%           38.6 MB           98.7 GB
30 sec      40%           6.4 MB            16.4 GB
5 min       40%           0.64 MB           1.64 GB
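The two 10,000 tag scenarios earlier in this post can be approximated from the 5 second rows of these three tables. The Python sketch below assumes that archive size scales linearly with tag count, change rate, and scan frequency; that linear model, the function name `estimate_archive_gb`, and the dictionary layout are assumptions for illustration, not a published Canary formula.

```python
# 7 year archive size in GB per 1,000 tags at a 5 second scan rate,
# paired with the change rate each table assumes.
BASELINE_5_SEC = {
    "boolean": (32.4, 0.20),
    "integer": (98.7, 0.40),
    "float": (172.7, 0.70),
}


def estimate_archive_gb(tag_count, mix, sample_seconds, change_rate):
    """Estimate a 7 year archive in GB by scaling the per-type baselines linearly."""
    total = 0.0
    for data_type, share in mix.items():
        baseline_gb, baseline_change = BASELINE_5_SEC[data_type]
        thousands_of_tags = tag_count * share / 1000
        total += (
            baseline_gb
            * thousands_of_tags
            * (change_rate / baseline_change)   # scale to the scenario's change rate
            * (5 / sample_seconds)              # scale to the scenario's scan rate
        )
    return total


mix = {"boolean": 0.30, "integer": 0.20, "float": 0.50}
first = estimate_archive_gb(10_000, mix, sample_seconds=5, change_rate=0.30)
second = estimate_archive_gb(10_000, mix, sample_seconds=3, change_rate=0.35)
print(round(first), round(second))
```

Under these assumptions the first scenario comes out at about 664 GB and the second at about 1.29 TB, close to the 664 GB and 1.30 TB quoted above. Swapping in your own tag mix, scan rate, and change rate gives a first-pass figure for a different system.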


With so many potential factors, estimating necessary storage can be difficult. Fortunately, expanding an existing system to add more storage is not. Based on the above sample and change rates, we recommend sizing a minimum of 1 TB for every 10,000 tags.
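Applied as a rule of thumb, the recommendation is simple arithmetic. The Python sketch below rounds the tag count up to whole blocks of 10,000 tags; that rounding, and the helper name `minimum_storage_tb`, are assumptions made for the example.

```python
import math


def minimum_storage_tb(tag_count, tb_per_block=1.0, tags_per_block=10_000):
    """Allow 1 TB for every started block of 10,000 tags."""
    blocks = math.ceil(tag_count / tags_per_block)
    return blocks * tb_per_block


small_site = minimum_storage_tb(10_000)
large_site = minimum_storage_tb(25_000)
print(small_site, large_site)
```

Treat the result as a floor rather than a target: faster scan rates or higher change rates than the examples in this post will need more.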


