Canary Blog

Understanding Your Data Historian Choices and Options

Written by Jeff Knepper | Apr 6, 2020 9:08:30 PM

More than ever, companies are recognizing the importance, and necessity, of the data historian. With so many databases available, it can be confusing to work through your options. This guide should make it easier to understand the choices you have when selecting a data historian for your industrial application.

 

A Bit Buzzworthy

Certainly not a new creation, the time-series database has been around for decades. Recently, however, the popularity of time-series databases has surged, fueled equally by the massive amounts of data we produce and our desire to learn from it. In response to this demand, we have seen offerings emerge from the big three (Amazon, Microsoft, and Google) as well as a slew of open source options, all in addition to the standard choices the industrial automation space has long offered. Articles appear weekly announcing multi-million-dollar investments in time-series database technologies, and at this point, trying to understand the pros and cons of all these databases can be quite confusing. As you and your organization get serious about storing and analyzing data, how will you choose the right one?

 

Built with a purpose

Imagine using a sledgehammer to hang crown molding or a claw hammer to break up an old sidewalk. Ridiculous, right? Yet so often, companies do just that when they select a time-series database. Most organizations either select a time-series database that offers far more than they need, costing them dearly in both licensing and deployment, or they select a solution that is simply underpowered, hurting their performance and scalability.

To keep this from happening to you, it is important to correctly identify the tasks you want, and do not want, your time-series database to perform. Like any other instrument, the time-series database is generally designed with a purpose. Choosing the right one requires you to first understand your specific needs. 

Some great questions to work through include:

  • Do you have the talent/time to integrate an open source option?
  • Would you prioritize read performance above write performance?
  • How much ongoing database maintenance do you wish to support?
  • Do you want a solution that includes data collection and analytics along with the DB?
  • How important are scalability and storage resources?
  • Are there benefits to SQL vs NoSQL solutions?
  • What type of data resolution or scan frequency do you want/need?
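
The scan frequency question has a direct effect on how much data your historian must absorb. As a rough sketch only, the snippet below estimates daily sample counts for a hypothetical site. The tag count, scan intervals, and 16 bytes per sample are illustrative assumptions, and real historians typically apply compression and deadbanding that reduce these figures.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def daily_samples(tag_count: int, scan_interval_s: float) -> int:
    """Samples written per day if every tag is stored on every scan."""
    return int(tag_count * SECONDS_PER_DAY / scan_interval_s)


def raw_bytes_per_day(tag_count: int, scan_interval_s: float,
                      bytes_per_sample: int = 16) -> int:
    """Uncompressed storage per day; bytes_per_sample is an assumed figure."""
    return daily_samples(tag_count, scan_interval_s) * bytes_per_sample


# Hypothetical site with 10,000 tags, compared at two scan rates.
one_second = daily_samples(10_000, 1.0)     # 864,000,000 samples per day
one_minute = daily_samples(10_000, 60.0)    # 14,400,000 samples per day
```

Moving from one-minute to one-second scans multiplies the write load by sixty, which is why it helps to settle on the resolution you need before comparing products.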

Lay of the land

Sorting time-series databases into general categories will help you further identify the right fit for your organization’s needs.

 

IoT Solutions

Typically, these are geared for the cloud, feature subscription pricing, and present themselves as part of an overall IoT ecosystem. They are generally designed for extremely large device counts with very slow scan classes. Additionally, they seem best suited for OEMs or sensor manufacturers that are interested in offering remote monitoring capabilities, or perhaps for organizations that do not consider real-time data access operationally critical. Examples include ThingWorx, Splunk, Ubidots, and others.

 

Analytic Warehouses

Designed to keep your time-series data ready and available for cloud computing, these are highly scalable and extremely powerful options for feeding massive amounts of data to various analytic tools. These solutions tend to focus more on analyzing the data than collecting it and are geared for fast data reads rather than ease of writing data to them. Often associated with AWS, Google, or Azure environments, they lean heavily towards analytics and are not usually suited for real-time process control and operational decision making.

 

Open Source Offerings

One of the fastest growing categories, these time-series databases feature various levels of development and support. Initially attractive as a “free” option, they require large amounts of work for initial deployment and integration. Several also convert to more traditional licensing requirements for enterprise versions. Some include data collection and analytics, but the majority do not. Few are built specifically for the challenges of industrial automation. Offerings include TimescaleDB, InfluxDB, Prometheus, and others. One large area of concern in this category is the business model. Nearly all of these companies are heavily funded by venture capitalists. Generally, this signifies a business model with the ultimate goal of being acquired by a larger company. Unfortunately, when this occurs, support, future product development, and general user satisfaction often suffer.

 

Data Historians

Data historians are a specific category of time-series databases that emerged over thirty years ago to help companies in the industrial automation space. These time-series databases can quickly be sorted by their underlying SQL or NoSQL technology. SQL-based data historians are often favored for smaller systems due to their low costs and typical inclusion with most SCADA platforms. NoSQL databases are favored for higher tag counts and more critical operational data. The most popular NoSQL offerings that stand independent of SCADA platforms include PI and Canary. Additionally, several SCADA providers, such as Iconics and Wonderware, offer NoSQL historians. Historians that are SCADA dependent can be troublesome for larger enterprises, which find it difficult to consolidate data from multiple facilities running different SCADA packages.

 

Affordability matters

The general purpose of a time-series database for an industrial application is to collect large volumes of process data that can be used by both operators and process engineers to increase process efficiency. Ultimately, it is used to improve the bottom line, making a fast ROI crucial. When approaching a financial evaluation of the time-series database, it is important to evaluate three separate areas:

  • Deployment Costs
  • Licensing Model
  • Long-term Support

Choose the wrong database and you will accrue unpalatable fees during deployment. Whether it is your team’s time, system integrator billings, or long hours of training, deployment costs can quickly skyrocket. It is not uncommon for larger organizations to spend 12-18 months deploying some solutions, and most find they spend as much on deployment as they did on the purchase of the system.

Once these costs are considered, you must also weigh both initial licensing costs and projected growth costs. In an age where more and more devices are coming online, it is important to match a time-series database’s licensing structure with both your current and future needs. A great question to consider is whether the vendor offers free data collection and an unlimited tag count option. If so, at what price point? Secondly, how are trending and analytic tools priced? Finally, calculate your ongoing support and maintenance costs and understand what penalties you may face should that coverage lapse.

If you focus your time-series database search on solutions that offer the following features, you greatly increase your ROI opportunities in the first 12 months.

  • Fast deployments with light training requirements
  • Free data collection
  • Maintenance-free database
  • Affordable and scalable licensing models
  • Support and maintenance for 15% or less of the original licensing cost
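
To see how these three cost areas combine, here is a minimal sketch of a first-year cost comparison. Every figure is hypothetical: the quote values, the monthly savings, and the assumption that the first year of support is billed on top of the license are placeholders to replace with your own numbers.

```python
from dataclasses import dataclass


@dataclass
class HistorianQuote:
    """Hypothetical quote; all fields are illustrative assumptions."""
    license_cost: float     # initial licensing
    deployment_cost: float  # staff time, integrator billings, training
    support_rate: float     # annual support as a fraction of license cost

    def first_year_cost(self) -> float:
        support = self.license_cost * self.support_rate
        return self.license_cost + self.deployment_cost + support

    def payback_months(self, monthly_savings: float) -> float:
        """Months of savings needed to recover the first-year cost."""
        return self.first_year_cost() / monthly_savings


# Same license and support rate, but deployment equal to the license
# price in one case and a fifth of it in the other.
heavy = HistorianQuote(license_cost=50_000, deployment_cost=50_000, support_rate=0.15)
light = HistorianQuote(license_cost=50_000, deployment_cost=10_000, support_rate=0.15)

for name, quote in (("heavy deployment", heavy), ("light deployment", light)):
    months = quote.payback_months(monthly_savings=10_000)
    print(f"{name}: ${quote.first_year_cost():,.0f} first year, "
          f"{months:.1f} months to pay back")
```

With these placeholder figures, the heavier deployment adds roughly four months to the payback period even though the license price is identical.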

 

Getting the data out

Today, a time-series database built for industrial automation must excel at both storing data and making it readily accessible. This means the database you choose must be evaluated on its reliability as well as its interoperability. Most companies want to integrate a time-series database with their entire industrial stack. That means providing data to SCADA, MES, and ERP systems, as well as to various data analytic tools. To do this, ensure you can both run SQL queries against the time-series database and connect to it via web APIs. Additionally, having HTML trending tools and dashboarding will allow for simple integration into various platforms and enable data consumption on devices like smartphones and tablets.
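
As an illustration of the kind of SQL access worth testing during an evaluation, the sketch below computes hourly averages for one tag. It uses Python's built-in sqlite3 module with an in-memory table purely as a stand-in. The table layout, tag names, and values are hypothetical, and a real historian's SQL interface will expose its own schema and functions.

```python
import sqlite3

# In-memory stand-in for a historian's SQL endpoint (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (tag TEXT, ts TEXT, value REAL)")
conn.executemany(
    "INSERT INTO samples VALUES (?, ?, ?)",
    [
        ("Pump1.Flow", "2020-04-06 09:00:00", 10.0),
        ("Pump1.Flow", "2020-04-06 09:30:00", 14.0),
        ("Pump1.Flow", "2020-04-06 10:15:00", 20.0),
        ("Pump1.Temp", "2020-04-06 09:10:00", 71.5),
    ],
)


def hourly_average(connection: sqlite3.Connection, tag: str):
    """Return (hour, average value) rows for one tag, oldest first."""
    query = """
        SELECT strftime('%Y-%m-%d %H:00', ts) AS hour, AVG(value)
        FROM samples
        WHERE tag = ?
        GROUP BY hour
        ORDER BY hour
    """
    return connection.execute(query, (tag,)).fetchall()


flow_by_hour = hourly_average(conn, "Pump1.Flow")
# flow_by_hour == [('2020-04-06 09:00', 12.0), ('2020-04-06 10:00', 20.0)]
```

If an MES or reporting tool can issue a query of this shape against the historian directly, integration work tends to be far lighter than when data must first be exported.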

 

Evaluate multiple offerings

A simple way to ensure you make the right choice is to run a pilot or proof of concept (POC) with several vendors. Most reputable companies will provide a free 90-day demo of their technology so you and a core team can evaluate the software using your own process data. These evaluations will give you a realistic sense of the deployment process, allow you to better understand the strengths and weaknesses of the systems, and let you work with the company behind the product. If a company is not willing to provide an evaluation or cannot assist with a timely deployment of a POC, perhaps they have made your decision a bit easier.

 

It's all about the data

We are living in extremely exciting times in the automation world. More and more technologies continue to emerge, many of which offer incredible possibilities for helping run operations more efficiently. However, nearly all these new resources require the same thing: access to incredible amounts of process data. The decision you make now regarding your time-series database will have an impact on your organization for many years to come.

Canary exists to provide reliable and affordable tools that will make it easy to capture, store, and analyze time-series data.  To better understand whether Canary could be the right fit for your organization, learn more about our system, or try Canary without any commitment.  If you like the tools, browse our pricing or request a full demonstration.