SAP recently posted an article titled "10 Myths About Predictive Analytics". It is a great read, well worth your time, and can be found here. The article addresses the following ten myths:
1. Predictive analytics is easy
2. Scientific evidence is proof
3. Only what you can measure matters
4. Correlation = causation
5. Predictions are perfect
6. Predictions are forever
7. You need a skilled consultant to implement predictive analytics
8. Predictive analytics is mostly a machine problem
9. Predictive analytics are expensive
10. Insights = action
On a recent phone call, a distributor and I discussed a potential client. This client manages a fleet of over 100 large sea vessels and is looking for a solution to monitor the entire fleet, as well as to move their maintenance model away from interval-based and run-to-failure approaches toward a predictive or auto-prognostic model.
This particular client is looking for an "all-in-one" solution that will take them from their current stage, in which they collect and share no historical process data, to operating under a full machine learning / predictive model. The challenge is that the client is focused on an "out of the box" product that can offer them everything they want at once.
You may be chuckling to yourself because you have had interactions with a similar client or you yourself are seeking the same type of solution. I would suggest a different approach, one that I commonly refer to as the "crawl-walk-run" plan.
The most basic first step is the collection of large amounts of process data. The client's top priority needs to be a solution that records every sensor across their many assets and stores the readings in a central database. This database needs to be "lossless", meaning the data is never degraded over time to save storage space. For instance, if you collect a pressure reading at one-second intervals, a lossless database will keep all of those individual readings for as long as you like. Many data historians do not offer this feature; instead, after a few months, they turn one-second data into 60-second or 5-minute averages. For some industries or uses, this might be acceptable. However, if you plan to employ algorithms to scour your data looking for correlations, you don't want to feed them processed time-averages.
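To see what time-averaging throws away, consider a minimal sketch. The sensor, the numbers, and the averaging window below are all invented for illustration, but the mechanism is the one many historians apply: a short transient that a raw one-second series captures perfectly nearly vanishes once the data is rolled up into 5-minute averages.

```python
def five_minute_averages(readings):
    """Collapse one-second readings into 300-second (5-minute) averages."""
    return [
        sum(readings[i:i + 300]) / len(readings[i:i + 300])
        for i in range(0, len(readings), 300)
    ]

# One hour of hypothetical pressure readings at a steady 100 psi...
readings = [100.0] * 3600
# ...with a 10-second transient spike to 250 psi during minute 12.
for t in range(720, 730):
    readings[t] = 250.0

averaged = five_minute_averages(readings)

print(max(readings))   # raw data preserves the 250.0 psi spike
print(max(averaged))   # averaging flattens it to 105.0 psi
```

A correlation-hunting algorithm fed the averaged series would see a barely perceptible 5 psi ripple where the raw series shows a 150 psi excursion, which is exactly the kind of precursor event a predictive model needs.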
Once the data is being stored, the client must put together a plan that allows the data to be shared around the organization. Centralizing data is one thing, but if that data is not easy to access and report from, the information will stay locked in the database with very few individuals gaining any value from it. To make this possible, the database must be engineered to offer a variety of connections, including ODBC and custom APIs, because it is likely that the process data will need to be combined with other data and moved into other systems.
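The access pattern those connections enable looks something like the sketch below. It uses Python's built-in sqlite3 module as a stand-in so the example is self-contained; an ODBC connection (for example via pyodbc) exposes the same connect / cursor / execute / fetch pattern. The table and sensor names are invented for illustration.

```python
import sqlite3

# Stand-in for the central historian; a real deployment would connect
# over ODBC or a vendor API rather than to an in-memory SQLite file.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor TEXT, ts INTEGER, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [("engine1.oil_pressure", t, 60.0 + t * 0.1) for t in range(10)],
)

# A reporting tool or downstream system issues an ordinary query
# against the shared store, no special client software required.
rows = conn.execute(
    "SELECT ts, value FROM readings WHERE sensor = ? ORDER BY ts",
    ("engine1.oil_pressure",),
).fetchall()

print(len(rows))   # 10 readings retrieved
print(rows[0])     # (0, 60.0)
```

The point is not the specific driver but the contract: any tool that speaks a standard interface can pull the data, which is what keeps the information from staying locked in the database.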
Once the data has been collected and effectively made available to all that can consume it, advanced analytics such as predictive maintenance schedules can be created. However, to get to this step will require time. Not because the organization will move slowly, nor because it will be difficult to collect and distribute the process data. The reason this final step will take time is due to the nature of predictive analytics. In order for the tools to properly function, it will be necessary to provide years of process data to learn from. Machine learning is only as good as the data that it is learning from, and having adequate history is a requirement.
Therefore, I made this recommendation to our distributor: tell your client to focus on the first two parts of this process. Today they should worry less about the analytic solutions on the market and instead focus on finding a strong, reliable database that is capable of storing sub-second data from millions of sensors. That database cannot be a traditional SQL database and must be fast; recalling the data will be of paramount importance to the later analytics work and must be weighed during selection. Focusing on the best machine learning available today instead of the best data historian available today would be a mistake for several reasons.
- Technological advancements - most predictive analytic companies are less than five years old, which is a strong indicator that the market is quickly changing and advancing by the month. The best solution today is unlikely to be the best solution in two or three years when they are ready to move forward.
- Pricing changes - like most new technologies, the price points of predictive analytics have already started to fall, and will further drop over time. Making decisions today on pricing seems unnecessary when you can nearly guarantee that model will change over the next two years.
- Volatile industry - as mentioned in the first point, this is a relatively young market and is full of start-ups. It seems like an unnecessary risk to form a partnership now when it won't be necessary for another few years. Given time, leaders will emerge and hold fast, giving a better indicator of who should (and should not) be partnered with.
As with any solution, always start with the basics. Find a historian that can handle the millions upon millions of data points you are going to send to it. Once the data is being collected, then begin to focus on what you can learn and how the organization can benefit.
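A back-of-envelope calculation makes concrete why historian selection deserves this much weight. The fleet size comes from the article; the per-vessel sensor count and sampling rate below are assumptions for illustration, and "sub-second" sampling would only raise the totals.

```python
# Hypothetical sizing for the fleet's historian.
vessels = 100                 # from the article: a fleet of over 100 vessels
sensors_per_vessel = 10_000   # assumption for illustration
readings_per_second = 1       # assumption; sub-second sampling would be higher

seconds_per_day = 86_400
points_per_day = vessels * sensors_per_vessel * readings_per_second * seconds_per_day
points_per_year = points_per_day * 365

print(points_per_day)    # 86,400,000,000 points per day
print(points_per_year)   # ~31.5 trillion points per year
```

Even under these modest assumptions, the store must ingest tens of billions of points per day and serve years of that history back quickly, which is why retrieval performance belongs in the selection criteria from day one.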