Another component of the “Big Data” conversation is what “big” really means. In the early days of the PLC, an I/O map of 1,024 inputs and 1,024 outputs was considered “big,” and adding 12- or 16-bit analog values to the mix really taxed the system. Not so much these days.
As control system technology has progressed with increasing memory density and processor throughput, the boundaries of data collection have expanded enormously. Network speeds have increased thousands of times over, serving data to internal and external applications with equal ease.
With these newer tools, more complex functions are easily realized. In the case of electric motors, one of the most valuable functions is watching the sum of the phase currents over time as load changes take place. Sampling the current during the first 20 seconds of a start, while the load is being brought up to speed, typically yields the most informative data. Storing multiple starts and creating a “normal” profile from them becomes a template for measuring conditions at the mechanical load. The resulting data structure is an analog data file with a known sampling rate and resolution, probably much lower density than MP3 audio.
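As a rough sketch, capturing and averaging such a profile might look like the following (the sampling rate, the 20-second window, and the current-reading callback are assumptions for illustration, not a specific product's API):

```python
import time
from statistics import mean

SAMPLE_RATE_HZ = 100   # assumed sampling rate; real hardware varies
START_WINDOW_S = 20    # capture the first 20 seconds of each start

def capture_start(read_total_current):
    """Record the sum of the phase currents for one motor start.

    read_total_current is a hypothetical callback that returns the
    total phase current in amps at the moment it is called.
    """
    samples = []
    for _ in range(SAMPLE_RATE_HZ * START_WINDOW_S):
        samples.append(read_total_current())
        time.sleep(1 / SAMPLE_RATE_HZ)  # a real system paces this with a hardware timer
    return samples

def build_normal_profile(recorded_starts):
    """Average several recorded starts, sample by sample, into the
    'normal' startup template used for later comparisons."""
    return [mean(point) for point in zip(*recorded_starts)]
```

Each new start can then be compared point by point against the template; a growing deviation points to a change at the mechanical load.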
Another important rule could be constructed by looking at how the current changes over time for a steady load like a fan. When the fan is running at a set speed, its current should be constant; a variation of 5% or more would indicate a problem. Sampling could be conducted periodically at a user-defined interval. Samples that are normal can be discarded; samples outside the error band would signal an alarm state, be tagged with the date and time, and be stored for reference.
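A minimal sketch of that rule, assuming a nominal fan current and a simple in-memory log (both placeholders for whatever the real system provides):

```python
from datetime import datetime

NOMINAL_CURRENT_A = 12.0   # assumed nominal fan current, in amps
ERROR_BAND = 0.05          # a variation of 5% or more indicates a problem

def check_sample(current_a, alarm_log):
    """Compare one periodic sample to nominal; store it only if abnormal."""
    deviation = abs(current_a - NOMINAL_CURRENT_A) / NOMINAL_CURRENT_A
    if deviation >= ERROR_BAND:
        # outside the error band: tag with date and time, keep for reference
        alarm_log.append({
            "timestamp": datetime.now().isoformat(),
            "current_a": current_a,
            "deviation_pct": round(deviation * 100, 1),
        })
        return "ALARM"
    return "OK"  # normal samples are simply discarded

alarm_log = []
check_sample(12.2, alarm_log)  # ~1.7% off nominal -> "OK", nothing stored
check_sample(13.0, alarm_log)  # ~8.3% off nominal -> "ALARM", logged
```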
Production floor metrics like machinery uptime and production output are extremely simple by comparison. In discrete manufacturing, completing a part simply increments a counter. Complex manufacturing processes that involve many steps or require part traceability call for time tagging as the product moves through its process steps.
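For the traceability case, a record per part with one time tag per process step is enough; here is one way that might look (the step names and serial-number field are hypothetical):

```python
from datetime import datetime

parts_completed = 0  # the simple discrete-manufacturing counter

class PartRecord:
    """Hypothetical traceability record: one time tag per process step."""

    def __init__(self, serial):
        self.serial = serial
        self.steps = []  # list of (step_name, timestamp) pairs

    def tag_step(self, step_name):
        self.steps.append((step_name, datetime.now().isoformat()))

part = PartRecord("SN-0001")
part.tag_step("machining")
part.tag_step("inspection")
part.tag_step("packaging")
parts_completed += 1  # increment the counter when the part is done
```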
Things get more complicated in highly centralized systems, where coordination takes place between large databases and remote manufacturing operations. These systems involve major communications infrastructures and, in today’s slightly paranoid world, significant security measures to prevent intrusions. For large multinational companies, the security frontier represents a massive investment.
The main point is that defining data structures is not quite as difficult as it seems, because there is no one generalized data “superset.” What is important on the manufacturing floor is easily defined. The metrics that show how a business is performing are also relatively easy to pin down.
So big data might not be so big after all.