As of 2012, the size of data sets that could feasibly be processed in a reasonable amount of time was limited to the exabyte level. Data sets grow in size partly because they are increasingly gathered by ubiquitous information-sensing mobile devices, aerial sensing technologies (remote sensing), cameras, software logs, microphones, radio-frequency identification readers, wireless sensor networks and other such applications. Limitations due to large data sets are encountered in many areas, including genomics, meteorology, complex physics simulations, and biological and environmental research. This article takes a look at how the phenomenon of Big Data has affected the T&M world.
‘Big Data’ is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. The challenges include capture, storage, curation, search, transfer, sharing, analysis and visualisation.
In test and measurement applications, engineers and scientists often end up collecting huge amounts of data every second of every day.
Let us take the Large Hadron Collider (LHC) as an example. Every second of its operation generates an additional 40 terabytes of data for test engineers to deal with. (A ‘tera’ is ‘1’ followed by twelve zeros.) Similarly, every 30 minutes of a Boeing jet engine run adds another 10 terabytes of valuable data, which translates to over 640 TB for just a single journey across the Atlantic. Multiply that by more than 25,000 flights each day, and you get an understanding of the enormous amount of data being generated. Suffice it to say that the data available today is so huge that a whole branch of study is devoted to deciphering it.
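These figures can be sanity-checked with back-of-envelope arithmetic. The 8-hour crossing and four-engine aircraft below are illustrative assumptions, not figures stated in the article:

```python
TB = 10**12  # one terabyte: '1' followed by twelve zeros

lhc_per_second = 40 * TB           # LHC: 40 TB of data per second
engine_rate = 10 * TB / (30 * 60)  # jet engine: 10 TB per 30 minutes

# Assume an 8-hour Atlantic crossing and a four-engine aircraft
# (both are assumptions for illustration):
flight_seconds = 8 * 3600
per_engine = engine_rate * flight_seconds

print(f"LHC, one minute of operation: {60 * lhc_per_second / TB:.0f} TB")  # 2400 TB
print(f"One engine, one crossing:     {per_engine / TB:.0f} TB")           # 160 TB
print(f"Four engines:                 {4 * per_engine / TB:.0f} TB")       # 640 TB
```

Under those assumptions, the arithmetic lands on the article's 640 TB per-journey figure.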
Implications of data growth in the design space
The impact of data growth is primarily reflected in a need for faster data storage as well as faster data transfer. This has resulted in the development of ultra-fast storage drives, circuits that can handle gigabit data transfer speeds, faster backplanes for servers and faster networking to handle the increased data. This, in turn, has led to the emergence of the following faster serial standards for data transfer between chips and systems, as well data streaming to storage drives:
SAS-3 (serial-attached SCSI-3). SAS is a point-to-point serial protocol that moves data to and from storage devices. SAS-3 is the latest SAS standard with a bus speed of 12 Gbps.
Fibre Channel. This is a high-speed network technology used to connect computers to data storage. It has become a common connection type for storage area networks in enterprise storage. Faster versions of Fibre Channel are 16G and 28G.
PCI Express Gen 3, InfiniBand, RapidIO and HyperTransport. These are fast buses used for high-performance backplanes.
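Note that the quoted speeds are raw line rates; usable payload throughput is lower because of line-coding overhead (8b/10b coding for SAS-3, 128b/130b for PCI Express Gen 3). A quick sketch of the difference:

```python
def payload_gbps(line_rate_gbps, data_bits, total_bits):
    """Usable data rate after subtracting line-coding overhead."""
    return line_rate_gbps * data_bits / total_bits

# SAS-3: 12 Gbps line rate with 8b/10b coding
print(f"SAS-3 payload per lane:      {payload_gbps(12, 8, 10):.2f} Gbps")    # 9.60 Gbps

# PCI Express Gen 3: 8 GT/s per lane with 128b/130b coding
print(f"PCIe Gen 3 payload per lane: {payload_gbps(8, 128, 130):.2f} Gbps")  # 7.88 Gbps
```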
Where does Big Data come in?
Testing and validating the latest ultra-fast buses requires high-performance test and measurement systems, which presents a tremendous challenge and business opportunity for test and measurement companies. The typical needs are for testing the transmitter and the receiver, and for validating the connection medium (cables, connectors and the like). For the transmitter’s physical-layer testing, ultra-high-bandwidth oscilloscopes with a low noise floor and low intrinsic jitter are required.
Sanchit Bhatia, digital applications specialist, Agilent Technologies, cites an example: “Fibre Channel 28G requires a 45GHz real-time scope for electrical validation and SAS-3 requires a 20GHz oscilloscope. Moving on to the receiver validation, bit-error-rate testers operating at up to 28G data rate are required. These testers stress the receiver by applying calculated jitter and measure the bit error rate. Protocol validation is also done in these high-speed buses through custom-designed high-performance protocol analysers.”
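The receiver tests Bhatia describes rest on standard bit-error-rate statistics rather than any particular instrument's method. One textbook result: to claim the BER is below a target with a given confidence when zero errors are observed, a minimum number of bits must be transmitted. A hedged sketch:

```python
import math

def bits_needed(target_ber, confidence=0.95):
    """Bits to transmit, error-free, to claim BER < target at the given confidence.

    Standard BERT statistics: for zero observed errors, the required bit count
    is -ln(1 - confidence) / BER (about 3/BER at 95% confidence).
    """
    return -math.log(1 - confidence) / target_ber

n = bits_needed(1e-12)
print(f"Bits for BER < 1e-12 at 95% confidence: {n:.2e}")  # 3.00e+12

# At a 28 Gbps line rate, that much traffic takes:
print(f"Test time at 28 Gbps: {n / 28e9:.0f} s")  # 107 s
```

This is why high-speed receiver validation is time-consuming even at full line rate.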
Big Data opens three interesting areas for test and measurement: “First, test and measurement has so far been limited to labs and manufacturing lines. With Big Data catching fire, the boundaries of time and distance have broken down and test and measurement now happens in the field. Some applications that have caught on very significantly are online condition monitoring of systems (in other words, aggregation of data). Second, remote monitoring, testing and diagnostics of systems deployed in remote locations without physical access have gained a lot of leverage, which means easier access to data. Last, near-infinite computing resources in the cloud provide an opportunity for software to offload computationally heavy tasks. These can be sophisticated image or signal processing or even compilation and development, which, in short, is ‘offloading’,” explains Satish Mohanram, technical marketing manager, National Instruments India.
“Differentiation is no longer about who can collect the most data; it’s about who can quickly make sense of the data they collect”—Measurements Outlook 2013, National Instruments Corporation.
Challenges encountered with the ‘Big’ in Big Data
There used to be a time when hardware sampling rates were limited by the speed at which analogue-to-digital conversion took place, physically restricting the amount of data that could be acquired. But today, hardware is no longer a limiting factor in acquisition applications. Management of the acquired data is the challenge of the future.
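To see how quickly acquisition outruns management, consider a hypothetical medium-sized system (the channel count and rates below are illustrative, not from the article):

```python
# Hypothetical acquisition system (illustrative numbers):
channels = 100
sample_rate = 1_000_000   # 1 MS/s per channel
bytes_per_sample = 2      # 16-bit ADC

rate = channels * sample_rate * bytes_per_sample  # bytes per second
print(f"Sustained rate: {rate / 1e6:.0f} MB/s")   # 200 MB/s
print(f"Per day: {rate * 86400 / 1e12:.2f} TB")   # 17.28 TB
```

A modest 100-channel system can fill tens of terabytes a day; the hardware keeps up easily, but storing, moving and searching that volume becomes the real problem.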
As it becomes both necessary and easier to capture large amounts of data at high speeds, engineers will face challenges in creating end-to-end solutions, which require a close relationship between automated test equipment and IT equipment. This is driving test and measurement system providers to work with IT providers to offer bundled, integrated solutions for automated test applications.
Contextual data mining. Data mining is the practice of using the contextual information saved along with data to search through and pare down large data sets into more manageable, applicable volumes. By storing raw data alongside its original context, or ‘metadata,’ it becomes easier to accumulate, locate and later on, manipulate and comprehend. This is one of the major benefits that makes it easier to analyse the collected data.
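A minimal sketch of the idea, with illustrative record and field names rather than any specific file format: the metadata is searched first, so the potentially huge raw arrays are only touched for the records that match.

```python
# Raw measurements stored alongside their context ('metadata'); field names
# here are illustrative, not a real test-data schema.
records = [
    {"data": [0.1, 0.2, 0.3], "meta": {"sensor": "vibration",   "unit": "g", "rig": "A"}},
    {"data": [21.5, 21.7],    "meta": {"sensor": "temperature", "unit": "C", "rig": "A"}},
    {"data": [0.4, 0.5],      "meta": {"sensor": "vibration",   "unit": "g", "rig": "B"}},
]

# Pare the set down by querying the context, not the raw arrays:
vibration = [r for r in records if r["meta"]["sensor"] == "vibration"]
print(len(vibration))  # 2
```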
Intelligent data acquisition nodes. Though it is common to stream test data to a host PC over standard buses like Ethernet and USB, high-channel-count measurements with fast sampling rates can easily overload the communication bus. An alternative approach is to store data locally first and transfer the files for post-processing after a test run, but this increases the time taken to realise valuable results. To overcome these hurdles, the latest measurement systems integrate leading technology from Intel, ARM and Xilinx to offer increased performance and processing capability, along with off-the-shelf storage components that provide high-throughput streaming to disk.
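The buffer-locally-then-stream pattern can be sketched as a simple producer/consumer queue (block sizes and counts below are illustrative):

```python
import queue
import threading

buf = queue.Queue(maxsize=1000)  # local buffer on the acquisition node

def acquire(n_blocks):
    """Producer: the acquisition loop deposits ADC blocks into the local buffer."""
    for _ in range(n_blocks):
        buf.put(bytes(4096))  # stand-in for one 4 KB block of samples
    buf.put(None)             # sentinel: acquisition finished

def stream_to_host():
    """Consumer: drains the buffer, standing in for a bus transfer to the host PC."""
    total = 0
    while (block := buf.get()) is not None:
        total += len(block)
    return total

t = threading.Thread(target=acquire, args=(100,))
t.start()
print(stream_to_host())  # 409600
t.join()
```

The buffer decouples the two rates: short bursts above the bus speed land in the queue instead of stalling acquisition.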
Ensuring inter-operability. With on-board processors, the intelligence of measurement systems has become more decentralised by having processing elements closer to the sensor and the measurement itself.
Mohanram shares, “Advanced data acquisition hardware includes high-performance multi-core processors that can run acquisition software and processing-intensive analysis algorithms in line with the measurements. These measurement systems are so intelligent that they can analyse and deliver results more quickly, without waiting for large amounts of data to transfer or having to store it all in the first place. This optimises the system’s ability to use disk space more efficiently.”
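The in-line reduction Mohanram describes can be illustrated with a trivial per-block summary standing in for a real analysis algorithm: only a handful of numbers cross the bus instead of every raw sample.

```python
def summarise(block):
    """Reduce a block of raw samples to a few results near the sensor."""
    return {"mean": round(sum(block) / len(block), 6),
            "peak": max(abs(x) for x in block)}

raw = [0.1, -0.2, 0.3, -0.4] * 250  # 1000 samples acquired locally
result = summarise(raw)             # only this small result is transferred
print(result)  # {'mean': -0.05, 'peak': 0.4}
```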
Bhatia adds, “The biggest challenge with these buses is to ensure inter-operability between systems designed by different companies and compliance with the design specifications. Standards bodies create compliance test specifications to ensure inter-operability and compliance.”
Breaking the resolution barrier. Today, hardware vendors have accelerated data collection rates to such an extent that engineers and scientists have rapidly broken through rate and resolution barriers, triggering a new wave of data-management challenges.
Bhatia shares, “With data rates going up, validation and compliance procedures have become more complex, as jitter budget will be very tight. Higher data rates have also driven the bandwidth requirements on oscilloscopes higher and created the need for higher-speed BERTs in the future.”
Mohanram paints a slightly different picture: “Advancements in computing technology, including increasing microprocessor speed and hard drive storage capacity, combined with decreasing costs of hardware and software have caused an explosion of data coming in at a blistering pace. In measurement applications in particular, engineers and scientists can collect vast amounts of data every second of every day.”
‘Big’ is here to stay, deal with it
Technology research firm IDC recently conducted a study of digital data, covering the world’s measurement files, video, music and many other file types. The study estimates that the amount of data available is doubling every two years. This doubling mimics one of electronics’ most famous laws: Moore’s law, which, forty-eight years after it was first stated, still influences many aspects of IT and electronics. If digital data production continues to mimic Moore’s law, success as an organisation will hinge on the speed at which acquired data can be turned into useful knowledge.
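Doubling every two years compounds quickly, just as Moore's law does:

```python
def growth(years, doubling_period=2):
    """Growth factor when volume doubles every doubling_period years."""
    return 2 ** (years / doubling_period)

print(f"After 10 years: {growth(10):.0f}x growth")   # 32x
print(f"After 48 years: {growth(48):.2e}x growth")   # 1.68e+07x
```

Over the forty-eight years mentioned above, that is roughly a seventeen-million-fold increase.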
The Big Data phenomenon adds new challenges to data analysis, search, integration, reporting and system maintenance, all of which must be met to keep pace with the exponential growth of data. The sources of data are many, but data derived from the physical world is amongst the most interesting to engineers and scientists. This is analogue data that is captured and digitised, and so it can be called ‘Big Analogue Data’.