When leveraged for big data projects, data lakes and high-performance computing together offer an attractive alternative to structured data warehouses.
Businesses today have found that managing big data presents its own set of challenges: when information is accessed and collected from thousands of sources, handling it becomes difficult. The older storage method, the data warehouse, files information into a predefined, hierarchical structure before it is stored, and that approach becomes expensive at large data volumes. Enter the data lake as a solution.
Drawbacks of Structured Data
In contrast to a data warehouse, a data lake stores raw data in a non-hierarchical, or “flat,” architecture. Before data can be loaded into a data warehouse, it must first be given structure, a process referred to as “schema-on-write.” A data lake instead keeps information in its raw form and applies structure only when the data is accessed (“schema-on-read”). Rather than restructuring the data upon storage, the lake applies metadata tags and unique identifiers so that individual data elements can be located by query. Because all of this data sits in one environment, data lakes can help break down the data silos that isolate fixed volumes of information from the rest of the company, making it easier for users to cross-correlate data. And as growing volumes of data arrive from multiple sources, requiring a storage platform that can scale, the flat architecture of data lakes lets them accommodate that growth more easily.
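To make the schema-on-write versus schema-on-read distinction concrete, here is a minimal Python sketch. It does not describe how any particular data lake product works: `warehouse_insert`, `TinyDataLake`, `read_orders`, and the tag names `kind`, `source` and `format` are hypothetical, and an in-memory dictionary stands in for a flat object store. Raw records are kept in their native format under a unique identifier plus metadata tags, and structure is imposed only when a query reads them.

```python
import json
import uuid
from dataclasses import dataclass, field

# --- Schema-on-write (warehouse style): structure is enforced before storage ---
ORDER_COLUMNS = ("customer", "amount")  # hypothetical table schema


def warehouse_insert(table, record):
    """Reject any record that does not match the table schema at load time."""
    if not isinstance(record, dict) or set(record) != set(ORDER_COLUMNS):
        raise ValueError(f"rejected at load time: {record!r}")
    table.append({"customer": str(record["customer"]), "amount": float(record["amount"])})


# --- Schema-on-read (lake style): raw data is stored as-is and tagged ---
@dataclass
class RawObject:
    object_id: str                            # unique identifier assigned at ingestion
    payload: str                              # data kept in its native format (JSON, CSV, free text)
    tags: dict = field(default_factory=dict)  # metadata used to locate the object by query


class TinyDataLake:
    """Hypothetical in-memory stand-in for a flat object store (no hierarchy)."""

    def __init__(self):
        self._objects = {}  # single flat namespace: object_id -> RawObject

    def ingest(self, payload, **tags):
        """Store the payload untouched and return its new unique identifier."""
        object_id = str(uuid.uuid4())
        self._objects[object_id] = RawObject(object_id, payload, dict(tags))
        return object_id

    def find(self, **tags):
        """Return every object whose metadata matches all of the given tags."""
        return [obj for obj in self._objects.values()
                if all(obj.tags.get(k) == v for k, v in tags.items())]


def read_orders(lake):
    """Apply an order schema only now, at query time, based on each object's format tag."""
    rows = []
    for obj in lake.find(kind="order"):
        if obj.tags.get("format") == "json":
            rec = json.loads(obj.payload)
            rows.append({"customer": rec["customer"], "amount": float(rec["amount"])})
        elif obj.tags.get("format") == "csv":
            customer, amount = obj.payload.split(",")
            rows.append({"customer": customer.strip(), "amount": float(amount)})
    return rows


lake = TinyDataLake()
lake.ingest('{"customer": "acme", "amount": "120.5"}', kind="order", source="web", format="json")
lake.ingest("globex,80", kind="order", source="pos", format="csv")
lake.ingest("server rebooted at 02:00", kind="log", source="ops", format="text")
print(read_orders(lake))
# [{'customer': 'acme', 'amount': 120.5}, {'customer': 'globex', 'amount': 80.0}]
```

The design choice worth noticing is where the cost lands. `warehouse_insert` pays it once, at load time, and would reject the free-text log line outright. The lake accepts everything, but every reader such as `read_orders` must know how to interpret each format it cares about, which is one way to picture why poorly tagged or poorly curated lakes become hard to use.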
Data lakes are becoming a preferred platform for businesses that want to store information in its native format. They offer lower cost and the flexibility to support multiple workloads of different sizes and types. Because a data lake stores massive amounts of data from multiple sources in one central repository, users have a single target for their data processing work. And because the data is kept in raw form rather than forced into a fixed schema, analysts can reshape it into diverse data sets.
Dangers of Data Swamps
Yet managing a data lake can be difficult. A large amount of information pouring into one place does not by itself make that data easy to organize or to deliver elsewhere in a readily usable format. In an article from Information Management, Ben Sharma, CEO of data management firm Zaloni in Durham, NC, says that poorly organized data lakes risk becoming “data swamps” of questionable benefit. Still, the need for storage solutions is strong, and the use of data lakes is a significant trend in data-intensive industries.
“Companies are realizing that they need more agile data platforms and deeper analytical capabilities to compete effectively in their market. The major trend we see is organizations moving from sandbox or single-purpose big data applications to enterprise wide governed data lake implementations.”
The larger challenge is “finding, rationalizing and curating the data from across an enterprise for analytics solutions,” Sharma says.
The data management industry wants to use automated algorithms to gain better and faster insight from data. To take full advantage of data lakes, many companies seek to make use of high-performance computing (HPC) solutions. According to Bill Mannel, Vice President and General Manager of HPC at Hewlett Packard Enterprise, data lakes and HPC are enabling users to better search, curate and analyze big data. Organizations that embrace the challenges of storing and accessing data in this new environment may realize competitive advantages, cost savings and growth, provided they address the curation and access issues that make this flat architecture harder to use for big data.
Tame the Ever-Increasing Flow of Information
InfoDesk has created the world’s smartest platform for managing and sharing information. With our comprehensive solutions, you can bring all your information together, filter and select relevant content, and deliver the right intelligence to the right people. InfoDesk has been providing actionable intelligence to multinational corporations, government agencies and other organizations since 1999. InfoDesk is based in New York with offices in London, Washington, DC and India.