{"id":120,"date":"2018-08-20T11:44:51","date_gmt":"2018-08-20T11:44:51","guid":{"rendered":"http:\/\/accelforce.com\/solixBlog\/?p=120"},"modified":"2023-03-18T23:52:40","modified_gmt":"2023-03-19T06:52:40","slug":"data-lake-misconceptions","status":"publish","type":"post","link":"https:\/\/www.solix.com\/blog\/data-lake-misconceptions\/","title":{"rendered":"3 Common Data Lake Misconceptions","gt_translate_keys":[{"key":"rendered","format":"text"}]},"content":{"rendered":"<div class=\"inner-page-blog-content\">\n<p><span class=\"first-letter\">T<\/span>he enterprise data lake is now well past its infancy: more than a quarter of all organizations have a data lake in production. However, with maturity come new findings, criticisms, and data lake misconceptions \u2014 with headlines like \u201cData lakes will need to demonstrate business value or die\u201d.<\/p>\n<p>Many of the criticisms of data lakes are flat-out untrue, so I\u2019m here to set the record straight by debunking three common misconceptions about data lakes:<\/p>\n<h3>Data lakes are a replacement for data warehouses<\/h3>\n<p>Some people call <a href=\"https:\/\/www.solix.com\/data-management-solutions\/enterprise-data-lake\/\">data lakes<\/a> the next generation of data warehousing, or simply data warehouse 2.0. However, this couldn\u2019t be further from the truth. While both technologies are, at their core, data storage repositories capable of processing, manipulating, and securing data, they are meant for different purposes and are therefore most effective when they coexist.<\/p>\n<p>A key difference is that data lakes can store any type of data, whether structured, unstructured, or semi-structured, while data warehouses are built to store structured data. 
In layman\u2019s terms, Pentaho CTO James Dixon (credited with coining the term \u201cdata lake\u201d) famously said, \u201ca data mart or data warehouse is akin to a bottle of water \u2014 cleansed, packaged and structured for easy consumption \u2014 while a data lake is more like a body of water in its natural state.\u201d<\/p>\n<p>Because data lakes are meant to store and process all types of data, they are ideal for <a href=\"https:\/\/www.solix.com\/products\/solix-common-data-platform\/solix-big-data-suite\/solix-enterprise-data-lake\/\">data science and big data analytics projects<\/a>, while data warehouses make more sense for primary applications where security and performance are valued most. Together, data lakes and data warehouses help enterprises manage their data and make better data-driven decisions.<\/p>\n<p class=\"image-blk-center\"><img decoding=\"async\" class=\"alignnone size-full wp-image-324\" src=\"\/wp-content\/uploads\/2018\/08\/datalake.jpeg\" alt=\"Better data-driven decisions\" width=\"1024\" height=\"512\" title=\"\" srcset=\"https:\/\/www.solix.com\/blog\/wp-content\/uploads\/2018\/08\/datalake.jpeg 1024w, https:\/\/www.solix.com\/blog\/wp-content\/uploads\/2018\/08\/datalake-300x150.jpeg 300w, https:\/\/www.solix.com\/blog\/wp-content\/uploads\/2018\/08\/datalake-768x384.jpeg 768w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/p>\n<h3>Data lakes are not secure<\/h3>\n<p>Here&#8217;s another misconception to add to the list. Security is a key point of comparison between data lakes and data warehouses: while data warehouses have been around longer and are considered more mature at securing data, data lakes can be just as secure. The key is not the technology but the overall data management strategy.<\/p>\n<p>To secure your data lake, you must understand the data lake pipeline, from ingestion to analysis, and <a 
href=\"https:\/\/www.solix.com\/resources\/lg\/on-demand-webinars\/enterprise-data-lake-archiving-enabling-a-governed-view-with-solix-common-data-platform\/\">implement the appropriate data governance and security strategies<\/a> accordingly.<\/p>\n<h3>Data lakes eventually become \u201cdata swamps\u201d<\/h3>\n<p>Because data lakes ingest any type of data, organizations often worry that their data lakes will turn into \u201cdata swamps\u201d: huge repositories full of disorganized, poorly managed data. The key to avoiding a data swamp is to implement a fully featured <a href=\"https:\/\/www.solix.com\/data-management-solutions\/information-lifecycle-management\/\">Information Lifecycle Management<\/a> strategy for your data lake.<\/p>\n<p>This means using tools that classify data on ingestion or creation and apply the correct retention policies down to the individual record. Doing so ensures that data is not retained past its usefulness and that its purge from the system is fully audited. 
Along with data retention, the data lake should be configured to support data tiering, which lets enterprises store their data in the tier appropriate to its usage and long-term life expectancy.<\/p>\n<p>The Solix CDP\u2019s <a href=\"https:\/\/www.solix.com\/products\/solix-common-data-platform\/object-workbench\/\">object workbench<\/a> and <a href=\"https:\/\/www.solix.com\/products\/solix-common-data-platform\/governance-workbench\/\">data governance workbench<\/a> are built with all of the information lifecycle management tools necessary to prevent your data lake from turning into a data swamp, better preparing your data for advanced tasks like big data analytics, machine learning, and artificial intelligence.<\/p>\n<h3>Conclusion<\/h3>\n<p>Just like the adoption of any other technology in the enterprise, <a href=\"https:\/\/www.solix.com\/resources\/lg\/on-demand-webinars\/big-data-application-strategy-enterprise-archiving-and-data-lake\/\">a successful data lake implementation does not stop at \u201cif you build it, they will come\u201d<\/a>. For a data lake to succeed, enterprises must create a thorough data management strategy, and fortunately, there are many solutions readily available to help them do so.<\/p>\n<\/div>\n","protected":false,"gt_translate_keys":[{"key":"rendered","format":"html"}]},"excerpt":{"rendered":"<p>The enterprise data lake is now well past its infancy: more than a quarter of all organizations have a data lake in production. However, with maturity come new findings, criticisms, and data lake misconceptions \u2014 with headlines like \u201cData lakes will need to demonstrate business value or die\u201d. 
<a class=\"showCmpBlk\" href=\"#\">(more)<\/a><\/p>\n","protected":false,"gt_translate_keys":[{"key":"rendered","format":"html"}]},"author":7,"featured_media":122,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[63],"tags":[88,89,19,71],"coauthors":[],"class_list":["post-120","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data-lake","tag-data-swamp","tag-data-warehouse","tag-enterprise-data-lake","tag-solix-cdp"],"gt_translate_keys":[{"key":"link","format":"url"}],"_links":{"self":[{"href":"https:\/\/www.solix.com\/blog\/wp-json\/wp\/v2\/posts\/120","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.solix.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.solix.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.solix.com\/blog\/wp-json\/wp\/v2\/users\/7"}],"replies":[{"embeddable":true,"href":"https:\/\/www.solix.com\/blog\/wp-json\/wp\/v2\/comments?post=120"}],"version-history":[{"count":0,"href":"https:\/\/www.solix.com\/blog\/wp-json\/wp\/v2\/posts\/120\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.solix.com\/blog\/wp-json\/wp\/v2\/media\/122"}],"wp:attachment":[{"href":"https:\/\/www.solix.com\/blog\/wp-json\/wp\/v2\/media?parent=120"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.solix.com\/blog\/wp-json\/wp\/v2\/categories?post=120"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.solix.com\/blog\/wp-json\/wp\/v2\/tags?post=120"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.solix.com\/blog\/wp-json\/wp\/v2\/coauthors?post=120"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}