The rise of multi-cloud, data-first architecture, and the broad portfolio of advanced data-driven applications that have arrived as a result, require cloud data management systems to collect, manage, govern and build pipelines for enterprise data. Cloud data management architectures span private, multi-cloud and hybrid cloud environments. These architectures connect to data sources not only from transaction systems, but also from file servers, the Internet and multi-cloud repositories.
The scope of cloud data management includes data lakes and archives, enterprise content services, and consumer data privacy solutions. These solutions manage the risk and compliance challenges of storing large amounts of data.
Cloud data platforms
Cloud data platforms are the centerpiece of cloud data management programs and manage uniform data collection and data storage at low cost. Archives, data lakes, and content services enable cloud migration projects to connect, ingest, and manage any type of data from any source, including legacy mainframes, ERP, CRM, file stores, relational and non-relational databases, and SaaS environments such as Salesforce or Workday.
Data migrated to the cloud is often stored “as-is” in buckets to avoid heavy-lift ETL processes. The goal is to establish real-time data pipelines to support data-driven applications. When “as-is” data does not meet application requirements, cloud data platforms cleanse and transform the raw data in preparation for later processing. Data preparation provides critical data quality measures, including data profiling, data cleansing, data transformation, data enrichment and data modeling.
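As a rough illustration of two of the data preparation steps named above, profiling and cleansing, the following minimal Python sketch works on a handful of in-memory records. The field names (`id`, `email`, `country`) and the cleansing rules are assumptions made for the example, not features of any particular platform.

```python
def profile(records, field):
    """Profile one field: count total, missing, and distinct values."""
    values = [r.get(field) for r in records]
    present = [v for v in values if v not in (None, "")]
    return {
        "total": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
    }


def cleanse(records):
    """Drop rows without an id, deduplicate on id, and normalize text fields."""
    seen = set()
    cleaned = []
    for r in records:
        rid = r.get("id")
        if rid is None or rid in seen:
            continue  # skip rows with no key and repeated keys
        seen.add(rid)
        cleaned.append({
            "id": rid,
            # Empty strings become None so missing values are explicit.
            "email": (r.get("email") or "").strip().lower() or None,
            "country": (r.get("country") or "").strip().upper() or None,
        })
    return cleaned


# Hypothetical raw records as they might land "as-is" in a bucket.
raw = [
    {"id": 1, "email": " Ana@Example.com ", "country": "us"},
    {"id": 1, "email": "ana@example.com", "country": "US"},
    {"id": 2, "email": "", "country": " de "},
    {"id": None, "email": "x@example.com", "country": "FR"},
]

email_profile = profile(raw, "email")
clean = cleanse(raw)
```

Profiling runs against the raw data first so that quality problems are measured before cleansing hides them.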
Data pipelines are a series of data flows in which the output of one element is the input of the next. Data lakes serve as the collection and access points in a data pipeline and are responsible for access control. As data pipelines emerge across the enterprise, enterprise data lakes become data distribution hubs with centralized controls to federate data across networks of data lakes. Data federation centralizes metadata management, data governance and compliance control while still enabling decentralized data lake operations.
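The idea that each pipeline element consumes the output of the previous one can be sketched with chained Python generators. The stage names and the `amount` field below are hypothetical and chosen only for illustration.

```python
from functools import reduce


def ingest(rows):
    """First stage: copy each source row so later stages cannot mutate the source."""
    for row in rows:
        yield dict(row)


def drop_incomplete(rows):
    """Second stage: filter out rows with no amount."""
    for row in rows:
        if row.get("amount") is not None:
            yield row


def to_cents(rows):
    """Third stage: add a derived integer field."""
    for row in rows:
        yield {**row, "amount_cents": round(row["amount"] * 100)}


def run_pipeline(source, stages):
    """Feed the output of each stage into the next one, in order."""
    return reduce(lambda data, stage: stage(data), stages, source)


orders = [
    {"order": "A1", "amount": 12.5},
    {"order": "A2", "amount": None},
    {"order": "A3", "amount": 3.0},
]

result = list(run_pipeline(orders, [ingest, drop_incomplete, to_cents]))
```

Because the stages are generators, rows flow through one at a time rather than being materialized between stages, which loosely mirrors how streaming pipelines behave.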
Cloud metadata management
Cloud metadata management provides a view of the entire data landscape (including structured, semi-structured, and unstructured data) and helps users understand their data better. Analysts classify, profile and apply consistent descriptions and business context for the data. Centralized metadata management enables users to explore their data landscape in three ways:
- Data lineage helps users understand the data lifecycle, including a history of data movement and transformation. Tracing data errors back to their source simplifies root cause analysis and improves confidence in the data processed by downstream systems.
- A data catalog is a portfolio view of data inventory and data assets. Users browse for the data they need and evaluate whether it fits its intended use.
- A business glossary is a list of business terms and their definitions. Data governance programs require that an organization's business concepts be defined and used consistently.
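To make the lineage idea concrete, the sketch below models lineage as a small dependency graph and walks it upstream, which is roughly what a root cause analysis does when a downstream report looks wrong. The dataset names and the dictionary layout are assumptions for the example; real metadata tools store lineage in their own formats.

```python
# Hypothetical lineage graph: each dataset maps to the datasets it is derived from.
lineage = {
    "sales_report": ["sales_clean"],
    "sales_clean": ["crm_raw", "erp_raw"],
    "crm_raw": [],
    "erp_raw": [],
}


def upstream(dataset, graph):
    """Return every dataset that feeds `dataset`, directly or indirectly."""
    found = []
    stack = list(graph.get(dataset, []))
    while stack:
        current = stack.pop()
        if current not in found:
            found.append(current)
            stack.extend(graph.get(current, []))
    return found


sources = upstream("sales_report", lineage)
```

Here `sources` contains `sales_clean`, `crm_raw` and `erp_raw`, the candidates to inspect when `sales_report` contains an error.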
Cloud data management also provides consumer data privacy and data governance controls that are essential to reduce the risks involved in handling bulk data. Information Lifecycle Management (ILM) manages data throughout its lifecycle by establishing a system of controls and business rules, including data retention policies and legal holds. Security and privacy tools such as data classification, data masking and sensitive data discovery help achieve compliance with standards and regulations such as NIST 800-53, PCI DSS, HIPAA, and GDPR. Consumer data privacy and data governance are not only essential for legal compliance; they also improve data quality.
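A minimal sketch of how a retention policy, a legal hold and data masking might interact is shown below. The record types, retention periods and masking rule are illustrative assumptions only and do not reflect the requirements of any specific regulation.

```python
from datetime import date, timedelta

# Hypothetical retention periods per record type, in days.
RETENTION_DAYS = {"invoice": 7 * 365, "web_log": 90}


def can_purge(record, today):
    """A record may be purged only if it is past retention and not on legal hold."""
    if record["legal_hold"]:
        return False  # a legal hold always overrides the retention schedule
    limit = RETENTION_DAYS.get(record["type"])
    if limit is None:
        return False  # unknown record type: keep it rather than guess
    return today - record["created"] > timedelta(days=limit)


def mask_email(email):
    """Mask the local part of an email address, keeping its first character."""
    local, _, domain = email.partition("@")
    return local[:1] + "***@" + domain


today = date(2024, 6, 1)
old_log = {"type": "web_log", "created": date(2024, 1, 1), "legal_hold": False}
held_log = {"type": "web_log", "created": date(2024, 1, 1), "legal_hold": True}
invoice = {"type": "invoice", "created": date(2024, 1, 1), "legal_hold": False}
```

With these sample records, `can_purge(old_log, today)` is true, while the held log and the invoice are both retained.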
Digital transformation requires interoperability with the cloud and its vast network of data and web services. Cloud data management connects, governs and manages data across multi-cloud landscapes. In doing so, it delivers the essential custody services for a data-first architecture.
Through end-to-end services such as data pipelines between OLTP systems and SQL data warehouses, archiving of databases and mail servers, data lake hosting, and support for NoSQL applications, cloud data management provides the foundation for data-driven applications.
Data-first architectures require low-cost, efficient object storage, real-time access, data governance, metadata management, data preparation and connectivity with end-to-end data pipelines. Cloud data management enables organizations to implement these critical capabilities quickly in support of digital transformation.