Records and Retention Management using Big Data


In recent years, IT has been shaken up by the arrival of Big Data: new kinds of data coming from the Internet, processed with new technologies such as Hadoop and MapReduce. So far, IT has largely treated Big Data as an exotic, external technology that holds a lot of promise but remains separate from traditional company data and from the systems and people that generate it.

But is that really true? Today, a growing number of work groups and small companies use online services such as Google Drive and Dropbox to share and collaborate on data of all kinds, from written documents to photographs and videos, for both work and personal use. Google, one of the chief drivers of this often bottom-up movement in organizations, has a specific mission: to organize the world's information and make it universally accessible and useful. We all know the power of this vision; it is changing our lives. What we seldom think about is that these services are built on the same Big Data technologies and concepts.

So why can't CIOs do the same thing with the large amounts of data their organizations generate? Enterprises are experiencing dramatic growth in data, but much of that data is stored inefficiently, which wastes both resources and time, and enterprises keep investing in more storage and infrastructure every year. For me, the case for Big Data as a repository for records and retention management is made. Think of the power of an internal system that makes all company documents, videos, photos, and other content, as well as traditional structured data, instantly available to whoever needs it (and is authorized to access it) from a central place, reachable anywhere the Internet or the corporate network reaches, on any device, at any time, while also protecting that data from loss and maintaining a single master version of the truth. Big Data technologies make this possible.

The advantages of creating this kind of repository based on Big Data technology include:

  • No database licensing and maintenance costs: Think of the money spent on licenses for Oracle, SQL Server, DB2, and similar databases. Open source technology eliminates those license fees.
  • No Tier 1 storage costs: You can use commodity SATA storage, even white-box storage as hyperscale installations do, instead of high-cost storage from EMC, NetApp, IBM, HP, and others.
  • Choice of public or private cloud: If you choose, you can eliminate capital expenditure (CAPEX) entirely by hosting your repository on Amazon or one of several other public cloud services. If you prefer, you can run it in an in-house private cloud instead.
  • Reduced backup/DR costs and issues: Big Data platforms such as Hadoop store multiple copies of every block of data on different nodes (three by default in HDFS), which reduces the need for separate backup infrastructure and the administrative costs that come with it. And because the repository is accessible over the Internet, it supports work from alternative locations in an emergency, as well as routine remote work.
  • Extended analytics: Once corporate data, including semi-structured and unstructured documents, is in a Big Data repository, it becomes easy to add third-party data such as weather data, link it to your sales and marketing data, and extend analytics beyond your enterprise data.
  • Hands-on experience: One final, important advantage is that building and running this repository gives IT valuable experience with Big Data technologies, which will clearly be a big part of IT's future.

Imagine the enterprise data center in 2025. It will be vastly different, and Big Data will have seeped into every enterprise. Records and retention management is a perfect use case for getting a head start on that transition. If you are a CIO who believes in this vision, reach out to us. We can help you.


© Solix Technologies, Inc.