Metadata Management – Not a Small Problem Anymore

At its annual Symposium/ITxpo conference last year, Gartner identified the “Top 10 Strategic Technology Trends” for the year.  Their list correlates closely with the topics we’ve seen in industry presentations, at conferences, and even in the news media for most of this year.  Guess what topped the list?  AI and Advanced Machine Learning, with Intelligent Apps and Intelligent Things next on the list.  Blockchains, mesh apps, and adaptive security are on the list too, but the limelight and broadest focus are on technologies that ingest data and spit out something new – with little human intervention.  As Hadoop and other big data technologies have matured, the Data Lake architecture has evolved to allow firms to leverage their computing power against transactional/behavioral data to produce “Intelligent” systems and apps.

Hadoop and Data Lakes power the long-term technology trends:  AI/Machine Learning and “Intelligent” systems.  Metadata management tools and processes are what keep your Data Lake from becoming a Data Swamp.  Harnessing the power of data to drive value requires managing that data just like you would any other data asset.  Just because you have a big data platform doesn’t mean you can get away with dumping data into it with no organization and no way to communicate that organization, and the relationships within it, to everyone who wants to use the data.

It’s no coincidence, then, that Gartner also released their first-ever “Magic Quadrant for Metadata Management Solutions.”  (Several vendors offer links to download the report.)  Collecting, maintaining, and using metadata is no longer something you do just to satisfy a documentation requirement.  Metadata management is a strategic imperative for any firm whose success rests on what it does with data:  banks, insurance firms, trading houses, utilities, manufacturers, retailers, healthcare.

The Magic Quadrant’s authors, Guido de Simoni and Roxane Edjlali, build the case for their findings on two key assumptions:

Through 2018, 80% of data lakes will not include effective metadata management capabilities, making them inefficient.

By 2020, 50% of information governance initiatives will be enacted with policies based on metadata alone.

The first assumption isn’t surprising – data lake architectures are only beginning to be a mainstream solution, and most tools and programs haven’t adapted to the architecture or built in integrations to the underlying Hadoop technology.  The second assumption is somewhat startling in that it implies that in the next 2-3 years, we’ll see a huge shift from human-powered data stewardship and governance to a much more automated regime based on metadata (provided that the tools exist and are adopted to that extent).

Leveraging (and monetizing) new flows of data from disparate sources will be the challenge for data professionals if their firms are to remain competitive (and relevant).   Having a complete and comprehensive view of every data element – what it means, where it comes from, where it goes, how it’s calculated, and who cares – makes this possible.  Metadata management platforms that seamlessly gather and present this type of information to a wide set of users and stakeholders will be at the center of this trend, feeding the ubiquitous AI/ML routines that we will rely on every day.
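
To make the “complete and comprehensive view” a little more concrete, here is a minimal sketch in Python of what a metadata record for a single data element might hold.  Everything in it is an assumption for illustration: the DataElement class, its field names, and the missing_metadata check are hypothetical and do not come from Gartner’s report or from any particular metadata management product.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataElement:
    """Hypothetical metadata record for one data element in a data lake."""
    name: str
    definition: str = ""                                   # what it means
    sources: List[str] = field(default_factory=list)       # where it comes from
    consumers: List[str] = field(default_factory=list)     # where it goes
    calculation: str = ""                                  # how it's calculated
    stakeholders: List[str] = field(default_factory=list)  # who cares

    def missing_metadata(self) -> List[str]:
        """Return the names of the descriptive fields that are still empty."""
        described = {
            "definition": self.definition,
            "sources": self.sources,
            "consumers": self.consumers,
            "calculation": self.calculation,
            "stakeholders": self.stakeholders,
        }
        return [label for label, value in described.items() if not value]


# Illustrative values only; the element and system names are made up.
element = DataElement(
    name="net_premium",
    definition="Premium after reinsurance",
    sources=["policy_admin"],
)
print(element.missing_metadata())
```

A check like this is one simple way to spot the elements that are drifting toward the swamp: anything that comes back with a long list of empty fields is data that nobody has explained to the people who are supposed to use it.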