Or why data mapping spreadsheets reproduce like rabbits.

A common issue the world of data projects (data warehouse, data mart, reporting, analytics, moving data from one place to another) is that documenting metadata and data mappings is an afterthought to the project planning.  The planning focuses on deliverables, timelines, and resources.  The project budget is developed and focuses on the big ticket licenses for the DBMS, ETL tools, data modeling tools, and seat licenses for reporting and analytics.

Sometimes there’s even a budget line-item for data mapping or metadata management tools.

Then the budget gets a haircut.

The data modelers can maintain the metadata they need in their tools.  The developers maintain their code repositories.  The DBA’s put field definitions in the database comments.  The solution for metadata and data mapping ultimately becomes – “Just keep it in Excel.”

The pain doesn’t start right away.  In fact, the real pain with spreadsheets doesn’t really begin until after the project is over.  Let’s take an example:

Suppose the new data project is to integrate 10 source systems into a new reporting application.  The data mapping team builds an Excel template that looks like this:  1 spreadsheet file for each source, 1 tab in the spreadsheet for each file being mapped.  Columns are defined for business name, business definition, physical name, format, business logic, filter criteria, etc.

At the end of the project, the team has 10 detailed spreadsheets that are now the golden source of mapping information for anybody asking.  They get stored in a shared network drive or Sharepoint location and there you go.  Everything is good until something needs to change.

Someone wants a new data element.  A business requirement changes.  Finance needs a separate calculation for their reporting, a new product gets launched, marketing starts using new tools . . . business happens (as it should, that’s what business does.).  Every one of these changes requires changes to the data – models, fields, mappings, ETL code.  Every one of the changes needs to be documented in the spreadsheets.

The team archives a copy of the old spreadsheet so the historical mappings aren’t lost and then it creates a new spreadsheet with the updated metadata and mappings.  One spreadsheet becomes two.  Ten spreadsheets become twenty.  Another release goes in and then you have thirty.  It’s not exponential growth (like the rabbit problem), but it’s about as bad.  Way too many spreadsheets for what’s now a broad audience of analysts, testers, developers, auditors, and users to keep track of.

We built MapWorks to address this very problem.  One place that everyone can go for mappings and metadata.  One solution that maintains historical mappings and futures development cycles.  Searchable.  Flexible to accommodate how your team operates but structured enough to keep everyone aligned.