Integration: What’s in a Name?

Posted by Steve Cormier

4/28/12 6:14 PM

You're building a data warehouse, and for analytics purposes you want to record which U.S. Census region each of your customers is in.

You get the data from the official U.S. government source, but then you decide it would be useful to add some regions in Canada and Mexico, where you also have customers. You add this data to your US_Census_Table and link the US_Census_Region to the customer territory in your customer and order detail data warehouse tables.

Anything wrong with this? Yes. Yes, indeed. You know that you added data from other countries, and so do the people around you in the organization. At the point of inception, tribal knowledge is strong. However, new people come along, and suddenly there's an idea that official U.S. Census data contains information on Canada and Mexico, which is not true. People may make decisions based on that misunderstanding, with unpleasant consequences.

If you want to add the extra data, you need to change the name to reflect the overall concept of the table, such as North_American_Census_Region. Even if the original U.S. content overwhelmingly dominates (say you have added only one region from Mexico and two from Canada) and you want the name to acknowledge that, you would use something like Augmented_US_Census_Region to indicate that it's not strictly the official, externally generated information. Ideally, a name like this would also prompt users to consult further metadata to find out what "Augmented" means.

It's important to consider naming carefully. Strong names are as self-contained as possible: they convey information that stays understandable over time, and they apply to a concept as a whole. They imply neither more nor less than the information set they refer to. They should never mislead users or depend on current understandings that may change quickly.

Topics: Analytics, Data Warehouse, Information Management, Information Integration