The Heart of Data Virtualization (Part 2)

Posted by Steve Cormier

6/16/15 1:00 PM

Part 2: The re-emergence of process-oriented design

In Part 1, we looked at how OOD/Normalized modeling came into being and how important it was to data efficiency and integrity. 

In this installment, we see how the process-oriented design approach has re-emerged in the Big Data culture, and how data virtualization helps restore OOD/Normalization order. 

Strangely enough, the process-oriented mentality of the old days that we talked about in Part 1 has re-emerged, with a vengeance. Today it's particularly prevalent in the newest technologies. Hadoop (think Google, Facebook…) and NoSQL data stores, for instance, are very often used by newer developers in a thoroughly process-oriented way. The object-oriented lessons have been forgotten by many, and duplication of data such as ‘customer’ is rampant.
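
To make that duplication concrete, here's a minimal, hypothetical sketch of what two process-oriented systems might store. The system names, fields, and values are invented for illustration; the point is that each system keeps its own embedded copy of the same customer, and the copies drift apart.

```python
# Two hypothetical process-oriented systems, each carrying its own copy of the
# same customer inside its own records -- nobody owns a single "customer" object.

# Mortgage-servicing system: customer details embedded in the loan record.
mortgage_record = {
    "loan_id": "M-1001",
    "customer": {"name": "Jane Jones", "dob": "1980-02-14", "addr": "12 Elm St"},
    "balance": 250000,
}

# Marketing system: the same person, re-keyed and re-formatted independently.
campaign_record = {
    "campaign_id": "C-77",
    "customer": {"full_name": "JONES, JANE", "birth_date": "02/14/1980", "address": "12 Elm Street"},
    "segment": "homeowner",
}

# The same real-world Jane Jones now exists twice, with different keys, formats,
# and spellings -- exactly the duplication OOD/Normalization is meant to prevent.
```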

So, we have the good data structuring concept of OOD/Normalization, and the not-so-good practice of process-oriented duplicative implementation.  There are many tasks, such as developing analytics and services, that require properly integrated data with single representations.

What are we to do?

Well, we could go back and write all the systems properly, but that’s not happening because business has to get done.

So, we have to suck the data out of the process-oriented systems and eliminate the duplication so we have a nice object-oriented single version of the truth.  Jane Jones may be three different people in three different process-oriented systems, but we will save her from getting John Smith’s mortgage statements by integrating her into one shining computer version of herself, mainly for proper reporting/analytics and for building web services.

Well, that’s all fine and good, but before we can integrate stuff we have to have a model that represents everything (people, cars, houses) in an OOD, normalized fashion. 

That model is the business model: it contains all the business objects and their relationships to each other.
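
As a rough illustration (the entities and fields here are made up, not taken from any particular model), a normalized business model defines each object exactly once and expresses relationships by reference rather than by copying:

```python
from dataclasses import dataclass

# A toy slice of a business model: each object is defined once, and
# relationships point to other objects by key instead of duplicating them.

@dataclass
class Customer:            # one Customer definition for the whole business
    customer_id: int
    name: str
    date_of_birth: str

@dataclass
class House:
    house_id: int
    address: str
    owner_id: int          # relationship: refers to Customer.customer_id

@dataclass
class Mortgage:
    mortgage_id: int
    house_id: int          # relationship: refers to House.house_id
    borrower_id: int       # relationship: refers to Customer.customer_id
    balance: float

# Jane exists exactly once; everything else just points at her.
jane = Customer(1, "Jane Jones", "1980-02-14")
home = House(10, "12 Elm St", owner_id=jane.customer_id)
loan = Mortgage(100, house_id=home.house_id, borrower_id=jane.customer_id, balance=250000.0)
```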

This brings us to the technology that has been doing this for a long time, the data warehouse.

Before data virtualization (and still very popular) there was data warehousing. The data warehouse was an integrated data store based on OOD/Normalization. It required a good OOD model based on the real business world to be successful. Once we had a good model, and thus a central data store, we could pull all the disjointed data from the operational systems, align it, transform it so it all matched up, then integrate it into a single representation: just one Jane Jones.

In a data warehouse, the data was actually moved from the original systems into the new data store (almost always a relational database). This may not sound like that big a deal, but think about it. All that data from the original systems was a lot of data. That meant you had to have a whole new big data (no, not Big Data, big…data) store, which meant buying hardware, and software, and managing it, and doing backups, and version upgrades, and having more DBAs and, well, you get the picture.
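
In very rough outline, that warehouse load step looks something like the sketch below: extract from each source, standardize the formats so records line up, then merge the ones that describe the same person. The source data and the name-plus-birthdate matching rule are simplifications invented for this example, not how any real ETL tool works out of the box.

```python
# Hypothetical warehouse-style integration: extract, standardize, then merge
# duplicate customers into a single record keyed on name + date of birth.

sources = [
    {"name": "Jane Jones",  "dob": "1980-02-14", "addr": "12 Elm St"},      # mortgage system
    {"name": "JONES, JANE", "dob": "02/14/1980", "addr": "12 Elm Street"},  # marketing system
    {"name": "John Smith",  "dob": "1975-07-01", "addr": "9 Oak Ave"},      # another system
]

def standardize(rec):
    """Align formats so records from different systems can be compared."""
    name = rec["name"]
    if "," in name:                           # "JONES, JANE" -> "Jane Jones"
        last, first = [p.strip() for p in name.split(",")]
        name = f"{first} {last}"
    name = name.title()
    dob = rec["dob"]
    if "/" in dob:                            # "02/14/1980" -> "1980-02-14"
        m, d, y = dob.split("/")
        dob = f"{y}-{m}-{d}"
    return {"name": name, "dob": dob, "addr": rec["addr"]}

warehouse = {}
for rec in sources:
    clean = standardize(rec)
    key = (clean["name"], clean["dob"])       # naive match rule for the sketch
    warehouse.setdefault(key, clean)          # keep one record per real person

print(warehouse)  # one Jane Jones, one John Smith -- a single version of the truth
```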

Many companies just couldn’t afford the cost and time required to build a whole separate system, and many that thought they could ended up swamped by the cost and complexity of system management.

Enter the concept of data virtualization.

In the next part of this article, we’ll look at how data virtualization delivers needed data more quickly and without the substantial cost of data warehousing.

Topics: Analytics, Business Intelligence