The Logical Big Data Warehouse

Not so long ago, Gartner defined the Logical Data Warehouse (Mark Beyer claims paternity, but I won’t try to rebuild the Data Warehouse family tree; there are too many fathers and version numbers out there…).

To put it simply, the premise of the Logical Data Warehouse is that not all data needs to be physically moved into the Data Warehouse. When appropriate, it can stay in place in its “owner” application or database, as long as a logical layer provides transparent access to it (through a Data Services approach, or virtual data federation/EII like our friends at Composite Cisco do). I know I am oversimplifying here, and I am probably in trouble with Mark, but if you were looking for real expert advice you would be reading his research, not my ramblings!
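For the more code-minded readers, here is a toy sketch of what such a logical layer boils down to. It is only an illustration under my own assumptions: the `LogicalWarehouse` class, the dataset names and the sample rows are all hypothetical, and no real federation product works exactly like this.

```python
class LogicalWarehouse:
    """Toy logical layer: one access point, data stays with its owner."""

    def __init__(self):
        # Maps a dataset name to a function that reads it from its owner system.
        self._sources = {}

    def register(self, dataset, reader):
        """Declare where a dataset lives, without copying it anywhere."""
        self._sources[dataset] = reader

    def read(self, dataset, **filters):
        """Fetch rows from the owning source and keep those matching the filters."""
        if dataset not in self._sources:
            raise KeyError(f"unknown dataset: {dataset}")
        rows = self._sources[dataset]()
        return [
            row for row in rows
            if all(row.get(key) == value for key, value in filters.items())
        ]


# Hypothetical owner systems, represented here by plain functions.
def read_crm_accounts():
    return [
        {"account": "Acme", "country": "FR"},
        {"account": "Globex", "country": "US"},
    ]


def read_warehouse_orders():
    return [
        {"account": "Acme", "amount": 120},
        {"account": "Acme", "amount": 80},
        {"account": "Globex", "amount": 300},
    ]


ldw = LogicalWarehouse()
ldw.register("accounts", read_crm_accounts)
ldw.register("orders", read_warehouse_orders)

french_accounts = ldw.read("accounts", country="FR")
acme_orders = ldw.read("orders", account="Acme")
```

The consumer only knows dataset names, not where the data sits. A real federation layer would also push the filters down to the source instead of filtering after the fact, which this sketch deliberately skips.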

The same rule of not moving everything for the sake of moving everything applies to your big data projects. Too many organizations, looking for ways to break down data silos, simply bring all the data together in one central place. Sure enough, Hadoop is an excellent storage resource for large amounts of data, and Hadoop distro vendors will absolutely love it when your Hadoop cluster grows and they can sell you more maintenance and support (and for the record, Teradata and Oracle are the same, just with a price tag 100 times higher and fully proprietary stuff).

You need to think “data distribution” beyond Hadoop. It’s not always necessary to duplicate and replicate everything. Some data is already readily available in the enterprise data warehouse, with fast, random access through highly optimized schemas and indexing.  You may need to bring in a subset of this data when needed to perform lookups or joins, but it does not need to reside permanently in Hadoop.  Some other data sets might be better off just residing where they are produced.
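To make the "bring in a subset when needed" idea concrete, here is a minimal sketch. Everything in it is an assumption made for illustration: an in-memory SQLite database stands in for the enterprise data warehouse, a plain list stands in for event data sitting in Hadoop, and the `customers` table and function names are hypothetical.

```python
import sqlite3

# Stand-in for the enterprise data warehouse (hypothetical schema).
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT)")
warehouse.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "EMEA"), (2, "APAC"), (3, "AMER"), (4, "EMEA")],
)

# Stand-in for event data that lives in the big data store.
events = [
    {"customer_id": 1, "clicks": 12},
    {"customer_id": 3, "clicks": 7},
    {"customer_id": 1, "clicks": 5},
]


def fetch_lookup(conn, ids):
    """Pull only the warehouse rows needed for this join, via the indexed key."""
    ids = sorted(set(ids))
    if not ids:
        return {}
    placeholders = ",".join("?" for _ in ids)
    cursor = conn.execute(
        f"SELECT id, region FROM customers WHERE id IN ({placeholders})", ids
    )
    return dict(cursor.fetchall())


def clicks_by_region(conn, events):
    """Join events to the on-demand lookup; nothing is copied permanently."""
    lookup = fetch_lookup(conn, (e["customer_id"] for e in events))
    totals = {}
    for e in events:
        region = lookup.get(e["customer_id"], "UNKNOWN")
        totals[region] = totals.get(region, 0) + e["clicks"]
    return totals


result = clicks_by_region(warehouse, events)
```

Only the two customers that actually appear in the events are read from the warehouse; the rest of the table never leaves it.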

As with every data management project, proper architecture design is essential. And so is having a proper integration infrastructure. In the Logical Data Warehouse model, seamless access to distributed data is key. The same is true for big data.  Only with this infrastructure will you be able to build this Logical Big Data Warehouse.

Thank you, TiA



This entry was posted on July 23, 2013 in CLOUD, Op Ed.
