Rethinking “Big Data” – Not All Content Has Value

For the past several years the business community and the IT industry have been buzzing about “Big Data”. The Holy Grail of business is to become a “data-driven enterprise” by efficiently mining vast amounts of internal and external data. Identifying unforeseen relationships is considered an excellent way to drive sales growth and extend a company’s market share. While there may be value in this approach, “Big Data” analysis can only be as valuable as the stored content it examines.

Since the beginning of the computer industry, organizations have collected and stored far more information than the law requires. Management generally believes that legacy data may contain treasure troves of unidentified residual value. In some cases that belief has been justified, since the ability to recall and examine historical content has revealed valuable relationships. In other situations, however, it is questionable just how significant the newly discovered patterns and associations really are.

Recently a new wave of analytical tools and data structures has emerged to capitalize on the growing pool of stored data.  They provide new capabilities to combine and analyze dissimilar information, produce associations between obscure facts, and allow vast quantities of data to be inspected for unexpected relationships.  The application of these tools provides new methods for analyzing customer needs, trends, and buying patterns.

While some of the retained data may yield valuable insight into an organization’s market, expecting everything in the archive to hold such nuggets is unrealistic and problematic, and keeping it all can be prohibitively expensive.

Changing Customer Priorities – Today’s markets are highly dynamic, with major changes occurring frequently and unpredictably. Much of the captured information has a finite shelf life. Over time customers experience life-changing events, families mature and disperse, personal finances improve or decline, and individual priorities shift. Critical buying patterns of a decade ago may have little relevance in today’s market.

The Impact of External Events – Recent political, economic, and natural events have reshaped our society. Dramatic changes in our travel patterns occurred after 9/11. Hurricane Katrina, along with the Indonesian and Japanese tsunamis, affected our thinking about preparations for natural disasters. Senseless killings at a theater in Aurora, CO, a shopping mall in Tucson, AZ, and a Sikh Temple in Milwaukee, WI make us reconsider our attendance at social events and modify entertainment plans. A protracted global recession negatively affects spending trends, financial investments, retirement plans, and even our expectations for the future. Key indicators of a decade ago may provide only marginal value today.

Attrition of Value – Another issue with analyzing vast quantities of stored legacy data is the long-term retention of content with questionable business value. Not all data is created equal! Details about receivables may hold value for many years, while a file about last week’s cafeteria specials is almost worthless by the following week. A good example of this is a management PowerPoint sent to all employees. The original copy may retain its importance for an extended period of time, but dozens of identical copies kept in user accounts provide little incremental value.

Content Duplication – In any given SAN it is typical to find several outdated copies of the same data, abandoned “clone” files once needed for testing, data with expired business value, unnecessary copies of temp files, orphaned directories from departed users, and residue left from ancient applications and databases. Unless a continuous process is in place to prune and update active storage, licensing costs for “Big Data” analytical tools and systems may be prohibitive. Even if the IT budget can absorb the cost, performance will suffer from having to load, filter, and index huge quantities of irrelevant data.
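As a rough illustration of what one step in such a clean-up process might look like, the Python sketch below walks a directory tree and groups files whose contents are byte-for-byte identical. It is only a sketch under stated assumptions: the function names (`file_digest`, `find_duplicates`) and the example path are hypothetical, it uses nothing beyond the Python standard library, and a real SAN would more likely rely on the storage vendor’s own deduplication or reporting tools.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Return groups of paths under root whose file contents are identical.

    Hypothetical helper for illustration; not part of any product.
    """
    # Step 1: group by size, since files of different sizes cannot match.
    by_size = defaultdict(list)
    for folder, _subdirs, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    # Step 2: hash only the files that share a size with another file.
    by_hash = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        for path in paths:
            by_hash[file_digest(path)].append(path)

    # Keep only the groups that contain more than one copy.
    return [sorted(group) for group in by_hash.values() if len(group) > 1]
```

Calling `find_duplicates("/path/to/share")` (the path is a placeholder) returns a list of groups, each holding the paths of identical copies. Comparing sizes first avoids hashing files that cannot possibly match, which matters when the archive is large. Deciding which copy in a group is the “original” worth keeping remains a business judgment that a script cannot make.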

While valuable insight may be gained from customer content buried deep within an organization’s data repository, due diligence should be performed on existing content to verify its value and uniqueness. The old saying “garbage in, garbage out” is just as valid in today’s “Big Data” world as it was in the heyday of the mainframe.

About Big Data Challenges

Mr. Randy Cochran is a Senior Storage Architect at Data Center Enhancements Inc. He has over 42 years of experience as an IT professional, with specific expertise in large and complex SAN/NAS/DAS storage architectures. He is recognized as a Subject Matter Expert in the enterprise storage field. For the past five years his primary focus has been on addressing the operational requirements and challenges presented by petabyte-level storage.

Posted on September 25, 2012, in Data.
