As with my Spreadsheet Nation post, I jumped into this topic of Data Quality without really testing the waters.  In this case, I thought I could just jump in, rhyme off some platitudes about Garbage In, Garbage Out (GIGO), and go on my merry way.  Instead, what opened up to me was a vast sea and I was a fish out of water.  I was standing on shore clueless about what lay beneath the surface.

Malaspina Strait, British Columbia, Canada

Data Quality really is one of those topics that tends to lurk under the surface – elusive to capture.  We are talking about “the state of completeness, validity, consistency, timeliness, and accuracy that makes data appropriate for a specific use” (definition courtesy of the Government of British Columbia).  Or if you prefer, there’s the Dragnet definition: “Just the facts”.  For accountants, we are talking about all that stuff we enter into our systems (or that gets generated by other systems) that we need to access later for producing reports and analysis.  Data Quality refers to how effectively we can gain access to and generate meaning from these volumes.

A great deal of energy tends to go into designing ways to input data.  How much thought has gone into the processes designed for getting the data back out?

According to IDC, a leading technology research firm, very few companies have systems in place to make use of their data, and [they] often struggle to classify data in order to find it again.  There’s a great quote on the V3 blog from Benjamin Woo at IDC:

“The key is to take the data and make money from it”

I think that this frames the issue in language we can understand.  We incur costs for gathering, processing, and storing data.  We may even incur further costs cleansing, reworking, and managing the stores of data.  What does the data do for us?  Are we developing an asset that creates future value?  Or, are we plugging an expense?

As I alluded to with my “fish out of water” comment, the answers to these questions are deeper than can be fathomed in this brief forum.  Today, I would like to simply skip a stone across the surface from the safety of shore.

An introduction to the formal world of Data Quality is the real goal for this post.  I’m not the expert.  These guys are the experts (a couple of them anyways):

  • TDWI: The Data Warehousing Institute is where business and technical professionals come together to gain knowledge and skills through education and research programs relating to the Business Intelligence and Data Warehousing Industry.  These guys are leaders in the field and have a ton of resources you may find useful.
  • IAIDQ: The International Association for Information and Data Quality is a not-for-profit, vendor-neutral professional society of people passionate about improving information and data quality.  They have a fantastic glossary of terms you may find very useful!

These two groups provide a jumping off point.  I don’t think, as accountants, we can be expected to become Data Quality experts.  The constraints of time and inclination stack up against it, as well they should.  But I do think that it’s in our best interest to familiarize ourselves with their world a bit so we can speak intelligently about these matters and gain some measure of insight that can help produce more value from the data we compile.

One quick example

Domain Value Redundancy: A dysfunctional characteristic of an attribute or field in which the same fact of information is represented by more than one value. For example, unit of measure code having domain values of “doz,” “dz,” and “12” may all represent the fact that the unit of measure is “one dozen.” (Larry English)

What input fields in your systems give the user discretion with respect to the input values?

I live in Mount Pleasant.

I live in Mt. Pleasant.

I live in Mt Pleasant.

You can see how, once these various spellings get into the database, it becomes much more difficult to generate aggregate data without going in and mucking around.  Getting it right the first time is a key issue, but that’s a topic for another post.
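For the curious, here is a rough sketch of what that “mucking around” can look like in Python.  It is only an illustration under my own assumptions: the lookup table `CANONICAL_NEIGHBOURHOODS` and the function `normalize_neighbourhood` are names I made up for this example, not part of any particular system or library.

```python
import re
from collections import Counter

# Hypothetical lookup table: each known spelling (lowercased, punctuation
# removed) maps to the one value we want stored in the database.
CANONICAL_NEIGHBOURHOODS = {
    "mount pleasant": "Mount Pleasant",
    "mt pleasant": "Mount Pleasant",
}


def normalize_neighbourhood(raw: str) -> str:
    """Return the canonical spelling, or the trimmed input if it is unknown."""
    key = re.sub(r"[^\w\s]", "", raw)          # drop punctuation, e.g. the "." in "Mt."
    key = re.sub(r"\s+", " ", key).strip().lower()
    return CANONICAL_NEIGHBOURHOODS.get(key, raw.strip())


entries = ["Mount Pleasant", "Mt. Pleasant", "Mt Pleasant"]

# Before cleanup, the three spellings count as three different places.
raw_counts = Counter(entries)

# After cleanup, they aggregate under a single value.
clean_counts = Counter(normalize_neighbourhood(e) for e in entries)

print(raw_counts)
print(clean_counts)
```

Unknown spellings pass through untouched rather than being guessed at, so a new variant shows up in the report as its own line until someone adds it to the table.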

Parting Shot

Here are some fun facts to leave you with today, just to give you an idea about the nature of the Data Quality professional:

  • Wednesday, November 11, 2009 was World Quality Day.  I bet you didn’t know that. World Quality Day was established by the United Nations in 1990 to raise awareness of how quality approaches can have a tangible effect on business success.
  • Right now, the Data Quality community is engaged in a “Blogging Olympics” dubbed “Three Single Versions of a Shared Version of the Truth”.  My favourite post so far is the one arguing that the “single version of the truth” mindset is inherently flawed and should be considered the “one lie strategy”.

These are the guys we need to engage.  Enjoy!
