Tuesday, April 1, 2008

"Create One Version of Truth": Can This Ideal Be Realized in Databases Managed by Humans?

In my blog post "Could 'End-User Buy-In And Support For Accurate Data' Resolve The 'Garbage-In, Garbage-Out' Issues Of Databases?", I discussed how the institution in the article, Rensselaer Polytechnic Institute, had developed what could be a best-practices methodology for ensuring accurate data and consistent databases within its organization. I summarized the steps that Rensselaer implemented to facilitate maintaining accurate data and gaining a consistent database in return. The list is included again here:

1. Create cross-functional support.
2. Think big, start small, deliver quickly.
3. Create one version of data truth.
4. Provide support for new behaviors.

Notice that I put the third item in the list in boldface type to coincide with this blog topic. I was reviewing the list of blog entries on this site to ensure that any entry that began a series had indeed been followed up by its sequels. When I scanned the list for entries to tie together, this one suddenly redirected my focus. I kept thinking about a recent situation in which a critical judgement had been made on the basis of information provided by what was obviously deemed an accurate and reliable repository. Although the data gathered from this repository was not subjected to check constraints, it was presumed accurate.

Thus a critical decision was made when in fact the data retrieved from this repository was inconsistent. The repository had not been updated to correct known errors, yet its contents were presented to the end user as valid. The first major issue was that the keeper of the data made a mistake and acknowledged it but, whether for lack of accountability or simple forgetfulness, did not update the repository. Even when confronted with the error, the data keeper (DBA) still did not make changes to the repository. As a result, the inconsistent, inaccurate data that was retrieved led to actions and ideas that had negative impacts on the objects in the repository.

A search for what caused the bad data to remain in the repository despite knowledge of the errors uncovered a deficiency. Since the business had hired a new data keeper (DBA), there had been no clearly defined data definitions and no model for how information or different events would be handled. Problems surfaced because the business requirements had changed and no one had updated the repository to reflect those changes. Users kept making updates that were either discarded or lost because the new data keeper did not communicate the new business rules, so inconsistent data kept being rewritten. The root problem was the data keeper's lack of communication about data issues. Information was modified so often that it began to cause conflict. To resolve it, the data keeper created a version of truth for the repository and communicated the change to upper management instead of to the users. Upper management then made a critical decision influenced by bad data and by a desire to resolve the chaos within the system. Data definitions were established, and business rules were communicated to users and stored in a new repository as a version of truth. It seemed that many of the problems were resolved for the data keeper and upper management, but not for the users. Inconsistent data and poor communication had tarnished the users' credibility, and no one had created one version of truth.

There was a version of truth extracted from the repository, but it was inaccurate. Then there was a version of truth communicated to upper management by the users, which was deemed unworthy of recording in the new repository despite its validity. Finally, upper management stepped in at the request of the new data keeper to establish a version of truth that everyone would have to accept. This eliminated the chaos, but it did not truly create one version of truth, because there were still multiple ideas of what the one version of truth entailed. Since none of the sets of data had been checked against the others, and the data from the repository (DBA), the users, and upper management had never been consolidated, no one was able to create one version of truth.
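To make the idea of checking each set of data against the others concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the `reconcile` function, the source names, and the sample order records are my own illustration and are not taken from the situation described above. It simply assumes that each source can be exported as a mapping from a record key to a single value.

```python
from collections import defaultdict


def reconcile(sources):
    """Compare the same records across several sources.

    `sources` maps a source name to a {record_key: value} dict.
    Returns (agreed, conflicts): records on which every source agrees,
    and records that are missing somewhere or hold differing values.
    Values are assumed to be hashable (strings, numbers, tuples).
    """
    seen = defaultdict(dict)  # record_key -> {source_name: value}
    for source_name, records in sources.items():
        for key, value in records.items():
            seen[key][source_name] = value

    agreed, conflicts = {}, {}
    for key, by_source in seen.items():
        values = set(by_source.values())
        # A record is agreed only if every source has it with one value.
        if len(by_source) == len(sources) and len(values) == 1:
            agreed[key] = values.pop()
        else:
            conflicts[key] = by_source
    return agreed, conflicts


# Hypothetical sample data standing in for the three versions of truth.
sources = {
    "repository": {"order-1": "shipped", "order-2": "open"},
    "users": {"order-1": "shipped", "order-2": "cancelled"},
    "management": {"order-1": "shipped"},
}
agreed, conflicts = reconcile(sources)
```

The point of the sketch is that a conflict is reported rather than silently resolved in favor of one source. Deciding which value wins is still a conversation among the data keeper, the users, and upper management.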

This is sort of a wacky analytical blog post in which the revelation is that, just as there are multiple sides to every story, there are multiple sides to the presentation of information stored in databases. What gives a single story so many sides is that each user can interpret data in a variety of ways and, based upon some ideal, can present it as they see fit. This may lead to inconsistencies in the data and cause problems for the database system if there are no checks and balances. Since individuals maintain databases and can, within certain restrictions, modify the data they contain, there is potential for human error. If no one is willing to check the database for those errors to ensure that all data is consistent with everything that has been presented, then it will be impossible to have a database with one version of truth. It will be more a matter of deciding whose version of truth is more acceptable despite bad data, which makes this concept difficult to realize in some settings.
