
Aspects of Data Quality


Data quality is a measure of the agreement between the data views presented by an information system and the real world. A data quality of 100%, for example, would indicate that the information system’s data view is in perfect agreement with the real world; a data quality rating of 0% would indicate no agreement at all. No information system can achieve a data quality of 100%. The real concern with data quality is not that it is perfect, but that it is high enough to base reasonable decisions on. For this reason we can define data quality as “fitness for use”. Hence, data quality is relative: data of a quality appropriate for one use may not possess sufficient quality for another use.

The quality of data depends on the processes that create, transform and modify them. The four most important dimensions of data quality are: accuracy, completeness, currency and consistency. Data quality problems cannot be addressed effectively without an understanding of these fundamental aspects of data quality.

I) Accuracy

Accuracy is a measure of the agreement of the recorded data with an identified source. This source, which may be the real world, a computing algorithm or an aerial photograph, sets the context for further analysis. Accuracy must not be confused with precision. Accuracy is a measure of correctness and pertains to the number of errors contained in the data.

Precision is a measure of the degree of reproducibility and refers to the level of measurement and exactness of description. Precise attribute information may specify the characteristics of features in great detail, but this does not mean that the information is accurate. Similarly, computers allow very precise results to be calculated, but this does not mean that those results are necessarily accurate.
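To illustrate the distinction with purely hypothetical figures, the following Python sketch compares two sets of repeated measurements of a quantity whose true value is 100: one set is precise but inaccurate, the other accurate but imprecise.

# Hypothetical measurements of a quantity whose true value is 100.0.
true_value = 100.0

# Precise but inaccurate: readings cluster tightly, yet all are offset from the truth.
precise_inaccurate = [103.1, 103.2, 103.1, 103.2, 103.1]

# Accurate but imprecise: readings scatter widely, yet their mean is close to the truth.
accurate_imprecise = [97.0, 104.0, 99.0, 101.0, 99.0]

def mean(values):
    return sum(values) / len(values)

for label, readings in [("precise but inaccurate", precise_inaccurate),
                        ("accurate but imprecise", accurate_imprecise)]:
    print(f"{label}: mean error = {abs(mean(readings) - true_value):.2f}, "
          f"spread = {max(readings) - min(readings):.2f}")

The first set reproduces almost the same reading every time (small spread) yet is consistently wrong by about 3 units; the second set varies widely but averages out to the true value.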

Some surveys are qualitative, either as a result of the data collection process itself or as a result of the phenomena being recorded. In some cases, little can be done about the intrinsically imprecise nature of the survey. In other cases, we can design surveys to provide more accurate results by changing the survey methodology. In any case, data inaccuracies arising during the recording, transfer and processing of data are preventable. These inaccuracies are usually introduced as a result of typing errors, assignment of wrong codes and incorrect recording of co-ordinates. One solution is to double-check all data, but this is a time-consuming process. A more popular approach is the imposition of data entry constraints and the use of integrity checks, as sketched below.
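A minimal sketch of such data entry constraints and integrity checks is shown below. The field names, code list and coordinate bounds are hypothetical, standing in for whatever a particular survey would actually define.

# Minimal sketch of data entry constraints and integrity checks.
# Field names, the code list and the coordinate bounds are hypothetical.

VALID_LAND_USE_CODES = {"RES", "COM", "IND", "AGR"}   # assumed controlled vocabulary

def check_record(record):
    """Return a list of integrity problems found in a single record."""
    problems = []

    # Constraint: attribute codes must come from the controlled vocabulary.
    if record.get("land_use") not in VALID_LAND_USE_CODES:
        problems.append(f"unknown land-use code: {record.get('land_use')!r}")

    # Constraint: coordinates must fall inside the (assumed) study area bounds.
    x, y = record.get("x"), record.get("y")
    if x is None or y is None or not (140.0 <= x <= 154.0 and -39.0 <= y <= -28.0):
        problems.append(f"coordinates out of range: ({x}, {y})")

    return problems

# Usage: reject records that fail any check before they enter the database.
record = {"land_use": "RSE", "x": 151.2, "y": -33.9}   # typing error in the code
print(check_record(record))   # -> ["unknown land-use code: 'RSE'"]

Checks of this kind catch typing errors and wrong codes at the point of entry, rather than relying on someone to double-check every record later.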

II) Completeness

Completeness describes the relationship between the objects in the database and the abstract world of all objects. It pertains to the question of whether a data set contains all the necessary elements with respect to a given purpose. Completeness may refer to the absence or presence of individual data records or spatial features, a lack of relevant data fields, or the absence of appropriate links between spatial and attribute features (this is of particular relevance in the case of georelational or GIS data). Completeness is application specific: a data set that is complete with respect to one application may not necessarily be complete with respect to another. Judging completeness requires some knowledge of the real-world phenomena the data abstracts, as well as some knowledge of the data’s intended purpose.
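A simple completeness check can therefore only be written against an application-specific expectation. The sketch below assumes a required set of fields and an expected set of feature identifiers; both are illustrative placeholders for whatever a given application actually requires.

# Minimal completeness sketch. The required fields and expected feature IDs
# are assumptions standing in for a particular application's requirements.

REQUIRED_FIELDS = {"feature_id", "land_use", "x", "y"}      # assumed
EXPECTED_FEATURE_IDS = {"F001", "F002", "F003", "F004"}     # assumed

def completeness_report(records):
    present_ids = set()
    incomplete = []
    for record in records:
        missing = REQUIRED_FIELDS - record.keys()
        if missing:
            incomplete.append((record.get("feature_id"), sorted(missing)))
        present_ids.add(record.get("feature_id"))
    absent = EXPECTED_FEATURE_IDS - present_ids
    return {"records_missing_fields": incomplete,
            "features_absent_from_dataset": sorted(absent)}

records = [
    {"feature_id": "F001", "land_use": "RES", "x": 151.2, "y": -33.9},
    {"feature_id": "F002", "x": 150.9, "y": -33.7},          # attribute field missing
]
print(completeness_report(records))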

III) Currency

Currency is a measure of completeness with respect to time. It refers to the temporal aspect of the relationship between objects in the database and the abstract world of all objects. It relates to the question of ‘how up-to-date is the data?’. A data set that is complete today may not be complete tomorrow. The issue of currency is critical for data based on phenomena that change relatively quickly, such as socio-economic patterns. It is less of an issue for data based on (relatively) stable phenomena, such as soils or geology.

A lack of data currency may be due to either the unavailability of up-to-date source data (e.g. aerial photos) at the time of data capture or due to a failure of the data custodian to update the database as new information becomes available. The former is often the result of technical, economic or political constraints and hence, generally not of interest from a data engineering point of view. Assessment of currency requires knowledge of the timeliness of the data source as well as the temporal ‘stability’ of the real world phenomenon. It should also be remembered that a lack of currency may actually be of value (e.g. historical data).
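Because acceptable data age depends on how quickly the underlying phenomenon changes, a currency check is usually a comparison of record age against a theme-specific threshold. The thresholds in the sketch below are purely illustrative.

# Minimal currency sketch. The acceptable age of a record depends on how quickly
# the underlying phenomenon changes; these thresholds are purely illustrative.
from datetime import date

MAX_AGE_DAYS = {
    "socio_economic": 365,     # fast-changing phenomena need frequent updates
    "geology": 365 * 50,       # (relatively) stable phenomena tolerate older data
}

def is_current(last_updated, theme, today=None):
    today = today or date.today()
    age_days = (today - last_updated).days
    return age_days <= MAX_AGE_DAYS[theme]

print(is_current(date(2010, 6, 1), "socio_economic", today=date(2013, 1, 1)))  # False
print(is_current(date(2010, 6, 1), "geology", today=date(2013, 1, 1)))         # True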

IV) Consistency

Consistency refers to the data’s logical structure and its conformity to known or previously defined rules or criteria. Most of the rules and criteria upon which consistency will be judged should be supplied as part of the metadata. However, many consistency checks, especially those involving the logical aspects of consistency, do not require any specific knowledge of the defining rules or criteria. As a result, consistency is the aspect of data quality that is most easily verified from a database manager’s point of view. In fact, (logical) consistency is at times the only aspect of data quality that can be judged by an external agent who is not familiar with the defining criteria, data source, or specific project requirements.
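Logical consistency rules of this kind can be checked without reference to the real world. The sketch below encodes two hypothetical rules as functions and reports which ones a record violates; the rules and field names are illustrative only.

# Minimal logical-consistency sketch. Each rule encodes a criterion that can be
# verified without reference to the real world; the rules themselves are hypothetical.

def rule_dates_ordered(record):
    """A survey cannot finish before it starts."""
    return record["survey_start"] <= record["survey_end"]

def rule_parts_sum_to_total(record):
    """Sub-area percentages should sum to 100 (within a small tolerance)."""
    return abs(sum(record["land_use_shares"].values()) - 100.0) < 0.01

RULES = [rule_dates_ordered, rule_parts_sum_to_total]

def consistency_violations(record):
    return [rule.__doc__ for rule in RULES if not rule(record)]

record = {
    "survey_start": "2012-03-01",
    "survey_end":   "2012-02-15",                       # ends before it starts
    "land_use_shares": {"RES": 60.0, "COM": 25.0, "IND": 15.0},
}
print(consistency_violations(record))  # -> ['A survey cannot finish before it starts.']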

