Data cleansing

Data cleansing is the process of detecting and correcting corrupt or inaccurate records in a data repository. It transforms large amounts of ambiguous and heterogeneous information into consistent data sets.

Any data repository of a decent size is likely to contain some proportion of “junk” records: duplicates caused by typos or variant spellings, manual data entry errors, and near-identical words or phrases. Automatic data cleansing technology uses fuzzy search methods to recognise potential duplicates and inconsistencies, which are then reconciled. Once detected, a group of duplicate records can either be reconciled into a single master record or simply deleted from the system. Reconciliation methods may include correcting typos, validating values against a reference list, discarding inconsistent data, or automatically inserting missing values.
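The detect-then-reconcile flow described above can be illustrated with a minimal sketch. It is not the method used by any particular product: it assumes records are plain strings, uses the string similarity ratio from Python's standard `difflib` module as a stand-in for a fuzzy search method, and the function names, the 0.85 threshold and the sample reference list are all hypothetical choices made for illustration.

```python
from difflib import SequenceMatcher, get_close_matches


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1], ignoring case and surrounding whitespace."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def group_duplicates(records, threshold=0.85):
    """Greedily group records whose similarity to a group's first member
    meets the threshold (an illustrative value, not a recommendation)."""
    groups = []
    for rec in records:
        for group in groups:
            if similarity(rec, group[0]) >= threshold:
                group.append(rec)
                break
        else:
            # No existing group is close enough: start a new one.
            groups.append([rec])
    return groups


def reconcile(group, reference, cutoff=0.85):
    """Pick a master value for a group of duplicates.

    Prefers a close match from the reference list (validation against
    known-good values); otherwise falls back to the first record seen.
    """
    for rec in group:
        match = get_close_matches(rec, reference, n=1, cutoff=cutoff)
        if match:
            return match[0]
    return group[0]
```

For example, calling `group_duplicates` on `["Acme Corporation", "Acme Corporaton", "ACME corporation ", "Globex Ltd"]` places the three Acme variants in one group and Globex in another; `reconcile` then maps each group to its closest entry in a reference list, if one exists. A real system would typically compare several fields per record and use indexing to avoid comparing every record against every group.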

Data cleansing and validation at the data record’s point of entry are a key part of all Soltex-based solutions.