Towards Automated Data Cleaning Workflow

EasyChair Preprint 5652

8 pages. Date: May 28, 2021

Abstract

The success of AI-based technologies depends crucially on trustworthy and clean data. Research in data cleaning has provided a range of approaches to handle different data quality issues. Most of them require some prior knowledge about the dataset in order to select and configure the approach properly. We argue that for unknown datasets, it is unrealistic to know the data quality issues upfront and to formulate all necessary quality constraints in one pass. Pragmatically, the user solves data quality issues by implementing an iterative cleaning process. This incremental approach poses the challenge of identifying the right sequence of cleaning routines and their configurations. In this paper, we highlight our work in progress towards building a cleaning workflow orchestrator that learns from past cleaning tasks and proposes promising cleaning workflows for a new dataset. To this end, we highlight new approaches for selecting the most promising error detection routines, aggregating their outputs, and explaining the final results.
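
As a rough illustration of what aggregating the outputs of several error detection routines can look like, the sketch below combines cell-level flags by simple vote counting. It is not the method of the paper: the detector functions, the `aggregate_by_vote` helper, the `min_votes` threshold, and the sample rows are all hypothetical names and data chosen for this example, and vote counting is only one assumed aggregation strategy.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional, Set, Tuple

Row = Dict[str, Optional[str]]
Cell = Tuple[int, str]  # (row index, column name)
Detector = Callable[[List[Row]], Set[Cell]]


def detect_missing(rows: List[Row]) -> Set[Cell]:
    """Flag cells that are None or contain only whitespace."""
    return {
        (i, col)
        for i, row in enumerate(rows)
        for col, value in row.items()
        if value is None or value.strip() == ""
    }


def detect_non_numeric_age(rows: List[Row]) -> Set[Cell]:
    """Flag 'age' cells that are missing or not made of digits."""
    flagged: Set[Cell] = set()
    for i, row in enumerate(rows):
        value = row.get("age")
        if value is None or not value.strip().isdigit():
            flagged.add((i, "age"))
    return flagged


def detect_implausible_age(rows: List[Row]) -> Set[Cell]:
    """Flag numeric 'age' cells above an assumed upper bound of 120."""
    flagged: Set[Cell] = set()
    for i, row in enumerate(rows):
        value = row.get("age")
        if value is not None and value.strip().isdigit() and int(value) > 120:
            flagged.add((i, "age"))
    return flagged


def aggregate_by_vote(
    rows: List[Row], detectors: List[Detector], min_votes: int
) -> Set[Cell]:
    """Keep the cells flagged by at least `min_votes` detectors."""
    votes: Counter = Counter()
    for detector in detectors:
        # Each detector returns a set, so it contributes one vote per cell.
        votes.update(detector(rows))
    return {cell for cell, count in votes.items() if count >= min_votes}


# Hypothetical sample data, invented for this sketch.
rows: List[Row] = [
    {"name": "Ann", "age": "34"},
    {"name": "", "age": "abc"},
    {"name": "Bob", "age": None},
    {"name": "Eve", "age": "430"},
]
detectors: List[Detector] = [
    detect_missing,
    detect_non_numeric_age,
    detect_implausible_age,
]

if __name__ == "__main__":
    print(sorted(aggregate_by_vote(rows, detectors, min_votes=2)))
```

Raising `min_votes` trades recall for precision: with `min_votes=1` every cell flagged by any detector is reported, while higher thresholds keep only cells on which several detectors agree.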

Keyphrases: Data Cleaning Workflows, Data Profiling, Machine Learning

BibTeX entry
BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:
@booklet{EasyChair:5652,
  author    = {J Nivetha and A Sreemitha},
  title     = {Towards Automated Data Cleaning Workflow},
  howpublished = {EasyChair Preprint 5652},
  year      = {EasyChair, 2021}}