Towards Automated Data Cleaning Workflow

EasyChair Preprint 5652
8 pages • Date: May 28, 2021

Abstract

The success of AI-based technologies depends crucially on trustworthy and clean data. Research in data cleaning has provided a range of approaches to handle different data quality issues. Most of them require some prior knowledge about the dataset in order to select and configure the approach properly. We argue that for unknown datasets, it is unreasonable to expect the user to know the data quality issues upfront and to formulate all necessary quality constraints in one shot. Pragmatically, the user solves data quality issues by implementing an iterative cleaning process. This incremental approach poses the challenge of identifying the right sequence of cleaning routines and their configurations. In this paper, we highlight our work in progress towards building a cleaning workflow orchestrator that learns from past cleaning tasks and proposes promising cleaning workflows for a new dataset. To this end, we highlight new approaches for selecting the most promising error detection routines, aggregating their outputs, and explaining the final results.

Keyphrases: Data Cleaning Workflows, Data Profiling, machine learning