A Highly Accurate Data Synchronization and Full-Text Search Algorithm for Canal and Elasticsearch

EasyChair Preprint 10472
6 pages • Date: June 30, 2023

Abstract

Current full-text search schemes for large-scale structured and semi-structured text data suffer from several thorny issues, such as long data synchronization delays, inconvenient customization of business processing, and low efficiency. To address these issues, this paper proposes an efficient algorithm based on the Canal data synchronization framework and the Elasticsearch full-text search engine. First, we rewrite the Canal adapter component so that business data processing can be configured flexibly, which strengthens the framework's secondary data processing ability and improves the efficiency of data synchronization. Second, we record the synchronization times of recent data in the Canal framework and apply an exponentially weighted average function that gradually decreases the weight of older time series data, highlighting the influence of recent data and reflecting data novelty. This allows the synchronization trigger period to be set dynamically and flexibly, giving effective control over the synchronization interval and duration. Last, we modify the Elasticsearch word tokenizer and introduce configurable dictionaries of custom extension words and stop words to filter query data effectively, thereby improving the query hit rate and accuracy. Extensive experiments on traditional Chinese medicine data demonstrate that the designed algorithm achieves high data synchronization efficiency, full-text search speed, and accuracy. Hence, the proposed algorithm is a milestone in smart healthcare.

Keyphrases: Canal, Elasticsearch, Real-Time Synchronization, full-text search
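
The abstract does not give the exact weighting formula, so the following is only a minimal sketch of the general idea behind the second step: an exponentially weighted average of recent synchronization durations, in which older samples lose weight geometrically, is used to pick the next trigger period. The function names, the smoothing factor `alpha`, and the `floor` and `ceiling` bounds are all hypothetical and are not taken from the paper or from the Canal API.

```python
from typing import Sequence


def ewma(values: Sequence[float], alpha: float = 0.5) -> float:
    """Exponentially weighted average of values, ordered oldest to newest.

    Each step keeps a fraction (1 - alpha) of the running average, so the
    weight of an older sample shrinks geometrically with its age.
    """
    if not values:
        raise ValueError("at least one sample is required")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    avg = float(values[0])
    for v in values[1:]:
        avg = alpha * v + (1.0 - alpha) * avg
    return avg


def next_trigger_period(
    recent_sync_seconds: Sequence[float],
    alpha: float = 0.5,       # assumed smoothing factor, not from the paper
    floor: float = 1.0,       # assumed lower bound on the period, in seconds
    ceiling: float = 60.0,    # assumed upper bound on the period, in seconds
) -> float:
    """Choose the next synchronization trigger period (hypothetical helper).

    The period follows the weighted average of recent synchronization
    durations and is clamped to the range [floor, ceiling].
    """
    estimate = ewma(recent_sync_seconds, alpha)
    return min(max(estimate, floor), ceiling)


if __name__ == "__main__":
    # Durations of the last few synchronization rounds, oldest first.
    durations = [8.0, 6.0, 2.0]
    print(next_trigger_period(durations))
```

A larger `alpha` makes the period react faster to the most recent rounds, while a smaller one smooths out short bursts; how the paper balances the two is not stated in the abstract.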