A Highly Accurate Data Synchronization and Full-Text Search Algorithm for Canal and Elasticsearch

EasyChair Preprint 10472

6 pages. Date: June 30, 2023

Abstract

Full-text search over large-scale structured and semi-structured text data currently faces several thorny issues, such as long data synchronization delays, inconvenient personalized business processing, and low efficiency. To address these issues, this paper proposes an efficient algorithm based on the Canal data synchronization framework and the Elasticsearch full-text search engine. First, we rewrite the Canal adapter component to allow flexible configuration of business data processing, which strengthens the framework's secondary data processing ability and improves the efficiency of data synchronization. Second, we record the synchronization times of recent data in the Canal framework and apply an exponentially weighted average function that gradually decreases the weight of older time-series data, highlighting the influence of recent data and reflecting data novelty; this allows the synchronization interval and duration to be controlled effectively by setting the synchronization trigger period dynamically and flexibly. Lastly, we modify the Elasticsearch tokenizer and propose configurations for custom expansion-word and stop-word dictionaries to filter query data effectively, thereby improving the query hit rate and accuracy. Extensive experiments on traditional Chinese medicine data demonstrate that the designed algorithm achieves high data synchronization efficiency, full-text search speed, and accuracy. Hence, the proposed algorithm is a milestone in smart healthcare.
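
The abstract does not give the weighting formula, so the following is only a minimal sketch of how an exponentially weighted average of recent synchronization durations could drive the trigger period. The function names, the decay factor `beta`, the `safety` multiplier, and the clamping bounds are all illustrative assumptions, not values or code from the paper or from Canal.

```python
from typing import Iterable


def ewma_sync_duration(durations: Iterable[float], beta: float = 0.9) -> float:
    """Bias-corrected exponentially weighted average of sync durations.

    `durations` is ordered oldest to newest, so the newest sample gets
    weight (1 - beta) and each older sample is discounted by a further
    factor of beta. `beta` is an assumed decay factor.
    """
    avg = 0.0
    count = 0
    for count, duration in enumerate(durations, start=1):
        avg = beta * avg + (1.0 - beta) * duration
    if count == 0:
        raise ValueError("at least one duration is required")
    # Divide by (1 - beta^n) so a short history is not biased toward zero.
    return avg / (1.0 - beta ** count)


def next_trigger_period(
    durations: Iterable[float],
    beta: float = 0.9,
    safety: float = 1.5,      # assumed headroom over the estimated duration
    min_period: float = 1.0,  # assumed lower bound, in seconds
    max_period: float = 60.0, # assumed upper bound, in seconds
) -> float:
    """Hypothetical rule: period = safety * EWMA, clamped to the bounds."""
    estimate = ewma_sync_duration(durations, beta)
    return min(max_period, max(min_period, safety * estimate))
```

Under these assumptions, a recent slow synchronization raises the next trigger period more than an equally slow one further in the past, which is the "recent data counts more" behavior the abstract describes.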

Keyphrases: Canal, Elasticsearch, Full-Text Search, Real-Time Synchronization

BibTeX entry
BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:
@booklet{EasyChair:10472,
  author    = {Peiyang Wei and Xiaoyu Shi and Gang Zhang},
  title     = {A Highly Accurate Data Synchronization and Full-Text Search Algorithm for Canal and Elasticsearch},
  howpublished = {EasyChair Preprint 10472},
  year      = {EasyChair, 2023}}