Is it feasible to identify outputs of an arbitrary process at run time without excessively slowing down workflows?

12 pages. Published: December 11, 2023

Abstract

In this study, we explore whether file events can be identified for any process in real time without significantly slowing down workflows, with the aim of generating a data provenance report for the dynamic workflow manager MEOW. Unlike in traditional workflow managers, output locations in MEOW are not predefined, and an output can trigger another job. We established criteria and examined four Linux tools: strace, perf script, inotify, and fanotify. Our findings suggest that strace meets our requirements, and that integrating an strace-based tracer into MEOW is both theoretically and practically viable. The implemented tracer slows the workflow by a factor of approximately 1.3, although worst-case scenarios show the slowdown could reach a factor of 5. This research forms the basis for constructing MEOW’s data provenance report.

Keyphrases: data provenance report, fanotify, inotify, meow, perf, strace, workflow manager

In: Lindsay Quarrie (editor). Proceedings of 2023 Concurrent Processes Architectures and Embedded Systems Hybrid Virtual Conference, vol 17, pages 81-92.

BibTeX entry
@inproceedings{COPA2023:Is_it_feasible_identify,
  author    = {Philip Shun Jensen and Iben Lilholm and David Marchant},
  title     = {Is it feasible to identify outputs of an arbitrary process at run time without excessively slowing down workflows?},
  booktitle = {Proceedings of 2023 Concurrent Processes Architectures and Embedded Systems Hybrid Virtual Conference},
  editor    = {Lindsay Quarrie},
  series    = {Kalpa Publications in Computing},
  volume    = {17},
  publisher = {EasyChair},
  bibsource = {EasyChair, https://easychair.org},
  issn      = {2515-1762},
  url       = {/publications/paper/H7MN},
  doi       = {10.29007/mj18},
  pages     = {81-92},
  year      = {2023}}