JChaosIndex: Measuring and Benchmarking Dispersion in Randomized Data

EasyChair Preprint 14318

27 pages • Date: August 6, 2024

Abstract

The profound question of whether true randomness exists in nature has not deterred humans from relying on it for activities such as gambling, tie-breaking, and conducting polls. Randomization of data is an ongoing need in business settings such as the design of clinical trials or the training of AI models. Data randomization may also be needed for privacy, so that data cannot be traced back to any individual data subject.

While one can develop techniques for randomizing data, controlling the level of randomization requires measuring the level of randomness in the "randomized" data. Two aspects are important when measuring that randomness numerically: the unpredictability of the data and its spread (dispersion). Permutation entropy is an established technique for measuring the unpredictability and complexity of time series. To measure the level of dispersion (JChaosIndex) in randomized data, a "Neighbour-displacement-delta" (NDD) based technique is proposed. JChaosIndex considers the displacement of each data element as well as the relative displacements of the neighbours of each data element. The technique is based on three themes: objective measurement, ease of use from a programming standpoint, and confidentiality of data. JChaosIndex allows systems to measure the level of dispersion in the "randomized" data vis-à-vis the original non-randomized data. It can be easily included in a programming language library, in database methods, or in any algorithm. The technique is also domain-agnostic, as it works purely on the indexes of the data records and not on the actual data.

This paper describes techniques for measuring unpredictability and dispersion, and describes ways to find optimal random configurations. It also includes benchmarking information about the unpredictability and dispersion generated by the random number generators of various programming language libraries.
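
As an illustration of the unpredictability measure named in the abstract, the following is a minimal Python sketch of the standard Bandt-Pompe permutation entropy. It is not taken from the paper: the function name, the parameter names m (embedding order) and tau (time delay), the tie-breaking rule, and the normalization by log2(m!) are assumptions made for this example. It covers only permutation entropy and does not attempt to reproduce the JChaosIndex (NDD) calculation, which the abstract does not define.

```python
import math
import random
from collections import Counter


def permutation_entropy(series, m=3, tau=1, normalize=True):
    """Permutation entropy of a sequence, in bits.

    Illustrative sketch only; names and defaults are assumptions,
    not the paper's implementation.
    """
    n_windows = len(series) - (m - 1) * tau
    if m < 2 or tau < 1 or n_windows < 1:
        raise ValueError("series too short for the given m and tau")

    # Count how often each ordinal pattern occurs.
    patterns = Counter()
    for start in range(n_windows):
        window = [series[start + k * tau] for k in range(m)]
        # Ordinal pattern: positions that would sort the window.
        # sorted() is stable, so ties are broken by position.
        pattern = tuple(sorted(range(m), key=window.__getitem__))
        patterns[pattern] += 1

    # Shannon entropy of the pattern distribution.
    entropy = 0.0
    for count in patterns.values():
        p = count / n_windows
        entropy -= p * math.log2(p)

    if normalize:
        # Divide by the maximum possible entropy, log2(m!).
        entropy /= math.log2(math.factorial(m))
    return entropy


def shuffled_indexes(n, seed):
    """Return the indexes 0..n-1 in a seeded random order."""
    rng = random.Random(seed)
    indexes = list(range(n))
    rng.shuffle(indexes)
    return indexes
```

Because the function only compares values within each window, it can be applied to record indexes rather than to the records themselves. An unshuffled index sequence produces a single ordinal pattern and therefore an entropy of zero, while a well-shuffled sequence should score close to 1 when normalized.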

Keyphrases: JChaosIndex, Neighbour displacement delta, Randomized data, Unbiased, best randomization, complexity, dispersed configuration, dispersion, most random configuration, permutation entropy, random number generator, random number generators, randomization of data, randomness, unpredictability

Links:

https://easychair.org/publications/preprint/dCRN

BibTeX entry

BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:

@booklet{EasyChair:14318,
  author    = {Jui Keskar},
  title     = {JChaosIndex: Measuring and Benchmarking Dispersion in Randomized Data},
  howpublished = {EasyChair Preprint 14318},
  year      = {EasyChair, 2024}}
