Document Type

Conference Paper

Publication Date

2020

DOI

10.1145/3383583.3398589

Publication Title

Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020, August 1-5, 2020, Virtual Event, China

Pages

513-514

Conference Name

ACM/IEEE Joint Conference on Digital Libraries in 2020, August 1-5, 2020, Virtual Event, China

Abstract

Researchers reuse data from past studies to avoid costly re-collection of experimental data. However, large-scale data reuse is challenging due to a lack of consensus on metadata representations among research groups and disciplines. Dataset File System (DFS) is a semi-structured data description format that promotes such consensus by standardizing the semantics of data description, storage, and retrieval. In this paper, we present analytic-streams, a specification for streaming data analytics with DFS, and streaming-hub, a visual programming toolkit built on DFS to simplify data analysis workflows. Analytic-streams facilitate higher-order data analysis with less computational overhead, while streaming-hub enables storage, retrieval, manipulation, and visualization of data and analytics. We discuss how they simplify data pre-processing, aggregation, and visualization, and their implications for data analysis workflows.

Comments

© 2020 the Authors

Included with written permission.

Original Publication Citation

Jayawardana, Y., & Jayarathna, S. (2020). Streaming analytics and workflow automation for DFS. ACM/IEEE Joint Conference on Digital Libraries in 2020, Virtual Event, China, August 1-5, 2020. https://doi.org/10.1145/3383583.3398589

ORCID

0000-0001-5992-6818 (Jayawardana), 0000-0002-4879-7309 (Jayarathna)
