Academic paper
Blockchain-enabled immutable, distributed, and highly available clinical research activity logging system for federated COVID-19 data analysis from multiple institutions
Document Type
article
Author
Kuo, Tsung-Ting; Pham, Anh; Edelson, Maxim E; Kim, Jihoon; Chan, Jason; Gupta, Yash; Ohno-Machado, Lucila; Anderson, David M; Balacha, Chandrasekar; Bath, Tyler; Baxter, Sally L; Becker-Pennrich, Andrea; Bell, Douglas S; Bernstam, Elmer V; Ngan, Chau; Day, Michele E; Doctor, Jason N; DuVall, Scott; El-Kareh, Robert; Florian, Renato; Follett, Robert W; Geisler, Benjamin P; Ghigi, Alessandro; Gottlieb, Assaf; Hinske, Ludwig C; Hu, Zhaoxian; Ir, Diana; Jiang, Xiaoqian; Kim, Katherine K; Knight, Tara K; Koola, Jejo D; Lee, Nelson; Mansmann, Ulrich; Matheny, Michael E; Meeker, Daniella; Mou, Zongyang; Neumann, Larissa; Nguyen, Nghia H; Nick, Anderson; Park, Eunice; Paul, Paulina; Pletcher, Mark J; Post, Kai W; Rieder, Clemens; Scherer, Clemens; Schilling, Lisa M; Soares, Andrey; SooHoo, Spencer; Soysal, Ekin; Steven, Covington; Tep, Brian; Toy, Brian; Wang, Baocheng; Wu, Zhen R; Xu, Hua; Yong, Choi; Zheng, Kai; Zhou, Yujia; Zucker, Rachel A
Source
Journal of the American Medical Informatics Association. 30(6)
Abstract
Objective: We aimed to develop a distributed, immutable, and highly available cross-cloud blockchain system to facilitate federated data analysis activities among multiple institutions.

Materials and methods: We preprocessed 9166 COVID-19 Structured Query Language (SQL) code, summary statistics, and user activity logs from the GitHub repository of the Reliable Response Data Discovery for COVID-19 (R2D2) Consortium. The repository collected local summary statistics from participating institutions and aggregated them into a global result for a COVID-19-related clinical query previously posted by clinicians on a website. We developed both on-chain and off-chain components to store and query these activity logs and their associated queries and results on a blockchain, providing immutability, transparency, and high availability of research communication. We measured the run-time efficiency of contract deployment and network transactions, and confirmed the accuracy of the recorded logs against a centralized baseline solution.

Results: Smart contract deployment took 4.5 s on average. Recording an activity log on the blockchain took slightly over 2 s, versus 5-9 s for the baseline. Each query took less than 0.4 s on average on the blockchain, versus around 2.1 s for the baseline.

Discussion: The low deployment, recording, and querying times confirm the feasibility of our cross-cloud, blockchain-based federated data analysis system. We have yet to evaluate the system on a larger network with multiple nodes per cloud, to consider how to accommodate a surge in activities, and to investigate methods to lower querying time as the blockchain grows.

Conclusion: Blockchain technology can be used to support federated data analysis among multiple institutions.
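The immutability property the abstract relies on can be illustrated with a minimal hash-chained, append-only log. This is only a sketch under stated assumptions, not the authors' smart contract or their on-chain/off-chain design: the class `ActivityLog`, its methods, and the fields `user`, `action`, and `query_id` are hypothetical names chosen for illustration, and the example runs in a single process rather than across nodes in multiple clouds.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(payload: dict, prev_hash: str) -> str:
    """Hash an entry's payload together with the previous entry's hash."""
    body = json.dumps(payload, sort_keys=True) + prev_hash
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


@dataclass
class ActivityLog:
    """Hypothetical append-only log; each entry commits to the one before it."""
    entries: list = field(default_factory=list)

    def record(self, user: str, action: str, query_id: str) -> str:
        """Append one activity record and return its hash."""
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS_HASH
        payload = {"user": user, "action": action, "query_id": query_id}
        entry = {
            "payload": payload,
            "prev_hash": prev_hash,
            "hash": _digest(payload, prev_hash),
        }
        self.entries.append(entry)
        return entry["hash"]

    def query(self, query_id: str) -> list:
        """Return all recorded activities for one clinical query."""
        return [e["payload"] for e in self.entries
                if e["payload"]["query_id"] == query_id]

    def verify(self) -> bool:
        """Recompute every hash; any edit to an earlier entry breaks the chain."""
        prev_hash = GENESIS_HASH
        for e in self.entries:
            if e["prev_hash"] != prev_hash:
                return False
            if e["hash"] != _digest(e["payload"], prev_hash):
                return False
            prev_hash = e["hash"]
        return True
```

Because each entry's hash covers the previous entry's hash, altering any earlier record causes `verify()` to fail. A real blockchain deployment adds replication and consensus across nodes, which this sketch does not attempt to model.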