Academic Paper

Building an Accessible, Usable, Scalable, and Sustainable Service for Scholarly Big Data
Document Type
Conference
Source
2021 IEEE International Conference on Big Data (Big Data), pp. 141-152, Dec. 2021
Subject
Communication, Networking and Broadcast Technologies
Computing and Processing
Power, Energy and Industry Applications
Robotics and Control Systems
Signal Processing and Analysis
Buildings
Semantics
Computer architecture
Big Data
Search engines
Rendering (computer graphics)
Libraries
scholarly big data
big data infrastructure
big data search
big data service
Language
Abstract
Since the emergence of scholarly big data, there have been several efforts to build web-based services such as digital library search engines (DLSEs). However, much of the design and specifications of an accessible, usable, scalable, and sustainable DLSE has not been well represented or discussed in the literature. We argue that these four characteristics are essential to providing a high-quality service for scholarly big data from both the user's and the developer's perspectives. This paper reviews the design, implementation, and operational experience of CiteSeerX, a real-world digital library search engine, and the lessons learned from it. We analyze the strengths and weaknesses of the current design and propose a new design with a revised architecture and enhanced hardware and software infrastructure. The alpha version of the new design has been implemented and tested. The new system replaces MySQL and Apache Solr with a single instance of Elasticsearch, which plays a dual role of data storage and search. Another major improvement is the integration of extraction and ingestion, which significantly boosts document ingestion speed. The web application is re-engineered to enhance the user experience by applying a learning-to-rank model and offering more refined search tools. The system is also improved in many other aspects. We believe the design considerations and experience can benefit researchers and engineers who plan, design, and upgrade future systems of comparable scale and functionality.