Research on Database System Selection for GWAC Massive Catalog Data Processing
A Pre-research on GWAC Massive Catalog Data Storage and Processing System
Abstract: The Ground Wide Angle Camera array (GWAC), China's wide-field ground-based camera array, poses major challenges for large-scale catalog storage and for real-time processing, in particular the rapid search for transients in wide field-of-view time-series data. To meet these challenges, this paper proposes a time-series data processing and management system built on the column-store database MonetDB. The design uses MonetDB as a single platform for both data processing and data management: core processing algorithms such as cross-matching are embedded in the database, which realizes the idea of "bringing the computation to the data" and lets the system draw on the database's fast processing and parallelism. Integrating data storage and computation in one platform is intended to improve system performance and availability. To assess the feasibility of MonetDB for managing very large catalogs, we carried out pilot experiments on several key technologies: the TPC-H benchmark; bulk data loading tests and their optimization; an implementation and test of the Zone cross-match algorithm in MonetDB; and the development of user-defined functions (UDFs) for customizable data loading. The benchmark, data loading, and astronomical source association tests were all compared against a traditional row-store database. The preliminary results indicate that the column-store approach is feasible: MonetDB performs well in big-data management, is efficient in real-time data processing, and can handle 2.5 TB of catalog data. Finally, we describe the proposed design in detail: a processing solution for massive wide field-of-view time-series observation data based on the in-memory column-store database MonetDB, which combines efficient data processing and data management in one astronomical database solution.
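To make the cross-matching step named in the abstract more concrete, the following is a minimal Python sketch of the general zone idea: the sky is cut into declination strips (zones), and each new source is compared only with reference sources in the few zones and the narrow right-ascension window that can contain a match. It is not the MonetDB implementation described in this paper. The function names, the zone height of 0.01 degrees, the 1 arcsecond match radius, and the sample coordinates are all illustrative assumptions, and the sketch ignores right-ascension wrap-around at 0/360 degrees and sources very close to the poles.

```python
import math
from bisect import bisect_left, bisect_right
from collections import defaultdict


def zone_id(dec_deg, zone_height_deg):
    """Index of the declination strip (zone) that contains dec_deg."""
    return math.floor((dec_deg + 90.0) / zone_height_deg)


def build_zone_index(catalog, zone_height_deg):
    """Group (source_id, ra_deg, dec_deg) rows by zone, sorted by RA inside each zone."""
    zones = defaultdict(list)
    for source_id, ra, dec in catalog:
        zones[zone_id(dec, zone_height_deg)].append((ra, dec, source_id))
    for rows in zones.values():
        rows.sort()
    return zones


def angular_sep_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


def cross_match(sources, zones, zone_height_deg, radius_deg):
    """Return (source_id, reference_id) pairs separated by at most radius_deg."""
    matches = []
    for source_id, ra, dec in sources:
        # Widen the RA window by 1/cos(dec); clamp so the division stays finite.
        # Assumption: sources are not close to the poles or to RA = 0/360.
        max_abs_dec = min(abs(dec) + radius_deg, 89.0)
        ra_half = radius_deg / math.cos(math.radians(max_abs_dec))
        z_lo = zone_id(dec - radius_deg, zone_height_deg)
        z_hi = zone_id(dec + radius_deg, zone_height_deg)
        for z in range(z_lo, z_hi + 1):
            rows = zones.get(z)
            if not rows:
                continue
            # Binary search for the RA window inside the sorted zone.
            lo = bisect_left(rows, (ra - ra_half,))
            hi = bisect_right(rows, (ra + ra_half, float("inf")))
            for ref_ra, ref_dec, ref_id in rows[lo:hi]:
                if angular_sep_deg(ra, dec, ref_ra, ref_dec) <= radius_deg:
                    matches.append((source_id, ref_id))
    return matches


# Hypothetical sample data: a tiny reference catalog and two new detections.
reference = [(1, 10.0, 20.0), (2, 10.0005, 20.0003), (3, 50.0, -30.0)]
sources = [("a", 10.0001, 20.0001), ("b", 120.0, 5.0)]
zones = build_zone_index(reference, zone_height_deg=0.01)
matches = cross_match(sources, zones, zone_height_deg=0.01, radius_deg=1.0 / 3600.0)
print(matches)
```

In a database setting the same structure typically becomes a join on the zone identifier with a range predicate on right ascension, which is one reason the approach suits an in-database implementation.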