A Data Processing Acceleration Method and System for FAST Petabyte Pulsar Data Processing
Abstract
The Five-hundred-meter Aperture Spherical radio Telescope (FAST) has begun normal science operations. Data collected by its drift-scan pulsar survey have exceeded 1 PB and are expected to grow by at least 5 PB per year. Existing pulsar search software, such as PRESTO and SIGPROC, cannot meet the resulting requirements for real-time data analysis and management. Processing petabyte-scale data efficiently has therefore become a new challenge in radio astronomy. To address the data analysis and data management problems encountered by FAST, a joint team from Guizhou Normal University (GZNU) and the National Astronomical Observatories (NAOC) designed and implemented Craber, a PRESTO-based distributed parallel computing system that integrates network technology, databases, and cross-regional hardware computing resources. Craber performed well on data sets from both the Parkes Multibeam Pulsar Survey (PMPS) and the Commensal Radio Astronomy FAST Survey (CRAFTS). A 100 MB Parkes data file took ~36 seconds to process on 55 computing nodes in sub-cluster D of Craber, while a 128 MB CRAFTS data file took ~22 seconds. To date, Craber has processed more than 66 000 data files from FAST and has helped FAST detect more than 140 high-quality candidates, 114 of which have been confirmed. All resulting data products are stored in an integrated Oracle database or on a dedicated file server, ready for further AI-based candidate selection. Craber has already sped up FAST data processing substantially and contributed to the discovery of a number of new pulsars.