Dai Wei, Wang Sen, Li Qiuhong, Deng Hui, Mei Ying, Wang Feng. Implementation of SKA1-MID Self-calibrating Pipeline Based on Spark[J]. Astronomical Research and Technology, 2020, 17(3): 334-340.

Implementation of SKA1-MID Self-calibrating Pipeline Based on Spark

More Information
  • Received Date: November 27, 2019
  • Revised Date: December 11, 2019
  • Available Online: November 20, 2023
  • The volume of scientific data generated by the SKA exceeds the processing capability of every existing distributed processing system, so how to implement a distributed execution framework is an important research question in scientific data processing. Building on Spark, one of the most mature execution frameworks, this study attempts to systematically analyze how to migrate the iCal pipelines in the Algorithm Reference Library (ARL) to Spark. We analyze and discuss the implementation procedure and present the corresponding task flow implementation. The final experiments show that the results produced by iCal on Spark are correct. In summary, Spark can meet the requirements of distributed processing for certain data, but the limitations of Spark itself severely restrict its application in the SKA.
