Implementation of SKA1-MID Self-calibrating Pipeline Based on Spark
Abstract
The volume of scientific data generated by the SKA exceeds the processing capabilities of all existing distributed processing systems. How to implement a distributed execution framework is therefore an important research question in scientific data processing. Building on Spark, one of the most mature execution frameworks, this study systematically analyzes how to migrate the iCal pipeline in the Algorithm Reference Library (ARL) to Spark. We analyze and discuss the implementation procedure and present the corresponding task flow implementation. The final experiments show that the results of the iCal pipeline on Spark are correct. In summary, Spark can meet the requirements of distributed data processing for certain data. However, the limitations of Spark itself severely restrict its application in the SKA.