A Tale of Two Systems: Using Containers to Deploy HPC Applications on Supercomputers and Clouds

Younge, Andrew J., et al. “A tale of two systems: Using containers to deploy HPC applications on supercomputers and clouds.” 2017 IEEE International Conference on Cloud Computing Technology and Science (CloudCom). IEEE, 2017.

container

  • Docker
  • Shifter
  • Charliecloud
  • Singularity

DevOps

(Figure: DevOps)

Deployment workflow:

  1. On the local machine, use Docker containers (desktop machines mostly run Windows or macOS, both of which Docker supports), and keep the Dockerfile together with the project code in a git repository.
  2. Push the project to a remote repository, and push the container image to a container registry.
  3. On each target platform (EC2, cluster, supercomputer), pull the code and run it inside the container (see the sketch below).
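
A minimal sketch of this workflow, assuming a hypothetical image name (myuser/hpc-app) and a hypothetical run script (run.sh); the paper's own images are published under ajyounge/ on Docker Hub:

  # 1. Build the image locally from the Dockerfile kept in the git repo
  docker build -t myuser/hpc-app:latest .

  # 2. Push the code to the remote repo and the image to a registry
  git push origin main
  docker push myuser/hpc-app:latest

  # 3. On the target platform, pull and run
  #    On EC2 (or any Docker host):
  docker run --rm myuser/hpc-app:latest ./run.sh
  #    On an HPC system with Singularity, the Docker image can be converted on the fly:
  singularity exec docker://myuser/hpc-app:latest ./run.sh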

environment

  • Image environment:

    • HPCG benchmark

    • Intel MPI Benchmark suite (IMB)

    • base image: CentOS 7; both benchmarks were built using Intel 2017 Parallel Studio, which includes the latest Intel compilers and the Intel MPI library.

      Pull the image:

      docker pull ajyounge/hpcg-container
  • Cray XC30 supercomputing platform

    • hardware:

      Volta includes 56 compute nodes packaged in a single enclosure, with each node consisting of two Intel Ivy Bridge E5-2695v2 2.4 GHz processors (24 cores total), 64GB of memory, and a Cray Aries network interface.

    • shared file system

      Shared file system support is provided by NFS I/O servers projected to compute nodes via Cray’s proprietary DVS storage infrastructure.

    • OS: Cray Compute Node Linux (CNL ver. 5.2.UP04, based on SUSE Linux 11), Linux kernel v3.0.101

      This kernel is old enough that modifications were required before Singularity could be used; specifically, support for loopback devices and the EXT3 filesystem was added.
      
      • config:

        Specifically, we configure Singularity to mount /opt/cray, as well as /var/opt/cray for each container instance.

        In order to leverage the Aries interconnect as well as advanced shared-memory intra-node communication mechanisms, we dynamically link Cray’s MPI and associated libraries provided in /opt/cray directly within the container (see the command sketch after this environment section).

        The dynamically linked libraries include:

        • Cray’s uGNI messaging interface
        • XPMEM shared memory subsystem
        • Cray PMI runtime libraries
        • uDREG registration cache
        • application placement scheduler (ALPS)
        • workload manager configuration
        • some Intel Parallel Studio libraries
  • Amazon EC2: c3.8xlarge

    • hardware:

      • cpu: Intel Xeon “Ivy Bridge” E5-2680 v2 (2.8 GHz, 8 cores, with hyperthreading) x 2
      • memory: 60GB of RAM
      • disk: 2x320 GB SSDs
      • network: 10 Gb Ethernet network
    • OS: RHEL7

      • config:
        SR-IOV is used for networking, with the ixgbevf kernel module loaded.
    • Docker: v1.19
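
To make the Cray-side configuration above concrete, here is a minimal command sketch. It assumes Singularity 3.x command syntax; the bind paths follow the paper’s /opt/cray and /var/opt/cray mounts, while the image file name, library directory, launcher options, and rank count are illustrative assumptions rather than values from the paper:

  # Convert the Docker image into a local Singularity image file
  singularity pull hpcg.sif docker://ajyounge/hpcg-container

  # Bind Cray's programming environment into each container instance and let
  # the dynamic linker resolve Cray MPI and friends (uGNI, XPMEM, PMI, uDREG,
  # ALPS). Exact library directories vary by CLE release.
  export SINGULARITY_BINDPATH="/opt/cray,/var/opt/cray"
  export SINGULARITYENV_LD_LIBRARY_PATH="/opt/cray/lib64:${LD_LIBRARY_PATH}"

  # Launch through the site's workload manager (srun shown here; aprun is the
  # ALPS equivalent), e.g. two 24-core Ivy Bridge nodes = 48 ranks
  srun -N 2 -n 48 singularity exec hpcg.sif xhpcg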

benchmark

Benchmarks are reported as the average of 10 trials for IMB and 3 trials for HPCG; run-to-run variance was negligible and is therefore not shown.

  • IMB

    Measures network bandwidth and latency, i.e., the performance of MPI inter-node communication. Both fully statically linked and dynamically linked builds were tested (a launch sketch appears at the end of this section).

    • PingPong bandwidth

      (Figure: IMB PingPong bandwidth)

      Linking Cray MPI inside the Singularity container gives the highest bandwidth, close to native. This shows that the choice of MPI library strongly affects performance: the build optimized for the specific machine performs best.

    • PingPong Latency

      (Figure: IMB PingPong latency)

      With Cray MPI linked in Singularity, latency essentially matches the dynamically linked native build; the statically linked build has the lowest latency.

  • HPCG

    Measures the performance of a complete MPI application.

    (Figure: HPCG benchmark results)

    As the number of ranks grows, the Cray’s performance advantage over EC2 becomes apparent. Singularity linked against Cray MPI performs close to native, while the Intel MPI build performs even worse than the KVM virtual machine.
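
For reference, a minimal launch sketch for the two benchmarks inside the container (mentioned under IMB above); binary names follow the standard IMB and HPCG distributions, and the node/rank counts are illustrative rather than taken from the paper:

  # IMB PingPong between two ranks placed on two different nodes
  srun -N 2 -n 2 --ntasks-per-node=1 singularity exec hpcg.sif IMB-MPI1 PingPong

  # HPCG at increasing rank counts, to compare scaling across platforms
  for ranks in 24 48 96; do
    srun -n "$ranks" singularity exec hpcg.sif xhpcg
  done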