KISTI: Pushing Science and Technology Boundaries

With Intel® Xeon® Scalable processors, NURION is the largest supercomputer in South Korea.

Executive Summary
No longer strictly focused on computationally intensive workloads, modern HPC centers need performant yet general-purpose systems that can address the many challenging and conflicting resource demands required to achieve scientific breakthroughs across a wide array of increasingly complex memory and data-intensive research projects. Further, world-class supercomputers such as the Korea Institute of Science and Technology Information (KISTI) NURION system are also flagship technology tools procured by an organization to provide for the future—be it in science or to meet the economic needs of a region.

According to Dr. Hee-yoon Choi (KISTI president), “KISTI will grow with the industry, academy, and institute community as a central organization to support the dynamic science and technology data ecosystem, which shares data and creates value, laying a foundation for Korea’s innovation growth”1. Equipped with Intel® Xeon® Scalable and Intel® Xeon Phi™ processors linked via an Intel® Omni-Path Architecture (Intel® OPA) communications fabric, the NURION 146-rack Cray* CS500 cluster was procured to expand and accelerate innovative R&D. It is the largest supercomputer in South Korea and currently the 13th fastest supercomputer in the world2.

Challenge
Scalability and the need to solve large-scale partial differential equation (PDE) problems, which involve sparse matrix operations, were key technology motivators in the KISTI procurement of a powerful new leadership-class supercomputer. Very simply, researchers had outgrown the existing decade-old TACHYON-II cluster and needed to move beyond it.

Materials research is one of the application areas that KISTI has focused on as a leading HPC R&D institute, since it has strong potential to drive advanced semiconductor device design, which is important for the national competitiveness of South Korea. In particular, KISTI has pursued the ability to simulate large-scale solid atomic structures on HPC systems.

Dr. Soonwook Hwang (General Director and Principal Researcher, Division of National Supercomputing at KISTI) explains, “Electronic structure simulation of realistically sized solid structures is quite critical to help experimentalists who work on designs of new materials or advanced electronic devices. With large-scale simulations, we expect to cover design factors for nanoscale devices with large-scale simulations that can predict physical behaviors of solid structures having up to several million atoms.”

Approach
Efficiently utilizing a large number of many- and multi-core processors at scale, as well as chip-level vector parallelism, requires detailed knowledge of both science and engineering. While KISTI has firmly held its leadership in HPC R&D in South Korea over the last decade with the TACHYON-II cluster, NURION introduced new levels of technology. Dr. Hwang explains, “Our Intel® Parallel Computing Center (Intel® PCC) project has served as a great opportunity for us to better understand and utilize the many- and multi-core Intel® processors. With the NURION system, now we are ready to broaden the leadership of HPC R&D in the Republic of Korea.”

Results
The Intel PCC collaborative effort has paid off quickly: KISTI researchers have already achieved significant success even though NURION was only recently installed and is just becoming available to public users.

The Intel PCC project has focused on developing a software package for tight-binding simulations of large-scale electronic structures. Dr. Hoon Ryu (Intel PCC Lead and Principal Researcher, Center for Applied Scientific Computing at KISTI) notes, “The code is useful for advanced semiconducting devices, which is a key national business of South Korea.” KISTI was the first Intel PCC in the Asia-Pacific area starting in 2013.

Dr. Ryu continues, “This work basically needs to solve a Schrödinger equation that normally involves nanostructures consisting of tens of millions of atoms, which are numerically described with system matrices of a billion degrees of freedom. As a result, scalable processors are definitely needed with parallelization of core numerical operations including eigenvalue problems involving large-scale system matrices. With Intel Xeon Phi processors, we are able to drive a huge reduction of end-to-end simulation times for millions of atomic systems.”
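
As a toy illustration of the kind of problem described in the quote above (not KISTI's code, and far smaller than the systems discussed), the following Python sketch builds a one-dimensional nearest-neighbour tight-binding chain. This is a textbook model whose eigenpairs are known in closed form, so the sketch can check that H v = E v. The function names, the single-orbital chain, and the parameter values are all assumptions made for this example.

```python
import math

def chain_hamiltonian(n_sites, onsite=0.0, hopping=-1.0):
    """Sparse Hamiltonian of a 1D chain stored as {(row, col): value}."""
    h = {}
    for i in range(n_sites):
        h[(i, i)] = onsite
        if i + 1 < n_sites:
            # Nearest-neighbour coupling; the matrix is symmetric.
            h[(i, i + 1)] = hopping
            h[(i + 1, i)] = hopping
    return h

def apply_hamiltonian(h, vec):
    """Multiply the sparse matrix by a vector, touching only stored entries."""
    out = [0.0] * len(vec)
    for (row, col), value in h.items():
        out[row] += value * vec[col]
    return out

def analytic_state(n_sites, k, onsite=0.0, hopping=-1.0):
    """Closed-form eigenpair number k (1..n_sites) of the open chain."""
    theta = k * math.pi / (n_sites + 1)
    energy = onsite + 2.0 * hopping * math.cos(theta)
    vec = [math.sin(theta * (i + 1)) for i in range(n_sites)]
    return energy, vec

def residual(n_sites, k):
    """Largest component of H v - E v; close to zero for a true eigenpair."""
    h = chain_hamiltonian(n_sites)
    energy, vec = analytic_state(n_sites, k)
    hv = apply_hamiltonian(h, vec)
    return max(abs(a - energy * b) for a, b in zip(hv, vec))

if __name__ == "__main__":
    print("stored entries for 5 sites:", len(chain_hamiltonian(5)))
    print("residual for state k=1 on 50 sites:", residual(50, 1))
```

The point of the sketch is that the number of stored entries grows linearly with the number of sites, which is what makes sparse storage and iterative eigensolvers the natural fit as the structures get larger.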

NURION Supercomputer Highlights

  • The 13th fastest supercomputer in the world as of the November 2018 TOP500 list2
  • Equipped with both Intel Xeon Scalable processors and Intel Xeon Phi processors and utilizing Intel Omni-Path Architecture, it is the largest supercomputer in South Korea
  • Designed to provide the resources to achieve scientific breakthroughs for a wide array of increasingly complex, data-intensive challenges across modeling, simulation, analytics, and AI

Use Case: Scaling to 1,000,000+ Atoms
Dr. Min Sun Yeom (director and principal researcher, Center for Applied Scientific Computing at KISTI) says, “With tight-binding simulations of nanostructures having > 1,000,000 atoms on NURION system, we were able to explore the effect of size and structural engineering on band gap energies of physically realizable lead halide perovskite nanostructures within quite reasonable times. We also obtained the preliminary ideas for how to reduce the light-induced phase separation in halide mixtures, which would not be possible with DFT simulations that can normally handle solids consisting of hundreds of atoms.”

Metal halide perovskite is a promising candidate material for optoelectronic devices, which motivates empirical modelling of large-scale atomic structures. In short, such modelling can provide useful guidelines for device design, such as how to map optical gaps and how to alleviate light-induced phase separation (a bottleneck in LED designs). The best part of empirical modelling is that it connects directly to experiments.

Connection of experiments and large-scale simulations. (a) Experimental image of perovskite (CsPbBr3) quantum dots (Nano Letters 15, 3692-3696). (b) Dependence of band gap energies on quantum dot size. The KISTI numerical results connect nicely to experiment.

Dr. Ryu points out that the Intel® Math Kernel Library (Intel® MKL) helped scale their calculations: “Intel MKL (ScaLAPACK packages such as libmkl_scalapack_lp64 and libmkl_blacs_intelmpi_lp64) helped a lot to improve the scalability of our Schrödinger solver. We used the Lanczos algorithm, a well-known iterative method to tackle large-scale eigenvalue problems, which has a numerical part that is hard for users to parallelize with MPI and becomes a performance bottleneck as the iterative process continues. With the Intel MKL subroutines, we were able to reduce the corresponding computing load with improved scalability.”
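
For readers unfamiliar with the Lanczos algorithm mentioned above, here is a minimal serial sketch in pure Python. It is not the KISTI solver and does not use Intel MKL or MPI. It assumes a small symmetric test operator (a nearest-neighbour chain with hypothetical parameters), uses full reorthogonalisation for simplicity, and finds the lowest eigenvalue of the resulting tridiagonal matrix by Sturm-sequence bisection purely to stay dependency-free. All function names are illustrative.

```python
import math

def lanczos(matvec, start, steps):
    """Run Lanczos; return diagonal (alphas) and off-diagonal (betas) of T."""
    norm = math.sqrt(sum(x * x for x in start))
    basis = [[x / norm for x in start]]
    alphas, betas = [], []
    for _ in range(steps):
        w = matvec(basis[-1])
        alphas.append(sum(a * b for a, b in zip(w, basis[-1])))
        # Full reorthogonalisation against every previous Lanczos vector.
        for q in basis:
            proj = sum(a * b for a, b in zip(w, q))
            w = [a - proj * b for a, b in zip(w, q)]
        beta = math.sqrt(sum(x * x for x in w))
        if beta < 1e-12 or len(alphas) == steps:
            break  # invariant subspace found, or requested size reached
        betas.append(beta)
        basis.append([x / beta for x in w])
    return alphas, betas

def count_below(alphas, betas, x):
    """Sturm count: number of eigenvalues of T strictly below x."""
    count, d = 0, 1.0
    for i, a in enumerate(alphas):
        off = betas[i - 1] ** 2 if i > 0 else 0.0
        d = a - x - off / d
        if d == 0.0:
            d = 1e-300  # avoid dividing by zero on the next step
        if d < 0.0:
            count += 1
    return count

def lowest_eigenvalue(alphas, betas, tol=1e-12):
    """Bisection between Gershgorin bounds for the smallest eigenvalue of T."""
    radius = [0.0] * len(alphas)
    for i, b in enumerate(betas):
        radius[i] += b
        radius[i + 1] += b
    lo = min(a - r for a, r in zip(alphas, radius))
    hi = max(a + r for a, r in zip(alphas, radius))
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if count_below(alphas, betas, mid) >= 1:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def chain_matvec(vec, hopping=-1.0):
    """Test operator (assumption): nearest-neighbour chain, zero on-site energy."""
    n = len(vec)
    out = [0.0] * n
    for i in range(n):
        if i > 0:
            out[i] += hopping * vec[i - 1]
        if i + 1 < n:
            out[i] += hopping * vec[i + 1]
    return out

if __name__ == "__main__":
    n = 30
    start = [1.0 / (i + 1) for i in range(n)]
    alphas, betas = lanczos(chain_matvec, start, n)
    print("Lanczos estimate:", lowest_eigenvalue(alphas, betas))
    print("closed form     :", -2.0 * math.cos(math.pi / (n + 1)))
```

Only the matrix-vector product touches the large operator; everything else works on the short vectors `alphas` and `betas`. In a production code the dense linear algebra on the small projected problem would normally be delegated to a library routine rather than hand-written as it is here.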

Use Case: Many-core Performance on Sparse Matrix Operations
Leveraging previous work on the first-generation Intel Xeon Phi coprocessors, Mr. Kyu Nam Cho (former research associate, Korea University, now principal engineer at Samsung Research, Samsung Electronics) says, “The performance of sparse matrix-vector multiplication, which is the core numerical operation needed to solve large-scale electronic structures, was not bad even when we worked with Intel first generation many-core processors (Intel Xeon Phi coprocessors) compared to Intel® Xeon® processors V3. The performance on the NURION Intel Xeon Phi nodes is much better, particularly when combined with MCDRAM.” Mr. Cho adds, “Another critical strength of Intel Xeon Phi processor-based systems is their ease of use, particularly if we consider the amount of work that must be performed to port the existing code to run on PCI-E add-in devices.”
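
To make the operation concrete, the sketch below shows sparse matrix-vector multiplication in the widely used compressed sparse row (CSR) layout. It is a plain Python illustration with hypothetical function names, not the KISTI kernel, and the source does not state which storage format their code uses. Each stored nonzero costs one multiply and one add but requires loading a value, a column index, and a vector element, which is why this kernel tends to be limited by memory bandwidth rather than arithmetic.

```python
def to_csr(dense):
    """Convert a dense list-of-lists matrix to CSR arrays."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, x in enumerate(row):
            if x != 0:
                values.append(x)
                col_idx.append(j)
        # row_ptr[r + 1] marks where row r ends in values/col_idx.
        row_ptr.append(len(values))
    return values, col_idx, row_ptr

def csr_matvec(values, col_idx, row_ptr, vec):
    """Compute y = A x, visiting only the stored nonzeros of A."""
    out = []
    for r in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[r], row_ptr[r + 1]):
            acc += values[k] * vec[col_idx[k]]
        out.append(acc)
    return out

if __name__ == "__main__":
    dense = [[2, 0, 1],
             [0, 0, 0],
             [0, 3, 4]]
    values, col_idx, row_ptr = to_csr(dense)
    print(csr_matvec(values, col_idx, row_ptr, [1, 2, 3]))  # [5.0, 0.0, 18.0]
```

The rows are independent of one another, so the outer loop is the natural place to distribute work across threads or cores.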

The KISTI Intel PCC found that the speedup delivered by the Intel Xeon Phi processor’s high-bandwidth memory (HBM) meant that a single node could take on a larger workload: they observed a 1.5-3x speedup3 when they made use of the HBM packaged with the many-core Intel Xeon Phi processor 7250 nodes. Dr. Ryu points out that “inter-node scalability is quite nice,” and scalability tests demonstrate a speedup as the number of computing nodes increases. More recently, they successfully simulated a structure of 0.4 billion atoms on the NURION system and confirmed strong scalability up to 2,500 computing nodes (170,000 computing cores).
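
Strong scalability is usually summarized as speedup and parallel efficiency relative to the smallest run at a fixed problem size. The short Python sketch below shows that arithmetic. The timings in it are made-up placeholders, not KISTI measurements; only the figure of 68 cores per node and the 2,500-node count come from the text and configuration notes.

```python
CORES_PER_NODE = 68  # Intel Xeon Phi 7250 cores per node, per the configuration notes

def total_cores(nodes, cores_per_node=CORES_PER_NODE):
    """Core count for a given number of nodes."""
    return nodes * cores_per_node

def strong_scaling(timings):
    """timings: [(nodes, seconds), ...] for one fixed problem size.

    Returns [(nodes, speedup, efficiency), ...] relative to the first entry.
    """
    base_nodes, base_time = timings[0]
    rows = []
    for nodes, seconds in timings:
        speedup = base_time / seconds
        ideal = nodes / base_nodes  # perfect scaling would give this speedup
        rows.append((nodes, speedup, speedup / ideal))
    return rows

if __name__ == "__main__":
    hypothetical = [(1, 100.0), (2, 55.0), (4, 30.0)]  # placeholder data
    for nodes, speedup, efficiency in strong_scaling(hypothetical):
        print(f"{nodes} nodes: speedup {speedup:.2f}, efficiency {efficiency:.2f}")
    print(total_cores(2500))  # 170000
```

An efficiency close to 1.0 means the added nodes are being used almost perfectly; values well below 1.0 indicate that communication or serial sections are starting to dominate.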

Dr. Ryu points out that “Intel® technology matches with the purpose of KISTI HPC.” According to a statistical workload analysis performed at KISTI, approximately 50% of their workloads involve sparse matrix operations. This means the NURION supercomputer should perform well in meeting the needs of KISTI researchers across a wide range of research areas.

Performance Realized
The importance of large-scale simulations for advanced material research to South Korea cannot be overstated, as evidenced by the money spent to procure a world-class supercomputer4. For this reason, the KISTI Intel PCC critically evaluated the various hardware solutions upon which the NURION procurement could be based, including GPU-accelerated systems. Their results have been published in the literature for Intel processors5 6 7 and GPUs8. They present solid technical evidence for why an Intel-based system was chosen for NURION, which delivers 25.7 PFlop/s (Rpeak) and 13.9 PFlop/s (Rmax),3 ranking it at #13 on the November 2018 TOP500 list.2 Dr. Ryu is preparing an article, to be published later this year, that tells the full CPU vs. GPU story9.

Strong scalability of end-to-end simulations. (a) The small-scale BMT target was to calculate the 5 lowest conduction band states in a 27x33x33 nm³ (~1.5 million atoms) Si:P quantum dot10. The scalability is tested up to 3 computing nodes (204 cores). (b) The extremely large-scale BMT target was to calculate the 3 lowest conduction subbands in 2715x54x54 nm³ Si:P nanowires (0.4 billion atoms). The scalability here is tested up to 2,560 computing nodes (170,000 cores) on the NURION system.

However, the story does not stop with the NURION system, as the KISTI Intel PCC is evaluating the use of FPGAs for large-scale electronic structure calculations. In particular, the Intel Xeon Scalable processor family provides a pathway towards future FPGA acceleration11. As with the GPU and Intel processor evaluations, the KISTI Intel PCC has been publishing its work on FPGAs as well12.

KISTI staff who enabled scalable simulations of extremely large electronic structures on the NURION system: (from left) Dr. Hoon Ryu, Dr. Ji-Hoon Kang (principal researcher, Center for Applied Scientific Computing), Mr. Taeyoung Hong (NURION operation team lead and senior researcher, Supercomputing Service Center).

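Notices and Disclaimers

The features and benefits of Intel® technologies depend on system configuration and may require enabled hardware, software, or service activation. Performance varies depending on system configuration. No computer system can be absolutely secure. Check with your system manufacturer or retailer, or learn more at http://www.intel.co.jp.

Software and workloads used in performance tests may have been optimized for performance only on Intel® microprocessors. Performance tests, such as SYSmark* and MobileMark*, are measured using specific computer systems, components, software, operations, and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information, see https://www.intel.co.jp/benchmarks (in English).

Performance results are based on testing as of the dates shown in the configuration details and may not reflect all publicly available security updates. See the published configuration information for details. No product or component can be absolutely secure.

The cost reduction scenarios described are intended as examples of how a given Intel® processor-based product, in the specified circumstances and configurations, may affect future costs and provide cost savings. Circumstances will vary. Intel does not guarantee any costs or cost reduction.

Intel does not control or audit third-party benchmark data or the websites referenced in this document. You should visit the referenced websites and confirm whether the referenced data are accurate.

Some results have been estimated or simulated using internal Intel analysis or architecture simulation or modeling, and are provided for informational purposes. Any differences in system hardware, software, or configuration may cause actual performance to differ from the published tests and ratings.

Disclaimers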

1 Intel Xeon Phi 7250 nodes; 68 cores/node using 2 MPI processes + 32 threads per node; Quad/Flat memory mode; 100G network connectivity. 2500 Intel Xeon Phi nodes, a total of 68x2500 computing cores were used for the benchmark test of KISTI’s in-house code. BIOS: S72C610.86B.01.03.0018.C0001.012420182107; Memory: 96GB DDR4-2400 memory + 16GB 7.2GT/s MCDRAM; Networking and Storage: Intel Omni-Path Architecture, 100Gb network connectivity; OS and Kernel details: CentOS Linux Release 7.3, Linux kernel 3.10.0-514.26.2.el7.x86-64; Application software: Quantum simulation tool for Advanced Nanoscale Devices; Tested by KISTI in November, 2018.
2 Currently according to the November 2018 TOP500 list
3 Test performed by KISTI in November 2018. Rmax is maximal LINPACK performance achieved; Rpeak is theoretical peak performance per TOP500.org. Configuration: Intel Xeon Phi 7250 nodes; Up to 272 (68x4) cores/node using 4 MPI processes + 68 threads per node; Quad/Flat memory mode; 10 G network connectivity.
7 Ji-Hoon Kang, Oh-Kyoung Kwon, Jinwoo Jeong, Kyunghun Lim, Hoon Ryu: Performance Evaluation of Scientific Applications on Intel Xeon Phi Knights Landing Clusters. HPCS 2018: 338-341.
8 GPU results were published in “Fast, energy-efficient electronic structure simulations for multi-million atomic systems with GPU devices” by Hoon Ryu and Oh-Kyoung Kwon in Journal of Computational Electronics (2018) 17:698–706, https://doi.org/10.1007/s10825-018-1138-4.
9 Please check Dr. Ryu’s publications list to see the article when it appears: https://www.researchgate.net/profile/Hoon_Ryu3
10 Si:P alloy structures have been popularly studied to build Si-based qubit systems. See Nature Nanotechnology 9, 430-435, and Nano Letters 15, 1, 450-456.