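# Current Status and Future of HPC Infrastructure

Yutaka Ishikawa

Graduate School of Information Science and Technology / Information Technology Center, University of Tokyo; RIKEN Advanced Institute for Computational Science (AICS)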

- Introduction of Key Organizations/Programs/Activities

  - HPCI
    - Innovative High Performance Computing Infrastructure
    - Seamless access to K computer, supercomputers, and user's machines
  - SPIRE
    - Strategic Programs for Innovative Research
- Feasibility Study for future HPC in Japan
  - Background
  - Introduction of "Feasibility study on advanced and efficient latency core-based architecture for future HPCI R&D"
- Post T2K

#### What is HPCI (Innovative High Performance Computing Infrastructure)



University of Tokyo/RIKEN AICS

## SPIRE (Strategic Programs for Innovative Research)

- Objectives
  - Scientific results as soon as the K computer starts operation
  - Establishment of several core institutes for computational science
- Overview
  - Selection of five strategic research fields that will contribute to solving scientific and social issues
    - Field 1: Life science/Drug manufacture
    - Field 2: New material/energy creation
    - Field 3: Global change prediction for disaster prevention/mitigation
    - Field 4: *Mono-zukuri* (Manufacturing technology)
    - Field 5: The origin of matter and the universe
  - A nationwide research group is formed around the core organization designated by MEXT for each research area.
  - The groups are to promote R&D using the K computer and to build research structures for their own areas.

## Five strategic groups of SPIRE

- Computational Life Science and Application in Drug Discovery and Medical Development
  - Led by Toshio Yanagida, RIKEN
- Computational Materials Science Initiative
  - Led by Shinji Tsuneyuki, University of Tokyo
- Projection of Planet Earth Variations for Mitigating Natural Disasters
  - Led by Shiro Imawaki, JAMSTEC
- Industrial Innovation
  - Led by Chisachi Kato, University of Tokyo
- The origin of matter and the universe
  - Led by Shinya Aoki, University of Tsukuba












## What happened in FY2011

http://www.open-supercomputer.org/workshop/sdhpc/


## System Requirement for Target Sciences by 2020

[Figure: requirements of the target sciences; surviving axis label: B/F]

- System performance
  - FLOPS: 800 to 2500 PFLOPS
  - Memory capacity: 10 TB to 500 PB
  - Memory bandwidth: 0.001 to 1.0 B/F
  - Example applications
    - Small capacity requirement
      - MD, Climate, Space physics, ...
    - Small BW requirement
      - Quantum chemistry, ...
    - High capacity/BW requirement
      - Incompressible fluid dynamics, ...
- Interconnection Network
  - Not enough analysis has been carried out
  - Some applications need <1 us latency and large bisection bandwidth
- Storage
  - Demand is not particularly large





#### Candidates for Post-Petascale Architectures

- Four types of architectures are considered
  - General Purpose (GP)
    - Ordinary CPU-based MPPs
    - e.g., K computer, GPU, Blue Gene, capacity x86-based PC clusters
  - Capacity-Bandwidth oriented (CB)
    - Emphasizes an expensive memory interface over computing capability
    - e.g., vector machines
  - Reduced Memory (RM)
    - With embedded (main) memory
    - e.g., SoC, MD-GRAPE4, Anton
  - Compute Oriented (CO)
    - Many processing units
    - e.g., ClearSpeed, GRAPE-DR



Source: Masaaki Kondo's presentation at IESP Kobe meeting, 2012

## Gap Between Requirement and Technology Trends

- Mapping the four architectures onto the science requirements
- Projected performance vs. science requirements
  - Big gap between projected and required performance



#### A national research project is needed for science-driven HPC systems

GP (General Purpose), CB (Capacity-Bandwidth oriented), RM (Reduced Memory), CO (Compute Oriented)

2012/8/20

Source: Masaaki Kondo's presentation at IESP Kobe meeting, 2012

## Plans



# Feasibility Study on Future HPC R&D in Japan

#### **Program promotion board**

- Members: the head of each team and other specialists
- Role: to check the progress of each team and to coordinate collaboration among the teams

#### 1 application study team

RIKEN AICS and TITECH, in collaboration with application fields

- Identification of scientific and social issues to be solved in the future
- Drawing up a science roadmap through 2020
- Selection of the applications that play key roles in the roadmap
- Review of the architectures using those applications
- Review of the architectures using those applications

#### 3 system study teams



- Review of the system using the application codes
- Estimation of the system's cost


# Towards Next-generation General Purpose Supercomputer



## Co-design

- Tightly coupled design of the architecture by architects, software developers, and application developers
- One cycle every 2 months




## Part of Target Applications in FY2012

- ALPS (Algorithms and Libraries for Physics Simulations)
  - Provides high-end simulation codes for strongly correlated quantum mechanical systems
  - Total memory: 10 to 100 PB; needs a low-latency, high-radix network
- RSDFT (Real-Space Density Functional Theory)
  - A DFT (Density Functional Theory) code with real-space discretized wave functions and densities, for molecular dynamics simulations using a Car-Parrinello type approach
  - Total memory: 1 PB, Performance: 1 EFLOPS (B/F = 0.1)
- NICAM (Nonhydrostatic ICosahedral Atmospheric Model)
  - A Global Cloud Resolving Model (GCRM)
  - Total memory: 140 TB, Memory bandwidth: 300 PB/sec, Performance: 700 PFLOPS (B/F = 0.4)
- COCO (CCSR Ocean Component Model)
  - An ocean general circulation model developed at the Center for Climate System Research (CCSR), University of Tokyo
  - Total memory: 320 TB, Memory bandwidth: 150 PB/sec, Performance: 50 PFLOPS (B/F = 3)
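The B/F figures quoted for these applications are memory bandwidth divided by compute rate. The following is a minimal sketch of that arithmetic in C, assuming decimal prefixes (1 PB/sec = 1e15 bytes/s, 1 PFLOPS = 1e15 FLOP/s). The function names are illustrative and do not come from the slides.

```c
#include <assert.h>

/* Bytes per FLOP: memory bandwidth (bytes/s) divided by compute rate (FLOP/s). */
double bytes_per_flop(double bandwidth_bytes_per_s, double flops)
{
    assert(flops > 0.0);   /* a zero compute rate has no meaningful B/F */
    return bandwidth_bytes_per_s / flops;
}

/* Memory bandwidth (bytes/s) implied by a compute rate and a B/F ratio. */
double required_bandwidth(double flops, double bf)
{
    assert(flops > 0.0 && bf >= 0.0);
    return flops * bf;
}
```

With the NICAM numbers, `bytes_per_flop(300e15, 700e15)` is about 0.43, which the slide rounds to 0.4, and the COCO numbers give 3. For RSDFT, `required_bandwidth(1e18, 0.1)` implies roughly 100 PB/sec, a value the slide does not state explicitly.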



[Figure: I/O requirements; surviving labels: "Files: 10 TB", "1 job, 1 PB"]




#### Variations of Many-core based machines





Many-core chip connected to the system bus: not existing so far



# System Software Stack

#### In case of Non-Bootable Many Core



Design Criteria

- Cache-aware system software stack
- Scalability
- Minimum overhead of communication facility
- Portability

#### In case of Bootable Many Core



- AAL (Accelerator Abstraction Layer)
  - Provides low-level accelerator interface
  - Enhances portability of the micro kernel
- IKCL (Inter-Kernel Communication Layer)
  - Provides general-purpose communication and data transfer mechanisms
- SMSL (System Service Layer)
  - Provides basic system services

## DCFA: Direct Communication Facility for Accelerator

- Limitations of a PCI-Express device
  - Cannot configure another device, such as a communication device, and thus does not know that device's address.
  - Cannot receive interrupts from other devices.
- DCFA
  - The host configures and initializes an InfiniBand HCA, and passes the HCA address to the MIC device so that the MIC can issue commands to the HCA.
  - The MIC device directly accesses the InfiniBand HCA registers.
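The division of labor described for DCFA can be pictured with a toy model in ordinary memory. This is only a sketch under loud assumptions: `hca_regs`, `mic_device`, `host_setup`, and `mic_post_command` are hypothetical names, the register layout is invented, and real DCFA operates across PCI-Express against an actual InfiniBand HCA rather than a plain struct.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an HCA register block (layout is invented). */
struct hca_regs {
    uint32_t initialized;    /* set by the host during configuration */
    uint32_t doorbell;       /* incremented each time a command is posted */
    uint32_t last_command;   /* most recent command written */
};

/* Hypothetical MIC-side state: it only holds an address given by the host. */
struct mic_device {
    volatile struct hca_regs *hca;   /* NULL until the host shares it */
};

/* Host side: configure the HCA, then tell the MIC where its registers are. */
void host_setup(struct hca_regs *hca, struct mic_device *mic)
{
    assert(hca != NULL && mic != NULL);
    hca->initialized = 1;
    hca->doorbell = 0;
    hca->last_command = 0;
    mic->hca = hca;          /* the step the MIC cannot perform by itself */
}

/* MIC side: post a command by writing the HCA registers directly.
 * Returns 0 on success, -1 if the host has not shared the address yet. */
int mic_post_command(struct mic_device *mic, uint32_t command)
{
    if (mic->hca == NULL || !mic->hca->initialized)
        return -1;
    mic->hca->last_command = command;
    mic->hca->doorbell++;
    return 0;
}
```

The point of the model is the ordering: `mic_post_command` fails until `host_setup` has run, which mirrors the limitation that a PCI-Express device cannot discover or configure the HCA on its own.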



Min Si and Yutaka Ishikawa, "Design of Direct Communication Facility for Manycore-based Accelerators," to appear at CASS2012 in conjunction with IPDPS2012.



- For large message sizes, the same performance as host-to-host data transfer



## Design Considerations of File I/O System

- File I/O functions run on the computing cores in the MIC
- File I/Os are delegated to the OS-dedicated core in the MIC
- File I/Os are delegated to the host OS

Yuki Matsuo, Taku Shimosawa, and Yutaka Ishikawa, "A File I/O System for Many-Core Based Clusters," in conjunction with ICS2012, 2012.



#### **Performance Differences**

#### Iterative

```
size = 64 * 1024;                      /* 64 KB */
for (n = 0; n < DIVISOR; n++) {
    /* fill one chunk, then write it immediately */
    for (i = 0; i < size / 4; i++) buf[i] = n;
    write(fd, buf, size);
}
```

In the simple iterative benchmark, the write system call is issued while the user data resides in the L2 cache.

#### Once

```
size = 64 * 1024;                      /* 64 KB */
j = 0;
for (n = 0; n < DIVISOR; n++) {
    /* fill every chunk first */
    for (i = 0; i < size / 4; i++) buf[j++] = n;
}
/* then issue a single large write */
write(fd, buf, size * DIVISOR);
```
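The two access patterns can also be packaged as self-contained functions, which makes them easier to time side by side. This is a sketch rather than the benchmark actually used: `write_iterative`, `write_once`, and `CHUNK_BYTES` are illustrative names, the buffers are allocated with `malloc`, and `sizeof(int)` replaces the hard-coded divisor of 4.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

#define CHUNK_BYTES (64 * 1024)   /* 64 KB per iteration, as in the slide */

/* Iterative: fill one 64 KB chunk, then write it immediately.
 * Returns the total number of bytes written, or -1 on error. */
ssize_t write_iterative(int fd, int divisor)
{
    assert(divisor > 0);
    size_t words = CHUNK_BYTES / sizeof(int);
    int *buf = malloc(CHUNK_BYTES);
    ssize_t total = 0;
    if (buf == NULL)
        return -1;
    for (int n = 0; n < divisor; n++) {
        for (size_t i = 0; i < words; i++)
            buf[i] = n;
        ssize_t w = write(fd, buf, CHUNK_BYTES);
        if (w < 0) {
            free(buf);
            return -1;
        }
        total += w;
    }
    free(buf);
    return total;
}

/* Once: fill all chunks first, then issue a single large write.
 * Returns the number of bytes written, or -1 on error. */
ssize_t write_once(int fd, int divisor)
{
    assert(divisor > 0);
    size_t words = CHUNK_BYTES / sizeof(int);
    size_t bytes = (size_t)CHUNK_BYTES * (size_t)divisor;
    int *buf = malloc(bytes);
    size_t j = 0;
    if (buf == NULL)
        return -1;
    for (int n = 0; n < divisor; n++)
        for (size_t i = 0; i < words; i++)
            buf[j++] = n;
    ssize_t w = write(fd, buf, bytes);
    free(buf);
    return w;
}
```

Both functions produce the same data in the file. They differ only in when `write` is called relative to when the buffer is filled, which is the variable the benchmark isolates. Note that `write` may legally return a short count; the sketch reports it rather than retrying, so a caller comparing timings should check the return values first.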

#### Relative Total Execution Time





- Building nationwide infrastructure and collaboration structure
  - HPCI: Innovative High Performance Computing Infrastructure
  - SPIRE: Strategic Programs for Innovative Research
- Starting Feasibility Study for future HPC in Japan
  - 1 application and 3 architecture teams have been selected
- Studying OS mechanisms for Post T2K
  - A manycore-based cluster has been considered and studied