Sunday, January 28, 2018

Room A+B+C Room F
5:00pm-6:30pm Welcome Reception

Monday, January 29, 2018

Room A+B+C Room F
9:30am-9:45am Opening
10:00am-10:50am Invited Talk
Chair: Taisuke Boku (Univ. of Tsukuba)

Towards Next Generation Chinese Supercomputer
Prof. Yutong Lu
10:50am-11:10am ~ Coffee Break ~
11:10am-12:40pm Best Paper Finalist
Chair: Osamu Tatebe (University of Tsukuba)

Maximizing Communication Overlap with Dynamic Program Analysis
Costin Iancu (LBNL), Emmanuelle Saillard (Inria), Wim Lavrijsen (LBNL), and Koushik Sen (UCB)

Improving Collective MPI-IO Using Topology-Aware Stepwise Data Aggregation with I/O Throttling
Yuichi Tsujita, Atsushi Hori, Toyohisa Kameyama, Atsuya Uno, Fumiyoshi Shoji, and Yutaka Ishikawa (RIKEN, Advanced Institute for Computational Science)

Wave Propagation Simulation of Complex Multi-Material Problems with Fast Low-Order Unstructured Finite-Element Meshing and Analysis
Kohei Fujita (The University of Tokyo, RIKEN); Keisuke Katsushima (The University of Tokyo); Tsuyoshi Ichimura (The University of Tokyo, RIKEN); Masashi Horikoshi (Intel K.K.); Kengo Nakajima (The University of Tokyo); and Muneo Hori and Lalith Maddegedara (The University of Tokyo, RIKEN)
12:40pm-2:00pm ~ Lunch Break ~
2:00pm-3:30pm Matrix Computations
Chair: Takahiro Katagiri (Nagoya University)

A Distributed and Parallel Asynchronous Unite and Conquer Method to Solve Large Scale Non-Hermitian Linear Systems
Xinzhe WU and Serge G. Petiton (Maison de la Simulation, University Lille 1)

Iterative Solution of Sparse Linear Least Squares using LU Factorization
Gary Howell (North Carolina State University) and Marc Baboulin (Université Paris-Sud)

A Left-Looking Selected Inversion Algorithm and Task Parallelism on Shared Memory Systems
Mathias Jacquelin (Lawrence Berkeley National Laboratory), Lin Lin (University of California Berkeley), Weile Jia (Lawrence Berkeley National Laboratory), Yonghua Zhao (Supercomputing Center of Chinese Academy of Sciences), and Chao Yang (Lawrence Berkeley National Laboratory)
Cluster Computing
Chair: Hiroyuki Takizawa (Tohoku University)

Autotuning MPI Collectives using Performance Guidelines (Rescheduled to Cluster Computing 2, Jan. 31)
Sascha Hunold and Alexandra Carpen-Amarie (TU Wien)

Multi-tasking Execution in PGAS Language XcalableMP and Communication Optimization on Many-core Clusters
Keisuke Tsugane (University of Tsukuba) and Jinpil Lee, Hitoshi Murai, and Mitsuhisa Sato (RIKEN)

A Source-to-Source Implementation of Coarray Fortran with MPI for High Performance
Hidetoshi Iwashita, Masahiro Nakao, Hitoshi Murai, and Mitsuhisa Sato (RIKEN, AICS)
3:30pm-4:30pm Poster Session
Authorized posters can be downloaded from here.

(P01) Performance Measurement of Eulerian Kinetic Code on the Xeon Phi KNL
Takayuki Umeda (Nagoya University, Institute for Space-Earth Environmental Research), Keiichiro Fukazawa (Kyoto University, Academic Center for Computing and Media Studies)

(P02) MPI Communication Optimization of Massively Parallel Applications
Fatimah AlRuwai (Saudi Aramco), Majdi Baddourah (Saudi Aramco)

(P03) Visualizing the Connectome Assembled by a Large Number of Single-Neuron Images
Chi-Tin Shih (Department of Applied Physics, Tunghai University), Nan-Yow Chen (National Center for High-Performance Computing)

(P04) Accelerating Convolutional Neural Networks Using Low Precision Arithmetic
Hiroki Naganuma (Tokyo Institute of Technology), Rio Yokota (Tokyo Institute of Technology)

(P05) Optimization of search for neighbour-particle in MPS method for Xeon, Xeon Phi and GPU by using directives
Takaaki Miyajima (Japan Aerospace Exploration Agency), Kenichi Kubota (Japan Aerospace Exploration Agency), Naoyuki Fujita (Japan Aerospace Exploration Agency)

(P06) A Structure of FEM Matrix by Lagrange Basis Polynomials
Hiroshi Murakami (Tokyo Metropolitan University)

(P07) ooc cuDNN: A Deep Learning Library Supporting CNNs over GPU Memory capacity
Yuki Ito (Tokyo Institute of Technology), Ryo Matsumiya (Tokyo Institute of Technology), Toshio Endo (Tokyo Institute of Technology)

(P08) Large scale ab initio calculation using LDC-DFT algorithm on many-core processor architectures
Kohei Shimamura (Kobe University)

(P09) Performance Improvement of Calculation of Static Magnetic Field of Micromagnetic Simulator Using Supercomputer FX10
Masahiro Arai (Kogakuin University), Fumiko Akagi (Kogakuin University), Saneyasu Yamaguchi (Kogakuin University), Kazuetsu Yoshida (Kogakuin University)

(P10) Acceleration of the tree method with SIMD instruction set
Tetsushi Kodama (Chiba University), Tomoaki Ishiyama (Chiba University)

(P11) Auto-tuning of Hyperparameters of Machine Learning Models
Zhen Wang (Tohoku University), Ryusuke Egawa (Tohoku University), Reiji Suda (The University of Tokyo), Hiroyuki Takizawa (Tohoku University)

(P12) Thermal-aware Dynamic Checkpoint Interval Tuning for High Performance Computing
Pei Li (Tohoku University), Mulya Agung (Tohoku University), Muhammad Alfian Amrizal (Tohoku University), Ryusuke Egawa (Tohoku University), Hiroyuki Takizawa (Tohoku University)

(P13) Optimizing Hardware-Based Privacy-Preserving MapReduce
Han-Yee Kim (Korea University), Rohyoung Myung (Korea University), Sangwoo Park (Korea University), Jungha Lee (KISTI), Sukyong Choi (Korea University), Heonchang Yu (Korea University), Taeweon Suh (Korea University)

(P14) Extension of Simulation Caching Framework for Large-scale Simulation
Yoshitaka Kumada (University of Fukui), Jiachao Zhang (University of Fukui), Shinji Fukuma (University of Fukui), Shin-ichiro Mori (University of Fukui)

(P15) Event-Based Triggering and Management of Scientific Workflow Ensembles
Suraj Pandey (University of Hawaii), Karan Vahi (USC Information Sciences Institute), Rafael Silva (USC Information Sciences Institute), Ewa Deelman (USC Information Sciences Institute), Ming Jiang (Lawrence Livermore National Lab), Cyrus Harrison (Lawrence Livermore National Lab), Al Chu (Lawrence Livermore National Lab), Henri Casanova (University of Hawaii)

(P16) Performance Classification of the K-computer Workloads using Hierarchical Clustering and K-means
Masaaki Terai (RIKEN Advanced Institute for Computational Science), Riku Kashiwaki (Graduate School of Simulation Studies, University of Hyogo), Fumiyoshi Shoji (RIKEN Advanced Institute for Computational Science)

(P17) Estimation of functions representing observation data using convolution neural network
Issei Koga (The Graduate School and Faculty of Information Science and Electrical Engineering, Kyushu University), Kenji Ono (Kyushu University)

(P18) CCA/EBT: Code Comprehension Assistance Tool for Evidence-Based Performance Tuning
Masatomo Hashimoto (Software Technology and Artificial Intelligence Research Laboratory, Chiba Institute of Technology; RIKEN Advanced Institute for Computational Science), Masaaki Terai (RIKEN Advanced Institute for Computational Science), Toshiyuki Maeda (Software Technology and Artificial Intelligence Research Laboratory, Chiba Institute of Technology), Kazuo Minami (RIKEN Advanced Institute for Computational Science)

(P19) Automatic Translation of OpenACC Code for Multi-GPU Support
Kazuaki Matsumura (Tokyo Institute of Technology), Mitsuhisa Sato (RIKEN Advanced Institute for Computational Science; Graduate School of Systems and Information Engineering, University of Tsukuba), Taisuke Boku (Graduate School of Systems and Information Engineering, University of Tsukuba; Center for Computational Sciences, University of Tsukuba)

(P20) An extension of the fault tolerant multi-SPMD programming environment for large scale systems and MPI-IO
Miwako Tsuji (RIKEN AICS), Mitsuhisa Sato (RIKEN AICS)

(P21) Performance Evaluation of NICAM-DC-MINI using XcalableACC on Accelerated Cluster
Masahiro Nakao (RIKEN, AICS), Hitoshi Murai (RIKEN, AICS), Taisuke Boku (University of Tsukuba), Mitsuhisa Sato (RIKEN, AICS), Akihiro Tabuchi (University of Tsukuba)

(P22) vGASNet: A PGAS Communication Library Supporting Out-of-Core Processing
Ryo Matsumiya (Tokyo Institute of Technology), Toshio Endo (Tokyo Institute of Technology)

(P23) Evaluating Autotuning Heuristics for Loop Tiling
Tomoya Yuki (Tokyo Institute of Technology), Yukinori Sato (Tokyo Institute of Technology), Toshio Endo (Tokyo Institute of Technology)

(P24) The PomPP Framework: From Simple DSL to Sophisticated Power Management for HPC Systems
Yasutaka Wada (Meisei University), Yuan He (The University of Tokyo), Thang Cao (The University of Tokyo), Masaaki Kondo (The University of Tokyo)

(P25) Data Model Optimization for Reducing Computational Cost at Apache Spark
Rohyoung Myung (Korea University), Han-Yee Kim (Korea University), Sukyong Choi (Korea University), Taeweon Suh (Korea University), Heonchang Yu (Korea University)

(P26) Run-Time DFS/DCT Optimization for Power-Constrained HPC Systems
Ikuo Miyoshi (Fujitsu Limited), Shinobu Miwa (The University of Electro-Communications), Koji Inoue (Kyushu University), Masaaki Kondo (The University of Tokyo)

(P27) An Extended GLB Library for Optimization Problems
Shota Izumi (University of Fukui), Daisuke Ishii (University of Fukui), Kazuki Yoshizoe (RIKEN)

(P28) Techniques for Using Gaming GPUs in Deep Learning
Gangwon Jo (ManyCoreSoft), Jungho Park (ManyCoreSoft), Jaejin Lee (Seoul National University)

4:30pm-6:00pm Optimization Techniques for Applications
Chair: Rio Yokota (Tokyo Institute of Technology)

Optimizing Forward Computation in Adjoint Method via Multi-level Blocking
Tomoya Ikeda (Graduate School of Information Science, Nagoya University); Shin-ichi Ito (Earthquake Research Institute, The University of Tokyo); Hiromichi Nagao (Earthquake Research Institute, Graduate School of Information Science and Technology, The University of Tokyo); and Takahiro Katagiri, Toru Nagai, and Masao Ogino (Information Technology Center, Nagoya University)

A Dynamic Parallel Strategy for DOACROSS Loops
Yuanzhen Cui, Song Liu, Nianjun Zou, and Weiguo Wu (Xi'an Jiaotong University)

Time-space tiling with tile-level parallelism for the 3D FDTD method
Takeshi Fukaya and Takeshi Iwashita (Hokkaido University)
Resource Management
Chair: Jangwoo Kim (Seoul National University)

GPUhd: Augmenting YARN with GPU Resource Management
Daisuke Fukutomi and Yuki Iida (Ritsumeikan University), Takuya Azumi (Osaka University), Shinpei Kato (The University of Tokyo), and Nobuhiko Nishio (Ritsumeikan University)

Towards a Composable Computer System
I-hsin Chung (IBM T. J. Watson Research Center) and Bulent Abali and Paul Crumley (IBM Research)

Tuesday, January 30, 2018

Room A+B+C Room F
9:00am-9:50am Invited Talk
Chair: Mitsuo Yokokawa (Kobe Univ.)

Issue and Solutions for Extreme Scale Computing
Prof. Jack Dongarra
9:50am-10:10am ~ Coffee Break ~
10:10am-12:10pm HPC Applications
Chair: Reiji Suda (The University of Tokyo)

Massively Parallel Method of Characteristics Neutron Transport Calculation with Anisotropic Scattering Treatment on GPUs
Namjae Choi, Junsu Kang, and Han-gyu Joo (Seoul National University)

Acceleration of Dynamic n-Tuple Computations in Many-Body Molecular Dynamics
Patrick Small, Kuang Liu, Subodh Tiwari, Rajiv Kalia, Aiichiro Nakano, Ken-ichi Nomura, and Priya Vashishta (University of Southern California)

Performance improvement of the general-purpose CFD code FrontFlow/blue on the K computer
Kiyoshi Kumahata and Kazuo Minami (Advanced Institute for Computational Science, RIKEN); Yoshinobu Yamade (Mizuho Information and Research Institute; Institute of Industrial Science, The University of Tokyo); and Chisachi Kato (Institute of Industrial Science, The University of Tokyo)

Performance Evaluation of Large Scale Electron Dynamics Simulation under Many-core Cluster based on Knights Landing
Yuta Hirokawa and Taisuke Boku (University of Tsukuba), Shunsuke Sato (Max-Planck Institute), and Kazuhiro Yabana (University of Tsukuba)
Chair: Takatsugu Ono (Kyushu University)

OpenCL-ready High Speed FPGA Network for Reconfigurable High Performance Computing
Ryohei Kobayashi, Yuma Oobata, Norihisa Fujita, Yoshiki Yamaguchi, and Taisuke Boku (University of Tsukuba)

FlexProtect: A SDN-based DDoS Attack Protection Architecture for Multi-tenant Data Centers
Ming-Hung Chen (National Taiwan University, IBM Research); Jyun-Yan Ciou (National Taiwan University); I-Hsin Chung (IBM Research); and Cheng-Fu Chou (National Taiwan University)

LRUM: Local Reliability Protocol for Unreliable Hardware Multicast
Richard Graham (Mellanox Technologies), Hoang-Vu Dang (University of Illinois at Urbana-Champaign), and Brian Smith and Gilad Shainer (Mellanox Technologies)

A Portable Load Balancer for Kubernetes Cluster
Kimitoshi Takahashi (The Graduate University for Advanced Studies, Cluster Computing Inc.); Kento Aida (National Institute of Informatics, The Graduate University for Advanced Studies); and Tomoya Tanjo and Jingtao Sun (National Institute of Informatics)
12:10pm-1:30pm ~ Lunch Break ~
1:30pm-2:30pm HPC Frameworks and Libraries
Chair: Daisuke Takahashi (University of Tsukuba)

Parallel Hierarchical Matrices with Block Low-rank Representation on Distributed Memory Computer Systems
Akihiro Ida (The University of Tokyo), Hiroshi Nakashima (Kyoto University), and Masatoshi Kawai (The University of Tokyo)

A Portability Layer of an All-pairs Operation for Hierarchical N-Body Algorithm Framework Tapas
Motohiko Matsuda (RIKEN AICS), Keisuke Fukuda (Tokyo Institute of Technology), and Naoya Maruyama (RIKEN AICS)
Vendor Session (short introduction)

1. DataDirect Networks Japan, Inc.
2. Intel Corporation
3. Hewlett-Packard Japan, Ltd.
4. NEC Corporation
5. IBM Japan, Ltd.
6. Penguin Computing, Inc.
7. Arm Limited / Cavium, Inc.
8. Dell EMC
9. AMD Japan Ltd.
10. Fujitsu
2:30pm-3:30pm Poster Session

See the poster list here.
3:30pm-5:30pm BigData
Chair: Shinji Sumimoto (Fujitsu Laboratories)

A Scalable Multi-Granular Data Model for Data Parallel Workflows
Shinichiro Takizawa, Motohiko Matsuda, Naoya Maruyama, and Yoshifumi Nakamura (RIKEN)

TripleID-C: Low Cost Compressed Representation for RDF Query Processing in GPUs
Chantana Chantrapornchai and Pisit Makpaisit (Kasetsart University)

Performing External Join Operator on PostgreSQL with Data Transfer Approach
Ryota Takizawa, Hideyuki Kawashima, Ryuya Mitsuhashi, and Osamu Tatebe (University of Tsukuba)

A Study on Open Source Software for Large-Scale Data Visualization on SPARC64fx based HPC Systems
Jorji Nonaka and Motohiko Matsuda (RIKEN AICS); Takashi Shimizu and Naohisa Sakamoto (Kobe University); Masahiro Fujita (LTE Inc.); Keiji Onishi (RIKEN AICS); Eduardo Camilo Inacio (UFSC); Shun Ito and Fumiyoshi Shoji (RIKEN AICS); and Kenji Ono (Kyushu University, RIKEN AICS)
5:30pm-5:40pm Award Session
6:00pm-8:00pm Banquet

Wednesday, January 31, 2018

Room A+B+C Room F
9:00am-9:50am Invited Talk
Chair: Osamu Tatebe (Univ. of Tsukuba)

An Overview of Post-K Development
Dr. Yutaka Ishikawa
9:50am-10:10am ~ Coffee Break ~
10:10am-11:10am System Software for HPC
Chair: Hideyuki Kawashima (University of Tsukuba)

Parallelized Software Offloading of Low-Level Communication with User-Level Threads
Wataru Endo and Kenjiro Taura (The University of Tokyo, Graduate School of Information Science and Technology)

Efficient dentry lookup with backward finding mechanism
Nae Young Song and Hwajung Kim (Seoul National University), Hyuck Han (Dongduk Women's University), and Heon Young Yeom (Seoul National University)
Cluster Computing 2
Chair: Takeshi Nanri (Kyushu Univ.)

Autotuning MPI Collectives using Performance Guidelines
Sascha Hunold and Alexandra Carpen-Amarie (TU Wien)

11:10am-12:00pm Closing
1:00pm-4:30pm IXPUG Workshop Asia 2018

IXPUG papers can be downloaded from here. (Password is required)
Workshop on PGAS programming models: Experiences and Implementations

PGAS papers can be downloaded from here. (Password is required)

Invited Talk

Jan. 29: Towards Next Generation Chinese Supercomputer

Speaker: Dr. Yutong Lu (GZSC)


In the post-petascale and exascale era, innovative integrated technologies are needed, from new architectures to the associated software stacks. We must deal with performance, scalability, power consumption, and reliability issues. Hardware and software stack co-design needs to explore the capabilities of the CPU, accelerator, interconnect, and I/O storage system, up to the whole system. This talk will also cover the efforts and experiences from China, especially at NUDT, in designing and implementing the Tianhe series of systems and applications. Domain-specific application platforms are being built to help a broader range of users move efficiently to next-generation large-scale computing.



Yutong Lu holds professorships at both Sun Yat-sen University (SYSU) and the National University of Defense Technology (NUDT), and is Director of the National Supercomputing Center in Guangzhou. She is a member of the Chinese national key R&D project HPC special expert committee. She received her B.S., M.S., and PhD degrees from NUDT. Her extensive research and development experience has spanned several generations of domestic supercomputers in China, and she is deputy chief designer of the Tianhe-2 system. Her continuing research interests include parallel operating systems (OS), high-speed communications, global file systems, and advanced programming environments converging HPC and big data.

Jan. 30: Issue and Solutions for Extreme Scale Computing

Speaker: Prof. Jack Dongarra (University of Tennessee)


In this talk we will look at the current state of high performance computing and look to the future toward exascale. In addition, we will examine some issues that can help reduce power consumption, as well as the use of short precision for linear algebra computations.



Jack Dongarra holds appointments at the University of Tennessee, Oak Ridge National Laboratory, and the University of Manchester. He specializes in numerical algorithms in linear algebra, parallel computing, use of advanced computer architectures, programming methodology, and tools for parallel computers. He was awarded the IEEE Sid Fernbach Award in 2004; in 2008 he was the recipient of the first IEEE Medal of Excellence in Scalable Computing; in 2010 he was the first recipient of the SIAM Special Interest Group on Supercomputing's award for Career Achievement; in 2011 he was the recipient of the IEEE Charles Babbage Award; and in 2013 he received the ACM/IEEE Ken Kennedy Award. He is a Fellow of the AAAS, ACM, IEEE, and SIAM, a foreign member of the Russian Academy of Sciences, and a member of the US National Academy of Engineering.

Jan. 31: An Overview of Post-K Development

Speaker: Dr. Yutaka Ishikawa (RIKEN AICS)


The next flagship supercomputer in Japan, the replacement for the K supercomputer and hence called the post-K computer, is being designed to begin operation in the early 2020s. Its node architecture and interconnect are an ARM HPC extension and a 6-D mesh/torus network, respectively. A three-level hierarchical storage system will be installed with the compute nodes. The system software developed for the post-K supercomputer includes a novel operating system for general-purpose manycore architectures, low-level communication and MPI libraries, and file I/O middleware. After an overview of the post-K architecture, the current status of the system software development and the international collaboration activities will be presented.



 Yutaka Ishikawa
 Project Leader of Exascale Computing and Leader of System Software Development, RIKEN AICS
 Visiting Researcher, University of Tokyo
Yutaka Ishikawa is in charge of developing the Post-K computer. Ishikawa received his BS, MS, and PhD degrees in electrical engineering from Keio University. From 1987 to 2001, he was a member of AIST (formerly the Electrotechnical Laboratory), METI. From 1993 to 2001, he was the chief of the Parallel and Distributed System Software Laboratory at the Real World Computing Partnership. He led development of the cluster system software SCore, which was used in several large PC cluster systems around 2004. From 2002 to 2014, he was a professor at the University of Tokyo. He led the project to design a commodity-based supercomputer called the T2K open supercomputer. As a result, three universities, Tsukuba, Tokyo, and Kyoto, each obtained a supercomputer based on the specification.