44th SPEEDUP Workshop on High-Performance Computing

University of Lugano (USI)
September 10/11, 2015

Sponsored by:
Speedup, CSCS, USI

The SPEEDUP workshop series has a long history of presenting and discussing the state of the art in high-performance and parallel scientific computing, including algorithms, applications, and software aspects related to high-performance parallel computing. The focus of the 44th SPEEDUP workshop is on Fluid-Structure Interaction. The scientific program of September 10 consists of six 45-minute talks and a poster session. Please encourage your collaborators to upload an abstract for the poster session. The deadline is Sept 4, 2015.


Directions to reach CSCS and the University of Lugano can be found here


Thursday September 10th

The conference will take place at CSCS


10:00 - 10:45
Registration and Coffee
10:45 - 10:55
10:55 - 11:40
Felix Wolf (TU Darmstadt): Exascaling Your Library: Will Your Implementation Meet Your Expectations?
Many libraries in the HPC field encapsulate sophisticated algorithms with clear theoretical scalability expectations. However, hardware constraints or programming bugs may render these expectations inaccurate or even plainly wrong. While algorithm engineers have long advocated the systematic combination of analytical performance models with practical measurements, we go one step further and show how this comparison can become part of automated testing procedures. The most important applications of our method include initial validation, regression testing, and benchmarking to compare implementation and platform alternatives. Advancing the concept of performance assertions, we verify asymptotic scaling trends rather than precise analytical expressions, relieving the developer of the burden of specifying and maintaining very fine-grained and potentially non-portable expectations. In this way, scalability validation can be applied continuously throughout the whole development cycle with very little effort. Using MPI as an example, we show how our method can help uncover non-obvious limitations of both libraries and underlying platforms.
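The idea of asserting an asymptotic trend rather than an exact formula can be sketched as follows. This is a minimal illustration, not the speaker's tool: it assumes a small, hypothetical set of candidate growth models f(p), fits t(p) = c0 + c1·f(p) to runtime measurements by least squares, picks the best-fitting model, and fails if that model grows faster than the expected one.

```python
import math

# Candidate growth models f(p), ordered from slowest to fastest growth.
# This small set is a hypothetical choice for illustration only.
MODELS = [
    ("1", lambda p: 1.0),
    ("log p", lambda p: math.log2(p)),
    ("p", lambda p: float(p)),
    ("p log p", lambda p: p * math.log2(p)),
    ("p^2", lambda p: float(p) ** 2),
]
RANK = {name: k for k, (name, _) in enumerate(MODELS)}

def fit_rss(xs, ys):
    """Least-squares fit of y = c0 + c1 * x; returns the residual sum of squares."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    # A constant model (all x equal) only fits the intercept.
    c1 = 0.0 if sxx == 0.0 else sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    c0 = my - c1 * mx
    return sum((y - (c0 + c1 * x)) ** 2 for x, y in zip(xs, ys))

def best_model(procs, times):
    """Name of the candidate model that best explains the measured times."""
    return min(MODELS, key=lambda m: fit_rss([m[1](p) for p in procs], times))[0]

def assert_scaling(procs, times, expected):
    """Performance assertion: fail if the measured trend grows faster than `expected`."""
    found = best_model(procs, times)
    if RANK[found] > RANK[expected]:
        raise AssertionError(f"expected at most O({expected}), measured O({found})")
    return found
```

For example, with synthetic timings t(p) = 2 + 0.5·log2(p) on p = 2, 4, ..., 1024, `best_model` selects "log p", so `assert_scaling(procs, times, "p")` passes, whereas linearly growing timings would trip an O(log p) expectation. Real measurements are noisy, so a practical version would need repeated runs and a more careful model-selection criterion.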
11:40 - 12:25
Simone Deparis (EPF Lausanne): Fluid-structure interaction for vascular flows: from supercomputers to laptops

Can we simulate haemodynamics in a vascular district on a laptop in real time?
Even using modern algorithms and computers, a single heartbeat still takes several hours on an HPC platform. Indeed, blood flow in arteries needs to take into account the incompressibility of the fluid, the compliant vessel, and patient-specific data, at least as far as the geometry and some integrated quantities such as flow rates or pressure are concerned. After discretizing the coupled Fluid-Structure Interaction (FSI) problem by finite differences in time and finite elements in space, the computational time needed to simulate a single heartbeat is about 3 hours on 1000 processors.
We propose a model order reduction and a numerical reduction. The former assumes a fixed fluid computational domain and a thin membrane structure, which is integrated into the fluid equations as a generalised Robin boundary condition. The latter takes advantage of the reduced model and tackles it by means of Proper Orthogonal Decomposition (POD) and the Reduced Basis Method (RBM).
The combination of POD and RBM allows the computational effort to be split into an offline and an online part. The offline part runs on an HPC system and takes about 5 hours on 1000 processors, while the online part can be run in real time on a notebook, i.e. 1 second of simulated time takes less than 1 second of CPU time. The real gain of this approach is that, after the offline computations, the parameters of the patient-specific simulation, such as flow rate, heart rate, and arterial stiffness, can be changed online.
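The offline/online split can be illustrated on a toy problem far simpler than FSI. The sketch below assumes a hypothetical parametrized linear system A(μ)u = f with affine dependence A(μ) = A0 + μ·A1 (a 1D diffusion matrix plus μ times the identity). The basis is built by Gram-Schmidt orthonormalization of a few snapshots, a simplification of POD (which would instead keep the leading singular vectors of the snapshot matrix); the reduced operators are precomputed offline, so the online stage only solves an r×r Galerkin system.

```python
# Hypothetical full-order model: A(mu) u = f, A(mu) = A0 + mu * A1,
# with A0 = tridiag(-1, 2, -1) (1D diffusion) and A1 = I (reaction).
N = 8
F = [1.0] * N

def a0():
    return [[2.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0) for j in range(N)] for i in range(N)]

def a1():
    return [[1.0 if i == j else 0.0 for j in range(N)] for i in range(N)]

def full_matrix(mu):
    A0, A1 = a0(), a1()
    return [[A0[i][j] + mu * A1[i][j] for j in range(N)] for i in range(N)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matvec(A, v):
    return [dot(row, v) for row in A]

def solve(A, b):
    """Dense Gaussian elimination with partial pivoting (small systems only)."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for k in range(n):
        piv = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[piv] = M[piv], M[k]
        for r in range(k + 1, n):
            m = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= m * M[k][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def offline(train_mus):
    """Offline: full solves (snapshots), orthonormal basis V, reduced operators V^T A V."""
    V = []
    for mu in train_mus:
        v = solve(full_matrix(mu), F)
        for q in V:  # modified Gram-Schmidt against the basis built so far
            c = dot(q, v)
            v = [vi - c * qi for vi, qi in zip(v, q)]
        nrm = dot(v, v) ** 0.5
        V.append([vi / nrm for vi in v])
    reduce_op = lambda A: [[dot(qi, matvec(A, qj)) for qj in V] for qi in V]
    return V, reduce_op(a0()), reduce_op(a1()), [dot(q, F) for q in V]

def online(mu, V, A0r, A1r, fr):
    """Online: assemble and solve an r x r system (independent of N), then reconstruct V c."""
    r = len(V)
    Ar = [[A0r[i][j] + mu * A1r[i][j] for j in range(r)] for i in range(r)]
    c = solve(Ar, fr)
    return [sum(c[k] * V[k][i] for k in range(r)) for i in range(N)]
```

At a training parameter the snapshot lies in the span of V, so the Galerkin reduced solution reproduces the full solution up to rounding; at other parameters it is an approximation whose quality depends on the training set. The affine dependence on μ is what lets all N-dimensional work move offline.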

12:25 - 13:45
Lunch break (at CSCS)
During the lunch break at 12:50: General Assembly of the Speedup Society
13:45 - 14:30
Stefan Pirker (University of Linz): Improving the performance of numerical simulation of granular systems - Model development, discretization and hardware aware implementation

Numerical simulation of granular systems (i.e. the interaction of a multitude of particles with each other and with an interstitial fluid) has attracted researchers for decades due to its paramount importance in process industries, life sciences and environmental sciences. In general, numerical models can be organized into two modelling classes: continuum two-fluid models and discrete particle models. While in the first case the predictive capability of these models is limited by modelling uncertainties, in the second case numerical simulations are mainly limited by the sheer number of particles involved in real-scale particle-based processes. This requires huge computational resources and dedicated modelling efforts for distributing the compute load on given hardware.
In this talk I will start by providing an overview of existing modelling approaches for the simulation of granular systems. This will naturally lead to a discussion of their individual limitations and challenges with respect to performance on distributed hardware systems. Classical hardware-aware optimization of existing models (i.e. mainly optimisation of code parallelisation) has only limited potential for speeding up simulations. We will then conclude that the development levels of model creation, numerical discretization and implementation on specific hardware cannot be regarded as independent tasks. Rather, they strongly interact, with the level of model creation having the greatest impact on simulation performance. Therefore, physicists creating new modelling approaches have to communicate with computer scientists at a very early project stage.
In the second part of this presentation I will discuss three new modelling concepts which might lead to a significant gain in performance on distributed hardware systems. First, a lattice-Boltzmann magnification lens is presented, which combines classical Computational Fluid Dynamics (CFD) with a local high-resolution lattice-Boltzmann co-simulation. Second, a multi-level coarse-graining concept is introduced for particle-based simulations, which might accelerate Discrete Element Method (DEM) simulations by an order of magnitude. Finally, we sketch a new concept of randomized simulations, which use classical, expensive CFD (or DEM) simulations of granular systems to train a random process. In a second step, this random process can be used, e.g., for the simulation of species propagation. This new concept of randomized simulations can boost the performance of granular flow simulations by at least two orders of magnitude.
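The general idea of training a random process on an expensive simulation and then reusing it cheaply can be illustrated with a Markov chain. This is a strongly simplified, generic sketch and not the method presented in the talk: transition probabilities between discrete flow states are estimated from a recorded state sequence (standing in for an expensive CFD/DEM run) and then sampled to advect a passive species. The state labels, the state-to-displacement mapping, and the periodic 1D domain are all hypothetical.

```python
import random
from collections import Counter, defaultdict

# Hypothetical coarse flow states and the cell displacement each one induces.
SHIFT = {"left": -1, "still": 0, "right": 1}

def train(recorded):
    """Estimate transition probabilities P[a][b] from consecutive states in a recorded run."""
    counts = defaultdict(Counter)
    for a, b in zip(recorded, recorded[1:]):
        counts[a][b] += 1
    return {a: {b: n / sum(c.values()) for b, n in c.items()} for a, c in counts.items()}

def sample(P, start, steps, rng):
    """Generate a cheap synthetic state sequence from the trained random process."""
    seq = [start]
    for _ in range(steps):
        nxt = P.get(seq[-1])
        if not nxt:  # state never left in the recording: stay put
            seq.append(seq[-1])
            continue
        seq.append(rng.choices(list(nxt), weights=list(nxt.values()))[0])
    return seq

def propagate(conc, seq):
    """Advect a periodic 1D concentration field by the displacement of each sampled state."""
    for s in seq:
        k = SHIFT[s]
        if k:
            conc = conc[-k:] + conc[:-k]  # periodic shift by k cells
    return conc
```

Once `train` has run, generating long sequences with `sample` costs almost nothing compared with the recording simulation, which is the source of the speed-up such approaches aim for; how faithfully the statistics are reproduced depends entirely on the state definition and the length of the recording.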

14:30 - 14:45
Coffee break
14:45 - 15:30
Stephen Turnock (University of Southampton): Dynamic behaviour of passive adaptive composite foils
The ability to replace complex mechanisms by internal structures that passively adapt their external shape to the fluid loading they experience is attractive. For instance, in the maritime environment, tidal turbines, propellers, hydrofoils and control surfaces can all offer performance gains by tuning their internal structure using the bend-twist coupling made possible by the anisotropic properties of multi-layered composite structures. Computational methods for coupling the fluid loading to the resulting structural response are well established, but to date only limited data are available to confirm their predictions of deformation and twist for a given fluid regime. A programme of work within the fluid structure interactions group at the University of Southampton is developing a validated approach to designing such structures, based on detailed dynamic wind tunnel measurements using digital image correlation (DIC) and particle image velocimetry (PIV). A series of alternative internal beam structures with different levels of bend-twist coupling, for a rectangular planform foil as well as a NACRA foil, have been tested for a range of wind speeds and angles of attack up to and beyond stall. Alongside this, a commercial CFD code and an FEA code are coupled to predict the performance. The talk will also consider the long-term ability of such computational systems to provide cost-effective design methods when many operating conditions need to be considered in order to select the best design.
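Coupling a CFD code to an FEA code in a partitioned way usually means iterating between the two solvers until load and deformation are consistent. The sketch below shows that loop with constant under-relaxation on a toy problem: the "fluid solver" is thin-airfoil lift at the effective angle of attack, and the "structural solver" turns load into twist through a hypothetical bend-twist coupling compliance `KAPPA`. All constants are assumptions for illustration; real coupled codes exchange full pressure and displacement fields.

```python
import math

# Toy stand-ins for the CFD and FEA codes (hypothetical linear models).
ALPHA0 = 0.1               # geometric angle of attack [rad]
Q = 1.0                    # dynamic pressure (normalized)
CL_ALPHA = 2.0 * math.pi   # thin-airfoil lift-curve slope [1/rad]
KAPPA = -0.08              # bend-twist coupling compliance; < 0 means nose-down twist under load

def fluid_load(twist):
    """'Fluid solver': normalized lift at the effective angle ALPHA0 + twist."""
    return Q * CL_ALPHA * (ALPHA0 + twist)

def structure_twist(load):
    """'Structural solver': elastic twist produced by the load through bend-twist coupling."""
    return KAPPA * load

def couple(omega=0.8, tol=1e-12, max_iter=200):
    """Partitioned fluid-structure coupling: fixed-point iteration with under-relaxation."""
    twist = 0.0
    for it in range(1, max_iter + 1):
        new = structure_twist(fluid_load(twist))
        if abs(new - twist) < tol:
            return new, it
        twist = (1.0 - omega) * twist + omega * new
    raise RuntimeError("coupling iterations did not converge")
```

For this linear toy the converged twist is g·α0/(1 − g) with g = κ·q·c_lα; with κ < 0 the foil twists nose-down and sheds load passively. The relaxation factor ω matters: the iteration contracts only if |1 − ω + ω·g| < 1, which is the scalar analogue of the stability issues seen in real partitioned FSI.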
15:30 - 16:15
Dominik Obrist (University of Bern): HPC framework for aortic valve simulation with hybrid discretization for fluid and soft tissue

The numerical simulation of aortic valves is a multi-physics problem involving large deformations of soft tissue and transient vortical flow fields. Whereas soft tissue is most appropriately discretized on unstructured meshes in Lagrangian formulation, the three-dimensional flow field is discretized on a structured Cartesian grid to obtain an efficient implementation on modern HPC platforms. The tissue dynamics on the unstructured mesh and the flow on the structured grid are coupled with the immersed boundary method. The parallelization of such a hybrid discretization approach raises interesting questions with respect to data locality and load balancing under a domain decomposition paradigm.
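In the immersed boundary method, the coupling between the Lagrangian tissue mesh and the Cartesian fluid grid goes through a regularized delta function: fluid velocity is interpolated to the Lagrangian points, and forces are spread back to the grid. The 1D sketch below uses Peskin's 4-point kernel on a periodic grid with nodes x_i = i·h; the actual framework works in 3D (typically with a tensor product of such 1D kernels) and may use a different kernel, so treat this only as an illustration of the technique.

```python
import math

def phi4(r):
    """Peskin's 4-point regularized delta kernel, argument in grid units."""
    r = abs(r)
    if r < 1.0:
        return (3.0 - 2.0 * r + math.sqrt(1.0 + 4.0 * r - 4.0 * r * r)) / 8.0
    if r < 2.0:
        return (5.0 - 2.0 * r - math.sqrt(-7.0 + 12.0 * r - 4.0 * r * r)) / 8.0
    return 0.0

def interpolate(u, X, h):
    """Grid -> Lagrangian point: U(X) = sum_i u_i * phi4((x_i - X) / h)."""
    n = len(u)
    base = math.floor(X / h)
    # Only the 4 nodes base-1 .. base+2 lie inside the kernel support.
    return sum(u[i % n] * phi4(i - X / h) for i in range(base - 1, base + 3))

def spread(F, X, h, n):
    """Lagrangian point -> grid: f_i = F * phi4((x_i - X) / h) / h (1D delta_h scaling)."""
    f = [0.0] * n
    base = math.floor(X / h)
    for i in range(base - 1, base + 3):
        f[i % n] += F * phi4(i - X / h) / h
    return f
```

Because the kernel weights sum to one and have zero first moment, constant and linear velocity fields are interpolated exactly and the spread force integrates to F. The compact 4-node support is also why data locality and load balancing become subtle under domain decomposition: a Lagrangian point near a subdomain boundary touches grid cells owned by a neighbouring process.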

16:15 - 17:15
Apero and poster session

Friday September 11th

The tutorial will take place at USI, room SI-008.

Full-day Tutorial on Advanced MPI and the new features of MPI-3, taught by Torsten Höfler, ETH Zurich


The new MPI standard (MPI-3.0) adds several key concepts for programming massively parallel modern hardware systems. This tutorial covers its three major concepts:

  1. nonblocking collectives and flexible communicator creation,
  2. greatly improved remote memory access (RMA) programming, and
  3. topology mapping to improve locality and neighborhood ("build your own") collective operations.
Nonblocking collectives make it possible to write applications that are resilient to small time variations (noise), overlap communication and computation, and use new complex communication protocols. The new remote memory access semantics allow applications to efficiently exploit modern computing systems that offer RDMA, but require a new way of thinking about and developing applications. Topology mapping allows programmers to specify the application's communication requirements and enables the MPI implementation to optimize the process-to-node mapping. Last but not least, neighborhood collectives form a powerful mechanism with which programmers can specify their own collective operations and allow the MPI implementation to apply additional optimizations.
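To see why the process-to-node mapping matters, the sketch below counts how many edges of an application's communication graph cross node boundaries under two mappings; this is one cost a topology-aware mapper may try to reduce. It does not call MPI. In MPI the graph would be declared with, for example, `MPI_Dist_graph_create_adjacent` with reordering allowed, and neighborhood collectives such as `MPI_Neighbor_alltoall` then communicate along it. The 1D ring stencil, the node sizes, and the two mapping functions are hypothetical.

```python
def ring_edges(p):
    """Communication graph of a 1D periodic nearest-neighbour stencil on p ranks."""
    return [(r, (r + 1) % p) for r in range(p)]

def block_mapping(p, cores_per_node):
    """Consecutive ranks share a node: ranks 0..c-1 on node 0, and so on."""
    return [r // cores_per_node for r in range(p)]

def round_robin_mapping(p, nodes):
    """Ranks are dealt out cyclically over the nodes."""
    return [r % nodes for r in range(p)]

def inter_node_edges(edges, node_of):
    """Number of communication edges whose endpoints live on different nodes."""
    return sum(1 for a, b in edges if node_of[a] != node_of[b])
```

For 16 ranks on 4 nodes with 4 cores each, the block mapping leaves only 4 of the 16 ring edges crossing the network, while the round-robin mapping sends all 16 across it; a topology mapper that knows the graph can steer towards the former.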


9:00 - 10:30
10:30 - 11:00
11:00 - 12:30
12:30 - 14:00
14:00 - 15:30

Content Level

Introductory: 25%, Intermediate: 50%, Advanced: 25%

Audience Prerequisites

We generally assume a basic familiarity with MPI, i.e., attendees should be able to write and execute simple MPI programs. We also assume familiarity with general HPC concepts (i.e., a simple understanding of batch systems, communication and computation tradeoffs, and networks).

Fees: Details and the registration form can be found here.

Organizing committee

A. Adelmann (PSI Villigen), P. Arbenz (ETH Zurich), H. Burkhart (U of Basel), B. Chopard (U Geneva), S. Deparis (EPF Lausanne), J. Hesthaven (EPF Lausanne), A. Janka (EIA Fribourg), R. Krause (USI Lugano), H. Nordborg (HSR), D. Obrist (U Berne), V. Rezzonico (EPF Lausanne), O. Schenk (USI Lugano), J. VandeVondele (ETH Zurich).