UPC/UPC++ for parallel programming and computing

Simula invites anyone interested to a seminar on UPC/UPC++ for parallel programming and computing on June 20. Prof. Scott Baden of UCSD, Prof. Hoai Phuong Ha of UiT, and Jérémie Lagravière of Simula will present talks on the subject.
Date: Tuesday, June 20
Time: 13:00-15:30
Room: Møterommet

Schedule 

13:00-13:45 UPC++: a PGAS library for high performance computing.
Speaker: Prof. Scott Baden, UCSD/LBNL
 
13:45-14:30 Data-centric UPC++.
Speaker: Prof. Hoai Phuong Ha, UiT
 
14:30-15:00 Light scattering at nanoparticles. Can parallel computing give us better solar cells?
Speaker: Dr. Rozalia Lukacs, NMBU
 
15:00-15:30 Performance optimisation and modeling of UPC code that involves fine-grain communication.
Speaker: Jérémie Lagravière, Simula

Abstracts

UPC++: a PGAS library for high performance computing

UPC++ is a library that implements the Asynchronous PGAS model. We are revising the library under the auspices of the DOE's Exascale Computing Project to meet the needs of applications requiring PGAS support. UPC++ is intended for implementing elaborate distributed data structures where communication is irregular or fine-grained. The UPC++ interfaces for moving non-contiguous data and for handling memories with different optimal access methods are composable and closely resemble those used in conventional C++.

The key abstractions in UPC++ are global pointers, which enable the programmer to express ownership information for improving locality, and asynchronous programming via RPC (also known as function shipping) and futures. Futures enable the programmer to capture data readiness state, which is useful for making scheduling decisions, or to chain together a DAG of operations that execute asynchronously as high-latency dependencies become satisfied.
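To make these abstractions concrete, here is a minimal sketch in the style of the UPC++ v1.0 API (upcxx::global_ptr, upcxx::rpc, and futures). Since the library is being revised, the exact signatures may differ from what ships; treat this as illustrative rather than definitive.

    #include <upcxx/upcxx.hpp>
    #include <iostream>

    int main() {
        upcxx::init();

        // Rank 0 allocates an integer in the shared segment; broadcasting
        // the global pointer lets every rank name the same object.
        upcxx::global_ptr<int> counter = nullptr;
        if (upcxx::rank_me() == 0) counter = upcxx::new_<int>(0);
        counter = upcxx::broadcast(counter, 0).wait();

        // Function shipping: run a lambda on the owner of the data instead
        // of moving the data to the caller. The returned future becomes
        // ready when the remote call completes.
        upcxx::future<int> f = upcxx::rpc(0,
            [](upcxx::global_ptr<int> p) { return ++(*p.local()); },
            counter);

        // Chain a continuation onto the future; it fires asynchronously
        // once the high-latency RPC dependency is satisfied.
        f.then([](int v) {
            std::cout << "rank " << upcxx::rank_me()
                      << " saw counter = " << v << "\n";
        }).wait();

        upcxx::barrier();
        if (upcxx::rank_me() == 0) upcxx::delete_(counter);
        upcxx::finalize();
    }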

The UPC++ programmer can expect communication to run at close to hardware speeds. To this end, UPC++ runs atop the GASNet communication library and takes advantage of GASNet's low-overhead communication as well as access to any special hardware support, e.g. RDMA. 

Data-centric UPC++

Most current programming models are based on the premise that computation is the most expensive component. For example, at the application/algorithm level, the time complexity of an algorithm is analyzed by counting arithmetic operations while ignoring the cost of data movement. At the system level, prominent scheduling mechanisms such as work stealing aim at keeping all computing units (i.e., cores) busy and ignore the additional data movement caused by several active cores sharing caches.

However, we are entering a big-data era in which computing is cheap and massively parallel while data movement dominates performance and energy costs. To utilize the next generation of high performance computing systems (i.e., exascale systems), programming models need a paradigm shift from compute-centric to data-centric. Data-centric programming models should not only offer programming abstractions that express data locality and affinity at the application level, but also provide runtime systems that minimize data movement at the system level.
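As a hedged illustration of what "data-centric" can mean at the application level, the sketch below ships a reduction to the rank that owns a partition rather than pulling the data across the network, so only a scalar crosses the wire per rank. It uses UPC++-style primitives and is invented for illustration; it does not reflect the PREAPP design.

    #include <upcxx/upcxx.hpp>
    #include <iostream>
    #include <vector>

    // Each rank owns a private partition; data affinity is explicit.
    std::vector<double> my_part;

    int main() {
        upcxx::init();
        my_part.assign(1000, upcxx::rank_me() + 1.0);  // owner initializes locally
        upcxx::barrier();

        if (upcxx::rank_me() == 0) {
            // Compute-to-data: ship the reduction to each owner, so one
            // scalar per rank crosses the network, not whole partitions.
            double total = 0;
            for (int r = 0; r < upcxx::rank_n(); ++r)
                total += upcxx::rpc(r, []() {
                    double s = 0;
                    for (double x : my_part) s += x;
                    return s;
                }).wait();
            std::cout << "total = " << total << "\n";
        }
        upcxx::barrier();
        upcxx::finalize();
    }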

In this talk, I will discuss the possibility of incorporating the data-centric aspect into the partitioned global address space (PGAS) programming paradigm, particularly UPC++. I will also present preliminary results on data-centric UPC++ from our RCN FRIPRO project PREAPP.

Light scattering at nanoparticles. Can parallel computing give us better solar cells?

Solar cells are among the best green energy sources, as they convert sunlight directly into electricity. Although their price is falling steadily, production costs remain high. To reduce cost, researchers are aiming for thinner solar cells. But less material means lower efficiency. To keep efficiency high and production costs low, researchers add nanostructures on top of the thin solar cells. But which arrangement of nanostructures gives the highest efficiency? To investigate this, we need the tools of theoretical physics. With computer experiments we can understand how light propagates around and inside the nanostructures. In our FRINATEK project we run computer simulations to study the scattering of light by different nanostructures. These simulations are done on high-performance computers. To simulate ever more realistic structures, we need to enlarge the simulation grid, but today this is limited in memory and computation time because our codes run on single nodes. In this talk I will investigate whether parallel computing can overcome the memory and computation-time limitations of our codes.
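The memory argument can be made concrete with a hypothetical domain decomposition: if the simulation grid is partitioned across processes, the largest feasible grid grows with the number of nodes instead of being capped by a single node's memory. The decomposition and sizes below are invented for illustration and are not taken from the project's codes.

    #include <upcxx/upcxx.hpp>
    #include <vector>

    int main() {
        upcxx::init();
        // 1-D slab decomposition of a global N^3 grid: each rank stores
        // only N / rank_n() planes, so the aggregate memory of the whole
        // machine, not one node, bounds the grid size.
        const long N = 512;                       // global grid edge (illustrative)
        const long planes = N / upcxx::rank_n();  // planes owned by this rank
        std::vector<double> slab(planes * N * N, 0.0);

        // A scattering solver would update `slab` here, exchanging only
        // the boundary planes with neighbouring ranks at each time step.

        upcxx::barrier();
        upcxx::finalize();
    }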

Performance optimisation and modeling of UPC code that involves fine-grain communication

UPC, one of the most widely used PGAS languages, has several user-friendly features built into its design. These features relieve programmers of the burden of, for example, working explicitly with inter-thread data movement. However, they can also bring performance penalties, especially for programs that incur fine-grain communication between threads. We show that it is important to avoid global shared array pointers and instead replace them with private copies and aggregated inter-thread data exchanges; the sketch below illustrates the pattern. Moreover, we present our work on modeling the performance of several UPC implementations, with the aim of shedding light on why and when UPC's shared array pointers become prohibitively time-consuming to use.
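For consistency with the examples above, the contrast is sketched here in UPC++ syntax; in UPC itself the analogous fix replaces element-wise shared-array indexing with a single upc_memget into a private buffer. The function names are invented for illustration.

    #include <upcxx/upcxx.hpp>
    #include <vector>

    // Fine-grain: one network round trip per element.
    double sum_fine_grain(upcxx::global_ptr<double> remote, long n) {
        double s = 0;
        for (long i = 0; i < n; ++i)
            s += upcxx::rget(remote + i).wait();
        return s;
    }

    // Aggregated: one bulk transfer into a private copy, then local work.
    double sum_aggregated(upcxx::global_ptr<double> remote, long n) {
        std::vector<double> priv(n);
        upcxx::rget(remote, priv.data(), n).wait();
        double s = 0;
        for (double x : priv) s += x;
        return s;
    }

    int main() {
        upcxx::init();
        const long n = 1 << 16;
        upcxx::global_ptr<double> a = nullptr;
        if (upcxx::rank_me() == 0) {
            a = upcxx::new_array<double>(n);
            double *lp = a.local();
            for (long i = 0; i < n; ++i) lp[i] = 1.0;
        }
        a = upcxx::broadcast(a, 0).wait();
        // Same result either way; the aggregated version issues one
        // transfer instead of n, which is the optimization advocated here.
        double slow = sum_fine_grain(a, n);
        double fast = sum_aggregated(a, n);
        (void)slow; (void)fast;
        upcxx::barrier();
        if (upcxx::rank_me() == 0) upcxx::delete_array(a);
        upcxx::finalize();
    }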
