Investigating techniques for optimal checkpointing without spatial and temporal cost assumptions

Master's thesis project

Algorithmic differentiation (AD) computes derivatives of arbitrary functions automatically by decomposing the function into a sequence of elementary operations and applying the chain rule. The reverse mode of AD yields the gradient of a scalar-valued function at a computational cost comparable to that of evaluating the function itself. However, the reverse sweep requires the intermediate solution of every elementary operation in the chain, so the memory requirement of reverse-mode AD grows with the number of elementary operations. This makes certain problems infeasible for reverse-mode AD. To mitigate this limitation, one can employ a so-called checkpointing scheme: only a selection of the intermediate solutions is stored, and the missing intermediate solutions are recomputed, as they are needed, from the nearest earlier stored solution.
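To make the idea concrete, here is a minimal sketch of uniform checkpointing for a chain x_{i+1} = f_i(x_i). The names (step, adjoint_step, CHECKPOINT_EVERY) and the toy elementary operation are illustrative, not part of any particular AD library:

    # Sketch: reverse-mode AD over a chain of N elementary operations,
    # storing only every CHECKPOINT_EVERY-th intermediate state and
    # recomputing the rest from the nearest earlier checkpoint.
    import math

    N = 16                 # number of elementary operations in the chain
    CHECKPOINT_EVERY = 4   # store every 4th intermediate state

    def step(i, x):
        """Elementary operation f_i (here: a toy smooth map)."""
        return math.sin(x) + 0.1 * i

    def adjoint_step(i, x, xbar):
        """Propagate the adjoint through f_i: returns xbar * f_i'(x)."""
        return xbar * math.cos(x)

    def forward(x0):
        """Forward sweep: keep only a selection of intermediate states."""
        checkpoints = {0: x0}
        x = x0
        for i in range(N):
            x = step(i, x)
            if (i + 1) % CHECKPOINT_EVERY == 0:
                checkpoints[i + 1] = x
        return x, checkpoints

    def reverse(checkpoints, ybar):
        """Reverse sweep: recompute each missing state from the nearest
        earlier checkpoint, then propagate the adjoint backwards."""
        xbar = ybar
        for i in reversed(range(N)):
            j = max(k for k in checkpoints if k <= i)  # nearest stored state
            x = checkpoints[j]
            for k in range(j, i):                      # recompute x_i from x_j
                x = step(k, x)
            xbar = adjoint_step(i, x, xbar)
        return xbar

    y, cps = forward(1.0)
    print(y, reverse(cps, 1.0))  # function value and d y / d x0

The memory cost drops from N stored states to N / CHECKPOINT_EVERY, at the price of recomputing forward steps during the reverse sweep; smarter placements of the checkpoints improve this trade-off.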

Assuming that all elementary operations in the chain have equal computational cost and that all intermediate solutions have equal memory cost, it has been shown that a binomial checkpointing scheme achieves logarithmic growth in both temporal and spatial complexity [1].
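The bound behind this result, following [1], is that with s checkpoint slots and at most t recomputations of each step, a chain of length up to C(s + t, s) can be reversed, so both resources grow only logarithmically with the chain length. A small sketch that evaluates this bound (the function name beta is illustrative):

    # Binomial bound from [1]: beta(s, t) = C(s + t, s) is the longest
    # chain reversible with s checkpoints and at most t recomputations
    # of each step.
    from math import comb

    def beta(s, t):
        return comb(s + t, s)

    # With s = t = r, the reversible chain length grows roughly like 4**r,
    # i.e. r grows only logarithmically with the chain length.
    for r in range(1, 8):
        print(r, beta(r, r))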

Goal

Derive and investigate schemes that account for elementary operations with differing spatial and temporal costs and compare them to the binomial checkpointing scheme.

Learning outcome

  • Experience with AD and checkpointing techniques
  • Experience with complexity analysis

Qualifications

The candidate should have experience with Python programming.

Supervisors

  • Simon Funke
  • Sebastian Mitusch

References

[1] Griewank, A. (1992). Achieving logarithmic growth of temporal and spatial complexity in reverse automatic differentiation. Optimization Methods and Software, 1(1), 35-54.