The introduction given in AD theory and mathematical definitions is formulated in terms of elemental operations. From the point of view of Statement level recording (Expression templates), let
denote a statement in the code where
CoDiPack has to evaluate the reverse AD mode for each statement. The equation is:
Note that
The Jacobian taping approach uses the property that in the above reverse AD equation only
This taping approach has multiple advantages:
The disadvantages are:
The primal value taping approach implements the more traditional taping strategy of AD. It stores all primal values of the computation so that these values are available during the reverse interpretation. It is important that each primal value is stored only once. In the example:
the variable a is used 6 times as an argument and is assigned only once. If the primal value taping approach stored the value of each argument of an operation, the value of a would be stored 6 times in the above case. The strategy can instead be shifted so that the value of a is stored only once, during the assignment. During the reverse run, the place where the value of a is stored can then be accessed through the identifier of a.
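The store-once strategy can be illustrated with a minimal sketch. It is not CoDiPack's implementation: the names PrimalVector, StatementEntry and sumArguments are hypothetical and chosen only for this illustration. Each assignment writes its primal value once into a vector, and statements refer to their arguments by 4-byte identifier only.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical types for illustration only; these are not CoDiPack names.
using Identifier = std::uint32_t;  // 4-byte identifier per value

// Holds every primal value exactly once, indexed by identifier.
class PrimalVector {
 public:
  // Called once per assignment: stores the value and returns its identifier.
  Identifier store(double value) {
    values_.push_back(value);
    return static_cast<Identifier>(values_.size() - 1);
  }

  // Looks up a stored primal value through its identifier.
  double get(Identifier id) const { return values_[id]; }

  std::size_t size() const { return values_.size(); }

 private:
  std::vector<double> values_;
};

// A recorded statement keeps identifiers only, never copies of argument values.
struct StatementEntry {
  Identifier result;
  std::vector<Identifier> arguments;
};

// Stand-in for a reverse-run access pattern: every argument value is
// recovered through its identifier.
double sumArguments(const PrimalVector& primals, const StatementEntry& stmt) {
  double sum = 0.0;
  for (Identifier id : stmt.arguments) {
    sum += primals.get(id);
  }
  return sum;
}
```

With this layout, a statement that uses the same variable 6 times holds 6 identifiers of 4 bytes each, while the 8-byte primal value itself is stored a single time.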
If this scheme is used, then for each statement 21 bytes are required. 8 bytes are used for the primal value of the result (
The memory for each argument is then only 4 bytes for the identifier of this argument. This identifier is used to access the adjoint values (components of
This taping approach has the following advantages:
The disadvantages are:
In theory, the primal value taping approach should be more memory efficient than the Jacobian taping approach. If the tape has on average 4 arguments per statement, then the primal value taping approach requires 37 bytes per statement (21 bytes plus 4 bytes for each of the 4 arguments), whereas the Jacobian taping approach requires 53 bytes. This would be a memory reduction of about 30%. The problem is that the additional memory required for the passive and constant values in the primal value taping approach usually uses up these savings. It is therefore not possible to say in general which taping approach is better, since this depends on the application and the actual computation.
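The per-statement figures can be checked with a short calculation. The sketch below uses the 21 bytes per statement and 4 bytes per argument stated for the primal value taping approach. The breakdown for the Jacobian taping approach (5 bytes per statement plus 12 bytes per argument, that is an 8-byte Jacobian entry and a 4-byte identifier) is an assumption made here because it reproduces the 53 bytes quoted for 4 arguments. All function names are hypothetical.

```cpp
#include <cstddef>

// Primal value taping: 21 bytes per statement plus a 4-byte identifier
// per argument (figures taken from the text).
constexpr std::size_t primalBytesPerStatement(std::size_t nArgs) {
  return 21 + 4 * nArgs;
}

// Jacobian taping: ASSUMED breakdown of 5 bytes per statement plus 12 bytes
// per argument (8-byte Jacobian entry + 4-byte identifier). This matches the
// quoted 53 bytes for 4 arguments but is not confirmed by the text.
constexpr std::size_t jacobianBytesPerStatement(std::size_t nArgs) {
  return 5 + 12 * nArgs;
}

// Relative memory saving of primal value taping over Jacobian taping.
// Passive and constant values are ignored here.
constexpr double relativeSaving(std::size_t nArgs) {
  return (static_cast<double>(jacobianBytesPerStatement(nArgs)) -
          static_cast<double>(primalBytesPerStatement(nArgs))) /
         static_cast<double>(jacobianBytesPerStatement(nArgs));
}
```

Under the assumed breakdown, both approaches need 29 bytes for a statement with 2 arguments, so the theoretical advantage of the primal value taping approach only appears for statements with more arguments, and the extra storage for passive and constant values is not counted at all.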