The process gets more involved if custom operators are used in the application. The default handling of MeDiPack in such cases is to change the reduce operation into a gather operation and to perform the reduction on the local processors. This enables the AD tool to see the operations and perform the correct adjoint implementation. An example for such an operation is:
It is important that the implementation of the custom operator is done with the AD type, otherwise a wrong data structure will be used in the custom operator. Since the function is the implemenation of the MPI user defined function only at this place the MPI_Datatype is used and not the AMPI_Datatype.
The default handling from MeDiPack of such custom operators will always work but it is inefficient for large applications. There is a new overload of the MPI_Op_create function that enables the user to provide additional information such that a custom operator can do the reduce operations in the MPI nodes.
The mathematical model for a reduce operation is
where represents the value of one MPI rank, the number of ranks and the final result. is the mathematical representation of the the reduce operation. The reverse AD formulation for this operation is
The most important change, this equation represents, is the communication structure. The reduce operation is changed into a broadcast operation and therefore no custom operator can be executed inside of the broadcast operation. It is also problematic to calculate the derivative on each processor since the values are not available. The above reverse AD formulation is reinterpreted as
where pre and post represent a pre and post broadcast operation. The equation is evaluated by MeDiPack in the form:
double y_b = ...;
double y_b_mod;
if(rank == root) {
y_b_mod = pre(y, y_b);
}
MPI_Bcast(&y_b_mod, 1, MPI_DOUBLE, root, comm);
double x_b = post(x, y, y_b_mod);
tape.updateAdjoint(x_b);
The availability of y on the sending side of the communication is handled by MeDiPack if that is requested by the user. The definitions for the methods can be found in AMPI_Op_create . The definition is
MPI_User_function* primalFunction, int primalFunctionCommute,
MPI_User_function* modifiedPrimalFunction, int modifiedPrimalFunctionCommute,
const PreAdjointOperation preAdjointOperation,
const PostAdjointOperation postAdjointOperation,
AMPI_Op* op) {
The argument requiresPrimal will tell MeDiPack to store the primal values (x and y) on there respective communication sides.
If requiresPrimalSend is set to true, y is made available on the receiving side.
primalFunction defines the user defined function as implemented above. It should use the AD type such that the AD tool can track the operations. This function maybe used as a fallback method.
modifiedPrimalFunction defines the user function that is evaluated in the MPI nodes. This function needs to be implemented such that it uses the modified type of the AD tool. This type is the one, that is defined by the AD tool for the use in the buffers that are send through the network. For details see the implementation for the specific AD tool.
preAdjointOperation the function that is represented by pre in the above equations.
postAdjointOperation the function that is represented by post in the above equations.
The definition for the last two arguments is:
typedef void (*PreAdjointOperation)(void* adjoints, void* primals, int count);
typedef void (*PostAdjointOperation)(void* adjoints, void* primals, void* rootPrimals, int count);
If the primal value is not requested or not requested to be send, the appropriate arguments will be null pointers. Otherwise the methods follow the concept of the MPI user defined functions.
The above custom operator will now be implemented in an optimized way. The handling for the summation of the L1 residual is trivial and the adjoints for this member do not need to be handled. The LMax residual is more difficult to handle. The adjoint is only valid on the processor which provided the maximum value for the computation. On the other processors the adjoint value needs to be set to zero. For 2 ranks this primal computation would be
void max(double a, double b, double& c) {
if(a > b) {
c = a;
} else {
c = b;
}
}
the adjoint for this method can then be implemented by
which is the technique described above. The implementation can now be done with MeDiPack. For this it is important to know that the CoDiPack types are not modified in the MPI buffers. So the implementation can directly use the CoDiPack type.
The implementation just adds the .value() property access such that the operations are hidden from CoDiPack. The first loop is just a special handling for CoDiPack such that an online activity analysis can be performed. The next step implements the post operation since there is no pre operation required for this operator. The tricky part in the implementation is, that there are two adjoint values for each value in the primal computation. Each first adjoint value represents the L1 adjoint and each second represents the LMax adjoint.
void postAdjResidual(double* adjoints, double* primals, double* rootPrimals, int count) {
// no += needs to be evaluated. This is done by MeDiPack and the AD tool.
for(int i = 0; i < count; ++i) {
if(0 == i % 2) {
// no handling for L1 required
} else {
if(rootPrimals[i] != primals[i]) {
adjoints[i] = 0.0; // not the maximum set the adjoint to zero
}
}
}
}
The creation of the MPI operator can now be done with