API Reference

Communicators
chainermn.create_communicator(communicator_name='hierarchical', mpi_comm=None)

Create a ChainerMN communicator.

Different communicators implement different approaches to communication, so they have different performance characteristics. The default communicator, hierarchical, is expected to perform well on a variety of environments, so there is usually no need to change it. However, choosing the proper communicator may give better performance. The following communicators are available:

Name             CPU  GPU  NCCL              Recommended Use Cases
naive            OK   OK                     Testing on CPU mode
hierarchical          OK   Required          Each node has a single NIC or HCA
two_dimensional       OK   Required          Each node has multiple NICs or HCAs
single_node           OK   Required          Single node with multiple GPUs
flat                  OK   N/A
pure_nccl             OK   Required (>= v2)

pure_nccl is recommended when NCCL2 is available in the environment, but its support is still experimental.

Parameters:
- communicator_name – The name of the communicator (naive, flat, hierarchical, two_dimensional, pure_nccl, or single_node).
- mpi_comm – mpi4py communicator.

Returns: ChainerMN communicator.
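The table above can be summarized as a small lookup, e.g. for validating a configuration before launching. This is a sketch: the names and requirements come from the table, but the helper itself is hypothetical and not part of the chainermn API.

```python
# Hypothetical helper encoding the communicator table above.
# nccl: None = not needed, 1 = requires NCCL, 2 = requires NCCL >= 2.
COMMUNICATOR_REQUIREMENTS = {
    'naive':           {'gpu': False, 'nccl': None},
    'flat':            {'gpu': True,  'nccl': None},
    'hierarchical':    {'gpu': True,  'nccl': 1},
    'two_dimensional': {'gpu': True,  'nccl': 1},
    'single_node':     {'gpu': True,  'nccl': 1},
    'pure_nccl':       {'gpu': True,  'nccl': 2},
}


def requires_nccl2(name):
    """Return True if the named communicator needs NCCL version 2 or later."""
    req = COMMUNICATOR_REQUIREMENTS[name]['nccl']
    return req is not None and req >= 2


print(requires_nccl2('pure_nccl'))     # -> True
print(requires_nccl2('hierarchical'))  # -> False
```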
Optimizers and Evaluators
chainermn.create_multi_node_optimizer(actual_optimizer, communicator)

Create a multi node optimizer from a Chainer optimizer.

Parameters:
- actual_optimizer – Chainer optimizer (e.g., chainer.optimizers.Adam).
- communicator – ChainerMN communicator.

Returns: The multi node optimizer based on actual_optimizer.
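The essence of a multi node optimizer is to allreduce (average) gradients across workers before each parameter update. The contract can be sketched in pure Python with a hypothetical in-process "communicator" standing in for MPI; none of these names are part of the chainermn API.

```python
class FakeCommunicator:
    """Hypothetical in-process stand-in for an MPI communicator."""

    def __init__(self, worker_grads):
        # one gradient list per simulated worker
        self.worker_grads = worker_grads

    def allreduce_grad(self):
        # average each parameter's gradient across all workers
        n = len(self.worker_grads)
        return [sum(g) / n for g in zip(*self.worker_grads)]


def multi_node_update(params, comm, lr=0.1):
    """Average gradients across workers, then apply a plain SGD step."""
    avg_grads = comm.allreduce_grad()
    return [p - lr * g for p, g in zip(params, avg_grads)]


# two simulated workers, each holding gradients for two parameters
comm = FakeCommunicator([[1.0, 2.0], [3.0, 4.0]])
print(multi_node_update([0.0, 0.0], comm))  # roughly [-0.2, -0.3]
```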
chainermn.create_multi_node_evaluator(actual_evaluator, communicator)

Create a multi node evaluator from a normal evaluator.

Parameters:
- actual_evaluator – Evaluator (e.g., chainer.training.extensions.Evaluator).
- communicator – ChainerMN communicator.

Returns: The multi node evaluator based on actual_evaluator.
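A multi node evaluator lets each worker evaluate its own shard of the validation data and then averages the resulting metrics across workers. The aggregation step can be sketched in pure Python; the function name is hypothetical, not part of the chainermn API.

```python
def multi_node_average(per_worker_results):
    """Average per-worker evaluation results key by key, as an allreduce would."""
    n = len(per_worker_results)
    keys = per_worker_results[0].keys()
    return {k: sum(r[k] for r in per_worker_results) / n for k in keys}


# two simulated workers reporting metrics on their shard of the validation set
print(multi_node_average([{'loss': 0.5, 'acc': 0.8},
                          {'loss': 0.7, 'acc': 0.9}]))
```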
Dataset Utilities
chainermn.scatter_dataset(dataset, comm, root=0, shuffle=False, seed=None)

Scatter the given dataset to the workers in the communicator.

The dataset of worker 0 (i.e., the worker whose comm.rank is 0) is scattered to all workers. The datasets given on the other workers are ignored. The dataset is split into sub datasets of almost equal sizes, which are then scattered to the workers. To create a sub dataset, chainer.datasets.SubDataset is used.

Parameters:
- dataset – A dataset (e.g., list, numpy.ndarray, chainer.datasets.TupleDataset, ...).
- comm – ChainerMN communicator or mpi4py communicator.
- shuffle (bool) – If True, the order of examples is shuffled before being scattered.
- root (int) – The root process of the scatter operation.
- seed (int) – Seed for the generator used for the permutation of indexes. If an integer convertible to a 32-bit unsigned integer is specified, it is guaranteed that each sample in the given dataset always belongs to a specific subset. If None, the permutation is changed at random.

Returns: Scattered dataset.
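The splitting performed here can be sketched in pure Python: the dataset is cut into one contiguous slice per worker, the slice sizes differing by at most one, optionally after a seeded shuffle so that every worker agrees on the permutation. This is a simplified sketch of the idea, not the actual implementation (which uses chainer.datasets.SubDataset and MPI scatter).

```python
import random


def split_almost_equal(dataset, n_workers, shuffle=False, seed=None):
    """Split a dataset into n_workers sub-lists whose sizes differ by at most 1."""
    order = list(range(len(dataset)))
    if shuffle:
        # a fixed seed yields the same permutation on every worker
        random.Random(seed).shuffle(order)
    base, rem = divmod(len(dataset), n_workers)
    subsets, start = [], 0
    for rank in range(n_workers):
        size = base + (1 if rank < rem else 0)  # first `rem` workers get one extra
        subsets.append([dataset[i] for i in order[start:start + size]])
        start += size
    return subsets


parts = split_almost_equal(list(range(10)), 3)
print([len(p) for p in parts])  # -> [4, 3, 3]
```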
chainermn.datasets.create_empty_dataset(dataset)

Creates an empty dataset for models with no inputs and outputs.

This function generates an empty dataset, i.e., one whose __getitem__() only returns None. The resulting dataset is compatible with the original one. Such datasets are used for models which take no inputs and return no outputs; for example, models whose forward() starts with chainermn.functions.recv() and ends with chainermn.functions.send().

Parameters: dataset – Dataset to convert.

Returns: Dataset consisting of only the patterns in the original one.

Return type: TransformDataset
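The idea can be sketched in a few lines of plain Python: keep the length of the original dataset so iteration and batching still work, but yield None for every item. The class name here is hypothetical; the real function returns a chainer TransformDataset.

```python
class EmptyDatasetSketch:
    """Sketch: same length as the original dataset, __getitem__ yields None."""

    def __init__(self, dataset):
        self._len = len(dataset)

    def __len__(self):
        return self._len

    def __getitem__(self, i):
        if not 0 <= i < self._len:
            raise IndexError(i)
        return None  # the model takes no actual inputs


ds = EmptyDatasetSketch([10, 20, 30])
print(len(ds), ds[0])  # -> 3 None
```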
Links
class chainermn.MultiNodeChainList(comm)

Combines multiple non-connected components of a computational graph.

This class combines chainer.Chain objects, each of which represents one of the non-connected components of the computational graph. In __call__(), the object returned by each chainer.Chain (which acts as a pointer) is passed to the next chainer.Chain, in order to keep the computational graph connected and make backprop work properly.

Users add each chainer.Chain by the add_link() method. Each chain is invoked in forward computation according to the order it was added, and in backward computation according to the reversed order.

Example

This is a simple example of a model which sends its outputs to the rank=1 machine:
import chainer
import chainer.functions as F
import chainer.links as L
import chainermn


class SimpleModelSub(chainer.Chain):

    def __init__(self, n_in, n_hidden, n_out):
        super(SimpleModelSub, self).__init__(
            l1=L.Linear(n_in, n_hidden),
            l2=L.Linear(n_hidden, n_out))

    def __call__(self, x):
        h1 = F.relu(self.l1(x))
        return self.l2(h1)


class SimpleModel(chainermn.MultiNodeChainList):

    def __init__(self, comm, n_in, n_hidden, n_out):
        super(SimpleModel, self).__init__(comm)
        self.add_link(
            SimpleModelSub(n_in, n_hidden, n_out),
            rank_in=None,
            rank_out=1)
Example

This is another example, of two models interacting with each other:
import chainer
import chainer.functions as F
import chainer.links as L
import chainermn


class MLP(chainer.Chain):

    def __init__(self, n_in, n_hidden, n_out):
        super(MLP, self).__init__(
            l1=L.Linear(n_in, n_hidden),
            l2=L.Linear(n_hidden, n_hidden),
            l3=L.Linear(n_hidden, n_out))

    def __call__(self, x):
        h1 = F.relu(self.l1(x))
        h2 = F.relu(self.l2(h1))
        return self.l3(h2)


class Model0(chainermn.MultiNodeChainList):

    def __init__(self, comm):
        super(Model0, self).__init__(comm)
        self.add_link(MLP(10000, 5000, 2000), rank_in=None, rank_out=1)
        self.add_link(MLP(100, 50, 10), rank_in=1, rank_out=None)


class Model1(chainermn.MultiNodeChainList):

    def __init__(self, comm):
        super(Model1, self).__init__(comm)
        self.add_link(MLP(2000, 500, 100), rank_in=0, rank_out=0)
Model0 is expected to be on rank=0, and Model1 on rank=1. The first MLP in Model0 sends its outputs to Model1; the MLP in Model1 then receives them and sends its outputs to the second MLP in Model0.

Parameters: comm (chainermn.communicators._base.CommunicatorBase) – ChainerMN communicator.

add_link(link, rank_in=None, rank_out=None)

Register one connected link with its input and output ranks.

Parameters:
- link (chainer.Link) – The link object to be registered.
- rank_in (int, list, or None) – Ranks from which it receives data. If None is specified, the model does not receive data from any machine.
- rank_out (int, list, or None) – Ranks to which it sends data. If None is specified, the model does not send data to any machine.
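The ordering contract of add_link (forward in the order chains were added, backward in the reversed order) can be illustrated with a trivial sketch; the class here is hypothetical and only records ordering, it performs no computation.

```python
class InvocationOrderSketch:
    """Sketch of MultiNodeChainList's ordering contract: chains run in the
    order they were added on forward, and in reversed order on backward."""

    def __init__(self):
        self._links = []

    def add_link(self, name):
        self._links.append(name)

    def forward_order(self):
        return list(self._links)

    def backward_order(self):
        return list(reversed(self._links))


model = InvocationOrderSketch()
for name in ('sub_a', 'sub_b', 'sub_c'):
    model.add_link(name)
print(model.forward_order())   # -> ['sub_a', 'sub_b', 'sub_c']
print(model.backward_order())  # -> ['sub_c', 'sub_b', 'sub_a']
```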
class chainermn.links.MultiNodeBatchNormalization(size, comm, decay=0.9, eps=2e-05, dtype=<type 'numpy.float32'>, use_gamma=True, use_beta=True, initial_gamma=None, initial_beta=None)

Batch normalization layer that can use the whole batch stats.

When using chainer.links.BatchNormalization, the batch mean and std are computed independently for the local batch in each worker. When the local batch size is too small, training becomes unstable due to unreliable batch stats.

In contrast, when using this MultiNodeBatchNormalization, the workers communicate to conduct 'correct' batch normalization (i.e., obtaining the mean and std of the whole global batch).

This link works only with Chainer >= 2.0.0.

Parameters:
- size (int or tuple of ints) – Size (or shape) of channel dimensions.
- comm (ChainerMN communicator) – Communicator to share the batch stats.
- decay (float) – Decay rate of the moving average. It is used during training.
- eps (float) – Epsilon value for numerical stability.
- dtype (numpy.dtype) – Type to use in computing.
- use_gamma (bool) – If True, use the scaling parameter. Otherwise, the scale is fixed to 1 and has no effect.
- use_beta (bool) – If True, use the shifting parameter. Otherwise, the shift is fixed to 0 and has no effect.
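The "whole batch stats" can be sketched without MPI: each worker computes the count, sum, and sum of squares of its local batch, those moments are summed across workers (as an allreduce would do), and the global mean and variance follow. This is a simplified scalar sketch of the idea, not the actual link.

```python
def global_batch_stats(worker_batches):
    """Combine per-worker batches into global mean/variance by summing the
    moments (count, sum, sum of squares), as an allreduce would."""
    n = sum(len(b) for b in worker_batches)
    s = sum(sum(b) for b in worker_batches)
    sq = sum(sum(x * x for x in b) for b in worker_batches)
    mean = s / n
    var = sq / n - mean * mean  # E[x^2] - E[x]^2
    return mean, var


# two workers with tiny local batches; stats are for the combined batch
mean, var = global_batch_stats([[1.0, 2.0], [3.0, 4.0]])
print(mean, var)  # -> 2.5 1.25
```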
Functions
chainermn.functions.send(x, communicator, rank, tag=0)

Send elements to the target process.

This function returns a dummy variable only holding the computational graph. If backward() is invoked through this dummy variable, it will try to receive gradients from the target process and send them back to the parent nodes.

Parameters:
- x (Variable) – Variable holding a matrix which you would like to send.
- communicator (chainermn.communicators.CommunicatorBase) – ChainerMN communicator.
- rank (int) – Target process specifier.
- tag (int) – Optional message ID (MPI feature).

Returns: A dummy variable with no actual data, only holding the computational graph. Please refer to chainermn.functions.pseudo_connect for details.

Return type: Variable
chainermn.functions.recv(communicator, rank, delegate_variable=None, tag=0, device=-1)

Receive elements from the target process.

This function returns the data received from the target process. If backward() is invoked, it will try to send gradients to the target process.

Note

If you define a non-connected computational graph on one process, you have to use delegate_variable to specify the output of the previous computational graph component. Otherwise backward() does not work well. Please refer to chainermn.functions.pseudo_connect for details.

Parameters:
- communicator (chainermn.communicators.CommunicatorBase) – ChainerMN communicator.
- rank (int) – Target process specifier.
- delegate_variable (chainer.Variable) – Pointer to the other non-connected component.
- tag (int) – Optional message ID (MPI feature).
- device (int) – Target device specifier.

Returns: Data received from the target process. If backward() is invoked through this variable, it will send gradients to the target process.

Return type: Variable
chainermn.functions.pseudo_connect(delegate_variable, *actual_variables)

Connect independent connected graph components.

This function returns the received arguments directly, except the first delegate_variable. In backward computation, it returns the received gradients directly, adding a zero grad corresponding to delegate_variable. The details of delegate_variable are described in the following notes.

Note

In a model-parallel framework, the model on each process might consist of several non-connected components. Here we call a graph non-connected when multiple inter-process communications are needed to compute it. For example, consider the following model:
class ConnectedGraph(chainermn.MultiNodeChainList):

    def __init__(self, comm):
        super(ConnectedGraph, self).__init__(comm)
        self.add_link(ConnectedGraphSub(), rank_in=3, rank_out=1)
This model receives inputs from the rank=3 process and sends its outputs to the rank=1 process. The entire graph can be seen as one connected component, ConnectedGraphSub. Please refer to the documentation of MultiNodeChainList for details. On the other hand, see the next example:
class NonConnectedGraph(chainermn.MultiNodeChainList):

    def __init__(self, comm):
        super(NonConnectedGraph, self).__init__(comm)
        self.add_link(NonConnectedGraphSubA(), rank_in=3, rank_out=1)
        self.add_link(NonConnectedGraphSubB(), rank_in=1, rank_out=2)
This model consists of two components: first, NonConnectedGraphSubA receives inputs from the rank=3 process and sends its outputs to the rank=1 process; then NonConnectedGraphSubB receives inputs from the rank=1 process and sends its outputs to the rank=2 process. Since multiple inter-process communications are invoked between NonConnectedGraphSubA and NonConnectedGraphSubB, the graph is regarded as non-connected.

Such non-connected models can be problematic in backward computation. Chainer traces the computational graph backward from the output variable; however, a naive implementation of chainermn.functions.recv would not take any inputs and would instead receive data via MPI_Recv, so the backward path vanishes there.

To prevent this, dummy variables, which we call delegate variables, are used. In principle, chainermn.functions.send does not need to return any outputs because it sends data to the other process by MPI_Send. However, in our implementation chainermn.functions.send returns a dummy, empty variable, called the delegate variable. This variable does not hold any data; it is used only to retain the backward computation path. We can guarantee the backward computation simply by passing the delegate variable to the next chainermn.functions.recv (chainermn.functions.recv has an optional argument to receive a delegate_variable).

Note
In some cases an intermediate graph component returns the model outputs. See the next example:
class NonConnectedGraph2(chainermn.MultiNodeChainList):

    def __init__(self, comm):
        super(NonConnectedGraph2, self).__init__(comm)
        self.add_link(NonConnectedGraphSubA(), rank_in=1, rank_out=None)
        self.add_link(NonConnectedGraphSubB(), rank_in=None, rank_out=1)
This model first receives inputs from the rank=1 process and produces the model outputs (specified by rank_out=None) in NonConnectedGraphSubA. Then, using the model inputs (specified by rank_in=None), NonConnectedGraphSubB sends its outputs to the rank=1 process. Since MultiNodeChainList.__call__ returns the outputs of the last component (in this case, the outputs of NonConnectedGraphSubB), a naive implementation cannot return the value of NonConnectedGraphSubA as the model outputs. In this case, pseudo_connect should be used.

pseudo_connect takes two kinds of arguments. The first one, delegate_variable, is what we explained in the note above; in this case, the returned value of NonConnectedGraphSubB corresponds to delegate_variable. The second one, actual_variables, is "what we want delegate_variable to imitate". In NonConnectedGraph2, we obtain the returned value of NonConnectedGraphSubB as the model outputs, but what we actually want is the returned value of NonConnectedGraphSubA. At the same time, we want to be able to trace back through this resulting variable in backward computation. Using pseudo_connect, we can make a variable whose data is the same as the returned value of NonConnectedGraphSubA, and whose backward computation traces back through NonConnectedGraphSubB first.

pseudo_connect should also be used in some pathological cases, for example, where multiple chainermn.functions.send calls occur sequentially.

Parameters:
- delegate_variable (chainer.Variable) – Pointer to the previous non-connected graph component.
- actual_variables (tuple of chainer.Variable) – Actual values which delegate_variable imitates.

Returns: A variable with the given values combined with the delegating variable.

Return type: Variable
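The forward/backward contract described above can be sketched in plain Python: forward ignores the delegate's data and passes the actual values through unchanged, while backward passes the incoming gradients through and prepends a zero gradient for the delegate variable. This is a conceptual sketch with hypothetical function names, not Chainer's FunctionNode implementation.

```python
def pseudo_connect_forward(delegate_value, *actual_values):
    """Forward: the delegate's data is ignored; the actual values pass through."""
    return actual_values


def pseudo_connect_backward(grads):
    """Backward: incoming gradients pass through, with a zero gradient
    prepended for the delegate variable (which carries no data)."""
    return (0.0,) + tuple(grads)


out = pseudo_connect_forward('delegate', 1.0, 2.0)
back = pseudo_connect_backward((0.1, 0.2))
print(out)   # -> (1.0, 2.0)
print(back)  # -> (0.0, 0.1, 0.2)
```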