Integration settings
In the previous two tutorials, we discussed how the MadNIS integrator can be used for single- and multi-channel integration. This page describes further options for adapting the integrator to different applications.
Changing the integration domain
The default integration domain in MadNIS is the unit hypercube \([0,1]^d\). To choose different
finite intervals for the integration domain, a custom
Integrand object can be created. For instance, to
integrate the function from the first tutorial over the interval \([0,2]^4\), we can
construct the integrator as follows:
from madnis.integrator import Integrator, Integrand
f = Integrand(lambda x: (2 * x).prod(dim=1), input_dim=4, bounds=[[0.0, 2.0]] * 4)
integrator = Integrator(f)
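As a quick sanity check for this example, the integrand factorizes, so its exact integral over \([0,2]^4\) is \(\left(\int_0^2 2x\,dx\right)^4 = 4^4 = 256\). The following sketch reproduces this reference value using only the Python standard library. It is independent of MadNIS, and the helper name midpoint_integral is purely illustrative.

```python
def midpoint_integral(f, a, b, n=1000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


# One-dimensional factor of the integrand prod_i (2 * x_i) on [0, 2]
one_dim = midpoint_integral(lambda x: 2 * x, 0.0, 2.0)

# The integrand factorizes, so the 4-dimensional integral is the 4th power
reference_value = one_dim ** 4
print(reference_value)  # close to 256
```

The result of the trained integrator can then be compared against this value.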
For an infinite integration domain, we can change the normalizing flow in our integrator to use a normal latent space instead of the default uniform latent space. For example, to integrate the function \(f(x,y) = \exp(- x^2 - y^2)\), we can define our integrator as
integrator = Integrator(
    lambda x: x.square().sum(dim=1).neg().exp(),
    dims=2,
    flow_kwargs={"uniform_latent": False},
)
Note that this only works well if the mean and standard deviation of our integrand are roughly zero and one, respectively. Otherwise, the integrand should be shifted and rescaled. Also note that, internally, the integration interval is limited to \([-10, 10]\) if the flow is built this way. An alternative way to integrate over the whole space of real numbers is to apply a logit transformation inside the integrand function and multiply by the corresponding Jacobian.
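The logit approach can be illustrated in one dimension. With \(x = \log(u / (1 - u))\), the unit interval is mapped to the real line, and the Jacobian is \(dx/du = 1 / (u (1 - u))\). The sketch below uses only the standard library and a simple midpoint rule instead of MadNIS; the function names are illustrative. For \(f(x) = \exp(-x^2)\), the exact result is \(\sqrt{\pi}\).

```python
import math


def logit(u):
    """Map u in (0, 1) to the real line."""
    return math.log(u / (1.0 - u))


def integrate_over_reals(f, n=20000):
    """Integrate f over the real line via the substitution x = logit(u)."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * h                  # midpoint of the i-th sub-interval
        x = logit(u)
        jacobian = 1.0 / (u * (1.0 - u))   # dx/du
        total += f(x) * jacobian
    return total * h


result = integrate_over_reals(lambda x: math.exp(-x * x))
print(result, math.sqrt(math.pi))
```

In a MadNIS integrand, the same transformation and Jacobian factor would be applied to each component of the batched input tensor.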
Network architecture
The default values for the normalizing flow hyperparameters in MadNIS will work well for many
simple, low-dimensional integrands. For more complex functions, it can be necessary to build a
larger network. This can be done with the flow_kwargs argument of the
Integrator class, whose entries are passed on to the constructor
of the Flow class. By default, the flow is built with a sufficient
number of coupling blocks such that every component is conditioned on every other component at least
once. Hence, it is normally not necessary to set the number of coupling blocks by hand. In the
following example, we change the settings for the depth and number of hidden nodes of the flow
sub-networks.
integrator = Integrator(
    lambda x: (2 * x).prod(dim=1),
    dims=4,
    flow_kwargs={"layers": 4, "units": 64},
)
Similarly, the cwnet_kwargs parameter can be used to change the hyperparameters of the network
used for the trained channel weights.
For even more flexibility with the network architecture, the flow and cwnet arguments can
be used to replace the normalizing flows and channel weight networks used by default in MadNIS.
The interface that objects passed to the flow argument have to support is specified by the
abstract base class Distribution. A class used as a
channel weight network should have a forward method that accepts tensors of shape
(batch_size, remapped_dim) and returns tensors of shape (batch_size, channel_count). The
output of the network is then added to the logarithm of the prior channel weights if they were
provided. After that, the normalized channel weights are computed.
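The combination of network output and prior weights can be sketched for a single sample as follows. This is a standard-library illustration, not the MadNIS implementation: it assumes that the normalization is a softmax over channels, and the function name is hypothetical.

```python
import math


def normalized_channel_weights(network_output, prior_weights=None):
    """Add the log prior weights to the network output, then normalize.

    Assumes a softmax normalization and strictly positive prior weights.
    """
    if prior_weights is not None:
        logits = [o + math.log(p) for o, p in zip(network_output, prior_weights)]
    else:
        logits = list(network_output)
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - shift) for v in logits]
    norm = sum(exps)
    return [e / norm for e in exps]


# A network output of zero leaves the prior channel weights unchanged
weights = normalized_channel_weights([0.0, 0.0, 0.0], prior_weights=[0.5, 0.3, 0.2])
print(weights)
```

Under this assumption, a network that outputs zeros reproduces the prior, so the prior can be read as the starting point that the training refines.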
Training hyperparameters
Several hyperparameters that affect the network training can be set when the
constructor of Integrator is called. The loss function
can be changed using the loss argument. By default, the KL divergence is used for
single-channel integration. The integral variance and reverse KL divergence are available as
alternative options. The same options are available for multi-channel training; however, only the
variance loss allows for the simultaneous optimization of channel mappings and weights.
Further important training parameters are the batch size (batch_size argument) and the learning
rate (learning_rate argument). To enable training with a variable learning rate, a learning rate
scheduler has to be constructed. This can be done by defining a function that returns the scheduler
with the optimizer as a parameter. For instance, the following code sets cosine annealing as the
learning rate schedule.
from torch.optim.lr_scheduler import CosineAnnealingLR
integrator = Integrator(
    ...,  # other arguments
    # n_steps is the number of training iterations
    scheduler=lambda opt: CosineAnnealingLR(opt, n_steps),
)
If a learning rate scheduler is given, the learning rate used for the current training iteration
is reported in the TrainingStatus object.
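To get a feeling for the learning rates that cosine annealing produces, the schedule can be evaluated in closed form, \(\eta_t = \eta_\text{min} + \tfrac{1}{2} (\eta_0 - \eta_\text{min}) (1 + \cos(\pi t / T))\). The sketch below uses only the standard library; the function name and the values of the initial learning rate and the number of steps are chosen for illustration.

```python
import math


def cosine_annealing_lr(step, n_steps, base_lr, eta_min=0.0):
    """Closed-form cosine annealing learning rate at a given step."""
    cosine = math.cos(math.pi * step / n_steps)
    return eta_min + 0.5 * (base_lr - eta_min) * (1.0 + cosine)


# Learning rates for 100 training iterations starting from 1e-3
schedule = [cosine_annealing_lr(t, 100, 1e-3) for t in range(101)]
print(schedule[0], schedule[50], schedule[100])
```

The learning rate starts at its initial value, passes through half of it in the middle of the training and reaches the minimum at the last step.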
Similarly, we can also set the optimizer by passing a function that constructs the optimizer given
the trainable parameters. For instance, to use the SGD optimizer instead of Adam, we can use
from torch.optim import SGD
integrator = Integrator(
    ...,  # other arguments
    optimizer=lambda params: SGD(params, lr=1e-3),
)
VEGAS pre-training
VEGAS is a commonly used algorithm for importance sampling that works by assuming a factorized
distribution, i.e. no correlations between different dimensions. It then models the one-dimensional distributions using variable-width bins with uniform probabilities. This makes training VEGAS much
faster than the training of a neural network using stochastic gradient descent. We can use that to
our advantage in MadNIS by training a VEGAS grid first and then using it to initialize our
normalizing flow, using the VegasPreTraining class.
It is constructed using an Integrator instance and uses
the same integrand, sample buffer and integration cache. It relies on the vegas package, which is
an optional dependency of MadNIS. The pre-training can be performed with the following code:
from madnis.integrator import VegasPreTraining
# Construct integrator here
vegas = VegasPreTraining(integrator, bins=64, damping=0.8)
vegas.train([1000, 2000, 4000])
vegas.initialize_integrator()
# Regular MadNIS training here
The only two hyperparameters are the number of bins for the VEGAS grid and the damping parameter, which influences the VEGAS convergence (high values: fast adaptation, low values: stable convergence). The argument of the train method specifies the number of samples in each VEGAS iteration (per channel in the multi-channel case). The call to initialize_integrator initializes the normalizing flow in the integrator.
Like the Integrator class, the
VegasPreTraining class also has the methods sample,
integrate, integration_metrics and unweighting_metrics, which directly draw samples using
the VEGAS grid. This allows us to compare the VEGAS performance and the normalizing flow performance
directly.
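The idea of variable-width bins with uniform probabilities can be illustrated with a one-dimensional grid. In the sketch below, every bin is selected with the same probability, so narrow bins correspond to a high sampling density. The bin edges are made up for illustration, and the code is a standard-library sketch rather than the implementation used by the vegas package.

```python
import bisect
import random

# Illustrative grid on [0, 1]: narrow bins where the integrand would be large
edges = [0.0, 0.1, 0.2, 0.4, 1.0]
n_bins = len(edges) - 1


def sample(rng):
    """Pick a bin with uniform probability, then a uniform point inside it."""
    k = rng.randrange(n_bins)
    return rng.uniform(edges[k], edges[k + 1])


def density(x):
    """Probability density of the grid at x: 1 / (n_bins * bin width)."""
    k = min(bisect.bisect_right(edges, x) - 1, n_bins - 1)
    return 1.0 / (n_bins * (edges[k + 1] - edges[k]))


rng = random.Random(0)
points = [sample(rng) for _ in range(5)]
print([(x, density(x)) for x in points])
```

In \(d\) dimensions, the factorized VEGAS density is the product of such one-dimensional densities, one grid per dimension.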
Dealing with zeros
The MadNIS integrator has two parameters that control how it treats samples with an integrand value
of zero. The parameter batch_size_threshold controls the minimum number of samples with non-zero
values per batch, relative to the total batch size, during the training. More samples are generated
until the number is above this threshold. In addition, the parameter drop_zero_integrands
removes samples with an integrand value of zero from the optimization. Depending on the loss function,
this may change the optimization objective. It is especially useful in situations where the
integrand evaluation is more expensive for samples with non-zero weight. In that case, improving the
training for samples with non-zero weights at the cost of a lower cut efficiency can be beneficial
for the overall performance.
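The effect of the two parameters can be sketched with a toy integrand that vanishes outside a cut. The code below is a standard-library illustration with hypothetical function names; the actual MadNIS implementation may generate and combine the additional samples differently.

```python
import random


def sample_batch(integrand, batch_size, threshold, rng):
    """Draw samples until at least threshold * batch_size are non-zero."""
    min_nonzero = threshold * batch_size
    xs, fs = [], []
    n_nonzero = 0
    while n_nonzero < min_nonzero:
        # Generate one more chunk of batch_size samples
        for _ in range(batch_size):
            x = rng.random()
            f = integrand(x)
            xs.append(x)
            fs.append(f)
            if f != 0.0:
                n_nonzero += 1
    return xs, fs


def drop_zeros(xs, fs):
    """Remove samples with an integrand value of zero."""
    kept = [(x, f) for x, f in zip(xs, fs) if f != 0.0]
    return [x for x, _ in kept], [f for _, f in kept]


# Toy integrand with a cut: zero for x <= 0.8
rng = random.Random(1)
xs, fs = sample_batch(lambda x: x if x > 0.8 else 0.0, 100, 0.5, rng)
xs_nonzero, fs_nonzero = drop_zeros(xs, fs)
print(len(xs), len(xs_nonzero))
```

With a cut efficiency of roughly 20%, several chunks of samples are needed before half a batch of non-zero values is available.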
Device and data type
The device and data type used for training and sampling can be set using the device and
dtype arguments of the Integrator constructor.
As the class inherits from torch.nn.Module, the to method can alternatively be used
to change the device or data type.
Storing and loading trained models
The Integrator class is a torch.nn.Module. The
functions torch.save and torch.load can therefore be used to store and load trained models.
The saved state includes all network parameters and the integration history, but not the buffered
training samples.
import torch

# save integrator
torch.save(integrator.state_dict(), "integrator.pth")
# load integrator
integrator.load_state_dict(torch.load("integrator.pth"))