Multithreading

Allpix Squared supports multithreading by running events in parallel. The module manager creates a thread pool with the configured number of workers, or determines the number from system parameters if none is specified. Each event is represented by an instance of the Event class, which encapsulates the data used during this event. The configured number of events is then submitted to the thread pool and executed by the thread pool’s workers.

The thread pool features two independent queues: a FIFO-like unsorted queue for events to be processed, and a second, priority-ordered queue for buffered events. The main thread constantly fills the former with new events to be processed, while the latter temporarily buffers events which wait to be picked up in the correct sequence by a SequentialModule.
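The two queues can be pictured with the following minimal sketch. It is not the Allpix Squared thread pool: the names EventTask, ToyEventQueues, submit, buffer and pop are hypothetical, locking is omitted, and the policy of preferring a buffered event only when it is the next one in sequence is an assumption made for illustration.

```cpp
#include <cstdint>
#include <queue>
#include <vector>

// Hypothetical event handle: only the event number matters for ordering here.
struct EventTask {
    std::uint64_t number;
};

// Comparator that keeps the lowest event number on top of the priority queue.
struct LowestNumberFirst {
    bool operator()(const EventTask& a, const EventTask& b) const { return a.number > b.number; }
};

// Toy container with a FIFO queue for new events and a priority queue for buffered ones.
// A real thread pool would guard both queues with a mutex; this sketch does not.
class ToyEventQueues {
public:
    // Called by the main thread for new events.
    void submit(EventTask task) { pending_.push(task); }

    // Called when an event has to wait for its turn.
    void buffer(EventTask task) { buffered_.push(task); }

    // Assumed policy: take the buffered event if it is the next one in sequence,
    // otherwise fall back to the oldest pending event. Returns false if nothing is available.
    bool pop(std::uint64_t next_in_sequence, EventTask& out) {
        if(!buffered_.empty() && buffered_.top().number == next_in_sequence) {
            out = buffered_.top();
            buffered_.pop();
            return true;
        }
        if(pending_.empty()) {
            return false;
        }
        out = pending_.front();
        pending_.pop();
        return true;
    }

private:
    std::queue<EventTask> pending_;
    std::priority_queue<EventTask, std::vector<EventTask>, LowestNumberFirst> buffered_;
};
```

With events 4 and 5 pending and events 2 and 3 buffered, a call to pop with sequence number 2 returns the buffered event 2, while a call with a sequence number that is not on top of the buffer returns the oldest pending event.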

By default, modules are assumed not to operate in a thread-safe way and therefore cannot participate in multithreaded processing of events. Each module must explicitly enable multithreading in its constructor in order to signal its multithreading capabilities to Allpix Squared. To support multithreading, the module run() method should be re-entrant and any shared member variables should be protected. If multithreading is enabled in the run configuration, the module manager checks whether all loaded modules support multithreading. If one or more modules do not support it, a warning is printed and the feature is disabled. Modules can query the decision via the multithreadingEnabled() method.
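The opt-in logic can be sketched as follows. This is a self-contained toy, not the framework's module interface: ToyModule, enableMultithreading, supportsMultithreading, blocking_modules and decide_multithreading are hypothetical names chosen for this example.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a module base class; not the Allpix Squared API.
class ToyModule {
public:
    explicit ToyModule(std::string name) : name_(std::move(name)) {}
    virtual ~ToyModule() = default;

    const std::string& name() const { return name_; }
    bool supportsMultithreading() const { return mt_capable_; }

protected:
    // Called from a derived constructor to opt in; the default is "not thread-safe".
    void enableMultithreading() { mt_capable_ = true; }

private:
    std::string name_;
    bool mt_capable_{false};
};

// A module that declares itself safe for parallel event processing.
class ThreadSafeModule : public ToyModule {
public:
    ThreadSafeModule() : ToyModule("thread_safe") { enableMultithreading(); }
};

// A module that never opts in and therefore blocks multithreading.
class LegacyModule : public ToyModule {
public:
    LegacyModule() : ToyModule("legacy") {}
};

// Collects the names of all modules that did not opt in.
std::vector<std::string> blocking_modules(const std::vector<std::unique_ptr<ToyModule>>& modules) {
    std::vector<std::string> names;
    for(const auto& module : modules) {
        if(!module->supportsMultithreading()) {
            names.push_back(module->name());
        }
    }
    return names;
}

// Multithreading is used only if it was requested and no module blocks it.
bool decide_multithreading(bool requested, const std::vector<std::unique_ptr<ToyModule>>& modules) {
    return requested && blocking_modules(modules).empty();
}
```

In this sketch a single module that has not opted in is enough to disable the feature for the whole run, and the list returned by blocking_modules could be used to build the warning message.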

Seed Distribution

A stable seed distribution to modules and core components of Allpix Squared is guaranteed, so that simulation results are reproducible from the same inputs even when the number of workers differs. Each event is seeded upon its creation by the main thread from a central event seed generator, in increasing sequence of event numbers. The event provides access to a random engine that can be used by each module in the run() method.

To avoid the memory overhead of maintaining one random engine object per event, the storage of the engines is made static and thread-local, and an engine is only provided to the event for temporary usage. This ensures that the framework maintains the minimum number of such heavy objects, equal to the number of workers used. When a worker starts to execute a new event, it first seeds its local random engine and then passes it to the event object.
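The following sketch illustrates why this scheme yields the same result for any number of workers. It is an illustration under stated assumptions rather than the framework's implementation: the function run_events is hypothetical, std::mt19937_64 stands in for the random engine, and each "event" simply draws one number.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <thread>
#include <vector>

// Runs n_events toy events on n_workers threads and returns one random number per event.
std::vector<std::uint64_t> run_events(std::uint64_t core_seed, std::size_t n_events, unsigned n_workers) {
    assert(n_workers > 0);

    // Main thread: draw one seed per event, in increasing order of event number.
    std::mt19937_64 seeder(core_seed);
    std::vector<std::uint64_t> seeds(n_events);
    for(auto& seed : seeds) {
        seed = seeder();
    }

    std::vector<std::uint64_t> results(n_events);
    std::atomic<std::size_t> next{0};

    auto worker = [&]() {
        // One heavy engine per worker thread, reused for every event this worker executes.
        thread_local std::mt19937_64 engine;
        for(std::size_t i = next.fetch_add(1); i < n_events; i = next.fetch_add(1)) {
            engine.seed(seeds[i]); // re-seed before handing the engine to the event
            results[i] = engine(); // the "event" draws from the engine
        }
    };

    std::vector<std::thread> pool;
    for(unsigned w = 0; w < n_workers; ++w) {
        pool.emplace_back(worker);
    }
    for(auto& thread : pool) {
        thread.join();
    }
    return results;
}
```

Because the seed of an event depends only on its position in the sequence and not on the worker that happens to execute it, run_events(42, 64, 1) and run_events(42, 64, 4) return identical vectors.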

Using Messenger in Parallel

The Messenger handles communication in different events concurrently. It supports dispatching and fetching messages via the LocalMessenger. Each event has its own local messenger, which stores all messages that were produced in this event. The Messenger owns the global message subscription information and internally forwards the module’s requests to dispatch or fetch messages to the local messenger of the event in a thread-safe manner.

Running Events in order using SequentialModule

The SequentialModule class is provided for modules that require processing of events in the correct order without disabling multithreading. Inheriting from this class allows the module to transparently check whether the given event is in the correct sequence, and to decide whether to execute it immediately or, if it is out of order, to request buffering in the prioritized buffer queue of the thread pool.
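The decision between immediate execution and buffering can be sketched as below. The class InOrderRunner and its members are hypothetical, events are reduced to their numbers, and numbering is assumed to start at 1; in the framework the buffer lives in the thread pool rather than in the module.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <queue>
#include <vector>

// Toy model of in-order execution with a priority-ordered buffer.
class InOrderRunner {
public:
    // Returns true if the event was executed right away, false if it had to be buffered.
    bool offer(std::uint64_t event_number) {
        if(event_number != next_) {
            buffer_.push(event_number); // out of order: wait for the missing events
            return false;
        }
        execute(event_number);
        // Drain buffered events that have now become the next in sequence.
        while(!buffer_.empty() && buffer_.top() == next_) {
            const std::uint64_t number = buffer_.top();
            buffer_.pop();
            execute(number);
        }
        return true;
    }

    const std::vector<std::uint64_t>& processed() const { return processed_; }
    std::size_t buffered() const { return buffer_.size(); }

private:
    void execute(std::uint64_t number) {
        assert(number == next_); // events are only ever executed in sequence
        processed_.push_back(number);
        ++next_;
    }

    std::uint64_t next_{1}; // assumption: event numbers start at 1
    // Min-heap: the lowest buffered event number is always on top.
    std::priority_queue<std::uint64_t, std::vector<std::uint64_t>, std::greater<std::uint64_t>> buffer_;
    std::vector<std::uint64_t> processed_;
};
```

Offering the events in the order 3, 1, 2, 5, 4 results in the processing order 1, 2, 3, 4, 5: event 3 is buffered until events 1 and 2 have run, and event 5 until event 4 has run.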

Using the SequentialModule is suitable for I/O modules which read or write to the file system and do not allow random read or write access to events. This enables output modules to produce the same output file for the same simulation inputs without sacrificing the benefits of using multithreading for other modules.

Since random number generators are thread-local and shared between events processed on the same thread, the generator state is stored internally when an event is placed into the buffer and restored before the event is processed. This ensures that the sequence of pseudo-random numbers is exactly the same regardless of whether the event was buffered or directly processed.
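Storing and restoring a generator state can be illustrated with the standard library, whose random engines support stream insertion and extraction of their full internal state. The helper names save_state and restore_state are hypothetical, and std::mt19937_64 is used here as a stand-in for the engine; how the framework stores the state internally is not shown.

```cpp
#include <cassert>
#include <random>
#include <sstream>
#include <string>

// Serializes the complete internal state of the engine into a string.
std::string save_state(const std::mt19937_64& engine) {
    std::ostringstream out;
    out << engine;
    return out.str();
}

// Restores a previously saved state, overwriting whatever the engine held before.
void restore_state(std::mt19937_64& engine, const std::string& state) {
    std::istringstream in(state);
    in >> engine;
    assert(!in.fail()); // the string must come from save_state
}
```

A worker could call save_state when an event is buffered, reuse and re-seed its thread-local engine for other events, and later call restore_state before resuming the buffered event. The numbers drawn after the restore are the same as if the event had never been interrupted.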

Geant4 Modules

The usage of the Geant4 library in Allpix Squared is subject to some constraints, because the Geant4 multithreaded run manager expects to handle parallelization internally, which violates the Allpix Squared design. Furthermore, Geant4 does not guarantee reproducibility of results between its multithreaded and sequential run managers. Modules that use the Geant4 library shall not use the run managers provided by Geant4. Instead, they must use the custom run managers provided by Allpix Squared as described in Section 14.1.

Object History, TRefs and PointerWrappers

Allpix Squared uses ROOT's TRef objects to store the persistent links of the simulation object history. These references act similarly to C pointers and allow accessing the referenced object directly without additional bookkeeping or lookup of indices. Furthermore, they persist when being written to a ROOT file and read back to memory. ROOT implements this via a central lookup table that keeps track of the referenced objects and their location in memory, as described in the ROOT documentation.

This approach comes with some drawbacks, especially in multithreaded environments. Most importantly the lookup table is a global object, which means mutexes are required for accessing it. Multiple threads generating or using TRef references will have to share this mutex and will consequently be subject to significant waiting for lock release. Furthermore generating more and more TRef relations over the course of a simulation will increase the size of the central reference table. This table is initialized with a fixed size, and once the number of TRef objects outgrows this pre-allocated space, new memory has to be acquired, leading to a reallocation of memory for the entire new size of the table. With potentially millions of entries, this quickly becomes a computationally expensive operation, slowing down the simulation significantly.

Allpix Squared solves these limitations by wrapping the TRef objects into a class called PointerWrapper. It contains both a direct but transient C pointer and a TRef to the referenced object. The latter, however, is only generated when required, i.e. if the object holding the PointerWrapper as well as the referenced object are going to be written to file. This is achieved by first going through all relevant objects, marking them for storage:

for(auto& object : objects) {
    object.markForStorage();
}

Now, the required history references can be identified and TRef objects are generated only for relations between two objects that are both marked for storage:

for(auto& object : objects) {
    object.petrifyHistory();
}

Objects can now be written to file and will contain the persistent reference to the related object.
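The two-pass scheme can be pictured with the following toy model, which does not use ROOT. ToyObject and ToyPointerWrapper are hypothetical, and a boolean flag stands in for the TRef that the real wrapper would create; only the method names markForStorage and petrifyHistory are taken from the text above.

```cpp
struct ToyObject;

// Toy stand-in for the wrapper: a raw pointer for in-memory use plus a flag that
// represents the persistent reference, which is only created on demand.
class ToyPointerWrapper {
public:
    ToyPointerWrapper() = default;
    explicit ToyPointerWrapper(const ToyObject* target) : ptr_(target) {}

    const ToyObject* get() const { return ptr_; }
    bool hasPersistentRef() const { return persistent_; }

    // Creates the persistent reference only if the target will be stored as well.
    void petrify();

private:
    const ToyObject* ptr_{nullptr};
    bool persistent_{false};
};

// Toy simulation object with a single history relation to its parent.
struct ToyObject {
    bool marked{false};
    ToyPointerWrapper parent;

    // First pass: flag this object as one that will be written to file.
    void markForStorage() { marked = true; }

    // Second pass: only objects that are stored themselves need persistent links.
    void petrifyHistory() {
        if(marked) {
            parent.petrify();
        }
    }
};

inline void ToyPointerWrapper::petrify() {
    if(ptr_ != nullptr && ptr_->marked) {
        persistent_ = true;
    }
}
```

For a chain of three objects in which only the last two are marked for storage, the link from the third to the second object becomes persistent, while the link from the second to the unmarked first object stays a plain in-memory pointer. Both passes have to be separate loops, since the second pass needs to know the final storage decision for every target.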

This approach solves the above problems. File writing has to be performed single-threaded anyway, so generating TRef objects on the same thread does not lead to additional locking of the central reference table mutex in ROOT. In addition, TRef entries are only generated and stored in the table for objects that require it. All references to objects that are not stored will be nullptr in either case, since the target object is no longer available when reading in the data. Since the generation of TRef objects and the access to the reference table are now performed by a single thread and for one single event at a time, it is also possible to reset the ROOT-internal object ID of TRef references after the event has been processed. The subsequent event will reuse the same IDs, preventing a continuous growth of the reference table and the related memory re-allocation issues.

As a consequence, when reading objects back from file in a multithreaded environment, the TRef has to be converted back to a C memory pointer in the reading thread, both to prevent mixing of re-used TRef object IDs from different events and to avoid locking access to the central reference table when looking up the memory location from there. This is performed similarly to the generation of history relations. Only relations with valid TRefs are loaded, while all other relations will hold a nullptr:

for(auto& object : objects) {
    object.loadHistory();
}

For single-threaded applications such as ROOT analysis macros, this step is not necessary and the reference will be lazy-loaded when accessed, i.e. the TRef reference will be converted to a direct raw pointer only when actually used. Since events are processed sequentially and memory is freed between events, no mixing of IDs occurs.