This tutorial demonstrates how to use lbmpy with waLBerla to generate efficient implementations of Lattice Boltzmann Methods (LBM) to be included in large-scale distributed memory fluid flow simulations. While waLBerla provides an advanced framework for setting up and running complex fluid simulations, lbmpy brings the flexibility to generate highly optimized LBMs for CPUs and GPUs. Manually writing an efficient C++ or GPU implementation of an advanced LBM can be very cumbersome, especially because the compute kernel needs to be optimized for specific hardware. For this reason, lbmpy was developed. It is a Python framework which allows defining a set of LB equations at an abstract level, thus enabling development on the mathematical description of the problem directly and then generate a highly optimized C, OpenCL or CUDA implementation of these equations. An introduction to lbmpy can be found here.
As in the previous tutorial (Tutorial - Code Generation 1: Explicit Solver for the Heat Equation), we will first define the numeric methods and set up code generation in a Python script. We will then include the generated code in a waLBerla application to simulate a two-dimensional shear flow in a periodic domain. Additionally, it will be shown how a waLBerla simulation with complex initial conditions can be initialized using parameter files.
In this section, we will define a single relaxation time (SRT) collision operator in Python using lbmpy and generate a waLBerla lattice model implementation from its equations. A lattice model defines several things, including:
In addition to the lattice model, it will be shown how to generate a method-specific pack info class for MPI communication to reduce communication load to the necessary minimum.
In the code generation python script, we first require a few imports from lbmpy itself and from the waLBerla code generation libraries. lbmpy code generation is based on pystencils; the basic procedure is thus the same as in the previous tutorial. We need the CodeGeneration
context from the pystencils_walberla
module for the connection to the build system. For generating the communication pack info, we will use generate_pack_info_from_kernel
from the same module. This method of pack info generation is not limited to LBM implementations but can be used with any sweep kernel. The function generate_pack_info_from_kernel
takes a pystencils AssignmentCollection
and extracts all field accesses to determine which cell entries need to be communicated.
From the lbmpy.creationfunctions
we require the functions to create collision and update rules. For the actual code generation, generate_lattice_model
from lbmpy_walberla
is required. Since we will define symbols, SymPy
is also needed.
First, we define a few general parameters. These include the stencil (D2Q9) and the memory layout (fzyx
, see Tutorial - Code Generation 1: Explicit Solver for the Heat Equation ). We define a SymPy symbol for the relaxation rate \( \omega \). This means we can later set it to a specific value from the waLBerla code. A dictionary with optimization parameters is also set up. Here, we enable global common subexpression elimination (cse_global
) and set the PDF field's memory layout.
Next, we set the parameters for the SRT method in a dictionary and create both the collision and update rules by calling the respective lbmpy functions. They both return an AssignmentCollection
containing all necessary equations. The only parameters needed for SRT are the stencil and the relaxation rate. For generating the lattice model, we only require the collision rule's equations since generate_lattice_model
adds the two-fields pull scheme for the streaming step internally. At this point, the lattice model generation is limited to the standard stream-pull-collide scheme.
The update rule is still needed in the code generation process; namely for the pack info generation. The collision step only acts within one cell. Thus, the collision rule's equations contain no neighbour accesses. Calling create_lb_update_rule
inserts the two-fields pull scheme as generate_lattice_model
, and resulting update rule contains exactly those neighbour accesses which are required for generate_pack_info_from_kernel
to build the optimized pack info.
Finally, we create the code generation context and call the respective functions for generating the lattice model and the pack info. Both require the context and a class name as parameters. To generate_lattice_model
, we also pass the collision rule and the field layout; generate_pack_info_from_kernel
receives the update rule.
Notice that, other than in Tutorial - Code Generation 1: Explicit Solver for the Heat Equation, we did not need to define any fields. Both the source and destination PDF fields are created internally by lbmpy and generate_lattice_model
.
Furthermore, if we optimise the waLBerla for the machine, it is compiled on with the CMake flag OPTIMIZE_FOR_LOCALHOST
, the code generator automatically introduces vector intrinsics in the kernel files. Available intrinsics sets are sse
, avx
and avx512
. These sets can be passed manually with the argument cpu_vectorize_info
. More information on CPU optimisations available in lbmpy
can be found here. By installing the cpu_vectorize_info
package, it is also possible for lbmpy
to automatically determine the support intrinsics set of the hardware.
As a final touch, we still need to set up the CMake build target for the code generation script. This time, two distinct classes (the lattice model and the pack information) will be generated. Therefore, we need to list the header and source file names for both classes separately.
This completes the code generation part.
This section is concerned with the implementation of a waLBerla C++ application for the simulation of a periodic shear flow using the generated SRT lattice model and pack info. After adding the code generation target defined above as a dependency to the application's CMake target, we can include both generated classes:
For convenience, we define typedef aliases for the several types and templates that will later be used. Since our generated LBM implementation is consistent with the waLBerla LatticeModel API, we can use the lbm::PdfField class template for our PDF fields.
The application will be mostly similar to Tutorial - LBM 1: Basic LBM Simulation, so we will mainly focus on the differences here. It will also be configured by a parameter file as described in the other tutorial.
The only significant difference caused by the usage of a generated lattice model is when the LBM sweep is added to the timeloop. The sweep is exposed as a static member of the generated lattice model class and can be added like this:
The remaining extensions concern only the setup of boundaries and the initial velocities.
We will set up a shear flow scenario in a rectangular, two-dimensional domain which is periodic in the x-direction and limited by NoSlip-boundaries (i.e. walls) to the north and south. The fluid will be moving rightwards in the northern and southern parts of the domain, and leftwards in the middle with the same speed. Its velocity in the y-direction will be slightly perturbed by random noise, which will cause the development of vortices between the shear layers during the simulation.
For simplicity, the boundaries are set up using the lbm::DefaultBoundaryHandlingFactory as described in Tutorial - LBM 1: Basic LBM Simulation.
In the parameter file, the boundary block only defines the NoSlip boundaries. Also, in the DomainSetup
block, the domain needs to be periodic in the x-direction.
The velocity initialization can be defined directly in the parameter file. We will be using the lbm::initializer::PdfFieldInitializer class with lbm::initializer::ExprSystemInitFunction which is capable of parsing mathematical expressions from the parameter file to set up complex initial flows. For this purpose, waLBerla uses the exprtk C++ library. We will need to extend this functionality a little to introduce the random noise.
The PdfFieldInitializer's initDensityAndVelocity
function expects a function of type std::vector< real_t > (const Cell&)
. This function will receive a Cell
with global coordinates and should return a std::vector
with four entries: One density and three cartesian velocity components for the given cell. For this purpose, we create a functor struct with these members:
exprInitFunc_
of lbm::initializer::ExprSystemInitFunction for initializing the x-velocities;rng_
for the y-velocities, which is an instance of walberla::math::RealRandom;All members will be set using parameters specified in a parameter file block. The functor exprInitFunc_
is initialized by calling its parse
method inside the constructor.
We also define the operator()
with the above signature. It first calls exprInitFunc_
to set density and velocity, and then adds the random perturbation to the y component.
The required parameters and expressions for initializing the density and velocity are defined in the parameter file. The block is called ShearFlowSetup
. For rho
, u_x
, u_y
and u_z
, mathematical expressions can be specified which may include the variables x
, y
, z
for a cell's global position and n_x
, n_y
, n_z
representing the number of cells in each direction. These expressions will be evaluated for each domain cell.
A seed for the random number generator is also specified, which controls the random noise and therefore makes the test case reproducible.
This completes the C++ implementation. It will produce VTK files which can then be viewed using ParaView, as described in previous tutorials.
Although generated lattice models are easy to define and easy to integrate with the waLBerla framework, they are somewhat limited in features. For example, the waLBerla code generation methods for lattice models do not allow advanced streaming patterns or the generation of CUDA code. This is remedied by generating sweeps using generate_sweep
directly instead of generate_lattice_model
. The result is a generic waLBerla sweep which is not consistent with the lattice model API, thus making implementations a little more complicated since many class templates created for the use with lattice models can not be used. For this problem too, code generation offers solutions which will be further explained in the next tutorial.