The code has been tested on Ubuntu 24.04 with CUDA 13.0 (driver 580.126.09) and on Ubuntu 20.04 with CUDA 12.8 (driver 570.133.07).