OPENCL·2025·ONGOING

OpenCL FLIP Kernels

An exercise in understanding a FLIP solver by rebuilding its hot loops as OpenCL kernels — particle-to-grid transfer, pressure projection, and grid-to-particle — then measuring where the time actually goes.

WRITE-UP

Kernels

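P2G and G2P both parallelise naturally per particle, although P2G is a scatter and needs care with concurrent writes; the pressure solve is the interesting part. A Jacobi iteration maps trivially onto the GPU because every cell updates independently from the previous iterate, but it converges slowly, so most of the study is about how iteration count trades against the visual result.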
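
As a rough sketch of the Jacobi iteration described above, the plain C below shows the per-cell update that a kernel would run as one work-item, plus the ping-pong between two buffers. It is not the project's kernel: the names `jacobi_cell` and `jacobi_solve`, the 2D layout, the unit grid spacing, the sign convention (laplacian(p) = rhs), and the zero-pressure boundary are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* One Jacobi update for interior cell (i, j). On the GPU this body would be
   one work-item. Assumes unit grid spacing and laplacian(p) = rhs. */
static float jacobi_cell(const float *p, const float *rhs, int nx, int i, int j)
{
    size_t c = (size_t)j * (size_t)nx + (size_t)i;
    return 0.25f * (p[c - 1] + p[c + 1]
                  + p[c - (size_t)nx] + p[c + (size_t)nx] - rhs[c]);
}

/* Runs `iters` sweeps, ping-ponging between p_a and p_b so that every sweep
   reads only the previous iterate. Boundary cells are never written, so both
   buffers must start with the boundary value (assumed p = 0) in place.
   Returns the buffer that holds the final iterate. */
float *jacobi_solve(float *p_a, float *p_b, const float *rhs,
                    int nx, int ny, int iters)
{
    assert(nx >= 3 && ny >= 3);
    float *src = p_a, *dst = p_b;
    for (int it = 0; it < iters; ++it) {
        for (int j = 1; j < ny - 1; ++j)
            for (int i = 1; i < nx - 1; ++i)
                dst[(size_t)j * (size_t)nx + (size_t)i] =
                    jacobi_cell(src, rhs, nx, i, j);
        float *tmp = src;   /* swap roles for the next sweep */
        src = dst;
        dst = tmp;
    }
    return src;
}
```

Each sweep reads only the previous iterate, which is why the update has no write conflicts. It is also why information travels just one cell per sweep, so convergence is slow on larger grids.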

Profiling

On mid-resolution grids the OpenCL path runs roughly 2× faster than the naive CPU reference, but boundary handling and the divergence pass still dominate the runtime. The lesson: the transfer isn't the bottleneck; the projection is.

NOTES

Jacobi pressure solve at 40–60 iterations was the readable/fast trade-off.
Atomic scatter in P2G is the correctness trap — colour the grid to avoid races.
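
One way to read the colouring note is sketched below in plain C, under several assumptions that are not taken from the project: a 2D grid, bilinear weights, particle positions given in cell units within [0, nx] × [0, ny], and particles binned by cell before the scatter. The names `Particle`, `cell_of`, `scatter_one`, and `p2g_coloured` are hypothetical. Cells are split into four colours by the parity of their indices; two distinct cells of the same colour are at least two cells apart along some axis, so their 2×2 node footprints never overlap and each colour pass could run one work-item per cell without atomics.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { float x, y, m; } Particle;   /* hypothetical layout */

/* Cell index of a particle; positions are assumed non-negative, so the
   float-to-int truncation acts as a floor. Indices are clamped to the grid. */
static int cell_of(const Particle *q, int nx, int ny)
{
    int ci = (int)q->x, cj = (int)q->y;
    if (ci < 0) ci = 0;
    if (ci > nx - 1) ci = nx - 1;
    if (cj < 0) cj = 0;
    if (cj > ny - 1) cj = ny - 1;
    return cj * nx + ci;
}

/* Add one particle's mass to the four corner nodes of cell (ci, cj). */
static void scatter_one(const Particle *q, int ci, int cj, float *node, int nx)
{
    float fx = q->x - (float)ci, fy = q->y - (float)cj;
    int stride = nx + 1;                       /* nodes per grid row */
    float *n = node + cj * stride + ci;
    n[0]          += q->m * (1.0f - fx) * (1.0f - fy);
    n[1]          += q->m * fx * (1.0f - fy);
    n[stride]     += q->m * (1.0f - fx) * fy;
    n[stride + 1] += q->m * fx * fy;
}

/* Scatter np particles onto an (nx+1) x (ny+1) node array in four colour
   passes. Returns 0 on success, -1 if allocation fails. */
int p2g_coloured(const Particle *ps, int np, float *node, int nx, int ny)
{
    assert(nx > 0 && ny > 0 && np >= 0);
    int ncell = nx * ny;
    int *start  = calloc((size_t)ncell + 1, sizeof *start);
    int *cursor = malloc((size_t)ncell * sizeof *cursor);
    int *order  = malloc((size_t)(np > 0 ? np : 1) * sizeof *order);
    if (!start || !cursor || !order) {
        free(start); free(cursor); free(order);
        return -1;
    }

    /* Counting sort: start[c]..start[c+1] indexes the particles in cell c. */
    for (int k = 0; k < np; ++k)
        start[cell_of(&ps[k], nx, ny) + 1]++;
    for (int c = 0; c < ncell; ++c) {
        start[c + 1] += start[c];
        cursor[c] = start[c];
    }
    for (int k = 0; k < np; ++k)
        order[cursor[cell_of(&ps[k], nx, ny)]++] = k;

    /* One pass per colour. The two inner loops visit only cells of that
       colour; on a GPU they would be the parallel range of one launch. */
    for (int colour = 0; colour < 4; ++colour)
        for (int cj = colour >> 1; cj < ny; cj += 2)
            for (int ci = colour & 1; ci < nx; ci += 2) {
                int c = cj * nx + ci;
                for (int s = start[c]; s < start[c + 1]; ++s)
                    scatter_one(&ps[order[s]], ci, cj, node, nx);
            }

    free(start); free(cursor); free(order);
    return 0;
}
```

The cost of this design is the binning step and four launches instead of one; the gain is that writes within a pass are race-free by construction, and the summation order per node is fixed, which keeps results reproducible between runs.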