FLIP · OpenCL · Notes
Week 27 — Profiling the FLIP solver
Spent most of the week with a profiler open. The assumption going in was that the particle-to-grid transfer would be the expensive part — it usually gets blamed. It wasn't. On mid-resolution grids the pressure projection is where the time goes, by a wide margin.
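A per-stage breakdown like this can be approximated without a full profiler. Below is a minimal sketch of a wall-clock stage timer, in Python for brevity. `StageTimer` and the stage names in the usage note are hypothetical and not taken from the solver itself.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Accumulates wall-clock time per named solver stage (hypothetical helper)."""

    def __init__(self):
        self.totals = defaultdict(float)

    @contextmanager
    def stage(self, name):
        # Time the enclosed block and add it to the running total for `name`.
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[name] += time.perf_counter() - start

    def report(self):
        """Return (stage, seconds, fraction of total) tuples, slowest first."""
        total = sum(self.totals.values()) or 1.0  # avoid dividing by zero
        ranked = sorted(self.totals.items(), key=lambda kv: kv[1], reverse=True)
        return [(name, secs, secs / total) for name, secs in ranked]
```

Wrapping each phase of a step in `with timer.stage("projection"):` (and likewise for the transfer and advection phases) and calling `timer.report()` after a few hundred steps gives the ranking. For GPU stages, OpenCL enqueue calls return before the kernel finishes, so the timed region needs a `clFinish` (or event profiling) to capture the real cost.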
Porting the Jacobi projection to OpenCL gave roughly a 2× overall speedup. That is less than the raw kernel timings suggest, because the boundary handling still runs on the CPU, and the synchronization between CPU and GPU keeps stalling the pipeline.
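For reference, here is a sketch of what a Jacobi sweep for the pressure solve looks like, written in plain Python rather than OpenCL. It is not the solver's code. It assumes a 2D uniform grid with unit spacing, solves the 5-point discretisation of Laplacian(p) = rhs, and simply holds border cells at zero instead of modelling real boundary handling. The function names are hypothetical.

```python
def jacobi_sweep(p, rhs):
    """One Jacobi sweep; every interior cell reads only the previous iterate."""
    ny, nx = len(p), len(p[0])
    new = [row[:] for row in p]  # border cells are copied unchanged
    for j in range(1, ny - 1):
        for i in range(1, nx - 1):
            neighbours = p[j][i - 1] + p[j][i + 1] + p[j - 1][i] + p[j + 1][i]
            new[j][i] = (neighbours - rhs[j][i]) / 4.0
    return new


def solve_pressure(rhs, iterations):
    """Run a fixed number of sweeps starting from p = 0 everywhere."""
    p = [[0.0] * len(rhs[0]) for _ in rhs]
    for _ in range(iterations):
        p = jacobi_sweep(p, rhs)
    return p


def max_residual(p, rhs):
    """Largest |Laplacian(p) - rhs| over the interior cells."""
    worst = 0.0
    for j in range(1, len(p) - 1):
        for i in range(1, len(p[0]) - 1):
            lap = (p[j][i - 1] + p[j][i + 1] + p[j - 1][i] + p[j + 1][i]
                   - 4.0 * p[j][i])
            worst = max(worst, abs(lap - rhs[j][i]))
    return worst
```

Because each cell depends only on the previous iterate, the inner loops map directly to one work-item per cell, which is why the sweep itself ports well. The cost is the number of sweeps times the number of cells, so any CPU-side step that has to see the pressure field between sweeps forces a transfer and a wait.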
Next week: move the boundary pass onto the GPU too, even if the first version is naive, just to keep everything resident. Measuring beats guessing — I would have optimised the wrong loop for a day if I hadn't profiled first.