A brief verbal explanation of how the multithreading in sofren works
- No mutexes are used anywhere. Each thread works on a separate tile, and tiles are non-overlapping rectangular regions of the screen, so threads can write to the final buffers simultaneously without interfering with one another.
Everything is initialized from `sfr_init`, using memory from the provided buffer (`sfrBuffers`).
- Four semaphores are initialized:
  - `geometryStartSem`: the main thread signals this to tell workers to start the geometry phase.
  - `geometryDoneSem`: workers signal this upon completing their geometry work.
  - `rasterStartSem`: the main thread signals this to tell workers to start the raster phase.
  - `rasterDoneSem`: workers signal this upon completing their rasterization work.
- `SFR_THREAD_COUNT` threads are created, each executing `sfr__worker_thread_func`. On Windows, `_beginthreadex` is used; otherwise, `pthread_create` is used.
- Once created, each thread enters the infinite loop inside `sfr__worker_thread_func`, where it waits to be assigned work.
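The handshake between the main thread and the workers can be sketched as below. This is a minimal illustration using POSIX semaphores and pthreads, not sofren's actual code: the `Pool` struct, `worker_func`, `pool_init`, `pool_frame`, `pool_shutdown`, the `quit` flag, and the two run counters are all hypothetical names introduced for the example. Only the four semaphore names come from the description above. Error handling for interrupted waits is omitted.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdatomic.h>
#include <stdbool.h>

#define WORKER_COUNT 4 /* stands in for SFR_THREAD_COUNT */

typedef struct {
    sem_t geometryStartSem, geometryDoneSem;
    sem_t rasterStartSem, rasterDoneSem;
    atomic_int geometryRuns, rasterRuns; /* demo counters, not part of sofren */
    atomic_bool quit;                    /* hypothetical shutdown flag */
    pthread_t threads[WORKER_COUNT];
} Pool;

static void *worker_func(void *arg) {
    Pool *p = arg;
    for (;;) {
        sem_wait(&p->geometryStartSem);        /* sleep until a frame starts */
        if (atomic_load(&p->quit)) break;
        atomic_fetch_add(&p->geometryRuns, 1); /* geometry jobs would run here */
        sem_post(&p->geometryDoneSem);

        sem_wait(&p->rasterStartSem);          /* sleep until binning is complete */
        atomic_fetch_add(&p->rasterRuns, 1);   /* tile rasterization would run here */
        sem_post(&p->rasterDoneSem);
    }
    return NULL;
}

/* Returns 0 on success, -1 if a semaphore or thread could not be created. */
static int pool_init(Pool *p) {
    atomic_init(&p->geometryRuns, 0);
    atomic_init(&p->rasterRuns, 0);
    atomic_init(&p->quit, false);
    if (sem_init(&p->geometryStartSem, 0, 0) || sem_init(&p->geometryDoneSem, 0, 0) ||
        sem_init(&p->rasterStartSem, 0, 0) || sem_init(&p->rasterDoneSem, 0, 0))
        return -1;
    for (int i = 0; i < WORKER_COUNT; i++)
        if (pthread_create(&p->threads[i], NULL, worker_func, p) != 0) return -1;
    return 0;
}

/* One frame: run both phases, blocking until every worker has finished each. */
static void pool_frame(Pool *p) {
    assert(!atomic_load(&p->quit)); /* no frames after shutdown */
    for (int i = 0; i < WORKER_COUNT; i++) sem_post(&p->geometryStartSem);
    for (int i = 0; i < WORKER_COUNT; i++) sem_wait(&p->geometryDoneSem);
    for (int i = 0; i < WORKER_COUNT; i++) sem_post(&p->rasterStartSem);
    for (int i = 0; i < WORKER_COUNT; i++) sem_wait(&p->rasterDoneSem);
}

/* Wake every worker once with the quit flag set, then join them. */
static void pool_shutdown(Pool *p) {
    atomic_store(&p->quit, true);
    for (int i = 0; i < WORKER_COUNT; i++) sem_post(&p->geometryStartSem);
    for (int i = 0; i < WORKER_COUNT; i++) pthread_join(p->threads[i], NULL);
}
```

In this sketch, a worker that has consumed one start token cannot consume a second one in the same phase, because it immediately blocks on the other phase's start semaphore, which the main thread posts only after collecting all the done signals. That is why posting each start semaphore once per worker is enough to wake every worker exactly once.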
The geometry phase performs all the per-triangle calculations that are independent of the final output (i.e. no writing to pixel or depth buffers), such as vertex transformation, lighting, and clipping.
- Job Creation (`sfr_mesh`):
  - When you call `sfr_mesh` with a large model, the work is broken up.
  - The mesh's triangle list is divided into chunks, each `SFR_GEOMETRY_JOB_SIZE` in size.
  - For each chunk, a `SfrMeshChunkJob` is created. This job contains a pointer to the mesh, the transformation matrices, color, texture, and the start index and count of triangles it's responsible for.
- Job Queuing (`sfr_mesh`):
  - Once a `SfrMeshChunkJob` is created, its index is added to the `geometryWorkQueue`.
  - The `geometryWorkQueueCount` atomic is incremented to make the job visible to worker threads.
- Dispatch and Execution (`sfr_flush_and_wait` and `sfr__worker_thread_func`):
  - The main thread calling `sfr_flush_and_wait` posts to `geometryStartSem`, releasing all worker threads from their wait state.
  - Each worker thread enters a loop to "steal" work from the `geometryWorkQueue`. It does this by atomically incrementing the `geometryWorkQueueHead` counter to get a unique job index.
  - The thread then processes every triangle within its assigned `SfrMeshChunkJob` by calling `sfr__process_and_bin_triangle`.
- Triangle Binning (`sfr__process_and_bin_triangle` and `sfr__bin_triangle`):
  - The triangle's screen-space bounding box is calculated to determine which screen tiles it overlaps (the screen is divided into a grid of tiles, each `SFR_TILE_WIDTH` by `SFR_TILE_HEIGHT` pixels).
  - A pointer to the `SfrTriangleBin` is added to the bin list of every tile the triangle touches. The `tile->binCount` is updated atomically.
  - The first time a triangle is added to a specific tile, that tile's index is added to the `rasterWorkQueue`, marking it as needing rasterization in the next phase.
- Synchronization (`sfr_flush_and_wait`):
  - When a worker thread can't find any more geometry jobs in the queue, it signals the `geometryDoneSem` and proceeds to wait on the `rasterStartSem` for the next phase.
  - The main thread, inside `sfr_flush_and_wait`, waits until it has received a signal on `geometryDoneSem` from every worker thread, ensuring the entire geometry phase is complete before continuing.
The raster phase takes the binned triangles from the geometry phase and converts them to screen pixels.
- Job Creation:
  - The work for this phase was already created during the geometry phase.
  - The `rasterWorkQueue` is now populated with the indices of all tiles that have at least one triangle to draw.
- Dispatch and Execution:
  - Still inside `sfr_flush_and_wait`, the main thread posts to the `rasterStartSem`, waking all worker threads to begin rasterization.
  - Each worker atomically increments the `rasterWorkQueueHead` to steal a tile index from the `rasterWorkQueue`.
  - The thread gets a pointer to the `SfrTile` and iterates through its list of `SfrTriangleBin` pointers. For each bin, it calls `sfr_rasterize_bin` to perform the actual rasterizing.
- Synchronization:
  - When a worker finishes its loop and the raster queue is empty, it signals the `rasterDoneSem` and loops back to wait for the next geometry phase.
  - The main thread waits on `rasterDoneSem` until all workers have signaled completion, which marks the end of a fully rendered frame. It then resets all the job queues and allocators for the next frame.
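The raster loop and the reason tiles can share the final buffers without locks can be sketched as below. This is a simplified illustration, not sofren's code: a filled rectangle stands in for a binned triangle, and `Bin`, `Tile`, `Frame`, `add_bin`, `rasterize_bin`, `raster_worker`, and `reset_frame` are hypothetical names with arbitrary small sizes. The sketch assumes each bin is drawn only within the bounds of the tile being processed, which is what keeps writes from different tiles disjoint.

```c
#include <assert.h>
#include <stdatomic.h>

/* Illustrative sizes and types; none of these are sofren's real definitions. */
#define TILE_W 8
#define TILE_H 8
#define TILES_X 2
#define TILES_Y 2
#define TILE_COUNT (TILES_X * TILES_Y)
#define SCREEN_W (TILE_W * TILES_X)
#define SCREEN_H (TILE_H * TILES_Y)
#define MAX_BINS 8

/* A filled rectangle (inclusive bounds) stands in for a binned triangle. */
typedef struct { int minX, minY, maxX, maxY; unsigned color; } Bin;
typedef struct { const Bin *bins[MAX_BINS]; int binCount; } Tile;

typedef struct {
    Tile tiles[TILE_COUNT];
    int rasterQueue[TILE_COUNT]; /* tile indices filled in by the geometry phase */
    int rasterCount;             /* no longer changes once rasterization starts */
    atomic_int rasterHead;       /* next queue slot a worker will claim */
    unsigned pixels[SCREEN_W * SCREEN_H];
} Frame;

/* Single-threaded helper for the demo: attach a bin to a tile. */
static void add_bin(Frame *f, int tile, const Bin *b) {
    Tile *t = &f->tiles[tile];
    assert(t->binCount < MAX_BINS);
    if (t->binCount == 0) f->rasterQueue[f->rasterCount++] = tile;
    t->bins[t->binCount++] = b;
}

/* Draw only the part of the bin that lies inside the given tile.
   Returns the number of pixels written. */
static int rasterize_bin(Frame *f, int tile, const Bin *b) {
    int x0 = (tile % TILES_X) * TILE_W, x1 = x0 + TILE_W - 1;
    int y0 = (tile / TILES_X) * TILE_H, y1 = y0 + TILE_H - 1;
    if (b->minX > x0) x0 = b->minX;
    if (b->maxX < x1) x1 = b->maxX;
    if (b->minY > y0) y0 = b->minY;
    if (b->maxY < y1) y1 = b->maxY;
    int written = 0;
    for (int y = y0; y <= y1; y++)
        for (int x = x0; x <= x1; x++, written++)
            f->pixels[y * SCREEN_W + x] = b->color;
    return written;
}

/* Worker: claim tiles until the queue is empty. Returns pixels written. */
static int raster_worker(Frame *f) {
    int written = 0;
    for (;;) {
        int slot = atomic_fetch_add(&f->rasterHead, 1);
        if (slot >= f->rasterCount) break;
        int tile = f->rasterQueue[slot];
        for (int i = 0; i < f->tiles[tile].binCount; i++)
            written += rasterize_bin(f, tile, f->tiles[tile].bins[i]);
    }
    return written;
}

/* Main thread, after every worker has signaled done: reset for the next frame. */
static void reset_frame(Frame *f) {
    for (int i = 0; i < TILE_COUNT; i++) f->tiles[i].binCount = 0;
    f->rasterCount = 0;
    atomic_store(&f->rasterHead, 0);
}
```

In this sketch, a 4x4 rectangle that straddles all four tiles is binned four times, and each tile draws only its own 2x2 corner. The total is 16 pixel writes, equal to the rectangle's area, so no pixel is written by more than one tile.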