Another random guess: maybe shader compilation is a problem in that it blocks in a different GL API call than it does with other drivers, which causes some lock to be held for much longer. Then the I/O thread won't find the lock free when it expects to submit a result. That would require the GL / I/O thread synchronization to be done in a suboptimal way - the right synchronization primitives and techniques would avoid such problems.
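
To illustrate what I mean by "the right techniques", here is a rough sketch in Python, not taken from any real code base. All names (ResultMailbox, slow_gl_call, run_demo) are made up, and the sleep only stands in for a driver call that might block on shader compilation. The point is that the GL thread holds the lock just long enough to swap the pending list out, so a slow GL call never keeps the I/O thread waiting on the lock.

```python
import threading
import time


class ResultMailbox:
    """Hypothetical hand-off point between an I/O thread and a GL thread."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pending = []

    def submit(self, result):
        # I/O thread: the lock is held only for a list append.
        with self._lock:
            self._pending.append(result)

    def drain(self):
        # GL thread: swap the list out under the lock, then release the
        # lock before doing any slow work on the drained items.
        with self._lock:
            drained, self._pending = self._pending, []
        return drained


def slow_gl_call(item):
    # Stand-in (assumption) for a GL call that may block for a while.
    time.sleep(0.05)
    return item


def run_demo(n_items=5):
    mailbox = ResultMailbox()
    processed = []
    done = threading.Event()
    worst_wait = 0.0

    def gl_thread():
        while True:
            # Read the flag before draining so no late submit is lost.
            finished = done.is_set()
            for item in mailbox.drain():
                processed.append(slow_gl_call(item))  # lock is NOT held here
            if finished:
                break
            time.sleep(0.001)

    def io_thread():
        nonlocal worst_wait
        for i in range(n_items):
            start = time.perf_counter()
            mailbox.submit(i)
            worst_wait = max(worst_wait, time.perf_counter() - start)
            time.sleep(0.01)
        done.set()

    threads = [threading.Thread(target=gl_thread),
               threading.Thread(target=io_thread)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return processed, worst_wait


if __name__ == "__main__":
    processed, worst_wait = run_demo()
    print("processed:", processed)
    print("worst submit wait: %.6f s" % worst_wait)
```

If slow_gl_call were instead invoked inside the `with self._lock:` block in drain, every submit that arrived during a slow call would stall for the rest of that call, which is the kind of behaviour I am guessing at above.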