public inbox for wireguard@lists.zx2c4.com
* [PATCH 0/1] wintun driver get stalled for few sec until another packet is being sent -- approach 2
From: odedkatz @ 2026-02-24 23:47 UTC
  To: wireguard; +Cc: odedk, alexey, odedkatz

## Problem
High-speed TCP uploads stall for 4-5 seconds when the sender and receiver are close (~1ms RTT). The root cause is a missed-wakeup race in the Receive ring's Alertable/Tail signaling protocol between the userspace API (WintunSendPacket) and the kernel consumer thread (TunProcessReceiveData).

On x86-64, WriteRelease/ReadAcquire provide acquire-release semantics but do not emit a full fence (MFENCE or a LOCK-prefixed instruction), which is what it takes to prevent store-load reordering, the one reordering x86 permits. Each core's store can still sit in its store buffer while that core performs the following load, so both sides can read a stale value at the same time: userspace skips SetEvent while the driver blocks in KeWaitForMultipleObjects with no wakeup pending.

The driver sleeps until a TCP retransmission (RTO ~1s + exponential backoff) coincidentally wins the race, producing the observed 4-5 second stalls.

## Fix
Insert MemoryBarrier() between each store-load pair on both sides. This guarantees that at least one side always sees the other's store, eliminating the missed wakeup. The Alertable optimization (avoiding SetEvent syscalls when the driver is actively spinning) is fully preserved.
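
The store/fence/load ordering described above can be sketched in portable C11 atomics. This is a minimal illustration, not the Wintun code: `ring_t`, `producer_publish` and `consumer_should_sleep` are hypothetical names, `atomic_thread_fence(memory_order_seq_cst)` stands in for MemoryBarrier(), and resetting the flag when the consumer decides not to sleep is an assumption about the surrounding driver logic.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the shared ring fields (not the Wintun layout). */
typedef struct {
    atomic_uint tail;      /* plays the role of Ring->Tail */
    atomic_bool alertable; /* plays the role of Ring->Alertable */
} ring_t;

/* Producer side: publish the new tail, then decide whether to signal.
 * Returns true when the caller should wake the consumer (SetEvent). */
static bool producer_publish(ring_t *r, unsigned new_tail)
{
    atomic_store_explicit(&r->tail, new_tail, memory_order_release);
    /* Full fence: the store above must be visible before the load below. */
    atomic_thread_fence(memory_order_seq_cst);
    return atomic_load_explicit(&r->alertable, memory_order_acquire);
}

/* Consumer side: announce the intent to sleep, then re-check the tail.
 * Returns true when it is safe to block because no data is pending. */
static bool consumer_should_sleep(ring_t *r, unsigned head)
{
    atomic_store_explicit(&r->alertable, true, memory_order_release);
    /* Full fence: the store above must be visible before the load below. */
    atomic_thread_fence(memory_order_seq_cst);
    if (atomic_load_explicit(&r->tail, memory_order_acquire) != head) {
        /* Data arrived in the meantime: stay awake (assumed reset). */
        atomic_store_explicit(&r->alertable, false, memory_order_release);
        return false;
    }
    return true;
}
```

With a fence on both sides, the outcome in which the producer reads a stale `alertable` and the consumer reads a stale `tail` at the same time is ruled out. With the fence missing on either side, it is allowed again.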

Alexey Lapuka (1):
  Fix missed-wakeup race in ring buffer Alertable signaling

 api/session.c   | 1 +
 driver/wintun.c | 1 +
 2 files changed, 2 insertions(+)

-- 
2.43.0



* [PATCH 1/1] Fix missed-wakeup race in ring buffer Alertable signaling
From: odedkatz @ 2026-02-24 23:47 UTC
  To: wireguard; +Cc: odedk, alexey

From: Alexey Lapuka <alexey@twingate.com>

   Add MemoryBarrier() between store-load pairs in the Dekker-style
   synchronization used by the Receive ring's Alertable/Tail protocol.

   On x86-64, WriteRelease/ReadAcquire only prevent compiler reordering
   and provide acquire/release semantics; they do not emit a full fence
   (MFENCE or a LOCK-prefixed instruction), which is required to prevent
   store-load reordering across cores.
   Without a full barrier, both the userspace producer and the kernel
   consumer can simultaneously read stale values:

     Userspace: STORE(Tail)  ...  LOAD(Alertable) -> sees FALSE (stale)
     Driver:    STORE(Alertable=TRUE) ... LOAD(Tail) -> sees old tail

   The driver then enters KeWaitForMultipleObjects with no pending
   SetEvent, sleeping until a TCP retransmission (typically 4-5s later)
   re-triggers the send path and wins the race.

   The fix adds MemoryBarrier() (a full hardware fence) on both sides:
   - api/session.c WintunSendPacket: between WriteULongRelease(Tail) and
     ReadAcquire(Alertable)
   - driver/wintun.c TunProcessReceiveData: between
     WriteRelease(Alertable, TRUE) and ReadULongAcquire(Tail)

   This guarantees that at least one side always observes the other's
   store, preventing the missed wakeup while preserving the Alertable
   optimization that avoids unnecessary SetEvent syscalls.
---
 api/session.c   | 1 +
 driver/wintun.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/api/session.c b/api/session.c
index ab96c64..13d5bca 100644
--- a/api/session.c
+++ b/api/session.c
@@ -302,6 +302,7 @@ WintunSendPacket(TUN_SESSION *Session, const BYTE *Packet)
     if (Session->Descriptor.Receive.Ring->Tail != Session->Receive.TailRelease)
     {
         WriteULongRelease(&Session->Descriptor.Receive.Ring->Tail, Session->Receive.TailRelease);
+        MemoryBarrier();
         if (ReadAcquire(&Session->Descriptor.Receive.Ring->Alertable))
             SetEvent(Session->Descriptor.Receive.TailMoved);
     }
diff --git a/driver/wintun.c b/driver/wintun.c
index 82e346b..72ba5d3 100644
--- a/driver/wintun.c
+++ b/driver/wintun.c
@@ -481,6 +481,7 @@ TunProcessReceiveData(_Inout_ TUN_CTX *Ctx)
             if (RingHead == RingTail)
             {
                 WriteRelease(&Ring->Alertable, TRUE);
+                MemoryBarrier();
                 RingTail = ReadULongAcquire(&Ring->Tail);
                 if (RingHead == RingTail)
                 {
-- 
2.43.0

