* [PATCH 0/1] This PR fixes a critical race condition in the Wintun driver that causes ring buffer overruns
@ 2026-02-19 19:32 odedkatz
2026-02-19 19:32 ` [PATCH 1/1] in order to prevent buffer overrun (which was observed while sending multiple high throughput UDP streams from different threads) I move the driver spinlock to protect Ring buffer Head odedkatz
0 siblings, 1 reply; 8+ messages in thread
From: odedkatz @ 2026-02-19 19:32 UTC (permalink / raw)
To: wireguard; +Cc: odedk, odedkatz
** Description **
This PR fixes a critical race condition in the Wintun driver that causes ring buffer overruns when multiple UDP senders operate in parallel. The issue occurred because multiple threads could read the same Ring->Head value without synchronization, leading to concurrent modifications that corrupted the ring buffer and resulted in ERROR_INVALID_DATA errors.
** Changes **
Extended the critical section in TunSendNetBufferLists by moving spinlock acquisition before reading Ring->Head, ensuring atomic read-modify-write operations on the ring buffer state
Fixed the lock release timing in TunReturnNetBufferLists to occur after updating Ring->Head, ensuring consistency between the active NBL list and ring buffer head pointer
** History **
the issue was reported to wireguard
```
We were trying to integrate `wintun` into our product and while
testing we found an issue with upload of high UDP throughput on
multiple sockets. initially we thought it may be related to our
integration. but we managed to find the same issue in your example
code.we tried to do some debugging on the example, but it seems like
the corruption is on the driver side.
the way to reproduce this issue- run the example
- run the `send_udp.py` (the attached python script) on 2 different
command terminals
- `send_udp.py 10.6.7.8 5201 1450`
- `send_udp.py 10.6.7.8 5201 1400`after some time the
`WintunReceivePacket` returns NULL and `GetLastError()` return 13
(`ERROR_INVALID_DATA`)
many thanks,
```
a fix was provided by wireguard with [this commit](https://git.zx2c4.com/wintun/commit/driver/wintun.c?id=8814f660310c30a9297fc640c844c4ab6f27e5d7), but it didn't solve the issue.
odedkatz (1):
in order to prevent buffer overrun (which was observed while sending
multiple high throughput UDP streams from different threads) I move
the driver spinlock to protect Ring buffer Head.
driver/wintun.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
--
2.43.0
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 1/1] in order to prevent buffer overrun (which was observed while sending multiple high throughput UDP streams from different threads) I move the driver spinlock to protect Ring buffer Head.
2026-02-19 19:32 [PATCH 0/1] This PR fixes a critical race condition in the Wintun driver that causes ring buffer overruns odedkatz
@ 2026-02-19 19:32 ` odedkatz
2026-02-20 7:28 ` Simon Rozman
0 siblings, 1 reply; 8+ messages in thread
From: odedkatz @ 2026-02-19 19:32 UTC (permalink / raw)
To: wireguard; +Cc: odedk, odedkatz
I observed that the Ring->Head was taken and manipulated later on with just a `ReadULongAcquire` which isn't OK when 2 are trying to manipulate it later on based on the same received value.
---
driver/wintun.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/driver/wintun.c b/driver/wintun.c
index d1f3b9f..65cd97e 100644
--- a/driver/wintun.c
+++ b/driver/wintun.c
@@ -284,13 +284,14 @@ TunSendNetBufferLists(
TUN_RING *Ring = Ctx->Device.Send.Ring;
ULONG RingCapacity = Ctx->Device.Send.Capacity;
+ KLOCK_QUEUE_HANDLE LockHandle;
+ KeAcquireInStackQueuedSpinLock(&Ctx->Device.Send.Lock, &LockHandle);
/* Allocate space for packets in the ring. */
ULONG RingHead = ReadULongAcquire(&Ring->Head);
- if (Status = NDIS_STATUS_ADAPTER_NOT_READY, RingHead >= RingCapacity)
+ if (Status = NDIS_STATUS_ADAPTER_NOT_READY, RingHead >= RingCapacity) {
+ KeReleaseInStackQueuedSpinLock(&LockHandle);
goto skipNbl;
-
- KLOCK_QUEUE_HANDLE LockHandle;
- KeAcquireInStackQueuedSpinLock(&Ctx->Device.Send.Lock, &LockHandle);
+ }
ULONG RingTail = Ctx->Device.Send.RingTail;
ASSERT(RingTail < RingCapacity);
@@ -419,8 +420,8 @@ TunReturnNetBufferLists(NDIS_HANDLE MiniportAdapterContext, PNET_BUFFER_LIST Net
Ctx->Device.Receive.ActiveNbls.Head = NET_BUFFER_LIST_NEXT_NBL_EX(CompletedNbl);
if (!Ctx->Device.Receive.ActiveNbls.Head)
KeSetEvent(&Ctx->Device.Receive.ActiveNbls.Empty, IO_NO_INCREMENT, FALSE);
- KeReleaseInStackQueuedSpinLock(&LockHandle);
WriteULongRelease(&Ring->Head, TunNblGetOffset(CompletedNbl));
+ KeReleaseInStackQueuedSpinLock(&LockHandle);
const MDL *TargetMdl = Ctx->Device.Receive.Mdl;
for (MDL *Mdl = NET_BUFFER_FIRST_MDL(NET_BUFFER_LIST_FIRST_NB(CompletedNbl)); Mdl; Mdl = Mdl->Next)
{
--
2.43.0
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH 1/1] in order to prevent buffer overrun (which was observed while sending multiple high throughput UDP streams from different threads) I move the driver spinlock to protect Ring buffer Head.
2026-02-19 19:32 ` [PATCH 1/1] in order to prevent buffer overrun (which was observed while sending multiple high throughput UDP streams from different threads) I move the driver spinlock to protect Ring buffer Head odedkatz
@ 2026-02-20 7:28 ` Simon Rozman
2026-02-20 16:55 ` Oded Katz
2026-03-11 23:35 ` Oded Katz
0 siblings, 2 replies; 8+ messages in thread
From: Simon Rozman @ 2026-02-20 7:28 UTC (permalink / raw)
To: odedkatz, wireguard; +Cc: odedk
Hi, Oded!
First and foremost, thank you very much for taking time to look into
this and troubleshoot it.
It was believed, that ReadULongAcquire() and WriteULongRelease() alone
provide atomic manipulation with Ring->Head on all modern platforms.
Hence, these calls were made outside the spinlock, to squeeze an extra
micrometer of performance.
I could not directly apply your patch to the wintun repo, since it does
not follow our code style, commit message conventions and is not
Signed-off-by you.
However, it would have been a terrible waste of your research if your
contribution wouldn't get into wintun, so I reworked your PR and applied
it here:
https://git.zx2c4.com/wintun/commit/?id=607c181ea9fa0036d23598038e6019ad54db5ce4
Please, stay tuned for an official WHQL-signed release.
Lep pozdrav | Best regards,
Simon Rozman
Amebis, d. o. o., Kamnik
On 19. 2. 2026 20.32, odedkatz wrote:
> I observed that the Ring->Head was taken and manipulated later on with just a `ReadULongAcquire` which isn't OK when 2 are trying to manipulate it later on based on the same received value.
> ---
> driver/wintun.c | 11 ++++++-----
> 1 file changed, 6 insertions(+), 5 deletions(-)
>
> diff --git a/driver/wintun.c b/driver/wintun.c
> index d1f3b9f..65cd97e 100644
> --- a/driver/wintun.c
> +++ b/driver/wintun.c
> @@ -284,13 +284,14 @@ TunSendNetBufferLists(
> TUN_RING *Ring = Ctx->Device.Send.Ring;
> ULONG RingCapacity = Ctx->Device.Send.Capacity;
>
> + KLOCK_QUEUE_HANDLE LockHandle;
> + KeAcquireInStackQueuedSpinLock(&Ctx->Device.Send.Lock, &LockHandle);
> /* Allocate space for packets in the ring. */
> ULONG RingHead = ReadULongAcquire(&Ring->Head);
> - if (Status = NDIS_STATUS_ADAPTER_NOT_READY, RingHead >= RingCapacity)
> + if (Status = NDIS_STATUS_ADAPTER_NOT_READY, RingHead >= RingCapacity) {
> + KeReleaseInStackQueuedSpinLock(&LockHandle);
> goto skipNbl;
> -
> - KLOCK_QUEUE_HANDLE LockHandle;
> - KeAcquireInStackQueuedSpinLock(&Ctx->Device.Send.Lock, &LockHandle);
> + }
>
> ULONG RingTail = Ctx->Device.Send.RingTail;
> ASSERT(RingTail < RingCapacity);
> @@ -419,8 +420,8 @@ TunReturnNetBufferLists(NDIS_HANDLE MiniportAdapterContext, PNET_BUFFER_LIST Net
> Ctx->Device.Receive.ActiveNbls.Head = NET_BUFFER_LIST_NEXT_NBL_EX(CompletedNbl);
> if (!Ctx->Device.Receive.ActiveNbls.Head)
> KeSetEvent(&Ctx->Device.Receive.ActiveNbls.Empty, IO_NO_INCREMENT, FALSE);
> - KeReleaseInStackQueuedSpinLock(&LockHandle);
> WriteULongRelease(&Ring->Head, TunNblGetOffset(CompletedNbl));
> + KeReleaseInStackQueuedSpinLock(&LockHandle);
> const MDL *TargetMdl = Ctx->Device.Receive.Mdl;
> for (MDL *Mdl = NET_BUFFER_FIRST_MDL(NET_BUFFER_LIST_FIRST_NB(CompletedNbl)); Mdl; Mdl = Mdl->Next)
> {
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/1] in order to prevent buffer overrun (which was observed while sending multiple high throughput UDP streams from different threads) I move the driver spinlock to protect Ring buffer Head.
2026-02-20 7:28 ` Simon Rozman
@ 2026-02-20 16:55 ` Oded Katz
2026-02-27 21:28 ` Oded Katz
2026-03-11 23:35 ` Oded Katz
1 sibling, 1 reply; 8+ messages in thread
From: Oded Katz @ 2026-02-20 16:55 UTC (permalink / raw)
To: Simon Rozman; +Cc: wireguard, odedk
Hi Simon,
Thanks for the update.
FYI: the ReadULongAcquire() and WriteULongRelease() themself are good
enough for getting and updating an atomic operation.
however, after on thread updates the head, the second thread can't
stay with the same atomic read value of the Head.
- so assuming 2 thread read the Head on the same time so the 2 threads
has same head.
- now 1st thread takes the spin lock and updates the head
- 2nd thread is pending the job until the spinlock is released and
then it doing the job, but with the non-updated Head value
- so it will overrun the previous thread data and mess the ringbuffer
again thanks for providing the next release
Regards,
Oded
On Thu, Feb 19, 2026 at 11:28 PM Simon Rozman <simon.rozman@amebis.si> wrote:
>
> Hi, Oded!
>
> First and foremost, thank you very much for taking time to look into
> this and troubleshoot it.
>
> It was believed, that ReadULongAcquire() and WriteULongRelease() alone
> provide atomic manipulation with Ring->Head on all modern platforms.
> Hence, these calls were made outside the spinlock, to squeeze an extra
> micrometer of performance.
>
> I could not directly apply your patch to the wintun repo, since it does
> not follow our code style, commit message conventions and is not
> Signed-off-by you.
>
> However, it would have been a terrible waste of your research if your
> contribution wouldn't get into wintun, so I reworked your PR and applied
> it here:
> https://git.zx2c4.com/wintun/commit/?id=607c181ea9fa0036d23598038e6019ad54db5ce4
>
> Please, stay tuned for an official WHQL-signed release.
>
> Lep pozdrav | Best regards,
> Simon Rozman
> Amebis, d. o. o., Kamnik
>
> On 19. 2. 2026 20.32, odedkatz wrote:
> > I observed that the Ring->Head was taken and manipulated later on with just a `ReadULongAcquire` which isn't OK when 2 are trying to manipulate it later on based on the same received value.
> > ---
> > driver/wintun.c | 11 ++++++-----
> > 1 file changed, 6 insertions(+), 5 deletions(-)
> >
> > diff --git a/driver/wintun.c b/driver/wintun.c
> > index d1f3b9f..65cd97e 100644
> > --- a/driver/wintun.c
> > +++ b/driver/wintun.c
> > @@ -284,13 +284,14 @@ TunSendNetBufferLists(
> > TUN_RING *Ring = Ctx->Device.Send.Ring;
> > ULONG RingCapacity = Ctx->Device.Send.Capacity;
> >
> > + KLOCK_QUEUE_HANDLE LockHandle;
> > + KeAcquireInStackQueuedSpinLock(&Ctx->Device.Send.Lock, &LockHandle);
> > /* Allocate space for packets in the ring. */
> > ULONG RingHead = ReadULongAcquire(&Ring->Head);
> > - if (Status = NDIS_STATUS_ADAPTER_NOT_READY, RingHead >= RingCapacity)
> > + if (Status = NDIS_STATUS_ADAPTER_NOT_READY, RingHead >= RingCapacity) {
> > + KeReleaseInStackQueuedSpinLock(&LockHandle);
> > goto skipNbl;
> > -
> > - KLOCK_QUEUE_HANDLE LockHandle;
> > - KeAcquireInStackQueuedSpinLock(&Ctx->Device.Send.Lock, &LockHandle);
> > + }
> >
> > ULONG RingTail = Ctx->Device.Send.RingTail;
> > ASSERT(RingTail < RingCapacity);
> > @@ -419,8 +420,8 @@ TunReturnNetBufferLists(NDIS_HANDLE MiniportAdapterContext, PNET_BUFFER_LIST Net
> > Ctx->Device.Receive.ActiveNbls.Head = NET_BUFFER_LIST_NEXT_NBL_EX(CompletedNbl);
> > if (!Ctx->Device.Receive.ActiveNbls.Head)
> > KeSetEvent(&Ctx->Device.Receive.ActiveNbls.Empty, IO_NO_INCREMENT, FALSE);
> > - KeReleaseInStackQueuedSpinLock(&LockHandle);
> > WriteULongRelease(&Ring->Head, TunNblGetOffset(CompletedNbl));
> > + KeReleaseInStackQueuedSpinLock(&LockHandle);
> > const MDL *TargetMdl = Ctx->Device.Receive.Mdl;
> > for (MDL *Mdl = NET_BUFFER_FIRST_MDL(NET_BUFFER_LIST_FIRST_NB(CompletedNbl)); Mdl; Mdl = Mdl->Next)
> > {
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/1] in order to prevent buffer overrun (which was observed while sending multiple high throughput UDP streams from different threads) I move the driver spinlock to protect Ring buffer Head.
2026-02-20 16:55 ` Oded Katz
@ 2026-02-27 21:28 ` Oded Katz
2026-02-27 22:03 ` Simon Rozman
0 siblings, 1 reply; 8+ messages in thread
From: Oded Katz @ 2026-02-27 21:28 UTC (permalink / raw)
To: Simon Rozman; +Cc: wireguard, odedk
Hi Simon,
We have found another issue with the driver code.
some race condition between "alertable" flag between driver and
user-side. causing driver to stall until new packet wakes it up.
We have seen some cases where it can be for periods of 5 sec.
we already made the fix, and I sent 2 approaches for that fix.
please let us know if you are planning to solve it in your repo?
regards and thanks,
Oded
On Fri, Feb 20, 2026 at 8:55 AM Oded Katz <katz.oded@gmail.com> wrote:
>
> Hi Simon,
>
> Thanks for the update.
> FYI: the ReadULongAcquire() and WriteULongRelease() themself are good
> enough for getting and updating an atomic operation.
> however, after on thread updates the head, the second thread can't
> stay with the same atomic read value of the Head.
> - so assuming 2 thread read the Head on the same time so the 2 threads
> has same head.
> - now 1st thread takes the spin lock and updates the head
> - 2nd thread is pending the job until the spinlock is released and
> then it doing the job, but with the non-updated Head value
> - so it will overrun the previous thread data and mess the ringbuffer
>
>
> again thanks for providing the next release
> Regards,
> Oded
>
> On Thu, Feb 19, 2026 at 11:28 PM Simon Rozman <simon.rozman@amebis.si> wrote:
> >
> > Hi, Oded!
> >
> > First and foremost, thank you very much for taking time to look into
> > this and troubleshoot it.
> >
> > It was believed, that ReadULongAcquire() and WriteULongRelease() alone
> > provide atomic manipulation with Ring->Head on all modern platforms.
> > Hence, these calls were made outside the spinlock, to squeeze an extra
> > micrometer of performance.
> >
> > I could not directly apply your patch to the wintun repo, since it does
> > not follow our code style, commit message conventions and is not
> > Signed-off-by you.
> >
> > However, it would have been a terrible waste of your research if your
> > contribution wouldn't get into wintun, so I reworked your PR and applied
> > it here:
> > https://git.zx2c4.com/wintun/commit/?id=607c181ea9fa0036d23598038e6019ad54db5ce4
> >
> > Please, stay tuned for an official WHQL-signed release.
> >
> > Lep pozdrav | Best regards,
> > Simon Rozman
> > Amebis, d. o. o., Kamnik
> >
> > On 19. 2. 2026 20.32, odedkatz wrote:
> > > I observed that the Ring->Head was taken and manipulated later on with just a `ReadULongAcquire` which isn't OK when 2 are trying to manipulate it later on based on the same received value.
> > > ---
> > > driver/wintun.c | 11 ++++++-----
> > > 1 file changed, 6 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/driver/wintun.c b/driver/wintun.c
> > > index d1f3b9f..65cd97e 100644
> > > --- a/driver/wintun.c
> > > +++ b/driver/wintun.c
> > > @@ -284,13 +284,14 @@ TunSendNetBufferLists(
> > > TUN_RING *Ring = Ctx->Device.Send.Ring;
> > > ULONG RingCapacity = Ctx->Device.Send.Capacity;
> > >
> > > + KLOCK_QUEUE_HANDLE LockHandle;
> > > + KeAcquireInStackQueuedSpinLock(&Ctx->Device.Send.Lock, &LockHandle);
> > > /* Allocate space for packets in the ring. */
> > > ULONG RingHead = ReadULongAcquire(&Ring->Head);
> > > - if (Status = NDIS_STATUS_ADAPTER_NOT_READY, RingHead >= RingCapacity)
> > > + if (Status = NDIS_STATUS_ADAPTER_NOT_READY, RingHead >= RingCapacity) {
> > > + KeReleaseInStackQueuedSpinLock(&LockHandle);
> > > goto skipNbl;
> > > -
> > > - KLOCK_QUEUE_HANDLE LockHandle;
> > > - KeAcquireInStackQueuedSpinLock(&Ctx->Device.Send.Lock, &LockHandle);
> > > + }
> > >
> > > ULONG RingTail = Ctx->Device.Send.RingTail;
> > > ASSERT(RingTail < RingCapacity);
> > > @@ -419,8 +420,8 @@ TunReturnNetBufferLists(NDIS_HANDLE MiniportAdapterContext, PNET_BUFFER_LIST Net
> > > Ctx->Device.Receive.ActiveNbls.Head = NET_BUFFER_LIST_NEXT_NBL_EX(CompletedNbl);
> > > if (!Ctx->Device.Receive.ActiveNbls.Head)
> > > KeSetEvent(&Ctx->Device.Receive.ActiveNbls.Empty, IO_NO_INCREMENT, FALSE);
> > > - KeReleaseInStackQueuedSpinLock(&LockHandle);
> > > WriteULongRelease(&Ring->Head, TunNblGetOffset(CompletedNbl));
> > > + KeReleaseInStackQueuedSpinLock(&LockHandle);
> > > const MDL *TargetMdl = Ctx->Device.Receive.Mdl;
> > > for (MDL *Mdl = NET_BUFFER_FIRST_MDL(NET_BUFFER_LIST_FIRST_NB(CompletedNbl)); Mdl; Mdl = Mdl->Next)
> > > {
> >
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/1] in order to prevent buffer overrun (which was observed while sending multiple high throughput UDP streams from different threads) I move the driver spinlock to protect Ring buffer Head.
2026-02-27 21:28 ` Oded Katz
@ 2026-02-27 22:03 ` Simon Rozman
2026-02-28 1:12 ` Oded Katz
0 siblings, 1 reply; 8+ messages in thread
From: Simon Rozman @ 2026-02-27 22:03 UTC (permalink / raw)
To: Oded Katz; +Cc: wireguard, odedk
Yes, Oded. Your research is much appreciated. I have stare-reviewed it
immediately, and it makes sense: by slightly extending the critical
section a race condition should be fixed. Unfortunately, haven't had
time to apply it to the master branch yet. Hopefully next week.
Definitely before a release.
Lep pozdrav | Best regards,
Simon Rozman
Amebis, d. o. o., Kamnik
On 27.2.2026 22:28, Oded Katz wrote:
> Hi Simon,
>
> We have found another issue with the driver code.
> some race condition between "alertable" flag between driver and
> user-side. causing driver to stall until new packet wakes it up.
> We have seen some cases where it can be for periods of 5 sec.
>
> we already made the fix, and I sent 2 approaches for that fix.
> please let us know if you are planning to solve it in your repo?
>
> regards and thanks,
> Oded
>
> On Fri, Feb 20, 2026 at 8:55 AM Oded Katz <katz.oded@gmail.com> wrote:
>>
>> Hi Simon,
>>
>> Thanks for the update.
>> FYI: the ReadULongAcquire() and WriteULongRelease() themself are good
>> enough for getting and updating an atomic operation.
>> however, after on thread updates the head, the second thread can't
>> stay with the same atomic read value of the Head.
>> - so assuming 2 thread read the Head on the same time so the 2 threads
>> has same head.
>> - now 1st thread takes the spin lock and updates the head
>> - 2nd thread is pending the job until the spinlock is released and
>> then it doing the job, but with the non-updated Head value
>> - so it will overrun the previous thread data and mess the ringbuffer
>>
>>
>> again thanks for providing the next release
>> Regards,
>> Oded
>>
>> On Thu, Feb 19, 2026 at 11:28 PM Simon Rozman <simon.rozman@amebis.si> wrote:
>>>
>>> Hi, Oded!
>>>
>>> First and foremost, thank you very much for taking time to look into
>>> this and troubleshoot it.
>>>
>>> It was believed, that ReadULongAcquire() and WriteULongRelease() alone
>>> provide atomic manipulation with Ring->Head on all modern platforms.
>>> Hence, these calls were made outside the spinlock, to squeeze an extra
>>> micrometer of performance.
>>>
>>> I could not directly apply your patch to the wintun repo, since it does
>>> not follow our code style, commit message conventions and is not
>>> Signed-off-by you.
>>>
>>> However, it would have been a terrible waste of your research if your
>>> contribution wouldn't get into wintun, so I reworked your PR and applied
>>> it here:
>>> https://git.zx2c4.com/wintun/commit/?id=607c181ea9fa0036d23598038e6019ad54db5ce4
>>>
>>> Please, stay tuned for an official WHQL-signed release.
>>>
>>> Lep pozdrav | Best regards,
>>> Simon Rozman
>>> Amebis, d. o. o., Kamnik
>>>
>>> On 19. 2. 2026 20.32, odedkatz wrote:
>>>> I observed that the Ring->Head was taken and manipulated later on with just a `ReadULongAcquire` which isn't OK when 2 are trying to manipulate it later on based on the same received value.
>>>> ---
>>>> driver/wintun.c | 11 ++++++-----
>>>> 1 file changed, 6 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/driver/wintun.c b/driver/wintun.c
>>>> index d1f3b9f..65cd97e 100644
>>>> --- a/driver/wintun.c
>>>> +++ b/driver/wintun.c
>>>> @@ -284,13 +284,14 @@ TunSendNetBufferLists(
>>>> TUN_RING *Ring = Ctx->Device.Send.Ring;
>>>> ULONG RingCapacity = Ctx->Device.Send.Capacity;
>>>>
>>>> + KLOCK_QUEUE_HANDLE LockHandle;
>>>> + KeAcquireInStackQueuedSpinLock(&Ctx->Device.Send.Lock, &LockHandle);
>>>> /* Allocate space for packets in the ring. */
>>>> ULONG RingHead = ReadULongAcquire(&Ring->Head);
>>>> - if (Status = NDIS_STATUS_ADAPTER_NOT_READY, RingHead >= RingCapacity)
>>>> + if (Status = NDIS_STATUS_ADAPTER_NOT_READY, RingHead >= RingCapacity) {
>>>> + KeReleaseInStackQueuedSpinLock(&LockHandle);
>>>> goto skipNbl;
>>>> -
>>>> - KLOCK_QUEUE_HANDLE LockHandle;
>>>> - KeAcquireInStackQueuedSpinLock(&Ctx->Device.Send.Lock, &LockHandle);
>>>> + }
>>>>
>>>> ULONG RingTail = Ctx->Device.Send.RingTail;
>>>> ASSERT(RingTail < RingCapacity);
>>>> @@ -419,8 +420,8 @@ TunReturnNetBufferLists(NDIS_HANDLE MiniportAdapterContext, PNET_BUFFER_LIST Net
>>>> Ctx->Device.Receive.ActiveNbls.Head = NET_BUFFER_LIST_NEXT_NBL_EX(CompletedNbl);
>>>> if (!Ctx->Device.Receive.ActiveNbls.Head)
>>>> KeSetEvent(&Ctx->Device.Receive.ActiveNbls.Empty, IO_NO_INCREMENT, FALSE);
>>>> - KeReleaseInStackQueuedSpinLock(&LockHandle);
>>>> WriteULongRelease(&Ring->Head, TunNblGetOffset(CompletedNbl));
>>>> + KeReleaseInStackQueuedSpinLock(&LockHandle);
>>>> const MDL *TargetMdl = Ctx->Device.Receive.Mdl;
>>>> for (MDL *Mdl = NET_BUFFER_FIRST_MDL(NET_BUFFER_LIST_FIRST_NB(CompletedNbl)); Mdl; Mdl = Mdl->Next)
>>>> {
>>>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/1] in order to prevent buffer overrun (which was observed while sending multiple high throughput UDP streams from different threads) I move the driver spinlock to protect Ring buffer Head.
2026-02-27 22:03 ` Simon Rozman
@ 2026-02-28 1:12 ` Oded Katz
0 siblings, 0 replies; 8+ messages in thread
From: Oded Katz @ 2026-02-28 1:12 UTC (permalink / raw)
To: Simon Rozman; +Cc: wireguard, odedk
Thanks for the update.
Have a wonderful weekend
Oded
On Fri, Feb 27, 2026 at 2:03 PM Simon Rozman <simon.rozman@amebis.si> wrote:
>
> Yes, Oded. Your research is much appreciated. I have stare-reviewed it
> immediately, and it makes sense: by slightly extending the critical
> section a race condition should be fixed. Unfortunately, haven't had
> time to apply it to the master branch yet. Hopefully next week.
> Definitely before a release.
>
> Lep pozdrav | Best regards,
> Simon Rozman
> Amebis, d. o. o., Kamnik
>
> On 27.2.2026 22:28, Oded Katz wrote:
> > Hi Simon,
> >
> > We have found another issue with the driver code.
> > some race condition between "alertable" flag between driver and
> > user-side. causing driver to stall until new packet wakes it up.
> > We have seen some cases where it can be for periods of 5 sec.
> >
> > we already made the fix, and I sent 2 approaches for that fix.
> > please let us know if you are planning to solve it in your repo?
> >
> > regards and thanks,
> > Oded
> >
> > On Fri, Feb 20, 2026 at 8:55 AM Oded Katz <katz.oded@gmail.com> wrote:
> >>
> >> Hi Simon,
> >>
> >> Thanks for the update.
> >> FYI: the ReadULongAcquire() and WriteULongRelease() themself are good
> >> enough for getting and updating an atomic operation.
> >> however, after on thread updates the head, the second thread can't
> >> stay with the same atomic read value of the Head.
> >> - so assuming 2 thread read the Head on the same time so the 2 threads
> >> has same head.
> >> - now 1st thread takes the spin lock and updates the head
> >> - 2nd thread is pending the job until the spinlock is released and
> >> then it doing the job, but with the non-updated Head value
> >> - so it will overrun the previous thread data and mess the ringbuffer
> >>
> >>
> >> again thanks for providing the next release
> >> Regards,
> >> Oded
> >>
> >> On Thu, Feb 19, 2026 at 11:28 PM Simon Rozman <simon.rozman@amebis.si> wrote:
> >>>
> >>> Hi, Oded!
> >>>
> >>> First and foremost, thank you very much for taking time to look into
> >>> this and troubleshoot it.
> >>>
> >>> It was believed, that ReadULongAcquire() and WriteULongRelease() alone
> >>> provide atomic manipulation with Ring->Head on all modern platforms.
> >>> Hence, these calls were made outside the spinlock, to squeeze an extra
> >>> micrometer of performance.
> >>>
> >>> I could not directly apply your patch to the wintun repo, since it does
> >>> not follow our code style, commit message conventions and is not
> >>> Signed-off-by you.
> >>>
> >>> However, it would have been a terrible waste of your research if your
> >>> contribution wouldn't get into wintun, so I reworked your PR and applied
> >>> it here:
> >>> https://git.zx2c4.com/wintun/commit/?id=607c181ea9fa0036d23598038e6019ad54db5ce4
> >>>
> >>> Please, stay tuned for an official WHQL-signed release.
> >>>
> >>> Lep pozdrav | Best regards,
> >>> Simon Rozman
> >>> Amebis, d. o. o., Kamnik
> >>>
> >>> On 19. 2. 2026 20.32, odedkatz wrote:
> >>>> I observed that the Ring->Head was taken and manipulated later on with just a `ReadULongAcquire` which isn't OK when 2 are trying to manipulate it later on based on the same received value.
> >>>> ---
> >>>> driver/wintun.c | 11 ++++++-----
> >>>> 1 file changed, 6 insertions(+), 5 deletions(-)
> >>>>
> >>>> diff --git a/driver/wintun.c b/driver/wintun.c
> >>>> index d1f3b9f..65cd97e 100644
> >>>> --- a/driver/wintun.c
> >>>> +++ b/driver/wintun.c
> >>>> @@ -284,13 +284,14 @@ TunSendNetBufferLists(
> >>>> TUN_RING *Ring = Ctx->Device.Send.Ring;
> >>>> ULONG RingCapacity = Ctx->Device.Send.Capacity;
> >>>>
> >>>> + KLOCK_QUEUE_HANDLE LockHandle;
> >>>> + KeAcquireInStackQueuedSpinLock(&Ctx->Device.Send.Lock, &LockHandle);
> >>>> /* Allocate space for packets in the ring. */
> >>>> ULONG RingHead = ReadULongAcquire(&Ring->Head);
> >>>> - if (Status = NDIS_STATUS_ADAPTER_NOT_READY, RingHead >= RingCapacity)
> >>>> + if (Status = NDIS_STATUS_ADAPTER_NOT_READY, RingHead >= RingCapacity) {
> >>>> + KeReleaseInStackQueuedSpinLock(&LockHandle);
> >>>> goto skipNbl;
> >>>> -
> >>>> - KLOCK_QUEUE_HANDLE LockHandle;
> >>>> - KeAcquireInStackQueuedSpinLock(&Ctx->Device.Send.Lock, &LockHandle);
> >>>> + }
> >>>>
> >>>> ULONG RingTail = Ctx->Device.Send.RingTail;
> >>>> ASSERT(RingTail < RingCapacity);
> >>>> @@ -419,8 +420,8 @@ TunReturnNetBufferLists(NDIS_HANDLE MiniportAdapterContext, PNET_BUFFER_LIST Net
> >>>> Ctx->Device.Receive.ActiveNbls.Head = NET_BUFFER_LIST_NEXT_NBL_EX(CompletedNbl);
> >>>> if (!Ctx->Device.Receive.ActiveNbls.Head)
> >>>> KeSetEvent(&Ctx->Device.Receive.ActiveNbls.Empty, IO_NO_INCREMENT, FALSE);
> >>>> - KeReleaseInStackQueuedSpinLock(&LockHandle);
> >>>> WriteULongRelease(&Ring->Head, TunNblGetOffset(CompletedNbl));
> >>>> + KeReleaseInStackQueuedSpinLock(&LockHandle);
> >>>> const MDL *TargetMdl = Ctx->Device.Receive.Mdl;
> >>>> for (MDL *Mdl = NET_BUFFER_FIRST_MDL(NET_BUFFER_LIST_FIRST_NB(CompletedNbl)); Mdl; Mdl = Mdl->Next)
> >>>> {
> >>>
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/1] in order to prevent buffer overrun (which was observed while sending multiple high throughput UDP streams from different threads) I move the driver spinlock to protect Ring buffer Head.
2026-02-20 7:28 ` Simon Rozman
2026-02-20 16:55 ` Oded Katz
@ 2026-03-11 23:35 ` Oded Katz
1 sibling, 0 replies; 8+ messages in thread
From: Oded Katz @ 2026-03-11 23:35 UTC (permalink / raw)
To: Simon Rozman; +Cc: wireguard, odedk
Hi Simon,
I still haven't heard from you guys, neither saw a ticket in the mailing list.
is there any update ?
regards,
Oded
On Thu, Feb 19, 2026 at 11:28 PM Simon Rozman <simon.rozman@amebis.si> wrote:
>
> Hi, Oded!
>
> First and foremost, thank you very much for taking time to look into
> this and troubleshoot it.
>
> It was believed, that ReadULongAcquire() and WriteULongRelease() alone
> provide atomic manipulation with Ring->Head on all modern platforms.
> Hence, these calls were made outside the spinlock, to squeeze an extra
> micrometer of performance.
>
> I could not directly apply your patch to the wintun repo, since it does
> not follow our code style, commit message conventions and is not
> Signed-off-by you.
>
> However, it would have been a terrible waste of your research if your
> contribution wouldn't get into wintun, so I reworked your PR and applied
> it here:
> https://git.zx2c4.com/wintun/commit/?id=607c181ea9fa0036d23598038e6019ad54db5ce4
>
> Please, stay tuned for an official WHQL-signed release.
>
> Lep pozdrav | Best regards,
> Simon Rozman
> Amebis, d. o. o., Kamnik
>
> On 19. 2. 2026 20.32, odedkatz wrote:
> > I observed that the Ring->Head was taken and manipulated later on with just a `ReadULongAcquire` which isn't OK when 2 are trying to manipulate it later on based on the same received value.
> > ---
> > driver/wintun.c | 11 ++++++-----
> > 1 file changed, 6 insertions(+), 5 deletions(-)
> >
> > diff --git a/driver/wintun.c b/driver/wintun.c
> > index d1f3b9f..65cd97e 100644
> > --- a/driver/wintun.c
> > +++ b/driver/wintun.c
> > @@ -284,13 +284,14 @@ TunSendNetBufferLists(
> > TUN_RING *Ring = Ctx->Device.Send.Ring;
> > ULONG RingCapacity = Ctx->Device.Send.Capacity;
> >
> > + KLOCK_QUEUE_HANDLE LockHandle;
> > + KeAcquireInStackQueuedSpinLock(&Ctx->Device.Send.Lock, &LockHandle);
> > /* Allocate space for packets in the ring. */
> > ULONG RingHead = ReadULongAcquire(&Ring->Head);
> > - if (Status = NDIS_STATUS_ADAPTER_NOT_READY, RingHead >= RingCapacity)
> > + if (Status = NDIS_STATUS_ADAPTER_NOT_READY, RingHead >= RingCapacity) {
> > + KeReleaseInStackQueuedSpinLock(&LockHandle);
> > goto skipNbl;
> > -
> > - KLOCK_QUEUE_HANDLE LockHandle;
> > - KeAcquireInStackQueuedSpinLock(&Ctx->Device.Send.Lock, &LockHandle);
> > + }
> >
> > ULONG RingTail = Ctx->Device.Send.RingTail;
> > ASSERT(RingTail < RingCapacity);
> > @@ -419,8 +420,8 @@ TunReturnNetBufferLists(NDIS_HANDLE MiniportAdapterContext, PNET_BUFFER_LIST Net
> > Ctx->Device.Receive.ActiveNbls.Head = NET_BUFFER_LIST_NEXT_NBL_EX(CompletedNbl);
> > if (!Ctx->Device.Receive.ActiveNbls.Head)
> > KeSetEvent(&Ctx->Device.Receive.ActiveNbls.Empty, IO_NO_INCREMENT, FALSE);
> > - KeReleaseInStackQueuedSpinLock(&LockHandle);
> > WriteULongRelease(&Ring->Head, TunNblGetOffset(CompletedNbl));
> > + KeReleaseInStackQueuedSpinLock(&LockHandle);
> > const MDL *TargetMdl = Ctx->Device.Receive.Mdl;
> > for (MDL *Mdl = NET_BUFFER_FIRST_MDL(NET_BUFFER_LIST_FIRST_NB(CompletedNbl)); Mdl; Mdl = Mdl->Next)
> > {
>
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2026-03-11 23:35 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-02-19 19:32 [PATCH 0/1] This PR fixes a critical race condition in the Wintun driver that causes ring buffer overruns odedkatz
2026-02-19 19:32 ` [PATCH 1/1] in order to prevent buffer overrun (which was observed while sending multiple high throughput UDP streams from different threads) I move the driver spinlock to protect Ring buffer Head odedkatz
2026-02-20 7:28 ` Simon Rozman
2026-02-20 16:55 ` Oded Katz
2026-02-27 21:28 ` Oded Katz
2026-02-27 22:03 ` Simon Rozman
2026-02-28 1:12 ` Oded Katz
2026-03-11 23:35 ` Oded Katz
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox