CVE-2026-43194: net: consume xmit errors of GSO frames

From: Greg Kroah-Hartman @ 2026-05-06 11:28 UTC (permalink / raw)
  To: linux-cve-announce; +Cc: Greg Kroah-Hartman

From: Greg Kroah-Hartman <gregkh@kernel.org>

Description
===========

In the Linux kernel, the following vulnerability has been resolved:

net: consume xmit errors of GSO frames

udpgro_frglist.sh and udpgro_bench.sh are the flakiest tests
currently in NIPA. They fail in exactly the same way: the TCP GRO
test occasionally stalls and the test gets killed after 10 min.

These tests use veth to simulate GRO. They attach a trivial
("return XDP_PASS;") XDP program to the veth to force TSO off
and NAPI on.

Digging into the failure mode we can see that the connection
is completely stuck after a burst of drops. The sender's snd_nxt
is at sequence number N [1], but the receiver claims to have
received (rcv_nxt) up to N + 3 * MSS [2]. The last piece of the puzzle
is that the sender's rtx queue is not empty (let's say the block in
the rtx queue is at sequence number N - 4 * MSS [3]).

In this state, the sender sends a retransmission from the rtx queue
with a single segment, and sequence numbers N-4*MSS:N-3*MSS [3].
The receiver sees it and responds with an ACK all the way up to
N + 3 * MSS [2]. But the sender will reject this ACK as TCP_ACK_UNSENT_DATA
because it has no recollection of ever sending data that far out [1].
And we are stuck.
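
The rejection described above can be modelled in a few lines of C. This
is a minimal user-space sketch, not kernel code: the function names are
hypothetical, and it assumes only that the sender refuses any ACK that
lies beyond its own snd_nxt, using a wraparound-safe comparison of
32-bit sequence numbers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Wraparound-safe "a is later than b" for 32-bit TCP sequence numbers. */
static bool seq_after(uint32_t a, uint32_t b)
{
	return (int32_t)(b - a) < 0;
}

/*
 * Hypothetical model of the sender's sanity check on an incoming ACK.
 * An ACK beyond snd_nxt acknowledges data the sender believes it has
 * never sent, so the sender refuses to process it.
 */
static bool ack_covers_unsent_data(uint32_t ack, uint32_t snd_nxt)
{
	return seq_after(ack, snd_nxt);
}
```

With snd_nxt = N and an incoming ACK of N + 3 * MSS the check fires on
every retransmission, so the sender never makes progress, while an ACK
at or below N would be accepted.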

The root cause is the mess of the xmit return codes. veth returns
an error when it can't xmit a frame. We end up with a loss event
like this:

  -------------------------------------------------
  |   GSO super frame 1   |   GSO super frame 2   |
  |-----------------------------------------------|
  | seg | seg | seg | seg | seg | seg | seg | seg |
  |  1  |  2  |  3  |  4  |  5  |  6  |  7  |  8  |
  -------------------------------------------------
     x    ok    ok    <ok>|  ok    ok    ok   <x>
                          \\
                           snd_nxt

"x" means the packet was lost by veth, and "ok" means it went through.
Since veth has TSO disabled in this test, it sees individual segments.
Segment 1 is on the retransmit queue and will be resent.

So why did the sender not advance snd_nxt even though it clearly did
send up to seg 8? tcp_write_xmit() interprets the return code
from the core to mean that data has not been sent at all. Since
TCP deals with GSO super frames, not individual segments, the crux
of the problem is that the loss of a single segment can be interpreted
as the loss of all. TCP only sees the return code for the last
segment of the GSO frame (in <> brackets in the diagram above).
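
To make the "last return code wins" effect concrete, here is a
simplified user-space model of the two super frames from the diagram.
All names are hypothetical and the loop only approximates what the core
does: it assumes every segment is handed to the device in turn and that
the caller gets back nothing but the code of the final segment.

```c
#include <stddef.h>

enum { SEG_OK = 0, SEG_DROP = 1 };

struct xmit_result {
	int rc;		/* what the protocol layer gets to see */
	size_t sent;	/* segments that really left the device */
};

/* Per-segment outcomes of the two super frames in the diagram. */
static const int frame1[4] = { SEG_DROP, SEG_OK, SEG_OK, SEG_OK };
static const int frame2[4] = { SEG_OK, SEG_OK, SEG_OK, SEG_DROP };

/*
 * Hypothetical model of transmitting a list of segments: each result
 * overwrites the previous one, so only the last one is reported.
 */
static struct xmit_result xmit_segment_list(const int *seg_rc, size_t n)
{
	struct xmit_result res = { SEG_OK, 0 };

	for (size_t i = 0; i < n; i++) {
		res.rc = seg_rc[i];
		if (seg_rc[i] == SEG_OK)
			res.sent++;
	}
	return res;
}
```

In this model both frames deliver three of their four segments, yet
frame 1 is reported as a success and frame 2 as a failure. A caller that
reads the failure as "nothing was sent" ends up with a snd_nxt that lags
behind what the receiver has actually seen.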

Of course, for the problem to occur we need a setup or a device
without a Qdisc. Otherwise the Qdisc layer disconnects the protocol
layer from the device errors completely.

We have multiple ways to fix this.

 1) make veth not return an error when it lost a packet.
    While this is what I think we did in the past, the issue keeps
    reappearing and it's annoying to debug. The game of
    whack-a-mole is not great.

 2) fix the damn return codes
    We only talk about NETDEV_TX_OK and NETDEV_TX_BUSY in the
    documentation, so maybe we should make the return code from
    ndo_start_xmit() a boolean. I like that the most, but perhaps
    some ancient, not-really-networking protocol would suffer.

 3) make TCP ignore the errors
    It is not entirely clear to me what benefit TCP gets from
    interpreting the result of ip_queue_xmit()? Specifically once
    the connection is established and we're pushing data - packet
    loss is just packet loss?

 4) this fix
    Ignore the rc in the Qdisc-less+GSO case, since it's unreliable.
    We already always return OK in the TCQ_F_CAN_BYPASS case.
    In the Qdisc-less case let's be a bit more conservative and only
    mask the GSO errors. This path is taken by non-IP-"networks"
    like CAN, MCTP etc, so we could regress some ancient thing.
    This is the simplest, but also maybe the hackiest fix?
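
The idea behind option 4 can be sketched as follows. This is an
illustration of the masking rule as described above, not the actual
change to net/core/dev.c: the function and constant names are
hypothetical, and the only assumption is that the Qdisc-less path knows
whether it transmitted a list of GSO segments or a single frame.

```c
#include <stdbool.h>

enum { RC_OK = 0, RC_DROP = 1 };

/*
 * Hypothetical model of the Qdisc-less transmit path after the fix.
 * If the frame was a GSO segment list, the driver's return code only
 * describes the last segment and is unreliable, so report success and
 * let the protocol treat any loss as ordinary packet loss. A single
 * frame keeps its return code, to stay conservative for non-GSO users.
 */
static int qdiscless_xmit_rc(int driver_rc, bool was_segment_list)
{
	return was_segment_list ? RC_OK : driver_rc;
}
```

Under this rule a drop inside a GSO frame is never reported upwards,
while a drop of a single non-GSO frame still is.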

A similar fix was proposed by Eric in the past but was never committed
because the original reporter was working with an OOT driver and wasn't
providing feedback (see Link).

The Linux kernel CVE team has assigned CVE-2026-43194 to this issue.

Affected and fixed versions
===========================

	Issue introduced in 3.18 with commit 1f59533f9ca5634e7b8914252e48aee9d9cbe501 and fixed in 5.10.252 with commit ae3f627b45fbc3c776a4e484696f3cad7cbb4eca
	Issue introduced in 3.18 with commit 1f59533f9ca5634e7b8914252e48aee9d9cbe501 and fixed in 5.15.202 with commit 0c9de092ef8c50a7ee9612811566f0aa81d8d7b6
	Issue introduced in 3.18 with commit 1f59533f9ca5634e7b8914252e48aee9d9cbe501 and fixed in 6.1.165 with commit 56bd32c0edca34041a5c215887fcf562fae2e2db
	Issue introduced in 3.18 with commit 1f59533f9ca5634e7b8914252e48aee9d9cbe501 and fixed in 6.6.128 with commit 9ac6aebef4b4bfc5ed408b0b65645981574bc780
	Issue introduced in 3.18 with commit 1f59533f9ca5634e7b8914252e48aee9d9cbe501 and fixed in 6.12.75 with commit ea5d7787635e26ec1194ec7eec0e8e5ae3bd10a5
	Issue introduced in 3.18 with commit 1f59533f9ca5634e7b8914252e48aee9d9cbe501 and fixed in 6.18.16 with commit 4cb163e9efcac4cd35c3043e097f25081a5c015c
	Issue introduced in 3.18 with commit 1f59533f9ca5634e7b8914252e48aee9d9cbe501 and fixed in 6.19.6 with commit c86901d22c89a6bf4e2f013e948aaabc60869893
	Issue introduced in 3.18 with commit 1f59533f9ca5634e7b8914252e48aee9d9cbe501 and fixed in 7.0 with commit 7aa767d0d3d04e50ae94e770db7db8197f666970

Please see https://www.kernel.org for a full list of kernel versions
currently supported by the kernel community.

Unaffected versions might change over time as fixes are backported to
older supported kernel versions.  The official CVE entry at
	https://cve.org/CVERecord/?id=CVE-2026-43194
will be updated if fixes are backported; please check it for the most
up-to-date information about this issue.

Affected files
==============

The file(s) affected by this issue are:
	net/core/dev.c

Mitigation
==========

The Linux kernel CVE team recommends that you update to the latest
stable kernel version for this, and many other bugfixes.  Individual
changes are never tested alone, but rather are part of a larger kernel
release.  Cherry-picking individual commits is not recommended or
supported by the Linux kernel community at all.  If, however, updating to
the latest release is impossible, the individual changes to resolve this
issue can be found at these commits:
	https://git.kernel.org/stable/c/ae3f627b45fbc3c776a4e484696f3cad7cbb4eca
	https://git.kernel.org/stable/c/0c9de092ef8c50a7ee9612811566f0aa81d8d7b6
	https://git.kernel.org/stable/c/56bd32c0edca34041a5c215887fcf562fae2e2db
	https://git.kernel.org/stable/c/9ac6aebef4b4bfc5ed408b0b65645981574bc780
	https://git.kernel.org/stable/c/ea5d7787635e26ec1194ec7eec0e8e5ae3bd10a5
	https://git.kernel.org/stable/c/4cb163e9efcac4cd35c3043e097f25081a5c015c
	https://git.kernel.org/stable/c/c86901d22c89a6bf4e2f013e948aaabc60869893
	https://git.kernel.org/stable/c/7aa767d0d3d04e50ae94e770db7db8197f666970
