* [PATCH iwl-next v3 0/3] igc: add support for forcing link speed without autonegotiation
From: KhaiWenTan @ 2026-04-22 15:56 UTC (permalink / raw)
To: anthony.l.nguyen, przemyslaw.kitszel, andrew+netdev, davem,
edumazet, kuba, pabeni
Cc: intel-wired-lan, netdev, linux-kernel, faizal.abdul.rahim,
hong.aun.looi, khai.wen.tan, Faizal Rahim
From: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
This series adds support for forcing 10/100 Mb/s link speed via ethtool
when autonegotiation is disabled on the igc driver.
Changes in v3:
- Modify condition from "if (duplex == DUPLEX_HALF)" to
"if (duplex != DUPLEX_FULL)". (Simon Horman)
Changes in v2:
- When forcing half-duplex, set hw->fc.requested_mode = igc_fc_none,
since half-duplex cannot support flow control per IEEE 802.3.
(Simon Horman)
- Split the original single patch into three patches for clarity:
patches 1 and 2 are preparatory cleanups; patch 3 carries the
functional change.
v2 at:
https://patchwork.kernel.org/project/netdevbpf/patch/20260416015520.6090-4-khai.wen.tan@linux.intel.com/
v1 at:
https://patchwork.ozlabs.org/project/intel-wired-lan/patch/20260409072747.217836-1-khai.wen.tan@linux.intel.com/
Faizal Rahim (3):
igc: remove unused autoneg_failed field
igc: move autoneg-enabled settings into igc_handle_autoneg_enabled()
igc: add support for forcing link speed without autonegotiation
drivers/net/ethernet/intel/igc/igc_base.c | 35 +++-
drivers/net/ethernet/intel/igc/igc_defines.h | 9 +-
drivers/net/ethernet/intel/igc/igc_ethtool.c | 203 +++++++++++++------
drivers/net/ethernet/intel/igc/igc_hw.h | 10 +-
drivers/net/ethernet/intel/igc/igc_mac.c | 16 +-
drivers/net/ethernet/intel/igc/igc_main.c | 2 +-
drivers/net/ethernet/intel/igc/igc_phy.c | 65 +++++-
drivers/net/ethernet/intel/igc/igc_phy.h | 1 +
8 files changed, 251 insertions(+), 90 deletions(-)
--
2.43.0
^ permalink raw reply
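The v2 and v3 changelog items in the cover letter above amount to one rule: when the link is forced to anything other than full duplex, flow control is turned off. Here is a minimal user-space sketch of that rule. The enum and function names are hypothetical stand-ins and are not taken from the igc driver, which uses the DUPLEX_* constants and hw->fc.requested_mode.

```c
#include <assert.h>

/* Hypothetical stand-ins for the ethtool DUPLEX_* values. */
enum sketch_duplex {
	SKETCH_DUPLEX_HALF,
	SKETCH_DUPLEX_FULL,
	SKETCH_DUPLEX_UNKNOWN,
};

/* Hypothetical stand-ins for the driver's flow-control modes. */
enum sketch_fc_mode {
	SKETCH_FC_NONE,
	SKETCH_FC_RX_PAUSE,
	SKETCH_FC_TX_PAUSE,
	SKETCH_FC_FULL,
};

/*
 * Pick the flow-control mode for a forced link. Testing "!= FULL" rather
 * than "== HALF" also covers an unknown duplex value, which is the point
 * of the v3 change.
 */
static enum sketch_fc_mode
sketch_fc_for_forced_link(enum sketch_duplex duplex,
			  enum sketch_fc_mode requested)
{
	if (duplex != SKETCH_DUPLEX_FULL)
		return SKETCH_FC_NONE;
	return requested;
}

/* Usage example: half and unknown duplex both lose flow control. */
static void sketch_fc_selftest(void)
{
	assert(sketch_fc_for_forced_link(SKETCH_DUPLEX_HALF,
					 SKETCH_FC_FULL) == SKETCH_FC_NONE);
	assert(sketch_fc_for_forced_link(SKETCH_DUPLEX_UNKNOWN,
					 SKETCH_FC_FULL) == SKETCH_FC_NONE);
	assert(sketch_fc_for_forced_link(SKETCH_DUPLEX_FULL,
					 SKETCH_FC_FULL) == SKETCH_FC_FULL);
}
```

With "== HALF", an unexpected duplex value would fall through and keep the requested pause settings. With "!= FULL", it is treated conservatively.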
* Re: [PATCH net-next v2 0/3] Add ZTE DingHai Ethernet PF driver
From: Andrew Lunn @ 2026-04-22 16:19 UTC (permalink / raw)
To: Junyang Han
Cc: netdev, davem, andrew+netdev, edumazet, kuba, pabeni, ran.ming,
han.chengfei, zhang.yanze
In-Reply-To: <20260422144901.2403456-1-han.junyang@zte.com.cn>
On Wed, Apr 22, 2026 at 10:48:58PM +0800, Junyang Han wrote:
> This series adds initial support for the ZTE DingHai Ethernet controller,
> a high-performance PCIe Ethernet device supporting SR-IOV, hardware
> offloading, and advanced virtualization features.
https://www.kernel.org/doc/html/latest/process/maintainer-netdev.html
Please read sections 1.3 and 1.4, particularly the bit in red.
Andrew
---
pw-bot: cr
^ permalink raw reply
* Re: [BUG] rxrpc: Client connection leak and BUG() call during kernel IO thread exit
From: Anderson Nascimento @ 2026-04-22 16:18 UTC (permalink / raw)
To: David Howells
Cc: netdev, Marc Dionne, Jakub Kicinski, David S. Miller,
Eric Dumazet, Paolo Abeni, linux-kernel, Jeffrey Altman,
Simon Horman
In-Reply-To: <2593154.1776874118@warthog.procyon.org.uk>
Hi David,
On Wed, Apr 22, 2026 at 1:08 PM David Howells <dhowells@redhat.com> wrote:
>
> Do you by any chance have a reproducer program for this?
Yes, you can find it below. The code is not polished, but it works.
> David
>
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <keyutils.h>
struct rxrpc_key_data_v1 {
uint16_t security_index;
uint16_t ticket_length;
uint32_t expiry;
uint32_t kvno;
uint8_t session_key[8];
uint8_t ticket[];
};
#define TICKET_LENGTH 16349
int main(int argc,char *argv[]){
struct rxrpc_key_data_v1 *v1;
key_serial_t key;
char *key_description = "afs@2";
char payload[16384 + 4 + 100];
char ticket[16384 + 4];
char session_key[8];
unsigned int plen;
uint32_t kver = 1;
memset(&payload, '\0', sizeof(payload));
memset(&ticket, '\0', sizeof(ticket));
memset(&session_key, '\0', sizeof(session_key));
memcpy(&payload, &kver, sizeof(kver));
v1 = (struct rxrpc_key_data_v1 *)((char *)&payload + sizeof(kver));
v1->security_index = 2;
v1->ticket_length = TICKET_LENGTH;
v1->kvno = 1;
memcpy(v1->session_key, session_key, sizeof(v1->session_key));
memcpy(v1->ticket, &ticket, TICKET_LENGTH);
plen = sizeof(kver) + sizeof(struct rxrpc_key_data_v1) + TICKET_LENGTH;
key = add_key("rxrpc", key_description, payload, plen,
KEY_SPEC_PROCESS_KEYRING);
keyctl(KEYCTL_READ, key, payload, 4096);
return 0;
}
It generates the following splat.
[ 123.636173] ------------[ cut here ]------------
[ 123.636176] WARNING: CPU: 2 PID: 1528 at net/rxrpc/key.c:778
rxrpc_read+0x109/0x5c0 [rxrpc]
[ 123.636214] Modules linked in: fcrypt pcbc rxrpc ip6_udp_tunnel
krb5 udp_tunnel rfkill nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib
nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct
nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4
nf_tables intel_rapl_msr intel_rapl_common
intel_uncore_frequency_common intel_pmc_core pmt_telemetry
pmt_discovery pmt_class qrtr intel_pmc_ssram_telemetry intel_vsec rapl
vmw_balloon sunrpc vmxnet3 i2c_piix4 i2c_smbus binfmt_misc joydev loop
dm_multipath nfnetlink zram lz4hc_compress lz4_compress
vmw_vsock_vmci_transport vsock vmw_vmci xfs nvme nvme_core
polyval_clmulni ghash_clmulni_intel nvme_keyring vmwgfx nvme_auth hkdf
drm_ttm_helper ata_generic pata_acpi ttm serio_raw scsi_dh_rdac
scsi_dh_emc scsi_dh_alua i2c_dev fuse
[ 123.636257] CPU: 2 UID: 1000 PID: 1528 Comm: poc Not tainted
6.18.13-200.fc43.x86_64 #1 PREEMPT(lazy)
[ 123.636259] Hardware name: VMware, Inc. VMware Virtual
Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
[ 123.636260] RIP: 0010:rxrpc_read+0x109/0x5c0 [rxrpc]
[ 123.636284] Code: 03 66 83 f8 02 0f 85 5e 02 00 00 80 7b 02 00 74
9f f6 05 89 df 2a 00 04 0f 85 87 58 01 00 b8 28 00 00 00 b9 24 00 00
00 eb b1 <0f> 0b 48 c7 c0 fb ff ff ff 48 8b 54 24 40 65 48 2b 15 19 da
ea c3
[ 123.636285] RSP: 0018:ffffc9000274bc70 EFLAGS: 00010202
[ 123.636287] RAX: ffff8881082e0000 RBX: ffff888104a78e20 RCX: 0000000000000000
[ 123.636288] RDX: 0000000000000000 RSI: ffff88810aeac000 RDI: ffff8881037bf1f4
[ 123.636289] RBP: 0000000000004004 R08: 0000000000001000 R09: 0000000000000001
[ 123.636289] R10: 0000000000000004 R11: ffff88810aeac000 R12: 0000000000000010
[ 123.636290] R13: ffff88810aeac000 R14: 0000000000001000 R15: ffff8881023e9f00
[ 123.636291] FS: 00007f6f8611d740(0000) GS:ffff8882af726000(0000)
knlGS:0000000000000000
[ 123.636293] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 123.636293] CR2: 00007fff0bf73000 CR3: 0000000108146006 CR4: 00000000003706f0
[ 123.636312] Call Trace:
[ 123.636314] <TASK>
[ 123.636316] ? keyctl_read_key+0xec/0x230
[ 123.636320] keyctl_read_key+0x131/0x230
[ 123.636322] do_syscall_64+0x7e/0x7f0
[ 123.636325] ? __folio_mod_stat+0x2d/0x90
[ 123.636328] ? set_ptes.isra.0+0x36/0x80
[ 123.636329] ? do_anonymous_page+0x100/0x520
[ 123.636332] ? __handle_mm_fault+0x551/0x6a0
[ 123.636334] ? count_memcg_events+0xd6/0x220
[ 123.636337] ? handle_mm_fault+0x248/0x360
[ 123.636339] ? do_user_addr_fault+0x21a/0x690
[ 123.636341] ? clear_bhb_loop+0x50/0xa0
[ 123.636344] ? clear_bhb_loop+0x50/0xa0
[ 123.636345] ? clear_bhb_loop+0x50/0xa0
[ 123.636346] ? clear_bhb_loop+0x50/0xa0
[ 123.636347] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 123.636349] RIP: 0033:0x7f6f8621338d
[ 123.636356] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e
fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24
08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 43 5a 0f 00 f7 d8 64 89
01 48
[ 123.636357] RSP: 002b:00007fff0bf70528 EFLAGS: 00000246 ORIG_RAX:
00000000000000fa
[ 123.636358] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f6f8621338d
[ 123.636359] RDX: 00007fff0bf74640 RSI: 000000003809ba43 RDI: 000000000000000b
[ 123.636360] RBP: 00007fff0bf70600 R08: 00000000fffffffe R09: 0000003000000008
[ 123.636361] R10: 0000000000001000 R11: 0000000000000246 R12: 00007fff0bf787e8
[ 123.636362] R13: 0000000000000001 R14: 00007f6f86361000 R15: 0000000000402df0
[ 123.636364] </TASK>
[ 123.636365] ---[ end trace 0000000000000000 ]---
--
Anderson Nascimento
Allele Security Intelligence
https://www.allelesecurity.com
^ permalink raw reply
* Re: [PATCH] net/stmmac: Fix typos: 'tx_undeflow_irq' -> 'tx_underflow_irq'
From: Andrew Lunn @ 2026-04-22 16:15 UTC (permalink / raw)
To: Jakub Raczynski
Cc: netdev, linux-kernel, kuba, davem, andrew+netdev, kernel-janitors,
linux-arm-kernel, linux-stm32
In-Reply-To: <aejYCYObZyFPpLat@AMDC4622.eu.corp.samsungelectronics.net>
On Wed, Apr 22, 2026 at 04:15:37PM +0200, Jakub Raczynski wrote:
> On Wed, Apr 22, 2026 at 02:47:38PM +0200, Andrew Lunn wrote:
> > > I don't see anything wrong with it?
> > > - naming is correct, same as stmmac_extra_stats from common.h, as it
> > > wouldn't compile otherwise
> > > - string length is ok, as max name length is ETH_GSTRING_LEN=32 and it is
> > > not close
> > > - ethtool just polls data from driver and in my tests it is ok
> > > - all instances of 'undeflow' are changed
> > > - 'underflow' semantic is ok, 'undeflow' is just not correct
> > >
> > > Please correct me if I am wrong, but imo no issues with this patch.
> >
> > ABI
> >
> > This name is published as part of the kAPI. You are changing its
> > name. User space could be looking for this name, even though it has a
> > typo in it.
> >
> > Andrew
> >
> I don't think it is? This is part of extra stats (struct stmmac_extra_stats) and
> is not part of standard ABI from
> Documentation/ABI/testing/sysfs-class-net-statistics
> nor is mentioned in
> Documentation/networking/device_drivers/ethernet/stmicro/stmmac.rst
>
> These extra stats are specific to stmmac driver and most of these are more
> than standard
> https://www.kernel.org/doc/html/v7.0/networking/statistics.html#c.rtnl_link_stats64
> This name does not exist outside stmmac driver, so while some application may
> expect this (stmmac specific app), question is should this typo stick?
47dd7a540b8a0 drivers/net/stmmac/stmmac_ethtool.c (Giuseppe Cavallaro 2009-10-14 15:13:45 -0700 81) STMMAC_STAT(tx_undeflow_irq),
It has been exposed to user space for 17 years. In that time, there
could well be stmmac specific apps using it.
Just because it is not documented as ABI does not make it not ABI.
Andrew
^ permalink raw reply
* [PATCH net v2 6/6] rxrpc: Fix missing validation of ticket length in non-XDR key preparsing
From: David Howells @ 2026-04-22 16:14 UTC (permalink / raw)
To: netdev
Cc: David Howells, Marc Dionne, Jakub Kicinski, David S. Miller,
Eric Dumazet, Paolo Abeni, Simon Horman, Anderson Nascimento,
linux-afs, linux-kernel, Jeffrey Altman, stable
In-Reply-To: <20260422161438.2593376-1-dhowells@redhat.com>
From: Anderson Nascimento <anderson@allelesecurity.com>
In rxrpc_preparse(), there are two paths for parsing key payloads: the
XDR path (for large payloads) and the non-XDR path (for payloads <= 28
bytes). While the XDR path (rxrpc_preparse_xdr_rxkad()) correctly
validates the ticket length against AFSTOKEN_RK_TIX_MAX, the non-XDR
path fails to do so.
This allows an unprivileged user to provide a very large ticket length.
When this key is later read via rxrpc_read(), the total
token size (toksize) calculation results in a value that exceeds
AFSTOKEN_LENGTH_MAX, triggering a WARN_ON().
[ 2001.302904] WARNING: CPU: 2 PID: 2108 at net/rxrpc/key.c:778 rxrpc_read+0x109/0x5c0 [rxrpc]
Fix this by adding a check in the non-XDR parsing path of rxrpc_preparse()
to ensure the ticket length does not exceed AFSTOKEN_RK_TIX_MAX,
bringing it into parity with the XDR parsing logic.
Fixes: 8a7a3eb4ddbe ("KEYS: RxRPC: Use key preparsing")
Fixes: 84924aac08a4 ("rxrpc: Fix checker warning")
Reported-by: Anderson Nascimento <anderson@allelesecurity.com>
Signed-off-by: Anderson Nascimento <anderson@allelesecurity.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Jeffrey Altman <jaltman@auristor.com>
cc: Eric Dumazet <edumazet@google.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Simon Horman <horms@kernel.org>
cc: linux-afs@lists.infradead.org
cc: netdev@vger.kernel.org
cc: stable@kernel.org
---
net/rxrpc/key.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/net/rxrpc/key.c b/net/rxrpc/key.c
index 6301d79ee35a..5ebb06d87cdd 100644
--- a/net/rxrpc/key.c
+++ b/net/rxrpc/key.c
@@ -502,6 +502,10 @@ static int rxrpc_preparse(struct key_preparsed_payload *prep)
if (v1->security_index != RXRPC_SECURITY_RXKAD)
goto error;
+ ret = -EKEYREJECTED;
+ if (v1->ticket_length > AFSTOKEN_RK_TIX_MAX)
+ goto error;
+
plen = sizeof(*token->kad) + v1->ticket_length;
prep->quotalen += plen + sizeof(*token);
^ permalink raw reply related
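The check added by the patch above follows a general pattern: bound a user-supplied length field before using it in any size computation. Here is a minimal user-space sketch of that pattern. The struct layout is copied from the reproducer earlier in this thread. The limit value, the function name and the extra length-consistency checks are assumptions made for illustration and are not claimed to match net/rxrpc/key.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed limit, for illustration only; the kernel has its own
 * AFSTOKEN_RK_TIX_MAX definition.
 */
#define SKETCH_TIX_MAX 12000u

/* Same layout as struct rxrpc_key_data_v1 in the reproducer. */
struct sketch_key_v1 {
	uint16_t security_index;
	uint16_t ticket_length;
	uint32_t expiry;
	uint32_t kvno;
	uint8_t  session_key[8];
	uint8_t  ticket[];
};

/*
 * Hypothetical validator: returns 0 if the header is acceptable and -1
 * otherwise. The ticket length is bounded before it is used in arithmetic.
 */
static int sketch_check_v1(const struct sketch_key_v1 *v1, size_t payload_len)
{
	if (payload_len < sizeof(*v1))
		return -1;
	if (v1->ticket_length > SKETCH_TIX_MAX)
		return -1;
	if (payload_len - sizeof(*v1) != v1->ticket_length)
		return -1;
	return 0;
}

/* Usage example: the reproducer's 16349-byte ticket is rejected. */
static void sketch_check_v1_selftest(void)
{
	struct sketch_key_v1 hdr = {
		.security_index = 2,
		.ticket_length = 16349,
	};

	assert(sketch_check_v1(&hdr, sizeof(hdr) + 16349) == -1);

	hdr.ticket_length = 16;
	assert(sketch_check_v1(&hdr, sizeof(hdr) + 16) == 0);
}
```

Rejecting the oversized length when the key is parsed means that code which later reads the key never sees a token larger than its own maximum.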
* [PATCH net v2 5/6] rxgk: Fix potential integer overflow in length check
From: David Howells @ 2026-04-22 16:14 UTC (permalink / raw)
To: netdev
Cc: David Howells, Marc Dionne, Jakub Kicinski, David S. Miller,
Eric Dumazet, Paolo Abeni, Simon Horman, Anderson Nascimento,
linux-afs, linux-kernel, Jeffrey Altman, stable
In-Reply-To: <20260422161438.2593376-1-dhowells@redhat.com>
Fix potential integer overflow in rxgk_extract_token() when checking the
length of the ticket. Rather than rounding up the value to be tested
(which might overflow), round down the size of the available data.
Fixes: 2429a1976481 ("rxrpc: Fix untrusted unsigned subtract")
Closes: https://sashiko.dev/#/patchset/20260408121252.2249051-1-dhowells%40redhat.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Jeffrey Altman <jaltman@auristor.com>
cc: Eric Dumazet <edumazet@google.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Simon Horman <horms@kernel.org>
cc: linux-afs@lists.infradead.org
cc: netdev@vger.kernel.org
cc: stable@kernel.org
---
net/rxrpc/rxgk_app.c | 2 +-
net/rxrpc/rxgk_common.h | 1 +
2 files changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/rxrpc/rxgk_app.c b/net/rxrpc/rxgk_app.c
index 30275cb5ba3e..5587639d60c5 100644
--- a/net/rxrpc/rxgk_app.c
+++ b/net/rxrpc/rxgk_app.c
@@ -214,7 +214,7 @@ int rxgk_extract_token(struct rxrpc_connection *conn, struct sk_buff *skb,
ticket_len = ntohl(container.token_len);
ticket_offset = token_offset + sizeof(container);
- if (xdr_round_up(ticket_len) > token_len - sizeof(container))
+ if (ticket_len > xdr_round_down(token_len - sizeof(container)))
goto short_packet;
_debug("KVNO %u", kvno);
diff --git a/net/rxrpc/rxgk_common.h b/net/rxrpc/rxgk_common.h
index 80164d89e19c..1e257d7ab8ec 100644
--- a/net/rxrpc/rxgk_common.h
+++ b/net/rxrpc/rxgk_common.h
@@ -34,6 +34,7 @@ struct rxgk_context {
};
#define xdr_round_up(x) (round_up((x), sizeof(__be32)))
+#define xdr_round_down(x) (round_down((x), sizeof(__be32)))
#define xdr_object_len(x) (4 + xdr_round_up(x))
/*
^ permalink raw reply related
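The reasoning in the patch above can be shown with plain 32-bit arithmetic: rounding an untrusted length up can wrap to a small value, while rounding the trusted available size down cannot. The helpers below are a user-space sketch with hypothetical names. They stand in for xdr_round_up() and xdr_round_down() and are not the kernel macros.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Round to a multiple of 4, the XDR unit size. */
static uint32_t sketch_round_up4(uint32_t x)
{
	return (x + 3u) & ~3u;	/* wraps for x > 0xfffffffc */
}

static uint32_t sketch_round_down4(uint32_t x)
{
	return x & ~3u;		/* can never wrap */
}

/* Overflow-prone form: rounds the untrusted length up. */
static bool sketch_fits_round_up(uint32_t len, uint32_t avail)
{
	return sketch_round_up4(len) <= avail;
}

/* Safe form: rounds the available size down instead. */
static bool sketch_fits_round_down(uint32_t len, uint32_t avail)
{
	return len <= sketch_round_down4(avail);
}

/* Usage example: both forms agree on sane input and differ on a huge one. */
static void sketch_round_selftest(void)
{
	assert(sketch_fits_round_up(10, 12) && sketch_fits_round_down(10, 12));
	assert(!sketch_fits_round_up(10, 11) && !sketch_fits_round_down(10, 11));

	/* 0xfffffffe rounds up to 0, so the first form wrongly accepts it. */
	assert(sketch_fits_round_up(0xfffffffeu, 100));
	assert(!sketch_fits_round_down(0xfffffffeu, 100));
}
```

For lengths that do not wrap, the two comparisons give the same answer, so the rewritten check does not change behaviour on valid packets.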
* [PATCH net v2 4/6] rxrpc: Fix conn-level packet handling to unshare RESPONSE packets
From: David Howells @ 2026-04-22 16:14 UTC (permalink / raw)
To: netdev
Cc: David Howells, Marc Dionne, Jakub Kicinski, David S. Miller,
Eric Dumazet, Paolo Abeni, Simon Horman, Anderson Nascimento,
linux-afs, linux-kernel, Jeffrey Altman, stable
In-Reply-To: <20260422161438.2593376-1-dhowells@redhat.com>
The security operations that verify RESPONSE packets decrypt parts of them
in place. However, the sk_buff may be shared with a packet sniffer, which
would lead to the sniffer seeing an apparently corrupt (actually decrypted)
packet.
Fix this by handing a copy of the packet off to the specific security
handler if the packet was cloned.
Fixes: 17926a79320a ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both")
Closes: https://sashiko.dev/#/patchset/20260408121252.2249051-1-dhowells%40redhat.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Jeffrey Altman <jaltman@auristor.com>
cc: Eric Dumazet <edumazet@google.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Simon Horman <horms@kernel.org>
cc: linux-afs@lists.infradead.org
cc: netdev@vger.kernel.org
cc: stable@kernel.org
---
net/rxrpc/conn_event.c | 29 ++++++++++++++++++++++++++++-
1 file changed, 28 insertions(+), 1 deletion(-)
diff --git a/net/rxrpc/conn_event.c b/net/rxrpc/conn_event.c
index 9a41ec708aeb..aee977291d90 100644
--- a/net/rxrpc/conn_event.c
+++ b/net/rxrpc/conn_event.c
@@ -240,6 +240,33 @@ static void rxrpc_call_is_secure(struct rxrpc_call *call)
rxrpc_notify_socket(call);
}
+static int rxrpc_verify_response(struct rxrpc_connection *conn,
+ struct sk_buff *skb)
+{
+ int ret;
+
+ if (skb_cloned(skb)) {
+ /* Copy the packet if shared so that we can do in-place
+ * decryption.
+ */
+ struct sk_buff *nskb = skb_copy(skb, GFP_NOFS);
+
+ if (nskb) {
+ rxrpc_new_skb(nskb, rxrpc_skb_new_unshared);
+ ret = conn->security->verify_response(conn, nskb);
+ rxrpc_free_skb(nskb, rxrpc_skb_put_response_copy);
+ } else {
+ /* OOM - Drop the packet. */
+ rxrpc_see_skb(skb, rxrpc_skb_see_unshare_nomem);
+ ret = -ENOMEM;
+ }
+ } else {
+ ret = conn->security->verify_response(conn, skb);
+ }
+
+ return ret;
+}
+
/*
* connection-level Rx packet processor
*/
@@ -270,7 +297,7 @@ static int rxrpc_process_event(struct rxrpc_connection *conn,
}
spin_unlock_irq(&conn->state_lock);
- ret = conn->security->verify_response(conn, skb);
+ ret = rxrpc_verify_response(conn, skb);
if (ret < 0)
return ret;
^ permalink raw reply related
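The approach in the patch above, copying the packet only when someone else also holds it and then decrypting the private copy, can be sketched in user space as follows. Everything here is a hypothetical stand-in: the struct plays the role of an sk_buff with a share count, and the XOR plays the role of in-place decryption. None of it is rxrpc code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a packet buffer that may be shared. */
struct sketch_pkt {
	int users;		/* > 1 means another holder sees the data */
	size_t len;
	unsigned char *data;
};

/* Stand-in for in-place decryption: XOR with a fixed byte. */
static void sketch_decrypt_in_place(unsigned char *p, size_t len)
{
	for (size_t i = 0; i < len; i++)
		p[i] ^= 0x5a;
}

/*
 * Decrypt and report the first plaintext byte without ever modifying
 * shared data. Returns 0 on success, -1 on empty input or allocation
 * failure (in which case the packet is simply not processed).
 */
static int sketch_verify(struct sketch_pkt *pkt, unsigned char *first)
{
	unsigned char *copy;

	if (pkt->len == 0)
		return -1;

	if (pkt->users <= 1) {
		/* Sole owner: safe to decrypt the original. */
		sketch_decrypt_in_place(pkt->data, pkt->len);
		*first = pkt->data[0];
		return 0;
	}

	/* Shared: work on a private copy and leave the original alone. */
	copy = malloc(pkt->len);
	if (!copy)
		return -1;
	memcpy(copy, pkt->data, pkt->len);
	sketch_decrypt_in_place(copy, pkt->len);
	*first = copy[0];
	free(copy);
	return 0;
}

/* Usage example: a shared buffer stays intact, an unshared one does not. */
static void sketch_verify_selftest(void)
{
	unsigned char bytes[3] = { 1, 2, 3 };
	struct sketch_pkt pkt = {
		.users = 2, .len = sizeof(bytes), .data = bytes,
	};
	unsigned char first = 0;

	assert(sketch_verify(&pkt, &first) == 0);
	assert(first == (1 ^ 0x5a));
	assert(bytes[0] == 1);		/* other holder still sees ciphertext */

	pkt.users = 1;
	assert(sketch_verify(&pkt, &first) == 0);
	assert(bytes[0] == (1 ^ 0x5a));	/* decrypted in place */
}
```

The copy is paid for only in the shared case, so the common unshared path keeps its in-place behaviour.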
* [PATCH net v2 3/6] rxrpc: Fix potential UAF after skb_unshare() failure
From: David Howells @ 2026-04-22 16:14 UTC (permalink / raw)
To: netdev
Cc: David Howells, Marc Dionne, Jakub Kicinski, David S. Miller,
Eric Dumazet, Paolo Abeni, Simon Horman, Anderson Nascimento,
linux-afs, linux-kernel, Jeffrey Altman, stable
In-Reply-To: <20260422161438.2593376-1-dhowells@redhat.com>
If skb_unshare() fails to unshare a packet due to allocation failure in
rxrpc_input_packet(), the skb pointer in the parent (rxrpc_io_thread())
will be NULL'd out. This will likely cause the call to
trace_rxrpc_rx_done() to oops.
Fix this by moving the unsharing down to where rxrpc_input_call_event()
calls rxrpc_input_call_packet(). There are a number of places prior to
that where we ignore DATA packets for a variety of reasons (such as the
call already being complete) for which an unshare is then avoided.
And with that, rxrpc_input_packet() doesn't need to take a pointer to the
pointer to the packet, so change that to just a pointer.
Fixes: 2d1faf7a0ca3 ("rxrpc: Simplify skbuff accounting in receive path")
Closes: https://sashiko.dev/#/patchset/20260408121252.2249051-1-dhowells%40redhat.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Jeffrey Altman <jaltman@auristor.com>
cc: Eric Dumazet <edumazet@google.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Simon Horman <horms@kernel.org>
cc: linux-afs@lists.infradead.org
cc: netdev@vger.kernel.org
cc: stable@kernel.org
---
include/trace/events/rxrpc.h | 4 ++--
net/rxrpc/ar-internal.h | 1 -
net/rxrpc/call_event.c | 19 ++++++++++++++++++-
net/rxrpc/io_thread.c | 24 ++----------------------
net/rxrpc/skbuff.c | 9 ---------
5 files changed, 22 insertions(+), 35 deletions(-)
diff --git a/include/trace/events/rxrpc.h b/include/trace/events/rxrpc.h
index 5820d7e41ea0..13b9d017f8e1 100644
--- a/include/trace/events/rxrpc.h
+++ b/include/trace/events/rxrpc.h
@@ -162,8 +162,6 @@
E_(rxrpc_call_poke_timer_now, "Timer-now")
#define rxrpc_skb_traces \
- EM(rxrpc_skb_eaten_by_unshare, "ETN unshare ") \
- EM(rxrpc_skb_eaten_by_unshare_nomem, "ETN unshar-nm") \
EM(rxrpc_skb_get_call_rx, "GET call-rx ") \
EM(rxrpc_skb_get_conn_secured, "GET conn-secd") \
EM(rxrpc_skb_get_conn_work, "GET conn-work") \
@@ -190,6 +188,7 @@
EM(rxrpc_skb_put_purge, "PUT purge ") \
EM(rxrpc_skb_put_purge_oob, "PUT purge-oob") \
EM(rxrpc_skb_put_response, "PUT response ") \
+ EM(rxrpc_skb_put_response_copy, "PUT resp-cpy ") \
EM(rxrpc_skb_put_rotate, "PUT rotate ") \
EM(rxrpc_skb_put_unknown, "PUT unknown ") \
EM(rxrpc_skb_see_conn_work, "SEE conn-work") \
@@ -198,6 +197,7 @@
EM(rxrpc_skb_see_recvmsg_oob, "SEE recvm-oob") \
EM(rxrpc_skb_see_reject, "SEE reject ") \
EM(rxrpc_skb_see_rotate, "SEE rotate ") \
+ EM(rxrpc_skb_see_unshare_nomem, "SEE unshar-nm") \
E_(rxrpc_skb_see_version, "SEE version ")
#define rxrpc_local_traces \
diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index 96ecb83c9071..27c2aa2dd023 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -1486,7 +1486,6 @@ int rxrpc_server_keyring(struct rxrpc_sock *, sockptr_t, int);
void rxrpc_kernel_data_consumed(struct rxrpc_call *, struct sk_buff *);
void rxrpc_new_skb(struct sk_buff *, enum rxrpc_skb_trace);
void rxrpc_see_skb(struct sk_buff *, enum rxrpc_skb_trace);
-void rxrpc_eaten_skb(struct sk_buff *, enum rxrpc_skb_trace);
void rxrpc_get_skb(struct sk_buff *, enum rxrpc_skb_trace);
void rxrpc_free_skb(struct sk_buff *, enum rxrpc_skb_trace);
void rxrpc_purge_queue(struct sk_buff_head *);
diff --git a/net/rxrpc/call_event.c b/net/rxrpc/call_event.c
index fec59d9338b9..cc8f9dfa44e8 100644
--- a/net/rxrpc/call_event.c
+++ b/net/rxrpc/call_event.c
@@ -332,7 +332,24 @@ bool rxrpc_input_call_event(struct rxrpc_call *call)
saw_ack |= sp->hdr.type == RXRPC_PACKET_TYPE_ACK;
- rxrpc_input_call_packet(call, skb);
+ if (sp->hdr.securityIndex != 0 &&
+ skb_cloned(skb)) {
+ /* Unshare the packet so that it can be
+ * modified by in-place decryption.
+ */
+ struct sk_buff *nskb = skb_copy(skb, GFP_ATOMIC);
+
+ if (nskb) {
+ rxrpc_new_skb(nskb, rxrpc_skb_new_unshared);
+ rxrpc_input_call_packet(call, nskb);
+ rxrpc_free_skb(nskb, rxrpc_skb_put_call_rx);
+ } else {
+ /* OOM - Drop the packet. */
+ rxrpc_see_skb(skb, rxrpc_skb_see_unshare_nomem);
+ }
+ } else {
+ rxrpc_input_call_packet(call, skb);
+ }
rxrpc_free_skb(skb, rxrpc_skb_put_call_rx);
did_receive = true;
}
diff --git a/net/rxrpc/io_thread.c b/net/rxrpc/io_thread.c
index 697956931925..dc5184a2fa9d 100644
--- a/net/rxrpc/io_thread.c
+++ b/net/rxrpc/io_thread.c
@@ -192,13 +192,12 @@ static bool rxrpc_extract_abort(struct sk_buff *skb)
/*
* Process packets received on the local endpoint
*/
-static bool rxrpc_input_packet(struct rxrpc_local *local, struct sk_buff **_skb)
+static bool rxrpc_input_packet(struct rxrpc_local *local, struct sk_buff *skb)
{
struct rxrpc_connection *conn;
struct sockaddr_rxrpc peer_srx;
struct rxrpc_skb_priv *sp;
struct rxrpc_peer *peer = NULL;
- struct sk_buff *skb = *_skb;
bool ret = false;
skb_pull(skb, sizeof(struct udphdr));
@@ -244,25 +243,6 @@ static bool rxrpc_input_packet(struct rxrpc_local *local, struct sk_buff **_skb)
return rxrpc_bad_message(skb, rxrpc_badmsg_zero_call);
if (sp->hdr.seq == 0)
return rxrpc_bad_message(skb, rxrpc_badmsg_zero_seq);
-
- /* Unshare the packet so that it can be modified for in-place
- * decryption.
- */
- if (sp->hdr.securityIndex != 0) {
- skb = skb_unshare(skb, GFP_ATOMIC);
- if (!skb) {
- rxrpc_eaten_skb(*_skb, rxrpc_skb_eaten_by_unshare_nomem);
- *_skb = NULL;
- return just_discard;
- }
-
- if (skb != *_skb) {
- rxrpc_eaten_skb(*_skb, rxrpc_skb_eaten_by_unshare);
- *_skb = skb;
- rxrpc_new_skb(skb, rxrpc_skb_new_unshared);
- sp = rxrpc_skb(skb);
- }
- }
break;
case RXRPC_PACKET_TYPE_CHALLENGE:
@@ -494,7 +474,7 @@ int rxrpc_io_thread(void *data)
switch (skb->mark) {
case RXRPC_SKB_MARK_PACKET:
skb->priority = 0;
- if (!rxrpc_input_packet(local, &skb))
+ if (!rxrpc_input_packet(local, skb))
rxrpc_reject_packet(local, skb);
trace_rxrpc_rx_done(skb->mark, skb->priority);
rxrpc_free_skb(skb, rxrpc_skb_put_input);
diff --git a/net/rxrpc/skbuff.c b/net/rxrpc/skbuff.c
index 3bcd6ee80396..e2169d1a14b5 100644
--- a/net/rxrpc/skbuff.c
+++ b/net/rxrpc/skbuff.c
@@ -46,15 +46,6 @@ void rxrpc_get_skb(struct sk_buff *skb, enum rxrpc_skb_trace why)
skb_get(skb);
}
-/*
- * Note the dropping of a ref on a socket buffer by the core.
- */
-void rxrpc_eaten_skb(struct sk_buff *skb, enum rxrpc_skb_trace why)
-{
- int n = atomic_inc_return(&rxrpc_n_rx_skbs);
- trace_rxrpc_skb(skb, 0, n, why);
-}
-
/*
* Note the destruction of a socket buffer.
*/
^ permalink raw reply related
* [PATCH net v2 2/6] rxrpc: Fix rxkad crypto unalignment handling
From: David Howells @ 2026-04-22 16:14 UTC (permalink / raw)
To: netdev
Cc: David Howells, Marc Dionne, Jakub Kicinski, David S. Miller,
Eric Dumazet, Paolo Abeni, Simon Horman, Anderson Nascimento,
linux-afs, linux-kernel, Jeffrey Altman, stable
In-Reply-To: <20260422161438.2593376-1-dhowells@redhat.com>
Fix handling of a packet with a misaligned crypto length. Also handle
non-ENOMEM errors from decryption by aborting. Further, remove the
WARN_ON_ONCE() so that it can't be remotely triggered (a trace line can
still be emitted).
Fixes: f93af41b9f5f ("rxrpc: Fix missing error checks for rxkad encryption/decryption failure")
Closes: https://sashiko.dev/#/patchset/20260408121252.2249051-1-dhowells%40redhat.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Jeffrey Altman <jaltman@auristor.com>
cc: Eric Dumazet <edumazet@google.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Simon Horman <horms@kernel.org>
cc: linux-afs@lists.infradead.org
cc: netdev@vger.kernel.org
cc: stable@kernel.org
---
include/trace/events/rxrpc.h | 1 +
net/rxrpc/rxkad.c | 9 +++++++--
2 files changed, 8 insertions(+), 2 deletions(-)
diff --git a/include/trace/events/rxrpc.h b/include/trace/events/rxrpc.h
index 578b8038b211..5820d7e41ea0 100644
--- a/include/trace/events/rxrpc.h
+++ b/include/trace/events/rxrpc.h
@@ -37,6 +37,7 @@
EM(rxkad_abort_1_short_encdata, "rxkad1-short-encdata") \
EM(rxkad_abort_1_short_header, "rxkad1-short-hdr") \
EM(rxkad_abort_2_short_check, "rxkad2-short-check") \
+ EM(rxkad_abort_2_crypto_unaligned, "rxkad2-crypto-unaligned") \
EM(rxkad_abort_2_short_data, "rxkad2-short-data") \
EM(rxkad_abort_2_short_header, "rxkad2-short-hdr") \
EM(rxkad_abort_2_short_len, "rxkad2-short-len") \
diff --git a/net/rxrpc/rxkad.c b/net/rxrpc/rxkad.c
index 5a720222854f..cba7935977f0 100644
--- a/net/rxrpc/rxkad.c
+++ b/net/rxrpc/rxkad.c
@@ -510,6 +510,9 @@ static int rxkad_verify_packet_2(struct rxrpc_call *call, struct sk_buff *skb,
return rxrpc_abort_eproto(call, skb, RXKADSEALEDINCON,
rxkad_abort_2_short_header);
+ /* Don't let the crypto algo see a misaligned length. */
+ sp->len = round_down(sp->len, 8);
+
/* Decrypt the skbuff in-place. TODO: We really want to decrypt
* directly into the target buffer.
*/
@@ -543,8 +546,10 @@ static int rxkad_verify_packet_2(struct rxrpc_call *call, struct sk_buff *skb,
if (sg != _sg)
kfree(sg);
if (ret < 0) {
- WARN_ON_ONCE(ret != -ENOMEM);
- return ret;
+ if (ret == -ENOMEM)
+ return ret;
+ return rxrpc_abort_eproto(call, skb, RXKADSEALEDINCON,
+ rxkad_abort_2_crypto_unaligned);
}
/* Extract the decrypted packet length */
^ permalink raw reply related
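The first hunk of the rxkad.c change above trims the length to a multiple of 8 before it reaches the cipher. Here is a minimal sketch of that trimming, assuming an 8-byte cipher block as implied by the round_down(sp->len, 8) call. The names are hypothetical and this is not the kernel's round_down() macro.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed cipher block size, taken from the constant 8 in the patch. */
#define SKETCH_BLOCK_SIZE 8u

/* Drop any trailing partial block so the cipher only sees whole blocks. */
static size_t sketch_block_aligned_len(size_t len)
{
	return len - (len % SKETCH_BLOCK_SIZE);
}

/* Usage example. */
static void sketch_align_selftest(void)
{
	assert(sketch_block_aligned_len(0) == 0);
	assert(sketch_block_aligned_len(7) == 0);
	assert(sketch_block_aligned_len(8) == 8);
	assert(sketch_block_aligned_len(23) == 16);
}
```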
* [PATCH net v2 1/6] rxrpc: Fix memory leaks in rxkad_verify_response()
From: David Howells @ 2026-04-22 16:14 UTC (permalink / raw)
To: netdev
Cc: David Howells, Marc Dionne, Jakub Kicinski, David S. Miller,
Eric Dumazet, Paolo Abeni, Simon Horman, Anderson Nascimento,
linux-afs, linux-kernel, Jeffrey Altman, stable
In-Reply-To: <20260422161438.2593376-1-dhowells@redhat.com>
Fix rxkad_verify_response() to free the ticket and the server key under all
circumstances by initialising the ticket pointer to NULL and then making
all paths through the function after the first allocation has been done go
through a single common epilogue that just releases everything - where all
the releases skip on a NULL pointer.
Fixes: 57af281e5389 ("rxrpc: Tidy up abort generation infrastructure")
Fixes: ec832bd06d6f ("rxrpc: Don't retain the server key in the connection")
Closes: https://sashiko.dev/#/patchset/20260408121252.2249051-1-dhowells%40redhat.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Jeffrey Altman <jaltman@auristor.com>
cc: Eric Dumazet <edumazet@google.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Simon Horman <horms@kernel.org>
cc: linux-afs@lists.infradead.org
cc: netdev@vger.kernel.org
cc: stable@kernel.org
---
net/rxrpc/rxkad.c | 103 +++++++++++++++++++---------------------------
1 file changed, 42 insertions(+), 61 deletions(-)
diff --git a/net/rxrpc/rxkad.c b/net/rxrpc/rxkad.c
index eb7f2769d2b1..5a720222854f 100644
--- a/net/rxrpc/rxkad.c
+++ b/net/rxrpc/rxkad.c
@@ -1136,7 +1136,7 @@ static int rxkad_verify_response(struct rxrpc_connection *conn,
struct rxrpc_crypt session_key;
struct key *server_key;
time64_t expiry;
- void *ticket;
+ void *ticket = NULL;
u32 version, kvno, ticket_len, level;
__be32 csum;
int ret, i;
@@ -1162,13 +1162,13 @@ static int rxkad_verify_response(struct rxrpc_connection *conn,
ret = -ENOMEM;
response = kzalloc_obj(struct rxkad_response, GFP_NOFS);
if (!response)
- goto temporary_error;
+ goto error;
if (skb_copy_bits(skb, sizeof(struct rxrpc_wire_header),
response, sizeof(*response)) < 0) {
- rxrpc_abort_conn(conn, skb, RXKADPACKETSHORT, -EPROTO,
- rxkad_abort_resp_short);
- goto protocol_error;
+ ret = rxrpc_abort_conn(conn, skb, RXKADPACKETSHORT, -EPROTO,
+ rxkad_abort_resp_short);
+ goto error;
}
version = ntohl(response->version);
@@ -1178,62 +1178,62 @@ static int rxkad_verify_response(struct rxrpc_connection *conn,
trace_rxrpc_rx_response(conn, sp->hdr.serial, version, kvno, ticket_len);
if (version != RXKAD_VERSION) {
- rxrpc_abort_conn(conn, skb, RXKADINCONSISTENCY, -EPROTO,
- rxkad_abort_resp_version);
- goto protocol_error;
+ ret = rxrpc_abort_conn(conn, skb, RXKADINCONSISTENCY, -EPROTO,
+ rxkad_abort_resp_version);
+ goto error;
}
if (ticket_len < 4 || ticket_len > MAXKRB5TICKETLEN) {
- rxrpc_abort_conn(conn, skb, RXKADTICKETLEN, -EPROTO,
- rxkad_abort_resp_tkt_len);
- goto protocol_error;
+ ret = rxrpc_abort_conn(conn, skb, RXKADTICKETLEN, -EPROTO,
+ rxkad_abort_resp_tkt_len);
+ goto error;
}
if (kvno >= RXKAD_TKT_TYPE_KERBEROS_V5) {
- rxrpc_abort_conn(conn, skb, RXKADUNKNOWNKEY, -EPROTO,
- rxkad_abort_resp_unknown_tkt);
- goto protocol_error;
+ ret = rxrpc_abort_conn(conn, skb, RXKADUNKNOWNKEY, -EPROTO,
+ rxkad_abort_resp_unknown_tkt);
+ goto error;
}
/* extract the kerberos ticket and decrypt and decode it */
ret = -ENOMEM;
ticket = kmalloc(ticket_len, GFP_NOFS);
if (!ticket)
- goto temporary_error_free_resp;
+ goto error;
if (skb_copy_bits(skb, sizeof(struct rxrpc_wire_header) + sizeof(*response),
ticket, ticket_len) < 0) {
- rxrpc_abort_conn(conn, skb, RXKADPACKETSHORT, -EPROTO,
- rxkad_abort_resp_short_tkt);
- goto protocol_error;
+ ret = rxrpc_abort_conn(conn, skb, RXKADPACKETSHORT, -EPROTO,
+ rxkad_abort_resp_short_tkt);
+ goto error;
}
ret = rxkad_decrypt_ticket(conn, server_key, skb, ticket, ticket_len,
&session_key, &expiry);
if (ret < 0)
- goto temporary_error_free_ticket;
+ goto error;
/* use the session key from inside the ticket to decrypt the
* response */
ret = rxkad_decrypt_response(conn, response, &session_key);
if (ret < 0)
- goto temporary_error_free_ticket;
+ goto error;
if (ntohl(response->encrypted.epoch) != conn->proto.epoch ||
ntohl(response->encrypted.cid) != conn->proto.cid ||
ntohl(response->encrypted.securityIndex) != conn->security_ix) {
- rxrpc_abort_conn(conn, skb, RXKADSEALEDINCON, -EPROTO,
- rxkad_abort_resp_bad_param);
- goto protocol_error_free;
+ ret = rxrpc_abort_conn(conn, skb, RXKADSEALEDINCON, -EPROTO,
+ rxkad_abort_resp_bad_param);
+ goto error;
}
csum = response->encrypted.checksum;
response->encrypted.checksum = 0;
rxkad_calc_response_checksum(response);
if (response->encrypted.checksum != csum) {
- rxrpc_abort_conn(conn, skb, RXKADSEALEDINCON, -EPROTO,
- rxkad_abort_resp_bad_checksum);
- goto protocol_error_free;
+ ret = rxrpc_abort_conn(conn, skb, RXKADSEALEDINCON, -EPROTO,
+ rxkad_abort_resp_bad_checksum);
+ goto error;
}
for (i = 0; i < RXRPC_MAXCALLS; i++) {
@@ -1241,38 +1241,38 @@ static int rxkad_verify_response(struct rxrpc_connection *conn,
u32 counter = READ_ONCE(conn->channels[i].call_counter);
if (call_id > INT_MAX) {
- rxrpc_abort_conn(conn, skb, RXKADSEALEDINCON, -EPROTO,
- rxkad_abort_resp_bad_callid);
- goto protocol_error_free;
+ ret = rxrpc_abort_conn(conn, skb, RXKADSEALEDINCON, -EPROTO,
+ rxkad_abort_resp_bad_callid);
+ goto error;
}
if (call_id < counter) {
- rxrpc_abort_conn(conn, skb, RXKADSEALEDINCON, -EPROTO,
- rxkad_abort_resp_call_ctr);
- goto protocol_error_free;
+ ret = rxrpc_abort_conn(conn, skb, RXKADSEALEDINCON, -EPROTO,
+ rxkad_abort_resp_call_ctr);
+ goto error;
}
if (call_id > counter) {
if (conn->channels[i].call) {
- rxrpc_abort_conn(conn, skb, RXKADSEALEDINCON, -EPROTO,
+ ret = rxrpc_abort_conn(conn, skb, RXKADSEALEDINCON, -EPROTO,
rxkad_abort_resp_call_state);
- goto protocol_error_free;
+ goto error;
}
conn->channels[i].call_counter = call_id;
}
}
if (ntohl(response->encrypted.inc_nonce) != conn->rxkad.nonce + 1) {
- rxrpc_abort_conn(conn, skb, RXKADOUTOFSEQUENCE, -EPROTO,
- rxkad_abort_resp_ooseq);
- goto protocol_error_free;
+ ret = rxrpc_abort_conn(conn, skb, RXKADOUTOFSEQUENCE, -EPROTO,
+ rxkad_abort_resp_ooseq);
+ goto error;
}
level = ntohl(response->encrypted.level);
if (level > RXRPC_SECURITY_ENCRYPT) {
- rxrpc_abort_conn(conn, skb, RXKADLEVELFAIL, -EPROTO,
- rxkad_abort_resp_level);
- goto protocol_error_free;
+ ret = rxrpc_abort_conn(conn, skb, RXKADLEVELFAIL, -EPROTO,
+ rxkad_abort_resp_level);
+ goto error;
}
conn->security_level = level;
@@ -1280,31 +1280,12 @@ static int rxkad_verify_response(struct rxrpc_connection *conn,
* this the connection security can be handled in exactly the same way
* as for a client connection */
ret = rxrpc_get_server_data_key(conn, &session_key, expiry, kvno);
- if (ret < 0)
- goto temporary_error_free_ticket;
-
- kfree(ticket);
- kfree(response);
- _leave(" = 0");
- return 0;
-protocol_error_free:
- kfree(ticket);
-protocol_error:
- kfree(response);
- key_put(server_key);
- return -EPROTO;
-
-temporary_error_free_ticket:
+error:
kfree(ticket);
-temporary_error_free_resp:
kfree(response);
-temporary_error:
- /* Ignore the response packet if we got a temporary error such as
- * ENOMEM. We just want to send the challenge again. Note that we
- * also come out this way if the ticket decryption fails.
- */
key_put(server_key);
+ _leave(" = %d", ret);
return ret;
}
^ permalink raw reply related
* [PATCH net v2 0/6] rxrpc: Miscellaneous fixes
From: David Howells @ 2026-04-22 16:14 UTC (permalink / raw)
To: netdev
Cc: David Howells, Marc Dionne, Jakub Kicinski, David S. Miller,
Eric Dumazet, Paolo Abeni, Simon Horman, Anderson Nascimento,
linux-afs, linux-kernel
Here are some fixes for rxrpc, as found by Sashiko[1]:
(1) Fix leaks in rxkad_verify_response().
(2) Fix handling of rxkad-encrypted packets with crypto-misaligned
lengths.
(3) Fix problem with unsharing DATA packets potentially causing a crash in
the caller.
(4) Fix lack of unsharing of RESPONSE packets.
(5) Fix integer overflow in RxGK ticket length check.
(6) Fix missing length check in RxKAD tickets.
David
The patches can be found here also:
http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=rxrpc-fixes
Changes
=======
ver #2)
- Use of __free() constructs in networking code is disallowed, so rework
the rxkad_verify_response() patch to just clean everything up at the end
and cope with NULL pointers.
- Reworked the unsharing fix:
- Used skb_cloned() and skb_copy() directly rather than skb_unshare().
The problem with skb_unshare() is that it kills the source skbuff if it
can't copy, which then has to be propagated up the call chain. Even
so, the code still had a bug from this[1].
- Split into two patches, one for DATA and one for RESPONSE packets.
- Do the DATA unshare a lot further along.
- Imported a patch to add a length check on RxKAD tickets.
Link: https://sashiko.dev/#/patchset/20260408121252.2249051-1-dhowells%40redhat.com [1]
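The "clean everything up at the end and cope with NULL pointers" rework
mentioned above can be pictured with a minimal userspace sketch. It is not
the rxrpc code: verify_blob() and its checks are hypothetical stand-ins,
and free() plays the role of kfree(), which likewise accepts NULL.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a multi-step parser; not rxrpc code. */
static int verify_blob(const unsigned char *in, size_t len, size_t ticket_len)
{
    unsigned char *response = NULL; /* NULL so the shared exit can free blindly */
    unsigned char *ticket = NULL;
    int ret;

    assert(in != NULL);             /* caller contract in this sketch */
    if (len == 0)
        return -EPROTO;

    ret = -ENOMEM;
    response = malloc(len);
    if (!response)
        goto error;
    memcpy(response, in, len);

    ret = -EPROTO;
    if (ticket_len == 0 || ticket_len > len)
        goto error;

    ret = -ENOMEM;
    ticket = malloc(ticket_len);
    if (!ticket)
        goto error;
    memcpy(ticket, response, ticket_len);

    ret = 0;
error:
    /* Single exit: free(NULL) is a no-op, so no per-stage labels are needed. */
    free(ticket);
    free(response);
    return ret;
}
```

With one label, a new failure point only needs to set ret and jump; it
cannot skip a free by picking the wrong label.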
Anderson Nascimento (1):
rxrpc: Fix missing validation of ticket length in non-XDR key
preparsing
David Howells (5):
rxrpc: Fix memory leaks in rxkad_verify_response()
rxrpc: Fix rxkad crypto unalignment handling
rxrpc: Fix potential UAF after skb_unshare() failure
rxrpc: Fix conn-level packet handling to unshare RESPONSE packets
rxgk: Fix potential integer overflow in length check
include/trace/events/rxrpc.h | 5 +-
net/rxrpc/ar-internal.h | 1 -
net/rxrpc/call_event.c | 19 +++++-
net/rxrpc/conn_event.c | 29 ++++++++-
net/rxrpc/io_thread.c | 24 +-------
net/rxrpc/key.c | 4 ++
net/rxrpc/rxgk_app.c | 2 +-
net/rxrpc/rxgk_common.h | 1 +
net/rxrpc/rxkad.c | 112 +++++++++++++++--------------------
net/rxrpc/skbuff.c | 9 ---
10 files changed, 106 insertions(+), 100 deletions(-)
^ permalink raw reply
* Re: [PATCH net v2 2/2] net: airoha: Add size check for TX NAPIs in airoha_qdma_cleanup()
From: Lorenzo Bianconi @ 2026-04-22 16:12 UTC (permalink / raw)
To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni
Cc: Simon Horman, linux-arm-kernel, linux-mediatek, netdev
In-Reply-To: <20260420-airoha_qdma_init_rx_queue-fix-v2-2-d99347e5c18d@kernel.org>
> If airoha_qdma_init routine fails before airoha_qdma_tx_irq_init() runs
> successfully for all TX NAPIs, airoha_qdma_cleanup() will
> unconditionally runs netif_napi_del() on TX NAPIs, triggering a NULL
> pointer dereference. Fix the issue relying on q_tx_irq size value to
> check if the TX NAPIs is properly initialized in airoha_qdma_cleanup().
> Moreover, run netif_napi_add_tx() just if irq_q queue is properly
> allocated.
>
> Fixes: 23020f049327 ("net: airoha: Introduce ethernet support for EN7581 SoC")
> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> ---
> drivers/net/ethernet/airoha/airoha_eth.c | 11 ++++++++---
> 1 file changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/ethernet/airoha/airoha_eth.c b/drivers/net/ethernet/airoha/airoha_eth.c
> index fc79c456743c..fd8c4f817d85 100644
> --- a/drivers/net/ethernet/airoha/airoha_eth.c
> +++ b/drivers/net/ethernet/airoha/airoha_eth.c
> @@ -996,8 +996,6 @@ static int airoha_qdma_tx_irq_init(struct airoha_tx_irq_queue *irq_q,
> struct airoha_eth *eth = qdma->eth;
> dma_addr_t dma_addr;
>
> - netif_napi_add_tx(eth->napi_dev, &irq_q->napi,
> - airoha_qdma_tx_napi_poll);
> irq_q->q = dmam_alloc_coherent(eth->dev, size * sizeof(u32),
> &dma_addr, GFP_KERNEL);
> if (!irq_q->q)
> @@ -1007,6 +1005,9 @@ static int airoha_qdma_tx_irq_init(struct airoha_tx_irq_queue *irq_q,
> irq_q->size = size;
> irq_q->qdma = qdma;
>
> + netif_napi_add_tx(eth->napi_dev, &irq_q->napi,
> + airoha_qdma_tx_napi_poll);
> +
> airoha_qdma_wr(qdma, REG_TX_IRQ_BASE(id), dma_addr);
> airoha_qdma_rmw(qdma, REG_TX_IRQ_CFG(id), TX_IRQ_DEPTH_MASK,
> FIELD_PREP(TX_IRQ_DEPTH_MASK, size));
> @@ -1398,8 +1399,12 @@ static void airoha_qdma_cleanup(struct airoha_qdma *qdma)
> }
> }
>
> - for (i = 0; i < ARRAY_SIZE(qdma->q_tx_irq); i++)
> + for (i = 0; i < ARRAY_SIZE(qdma->q_tx_irq); i++) {
> + if (!qdma->q_tx_irq[i].size)
> + continue;
> +
> netif_napi_del(&qdma->q_tx_irq[i].napi);
> + }
>
> for (i = 0; i < ARRAY_SIZE(qdma->q_tx); i++) {
> if (!qdma->q_tx[i].ndesc)
>
> --
> 2.53.0
>
Commenting on the issue reported by Sashiko here:
https://sashiko.dev/#/patchset/20260420-airoha_qdma_init_rx_queue-fix-v2-0-d99347e5c18d%40kernel.org
- Could a similar vulnerability still exist in the TX queue initialization and cleanup path?
This issue is not related to this patch and is already fixed here:
https://patchwork.kernel.org/project/netdevbpf/patch/20260417-airoha_qdma_cleanup_tx_queue-fix-net-v4-1-e04bcc2c9642@kernel.org/
Regards,
Lorenzo
^ permalink raw reply
* Re: [PATCH net v3 8/8] xsk: don't support AF_XDP on 32-bit architectures
From: Alexander Lobakin @ 2026-04-22 16:09 UTC (permalink / raw)
To: Jason Xing
Cc: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
john.fastabend, bpf, netdev, Jason Xing
In-Reply-To: <20260422033650.68457-9-kerneljasonxing@gmail.com>
From: Jason Xing <kerneljasonxing@gmail.com>
Date: Wed, 22 Apr 2026 11:36:50 +0800
> From: Jason Xing <kernelxing@tencent.com>
>
> In copy mode TX, xsk_skb_destructor_set_addr() stores the 64-bit
> descriptor address into skb_shinfo(skb)->destructor_arg (void *) via a
> uintptr_t cast:
>
> skb_shinfo(skb)->destructor_arg = (void *)((uintptr_t)addr | 0x1UL);
>
> On 32-bit architectures uintptr_t is 32 bits, so the upper 32 bits of
> the descriptor address are silently dropped. In XDP_ZEROCOPY unaligned
> mode the chunk offset is encoded in bits 48-63 of the descriptor
> address (XSK_UNALIGNED_BUF_OFFSET_SHIFT = 48), meaning the offset is
> lost entirely. The completion queue then returns a truncated address to
> userspace, making buffer recycling impossible.
What if we relax the restriction a bit? For example, refuse to configure
an XSk socket in unaligned mode if on a 32-bit arch? Or add a check
under CONFIG_32BIT like it was done in Page Pool:
skb_shinfo(skb)->destructor_arg = (void *)((uintptr_t)addr | 0x1UL);
#ifdef CONFIG_32BIT
if ((((uintptr_t)skb_shinfo(skb)->destructor_arg) & ~0x1UL) != addr)
// WARN_ONCE or whatever + error path
#endif
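To make the truncation concrete, here is a small userspace sketch of the
round-trip check suggested above. It models a 32-bit uintptr_t with
uint32_t so it behaves the same on any host; ptr32_t, pack_addr32() and
addr_survives32() are hypothetical names, and it assumes bit 0 of the
descriptor address is free for the tag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Models uintptr_t on a 32-bit architecture (assumption for this sketch). */
typedef uint32_t ptr32_t;

#define XSK_UNALIGNED_BUF_OFFSET_SHIFT 48

/* Mirrors "(void *)((uintptr_t)addr | 0x1UL)" with a 32-bit pointer width. */
static ptr32_t pack_addr32(uint64_t addr)
{
    assert((addr & 1) == 0);    /* bit 0 is reserved for the tag */
    return (ptr32_t)addr | 0x1U;
}

/* True if the address can be recovered after being stored in 32 bits. */
static bool addr_survives32(uint64_t addr)
{
    ptr32_t packed = pack_addr32(addr);

    return (uint64_t)(packed & ~(ptr32_t)0x1U) == addr;
}
```

An aligned-mode address such as 0x1000 survives, while an unaligned-mode
address carrying an offset in bits 48-63 does not, which is the case the
check would have to reject.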
I never used XSk on a 32-bit arch, but back when I was working on 32-bit
MIPS 1G routers, I wanted to add native XSk support to the Eth driver.
Sure, just for fun, now that we have cheap AArch64 and other 64-bit
embedded chips, 32-bit embedded networking SoCs are almost dead, but
OTOH, as you can see, other subsystems like PP still try to support 32 bit.
Especially given that this issue applies only to the skb XSk path,
not native in-driver implementations.
>
> Since we hear no one is using AF_XDP on 32-bit arch, we decided to
> strictly stop supporting it at compile time.
>
> Closes: https://lore.kernel.org/all/20260419045824.D9E5EC2BCAF@smtp.kernel.org/
> Fixes: 0ebc27a4c67d ("xsk: avoid data corruption on cq descriptor number")
> Suggested-by: Stanislav Fomichev <sdf@fomichev.me>
> Signed-off-by: Jason Xing <kernelxing@tencent.com>
> ---
> net/xdp/Kconfig | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/xdp/Kconfig b/net/xdp/Kconfig
> index 71af2febe72a..819aa5795f50 100644
> --- a/net/xdp/Kconfig
> +++ b/net/xdp/Kconfig
> @@ -1,7 +1,7 @@
> # SPDX-License-Identifier: GPL-2.0-only
> config XDP_SOCKETS
> bool "XDP sockets"
> - depends on BPF_SYSCALL
> + depends on BPF_SYSCALL && 64BIT
> default n
> help
> XDP sockets allows a channel between XDP programs and
Thanks,
Olek
^ permalink raw reply
* Re: [PATCH net v2 1/2] net: airoha: Move ndesc initialization at end of airoha_qdma_init_rx_queue()
From: Lorenzo Bianconi @ 2026-04-22 16:09 UTC (permalink / raw)
To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni
Cc: Simon Horman, linux-arm-kernel, linux-mediatek, netdev
In-Reply-To: <20260420-airoha_qdma_init_rx_queue-fix-v2-1-d99347e5c18d@kernel.org>
> If queue entry or DMA descriptor list allocation fails in
> airoha_qdma_init_rx_queue routine, airoha_qdma_cleanup() will trigger a
> NULL pointer dereference running netif_napi_del() for RX queue NAPIs
> since netif_napi_add() has never been executed to this particular RX NAPI.
> The issue is due to the early ndesc initialization in
> airoha_qdma_init_rx_queue() since airoha_qdma_cleanup() relies on ndesc
> value to check if the queue is properly initialized. Fix the issue moving
> ndesc initialization at end of airoha_qdma_init_tx routine.
> Move page_pool allocation after descriptor list allocation in order to
> avoid memory leaks if desc allocation fails.
>
> Fixes: 23020f049327 ("net: airoha: Introduce ethernet support for EN7581 SoC")
> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> ---
> drivers/net/ethernet/airoha/airoha_eth.c | 14 +++++++-------
> 1 file changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/net/ethernet/airoha/airoha_eth.c b/drivers/net/ethernet/airoha/airoha_eth.c
> index e1ab15f1ee7d..fc79c456743c 100644
> --- a/drivers/net/ethernet/airoha/airoha_eth.c
> +++ b/drivers/net/ethernet/airoha/airoha_eth.c
> @@ -745,14 +745,18 @@ static int airoha_qdma_init_rx_queue(struct airoha_queue *q,
> dma_addr_t dma_addr;
>
> q->buf_size = PAGE_SIZE / 2;
> - q->ndesc = ndesc;
> q->qdma = qdma;
>
> - q->entry = devm_kzalloc(eth->dev, q->ndesc * sizeof(*q->entry),
> + q->entry = devm_kzalloc(eth->dev, ndesc * sizeof(*q->entry),
> GFP_KERNEL);
> if (!q->entry)
> return -ENOMEM;
>
> + q->desc = dmam_alloc_coherent(eth->dev, ndesc * sizeof(*q->desc),
> + &dma_addr, GFP_KERNEL);
> + if (!q->desc)
> + return -ENOMEM;
> +
> q->page_pool = page_pool_create(&pp_params);
> if (IS_ERR(q->page_pool)) {
> int err = PTR_ERR(q->page_pool);
> @@ -761,11 +765,7 @@ static int airoha_qdma_init_rx_queue(struct airoha_queue *q,
> return err;
> }
>
> - q->desc = dmam_alloc_coherent(eth->dev, q->ndesc * sizeof(*q->desc),
> - &dma_addr, GFP_KERNEL);
> - if (!q->desc)
> - return -ENOMEM;
> -
> + q->ndesc = ndesc;
> netif_napi_add(eth->napi_dev, &q->napi, airoha_qdma_rx_napi_poll);
>
> airoha_qdma_wr(qdma, REG_RX_RING_BASE(qid), dma_addr);
>
> --
> 2.53.0
>
As requested, I am commenting on the issue reported by Sashiko on this patch:
https://sashiko.dev/#/patchset/20260420-airoha_qdma_init_rx_queue-fix-v2-0-d99347e5c18d%40kernel.org
- Does this code leave a regression in the TX path by omitting the equivalent fix?
This issue is not related to this patch and is already fixed here:
https://patchwork.kernel.org/project/netdevbpf/patch/20260417-airoha_qdma_cleanup_tx_queue-fix-net-v4-1-e04bcc2c9642@kernel.org/
Regards,
Lorenzo
^ permalink raw reply
* [PATCH] net: dsa: realtek: rtl8365mb: add support for RTL8367SB
From: Mieczyslaw Nalewaj @ 2026-04-22 15:58 UTC (permalink / raw)
To: netdev@vger.kernel.org
In-Reply-To: <f5ac420f-0087-4af0-bb43-b9a5b6228fbd.ref@yahoo.com>
Add chip info entry for the Realtek RTL8367SB switch. This device has
chip ID 0x6367 and version 0x0010. It exposes two external interfaces:
port 6 supports SGMII and HSGMII, while port 7 supports MII, TMII,
RMII and RGMII. Use the existing 8365MB-VC jam table for initialization.
Signed-off-by: Mieczyslaw Nalewaj <namiltd@yahoo.com>
---
drivers/net/dsa/realtek/rtl8365mb.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/drivers/net/dsa/realtek/rtl8365mb.c b/drivers/net/dsa/realtek/rtl8365mb.c
index 073f12ca8028..84d6fdb94a96 100644
--- a/drivers/net/dsa/realtek/rtl8365mb.c
+++ b/drivers/net/dsa/realtek/rtl8365mb.c
@@ -545,6 +545,18 @@ static const struct rtl8365mb_chip_info
.jam_size = ARRAY_SIZE(rtl8365mb_init_jam_8365mb_vc),
},
{
+ .name = "RTL8367SB",
+ .chip_id = 0x6367,
+ .chip_ver = 0x0010,
+ .extints = {
+ { 6, 1, PHY_INTF(SGMII) | PHY_INTF(HSGMII) },
+ { 7, 2, PHY_INTF(MII) | PHY_INTF(TMII) |
+ PHY_INTF(RMII) | PHY_INTF(RGMII) },
+ },
+ .jam_table = rtl8365mb_init_jam_8365mb_vc,
+ .jam_size = ARRAY_SIZE(rtl8365mb_init_jam_8365mb_vc),
+ },
+ {
.name = "RTL8367RB-VB",
.chip_id = 0x6367,
.chip_ver = 0x0020,
--
2.53.0
^ permalink raw reply related
* Re: [BUG] rxrpc: Client connection leak and BUG() call during kernel IO thread exit
From: David Howells @ 2026-04-22 16:08 UTC (permalink / raw)
To: Anderson Nascimento
Cc: dhowells, netdev, Marc Dionne, Jakub Kicinski, David S. Miller,
Eric Dumazet, Paolo Abeni, linux-kernel, Jeffrey Altman,
Simon Horman
In-Reply-To: <CAPhRvkyFrKO3O=D9YZoWWVwykCZ1-fXkzA9NyaJ8aaX9b-pcXw@mail.gmail.com>
Do you by any chance have a reproducer program for this?
David
^ permalink raw reply
* [PATCH net v2] nfp: fix swapped arguments in nfp_encode_basic_qdr() calls
From: Alexey Kodanev @ 2026-04-22 16:05 UTC (permalink / raw)
To: netdev
Cc: Jakub Kicinski, Simon Horman, Andrew Lunn, David S . Miller,
Eric Dumazet, Paolo Abeni, oss-drivers, Alexey Kodanev
There is a mismatch between the passed arguments and the actual
nfp_encode_basic_qdr() function parameter names:
static int nfp_encode_basic_qdr(u64 addr, int dest_island, int cpp_tgt,
int mode, bool addr40, int isld1,
int isld0)
{
...
But "dest_island" and "cpp_tgt" are swapped at every call-site.
For example:
return nfp_encode_basic_qdr(*addr, cpp_tgt, dest_island,
mode, addr40, isld1, isld0);
As a result, nfp_encode_basic_qdr() receives "dest_island" as CPP target
type, which is always NFP_CPP_TARGET_QDR(2) for these calls, and "cpp_tgt"
as the destination island ID, which can accidentally match or be outside
the valid NFP_CPP_TARGET_* types (e.g. '-1' for any destination).
Since the code has worked this way for years, also add pr_warn() calls to
the error paths in nfp_encode_basic_qdr() to help identify any address
verification failures that the corrected argument order may expose.
Detected using the Svace static analysis tool.
Fixes: 4cb584e0ee7d ("nfp: add CPP access core")
Signed-off-by: Alexey Kodanev <aleksei.kodanev@bell-sw.com>
---
v2: tag patch for "net"
add extra warnings to nfp_encode_basic_qdr() error paths
.../ethernet/netronome/nfp/nfpcore/nfp_target.c | 17 +++++++++++------
1 file changed, 11 insertions(+), 6 deletions(-)
diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_target.c b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_target.c
index 79470f198a62..9cf19446657c 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_target.c
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_target.c
@@ -435,12 +435,17 @@ static int nfp_encode_basic_qdr(u64 addr, int dest_island, int cpp_tgt,
/* Full Island ID and channel bits overlap? */
ret = nfp_decode_basic(addr, &v, cpp_tgt, mode, addr40, isld1, isld0);
- if (ret)
+ if (ret) {
+ pr_warn("%s: decode dest_island failed: %d\n", __func__, ret);
return ret;
+ }
/* The current address won't go where expected? */
- if (dest_island != -1 && dest_island != v)
+ if (dest_island != -1 && dest_island != v) {
+ pr_warn("%s: dest_island mismatch: current (%d) != decoded (%d)\n",
+ __func__, dest_island, v);
return -EINVAL;
+ }
/* If dest_island was -1, we don't care where it goes. */
return 0;
@@ -493,7 +498,7 @@ static int nfp_encode_basic(u64 *addr, int dest_island, int cpp_tgt,
* the address but we can verify if the existing
* contents will point to a valid island.
*/
- return nfp_encode_basic_qdr(*addr, cpp_tgt, dest_island,
+ return nfp_encode_basic_qdr(*addr, dest_island, cpp_tgt,
mode, addr40, isld1, isld0);
iid_lsb = addr40 ? 34 : 26;
@@ -504,7 +509,7 @@ static int nfp_encode_basic(u64 *addr, int dest_island, int cpp_tgt,
return 0;
case 1:
if (cpp_tgt == NFP_CPP_TARGET_QDR && !addr40)
- return nfp_encode_basic_qdr(*addr, cpp_tgt, dest_island,
+ return nfp_encode_basic_qdr(*addr, dest_island, cpp_tgt,
mode, addr40, isld1, isld0);
idx_lsb = addr40 ? 39 : 31;
@@ -530,7 +535,7 @@ static int nfp_encode_basic(u64 *addr, int dest_island, int cpp_tgt,
* be set before hand and with them select an island.
* So we need to confirm that it's at least plausible.
*/
- return nfp_encode_basic_qdr(*addr, cpp_tgt, dest_island,
+ return nfp_encode_basic_qdr(*addr, dest_island, cpp_tgt,
mode, addr40, isld1, isld0);
/* Make sure we compare against isldN values
@@ -551,7 +556,7 @@ static int nfp_encode_basic(u64 *addr, int dest_island, int cpp_tgt,
* iid<1> = addr<30> = channel<0>
* channel<1> = addr<31> = Index
*/
- return nfp_encode_basic_qdr(*addr, cpp_tgt, dest_island,
+ return nfp_encode_basic_qdr(*addr, dest_island, cpp_tgt,
mode, addr40, isld1, isld0);
isld[0] &= ~3;
--
2.25.1
^ permalink raw reply related
* [PATCH net 6/6] net/ncsi: validate GP payload lengths before parsing
From: Michael Bommarito @ 2026-04-22 16:03 UTC (permalink / raw)
To: Samuel Mendoza-Jonas, Paul Fertser, netdev
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, linux-kernel, Michael Bommarito, stable
In-Reply-To: <20260422160342.1975093-1-michael.bommarito@gmail.com>
ncsi_rsp_handler_gp() now bounds MAC and VLAN counts to software
and GC-reported limits, but it still assumes the advertised GP
payload is large enough for the fixed fields plus the consumed
filter-table bytes. A short GP reply can still make parsing start
past the payload or walk beyond its tail.
Validate that the declared GP payload covers the fixed GP prefix,
the consumed MAC and VLAN entries, and the checksum before parsing
the filter tables.
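The length arithmetic can be sketched in plain C as follows. This is only
an illustration: gp_payload_covers() is a hypothetical helper, and
fixed_len stands in for the size of the fixed GP prefix, which the patch
derives from the packet structure rather than taking as a parameter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_ALEN 6

static_assert(sizeof(uint16_t) == 2, "VLAN entries are 16-bit on the wire");

/*
 * True if a declared payload of 'payload' bytes covers the fixed prefix,
 * the MAC and VLAN tables, and the trailing 32-bit checksum.
 */
static bool gp_payload_covers(size_t payload, size_t fixed_len,
                              uint8_t mac_cnt, uint8_t vlan_cnt)
{
    size_t needed = fixed_len;

    needed += (size_t)mac_cnt * ETH_ALEN;
    needed += (size_t)vlan_cnt * sizeof(uint16_t);
    needed += sizeof(uint32_t);     /* checksum */

    return payload >= needed;
}
```

With a 20-byte prefix (an arbitrary value for the example), two MAC
entries and two VLAN entries, 40 bytes are needed; a 39-byte payload is
rejected.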
Fixes: 062b3e1b6d4f ("net/ncsi: Refactor MAC, VLAN filters")
Cc: stable@vger.kernel.org
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
---
net/ncsi/ncsi-rsp.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/net/ncsi/ncsi-rsp.c b/net/ncsi/ncsi-rsp.c
index 94354dca23ea..565d38fd4b92 100644
--- a/net/ncsi/ncsi-rsp.c
+++ b/net/ncsi/ncsi-rsp.c
@@ -899,6 +899,8 @@ static int ncsi_rsp_handler_gp(struct ncsi_request *nr)
struct ncsi_dev_priv *ndp = nr->ndp;
struct ncsi_rsp_gp_pkt *rsp;
struct ncsi_channel *nc;
+ size_t needed;
+ unsigned int payload;
unsigned short enable;
unsigned char *pdata;
unsigned long flags;
@@ -924,6 +926,14 @@ static int ncsi_rsp_handler_gp(struct ncsi_request *nr)
if (rsp->mac_cnt > mac_nbits || rsp->vlan_cnt > ncvf->n_vids)
return -ERANGE;
+ payload = ncsi_rsp_payload(nr->rsp);
+ needed = offsetof(struct ncsi_rsp_gp_pkt, mac) - sizeof(rsp->rsp);
+ needed += mac_cnt * ETH_ALEN;
+ needed += vlan_cnt * sizeof(__be16);
+ needed += sizeof(rsp->checksum);
+ if (payload < needed)
+ return -EINVAL;
+
/* Modes with explicit enabled indications */
if (ntohl(rsp->valid_modes) & 0x1) { /* BC filter mode */
nc->modes[NCSI_MODE_BC].enable = 1;
--
2.53.0
^ permalink raw reply related
* [PATCH net 5/6] net/ncsi: validate AEN packet lengths against the skb
From: Michael Bommarito @ 2026-04-22 16:03 UTC (permalink / raw)
To: Samuel Mendoza-Jonas, Paul Fertser, netdev
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, linux-kernel, Michael Bommarito, stable
In-Reply-To: <20260422160342.1975093-1-michael.bommarito@gmail.com>
AEN packets are dispatched after only pulling the 16-byte common
header. ncsi_aen_handler() then reads the 20-byte AEN header to
select a per-type handler, and ncsi_validate_aen_pkt() walks
farther into the payload and checksum without first ensuring the
skb contains those bytes.
Pull the AEN-specific header before reading h->type, and pull the
full AEN header plus aligned payload before checksum validation.
That keeps short AEN packets from reading past the skb tail on the
AEN path.
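As a rough illustration of the two lengths involved, the sketch below
computes how many bytes must be pulled and where the checksum word sits
once the payload is rounded up to four bytes. The helper names are
hypothetical, the 20-byte header size is taken from the description
above, and the sketch assumes the payload is at least four bytes.

```c
#include <assert.h>
#include <stddef.h>

#define AEN_HDR_LEN 20  /* AEN header size, per the description above */

/* Round up to a multiple of four, like the kernel's ALIGN(x, 4). */
static size_t align4(size_t x)
{
    return (x + 3) & ~(size_t)3;
}

/* Bytes that must be present from the start of the skb data. */
static size_t aen_pull_len(size_t net_off, unsigned short payload)
{
    return net_off + AEN_HDR_LEN + align4(payload);
}

/* Offset of the checksum word from the start of the AEN header. */
static size_t aen_csum_off(unsigned short payload)
{
    assert(payload >= 4);   /* assumption: room for the checksum itself */
    return AEN_HDR_LEN + align4(payload) - 4;
}
```

For a 6-byte payload the aligned length is 8, so 28 bytes are pulled and
the checksum is read at offset 24 rather than at the unaligned offset 22.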
Fixes: 2d283bdd079c ("net/ncsi: Resource management")
Cc: stable@vger.kernel.org
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
---
net/ncsi/ncsi-aen.c | 30 +++++++++++++++++++++++-------
1 file changed, 23 insertions(+), 7 deletions(-)
diff --git a/net/ncsi/ncsi-aen.c b/net/ncsi/ncsi-aen.c
index 040a31557201..cd34ef144cf8 100644
--- a/net/ncsi/ncsi-aen.c
+++ b/net/ncsi/ncsi-aen.c
@@ -16,11 +16,19 @@
#include "internal.h"
#include "ncsi-pkt.h"
-static int ncsi_validate_aen_pkt(struct ncsi_aen_pkt_hdr *h,
+static int ncsi_validate_aen_pkt(struct sk_buff *skb,
const unsigned short payload)
{
+ struct ncsi_aen_pkt_hdr *h;
u32 checksum;
__be32 *pchecksum;
+ unsigned int len;
+
+ len = skb_network_offset(skb) + sizeof(*h) + ALIGN(payload, 4);
+ if (!pskb_may_pull(skb, len))
+ return -EINVAL;
+
+ h = (struct ncsi_aen_pkt_hdr *)skb_network_header(skb);
if (h->common.revision != NCSI_PKT_REVISION)
return -EINVAL;
@@ -31,7 +39,7 @@ static int ncsi_validate_aen_pkt(struct ncsi_aen_pkt_hdr *h,
* sender doesn't support checksum according to NCSI
* specification.
*/
- pchecksum = (__be32 *)((void *)(h + 1) + payload - 4);
+ pchecksum = (__be32 *)((void *)(h + 1) + ALIGN(payload, 4) - 4);
if (ntohl(*pchecksum) == 0)
return 0;
@@ -210,12 +218,19 @@ int ncsi_aen_handler(struct ncsi_dev_priv *ndp, struct sk_buff *skb)
{
struct ncsi_aen_pkt_hdr *h;
struct ncsi_aen_handler *nah = NULL;
+ unsigned char type;
int i, ret;
+ if (!pskb_may_pull(skb, skb_network_offset(skb) + sizeof(*h))) {
+ ret = -EINVAL;
+ goto out;
+ }
+
/* Find the handler */
h = (struct ncsi_aen_pkt_hdr *)skb_network_header(skb);
+ type = h->type;
for (i = 0; i < ARRAY_SIZE(ncsi_aen_handlers); i++) {
- if (ncsi_aen_handlers[i].type == h->type) {
+ if (ncsi_aen_handlers[i].type == type) {
nah = &ncsi_aen_handlers[i];
break;
}
@@ -223,24 +238,25 @@ int ncsi_aen_handler(struct ncsi_dev_priv *ndp, struct sk_buff *skb)
if (!nah) {
netdev_warn(ndp->ndev.dev, "Invalid AEN (0x%x) received\n",
- h->type);
+ type);
ret = -ENOENT;
goto out;
}
- ret = ncsi_validate_aen_pkt(h, nah->payload);
+ ret = ncsi_validate_aen_pkt(skb, nah->payload);
if (ret) {
netdev_warn(ndp->ndev.dev,
"NCSI: 'bad' packet ignored for AEN type 0x%x\n",
- h->type);
+ type);
goto out;
}
+ h = (struct ncsi_aen_pkt_hdr *)skb_network_header(skb);
ret = nah->handler(ndp, h);
if (ret)
netdev_err(ndp->ndev.dev,
"NCSI: Handler for AEN type 0x%x returned %d\n",
- h->type, ret);
+ type, ret);
out:
consume_skb(skb);
return ret;
--
2.53.0
^ permalink raw reply related
* [PATCH net 4/6] net/ncsi: validate OEM response payloads before parsing
From: Michael Bommarito @ 2026-04-22 16:03 UTC (permalink / raw)
To: Samuel Mendoza-Jonas, Paul Fertser, netdev
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, linux-kernel, Michael Bommarito, stable
In-Reply-To: <20260422160342.1975093-1-michael.bommarito@gmail.com>
Reject truncated OEM responses before reading the manufacturer ID,
vendor-specific subheaders, or vendor MAC address payloads.
The OEM response dispatcher reads rsp->mfr_id without verifying that the
skb contains the manufacturer field and checksum. The Mellanox,
Broadcom, and Intel handlers then read their command-specific headers
without checking that the payload is large enough for those fields. The
shared GMA helper finally copies a MAC address from a
manufacturer-specific offset without validating that the payload reaches
that offset.
Validate the advertised payload before each of those reads so malformed
or truncated BMC responses are rejected before the parser touches data
past the end of the skb.
Fixes: fb4ee67529ff ("net/ncsi: Add NCSI OEM command support")
Fixes: cb10c7c0dfd9 ("net/ncsi: Add NCSI Broadcom OEM command")
Fixes: 16e8c4ca21a2 ("net/ncsi: Add NCSI Mellanox OEM command")
Fixes: 205b95fe658d ("net/ncsi: add get MAC address command to get Intel i210 MAC address")
Cc: stable@vger.kernel.org
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
---
net/ncsi/ncsi-rsp.c | 26 ++++++++++++++++++++++++++
1 file changed, 26 insertions(+)
diff --git a/net/ncsi/ncsi-rsp.c b/net/ncsi/ncsi-rsp.c
index cbddb2012f90..94354dca23ea 100644
--- a/net/ncsi/ncsi-rsp.c
+++ b/net/ncsi/ncsi-rsp.c
@@ -656,6 +656,7 @@ static int ncsi_rsp_handler_oem_gma(struct ncsi_request *nr, int mfr_id)
struct net_device *ndev = ndp->ndev.dev;
struct ncsi_rsp_oem_pkt *rsp;
u32 mac_addr_off = 0;
+ unsigned int payload;
/* Get the response header */
rsp = (struct ncsi_rsp_oem_pkt *)skb_network_header(nr->rsp);
@@ -668,6 +669,11 @@ static int ncsi_rsp_handler_oem_gma(struct ncsi_request *nr, int mfr_id)
else if (mfr_id == NCSI_OEM_MFR_INTEL_ID)
mac_addr_off = INTEL_MAC_ADDR_OFFSET;
+ payload = ncsi_rsp_payload(nr->rsp);
+ if (payload < sizeof(rsp->mfr_id) + mac_addr_off + ETH_ALEN +
+ sizeof(__be32))
+ return -EINVAL;
+
saddr->ss_family = ndev->type;
memcpy(saddr->__data, &rsp->data[mac_addr_off], ETH_ALEN);
if (mfr_id == NCSI_OEM_MFR_BCM_ID || mfr_id == NCSI_OEM_MFR_INTEL_ID)
@@ -686,9 +692,14 @@ static int ncsi_rsp_handler_oem_mlx(struct ncsi_request *nr)
{
struct ncsi_rsp_oem_mlx_pkt *mlx;
struct ncsi_rsp_oem_pkt *rsp;
+ unsigned int payload;
/* Get the response header */
rsp = (struct ncsi_rsp_oem_pkt *)skb_network_header(nr->rsp);
+ payload = ncsi_rsp_payload(nr->rsp);
+ if (payload < sizeof(rsp->mfr_id) + sizeof(*mlx) + sizeof(__be32))
+ return -EINVAL;
+
mlx = (struct ncsi_rsp_oem_mlx_pkt *)(rsp->data);
if (mlx->cmd == NCSI_OEM_MLX_CMD_GMA &&
@@ -702,9 +713,14 @@ static int ncsi_rsp_handler_oem_bcm(struct ncsi_request *nr)
{
struct ncsi_rsp_oem_bcm_pkt *bcm;
struct ncsi_rsp_oem_pkt *rsp;
+ unsigned int payload;
/* Get the response header */
rsp = (struct ncsi_rsp_oem_pkt *)skb_network_header(nr->rsp);
+ payload = ncsi_rsp_payload(nr->rsp);
+ if (payload < sizeof(rsp->mfr_id) + sizeof(*bcm) + sizeof(__be32))
+ return -EINVAL;
+
bcm = (struct ncsi_rsp_oem_bcm_pkt *)(rsp->data);
if (bcm->type == NCSI_OEM_BCM_CMD_GMA)
@@ -717,9 +733,14 @@ static int ncsi_rsp_handler_oem_intel(struct ncsi_request *nr)
{
struct ncsi_rsp_oem_intel_pkt *intel;
struct ncsi_rsp_oem_pkt *rsp;
+ unsigned int payload;
/* Get the response header */
rsp = (struct ncsi_rsp_oem_pkt *)skb_network_header(nr->rsp);
+ payload = ncsi_rsp_payload(nr->rsp);
+ if (payload < sizeof(rsp->mfr_id) + sizeof(*intel) + sizeof(__be32))
+ return -EINVAL;
+
intel = (struct ncsi_rsp_oem_intel_pkt *)(rsp->data);
if (intel->cmd == NCSI_OEM_INTEL_CMD_GMA)
@@ -742,10 +763,15 @@ static int ncsi_rsp_handler_oem(struct ncsi_request *nr)
{
struct ncsi_rsp_oem_handler *nrh = NULL;
struct ncsi_rsp_oem_pkt *rsp;
+ unsigned int payload;
unsigned int mfr_id, i;
/* Get the response header */
rsp = (struct ncsi_rsp_oem_pkt *)skb_network_header(nr->rsp);
+ payload = ncsi_rsp_payload(nr->rsp);
+ if (payload < sizeof(rsp->mfr_id) + sizeof(__be32))
+ return -EINVAL;
+
mfr_id = ntohl(rsp->mfr_id);
/* Check for manufacturer id and Find the handler */
--
2.53.0
^ permalink raw reply related
* [PATCH net 3/6] net/ncsi: validate GMCMA address counts against the payload
From: Michael Bommarito @ 2026-04-22 16:03 UTC (permalink / raw)
To: Samuel Mendoza-Jonas, Paul Fertser, netdev
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, linux-kernel, Michael Bommarito, stable
In-Reply-To: <20260422160342.1975093-1-michael.bommarito@gmail.com>
Get MC MAC Address responses carry a flexible array of provisioned
addresses, but the handler currently trusts address_count without first
checking that the advertised payload actually contains that many MAC
entries.
Validate the fixed GMCMA fields plus checksum, then make sure the
address_count fits in the remaining payload before the handler walks
the address array.
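A minimal sketch of that two-step check, outside the kernel. The helper
name is hypothetical, and the field sizes (a 1-byte count, 3 reserved
bytes, a 4-byte checksum) are assumptions made for the example; the patch
itself uses sizeof() on the real structure members.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_ALEN 6

/* Assumed sizes of the fixed GMCMA fields, for illustration only. */
#define GMCMA_COUNT_LEN    1
#define GMCMA_RESERVED_LEN 3
#define GMCMA_CSUM_LEN     4

static_assert(ETH_ALEN == 6, "MAC addresses are six bytes");

static bool gmcma_count_fits(size_t payload, uint8_t address_count)
{
    const size_t fixed = GMCMA_COUNT_LEN + GMCMA_RESERVED_LEN +
                         GMCMA_CSUM_LEN;
    size_t addr_bytes;

    /* Step 1: the fixed fields and checksum must be present. */
    if (payload < fixed)
        return false;

    /* Step 2: the count must fit in what remains (no underflow here). */
    addr_bytes = payload - fixed;
    return address_count <= addr_bytes / ETH_ALEN;
}
```

Doing the subtraction only after the first comparison keeps addr_bytes
from wrapping around on short payloads.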
Fixes: b8291cf3d118 ("net/ncsi: Add NC-SI 1.2 Get MC MAC Address command")
Cc: stable@vger.kernel.org
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
---
net/ncsi/ncsi-rsp.c | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)
diff --git a/net/ncsi/ncsi-rsp.c b/net/ncsi/ncsi-rsp.c
index 47ddf2bbb13b..cbddb2012f90 100644
--- a/net/ncsi/ncsi-rsp.c
+++ b/net/ncsi/ncsi-rsp.c
@@ -40,6 +40,14 @@ static bool ncsi_filter_is_enabled(unsigned long enable, unsigned int index,
return index < nbits && (enable & BIT(index));
}
+static unsigned int ncsi_rsp_payload(struct sk_buff *skb)
+{
+ struct ncsi_rsp_pkt_hdr *h;
+
+ h = (struct ncsi_rsp_pkt_hdr *)skb_network_header(skb);
+ return ntohs(h->common.length);
+}
+
static int ncsi_validate_rsp_pkt(struct ncsi_request *nr,
unsigned short payload)
{
@@ -1127,9 +1135,21 @@ static int ncsi_rsp_handler_gmcma(struct ncsi_request *nr)
struct sockaddr_storage *saddr = &ndp->pending_mac;
struct net_device *ndev = ndp->ndev.dev;
struct ncsi_rsp_gmcma_pkt *rsp;
+ unsigned int addr_bytes;
+ unsigned int payload;
int i;
rsp = (struct ncsi_rsp_gmcma_pkt *)skb_network_header(nr->rsp);
+ payload = ncsi_rsp_payload(nr->rsp);
+ if (payload < sizeof(rsp->address_count) + sizeof(rsp->reserved) +
+ sizeof(__be32))
+ return -EINVAL;
+
+ addr_bytes = payload - sizeof(rsp->address_count) -
+ sizeof(rsp->reserved) - sizeof(__be32);
+ if (rsp->address_count > addr_bytes / ETH_ALEN)
+ return -EINVAL;
+
ndev->priv_flags |= IFF_LIVE_ADDR_CHANGE;
netdev_info(ndev, "NCSI: Received %d provisioned MAC addresses\n",
--
2.53.0
^ permalink raw reply related
* [PATCH net 2/6] net/ncsi: bound filter table state to software limits
From: Michael Bommarito @ 2026-04-22 16:03 UTC (permalink / raw)
To: Samuel Mendoza-Jonas, Paul Fertser, netdev
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, linux-kernel, Michael Bommarito, stable
In-Reply-To: <20260422160342.1975093-1-michael.bommarito@gmail.com>
The NCSI filter state uses single-word bitmaps for both MAC and VLAN
entries, but Get Capabilities and Get Parameters responses can still
feed larger counts into that state.
Cap the stored VLAN table size to the bitmap width before it reaches
the manage-side bitmap walkers, reject GP tables that exceed the sizes
advertised by GC, and stop indexing the MAC filter bitmap past its
software capacity. Also stop shifting past the width of the enable
bitfields when GP reports more entries than fit in those masks.
This keeps oversized or inconsistent filter counts from turning into
out-of-bounds bitmap accesses and oversized table walks in the response
and manage paths. A follow-up patch in this series separately validates
that the GP payload actually covers the consumed MAC/VLAN table bytes.
A live x86_64/KASAN QEMU repro can drive this after GC advertises a
single MAC filter slot and GP then reports mac_cnt=65. Without this
change, KASAN reports a slab-out-of-bounds write in
ncsi_rsp_handler_gp(); with this change applied, the same reply is
rejected with -ERANGE.
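
As a rough userspace sketch of the two guards described above (a bounded bit test and a count check against the advertised table size): the names `filter_is_enabled`, `check_reported_count`, and `FILTER_BITS` are hypothetical, and the sketch returns -1 where the patch returns -ERANGE.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed width of the software bitmap (a single 64-bit word). */
#define FILTER_BITS 64u

/*
 * Test bit `index` of `enable`, but only if the index lies inside both the
 * reported mask width `nbits` and the software bitmap. Out-of-range indexes
 * read as "disabled" instead of shifting past the type width.
 */
static bool filter_is_enabled(uint64_t enable, unsigned int index,
			      unsigned int nbits)
{
	return index < nbits && index < FILTER_BITS &&
	       ((enable >> index) & 1u);
}

/*
 * Reject a reported entry count that exceeds what was advertised earlier.
 * Returns 0 when consistent, -1 otherwise.
 */
static int check_reported_count(unsigned int reported, unsigned int advertised)
{
	return reported > advertised ? -1 : 0;
}
```

With one advertised MAC slot and a reported count of 65, `check_reported_count(65, 1)` fails before any table walk starts, which mirrors the rejected reply in the repro above.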
Fixes: 062b3e1b6d4f ("net/ncsi: Refactor MAC, VLAN filters")
Cc: stable@vger.kernel.org
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
---
net/ncsi/ncsi-rsp.c | 46 ++++++++++++++++++++++++++++++++++++---------
1 file changed, 37 insertions(+), 9 deletions(-)
diff --git a/net/ncsi/ncsi-rsp.c b/net/ncsi/ncsi-rsp.c
index 1fe061ede26d..47ddf2bbb13b 100644
--- a/net/ncsi/ncsi-rsp.c
+++ b/net/ncsi/ncsi-rsp.c
@@ -22,6 +22,8 @@
/* Nibbles within [0xA, 0xF] add zero "0" to the returned value.
* Optional fields (encoded as 0xFF) will default to zero.
*/
+#define NCSI_FILTER_BITS BITS_PER_TYPE(u64)
+
static u8 decode_bcd_u8(u8 x)
{
int lo = x & 0xF;
@@ -32,6 +34,12 @@ static u8 decode_bcd_u8(u8 x)
return lo + hi * 10;
}
+static bool ncsi_filter_is_enabled(unsigned long enable, unsigned int index,
+ unsigned int nbits)
+{
+ return index < nbits && (enable & BIT(index));
+}
+
static int ncsi_validate_rsp_pkt(struct ncsi_request *nr,
unsigned short payload)
{
@@ -481,7 +489,8 @@ static int ncsi_rsp_handler_sma(struct ncsi_request *nr)
bitmap = &ncf->bitmap;
if (cmd->index == 0 ||
- cmd->index > ncf->n_uc + ncf->n_mc + ncf->n_mixed)
+ cmd->index > ncf->n_uc + ncf->n_mc + ncf->n_mixed ||
+ cmd->index > NCSI_FILTER_BITS)
return -ERANGE;
index = (cmd->index - 1) * ETH_ALEN;
@@ -798,6 +807,7 @@ static int ncsi_rsp_handler_gc(struct ncsi_request *nr)
struct ncsi_channel *nc;
struct ncsi_package *np;
size_t size;
+ unsigned int vlan_cnt;
/* Find the channel */
rsp = (struct ncsi_rsp_gc_pkt *)skb_network_header(nr->rsp);
@@ -819,6 +829,12 @@ static int ncsi_rsp_handler_gc(struct ncsi_request *nr)
nc->caps[NCSI_CAP_VLAN].cap = rsp->vlan_mode &
NCSI_CAP_VLAN_MASK;
+ vlan_cnt = min_t(unsigned int, rsp->vlan_cnt, NCSI_FILTER_BITS);
+ if (vlan_cnt != rsp->vlan_cnt)
+ netdev_warn(ndp->ndev.dev,
+ "NCSI: VLAN filter count %u exceeds software limit %u\n",
+ rsp->vlan_cnt, (unsigned int)NCSI_FILTER_BITS);
+
size = (rsp->uc_cnt + rsp->mc_cnt + rsp->mixed_cnt) * ETH_ALEN;
nc->mac_filter.addrs = kzalloc(size, GFP_ATOMIC);
if (!nc->mac_filter.addrs)
@@ -827,7 +843,7 @@ static int ncsi_rsp_handler_gc(struct ncsi_request *nr)
nc->mac_filter.n_mc = rsp->mc_cnt;
nc->mac_filter.n_mixed = rsp->mixed_cnt;
- nc->vlan_filter.vids = kcalloc(rsp->vlan_cnt,
+ nc->vlan_filter.vids = kcalloc(vlan_cnt,
sizeof(*nc->vlan_filter.vids),
GFP_ATOMIC);
if (!nc->vlan_filter.vids)
@@ -836,7 +852,7 @@ static int ncsi_rsp_handler_gc(struct ncsi_request *nr)
* configuration state
*/
nc->vlan_filter.bitmap = U64_MAX;
- nc->vlan_filter.n_vids = rsp->vlan_cnt;
+ nc->vlan_filter.n_vids = vlan_cnt;
np->ndp->channel_count = rsp->channel_cnt;
return 0;
@@ -853,6 +869,9 @@ static int ncsi_rsp_handler_gp(struct ncsi_request *nr)
unsigned char *pdata;
unsigned long flags;
void *bitmap;
+ unsigned int mac_cnt;
+ unsigned int mac_nbits;
+ unsigned int vlan_cnt;
int i;
/* Find the channel */
@@ -862,6 +881,15 @@ static int ncsi_rsp_handler_gp(struct ncsi_request *nr)
if (!nc)
return -ENODEV;
+ ncmf = &nc->mac_filter;
+ ncvf = &nc->vlan_filter;
+ mac_cnt = min_t(unsigned int, rsp->mac_cnt, NCSI_FILTER_BITS);
+ mac_nbits = ncmf->n_uc + ncmf->n_mc + ncmf->n_mixed;
+ vlan_cnt = min_t(unsigned int, rsp->vlan_cnt, ncvf->n_vids);
+
+ if (rsp->mac_cnt > mac_nbits || rsp->vlan_cnt > ncvf->n_vids)
+ return -ERANGE;
+
/* Modes with explicit enabled indications */
if (ntohl(rsp->valid_modes) & 0x1) { /* BC filter mode */
nc->modes[NCSI_MODE_BC].enable = 1;
@@ -887,11 +915,11 @@ static int ncsi_rsp_handler_gp(struct ncsi_request *nr)
/* MAC addresses filter table */
pdata = (unsigned char *)rsp + 48;
enable = rsp->mac_enable;
- ncmf = &nc->mac_filter;
spin_lock_irqsave(&nc->lock, flags);
bitmap = &ncmf->bitmap;
- for (i = 0; i < rsp->mac_cnt; i++, pdata += 6) {
- if (!(enable & (0x1 << i)))
+ for (i = 0; i < mac_cnt; i++, pdata += 6) {
+ if (!ncsi_filter_is_enabled(enable, i,
+ BITS_PER_TYPE(rsp->mac_enable)))
clear_bit(i, bitmap);
else
set_bit(i, bitmap);
@@ -902,11 +930,11 @@ static int ncsi_rsp_handler_gp(struct ncsi_request *nr)
/* VLAN filter table */
enable = ntohs(rsp->vlan_enable);
- ncvf = &nc->vlan_filter;
bitmap = &ncvf->bitmap;
spin_lock_irqsave(&nc->lock, flags);
- for (i = 0; i < rsp->vlan_cnt; i++, pdata += 2) {
- if (!(enable & (0x1 << i)))
+ for (i = 0; i < vlan_cnt; i++, pdata += 2) {
+ if (!ncsi_filter_is_enabled(enable, i,
+ BITS_PER_TYPE(rsp->vlan_enable)))
clear_bit(i, bitmap);
else
set_bit(i, bitmap);
--
2.53.0
^ permalink raw reply related
* [PATCH net 1/6] net/ncsi: validate response packet lengths against the skb
From: Michael Bommarito @ 2026-04-22 16:03 UTC (permalink / raw)
To: Samuel Mendoza-Jonas, Paul Fertser, netdev
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, linux-kernel, Michael Bommarito, stable
In-Reply-To: <20260422160342.1975093-1-michael.bommarito@gmail.com>
ncsi_rcv_rsp() reads the common packet header before checking that the
skb contains enough data for it, and ncsi_validate_rsp_pkt() trusts
the response payload length before accessing the checksum field.
Malformed NC-SI replies can therefore drive header and checksum reads
past the received packet body. Make the dispatcher pull the common
header first, then have ncsi_validate_rsp_pkt() pull the full response
body before validating the packet.
This keeps malformed responses on the error path instead of letting the
parser walk past the skb payload.
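
The ordering matters: the buffer length is checked before any field is read. A minimal userspace sketch of that idea follows, assuming a flat buffer instead of an skb; `rsp_len_ok`, `align4`, and the 16-byte `HDR_LEN` are hypothetical and stand in for pskb_may_pull(), ALIGN(payload, 4), and the real header size.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed header size, for illustration only. */
#define HDR_LEN 16u

/* Round n up to the next multiple of 4. */
static size_t align4(size_t n)
{
	return (n + 3u) & ~(size_t)3u;
}

/*
 * Return 0 if a buffer of `buf_len` bytes holds the fixed header plus the
 * 4-byte-aligned payload, -1 otherwise. The header bound is checked first
 * so the subtraction below cannot wrap.
 */
static int rsp_len_ok(size_t buf_len, uint16_t payload)
{
	if (buf_len < HDR_LEN)
		return -1;
	if (buf_len - HDR_LEN < align4(payload))
		return -1;
	return 0;
}
```

A caller would run this check first and only then cast the buffer to a header struct or locate the trailing checksum.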
Fixes: 138635cc27c9 ("net/ncsi: NCSI response packet handler")
Cc: stable@vger.kernel.org
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
---
net/ncsi/ncsi-rsp.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/net/ncsi/ncsi-rsp.c b/net/ncsi/ncsi-rsp.c
index fbd84bc8026a..1fe061ede26d 100644
--- a/net/ncsi/ncsi-rsp.c
+++ b/net/ncsi/ncsi-rsp.c
@@ -38,11 +38,18 @@ static int ncsi_validate_rsp_pkt(struct ncsi_request *nr,
struct ncsi_rsp_pkt_hdr *h;
u32 checksum;
__be32 *pchecksum;
+ unsigned int len;
/* Check NCSI packet header. We don't need validate
* the packet type, which should have been checked
* before calling this function.
*/
+ len = skb_network_offset(nr->rsp) + sizeof(*h) + ALIGN(payload, 4);
+ if (!pskb_may_pull(nr->rsp, len)) {
+ netdev_dbg(nr->ndp->ndev.dev, "NCSI: packet too short\n");
+ return -EINVAL;
+ }
+
h = (struct ncsi_rsp_pkt_hdr *)skb_network_header(nr->rsp);
if (h->common.revision != NCSI_PKT_REVISION) {
@@ -1182,6 +1189,11 @@ int ncsi_rcv_rsp(struct sk_buff *skb, struct net_device *dev,
}
/* Check if it is AEN packet */
+ if (!pskb_may_pull(skb, skb_network_offset(skb) + sizeof(*hdr))) {
+ ret = -EINVAL;
+ goto err_free_skb;
+ }
+
hdr = (struct ncsi_pkt_hdr *)skb_network_header(skb);
if (hdr->type == NCSI_PKT_AEN)
return ncsi_aen_handler(ndp, skb);
--
2.53.0
^ permalink raw reply related
* [PATCH net 0/6] net/ncsi: harden packet parsing against malformed BMC replies
From: Michael Bommarito @ 2026-04-22 16:03 UTC (permalink / raw)
To: Samuel Mendoza-Jonas, Paul Fertser, netdev
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, linux-kernel, Michael Bommarito
NC-SI treats the management controller as privileged, but the Linux
packet parser still needs to reject malformed or truncated replies
instead of walking past the skb or past its software filter tables.
This series closes six linked parser issues in net/ncsi:
- short replies accepted before response header/checksum reads
- GC/GP count fields exceeding software filter limits
- GMCMA address counts exceeding payload-backed addresses
- OEM response parsing that trusts vendor-specific payload offsets
- short AEN packets accepted before AEN header/payload reads
- GP payloads not checked against the consumed MAC/VLAN table bytes
The threat model here is a compromised BMC or management-channel MITM
on the NC-SI link. This is not internet-reachable remote input, so I am
sending it as a public [PATCH net] series with Cc: stable rather than
through security@.
Testing:
- x86_64 defconfig with CONFIG_NET_NCSI=y and
CONFIG_NCSI_OEM_CMD_GET_MAC=y:
`make -C ~/src/linux-mainline O=~/src/build-ncsi-bmc-oob ARCH=x86_64
-j$(nproc) net/ncsi/`
- live x86_64/KASAN QEMU guest for the GP path: guest `virtio-net`
registered with NCSI, `SP -> CIS -> GC -> GP` issued over the
`NCSI` generic-netlink family, and a host tap responder returning
matching NC-SI frames. Without the series applied, a GP reply
with mac_cnt=65 triggers
`KASAN: slab-out-of-bounds in ncsi_rsp_handler_gp()`. With the
series applied, the same reply is rejected with `-ERANGE` and no
KASAN report.
- synthetic A/B userspace harness covering the other malformed-
response cases: without the series, parsing either faults or
corrupts adjacent state; with the series, each case is rejected
or clamped at the parser boundary.
Impact / regression notes:
- libclang call-graph query shows ncsi_validate_rsp_pkt() is only
reached from ncsi_rcv_rsp() and ncsi_rsp_handler_dc(), so the new
skb-length guard stays local to the response path.
- cscope shows ncsi_aen_handler() is only reached from ncsi_rcv_rsp(),
so the new AEN pulls stay local to AEN dispatch.
- cscope on n_vids shows the downstream consumers are the response
parser, the manage-side VLAN bitmap walkers, and ncsi-netlink's
channel dump path, which is the surface this series intentionally
tightens.
Michael Bommarito (6):
net/ncsi: validate response packet lengths against the skb
net/ncsi: bound filter table state to software limits
net/ncsi: validate GMCMA address counts against the payload
net/ncsi: validate OEM response payloads before parsing
net/ncsi: validate AEN packet lengths against the skb
net/ncsi: validate GP payload lengths before parsing
net/ncsi/ncsi-aen.c | 30 +++++++++---
net/ncsi/ncsi-rsp.c | 114 ++++++++++++++++++++++++++++++++++++++++----
2 files changed, 128 insertions(+), 16 deletions(-)
--
2.53.0
^ permalink raw reply
* Re: [PATCH net v4 1/2] tcp: call sk_data_ready() after listener migration
From: Eric Dumazet @ 2026-04-22 15:56 UTC (permalink / raw)
To: Zhenzhong Wu
Cc: netdev, ncardwell, kuniyu, davem, dsahern, kuba, pabeni, horms,
shuah, tamird, linux-kernel, linux-kselftest, stable
In-Reply-To: <20260422024554.130346-2-jt26wzz@gmail.com>
On Tue, Apr 21, 2026 at 7:46 PM Zhenzhong Wu <jt26wzz@gmail.com> wrote:
>
> When inet_csk_listen_stop() migrates an established child socket from
> a closing listener to another socket in the same SO_REUSEPORT group,
> the target listener gets a new accept-queue entry via
> inet_csk_reqsk_queue_add(), but that path never notifies the target
> listener's waiters. A nonblocking accept() still works because it
> checks the queue directly, but poll()/epoll_wait() waiters and
> blocking accept() callers can also remain asleep indefinitely.
>
> Call READ_ONCE(nsk->sk_data_ready)(nsk) after a successful migration
> in inet_csk_listen_stop().
>
> However, after inet_csk_reqsk_queue_add() succeeds, the ref acquired
> in reuseport_migrate_sock() is effectively transferred to
> nreq->rsk_listener. Another CPU can then dequeue nreq via accept()
> or listener shutdown, hit reqsk_put(), and drop that listener ref.
> Since listeners are SOCK_RCU_FREE, wrap the post-queue_add()
> dereferences of nsk in rcu_read_lock()/rcu_read_unlock(), which also
> covers the existing sock_net(nsk) access in that path.
>
> The reqsk_timer_handler() path does not need the same changes for two
> reasons: half-open requests become readable only after the final ACK,
> where tcp_child_process() already wakes the listener; and once nreq is
> visible via inet_ehash_insert(), the success path no longer touches
> nsk directly.
>
> Fixes: 54b92e841937 ("tcp: Migrate TCP_ESTABLISHED/TCP_SYN_RECV sockets in accept queues.")
> Cc: stable@vger.kernel.org
> Suggested-by: Eric Dumazet <edumazet@google.com>
> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
> Signed-off-by: Zhenzhong Wu <jt26wzz@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
^ permalink raw reply