* Re: [PATCH net-next] chelsio: delete the line with the pidx initialization
From: Jakub Kicinski @ 2026-07-01 0:16 UTC (permalink / raw)
To: Markov Gleb
Cc: Andrew Lunn, David S . Miller, Eric Dumazet, Paolo Abeni, netdev,
linux-kernel, lvc-project
In-Reply-To: <20260629130839.218-1-markov.gi@npc-ksb.ru>
On Mon, 29 Jun 2026 16:08:35 +0300 Markov Gleb wrote:
> The value of pidx is overwritten immediately after exiting the "if" block.
>
> Remove the pidx pointer initialization line from the conditional block.
shrug?
--
pw-bot: reject
* Re: [PATCH v5] virtio_net: disable cb when NAPI is busy-polled
From: patchwork-bot+netdevbpf @ 2026-07-01 0:20 UTC (permalink / raw)
To: Longjun Tang
Cc: kuba, horms, mst, jasowang, edumazet, virtualization, netdev,
tanglongjun
In-Reply-To: <20260629024230.37325-1-lange_tang@163.com>
Hello:
This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Mon, 29 Jun 2026 10:42:30 +0800 you wrote:
> From: Longjun Tang <tanglongjun@kylinos.cn>
>
> When busy-poll is active, napi_schedule_prep() returns false in
> virtqueue_napi_schedule(), so virtqueue_disable_cb() is skipped.
> The device may keep firing irqs until it reaches virtqueue_napi_complete().
> Under load (received == budget), it will lead to a large number
> of spurious interrupts.
>
> [...]
Here is the summary with links:
- [v5] virtio_net: disable cb when NAPI is busy-polled
https://git.kernel.org/netdev/net/c/1eb8fc67ca41
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
* Re: [PATCH net 1/2] net/sched: act_skbmod: require an Ethernet header for MAC rewrites
From: Jakub Kicinski @ 2026-07-01 0:20 UTC (permalink / raw)
To: Ren Wei
Cc: netdev, jhs, jiri, davem, edumazet, pabeni, horms, peilin.ye,
cong.wang, gnault, yuantan098, yifanwucs, tomapufckgml, zcliangcn,
bird, bronzed_45_vested
In-Reply-To: <20260630171016.11c02dec@kernel.org>
On Tue, 30 Jun 2026 17:10:16 -0700 Jakub Kicinski wrote:
> On Mon, 29 Jun 2026 10:46:03 +0800 Ren Wei wrote:
> > Cc: stable@vger.kernel.org
> > Reported-by: Yuan Tan <yuantan098@gmail.com>
> > Reported-by: Yifan Wu <yifanwucs@gmail.com>
> > Reported-by: Juefei Pu <tomapufckgml@gmail.com>
> > Reported-by: Zhengchuan Liang <zcliangcn@gmail.com>
> > Reported-by: Xin Liu <bird@lzu.edu.cn>
> > Assisted-by: Codex:GPT-5.4
> > Signed-off-by: Wyatt Feng <bronzed_45_vested@icloud.com>
> > Signed-off-by: Ren Wei <n05ec@lzu.edu.cn>
>
> Let's do away with the 5 reported-by tags? You can use a tag for your
> tool or your team, it doesn't have to be a person. Look at sashiko or
> syzbot reported-by tags.
On second thought, if y'all work together maybe there should be no
reported-by tag at all? Can you explain the situation?
* Re: [PATCH net] net: sgi: ioc3-eth: fix split TX DMA mapping lengths
From: patchwork-bot+netdevbpf @ 2026-07-01 0:20 UTC (permalink / raw)
To: Xu Rao
Cc: tsbogend, andrew+netdev, davem, edumazet, kuba, pabeni,
linux-mips, netdev, linux-kernel, stable
In-Reply-To: <4E1486BC4536407E+20260629080623.908426-1-raoxu@uniontech.com>
Hello:
This patch was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Mon, 29 Jun 2026 16:06:23 +0800 you wrote:
> From: Xu Rao <raoxu@uniontech.com>
>
> When a linear skb crosses a 16 KiB boundary, ioc3_start_xmit()
> splits it into two buffers of lengths s1 and s2. The descriptor
> advertises those lengths through B1CNT and B2CNT.
>
> The first buffer is mapped with s1, but the second buffer is also
> mapped with s1 even though the device is told to fetch s2 bytes from
> it. When the lengths differ, the DMA mapping does not cover the same
> region as the second descriptor buffer, which can result in incorrect
> cache maintenance or a DMA fault on implementations that enforce the
> mapped range.
>
> [...]
Here is the summary with links:
- [net] net: sgi: ioc3-eth: fix split TX DMA mapping lengths
https://git.kernel.org/netdev/net-next/c/cd066559a073
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
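The length mismatch described in the ioc3-eth commit message above can be illustrated outside the kernel. The following is a minimal userspace sketch, assuming a buffer that crosses at most one 16 KiB boundary; split_at_boundary(), struct split_len and SPLIT_BOUNDARY are hypothetical names chosen for illustration and are not taken from the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 16 KiB boundary, as described in the commit message (hypothetical name) */
#define SPLIT_BOUNDARY (16u * 1024u)

struct split_len {
	size_t s1; /* bytes in the first buffer */
	size_t s2; /* bytes in the second buffer, 0 when no split is needed */
};

/*
 * Compute the two lengths for a buffer starting at addr and spanning len
 * bytes. Assumes len is small enough to cross at most one boundary.
 */
static struct split_len split_at_boundary(uintptr_t addr, size_t len)
{
	struct split_len out = { len, 0 };
	size_t to_boundary = SPLIT_BOUNDARY - (addr & (SPLIT_BOUNDARY - 1));

	if (len > to_boundary) {
		out.s1 = to_boundary;
		out.s2 = len - to_boundary;
	}

	/* The two pieces must always cover the whole buffer. */
	assert(out.s1 + out.s2 == len);
	return out;
}
```

In this sketch s1 and s2 generally differ, so a mapping of the second buffer that reuses s1 would not cover the s2 bytes the device is told to fetch, which is the mismatch the commit message describes.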
* Re: [PATCH v4 net-next] bonding: no longer rely on RTNL in bond_fill_info()
From: patchwork-bot+netdevbpf @ 2026-07-01 0:20 UTC (permalink / raw)
To: Eric Dumazet
Cc: davem, kuba, pabeni, horms, netdev, eric.dumazet, jv,
andrew+netdev
In-Reply-To: <20260629173200.469953-1-edumazet@google.com>
Hello:
This patch was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Mon, 29 Jun 2026 17:32:00 +0000 you wrote:
> Add READ_ONCE()/WRITE_ONCE() annotations on port->is_enabled.
> While this field is written under bond->mode_lock protection,
> it is read without this lock being held.
>
> Change bond_fill_info() to acquire RCU and use READ_ONCE()
> to read bond->params fields that can be updated concurrently
> from sysfs/procfs/rtnetlink.
>
> [...]
Here is the summary with links:
- [v4,net-next] bonding: no longer rely on RTNL in bond_fill_info()
https://git.kernel.org/netdev/net-next/c/317cefdcaacc
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
* Re: [PATCH net-next 0/2] net: do not warn on best-effort skb allocation failures
From: patchwork-bot+netdevbpf @ 2026-07-01 0:20 UTC (permalink / raw)
To: Breno Leitao
Cc: andrew+netdev, davem, edumazet, kuba, pabeni, horms, netdev,
linux-kernel, asantostc, gustavold, vlad.wing, kernel-team
In-Reply-To: <20260629-netpoll_no_warn-v1-0-f380f0b2cd0c@debian.org>
Hello:
This series was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Mon, 29 Jun 2026 04:45:39 -0700 you wrote:
> Both netconsole and netpoll keep a small preallocated pool of skbs
> (skb_pool) so they can still get a buffer under memory pressure.
>
> On the hot path they first attempt a normal GFP_ATOMIC allocation and only
> fall back to the pool when that fails, keeping the pool as a last resort.
>
> This is where the problem happens. If alloc_skb() fails, we now have
> more than 100 messages coming from the page=0 failure, which consume the
> scarce pool of skbs, making the real issue disappear.
>
> [...]
Here is the summary with links:
- [net-next,1/2] netconsole: do not warn when the best-effort skb allocation fails
https://git.kernel.org/netdev/net-next/c/09f7a613a14f
- [net-next,2/2] netpoll: do not warn when the best-effort pool refill fails
https://git.kernel.org/netdev/net-next/c/84c0ff1efb62
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
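The allocation order described in the cover letter above (try a normal allocation first, fall back to a small preallocated pool only when that fails) can be sketched in userspace C. This is only an illustration under stated assumptions: struct buf_pool, pool_fill(), get_buf(), always_fail() and POOL_SIZE are hypothetical names, plain malloc() stands in for the skb allocator, and nothing here reflects how the applied patches are implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define POOL_SIZE 4 /* hypothetical size of the reserve pool */

struct buf_pool {
	void *slot[POOL_SIZE];
	size_t count; /* number of buffers currently held in the pool */
};

/* Top the pool up to POOL_SIZE; returns how many buffers it now holds. */
static size_t pool_fill(struct buf_pool *p, size_t buf_len)
{
	while (p->count < POOL_SIZE) {
		void *buf = malloc(buf_len);

		if (!buf)
			break; /* best effort: a failed refill is not an error */
		p->slot[p->count++] = buf;
	}
	return p->count;
}

/*
 * Try the regular allocator first and touch the pool only as a last
 * resort. A failure of the first attempt is expected under memory
 * pressure, so it is handled silently.
 */
static void *get_buf(struct buf_pool *p, void *(*alloc)(size_t), size_t len)
{
	void *buf = alloc(len);

	if (buf)
		return buf;
	if (p->count == 0)
		return NULL;
	return p->slot[--p->count];
}

/* Stand-in allocator that always fails, to exercise the fallback path. */
static void *always_fail(size_t len)
{
	(void)len;
	return NULL;
}
```

Passing malloc as the alloc argument gives the normal path; passing always_fail forces every request to come from the pool, which shows how quickly a small reserve drains if each failure produces further allocation requests.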
* Re: [PATCH net] net: sgi: ioc3-eth: unregister netdev before freeing DMA rings
From: patchwork-bot+netdevbpf @ 2026-07-01 0:20 UTC (permalink / raw)
To: Xu Rao
Cc: tsbogend, andrew+netdev, davem, edumazet, kuba, pabeni,
linux-mips, netdev, linux-kernel, stable
In-Reply-To: <40CD736C4911C181+20260629085053.964383-1-raoxu@uniontech.com>
Hello:
This patch was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Mon, 29 Jun 2026 16:50:53 +0800 you wrote:
> From: Xu Rao <raoxu@uniontech.com>
>
> ioc3eth_remove() frees the coherent RX and TX descriptor rings before
> unregistering the netdev. If the interface is running,
> unregister_netdev() invokes ioc3_close() through ndo_stop.
>
> ioc3_close() stops the device and then calls ioc3_free_rx_bufs() and
> ioc3_clean_tx_ring(). Both cleanup functions access descriptors in the
> rings, so the current ordering causes CPU accesses to freed coherent
> memory. Until ioc3_stop() disables RX and TX DMA, the device may also
> continue using the freed ring addresses.
>
> [...]
Here is the summary with links:
- [net] net: sgi: ioc3-eth: unregister netdev before freeing DMA rings
https://git.kernel.org/netdev/net-next/c/18a28f3e107e
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
* Re: [PATCH net v2] net/liquidio: drop cached VF pci_dev LUT and resolve VF for FLR on request
From: Jakub Kicinski @ 2026-07-01 0:22 UTC (permalink / raw)
To: Yuho Choi
Cc: Andrew Lunn, David S . Miller, Eric Dumazet, Paolo Abeni,
Kory Maincent, Zilin Guan, Uwe Kleine-Koenig, Vadim Fedorenko,
Marco Crivellari, netdev, linux-kernel
In-Reply-To: <20260629141656.1769227-1-dbgh9129@gmail.com>
On Mon, 29 Jun 2026 10:16:56 -0400 Yuho Choi wrote:
> Xgene PCI probe by using port->dev->of_node directly.
sashiko points out that this is not part of this patch (or driver?)
--
pw-bot: cr
* [PATCH v5] bpf: Fix smp_processor_id() call trace for preemptible kernels
From: Edward Adam Davis @ 2026-07-01 0:27 UTC (permalink / raw)
To: sashiko-bot
Cc: eadavis, jiayuan.chen, sashiko-reviews, andrii, ast, bpf, daniel,
eddyz87, emil, jolsa, linux-kernel, martin.lau, memxor, netdev,
song, syzkaller-bugs, yonghong.song
In-Reply-To: <20260630132226.C44601F000E9@smtp.kernel.org>
bpf_mem_cache_free_rcu() may be called in preemptible context, which
triggers the warning below:
BUG: using smp_processor_id() in preemptible [00000000] code: syz.0.17/5820
caller is bpf_mem_cache_free_rcu+0x48/0xc0 kernel/bpf/memalloc.c:954
Call Trace:
check_preemption_disabled+0xd3/0xe0 lib/smp_processor_id.c:47
bpf_mem_cache_free_rcu+0x48/0xc0 kernel/bpf/memalloc.c:954
rhtab_delete_elem+0x185a/0x1b30 kernel/bpf/hashtab.c:2969
__rhtab_map_lookup_and_delete_batch+0x935/0xcb0 kernel/bpf/hashtab.c:3349
bpf_map_do_batch+0x445/0x630 kernel/bpf/syscall.c:-1
__sys_bpf+0x906/0xd90 kernel/bpf/syscall.c:-1
this_cpu_ptr() requires the caller to prevent task migration.
These helpers currently do not enforce that requirement and may
be invoked from preemptible contexts, leading to accesses to another
CPU's per-CPU cache after migration. Use get_cpu_ptr()/put_cpu_ptr()
to pin the task while accessing the per-CPU allocator state.
Fixes: 5af6807bdb10 ("bpf: Introduce bpf_mem_free_rcu() similar to kfree_rcu().")
Fixes: 7c8199e24fa0 ("bpf: Introduce any context BPF specific memory allocator.")
Reported-by: syzbot+fd7e415d891073b83e1f@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=fd7e415d891073b83e1f
Signed-off-by: Edward Adam Davis <eadavis@qq.com>
---
v1 -> v2: using guard against preemption
v2 -> v3: replace get/put_cpu() with bpf_disable/enable_instrumentation()
v3 -> v4: disable preempt to make this_cpu_ptr() work
v4 -> v5: in mem free disable preemption
kernel/bpf/memalloc.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
index e9662db7198f..2118fe725ed4 100644
--- a/kernel/bpf/memalloc.c
+++ b/kernel/bpf/memalloc.c
@@ -911,7 +911,8 @@ void notrace bpf_mem_free(struct bpf_mem_alloc *ma, void *ptr)
if (WARN_ON_ONCE(idx < 0))
return;
- unit_free(this_cpu_ptr(ma->caches)->cache + idx, ptr);
+ unit_free(get_cpu_ptr(ma->caches)->cache + idx, ptr);
+ put_cpu_ptr(ma->caches);
}
void notrace bpf_mem_free_rcu(struct bpf_mem_alloc *ma, void *ptr)
@@ -927,7 +928,8 @@ void notrace bpf_mem_free_rcu(struct bpf_mem_alloc *ma, void *ptr)
if (WARN_ON_ONCE(idx < 0))
return;
- unit_free_rcu(this_cpu_ptr(ma->caches)->cache + idx, ptr);
+ unit_free_rcu(get_cpu_ptr(ma->caches)->cache + idx, ptr);
+ put_cpu_ptr(ma->caches);
}
void notrace *bpf_mem_cache_alloc(struct bpf_mem_alloc *ma)
@@ -951,7 +953,8 @@ void notrace bpf_mem_cache_free_rcu(struct bpf_mem_alloc *ma, void *ptr)
if (!ptr)
return;
- unit_free_rcu(this_cpu_ptr(ma->cache), ptr);
+ unit_free_rcu(get_cpu_ptr(ma->cache), ptr);
+ put_cpu_ptr(ma->cache);
}
/* Directly does a kfree() without putting 'ptr' back to the free_llist
--
2.43.0
* Re: [PATCH net] cxgb4: Fix decode strings dump for T6 adapters
From: patchwork-bot+netdevbpf @ 2026-07-01 0:30 UTC (permalink / raw)
To: Markov Gleb
Cc: bharat, andrew+netdev, davem, edumazet, kuba, pabeni, netdev,
linux-kernel, lvc-project
In-Reply-To: <20260629130856.1168-1-markov.gi@npc-ksb.ru>
Hello:
This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Mon, 29 Jun 2026 16:08:54 +0300 you wrote:
> From: Gleb Markov <markov.gi@npc-ksb.ru>
>
> Depending on the value of chip_version, the correct decode set is selected.
> However, the subsequent matching with the t4 encoding type in the if-else
> block results in a reassignment, which leads to the loss of support for
> t6_decode as well as reinitializing of values t4_decode and t5_decode.
>
> [...]
Here is the summary with links:
- [net] cxgb4: Fix decode strings dump for T6 adapters
https://git.kernel.org/netdev/net/c/5d6dc22d6268
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
* Re: [PATCH net-next v2] ipv4: igmp: remove multicast group from hash table on device destruction
From: Yuyang Huang @ 2026-07-01 0:41 UTC (permalink / raw)
To: Kuniyuki Iwashima
Cc: idosch, davem, dsahern, edumazet, horms, jedrzej.jagielski, kuba,
linux-kernel, netdev, pabeni, xiyou.wangcong
In-Reply-To: <20260630211527.3365952-1-kuniyu@google.com>
On Wed, Jul 1, 2026 at 6:15 AM Kuniyuki Iwashima <kuniyu@google.com> wrote:
>
> From: Ido Schimmel <idosch@nvidia.com>
> Date: Tue, 30 Jun 2026 19:59:34 +0300
> > On Tue, Jun 30, 2026 at 04:55:22PM +0900, Yuyang Huang wrote:
> > > > Hi,
> > > >
> > > > why send this to net-next and not to net if it's a bug fix?
> > > >
> > > > In the v1 thread it was said
> > > > >This is a long-standing bug, not a recent regression.
> > > >
> > > > so why not Cc stable to get rid of this bug in the stable
> > > > kernels in that case?
> > >
> > > Thanks for the advice, I will send this patch to the stable kernels.
> >
> > Please target v3 at net and add a trace given you're claiming a
> > use-after-free. That way we know that the problem is real and not a
> > false-positive from some tool. You can reproduce it by adding enough
> > delay in inetdev_destroy():
>
> I guess delay was added between ip_mc_destroy_dev() and
> RCU_INIT_POINTER(dev->ip_ptr, NULL) ?
>
> I feel like we should clear it first and destroy everything
> as done in IPv6 addrconf_ifdown().
>
>
> >
> > BUG: KASAN: slab-use-after-free in ip_check_mc_rcu+0x2cc/0x500
> > Read of size 4 at addr ffff88810c571208 by task mausezahn/419
> >
> > CPU: 2 UID: 0 PID: 419 Comm: mausezahn Not tainted 7.1.0-virtme-g15d4a7c23bf6 #17 PREEMPT(lazy)
> > Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> > Call Trace:
> > <IRQ>
> > dump_stack_lvl+0x4d/0x70
> > print_report+0x153/0x4c2
> > kasan_report+0xda/0x110
> > ip_check_mc_rcu+0x2cc/0x500
> > ip_route_input_rcu.part.0+0x13d/0xbc0
> > ip_route_input_noref+0xb6/0x110
> > ip_rcv_finish_core+0x41b/0x1d90
> > ip_rcv_finish+0xea/0x1b0
> > ip_rcv+0xb7/0x1b0
> > __netif_receive_skb_one_core+0xfc/0x180
> > process_backlog+0x1ea/0x5e0
> > __napi_poll+0x97/0x480
> > net_rx_action+0x97c/0xfa0
> > handle_softirqs+0x18c/0x4f0
> > do_softirq+0x42/0x60
> > </IRQ>
> >
Thanks for the advice, I will try to add a trace in v3. For more
reference, the issue is pointed out in the following discussion:
https://lore.kernel.org/netdev/95adff35-ee56-49d3-8567-382ac17810b3@redhat.com/#t
* [PATCH net] gve: fix Rx queue stall on alloc failure
From: Harshitha Ramamurthy @ 2026-07-01 0:53 UTC (permalink / raw)
To: netdev
Cc: joshwash, hramamurthy, andrew+netdev, davem, edumazet, kuba,
pabeni, ast, daniel, hawk, john.fastabend, bpf, sdf, willemb,
jordanrhee, nktgrg, maolson, jacob.e.keller, thostet, csully, bcf,
linux-kernel, stable, Eddie Phillips
From: Eddie Phillips <eddiephillips@google.com>
When the system is under extreme memory pressure, page allocations can
fail during the Rx buffer refill loop. If the number of buffers posted
to hardware falls below a critical low threshold and the refill loop
exits due to allocation failures, the queue can stall:
1. The device drops incoming packets because there are no descriptors.
2. Since no packets are processed, no Rx completions are generated.
3. Because no completions occur, NAPI is never scheduled, preventing
the refill loop from running again even after memory is freed.
This results in a permanent queue stall.
Resolve this by introducing a starvation recovery timer for each Rx queue.
If the number of buffers posted to hardware falls below a critical low
threshold, start a timer to periodically reschedule NAPI. Once NAPI runs
and successfully refills the queue above the threshold, the timer is
not rescheduled.
Also add a new ethtool statistic "rx_critical_low_bufs" to track the
number of times the starvation recovery timer is triggered.
Cc: stable@vger.kernel.org
Fixes: 9b8dd5e5ea48 ("gve: DQO: Add RX path")
Reviewed-by: Jordan Rhee <jordanrhee@google.com>
Signed-off-by: Eddie Phillips <eddiephillips@google.com>
Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
---
drivers/net/ethernet/google/gve/gve.h | 4 ++++
drivers/net/ethernet/google/gve/gve_ethtool.c | 14 +++++++++++++-
drivers/net/ethernet/google/gve/gve_rx_dqo.c | 32 ++++++++++++++++++++++++++++++++
3 files changed, 49 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/google/gve/gve.h b/drivers/net/ethernet/google/gve/gve.h
index 2f7bd330..8378bef2 100644
--- a/drivers/net/ethernet/google/gve/gve.h
+++ b/drivers/net/ethernet/google/gve/gve.h
@@ -13,6 +13,7 @@
#include <linux/netdevice.h>
#include <linux/net_tstamp.h>
#include <linux/pci.h>
+#include <linux/timer.h>
#include <linux/ptp_clock_kernel.h>
#include <linux/u64_stats_sync.h>
#include <net/page_pool/helpers.h>
@@ -41,6 +42,7 @@
/* Interval to schedule a stats report update, 20000ms. */
#define GVE_STATS_REPORT_TIMER_PERIOD 20000
+#define GVE_RX_NAPI_RESCHED_MS 20 /* msecs */
/* Numbers of NIC tx/rx stats in stats report. */
#define NIC_TX_STATS_REPORT_NUM 0
@@ -318,6 +320,7 @@ struct gve_rx_ring {
u64 rx_copied_pkt; /* free-running total number of copied packets */
u64 rx_skb_alloc_fail; /* free-running count of skb alloc fails */
u64 rx_buf_alloc_fail; /* free-running count of buffer alloc fails */
+ u64 rx_critical_low_bufs; /* count of critical low buffer events */
u64 rx_desc_err_dropped_pkt; /* free-running count of packets dropped by descriptor error */
/* free-running count of unsplit packets due to header buffer overflow or hdr_len is 0 */
u64 rx_hsplit_unsplit_pkt;
@@ -334,6 +337,7 @@ struct gve_rx_ring {
struct gve_queue_resources *q_resources; /* head and tail pointer idx */
dma_addr_t q_resources_bus; /* dma address for the queue resources */
struct u64_stats_sync statss; /* sync stats for 32bit archs */
+ struct timer_list starvation_timer; /* for queue starvation recovery */
struct gve_rx_ctx ctx; /* Info for packet currently being processed in this ring. */
diff --git a/drivers/net/ethernet/google/gve/gve_ethtool.c b/drivers/net/ethernet/google/gve/gve_ethtool.c
index a0e0472b..71b6efbf 100644
--- a/drivers/net/ethernet/google/gve/gve_ethtool.c
+++ b/drivers/net/ethernet/google/gve/gve_ethtool.c
@@ -46,6 +46,7 @@ static const char gve_gstrings_main_stats[][ETH_GSTRING_LEN] = {
"rx_hsplit_unsplit_pkt",
"interface_up_cnt", "interface_down_cnt", "reset_cnt",
"page_alloc_fail", "dma_mapping_error", "stats_report_trigger_cnt",
+ "rx_critical_low_bufs",
};
static const char gve_gstrings_rx_stats[][ETH_GSTRING_LEN] = {
@@ -58,6 +59,7 @@ static const char gve_gstrings_rx_stats[][ETH_GSTRING_LEN] = {
"rx_xdp_aborted[%u]", "rx_xdp_drop[%u]", "rx_xdp_pass[%u]",
"rx_xdp_tx[%u]", "rx_xdp_redirect[%u]",
"rx_xdp_tx_errors[%u]", "rx_xdp_redirect_errors[%u]", "rx_xdp_alloc_fails[%u]",
+ "rx_critical_low_bufs[%u]",
};
static const char gve_gstrings_tx_stats[][ETH_GSTRING_LEN] = {
@@ -151,12 +153,14 @@ gve_get_ethtool_stats(struct net_device *netdev,
{
u64 tmp_rx_pkts, tmp_rx_hsplit_pkt, tmp_rx_bytes, tmp_rx_hsplit_bytes,
tmp_rx_skb_alloc_fail, tmp_rx_buf_alloc_fail,
+ tmp_rx_critical_low_bufs,
tmp_rx_desc_err_dropped_pkt, tmp_rx_hsplit_unsplit_pkt,
tmp_tx_pkts, tmp_tx_bytes,
tmp_xdp_tx_errors, tmp_xdp_redirect_errors;
u64 rx_buf_alloc_fail, rx_desc_err_dropped_pkt, rx_hsplit_unsplit_pkt,
rx_pkts, rx_hsplit_pkt, rx_skb_alloc_fail, rx_bytes, tx_pkts, tx_bytes,
- tx_dropped, xdp_tx_errors, xdp_redirect_errors;
+ rx_critical_low_bufs, tx_dropped, xdp_tx_errors,
+ xdp_redirect_errors;
int rx_base_stats_idx, max_rx_stats_idx, max_tx_stats_idx;
int stats_idx, stats_region_len, nic_stats_len;
struct stats *report_stats;
@@ -197,6 +201,7 @@ gve_get_ethtool_stats(struct net_device *netdev,
for (rx_pkts = 0, rx_bytes = 0, rx_hsplit_pkt = 0,
rx_skb_alloc_fail = 0, rx_buf_alloc_fail = 0,
+ rx_critical_low_bufs = 0,
rx_desc_err_dropped_pkt = 0, rx_hsplit_unsplit_pkt = 0,
xdp_tx_errors = 0, xdp_redirect_errors = 0,
ring = 0;
@@ -212,6 +217,8 @@ gve_get_ethtool_stats(struct net_device *netdev,
tmp_rx_bytes = rx->rbytes;
tmp_rx_skb_alloc_fail = rx->rx_skb_alloc_fail;
tmp_rx_buf_alloc_fail = rx->rx_buf_alloc_fail;
+ tmp_rx_critical_low_bufs =
+ rx->rx_critical_low_bufs;
tmp_rx_desc_err_dropped_pkt =
rx->rx_desc_err_dropped_pkt;
tmp_rx_hsplit_unsplit_pkt =
@@ -226,6 +233,7 @@ gve_get_ethtool_stats(struct net_device *netdev,
rx_bytes += tmp_rx_bytes;
rx_skb_alloc_fail += tmp_rx_skb_alloc_fail;
rx_buf_alloc_fail += tmp_rx_buf_alloc_fail;
+ rx_critical_low_bufs += tmp_rx_critical_low_bufs;
rx_desc_err_dropped_pkt += tmp_rx_desc_err_dropped_pkt;
rx_hsplit_unsplit_pkt += tmp_rx_hsplit_unsplit_pkt;
xdp_tx_errors += tmp_xdp_tx_errors;
@@ -269,6 +277,7 @@ gve_get_ethtool_stats(struct net_device *netdev,
data[i++] = priv->page_alloc_fail;
data[i++] = priv->dma_mapping_error;
data[i++] = priv->stats_report_trigger_cnt;
+ data[i++] = rx_critical_low_bufs;
i = GVE_MAIN_STATS_LEN;
rx_base_stats_idx = 0;
@@ -337,6 +346,8 @@ gve_get_ethtool_stats(struct net_device *netdev,
tmp_rx_hsplit_bytes = rx->rx_hsplit_bytes;
tmp_rx_skb_alloc_fail = rx->rx_skb_alloc_fail;
tmp_rx_buf_alloc_fail = rx->rx_buf_alloc_fail;
+ tmp_rx_critical_low_bufs =
+ rx->rx_critical_low_bufs;
tmp_rx_desc_err_dropped_pkt =
rx->rx_desc_err_dropped_pkt;
tmp_xdp_tx_errors = rx->xdp_tx_errors;
@@ -381,6 +392,7 @@ gve_get_ethtool_stats(struct net_device *netdev,
} while (u64_stats_fetch_retry(&priv->rx[ring].statss,
start));
i += GVE_XDP_ACTIONS + 3; /* XDP rx counters */
+ data[i++] = tmp_rx_critical_low_bufs;
}
} else {
i += priv->rx_cfg.num_queues * NUM_GVE_RX_CNTS;
diff --git a/drivers/net/ethernet/google/gve/gve_rx_dqo.c b/drivers/net/ethernet/google/gve/gve_rx_dqo.c
index 02cba280..303db4fa 100644
--- a/drivers/net/ethernet/google/gve/gve_rx_dqo.c
+++ b/drivers/net/ethernet/google/gve/gve_rx_dqo.c
@@ -18,6 +18,16 @@
#include <net/tcp.h>
#include <net/xdp_sock_drv.h>
+static void gve_rx_starvation_timer(struct timer_list *t)
+{
+ struct gve_rx_ring *rx = timer_container_of(rx, t, starvation_timer);
+ struct gve_priv *priv = rx->gve;
+ struct gve_notify_block *block;
+
+ block = &priv->ntfy_blocks[rx->ntfy_id];
+ napi_schedule(&block->napi);
+}
+
static void gve_rx_free_hdr_bufs(struct gve_priv *priv, struct gve_rx_ring *rx)
{
struct device *hdev = &priv->pdev->dev;
@@ -120,6 +130,7 @@ void gve_rx_stop_ring_dqo(struct gve_priv *priv, int idx)
if (rx->dqo.page_pool)
page_pool_disable_direct_recycling(rx->dqo.page_pool);
+ timer_delete_sync(&rx->starvation_timer);
gve_remove_napi(priv, ntfy_idx);
gve_rx_remove_from_block(priv, idx);
gve_rx_reset_ring_dqo(priv, idx);
@@ -136,6 +147,8 @@ void gve_rx_free_ring_dqo(struct gve_priv *priv, struct gve_rx_ring *rx,
u32 qpl_id;
int i;
+ timer_shutdown_sync(&rx->starvation_timer);
+
completion_queue_slots = rx->dqo.complq.mask + 1;
buffer_queue_slots = rx->dqo.bufq.mask + 1;
@@ -232,6 +245,7 @@ int gve_rx_alloc_ring_dqo(struct gve_priv *priv,
rx->gve = priv;
rx->q_num = idx;
rx->packet_buffer_size = cfg->packet_buffer_size;
+ timer_setup(&rx->starvation_timer, gve_rx_starvation_timer, 0);
if (cfg->xdp) {
rx->packet_buffer_truesize = GVE_XDP_RX_BUFFER_SIZE_DQO;
@@ -365,6 +379,7 @@ void gve_rx_post_buffers_dqo(struct gve_rx_ring *rx)
struct gve_rx_compl_queue_dqo *complq = &rx->dqo.complq;
struct gve_rx_buf_queue_dqo *bufq = &rx->dqo.bufq;
struct gve_priv *priv = rx->gve;
+ u32 num_bufs_avail_to_hw;
u32 num_avail_slots;
u32 num_full_slots;
u32 num_posted = 0;
@@ -400,6 +415,23 @@ void gve_rx_post_buffers_dqo(struct gve_rx_ring *rx)
}
rx->fill_cnt += num_posted;
+
+ /* If the queue has fewer than GVE_RX_BUF_THRESH_DQO descriptors
+ * visible to the hardware, and no doorbell was written, the hardware
+ * is in danger of starving and cannot trigger interrupts. Start the
+ * timer to periodically reschedule NAPI and recover from starvation.
+ */
+ num_bufs_avail_to_hw =
+ ((bufq->tail & ~(GVE_RX_BUF_THRESH_DQO - 1)) -
+ bufq->head) & bufq->mask;
+
+ if (num_bufs_avail_to_hw < GVE_RX_BUF_THRESH_DQO) {
+ u64_stats_update_begin(&rx->statss);
+ rx->rx_critical_low_bufs++;
+ u64_stats_update_end(&rx->statss);
+ mod_timer(&rx->starvation_timer,
+ jiffies + msecs_to_jiffies(GVE_RX_NAPI_RESCHED_MS));
+ }
}
static void gve_rx_skb_csum(struct sk_buff *skb,
--
2.55.0.rc2.803.g1fd1e6609c-goog
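The ring arithmetic in the gve_rx_post_buffers_dqo() hunk above can be checked in isolation. The following is a minimal userspace sketch, assuming the threshold is a power of two (32 here is an assumption, the patch does not show the value of GVE_RX_BUF_THRESH_DQO) and that the ring size is a power of two so that mask is size minus one; bufs_visible_to_hw(), queue_is_starving() and BUF_THRESH are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed power-of-two threshold; stands in for GVE_RX_BUF_THRESH_DQO. */
#define BUF_THRESH 32u

/*
 * Number of buffers the hardware can see: the tail is rounded down to a
 * multiple of BUF_THRESH (buffers past that point are assumed not to have
 * been announced yet), then the distance to head is taken modulo the ring
 * size.
 */
static uint32_t bufs_visible_to_hw(uint32_t head, uint32_t tail, uint32_t mask)
{
	uint32_t announced_tail;

	/* The rounding below only works for a power-of-two threshold. */
	assert((BUF_THRESH & (BUF_THRESH - 1)) == 0);

	announced_tail = tail & ~(BUF_THRESH - 1);
	return (announced_tail - head) & mask;
}

/* Nonzero when the queue is below the threshold and a timer would be armed. */
static int queue_is_starving(uint32_t head, uint32_t tail, uint32_t mask)
{
	return bufs_visible_to_hw(head, tail, mask) < BUF_THRESH;
}
```

For a 1024-entry ring (mask 1023), head 10 and tail 40 give 32 - 10 = 22 visible buffers, which is below the assumed threshold. The unsigned subtraction followed by the mask also handles wraparound: head 1000 and tail 8 give (0 - 1000) & 1023 = 24.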
* Re: [PATCH net] selftests: drv-net: tso: don't touch dangerous feature bits
From: patchwork-bot+netdevbpf @ 2026-07-01 1:30 UTC (permalink / raw)
To: Jakub Kicinski
Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, shuah,
daniel.zahka, linux-kselftest
In-Reply-To: <20260629233923.2151144-1-kuba@kernel.org>
Hello:
This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Mon, 29 Jun 2026 16:39:23 -0700 you wrote:
> query_nic_features() detects which offloads depend on tx-gso-partial
> by enabling everything, turning tx-gso-partial off, and seeing which
> active features drop out. Enabling all hw features is dangerous:
> we may end up enabling rx-fcs and loopback for example. For the
> ice driver we end up getting into problems with feature dependencies
> so the cleanup isn't successful either, and the test exits with
> rx-fcs and loopback enabled.
>
> [...]
Here is the summary with links:
- [net] selftests: drv-net: tso: don't touch dangerous feature bits
https://git.kernel.org/netdev/net/c/2f7f2e311106
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
* Re: [PATCH 3/3] net: stmmac: dwmac-socfpga: Add mac-mode DT property support
From: Nazle Asmade, Muhammad Nazim Amirul @ 2026-07-01 1:32 UTC (permalink / raw)
To: Maxime Chevallier, Andrew Lunn
Cc: dinguyen@kernel.org, rmk+kernel@armlinux.org.uk,
krzk+dt@kernel.org, conor+dt@kernel.org, robh@kernel.org,
davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
pabeni@redhat.com, andrew+netdev@lunn.ch,
devicetree@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <e691298f-b3e5-4c1a-8270-a821c1f46a2b@bootlin.com>
On 30/6/2026 11:42 pm, Maxime Chevallier wrote:
>
>
> On 6/30/26 17:13, Nazle Asmade, Muhammad Nazim Amirul wrote:
>
>> Yes, Agilex5 has the same concept. The GMII-to-RGMII converter is a
>> Quartus soft IP instantiated in the FPGA fabric — equivalent to the
>> CycloneV EMAC splitter. The XGMAC outputs GMII signals to the FPGA
>> fabric, the soft IP converts them to RGMII, and the RGMII signals then
>> go through the FPGA HVIO pins to the external Marvell 88E1512 PHY.
>
> Does this converter need any special config, and does it expose any
> control registers ? or is it fully autonomous ?
>
> If it's fully autonomous, can you detect its presence through some
> capability registers or something like that ?
>
>
> Maxime
>
Hi Maxime,
To my knowledge, the converter is fully autonomous with no control
registers and no software configuration required.
Speed switching is handled entirely in hardware — the XGMAC's mac_speed
output signals are wired directly in the FPGA fabric to the converter's
speed input. No driver intervention is needed on speed changes.
There are no capability registers and no way to detect its presence in
hardware. It is a property of the FPGA design, not the HPS silicon.
BR,
Nazim Amirul
* [PATCH net-next v2] ice: use dev_err_probe() in ice_probe()
From: Rongguang Wei @ 2026-07-01 1:36 UTC (permalink / raw)
To: netdev, intel-wired-lan, aleksandr.loktionov, przemyslaw.kitszel
Cc: anthony.l.nguyen, andrew+netdev, Rongguang Wei
From: Rongguang Wei <weirongguang@kylinos.cn>
dev_err_probe() logs the error and returns the supplied error code, which
allows probe error paths to be written more compactly.
Use dev_err_probe() in ice_probe() for error paths that currently print an
error message and immediately return the same error code. This keeps the
existing error handling semantics while reducing open-coded logging and
return sequences.
Signed-off-by: Rongguang Wei <weirongguang@kylinos.cn>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
---
v2:
- Fix commit message per Aleksandr Loktionov's recommendation.
v1: https://lore.kernel.org/netdev/20260630032537.42605-1-clementwei90@163.com/T/#t
---
drivers/net/ethernet/intel/ice/ice_main.c | 24 ++++++++---------------
1 file changed, 8 insertions(+), 16 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index e2fd2dab03e3..31aa42f8e6d3 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -5161,10 +5161,8 @@ ice_probe(struct pci_dev *pdev, const struct pci_device_id __always_unused *ent)
struct ice_hw *hw;
int err;
- if (pdev->is_virtfn) {
- dev_err(dev, "can't probe a virtual function\n");
- return -EINVAL;
- }
+ if (pdev->is_virtfn)
+ return dev_err_probe(dev, -EINVAL, "can't probe a virtual function\n");
/* when under a kdump kernel initiate a reset before enabling the
* device in order to clear out any pending DMA transactions. These
@@ -5188,10 +5186,8 @@ ice_probe(struct pci_dev *pdev, const struct pci_device_id __always_unused *ent)
return err;
err = pcim_iomap_regions(pdev, BIT(ICE_BAR0), dev_driver_string(dev));
- if (err) {
- dev_err(dev, "BAR0 I/O map error %d\n", err);
- return err;
- }
+ if (err)
+ return dev_err_probe(dev, err, "BAR0 I/O map error %d\n", err);
pf = ice_allocate_pf(dev);
if (!pf)
@@ -5202,10 +5198,8 @@ ice_probe(struct pci_dev *pdev, const struct pci_device_id __always_unused *ent)
/* set up for high or low DMA */
err = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64));
- if (err) {
- dev_err(dev, "DMA configuration failed: 0x%x\n", err);
- return err;
- }
+ if (err)
+ return dev_err_probe(dev, err, "DMA configuration failed: 0x%x\n", err);
pci_set_master(pdev);
pf->pdev = pdev;
@@ -5240,10 +5234,8 @@ ice_probe(struct pci_dev *pdev, const struct pci_device_id __always_unused *ent)
return ice_probe_recovery_mode(pf);
err = ice_init_hw(hw);
- if (err) {
- dev_err(dev, "ice_init_hw failed: %d\n", err);
- return err;
- }
+ if (err)
+ return dev_err_probe(dev, err, "ice_init_hw failed: %d\n", err);
ice_init_dev_hw(pf);
--
2.25.1
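The "log and return the same code" pattern that the ice_probe() patch above compacts can be mimicked in userspace. This is a minimal sketch, not the kernel helper: log_err_and_return(), probe_open_coded() and probe_compact() are hypothetical names, and the real dev_err_probe() additionally takes a struct device and treats -EPROBE_DEFER specially, neither of which is modelled here.

```c
#include <assert.h>
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>

/* Log a formatted message prefixed with the error code, then return it. */
static int log_err_and_return(int err, const char *fmt, ...)
{
	va_list ap;

	assert(err < 0); /* callers pass negative errno values */
	fprintf(stderr, "error %d: ", err);
	va_start(ap, fmt);
	vfprintf(stderr, fmt, ap);
	va_end(ap);
	return err;
}

/* Open-coded form: separate logging and return statements. */
static int probe_open_coded(int is_virtfn)
{
	if (is_virtfn) {
		fprintf(stderr, "can't probe a virtual function\n");
		return -EINVAL;
	}
	return 0;
}

/* Compact form: the helper logs and hands back the error in one step. */
static int probe_compact(int is_virtfn)
{
	if (is_virtfn)
		return log_err_and_return(-EINVAL, "can't probe a virtual function\n");
	return 0;
}
```

Both forms return the same value on every path; only the logging call site changes, which matches the claim in the commit message that the error handling semantics are kept.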
* Re: [PATCH net] netfilter: nf_nat_masquerade: recalculate TCP TS offset when port is randomized
From: Jiayuan Chen @ 2026-07-01 1:44 UTC (permalink / raw)
To: xietangxin, Pablo Neira Ayuso, Florian Westphal, Phil Sutter,
David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman
Cc: gaoxingwang, huyizhen, netfilter-devel, coreteam, netdev,
linux-kernel, stable
In-Reply-To: <20260629093408.3927103-1-xietangxin@h-partners.com>
On 6/29/26 5:34 PM, xietangxin wrote:
> Problem observed in Kubernetes environments where the MASQUERADE target
> with --random-fully is configured by default. After commit
> 165573e41f2f ("tcp: secure_seq: add back ports to TS offset"), TCP short
> connection QPS dropped from ~20000 to ~10000. That commit added the source
> and destination ports to the TS offset calculation.
>
> However, with MASQUERADE --random-fully, when multiple internal connections
> (e.g. sport 10000 and 20000) are mapped to the same external port (e.g. 30000),
> their TS offsets are calculated as ts_offset(10000) and ts_offset(20000).
> If the server reuses the TIME_WAIT slot from the first connection, there is
> a chance that ts_offset(20000) < ts_offset(10000), breaking TSval
> monotonicity for the same 4-tuple and causing RST packets:
> Client -> Server 24870 -> 80 [SYN] TSval=2294041168
> Server -> Client 80 -> 24870 [ACK] TSecr=2846236456
> Client -> Server 24870 -> 80 [RST] Seq=855605690
>
> After nf_nat_setup_info() successfully assigns a new randomized
> source port, recalculate the TS offset using the new port and
> update the SYN packet's TSval accordingly.
>
> Test results on 4U4G VM with
> `./wrk -t8 -c200 -H "Connection: close" -d10s --latency http://5.5.5.5:80`
> Before:
> random:10712 req/s, random-fully:10986 req/s
> After:
> random:21463 req/s, random-fully:19181 req/s
>
> Fixes: 165573e41f2f ("tcp: secure_seq: add back ports to TS offset")
> Cc: stable@vger.kernel.org
I'd treat it as a feature not a fix.
> Closes:https://lore.kernel.org/all/92935c00-e0be-4591-ac44-5978c7804d57@yeah.net/
> Signed-off-by: xietangxin <xietangxin@h-partners.com>
> ---
> net/netfilter/nf_nat_masquerade.c | 91 ++++++++++++++++++++++++++++++-
> 1 file changed, 89 insertions(+), 2 deletions(-)
>
> diff --git a/net/netfilter/nf_nat_masquerade.c b/net/netfilter/nf_nat_masquerade.c
> index 4de6e0a51701..8c9ca5a051cc 100644
> --- a/net/netfilter/nf_nat_masquerade.c
> +++ b/net/netfilter/nf_nat_masquerade.c
> @@ -6,8 +6,11 @@
> #include <linux/netfilter.h>
> #include <linux/netfilter_ipv4.h>
> #include <linux/netfilter_ipv6.h>
> +#include <linux/tcp.h>
>
> +#include <net/tcp.h>
> #include <net/netfilter/nf_nat_masquerade.h>
> +#include <net/secure_seq.h>
>
> struct masq_dev_work {
> struct work_struct work;
> @@ -24,6 +27,76 @@ static DEFINE_MUTEX(masq_mutex);
> static unsigned int masq_refcnt __read_mostly;
> static atomic_t masq_worker_count __read_mostly;
>
> +static __be32 *tcp_ts_option_ptr(const struct sk_buff *skb)
> +{
> + const struct tcphdr *th;
> + unsigned char *ptr;
> + unsigned char opsize;
> + unsigned int optlen, offset;
> +
> + th = tcp_hdr(skb);
> + optlen = (th->doff - 5) * 4;
> + ptr = (unsigned char *)(th + 1);
> + offset = 0;
> +
> + while (offset < optlen) {
> + unsigned char opcode = ptr[offset];
> +
> + if (opcode == TCPOPT_EOL)
> + break;
> + if (opcode == TCPOPT_NOP) {
> + offset++;
> + continue;
> + }
> +
> + if (offset + 1 >= optlen)
> + break;
> +
> + opsize = ptr[offset + 1];
> + if (opsize < 2 || offset + opsize > optlen)
> + break;
> +
> + if (opcode == TCPOPT_TIMESTAMP && opsize == TCPOLEN_TIMESTAMP)
> + return (__be32 *)(ptr + offset + 2);
> +
> + offset += opsize;
> + }
> +
> + return NULL;
> +}
> +
> +static void masquerade_update_tcp_ts_offset(struct nf_conn *ct, struct sk_buff *skb)
> +{
> + __be32 *tsptr;
> + struct net *net;
> + struct tcphdr *th;
> + struct tcp_sock *tp;
> + union tcp_seq_and_ts_off st;
> + struct nf_conntrack_tuple *tuple;
> +
> + th = tcp_hdr(skb);
> + net = nf_ct_net(ct);
> + tuple = &ct->tuplehash[IP_CT_DIR_REPLY].tuple;
> +
Why use the reply tuple rather than the original one, or am I missing something?
^ permalink raw reply
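For readers following the option walk in the quoted tcp_ts_option_ptr(), here is a minimal Python sketch of the same loop over a raw TCP options buffer. It is an illustration only, not the patch's code: the names find_ts_option(), read_tsval() and SYN_OPTIONS are hypothetical, and the buffer is assumed to hold just the option bytes that follow the fixed 20-byte TCP header.

```python
import struct

# TCP option kinds and lengths (RFC 9293 / RFC 7323).
TCPOPT_EOL = 0
TCPOPT_NOP = 1
TCPOPT_TIMESTAMP = 8
TCPOLEN_TIMESTAMP = 10


def find_ts_option(options: bytes):
    """Return the byte offset of TSval inside `options`, or None if absent."""
    offset = 0
    optlen = len(options)
    while offset < optlen:
        opcode = options[offset]
        if opcode == TCPOPT_EOL:
            break
        if opcode == TCPOPT_NOP:
            # NOP is a single byte with no length field.
            offset += 1
            continue
        if offset + 1 >= optlen:
            break  # no room for the length byte
        opsize = options[offset + 1]
        if opsize < 2 or offset + opsize > optlen:
            break  # malformed or truncated option
        if opcode == TCPOPT_TIMESTAMP and opsize == TCPOLEN_TIMESTAMP:
            return offset + 2  # skip kind and length, land on TSval
        offset += opsize
    return None


def read_tsval(options: bytes):
    """Return TSval as an integer, or None when there is no timestamp option."""
    pos = find_ts_option(options)
    if pos is None:
        return None
    (tsval,) = struct.unpack_from("!I", options, pos)
    return tsval


# Hypothetical SYN option layout: MSS, SACK-permitted, Timestamp, NOP, WScale.
SYN_OPTIONS = (
    bytes([2, 4, 0x05, 0xB4])
    + bytes([4, 2])
    + bytes([8, 10]) + struct.pack("!II", 2294041168, 0)
    + bytes([1])
    + bytes([3, 3, 7])
)
```

Every bounds check in the loop exists so that a truncated or malformed option ends the walk instead of reading past the buffer, which is the property a reviewer would want to confirm in the C version as well.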
* Re: [PATCH 2/3] arm64: dts: socfpga: agilex5: Add SoCDK TSN Config2 board
From: Nazle Asmade, Muhammad Nazim Amirul @ 2026-07-01 1:54 UTC (permalink / raw)
To: Andrew Lunn
Cc: dinguyen@kernel.org, maxime.chevallier@bootlin.com,
rmk+kernel@armlinux.org.uk, krzk+dt@kernel.org,
conor+dt@kernel.org, robh@kernel.org, davem@davemloft.net,
edumazet@google.com, kuba@kernel.org, pabeni@redhat.com,
andrew+netdev@lunn.ch, devicetree@vger.kernel.org,
linux-arm-kernel@lists.infradead.org, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org
In-Reply-To: <e4cf8d95-0467-4bdc-8e19-228ced3a8bbc@lunn.ch>
On 30/6/2026 11:25 pm, Andrew Lunn wrote:
> On Tue, Jun 30, 2026 at 02:39:50PM +0000, Nazle Asmade, Muhammad Nazim Amirul wrote:
>> On 30/6/2026 9:58 pm, Andrew Lunn wrote:
>>>> + * gmac1 is the TSN port. The MAC operates in GMII mode internally
>>>> + * while the PHY-side interface is RGMII, so mac-mode and phy-mode differ.
>>>> + */
>>>> +&gmac1 {
>>>> + status = "okay";
>>>> + phy-mode = "rgmii"; /* TX/RX clock delays provided by Agilex5 I/O hardware */
>>> Could you provide more details about this. I want to understand the
>>> big picture.
>>>
>>> Normally we talk about the PCB providing the delays. This sounds like
>>> it is the FPGA? So i need convincing this is correct.
>> Hi Andrew,
>>
>> Thanks for your quick review and yes, it is the FPGA — specifically a
>> soft IP block in the FPGA fabric that implements the RGMII clock delays
>> and is configured before Linux boots via the FPGA bitstream. The driver
>> must not add additional delays on top.
>
> So it depends on how the converter block is described, but ....
>
> From a big picture, MAC and PHY pair, it is the MAC which
> implements the delays.
>
> https://elixir.bootlin.com/linux/v6.15/source/Documentation/devicetree/bindings/net/ethernet-controller.yaml#L346
>
> # There are a small number of cases where the MAC has hard coded
> # delays which cannot be disabled. The 'phy-mode' only describes the
> # PCB. The inability to disable the delays in the MAC does not change
> # the meaning of 'phy-mode'. It does however mean that a 'phy-mode' of
> # 'rgmii' is now invalid, it cannot be supported, since both the PCB
> # and the MAC and PHY adding delays cannot result in a functional
> # link. Thus the MAC should report a fatal error for any modes which
> # cannot be supported. When the MAC implements the delay, it must
> # ensure that the PHY does not also implement the same delay. So it
> # must modify the phy-mode it passes to the PHY, removing the delay it
> # has added. Failure to remove the delay will result in a
> # non-functioning link.
>
> Andrew
>
> ---
> pw-bot: cr
Hi Andrew,
The delays are provided by the FPGA GMII-to-RGMII converter soft IP,
which is hardcoded in the FPGA bitstream and cannot be disabled or
modified from the driver side.
Using phy-mode = "rgmii" is intentional here — it prevents the PHY from
adding its own internal delays on top, since the FPGA converter already
provides the full required delay. This is consistent with how all other
Agilex5 SoCDK board variants are described, as seen in commit
c5637e5ceb4b ("arm64: dts: socfpga: agilex5: Fix phy-mode to rgmii as HW
provides clock delay") already in Dinh Nguyen's tree, which applies the
same rationale across all Agilex5 boards.
BR,
Nazim Amirul
^ permalink raw reply
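The binding text Andrew quotes above can be read as a small rule: 'phy-mode' names the delays the MAC/PHY pair still has to add, and a MAC that adds some of them must strip those from the mode it passes on to the PHY. The sketch below models only that rule, under the assumption that the MAC-side delays are fixed and cannot be disabled. The function and table names are hypothetical and this is not stmmac driver code.

```python
# Delays each phy-mode asks the MAC/PHY pair to add (the PCB adds the rest).
REQUESTED_DELAYS = {
    "rgmii": frozenset(),
    "rgmii-rxid": frozenset({"rx"}),
    "rgmii-txid": frozenset({"tx"}),
    "rgmii-id": frozenset({"rx", "tx"}),
}
# Reverse lookup: set of remaining delays -> phy-mode string.
MODE_FOR_DELAYS = {delays: mode for mode, delays in REQUESTED_DELAYS.items()}


def phy_mode_for_phy(dt_mode, mac_delays):
    """Return the phy-mode to hand to the PHY, given fixed MAC-side delays.

    Raises ValueError when the MAC adds a delay that dt_mode does not ask
    for, since the link would then see that delay twice (PCB plus MAC).
    """
    requested = REQUESTED_DELAYS[dt_mode]
    mac_delays = frozenset(mac_delays)
    extra = mac_delays - requested
    if extra:
        raise ValueError(
            f"phy-mode '{dt_mode}' cannot be supported: MAC always adds "
            f"{sorted(extra)} delay"
        )
    # Whatever the MAC already provides must not be repeated by the PHY.
    return MODE_FOR_DELAYS[requested - mac_delays]
```

Under this model, a MAC-side block that always adds both delays would be described with "rgmii-id" and would pass "rgmii" to the PHY, while "rgmii" itself would be rejected. Whether the FPGA converter counts as part of the MAC for this purpose is the question the thread is discussing.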
* Re: [PATCH v2] Subject: [PATCH] net: gro: fix double aggregation of flush-marked skbs
From: Willem de Bruijn @ 2026-07-01 1:54 UTC (permalink / raw)
To: Shiming Cheng, netdev, davem, edumazet, kuba, pabeni, horms,
matthias.bgg, angelogioacchino.delregno, willemb, imv4bel, alice,
eilaimemedsnaimel, sd, steffen.klassert
Cc: lena.wang, stable, Shiming Cheng
In-Reply-To: <20260626084451.27699-1-shiming.cheng@mediatek.com>
Thanks for the fix.
There is something weird with your subject lines:
[PATCH v2] Subject: [PATCH] net:
> The new skb_gro_receive_list() function is missing a critical safety check
> present in the legacy skb_gro_receive() path.
, as of commit 0ab03f353d36 ("net-gro: Fix GRO flush when receiving a
GSO packet.").
Please add a comment referring to this commit, as it well explains the
need for the flush.
> Specifically, it does not
> validate NAPI_GRO_CB(skb)->flush before allowing packet aggregation.
> This allows already-GRO'd packets with existing frag_list to be
> re-aggregated into a new GRO session, corrupting the frag_list chain
> structure. When skb_segment() attempts to unpack these malformed packets,
> it encounters invalid state and triggers a kernel panic.
>
> Scenario (Tethering/Device forwarding):
> 1. Driver: Generated aggregated packet P1 via LRO with frag_list
> 2. Dev A: Receives aggregated fraglist packet and flush flag set
> 3. Dev A: Re-enters GRO, skb_gro_receive_list() is called
> 4. Missing flush check allows re-aggregation despite flush flag
> 5. Frag_list chain becomes corrupted (loops or dangling refs)
> 6. Dev B: TX path calls skb_segment(), crashes on corrupted frag_list
>
> Root cause in skb_segment():
> The check at line ~4891:
> if (hsize <= 0 && i >= nfrags && skb_headlen(list_skb) &&
> (skb_headlen(list_skb) == len || sg)) {
>
> When frag_list is corrupted by double aggregation, when list_skb is
> a NULL pointer from skb->next, skb_headlen(list_skb) dereference
> NULL/corrupted pointers occurs.
>
> Call Trace:
> skb_headlen(NULL skb)
> skb_segment
> tcp_gso_segment
> tcp4_gso_segment
> inet_gso_segment
> skb_mac_gso_segment
> __skb_gso_segment
> skb_gso_segment
> validate_xmit_skb
> validate_xmit_skb_list
> sch_direct_xmit
> qdisc_restart
> __qdisc_run
> qdisc_run
> net_tx_action
>
> Fix: Add NAPI_GRO_CB(skb)->flush validation to the early-return check in
> skb_gro_receive_list(), matching the defensive programming pattern of
> skb_gro_receive().
>
> Fixes: 9dc2c3cd6c11 ("net: add fraglist GRO/GSO support")
> Cc: stable@vger.kernel.org
> Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com>
> ---
> net/core/gro.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/net/core/gro.c b/net/core/gro.c
> index 35f2f708f010..076247c1e662 100644
> --- a/net/core/gro.c
> +++ b/net/core/gro.c
> @@ -229,7 +229,8 @@ int skb_gro_receive(struct sk_buff *p, struct sk_buff *skb)
>
> int skb_gro_receive_list(struct sk_buff *p, struct sk_buff *skb)
> {
> - if (unlikely(p->len + skb->len >= 65536))
> + if (unlikely(p->len + skb->len >= 65536 ||
> + NAPI_GRO_CB(skb)->flush))
> return -E2BIG;
>
> if (!pskb_may_pull(skb, skb_gro_offset(skb))) {
^ permalink raw reply
* Re: [PATCH iproute2-next] ss: stop displaying dccp sockets
From: Stephen Hemminger @ 2026-07-01 1:55 UTC (permalink / raw)
To: Yafang Shao; +Cc: kuniyu, netdev
In-Reply-To: <20260630114121.26430-1-laoar.shao@gmail.com>
On Tue, 30 Jun 2026 19:41:21 +0800
Yafang Shao <laoar.shao@gmail.com> wrote:
> DCCP support was retired in kernel commit 2a63dd0edf38 ("net: Retire
> DCCP socket."). However, ss still attempts to query DCCP sockets via
> netlink, which triggers repeated SELinux warnings in dmesg:
>
> SELinux: unrecognized netlink message: protocol=4 nlmsg_type=19 \
> sclass=netlink_tcpdiag_socket pid=188945 comm=ss
>
> Stop sending DCCPDIAG_GETSOCK netlink messages to suppress these
> warnings and align ss with the kernel change.
>
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> Cc: Kuniyuki Iwashima <kuniyu@google.com>
> ---
Please just completely purge dccp. Remove all vestiges from usage and man page.
Leave no ghosts behind. We purged decnet in the past.
^ permalink raw reply
* Re: [PATCH iproute2-next] ss: stop displaying dccp sockets
From: Stephen Hemminger @ 2026-07-01 1:56 UTC (permalink / raw)
To: Kuniyuki Iwashima; +Cc: Yafang Shao, netdev
In-Reply-To: <CAAVpQUAwrWkXxjErj8QvXg0DCdPc4zkphB4OSLgDDRmGuSrfxg@mail.gmail.com>
On Tue, 30 Jun 2026 16:13:10 -0700
Kuniyuki Iwashima <kuniyu@google.com> wrote:
> > case 'd':
> > - filter_db_set(&current_filter, DCCP_DB, true);
> > + /* DCCP retired in kernel 6.16, kept for compatibility */
>
> I think it is more user-friendly to remove the case and show usage(),
> instead of just ignoring the option.
It should just be killed completely, parse 'd' as error.
^ permalink raw reply
* Re: [PATCH 0/3] arm64: dts/net: stmmac: Add Agilex5 SoCDK TSN Config2 board support
From: Nazle Asmade, Muhammad Nazim Amirul @ 2026-07-01 2:09 UTC (permalink / raw)
To: Maxime Chevallier, dinguyen@kernel.org
Cc: rmk+kernel@armlinux.org.uk, krzk+dt@kernel.org,
conor+dt@kernel.org, robh@kernel.org, davem@davemloft.net,
edumazet@google.com, kuba@kernel.org, pabeni@redhat.com,
andrew+netdev@lunn.ch, devicetree@vger.kernel.org,
linux-arm-kernel@lists.infradead.org, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org
In-Reply-To: <563ac947-0c5d-47ee-aedc-66baf4d32648@bootlin.com>
On 30/6/2026 9:53 pm, Maxime Chevallier wrote:
> Hi,
>
> On 6/30/26 15:31, muhammad.nazim.amirul.nazle.asmade@altera.com wrote:
>> From: Nazim Amirul <muhammad.nazim.amirul.nazle.asmade@altera.com>
>>
>> The Intel SoCFPGA Agilex5 SoCDK TSN Config2 board uses a dual-port
>> Ethernet setup where gmac1 (TSN port) operates with different MAC-side
>> and PHY-side interface modes: GMII internally in the MAC, and RGMII
>> towards the PHY.
>
> There's the same behaviour on Gen5, e.g. CycloneV where we have the
> "EMAC splitter". Based on whether or not we have that splitter in DT,
> we override the INTF_SEL bits to set GMII as the MAC output, the splitter
> converting that to RGMII/SGMII.
>
> Is there something similar on this AgileX5 version by any chance, for
> which we could reuse the logic ?
>
> I know that on CycloneV you also need to adjust that GMII -> RGMII/SGMII
> splitter whenever the speed changes, is that different on agileX5 ? have
> you tested 10/100Mbps ?
>
> Thanks,
>
> Maxime
Hi Maxime,
Yes, we have tested all three speeds.
10Mbps: Link Up - 10Mbps/Full, throughput ~9.35 Mbits/sec
100Mbps: Link Up - 100Mbps/Full, throughput ~94 Mbits/sec
1000Mbps: Link Up - 1Gbps/Full, throughput ~930 Mbits/sec
BR,
Nazim
^ permalink raw reply
* [PATCH net-next v2 0/2] tools: ynl: pyynl: pull the --family resolution logic into the lib
From: Jakub Kicinski @ 2026-07-01 2:17 UTC (permalink / raw)
To: davem
Cc: netdev, edumazet, pabeni, andrew+netdev, horms, donald.hunter,
sdf, gal, jstancek, ast, Jakub Kicinski
When packaging YNL as a system level utility we added a --family
argument which auto-resolves the full spec path from a well known
path in /usr/share. Spelling out full YAML spec files is at this
point only done in-tree, for example in the selftests which need
the very latest YAML. But the selftests have their own wrapping
classes for each family so test authors aren't really bothered
by having to spell the paths out.
Afford the same ease of use to the Python library users.
Move the path resolution from the CLI code to the library.
This simplifies the pyynl use by a lot:
from pyynl import YnlFamily
ynl = YnlFamily(family="netdev")
Unless I'm missing a trick, resolving the /usr/share path
is hard enough for most users to lean towards shelling out
to ynl CLI with --output-json, which is sad.
v2:
- fix the ethtool script (Donald)
- fix --family=X --validate (Clashiko)
v1: https://lore.kernel.org/20260630001432.2204298-3-kuba@kernel.org
Jakub Kicinski (2):
tools: ynl: pyynl: re-export the library API from the package root
tools: ynl: pyynl: pull the --family resolution logic into the lib
tools/net/ynl/pyynl/__init__.py | 9 +++++
tools/net/ynl/pyynl/cli.py | 56 +++++++----------------------
tools/net/ynl/pyynl/lib/__init__.py | 3 +-
tools/net/ynl/pyynl/lib/nlspec.py | 22 ++++++++++--
tools/net/ynl/pyynl/lib/specdir.py | 51 ++++++++++++++++++++++++++
tools/net/ynl/pyynl/lib/ynl.py | 19 ++++++++--
tools/net/ynl/tests/ethtool.py | 7 +---
7 files changed, 111 insertions(+), 56 deletions(-)
create mode 100644 tools/net/ynl/pyynl/lib/specdir.py
--
2.54.0
^ permalink raw reply
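The lookup described in the cover letter (prefer an in-tree spec directory, fall back to the installed one, then map a family name to a YAML file) can be sketched in a few lines. This is a standalone illustration, not the pyynl code: pick_schema_dir() and the explicit `candidates` argument are hypothetical, and the demo builds a throwaway directory tree so it runs without a kernel tree or an installed /usr/share/ynl.

```python
import os
import tempfile


def pick_schema_dir(candidates):
    """Return the first candidate directory that exists on disk."""
    for path in candidates:
        if os.path.isdir(path):
            return path
    raise FileNotFoundError(f"No schema directory among {list(candidates)}")


def find_spec(family, candidates):
    """Map a family name to <schema dir>/specs/<family>.yaml."""
    spec = os.path.join(pick_schema_dir(candidates), "specs", f"{family}.yaml")
    if not os.path.isfile(spec):
        raise FileNotFoundError(f"Spec for family '{family}' not found at {spec}")
    return spec


def list_families(candidates):
    """Return the sorted family names that have a spec file."""
    spec_dir = os.path.join(pick_schema_dir(candidates), "specs")
    return sorted(name[:-len(".yaml")]
                  for name in os.listdir(spec_dir) if name.endswith(".yaml"))


# Demo: the "in-tree" candidate is missing, so the "installed" one is used.
with tempfile.TemporaryDirectory() as tmp:
    installed = os.path.join(tmp, "usr", "share", "ynl")
    os.makedirs(os.path.join(installed, "specs"))
    for name in ("netdev", "ethtool"):
        with open(os.path.join(installed, "specs", f"{name}.yaml"), "w",
                  encoding="utf-8") as stream:
            stream.write("name: " + name + "\n")
    search = [os.path.join(tmp, "Documentation", "netlink"), installed]
    demo_families = list_families(search)
    demo_spec = os.path.relpath(find_spec("netdev", search), tmp)
```

Passing the candidate list explicitly keeps the sketch testable. The patch instead derives the in-tree path from the module's own location and hard-codes the system path.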
* [PATCH net-next v2 1/2] tools: ynl: pyynl: re-export the library API from the package root
From: Jakub Kicinski @ 2026-07-01 2:17 UTC (permalink / raw)
To: davem
Cc: netdev, edumazet, pabeni, andrew+netdev, horms, donald.hunter,
sdf, gal, jstancek, ast, Jakub Kicinski
In-Reply-To: <20260701021751.3234681-1-kuba@kernel.org>
The public classes live in pyynl.lib, so users had to spell out
from pyynl.lib import YnlFamily
which I forget at least once a month. Re-export lib's API from
the package __init__ so that
from pyynl import YnlFamily
works as well. I don't think there was a real reason not to do
this?
Acked-by: Jan Stancek <jstancek@redhat.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
tools/net/ynl/pyynl/__init__.py | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/tools/net/ynl/pyynl/__init__.py b/tools/net/ynl/pyynl/__init__.py
index e69de29bb2d1..d8f59c132ab7 100644
--- a/tools/net/ynl/pyynl/__init__.py
+++ b/tools/net/ynl/pyynl/__init__.py
@@ -0,0 +1,9 @@
+# SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause
+
+""" Python YNL (YAML Netlink) library. """
+
+# Re-export the public library API so it can be imported straight from the
+# package, e.g. `from pyynl import YnlFamily`.
+# pylint: disable=wildcard-import,unused-wildcard-import
+from .lib import *
+from .lib import __all__
--
2.54.0
^ permalink raw reply related
* [PATCH net-next v2 2/2] tools: ynl: pyynl: pull the --family resolution logic into the lib
From: Jakub Kicinski @ 2026-07-01 2:17 UTC (permalink / raw)
To: davem
Cc: netdev, edumazet, pabeni, andrew+netdev, horms, donald.hunter,
sdf, gal, jstancek, ast, Jakub Kicinski
In-Reply-To: <20260701021751.3234681-1-kuba@kernel.org>
When packaging YNL as a system level utility we added a --family
argument which auto-resolves the full spec path from a well known
path in /usr/share. Spelling out full YAML spec files is at this
point only done in-tree, for example in the selftests which need
the very latest YAML. But the selftests have their own wrapping
classes for each family so test authors aren't really bothered
by having to spell the paths out.
Afford the same ease of use to the Python library users.
Move the path resolution from the CLI code to the library.
This simplifies the pyynl use by a lot:
from pyynl import YnlFamily
ynl = YnlFamily(family="netdev")
Unless I'm missing a trick, resolving the /usr/share path
is hard enough for most users to lean towards shelling out
to ynl CLI with --output-json, which is sad.
The ethtool script can now use family= instead of
resolving the path (the helpers are removed from cli.py
so this isn't just a cleanup).
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
v2:
- fix the ethtool script (Donald)
- fix --family=X --validate (Clashiko)
v1: https://lore.kernel.org/20260630001432.2204298-3-kuba@kernel.org
---
tools/net/ynl/pyynl/cli.py | 56 +++++++----------------------
tools/net/ynl/pyynl/lib/__init__.py | 3 +-
tools/net/ynl/pyynl/lib/nlspec.py | 22 ++++++++++--
tools/net/ynl/pyynl/lib/specdir.py | 51 ++++++++++++++++++++++++++
tools/net/ynl/pyynl/lib/ynl.py | 19 ++++++++--
tools/net/ynl/tests/ethtool.py | 7 +---
6 files changed, 102 insertions(+), 56 deletions(-)
create mode 100644 tools/net/ynl/pyynl/lib/specdir.py
diff --git a/tools/net/ynl/pyynl/cli.py b/tools/net/ynl/pyynl/cli.py
index 8275a806cf73..b6a6ce12b4a7 100755
--- a/tools/net/ynl/pyynl/cli.py
+++ b/tools/net/ynl/pyynl/cli.py
@@ -17,9 +17,7 @@ import textwrap
# pylint: disable=no-name-in-module,wrong-import-position
sys.path.append(pathlib.Path(__file__).resolve().parent.as_posix())
from lib import YnlFamily, Netlink, NlError, SpecFamily, SpecException, YnlException
-
-SYS_SCHEMA_DIR='/usr/share/ynl'
-RELATIVE_SCHEMA_DIR='../../../../Documentation/netlink'
+from lib import list_families
# pylint: disable=too-few-public-methods,too-many-locals
class Colors:
@@ -48,30 +46,6 @@ RELATIVE_SCHEMA_DIR='../../../../Documentation/netlink'
""" Get terminal width in columns (80 if stdout is not a terminal) """
return shutil.get_terminal_size().columns
-def schema_dir():
- """
- Return the effective schema directory, preferring in-tree before
- system schema directory.
- """
- script_dir = os.path.dirname(os.path.abspath(__file__))
- schema_dir_ = os.path.abspath(f"{script_dir}/{RELATIVE_SCHEMA_DIR}")
- if not os.path.isdir(schema_dir_):
- schema_dir_ = SYS_SCHEMA_DIR
- if not os.path.isdir(schema_dir_):
- raise YnlException(f"Schema directory {schema_dir_} does not exist")
- return schema_dir_
-
-def spec_dir():
- """
- Return the effective spec directory, relative to the effective
- schema directory.
- """
- spec_dir_ = schema_dir() + '/specs'
- if not os.path.isdir(spec_dir_):
- raise YnlException(f"Spec directory {spec_dir_} does not exist")
- return spec_dir_
-
-
class YnlEncoder(json.JSONEncoder):
"""A custom encoder for emitting JSON with ynl-specific instance types"""
def default(self, o):
@@ -272,9 +246,8 @@ RELATIVE_SCHEMA_DIR='../../../../Documentation/netlink'
pprint.pprint(msg, width=term_width(), compact=True)
if args.list_families:
- for filename in sorted(os.listdir(spec_dir())):
- if filename.endswith('.yaml'):
- print(filename.removesuffix('.yaml'))
+ for family in list_families():
+ print(family)
return
if args.no_schema:
@@ -284,28 +257,23 @@ RELATIVE_SCHEMA_DIR='../../../../Documentation/netlink'
if args.json_text:
attrs = json.loads(args.json_text)
- if args.family:
- spec = f"{spec_dir()}/{args.family}.yaml"
- else:
- spec = args.spec
- if not os.path.isfile(spec):
- raise YnlException(f"Spec file {spec} does not exist")
+ if args.spec and not os.path.isfile(args.spec):
+ raise YnlException(f"Spec file {args.spec} does not exist")
+ # Spec/YnlFamily will raise if both or neither spec and family are given
if args.validate:
+ # Force validation even for installed specs (schema=True), unless the
+ # user explicitly picked a schema or opted out with --no-schema.
+ schema = True if args.schema is None else args.schema
try:
- SpecFamily(spec, args.schema)
+ SpecFamily(args.spec, schema_path=schema, family=args.family)
except SpecException as error:
print(error)
sys.exit(1)
return
- if args.family: # set behaviour when using installed specs
- if args.schema is None and spec.startswith(SYS_SCHEMA_DIR):
- args.schema = '' # disable schema validation when installed
- if args.process_unknown is None:
- args.process_unknown = True
-
- ynl = YnlFamily(spec, args.schema, args.process_unknown,
+ ynl = YnlFamily(args.spec, schema=args.schema, family=args.family,
+ process_unknown=args.process_unknown,
recv_size=args.dbg_small_recv)
if args.dbg_small_recv:
ynl.set_recv_dbg(True)
diff --git a/tools/net/ynl/pyynl/lib/__init__.py b/tools/net/ynl/pyynl/lib/__init__.py
index be741985ae4e..aa4263c8cba9 100644
--- a/tools/net/ynl/pyynl/lib/__init__.py
+++ b/tools/net/ynl/pyynl/lib/__init__.py
@@ -5,12 +5,13 @@
from .nlspec import SpecAttr, SpecAttrSet, SpecEnumEntry, SpecEnumSet, \
SpecFamily, SpecOperation, SpecSubMessage, SpecSubMessageFormat, \
SpecException
+from .specdir import list_families
from .ynl import YnlFamily, Netlink, NlError, NlPolicy, YnlException
from .doc_generator import YnlDocGenerator
__all__ = ["SpecAttr", "SpecAttrSet", "SpecEnumEntry", "SpecEnumSet",
"SpecFamily", "SpecOperation", "SpecSubMessage", "SpecSubMessageFormat",
- "SpecException",
+ "SpecException", "list_families",
"YnlFamily", "Netlink", "NlError", "NlPolicy", "YnlException",
"YnlDocGenerator"]
diff --git a/tools/net/ynl/pyynl/lib/nlspec.py b/tools/net/ynl/pyynl/lib/nlspec.py
index 0469a0e270d0..b4ec59814ab1 100644
--- a/tools/net/ynl/pyynl/lib/nlspec.py
+++ b/tools/net/ynl/pyynl/lib/nlspec.py
@@ -12,6 +12,8 @@ import importlib
import os
import yaml as pyyaml
+from .specdir import find_spec, SYS_SCHEMA_DIR
+
class SpecException(Exception):
"""Netlink spec exception.
@@ -444,7 +446,23 @@ import yaml as pyyaml
except AttributeError:
_yaml_loader = pyyaml.SafeLoader
- def __init__(self, spec_path, schema_path=None, exclude_ops=None):
+ def __init__(self, spec_path=None, schema_path=None, exclude_ops=None,
+ family=None):
+ # schema_path selects how the spec is validated:
+ # None -- no preference: validate against the default schema,
+ # but trust (skip) installed specs selected by family=
+ # True -- always validate against the default schema
+ # path -- validate against this schema
+ # '' -- do not validate
+ if (spec_path is None) == (family is None):
+ raise ValueError("Specify exactly one of spec path or family name")
+ if family is not None:
+ spec_path = find_spec(family)
+ # Installed specs are assumed correct, so skip schema validation
+ # to save cycles unless the caller asked to validate.
+ if schema_path is None and spec_path.startswith(SYS_SCHEMA_DIR):
+ schema_path = ''
+
with open(spec_path, "r", encoding='utf-8') as stream:
prefix = '# SPDX-License-Identifier: '
first = stream.readline().strip()
@@ -465,7 +483,7 @@ import yaml as pyyaml
self.proto = self.yaml.get('protocol', 'genetlink')
self.msg_id_model = self.yaml['operations'].get('enum-model', 'unified')
- if schema_path is None:
+ if schema_path is None or schema_path is True:
schema_path = os.path.dirname(os.path.dirname(spec_path)) + f'/{self.proto}.yaml'
if schema_path:
with open(schema_path, "r", encoding='utf-8') as stream:
diff --git a/tools/net/ynl/pyynl/lib/specdir.py b/tools/net/ynl/pyynl/lib/specdir.py
new file mode 100644
index 000000000000..fcea9b9fb7b0
--- /dev/null
+++ b/tools/net/ynl/pyynl/lib/specdir.py
@@ -0,0 +1,51 @@
+# SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause
+
+"""
+Locating YNL spec and schema files on disk.
+
+Resolves the directory holding the YAML specs (preferring an in-tree copy
+over the installed system path) and maps family names to spec files.
+"""
+
+import os
+
+SYS_SCHEMA_DIR='/usr/share/ynl'
+RELATIVE_SCHEMA_DIR='../../../../../Documentation/netlink'
+
+
+def schema_dir():
+ """
+ Return the effective schema directory, preferring in-tree before
+ system schema directory.
+ """
+ script_dir = os.path.dirname(os.path.abspath(__file__))
+ schema_dir_ = os.path.abspath(f"{script_dir}/{RELATIVE_SCHEMA_DIR}")
+ if not os.path.isdir(schema_dir_):
+ schema_dir_ = SYS_SCHEMA_DIR
+ if not os.path.isdir(schema_dir_):
+ raise FileNotFoundError(f"Schema directory {schema_dir_} does not exist")
+ return schema_dir_
+
+def spec_dir():
+ """
+ Return the effective spec directory, relative to the effective
+ schema directory.
+ """
+ spec_dir_ = schema_dir() + '/specs'
+ if not os.path.isdir(spec_dir_):
+ raise FileNotFoundError(f"Spec directory {spec_dir_} does not exist")
+ return spec_dir_
+
+
+def find_spec(family):
+ """ Return the path to the YAML spec file for a family by name. """
+ spec = f"{spec_dir()}/{family}.yaml"
+ if not os.path.isfile(spec):
+ raise FileNotFoundError(f"Spec for family '{family}' not found at {spec}")
+ return spec
+
+
+def list_families():
+ """ Return the sorted names of all families with an installed spec. """
+ return sorted(f.removesuffix('.yaml')
+ for f in os.listdir(spec_dir()) if f.endswith('.yaml'))
diff --git a/tools/net/ynl/pyynl/lib/ynl.py b/tools/net/ynl/pyynl/lib/ynl.py
index 092d132edec1..8682bf588e1f 100644
--- a/tools/net/ynl/pyynl/lib/ynl.py
+++ b/tools/net/ynl/pyynl/lib/ynl.py
@@ -661,6 +661,14 @@ from .nlspec import SpecFamily
"""
YNL family -- a Netlink interface built from a YAML spec.
+ The spec can be selected either by file path (def_path=) or, when it
+ ships in a well-known location, by family name (family="xyz"); exactly
+ one of the two must be given. For example:
+
+ from pyynl import YnlFamily
+
+ ynl = YnlFamily(family="netdev")
+
Primary use of the class is to execute Netlink commands:
ynl.<op_name>(attrs, ...)
@@ -691,11 +699,16 @@ from .nlspec import SpecFamily
ynl.get_policy(op_name, mode) -- query kernel policy for an op
"""
- def __init__(self, def_path, schema=None, process_unknown=False,
- recv_size=0):
- super().__init__(def_path, schema)
+ def __init__(self, def_path=None, schema=None, process_unknown=None,
+ recv_size=0, family=None):
+ super().__init__(def_path, schema, family=family)
self.include_raw = False
+ # Specs from /usr (selected by family=) have a higher chance of being
+ # stale, default to ignoring unknown attrs. In-tree users, and users
+ # who bundle the spec need to make a conscious decision.
+ if process_unknown is None:
+ process_unknown = family is not None
self.process_unknown = process_unknown
try:
diff --git a/tools/net/ynl/tests/ethtool.py b/tools/net/ynl/tests/ethtool.py
index db3b62c652e7..0ee0c8e87686 100755
--- a/tools/net/ynl/tests/ethtool.py
+++ b/tools/net/ynl/tests/ethtool.py
@@ -11,12 +11,10 @@ import pathlib
import pprint
import sys
import re
-import os
# pylint: disable=no-name-in-module,wrong-import-position
sys.path.append(pathlib.Path(__file__).resolve().parent.parent.joinpath('pyynl').as_posix())
# pylint: disable=import-error
-from cli import schema_dir, spec_dir
from lib import YnlFamily
@@ -173,10 +171,7 @@ from lib import YnlFamily
args = parser.parse_args()
- spec = os.path.join(spec_dir(), 'ethtool.yaml')
- schema = os.path.join(schema_dir(), 'genetlink-legacy.yaml')
-
- ynl = YnlFamily(spec, schema, recv_size=args.dbg_small_recv)
+ ynl = YnlFamily(family='ethtool', recv_size=args.dbg_small_recv)
if args.dbg_small_recv:
ynl.set_recv_dbg(True)
--
2.54.0
^ permalink raw reply related
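The defaulting rules this patch introduces for schema validation and unknown-attribute handling are spread across nlspec.py and ynl.py. The sketch below restates them as two pure functions so the cases can be checked side by side. It is a paraphrase under assumptions, not the library code: resolve_schema() and resolve_process_unknown() are hypothetical names, `proto` is passed in rather than read from the spec, and POSIX-style paths are assumed.

```python
import os

SYS_SCHEMA_DIR = "/usr/share/ynl"


def resolve_schema(schema_path, spec_path, by_family, proto="genetlink"):
    """Return the schema file to validate against, or '' to skip validation.

    schema_path: None (no preference), True (always validate against the
    default schema), a path (validate against it), or '' (do not validate).
    """
    # Installed specs selected by family name are trusted unless the caller
    # expressed a preference.
    if by_family and schema_path is None and spec_path.startswith(SYS_SCHEMA_DIR):
        return ""
    # The default schema sits one level above the specs/ directory.
    if schema_path is None or schema_path is True:
        return os.path.dirname(os.path.dirname(spec_path)) + f"/{proto}.yaml"
    return schema_path


def resolve_process_unknown(process_unknown, by_family):
    """Default to tolerating unknown attrs only for family= lookups."""
    return by_family if process_unknown is None else process_unknown
```

The case worth noting is `schema_path=True`: it is what lets `--family=X --validate` force validation of an installed spec that would otherwise be skipped.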
* Re: [PATCH net] net/sched: act_bpf: use rcu_dereference_bh() to read the filter
From: patchwork-bot+netdevbpf @ 2026-07-01 2:20 UTC (permalink / raw)
To: Sechang Lim
Cc: davem, edumazet, kuba, pabeni, jhs, jiri, daniel, john.fastabend,
sdf, ast, andrii, martin.lau, horms, bpf, netdev, linux-kernel
In-Reply-To: <20260629154112.1164986-1-rhkrqnwk98@gmail.com>
Hello:
This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Mon, 29 Jun 2026 15:41:06 +0000 you wrote:
> tcf_bpf_act() can run from the tc egress path, which holds only
> rcu_read_lock_bh(), but reads prog->filter with rcu_dereference() and
> trips lockdep:
>
> WARNING: suspicious RCU usage
> net/sched/act_bpf.c:47 suspicious rcu_dereference_check() usage!
> 1 lock held by syz.2.1588/12756:
> #0: (rcu_read_lock_bh){....}-{1:3}, at: __dev_queue_xmit net/core/dev.c:4792
> tcf_bpf_act+0x6ae/0x940 net/sched/act_bpf.c:47
> tcf_classify+0x6e4/0x1080 net/sched/cls_api.c:1860
> sch_handle_egress net/core/dev.c:4545 [inline]
> __dev_queue_xmit+0x2185/0x2c00 net/core/dev.c:4808
> packet_sendmsg+0x3dfa/0x5120 net/packet/af_packet.c:3114
>
> [...]
Here is the summary with links:
- [net] net/sched: act_bpf: use rcu_dereference_bh() to read the filter
https://git.kernel.org/netdev/net/c/adc49c7ba690
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply