Netdev List
* Re: [PATCH net-next] chelsio: delete the line with the pidx initialization
From: Jakub Kicinski @ 2026-07-01  0:16 UTC (permalink / raw)
  To: Markov Gleb
  Cc: Andrew Lunn, David S . Miller, Eric Dumazet, Paolo Abeni, netdev,
	linux-kernel, lvc-project
In-Reply-To: <20260629130839.218-1-markov.gi@npc-ksb.ru>

On Mon, 29 Jun 2026 16:08:35 +0300 Markov Gleb wrote:
> The value of pidx is overwritten immediately after exiting the "if" block.
> 
> Remove the pidx pointer initialization line from the conditional block.

shrug?
-- 
pw-bot: reject


* Re: [PATCH bpf-next v4 2/2] selftests/bpf: drop tc/xdp/flow_dissector/socket_filter sockmap mutation tests
From: John Fastabend @ 2026-07-01  0:10 UTC (permalink / raw)
  To: Sechang Lim
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Eduard Zingerman, Kumar Kartikeya Dwivedi, David S. Miller,
	Jakub Kicinski, Jesper Dangaard Brouer, Shuah Khan,
	Martin KaFai Lau, Song Liu, Yonghong Song, Jiri Olsa,
	Emil Tsalapatis, Stanislav Fomichev, Jiayuan Chen, Varun R Mallya,
	Ihor Solodrai, bpf, netdev, linux-kernel, linux-kselftest
In-Reply-To: <20260630145410.3648099-3-rhkrqnwk98@gmail.com>

On Tue, Jun 30, 2026 at 02:54:06PM +0000, Sechang Lim wrote:
>tc, xdp, socket_filter and flow_dissector programs can no longer update
>or delete a sockmap. Adjust the tests:
>
> - verifier_sockmap_mutate: the tc, xdp, socket_filter and
>   flow_dissector cases now expect __failure with "cannot update sockmap
>   in this context".
> - sockmap_basic: drop "sockmap update" / "sockhash update", which load
>   a SEC("tc") program that copies a sock between maps.
> - fexit_bpf2bpf: drop "func_sockmap_update", whose freplace program
>   updates a sockmap in the tc cls_redirect context.
>
>Remove the now-unused test_sockmap_update.c and freplace_cls_redirect.c.
>
>Signed-off-by: Sechang Lim <rhkrqnwk98@gmail.com>

Reviewed-by: John Fastabend <john.fastabend@gmail.com>


* Re: [PATCH net] sctp: fix addr_wq_timer race in sctp_free_addr_wq()
From: patchwork-bot+netdevbpf @ 2026-07-01  0:10 UTC (permalink / raw)
  To: Xin Long
  Cc: netdev, linux-sctp, davem, kuba, edumazet, pabeni, horms,
	marcelo.leitner, ebiederm
In-Reply-To: <5dc95f295bdb5c3f60e880dd9aa5112dc5c071cc.1782757874.git.lucien.xin@gmail.com>

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Mon, 29 Jun 2026 14:31:14 -0400 you wrote:
> sctp_free_addr_wq() previously removed addr_wq_timer using timer_delete()
> while holding addr_wq_lock. However, timer_delete() does not guarantee that
> a currently running timer handler has completed.
> 
> This allows a race with sctp_addr_wq_timeout_handler(), where the handler
> may still run after addr_waitq has been freed, acquire addr_wq_lock, and
> access freed memory, leading to a use-after-free.
> 
> [...]

Here is the summary with links:
  - [net] sctp: fix addr_wq_timer race in sctp_free_addr_wq()
    https://git.kernel.org/netdev/net/c/976c19de0f22

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




* Re: [PATCH net 1/2] net/sched: act_skbmod: require an Ethernet header for MAC rewrites
From: Jakub Kicinski @ 2026-07-01  0:10 UTC (permalink / raw)
  To: Ren Wei
  Cc: netdev, jhs, jiri, davem, edumazet, pabeni, horms, peilin.ye,
	cong.wang, gnault, yuantan098, yifanwucs, tomapufckgml, zcliangcn,
	bird, bronzed_45_vested
In-Reply-To: <3ab4a04fbab887238facc1792e02c33fd68190f7.1782548651.git.bronzed_45_vested@icloud.com>

On Mon, 29 Jun 2026 10:46:03 +0800 Ren Wei wrote:
> Cc: stable@vger.kernel.org
> Reported-by: Yuan Tan <yuantan098@gmail.com>
> Reported-by: Yifan Wu <yifanwucs@gmail.com>
> Reported-by: Juefei Pu <tomapufckgml@gmail.com>
> Reported-by: Zhengchuan Liang <zcliangcn@gmail.com>
> Reported-by: Xin Liu <bird@lzu.edu.cn>
> Assisted-by: Codex:GPT-5.4
> Signed-off-by: Wyatt Feng <bronzed_45_vested@icloud.com>
> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn>

Let's do away with the 5 reported-by tags? You can use a tag for your
tool or your team, it doesn't have to be a person. Look at sashiko or
syzbot reported-by tags.
-- 
pw-bot: cr


* Re: [PATCH bpf-next v4 1/2] bpf, sockmap: disallow update and delete from tc, xdp, socket_filter and flow_dissector
From: John Fastabend @ 2026-07-01  0:07 UTC (permalink / raw)
  To: Sechang Lim
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Eduard Zingerman, Kumar Kartikeya Dwivedi, David S. Miller,
	Jakub Kicinski, Jesper Dangaard Brouer, Shuah Khan,
	Martin KaFai Lau, Song Liu, Yonghong Song, Jiri Olsa,
	Emil Tsalapatis, Stanislav Fomichev, Jiayuan Chen, Varun R Mallya,
	Ihor Solodrai, bpf, netdev, linux-kernel, linux-kselftest
In-Reply-To: <20260630145410.3648099-2-rhkrqnwk98@gmail.com>

On Tue, Jun 30, 2026 at 02:54:05PM +0000, Sechang Lim wrote:
>sock_map_update_common() and __sock_map_delete() hold stab->lock and call
>sock_map_unref() -> sock_map_del_link(), which takes sk_callback_lock for
>write. That gives the order stab->lock -> sk_callback_lock.
>
>The reverse order comes from the SK_SKB stream parser.
>sk_psock_strp_data_ready() holds sk_callback_lock for read, and after the
>verdict tcp_bpf_strp_read_sock() acks the consumed data inline via
>__tcp_cleanup_rbuf(). The ACK goes out egress, where a sched_cls program
>deletes from the sockmap and takes stab->lock:
>
>  WARNING: possible circular locking dependency detected

[...]

>A tc, xdp, socket_filter or flow_dissector program has no reason to
>update or delete a sockmap, and redirect does not go through here. Drop
>them from may_update_sockmap() so the verifier rejects it. It also
>closes the matching sockhash inversion.
>
>Suggested-by: John Fastabend <john.fastabend@gmail.com>
>Signed-off-by: Sechang Lim <rhkrqnwk98@gmail.com>
>---

Great, let's get this merged, and we will separately fix the sockops
issue reported by the bots.

Reviewed-by: John Fastabend <john.fastabend@gmail.com>


* Re: [PATCH 00/13] treewide: replace linux/gpio.h
From: patchwork-bot+netdevbpf @ 2026-07-01  0:00 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: linux-gpio, arnd, brgl, andrew, sebastian.hesselbarth,
	gregory.clement, Frank.Li, robert.jarzmik, krzk, gerg, tsbogend,
	hauke, zajec5, ysato, glaubitz, linusw, dmitry.torokhov, kuba,
	pabeni, linux, linux-kernel, linux-arm-kernel, linux-samsung-soc,
	patches, linux-m68k, linux-mips, linux-sh, linux-input,
	linux-media, netdev, linux-sunxi, linux-phy, linux-rockchip,
	linux-sound
In-Reply-To: <20260629132633.1300009-1-arnd@kernel.org>

Hello:

This series was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Mon, 29 Jun 2026 15:26:20 +0200 you wrote:
> From: Arnd Bergmann <arnd@arndb.de>
> 
> The linux/gpio.h header used to be the global definition for the gpio
> interfaces, with 1100 users back in linux-3.17. In linux-7.2, only about
> 130 of those remain, so this series cleans out the rest.
> 
> In each subsystem, we can replace the header either with
> linux/gpio/consumer.h for users of the modern gpio descriptor interface,
> or linux/gpio/legacy.h for the few remaining users of the old number
> based interface.
> 
> [...]

Here is the summary with links:
  - [01/13] ARM: replace linux/gpio.h inclusions
    (no matching commit)
  - [02/13] m68k/coldfire: replace linux/gpio.h inclusions
    (no matching commit)
  - [03/13] mips: replace linux/gpio.h inclusions
    (no matching commit)
  - [04/13] sh: replace linux/gpio.h inclusions
    (no matching commit)
  - [05/13] mfd: replace linux/gpio.h inclusions
    (no matching commit)
  - [06/13,net-next] net: replace linux/gpio.h inclusions
    https://git.kernel.org/netdev/net-next/c/a53d1872f2be
  - [07/13] ASoC: replace linux/gpio.h inclusions
    (no matching commit)
  - [08/13] pcmcia: replace linux/gpio.h inclusions
    (no matching commit)
  - [09/13] phy: replace linux/gpio.h inclusions
    (no matching commit)
  - [10/13] media: replace linux/gpio.h inclusions
    (no matching commit)
  - [11/13] Input: matrix_keyboard - replace linux/gpio.h inclusion
    (no matching commit)
  - [12/13] gpib: gpio: replace linux/gpio.h inclusion
    (no matching commit)
  - [13/13] gpiolib: remove linux/gpio.h
    (no matching commit)

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




* Re: [PATCH v3 net-next] selftests/xsk: Preserve UMEM view in BIDIRECTIONAL test
From: patchwork-bot+netdevbpf @ 2026-07-01  0:00 UTC (permalink / raw)
  To: Maciej Fijalkowski
  Cc: netdev, bpf, magnus.karlsson, stfomichev, kuba, pabeni, horms,
	tushar.vyavahare, kerneljasonxing
In-Reply-To: <20260629191221.2700-1-maciej.fijalkowski@intel.com>

Hello:

This patch was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Mon, 29 Jun 2026 21:12:21 +0200 you wrote:
> The UMEM state refactor made __send_pkts() use xsk->umem for Tx
> address generation. At the same time, the shared-UMEM Tx setup copies the
> Rx UMEM state into a Tx-local state object and resets base_addr and
> next_buffer before configuring the Tx socket.
> 
> Passing that Tx-local object to xsk_configure() makes xsk->umem point to
> the zero-based Tx allocator state. This breaks the BIDIRECTIONAL test once
> the roles are switched: the same socket is then used for Rx validation, but
> received descriptors from the other logical UMEM half are checked against
> base_addr == 0. With the new UMEM bounds check, a valid address such as
> base_addr + XDP_PACKET_HEADROOM is rejected as being outside the UMEM
> window.
> 
> [...]

Here is the summary with links:
  - [v3,net-next] selftests/xsk: Preserve UMEM view in BIDIRECTIONAL test
    https://git.kernel.org/netdev/net-next/c/333289d1690d

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




* Re: [PATCH net] selftests: net: bump default cmd() timeout to 20 seconds
From: patchwork-bot+netdevbpf @ 2026-07-01  0:00 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, shuah,
	petrm, leitao, dw, noren, gal, linux-kselftest
In-Reply-To: <20260629233348.2145841-1-kuba@kernel.org>

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Mon, 29 Jun 2026 16:33:48 -0700 you wrote:
> We always used 5 sec as the default command timeout. But soon after
> it was introduced, David effectively made us ignore the timeout
> (it was passed to process.communicate() as the wrong argument).
> Gal recently fixed that, but turns out the 5 sec is not enough
> for a lot of tests and setups. The fix regressed regressions.
> 
> In particular running reconfig commands (e.g. XDP attach) on mlx5
> with 32 rings and 9k MTU, on a heavily-debug-enabled kernel takes
> more than 5 sec. The XDP installation command will time out after
> 5 sec but since the sleeps in the kernel are non interruptible
> the command finishes anyway, leaving the XDP program attached,
> but with non-zero exit code. defer()ed cleanups are not installed,
> breaking the environment for subsequent tests.
> 
> [...]

Here is the summary with links:
  - [net] selftests: net: bump default cmd() timeout to 20 seconds
    https://git.kernel.org/netdev/net/c/57bb59ab6fa3

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




* Re: [PATCH net-next v1 0/2] Reuse threaded NAPI kthread across napi_del()/napi_add().
From: Jakub Kicinski @ 2026-06-30 23:52 UTC (permalink / raw)
  To: Mina Almasry
  Cc: Harshitha Ramamurthy, Jordan Rhee, Shuhao Tan, David S . Miller,
	Eric Dumazet, Paolo Abeni, Simon Horman, Andrew Lunn, Shuah Khan,
	Samiullah Khawaja, Kuniyuki Iwashima, netdev, linux-kernel,
	linux-kselftest
In-Reply-To: <CAHS8izOc7OPrspPfo-6eAwPscQzk6ZzHQd10MFrRvzuPXU9WjA@mail.gmail.com>

On Tue, 30 Jun 2026 16:41:44 -0700 Mina Almasry wrote:
> > Can y'all not open a pidfs fd for the NAPI thread? You'll get a
> > notification when the existing kthread dies?  
> 
> Let me take a look, but I think we need a notification when the
> kthread is back up to reconfigure it. I guess if we're trying very
> hard not to touch the current code we can always monitor the running
> napi kthreads and their affinity and work around it like that.

To be clear -- I don't mind making changes. My first reaction was
to suggest adding Netlink notifications for when NAPIs are created /
removed. That's the standard Netlink way of letting the user space
do what's needed without adding kernel complexity. Then I remembered
pidfs can probably already do it.

The kernel threads for NAPI, and the state transitions are already
quite hairy. Plus keeping the threads for potentially dead NAPIs
around... IDK, just doesn't feel very clean.


* [Intel-wired-lan][PATCH iwl-net] idpf: adjust TxQ ring count minimum
From: Joshua Hay @ 2026-06-30 23:56 UTC (permalink / raw)
  To: intel-wired-lan; +Cc: netdev

Set the TxQ ring count minimum to 128 descriptors. Any lower than this,
and the queue will stall and trigger Tx timeouts in flow based
scheduling mode. This is because next_to_clean might never be updated.

In flow based scheduling mode, next_to_clean is only updated after a
descriptor completion is processed, i.e. after the RE bit is set in the
last descriptor of a Tx packet. This will never happen with a ring size
of 64 and an IDPF_TX_SPLITQ_RE_MIN_GAP of 64. No matter what the value
of last_re is initialized/set to, the calculated gap will be at most 63
and never trigger the RE bit.

Even a ring size of 96 does not solve this. Because of how infrequently
next_to_clean is updated and how small the ring is, IDPF_DESC_UNUSED
will be much smaller on average. This increases the chance the queue
will be stopped because a multi-descriptor packet, e.g. a large LSO
packet, does not see enough resources on the ring. In this case, the
queue will trigger the stop logic. The queue permanently stalls because
there is no chance for a descriptor completion to update next_to_clean
since it is dependent on a packet being sent.
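
The gap arithmetic described above can be checked with a small userspace
model. This is only a sketch, not the driver code: RE_MIN_GAP stands in
for IDPF_TX_SPLITQ_RE_MIN_GAP, and ring_gap() and re_bit_reachable() are
hypothetical helpers that assume the gap is the forward distance from
last_re to next_to_use, wrapped at the ring size.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for IDPF_TX_SPLITQ_RE_MIN_GAP (64 per the text above). */
#define RE_MIN_GAP 64

/* Forward distance from last_re to next_to_use on a ring of desc_count
 * slots (assumed model of the gap calculation, not the driver source).
 */
static int ring_gap(int next_to_use, int last_re, int desc_count)
{
	int gap = next_to_use - last_re;

	assert(desc_count > 0);
	if (gap < 0)
		gap += desc_count;

	return gap;
}

/* Return true if any (next_to_use, last_re) pair on the ring produces a
 * gap large enough to request a completion via the RE bit.
 */
static bool re_bit_reachable(int desc_count)
{
	for (int ntu = 0; ntu < desc_count; ntu++)
		for (int last_re = 0; last_re < desc_count; last_re++)
			if (ring_gap(ntu, last_re, desc_count) >= RE_MIN_GAP)
				return true;

	return false;
}
```

Under this model, re_bit_reachable(64) is false because the largest
possible gap is 63, while re_bit_reachable(128) is true. A ring of 96
also passes this particular check, which is why the stall described in
the next paragraph has to be argued separately.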

Fixes: 5f417d551324 ("idpf: replace flow scheduling buffer ring with buffer pool")
Signed-off-by: Joshua Hay <joshua.a.hay@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
---
 drivers/net/ethernet/intel/idpf/idpf_txrx.c | 5 +----
 drivers/net/ethernet/intel/idpf/idpf_txrx.h | 2 +-
 2 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/idpf/idpf_txrx.c b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
index 7f9056404f64..c724d429a7aa 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_txrx.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
@@ -3097,10 +3097,7 @@ static netdev_tx_t idpf_tx_splitq_frame(struct sk_buff *skb,
 
 		tx_params.dtype = IDPF_TX_DESC_DTYPE_FLEX_FLOW_SCHE;
 		tx_params.eop_cmd = IDPF_TXD_FLEX_FLOW_CMD_EOP;
-		/* Set the RE bit to periodically "clean" the descriptor ring.
-		 * MIN_GAP is set to MIN_RING size to ensure it will be set at
-		 * least once each time around the ring.
-		 */
+		/* Set the RE bit periodically to "clean" the descriptor ring */
 		if (idpf_tx_splitq_need_re(tx_q)) {
 			tx_params.eop_cmd |= IDPF_TXD_FLEX_FLOW_CMD_RE;
 			tx_q->txq_grp->num_completions_pending++;
diff --git a/drivers/net/ethernet/intel/idpf/idpf_txrx.h b/drivers/net/ethernet/intel/idpf/idpf_txrx.h
index 4be5b3b6d3ed..908dfa28674e 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_txrx.h
+++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.h
@@ -21,7 +21,7 @@
 /* Mailbox Queue */
 #define IDPF_MAX_MBXQ				1
 
-#define IDPF_MIN_TXQ_DESC			64
+#define IDPF_MIN_TXQ_DESC			128
 #define IDPF_MIN_RXQ_DESC			64
 #define IDPF_MIN_TXQ_COMPLQ_DESC		256
 #define IDPF_MAX_QIDS				256
-- 
2.39.2



* Re: [PATCH net-next 6/6] net: document NETDEV_UNREGISTER unlocked rationale
From: Jakub Kicinski @ 2026-06-30 23:43 UTC (permalink / raw)
  To: Stanislav Fomichev; +Cc: netdev, davem, edumazet, pabeni
In-Reply-To: <20260630182129.1601784-7-sdf@fomichev.me>

On Tue, 30 Jun 2026 11:21:29 -0700 Stanislav Fomichev wrote:
> +Many ``NETDEV_UNREGISTER`` handlers release their lowers with
> +``dev_close()``, which takes the instance lock itself. Holding
> +the lock across UNREGISTER would deadlock.
> +
> +Moving UNREGISTER under the lock is mechanical: switch those
> +callers to the ``netif_*()`` lock-held variants. Deferred to
> +limit churn.

Not following TBH. Let's say there's a UNREGISTER ntf for eth0.
Are you saying that eg. vlan which closes their own vlan0 devices
on top of eth0 needs to be switched to netif_ ? That wouldn't make
sense since the notification is holding netdev_lock(eth0) and
we're talking about netif_close(vlan0)?

Doing anything with the device that is sending the UNREGISTER
sounds odd, since it's going away..


* Re: [PATCH net-next v1 0/2] Reuse threaded NAPI kthread across napi_del()/napi_add().
From: Mina Almasry @ 2026-06-30 23:41 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Harshitha Ramamurthy, Jordan Rhee, Shuhao Tan, David S . Miller,
	Eric Dumazet, Paolo Abeni, Simon Horman, Andrew Lunn, Shuah Khan,
	Samiullah Khawaja, Kuniyuki Iwashima, netdev, linux-kernel,
	linux-kselftest
In-Reply-To: <20260630160555.3736f900@kernel.org>

On Tue, Jun 30, 2026 at 4:06 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Tue, 30 Jun 2026 10:38:01 -0700 Mina Almasry wrote:
> > > > It feels surprising that the userspace needs to reconfigure thread
> > > > properties when changing NIC configurations unrelated to threading.
> > > > Another downside is that when userspace configures NIC configurations
> > > > in quick succession, re-application becomes messy because a previous
> > > > re-application might still be in progress when the thread is gone.
> > >
> > > Can you explain more about your deployment and system configuration
> > > flow? We may be adding micro optimizations when the problem is that
> > > we recreate the NAPIs in the first place.
> >
> > We have an AF_XDP application with extremely low latency and jitter
> > requirements running on our servers. Sami developed busypolling
> > threaded napi for them. Since it's an AF_XDP application, they attach
> > their umem to specific RX queues, and then configure threaded NAPI
> > busypolling to achieve low latency. That involves using the Netlink
> > API to set the threaded/busypolling property, grabbing the kthread
> > PID, and setting some properties on the kthread.
>
> What I don't understand is how you have an "application with extremely
> low latency and jitter requirements" and at the same time "user runs
> an unrelated ethtool command" reallocating NAPIs and disrupting that
> application.
>

It is unlikely. However, if someone intentionally or accidentally
triggers a NAPI reallocation, it should only be disruptive while it is
happening, not leave the application running in a degraded state for
the rest of the workload run. Right now it feels like the user must be
very careful. There are also link down/up events outside the user's
control, which should not happen, but if they do, minimizing their
impact would be nice.

> Honestly, the last two times y'all were touching NAPI it was a major
> effort to get the code into acceptable shape. I don't have the cycles
> right now to help another unknown-upstream (intern?) get their patches
> into shape.
>

My sincere apologies. I also did not do a good job of not taxing your
time with the devmem stuff. FWIW I reviewed this before it was sent,
but I didn't think my Reviewed-by would move the needle much. We can
block future iterations until we get reviewed-bys from Willem or
Kuniyuki.

> Can y'all not open a pidfs fd for the NAPI thread? You'll get a
> notification when the existing kthread dies?

Let me take a look, but I think we need a notification when the
kthread is back up to reconfigure it. I guess if we're trying very
hard not to touch the current code we can always monitor the running
napi kthreads and their affinity and work around it like that.

-- 
Thanks,
Mina


* Re: [PATCH v3 net-next 1/1] tcp: Replace min_tso_segs() with tso_segs() CC callback
From: Alexei Starovoitov @ 2026-06-30 23:20 UTC (permalink / raw)
  To: chia-yu.chang, jolsa, yonghong.song, song, linux-kselftest,
	memxor, shuah, martin.lau, ast, daniel, andrii, eddyz87, horms,
	dsahern, bpf, netdev, pabeni, jhs, kuba, stephen, davem, edumazet,
	andrew+netdev, donald.hunter, kuniyu, ij, ncardwell,
	koen.de_schepper, g.white, ingemar.s.johansson, mirja.kuehlewind,
	cheshire, rs.ietf, Jason_Livingood, vidhi_goel
In-Reply-To: <20260630120145.286497-1-chia-yu.chang@nokia-bell-labs.com>

On Tue Jun 30, 2026 at 5:01 AM PDT, chia-yu.chang wrote:
> From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
>
> This patch replaces the existing min_tso_segs() with a tso_segs() CC
> callback, which lets a CC algorithm provide the explicit TSO segment
> number for each data burst and override tcp_tso_autosize().
>
> This change has the following impacts on BPF struct_ops users:
> - The callback is renamed from min_tso_segs to tso_segs
> - The signature gains an extra u32 mss_now argument
> - The return value semantics change from "floor value passed into
>   tcp_tso_autosize()" to "final tso_segs value", bypassing autosizing
>
> As a result, BPF programs must be updated, because returning a small
> constant will now directly limit tso_segs instead of setting the minimum.
>
> Signed-off-by: Ilpo Järvinen <ij@kernel.org>
> Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
> ---
>  include/net/tcp.h                                | 13 +++++++++++--
>  net/ipv4/bpf_tcp_ca.c                            |  8 +++++---
>  net/ipv4/tcp_bbr.c                               | 13 ++++++++++---
>  net/ipv4/tcp_output.c                            | 13 +++++++------
>  tools/testing/selftests/bpf/progs/tcp_ca_kfunc.c |  8 ++++----
>  5 files changed, 37 insertions(+), 18 deletions(-)
>
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index 6d376ea4d1c0..7fb42a0ce7da 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -824,6 +824,9 @@ unsigned int tcp_sync_mss(struct sock *sk, u32 pmtu);
>  unsigned int tcp_current_mss(struct sock *sk);
>  u32 tcp_clamp_probe0_to_user_timeout(const struct sock *sk, u32 when);
>  
> +u32 tcp_tso_autosize(const struct sock *sk, unsigned int mss_now,
> +		     int min_tso_segs);
> +
>  /* Bound MSS / TSO packet size with the half of the window */
>  static inline int tcp_bound_to_half_wnd(struct tcp_sock *tp, int pktsize)
>  {
> @@ -1361,8 +1364,14 @@ struct tcp_congestion_ops {
>  	/* hook for packet ack accounting (optional) */
>  	void (*pkts_acked)(struct sock *sk, const struct ack_sample *sample);
>  
> -	/* override sysctl_tcp_min_tso_segs (optional) */
> -	u32 (*min_tso_segs)(struct sock *sk);
> +	/*
> +	 * Override tcp_tso_autosize (optional)
> +	 *
> +	 * If provided, this callback returns the final TSO segment number
> +	 * and will bypass tcp_tso_autosize() entirely. The implementation
> +	 * must derive an appropriate value and ensure the result is valid.
> +	 */
> +	u32 (*tso_segs)(struct sock *sk, u32 mss_now);

I don't like this interface change.
It introduces churn for no good reason.
At least I don't see why you cannot live with the existing api.
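
For readers following the thread, the semantic change described in the
quoted commit message can be illustrated with a small userspace model.
This is a sketch under stated assumptions, not the kernel code:
autosize_model() is a simplified stand-in for tcp_tso_autosize() that
takes the burst budget in bytes as an input instead of deriving it from
the pacing rate, and cc_small_constant(), old_tso_segs() and
new_tso_segs() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for tcp_tso_autosize(): segments that fit in the
 * burst budget, but never fewer than the floor. The real function
 * computes the budget from pacing state, which is omitted here.
 */
static uint32_t autosize_model(uint32_t burst_bytes, uint32_t mss_now,
			       uint32_t min_tso_segs)
{
	uint32_t segs;

	assert(mss_now > 0);
	segs = burst_bytes / mss_now;

	return segs > min_tso_segs ? segs : min_tso_segs;
}

/* Hypothetical CC callback body that returns a small constant. */
static uint32_t cc_small_constant(void)
{
	return 2;
}

/* Old semantics: the callback result is only a floor for autosizing. */
static uint32_t old_tso_segs(uint32_t burst_bytes, uint32_t mss_now)
{
	return autosize_model(burst_bytes, mss_now, cc_small_constant());
}

/* New semantics: the callback result is final and autosizing is skipped. */
static uint32_t new_tso_segs(uint32_t burst_bytes, uint32_t mss_now)
{
	(void)burst_bytes;
	(void)mss_now;

	return cc_small_constant();
}
```

With a 64000-byte budget and a 1000-byte MSS, the old path in this model
yields 64 segments while the new path yields 2, which is the behavior
change the commit message warns BPF struct_ops users about.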



* [PATCH iwl-next v3 2/2] idpf: implement pci error handlers
From: Emil Tantilov @ 2026-06-30 23:18 UTC (permalink / raw)
  To: intel-wired-lan
  Cc: netdev, przemyslaw.kitszel, jay.bhat, ivan.d.barrera,
	aleksandr.loktionov, larysa.zaremba, anthony.l.nguyen,
	andrew+netdev, davem, edumazet, kuba, pabeni, aleksander.lobakin,
	linux-pci, madhu.chittim, decot, willemb, sheenamo, lukas
In-Reply-To: <20260630231854.11536-1-emil.s.tantilov@intel.com>

Add callbacks to handle PCI errors and FLR reset. When preparing to handle
a reset on the bus, the driver must stop all operations that can lead to
MMIO access in order to prevent HW errors. To accomplish this, introduce
the helper idpf_reset_prepare(), which is called prior to FLR or when a
PCI error is detected. Upon resume, the recovery is done through the
existing reset path by starting the event task.

The following callbacks are implemented:
.reset_prepare runs the first portion of the generic reset path leading up
to the part where we wait for the reset to complete.
.reset_done/resume runs the recovery part of the reset handling.
.error_detected is the callback dealing with PCI errors. Similar to the
prepare call, we stop all operations prior to attempting a recovery.
.slot_reset is the callback attempting to restore the device, provided a
PCI reset was initiated due to an error on the bus.

Whereas previously the init logic guaranteed netdevs during reset, the
addition of idpf_detach_and_close() to the PCI callbacks flow makes it
possible for the function to be called without netdevs. Add a check to
avoid a NULL pointer dereference in that case.

Co-developed-by: Alan Brady <alan.brady@intel.com>
Signed-off-by: Alan Brady <alan.brady@intel.com>
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Reviewed-by: Jay Bhat <jay.bhat@intel.com>
Reviewed-by: Madhu Chittim <madhu.chittim@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
---
 drivers/net/ethernet/intel/idpf/idpf.h      |   3 +
 drivers/net/ethernet/intel/idpf/idpf_lib.c  |  13 ++-
 drivers/net/ethernet/intel/idpf/idpf_main.c | 122 ++++++++++++++++++++
 3 files changed, 136 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/idpf/idpf.h b/drivers/net/ethernet/intel/idpf/idpf.h
index 470bc23c844c..a7fc850a4904 100644
--- a/drivers/net/ethernet/intel/idpf/idpf.h
+++ b/drivers/net/ethernet/intel/idpf/idpf.h
@@ -88,6 +88,7 @@ enum idpf_state {
  * @IDPF_REMOVE_IN_PROG: Driver remove in progress
  * @IDPF_MB_INTR_MODE: Mailbox in interrupt mode
  * @IDPF_VC_CORE_INIT: virtchnl core has been init
+ * @IDPF_PCI_CB_RESET: Reset via the PCI callbacks
  * @IDPF_FLAGS_NBITS: Must be last
  */
 enum idpf_flags {
@@ -97,6 +98,7 @@ enum idpf_flags {
 	IDPF_REMOVE_IN_PROG,
 	IDPF_MB_INTR_MODE,
 	IDPF_VC_CORE_INIT,
+	IDPF_PCI_CB_RESET,
 	IDPF_FLAGS_NBITS,
 };
 
@@ -1012,4 +1014,5 @@ void idpf_idc_vdev_mtu_event(struct iidc_rdma_vport_dev_info *vdev_info,
 int idpf_add_del_fsteer_filters(struct idpf_adapter *adapter,
 				struct virtchnl2_flow_rule_add_del *rule,
 				enum virtchnl2_op opcode);
+void idpf_detach_and_close(struct idpf_adapter *adapter);
 #endif /* !_IDPF_H_ */
diff --git a/drivers/net/ethernet/intel/idpf/idpf_lib.c b/drivers/net/ethernet/intel/idpf/idpf_lib.c
index 1c19cd1f9dd1..80d04e59e151 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_lib.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_lib.c
@@ -758,13 +758,16 @@ static int idpf_init_mac_addr(struct idpf_vport *vport,
 	return 0;
 }
 
-static void idpf_detach_and_close(struct idpf_adapter *adapter)
+void idpf_detach_and_close(struct idpf_adapter *adapter)
 {
 	int max_vports = adapter->max_vports;
 
 	for (int i = 0; i < max_vports; i++) {
 		struct net_device *netdev = adapter->netdevs[i];
 
+		if (!netdev)
+			continue;
+
 		/* If the interface is in detached state, that means the
 		 * previous reset was not handled successfully for this
 		 * vport.
@@ -1908,6 +1911,10 @@ static void idpf_init_hard_reset(struct idpf_adapter *adapter)
 
 	dev_info(dev, "Device HW Reset initiated\n");
 
+	/* Reset has already happened, skip to recovery. */
+	if (test_and_clear_bit(IDPF_PCI_CB_RESET, adapter->flags))
+		goto check_rst_complete;
+
 	/* Prepare for reset */
 	if (test_bit(IDPF_HR_DRV_LOAD, adapter->flags)) {
 		reg_ops->trigger_reset(adapter, IDPF_HR_DRV_LOAD);
@@ -1926,6 +1933,7 @@ static void idpf_init_hard_reset(struct idpf_adapter *adapter)
 		goto unlock_mutex;
 	}
 
+check_rst_complete:
 	/* Wait for reset to complete */
 	err = idpf_check_reset_complete(adapter, &adapter->reset_reg);
 	if (err) {
@@ -1985,7 +1993,8 @@ void idpf_vc_event_task(struct work_struct *work)
 	if (test_bit(IDPF_HR_FUNC_RESET, adapter->flags))
 		goto func_reset;
 
-	if (test_bit(IDPF_HR_DRV_LOAD, adapter->flags))
+	if (test_bit(IDPF_HR_DRV_LOAD, adapter->flags) ||
+	    test_bit(IDPF_PCI_CB_RESET, adapter->flags))
 		goto drv_load;
 
 	return;
diff --git a/drivers/net/ethernet/intel/idpf/idpf_main.c b/drivers/net/ethernet/intel/idpf/idpf_main.c
index 064bf3583824..1786a0dd026b 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_main.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_main.c
@@ -238,6 +238,7 @@ static int idpf_cfg_device(struct idpf_adapter *adapter)
 	if (err)
 		pci_dbg(pdev, "PCIe PTM is not supported by PCIe bus/controller\n");
 
+	pci_save_state(pdev);
 	pci_set_drvdata(pdev, adapter);
 
 	return 0;
@@ -364,6 +365,126 @@ static int idpf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	return err;
 }
 
+static void idpf_reset_prepare(struct idpf_adapter *adapter)
+{
+	pci_dbg(adapter->pdev, "resetting\n");
+	cancel_delayed_work_sync(&adapter->serv_task);
+	cancel_delayed_work_sync(&adapter->vc_event_task);
+	cancel_delayed_work_sync(&adapter->init_task);
+	set_bit(IDPF_HR_RESET_IN_PROG, adapter->flags);
+	idpf_detach_and_close(adapter);
+	idpf_idc_issue_reset_event(adapter->cdev_info);
+	mutex_lock(&adapter->vport_ctrl_lock);
+	idpf_vc_core_deinit(adapter);
+	idpf_deinit_dflt_mbx(adapter);
+	mutex_unlock(&adapter->vport_ctrl_lock);
+}
+
+/**
+ * idpf_pci_err_detected - PCI error detected, about to attempt recovery
+ * @pdev: PCI device struct
+ * @state: PCI channel state
+ *
+ * Return: %PCI_ERS_RESULT_NEED_RESET to attempt recovery,
+ * %PCI_ERS_RESULT_DISCONNECT if recovery is not possible.
+ */
+static pci_ers_result_t
+idpf_pci_err_detected(struct pci_dev *pdev, pci_channel_state_t state)
+{
+	struct idpf_adapter *adapter = pci_get_drvdata(pdev);
+
+	/* Shutdown the mailbox if PCI I/O is in a bad state to avoid MBX
+	 * timeouts during the prepare stage.
+	 */
+	if (pci_channel_offline(pdev) && adapter->xnm)
+		libie_ctlq_xn_shutdown(adapter->xnm);
+
+	idpf_reset_prepare(adapter);
+
+	if (state == pci_channel_io_perm_failure)
+		return PCI_ERS_RESULT_DISCONNECT;
+
+	/* When called due to PCI error, driver will have to force PFR on
+	 * resume, in order to complete the recovery via the event task.
+	 */
+	set_bit(IDPF_PCI_CB_RESET, adapter->flags);
+
+	return PCI_ERS_RESULT_NEED_RESET;
+}
+
+/**
+ * idpf_pci_err_slot_reset - PCI undergoing reset
+ * @pdev: PCI device struct
+ *
+ * Reset PCI state and use a register read to see if we're good.
+ *
+ * Return: %PCI_ERS_RESULT_RECOVERED on success,
+ * %PCI_ERS_RESULT_DISCONNECT on failure.
+ */
+static pci_ers_result_t
+idpf_pci_err_slot_reset(struct pci_dev *pdev)
+{
+	struct idpf_adapter *adapter = pci_get_drvdata(pdev);
+
+	pci_restore_state(pdev);
+	pci_set_master(pdev);
+	pci_wake_from_d3(pdev, false);
+
+	/* RSTAT register cannot have all bits set during normal operation
+	 * on current HW.
+	 */
+	if (PCI_POSSIBLE_ERROR(readl(adapter->reset_reg.rstat)))
+		return PCI_ERS_RESULT_DISCONNECT;
+
+	return PCI_ERS_RESULT_RECOVERED;
+}
+
+/**
+ * idpf_pci_err_resume - Resume operations after PCI error recovery
+ * @pdev: PCI device struct
+ */
+static void idpf_pci_err_resume(struct pci_dev *pdev)
+{
+	struct idpf_adapter *adapter = pci_get_drvdata(pdev);
+
+	/* Trigger a reset, following PCI error, to allow recovery via the
+	 * regular reset handling path.
+	 */
+	if (test_and_set_bit(IDPF_PCI_CB_RESET, adapter->flags))
+		adapter->dev_ops.reg_ops.trigger_reset(adapter, IDPF_HR_FUNC_RESET);
+
+	queue_delayed_work(adapter->vc_event_wq,
+			   &adapter->vc_event_task,
+			   msecs_to_jiffies(300));
+}
+
+/**
+ * idpf_pci_err_reset_prepare - Prepare driver for PCI reset
+ * @pdev: PCI device struct
+ */
+static void idpf_pci_err_reset_prepare(struct pci_dev *pdev)
+{
+	idpf_reset_prepare(pci_get_drvdata(pdev));
+}
+
+/**
+ * idpf_pci_err_reset_done - PCI err reset recovery complete
+ * @pdev: PCI device struct
+ */
+static void idpf_pci_err_reset_done(struct pci_dev *pdev)
+{
+	pci_dbg(pdev, "reset done\n");
+	idpf_pci_err_resume(pdev);
+}
+
+static const struct pci_error_handlers idpf_pci_err_handler = {
+	.error_detected = idpf_pci_err_detected,
+	.slot_reset = idpf_pci_err_slot_reset,
+	.reset_prepare = idpf_pci_err_reset_prepare,
+	.reset_done = idpf_pci_err_reset_done,
+	.resume = idpf_pci_err_resume,
+};
+
 /* idpf_pci_tbl - PCI Dev idpf ID Table
  */
 static const struct pci_device_id idpf_pci_tbl[] = {
@@ -381,5 +502,6 @@ static struct pci_driver idpf_driver = {
 	.sriov_configure	= idpf_sriov_configure,
 	.remove			= idpf_remove,
 	.shutdown		= idpf_shutdown,
+	.err_handler		= &idpf_pci_err_handler,
 };
 module_pci_driver(idpf_driver);
-- 
2.37.3


^ permalink raw reply related

* [PATCH iwl-next v3 1/2] idpf: remove conditional MBX deinit from idpf_vc_core_deinit()
From: Emil Tantilov @ 2026-06-30 23:18 UTC (permalink / raw)
  To: intel-wired-lan
  Cc: netdev, przemyslaw.kitszel, jay.bhat, ivan.d.barrera,
	aleksandr.loktionov, larysa.zaremba, anthony.l.nguyen,
	andrew+netdev, davem, edumazet, kuba, pabeni, aleksander.lobakin,
	linux-pci, madhu.chittim, decot, willemb, sheenamo, lukas
In-Reply-To: <20260630231854.11536-1-emil.s.tantilov@intel.com>

Previously it was assumed that idpf_vc_core_deinit() is always being
called during reset handling, where the MBX is disabled by the reset,
with remove being the exception. Ideally the driver needs to communicate
the changes to FW in all instances where the MBX is not already disabled.
Remove the remove_in_prog check from idpf_vc_core_deinit() as the MBX was
already disabled while handling the reset via libie_ctlq_xn_shutdown()
in the service task. This is also needed by the following patch,
introducing PCI callbacks support, specifically in the case where FLR is
being triggered by a user, in which case, the driver still has the ability
to notify FW before the reset happens.

Add call to libie_ctlq_xn_shutdown() in idpf_shutdown() to avoid a possible
regression where long timeouts can happen on shutdown when FW is down.

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Reviewed-by: Jay Bhat <jay.bhat@intel.com>
Reviewed-by: Madhu Chittim <madhu.chittim@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
---
 drivers/net/ethernet/intel/idpf/idpf_main.c     |  2 ++
 drivers/net/ethernet/intel/idpf/idpf_virtchnl.c | 10 +---------
 2 files changed, 3 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/intel/idpf/idpf_main.c b/drivers/net/ethernet/intel/idpf/idpf_main.c
index 5a191644b28e..064bf3583824 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_main.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_main.c
@@ -191,6 +191,8 @@ static void idpf_shutdown(struct pci_dev *pdev)
 
 	cancel_delayed_work_sync(&adapter->serv_task);
 	cancel_delayed_work_sync(&adapter->vc_event_task);
+	if (adapter->xnm)
+		libie_ctlq_xn_shutdown(adapter->xnm);
 	idpf_vc_core_deinit(adapter);
 	idpf_deinit_dflt_mbx(adapter);
 
diff --git a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
index e0e510b1f1e1..cc5aeec3d00b 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
@@ -3195,24 +3195,16 @@ int idpf_vc_core_init(struct idpf_adapter *adapter)
  */
 void idpf_vc_core_deinit(struct idpf_adapter *adapter)
 {
-	bool remove_in_prog;
-
 	if (!test_bit(IDPF_VC_CORE_INIT, adapter->flags))
 		return;
 
-	/* Avoid transaction timeouts when called during reset */
-	remove_in_prog = test_bit(IDPF_REMOVE_IN_PROG, adapter->flags);
-	if (!remove_in_prog)
-		libie_ctlq_xn_shutdown(adapter->xnm);
-
 	idpf_ptp_release(adapter);
 	idpf_deinit_task(adapter);
 	idpf_idc_deinit_core_aux_device(adapter);
 	idpf_rel_rx_pt_lkup(adapter);
 	idpf_intr_rel(adapter);
 
-	if (remove_in_prog)
-		libie_ctlq_xn_shutdown(adapter->xnm);
+	libie_ctlq_xn_shutdown(adapter->xnm);
 
 	cancel_delayed_work_sync(&adapter->serv_task);
 	cancel_delayed_work_sync(&adapter->mbx_task);
-- 
2.37.3


^ permalink raw reply related

* [PATCH iwl-next v3 0/2] Introduce IDPF PCI callbacks
From: Emil Tantilov @ 2026-06-30 23:18 UTC (permalink / raw)
  To: intel-wired-lan
  Cc: netdev, przemyslaw.kitszel, jay.bhat, ivan.d.barrera,
	aleksandr.loktionov, larysa.zaremba, anthony.l.nguyen,
	andrew+netdev, davem, edumazet, kuba, pabeni, aleksander.lobakin,
	linux-pci, madhu.chittim, decot, willemb, sheenamo, lukas

This series implements PCI callbacks for the purpose of handling FLR and
PCI errors in the IDPF driver.

The first patch removes the conditional deinitialization of the mailbox in
the idpf_vc_core_deinit() function. Aside from being redundant, due to the
shutdown of the mailbox after a reset is detected, the check was also
preventing the driver from sending messages to stop and disable the vports
and queues on FW side, which is needed for the prepare phase of the FLR
handling.

The second patch implements the PCI callbacks. The logic here follows
the reset handling done in idpf_init_hard_reset(), but is split in
prepare and resume phases, where idpf_reset_prepare() stops all driver
operations and the resume callback attempts to recover following the
reset or the PCI error event.
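
As a rough illustration of the split described above, the return-value
logic of the error_detected and slot_reset callbacks from patch 2/2 can be
modeled in user space. This is only a sketch: the Python names below
(ChannelState, ErsResult, err_detected, slot_reset, the flags set) are
made up for the example and are not part of the driver, and the prepare
and MBX shutdown steps are left out.

```python
from enum import Enum


class ChannelState(Enum):
    """Stand-in for pci_channel_state_t (illustrative values only)."""
    IO_NORMAL = 1
    IO_FROZEN = 2
    IO_PERM_FAILURE = 3


class ErsResult(Enum):
    """Stand-in for the pci_ers_result_t values used in the patch."""
    NEED_RESET = "need_reset"
    DISCONNECT = "disconnect"
    RECOVERED = "recovered"


ALL_ONES_32 = 0xFFFFFFFF  # a 32-bit register read with every bit set


def err_detected(state, flags):
    """Model of the decision at the end of idpf_pci_err_detected()."""
    if state is ChannelState.IO_PERM_FAILURE:
        return ErsResult.DISCONNECT
    # Remember that the resume step has to force a function reset.
    flags.add("PCI_CB_RESET")
    return ErsResult.NEED_RESET


def slot_reset(rstat):
    """Model of the RSTAT check in idpf_pci_err_slot_reset()."""
    if rstat == ALL_ONES_32:
        return ErsResult.DISCONNECT
    return ErsResult.RECOVERED


flags = set()
print(err_detected(ChannelState.IO_FROZEN, flags), sorted(flags))
print(slot_reset(0xFFFFFFFF), slot_reset(0x3))
```

Note that in this model, as in the patch, the reset flag is only recorded
when recovery is still possible; a permanent failure returns before it.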

NOTE: These changes depend on, and apply on top of the IXD series:
https://lore.kernel.org/netdev/20260608144127.2751230-1-larysa.zaremba@intel.com/

Testing hints:
1. FLR via sysfs:
echo 1 > /sys/class/net/<ifname>/device/reset

Previously this would have been handled by idpf_init_hard_reset() as the
driver detects the reset. Now it will be done by the PCI err callbacks,
so this is the easiest way to test the reset_prepare/resume path.

2. PCI errors can be tested with aer-inject:
./aer-inject -s 83:00.0 examples/<error_type>

3. Stress testing can be done by combining various callbacks with the
reset from step 1:
echo 1 > /sys/class/net/<if>/device/reset& ethtool -L <if> combined 8
ethtool -L <if> combined 16& echo 1 > /sys/class/net/<if>/device/reset

Changelog:
v2->v3:
1/2:
- Added call to libie_ctlq_xn_shutdown() in idpf_shutdown() to avoid
possible regression when shutting down while FW is not responsive,
causing long delays.
- Fixed typo in the title s/conditonal/conditional/

2/2:
- Improved the logic in idpf_reset_prepare() to make sure the
  RESET_IN_PROG bit is set after the init task.
- Renamed the err parameter in idpf_pci_err_detected() with state
  and updated the description to match.
- Added check for adapter->xnm before calling libie_ctlq_xn_shutdown()
  in idpf_pci_err_detected().
- Corrected a comment to add some context on the reasoning behind
  the reset triggered on resume, following a PCI error.
- Corrected the description for slot_reset callback by removing
  the mention of AER as it is not the only trigger for the reset.
- Use PCI_POSSIBLE_ERROR() macro when checking RSTAT value.
- Add vport_ctrl_lock when calling idpf_vc_core_deinit() in 
  idpf_reset_prepare().

v1->v2:
- Removed the call to pci_save_state() from idpf_pci_err_slot_reset(),
  as it is no longer needed after pci_restore_state(). Suggested by
  Lukas Wunner.

v1:
https://lore.kernel.org/netdev/20260411003959.30959-1-emil.s.tantilov@intel.com/

Emil Tantilov (2):
  idpf: remove conditional MBX deinit from idpf_vc_core_deinit()
  idpf: implement pci error handlers

 drivers/net/ethernet/intel/idpf/idpf.h        |   3 +
 drivers/net/ethernet/intel/idpf/idpf_lib.c    |  13 +-
 drivers/net/ethernet/intel/idpf/idpf_main.c   | 124 ++++++++++++++++++
 .../net/ethernet/intel/idpf/idpf_virtchnl.c   |  10 +-
 4 files changed, 139 insertions(+), 11 deletions(-)

-- 
2.37.3


^ permalink raw reply

* Re: [PATCH iproute2-next] ss: stop displaying dccp sockets
From: Kuniyuki Iwashima @ 2026-06-30 23:13 UTC (permalink / raw)
  To: Yafang Shao; +Cc: stephen, netdev
In-Reply-To: <20260630114121.26430-1-laoar.shao@gmail.com>

On Tue, Jun 30, 2026 at 4:41 AM Yafang Shao <laoar.shao@gmail.com> wrote:
>
> DCCP support was retired in kernel commit 2a63dd0edf38 ("net: Retire
> DCCP socket."). However, ss still attempts to query DCCP sockets via
> netlink, which triggers repeated SELinux warnings in dmesg:
>
>   SELinux: unrecognized netlink message: protocol=4 nlmsg_type=19 \
>     sclass=netlink_tcpdiag_socket pid=188945 comm=ss
>
> Stop sending DCCPDIAG_GETSOCK netlink messages to suppress these
> warnings and align ss with the kernel change.
>
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> Cc: Kuniyuki Iwashima <kuniyu@google.com>
> ---
>  man/man8/ss.8 |  5 +++--
>  misc/ss.c     | 40 ++++++++--------------------------------
>  2 files changed, 11 insertions(+), 34 deletions(-)
>
> diff --git a/man/man8/ss.8 b/man/man8/ss.8
> index 70e0a566..37dd75a0 100644
> --- a/man/man8/ss.8
> +++ b/man/man8/ss.8
> @@ -378,7 +378,8 @@ Display TCP sockets.
>  Display UDP sockets.
>  .TP
>  .B \-d, \-\-dccp
> -Display DCCP sockets.
> +[Deprecated] DCCP is no longer supported since kernel 6.16.
> +This option is ignored.
>  .TP
>  .B \-w, \-\-raw
>  Display RAW sockets.
> @@ -411,7 +412,7 @@ supported: unix, inet, inet6, link, netlink, vsock, tipc, xdp.
>  .B \-A QUERY, \-\-query=QUERY, \-\-socket=QUERY
>  List of socket tables to dump, separated by commas. The following identifiers
>  are understood: all, inet, tcp, udp, raw, unix, packet, netlink, unix_dgram,
> -unix_stream, unix_seqpacket, packet_raw, packet_dgram, dccp, sctp, tipc,
> +unix_stream, unix_seqpacket, packet_raw, packet_dgram, sctp, tipc,
>  vsock_stream, vsock_dgram, xdp, mptcp. Any item in the list may optionally be
>  prefixed by an exclamation mark
>  .RB ( ! )
> diff --git a/misc/ss.c b/misc/ss.c
> index 14e9f27a..dae5f282 100644
> --- a/misc/ss.c
> +++ b/misc/ss.c
> @@ -195,7 +195,6 @@ static const char *dg_proto;
>  enum {
>         TCP_DB,
>         MPTCP_DB,
> -       DCCP_DB,
>         UDP_DB,
>         RAW_DB,
>         UNIX_DG_DB,
> @@ -215,7 +214,7 @@ enum {
>  #define PACKET_DBM ((1<<PACKET_DG_DB)|(1<<PACKET_R_DB))
>  #define UNIX_DBM ((1<<UNIX_DG_DB)|(1<<UNIX_ST_DB)|(1<<UNIX_SQ_DB))
>  #define ALL_DB ((1<<MAX_DB)-1)
> -#define INET_L4_DBM ((1<<TCP_DB)|(1<<MPTCP_DB)|(1<<UDP_DB)|(1<<DCCP_DB)|(1<<SCTP_DB))
> +#define INET_L4_DBM ((1<<TCP_DB)|(1<<MPTCP_DB)|(1<<UDP_DB)|(1<<SCTP_DB))
>  #define INET_DBM (INET_L4_DBM | (1<<RAW_DB))
>  #define VSOCK_DBM ((1<<VSOCK_ST_DB)|(1<<VSOCK_DG_DB))
>
> @@ -274,10 +273,6 @@ static const struct filter default_dbs[MAX_DB] = {
>                 .states   = SS_CONN,
>                 .families = FAMILY_MASK(AF_INET) | FAMILY_MASK(AF_INET6),
>         },
> -       [DCCP_DB] = {
> -               .states   = SS_CONN,
> -               .families = FAMILY_MASK(AF_INET) | FAMILY_MASK(AF_INET6),
> -       },
>         [UDP_DB] = {
>                 .states   = (1 << SS_ESTABLISHED),
>                 .families = FAMILY_MASK(AF_INET) | FAMILY_MASK(AF_INET6),
> @@ -388,13 +383,12 @@ static int filter_db_parse(struct filter *f, const char *s)
>                 int dbs[MAX_DB + 1];
>         } db_name_tbl[] = {
>  #define ENTRY(name, ...) { #name, { __VA_ARGS__, MAX_DB } }
> -               ENTRY(all, UDP_DB, DCCP_DB, TCP_DB, MPTCP_DB, RAW_DB,
> +               ENTRY(all, UDP_DB, TCP_DB, MPTCP_DB, RAW_DB,
>                            UNIX_ST_DB, UNIX_DG_DB, UNIX_SQ_DB,
>                            PACKET_R_DB, PACKET_DG_DB, NETLINK_DB,
>                            SCTP_DB, VSOCK_ST_DB, VSOCK_DG_DB, XDP_DB),
> -               ENTRY(inet, UDP_DB, DCCP_DB, TCP_DB, MPTCP_DB, SCTP_DB, RAW_DB),
> +               ENTRY(inet, UDP_DB, TCP_DB, MPTCP_DB, SCTP_DB, RAW_DB),
>                 ENTRY(udp, UDP_DB),
> -               ENTRY(dccp, DCCP_DB),
>                 ENTRY(tcp, TCP_DB),
>                 ENTRY(mptcp, MPTCP_DB),
>                 ENTRY(sctp, SCTP_DB),
> @@ -935,8 +929,6 @@ static const char *proto_name(int protocol)
>                 return "mptcp";
>         case IPPROTO_SCTP:
>                 return "sctp";
> -       case IPPROTO_DCCP:
> -               return "dccp";
>         case IPPROTO_ICMPV6:
>                 return "icmp6";
>         }
> @@ -3897,8 +3889,6 @@ static int tcpdiag_send(int fd, int protocol, struct filter *f)
>
>         if (protocol == IPPROTO_TCP)
>                 req.nlh.nlmsg_type = TCPDIAG_GETSOCK;
> -       else if (protocol == IPPROTO_DCCP)
> -               req.nlh.nlmsg_type = DCCPDIAG_GETSOCK;
>         else
>                 return -1;
>
> @@ -4134,7 +4124,7 @@ static int inet_show_netlink(struct filter *f, FILE *dump_fp, int protocol)
>
>         /* Suppress netlink errors. Older kernels do not support extended
>          * protocol requests using INET_DIAG_REQ_PROTOCOL, and some protocols
> -        * may not be available in the running kernel (e.g. SCTP, DCCP).
> +        * may not be available in the running kernel (e.g. SCTP).
>          * In both cases the kernel returns EINVAL which would cause
>          * rtnl_dump_error() to print a confusing "RTNETLINK answers" error.
>          */
> @@ -4309,18 +4299,6 @@ static int mptcp_show(struct filter *f)
>         return 0;
>  }
>
> -static int dccp_show(struct filter *f)
> -{
> -       if (!filter_af_get(f, AF_INET) && !filter_af_get(f, AF_INET6))
> -               return 0;
> -
> -       if (!getenv("PROC_NET_DCCP") && !getenv("PROC_ROOT")
> -           && inet_show_netlink(f, NULL, IPPROTO_DCCP) == 0)
> -               return 0;
> -
> -       return 0;
> -}
> -
>  static int sctp_show(struct filter *f)
>  {
>         if (!filter_af_get(f, AF_INET) && !filter_af_get(f, AF_INET6))
> @@ -5779,7 +5757,7 @@ static void _usage(FILE *dest)
>  "   -M, --mptcp         display only MPTCP sockets\n"
>  "   -S, --sctp          display only SCTP sockets\n"
>  "   -u, --udp           display only UDP sockets\n"
> -"   -d, --dccp          display only DCCP sockets\n"
> +"   -d, --dccp          DCCP is no longer supported, option ignored\n"
>  "   -w, --raw           display only RAW sockets\n"
>  "   -x, --unix          display only Unix domain sockets\n"
>  "       --tipc          display only TIPC sockets\n"
> @@ -5795,7 +5773,7 @@ static void _usage(FILE *dest)
>  "       --inet-sockopt  show various inet socket options\n"
>  "\n"
>  "   -A, --query=QUERY, --socket=QUERY\n"
> -"       QUERY := {all|inet|tcp|mptcp|udp|raw|unix|unix_dgram|unix_stream|unix_seqpacket|packet|packet_raw|packet_dgram|netlink|dccp|sctp|vsock_stream|vsock_dgram|tipc|xdp}[,QUERY]\n"
> +"       QUERY := {all|inet|tcp|mptcp|udp|raw|unix|unix_dgram|unix_stream|unix_seqpacket|packet|packet_raw|packet_dgram|netlink|sctp|vsock_stream|vsock_dgram|tipc|xdp}[,QUERY]\n"
>  "\n"
>  "   -D, --diag=FILE     Dump raw information about TCP sockets to FILE\n"
>  "   -F, --filter=FILE   read filter information from FILE\n"
> @@ -5907,7 +5885,7 @@ static const struct option long_opts[] = {
>         { "threads", 0, 0, 'T' },
>         { "bpf", 0, 0, 'b' },
>         { "events", 0, 0, 'E' },
> -       { "dccp", 0, 0, 'd' },
> +       { "dccp", 0, 0, 'd' }, /* DCCP retired, kept for compatibility */
>         { "tcp", 0, 0, 't' },
>         { "sctp", 0, 0, 'S' },
>         { "udp", 0, 0, 'u' },
> @@ -5997,7 +5975,7 @@ int main(int argc, char *argv[])
>                         follow_events = 1;
>                         break;
>                 case 'd':
> -                       filter_db_set(&current_filter, DCCP_DB, true);
> +                       /* DCCP retired in kernel 6.16, kept for compatibility */

I think it is more user-friendly to remove the case and show usage(),
instead of just ignoring the option.


>                         break;
>                 case 't':
>                         filter_db_set(&current_filter, TCP_DB, true);
> @@ -6290,8 +6268,6 @@ int main(int argc, char *argv[])
>                 udp_show(&current_filter);
>         if (current_filter.dbs & (1<<TCP_DB))
>                 tcp_show(&current_filter);
> -       if (current_filter.dbs & (1<<DCCP_DB))
> -               dccp_show(&current_filter);
>         if (current_filter.dbs & (1<<SCTP_DB))
>                 sctp_show(&current_filter);
>         if (current_filter.dbs & VSOCK_DBM)
> --
> 2.50.1 (Apple Git-155)
>

^ permalink raw reply

* Re: [PATCH net-next] selftests: drv-net: toeplitz: cap the Rx queue count
From: Jakub Kicinski @ 2026-06-30 23:10 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, shuah,
	willemb, noren, gal, linux-kselftest
In-Reply-To: <willemdebruijn.kernel.23684913d1fd4@gmail.com>

On Tue, 30 Jun 2026 13:11:15 -0400 Willem de Bruijn wrote:
> > +def _cap_queue_count(cfg):
> > +    ehdr = {"header": {"dev-index": cfg.ifindex}}
> > +    chans = cfg.ethnl.channels_get(ehdr)
> > +
> > +    config = {}
> > +    restore = {}
> > +    for key in ("combined-count", "rx-count"):  
> 
> This assumes that combined and rx are not set at the same time.
> SGTM, not expected in real devices. But technically they could be.

Ack, some tests just assume the NIC uses combined, which is most common.
If this ever causes issues we should probably add support for
provisioning min/max number of queues in the env setup itself.
IIRC someone even posted that at some point.
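
To make the assumption discussed above concrete, here is a minimal sketch
of a per-key cap. The function names, the limit argument and the
plain-dict input are assumptions for the example; the real helper in the
patch talks to ethtool netlink and is only partially quoted above.

```python
def cap_queue_count(chans, limit):
    """Return (config, restore) for every channel key above limit."""
    config = {}
    restore = {}
    for key in ("combined-count", "rx-count"):
        cur = chans.get(key, 0)
        if cur > limit:
            config[key] = limit   # value to apply for the test
            restore[key] = cur    # value to put back afterwards
    return config, restore


def rx_queues(chans):
    """Rx queue total when both channel types are counted."""
    return chans.get("combined-count", 0) + chans.get("rx-count", 0)


# Common case: only combined channels are in use.
print(cap_queue_count({"combined-count": 64}, 16))

# Corner case: both keys are set and each is capped on its own, so the
# total can still end up above the limit.
both = {"combined-count": 32, "rx-count": 32}
config, _ = cap_queue_count(both, 16)
print(rx_queues(config))
```

A device that reports both combined and rx channels would need the cap
applied to the sum rather than to each key, which is the corner case
being waved through here.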

^ permalink raw reply

* Re: [PATCH net-next v1 0/2] Reuse threaded NAPI kthread across napi_del()/napi_add().
From: Jakub Kicinski @ 2026-06-30 23:05 UTC (permalink / raw)
  To: Mina Almasry
  Cc: Harshitha Ramamurthy, Jordan Rhee, Shuhao Tan, David S . Miller,
	Eric Dumazet, Paolo Abeni, Simon Horman, Andrew Lunn, Shuah Khan,
	Samiullah Khawaja, Kuniyuki Iwashima, netdev, linux-kernel,
	linux-kselftest
In-Reply-To: <CAHS8izNeoJiLyoyiPgzmX5CGcyivYqBpVxOcFnoxeOX_zzFd4Q@mail.gmail.com>

On Tue, 30 Jun 2026 10:38:01 -0700 Mina Almasry wrote:
> > > It feels surprising that the userspace needs to reconfigure thread
> > > properties when changing NIC configurations unrelated to threading.
> > > Another downside is that when userspace configures NIC configurations
> > > in quick succession, re-application becomes messy because a previous
> > > re-application might still be in progress when the thread is gone.  
> >
> > Can you explain more about your deployment and system configuration
> > flow? We may be adding micro optimizations when the problem is that
> > we recreate the NAPIs in the first place.  
> 
> We have an AF_XDP application with extremely low latency and jitter
> requirements running on our servers. Sami developed busypolling
> threaded napi for them. Since it's an AF_XDP application, they attach
> their umem to specific RX queues, and then configure threaded NAPI
> busypolling to achieve low latency. That involves using the Netlink
> API to set the threaded/busypolling property, grabbing the kthread
> PID, and setting some properties on the kthread.

What I don't understand is how you have an "application with extremely
low latency and jitter requirements" and at the same time "user runs 
an unrelated ethtool command" reallocating NAPIs and disrupting that
application.

Honestly, the last two times y'all were touching NAPI it was a major
effort to get the code into acceptable shape. I don't have the cycles
right now to help another unknown-upstream (intern?) get their patches
into shape.

Can y'all not open a pidfs fd for the NAPI thread? You'll get a
notification when the existing kthread dies?

^ permalink raw reply

* Re: [RFC net-next] bonding: Retry updating slave MAC after a failure
From: Jay Vosburgh @ 2026-06-30 22:59 UTC (permalink / raw)
  To: Paritosh Potukuchi; +Cc: netdev, linux-kernel, paritosh.potukuchi
In-Reply-To: <20260630150937.3508222-1-paritosh.potukuchi@amd.com>

Paritosh Potukuchi <paritoshpotukuchi@gmail.com> wrote:

>I came across this TODO in bond_set_mac_address() :
>
>        /* TODO: consider downing the slave
>         * and retry ?
>         * User should expect communications
>         * breakage anyway until ARP finish
>         * updating, so...
>         */
>
>Currently, if the dev_set_mac_address() fails on a slave, we go
>ahead and unwind the bond and its slaves.
>
>As the TODO suggests, one possible solution is to try setting
>the MAC again, after putting down the interface. This is because some 
>drivers may reject changing the MAC when the device is UP.
>
>The solution I am proposing is as follows:
>
>dev_set_mac_address on the slave
>        - If this fails, temporarily stop the slave - ndo_stop
>                - If stop fails, unwind
>        - call dev_set_mac_address() on the slave
>                - If this fails, unwind
>        - Bring up the slave by calling ndo_open
>                - If this fails, unwind
>If dev_set_mac_address on slave passes, we go to the next slave
>
>
>Before working on a patch, I wanted to get feedback on whether
>this interpretation of the TODO makes sense and whether there
>are concerns with temporarily stopping and restarting a slave
>during bond_set_mac_address().
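
For reference, the retry flow proposed in the quoted text can be sketched
as below. This is a toy model only: the Slave class, its methods and the
needs_down/broken flags are invented for the illustration and do not
correspond to bonding driver code, and failures of the stop/open steps
are not modeled.

```python
class Slave:
    """Toy device. needs_down rejects MAC changes while up; broken always fails."""

    def __init__(self, name, mac, needs_down=False, broken=False):
        self.name = name
        self.mac = mac
        self.needs_down = needs_down
        self.broken = broken
        self.up = True

    def set_mac(self, mac):
        if self.broken or (self.needs_down and self.up):
            return False
        self.mac = mac
        return True


def bond_set_mac(slaves, new_mac):
    """Apply new_mac to all slaves, retrying once with the slave down."""
    done = []  # (slave, old_mac) pairs, kept for unwinding
    for s in slaves:
        old = s.mac
        ok = s.set_mac(new_mac)
        if not ok:
            s.up = False              # models ndo_stop
            ok = s.set_mac(new_mac)   # retry while the device is down
            s.up = True               # models ndo_open
        if not ok:
            for prev, prev_mac in reversed(done):
                prev.mac = prev_mac   # unwind the slaves already changed
            return False
        done.append((s, old))
    return True
```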

	I think the proper thing to do is remove this comment block and
make no other changes.

	This comment dates to sometime before git, when it was common
for network device drivers to lack the ability to change the MAC while
the interface is up.  To the best of my knowledge, that isn't an issue
today.

	-J

---
	-Jay Vosburgh, jv@jvosburgh.net

^ permalink raw reply

* Re: [TEST] intel: low timeout
From: Jakub Kicinski @ 2026-06-30 22:50 UTC (permalink / raw)
  To: Pielech, Adrian
  Cc: Kitszel, Przemyslaw, netdev@vger.kernel.org, intel-wired-lan,
	leszek.pepiak
In-Reply-To: <35ccefb4-a588-4556-87c0-ade880eaa8d6@intel.com>

On Tue, 30 Jun 2026 14:56:02 +0200 Pielech, Adrian wrote:
> On 6/27/2026 6:54 PM, Jakub Kicinski wrote:
> > Hi!
> > 
> > Some of the tests need more than 5min, could you increase the timeout
> > in the runner to 10 or 15min? Looks like it's hard-killing tests right
> > now after 2min:
> > 
> > https://netdev-ci-results.intel.com/ice-results/net-next-hw-2026-06-27--16-00/ice-E810-XXV4/xdp.py/stdout
> > 
> > which leaks config across tests:
> > 
> > https://netdev-ci-results.intel.com/ice-results/net-next-hw-2026-06-27--16-00/ice-E810-XXV4/irq.py/stdout
> > 
> > BTW the JSON reports the timed out tests as pass.  
> 
> Hi Jakub,
> 
> I've increased the timeout to 10 minutes per test run. It seems to help with
> the XDP test scores.

Great, thank you!

> I'll take a look later at the default behavior of the runner in case of timeouts.

default behavior == pass/fail status for the test?

^ permalink raw reply

* Re: [PATCH net] net/mlx5: HWS, fix matcher leak on resize target setup failure
From: Yevgeny Kliteynik @ 2026-06-30 22:47 UTC (permalink / raw)
  To: Dawei Feng, saeedm, tariqt
  Cc: leon, mbloch, andrew+netdev, davem, edumazet, kuba, pabeni,
	vdogaru, horms, kees, stable, netdev, linux-rdma, linux-kernel,
	jianhao.xu, zilin
In-Reply-To: <20260629064049.3852759-1-dawei.feng@seu.edu.cn>

On 29-Jun-26 09:40, Dawei Feng wrote:
> hws_bwc_matcher_move() allocates a replacement matcher before setting it
> as the resize target. If mlx5hws_matcher_resize_set_target() fails, the
> replacement matcher is not attached anywhere and is leaked.
> 
> Fix the leak by destroying the replacement matcher before returning from
> the resize-target failure path.
> 
> The bug was first flagged by an experimental analysis tool we are
> developing for kernel memory-management bugs while analyzing
> v6.13-rc1. The tool is still under development and is not yet publicly
> available. Manual inspection confirms that the bug is still
> present in v7.1.1.
> 
> An x86_64 allyesconfig build showed no new warnings. As we do not have a
> mlx5 HWS-capable device to test with, no runtime testing was able to be
> performed.
> 
> Fixes: 2111bb970c78 ("net/mlx5: HWS, added backward-compatible API handling")
> Cc: stable@vger.kernel.org
> Signed-off-by: Dawei Feng <dawei.feng@seu.edu.cn>
> ---
>   drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c | 1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c
> index eae02bc74221..3bcf412a08c4 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c
> @@ -205,6 +205,7 @@ static int hws_bwc_matcher_move(struct mlx5hws_bwc_matcher *bwc_matcher)
>   	ret = mlx5hws_matcher_resize_set_target(old_matcher, new_matcher);
>   	if (ret) {
>   		mlx5hws_err(ctx, "Rehash error: failed setting resize target\n");
> +		mlx5hws_matcher_destroy(new_matcher);
>   		return ret;
>   	}
>   

Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
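
The error path the patch fixes can be mimicked with a small user-space
model. Everything here (the Matcher class, the live counter, the fixed
switch) is hypothetical and only mirrors the shape of
hws_bwc_matcher_move() as quoted above.

```python
class Matcher:
    live = 0  # number of matchers currently allocated

    def __init__(self):
        Matcher.live += 1

    def destroy(self):
        Matcher.live -= 1


def matcher_move(set_target_ok, fixed):
    """Allocate a replacement matcher, then try to make it the resize target.

    Returns True on success. With fixed=False the failure path returns
    without destroying the replacement, which is the reported leak.
    """
    new_matcher = Matcher()
    if not set_target_ok:
        if fixed:
            new_matcher.destroy()
        return False
    return True


matcher_move(set_target_ok=False, fixed=False)
print("live after unfixed failure:", Matcher.live)
```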

^ permalink raw reply

* Re: [PATCH net v2 1/2] net: ethernet: oa_tc6: Protect skb pointer used by two different kernel instances
From: Jakub Kicinski @ 2026-06-30 22:46 UTC (permalink / raw)
  To: Selvamani Rajagopal
  Cc: Selvamani Rajagopal via B4 Relay, Parthiban Veerasooran,
	Andrew Lunn, Piergiorgio Beruto, David S. Miller, Eric Dumazet,
	Paolo Abeni, netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	Andrew Lunn
In-Reply-To: <CYYPR02MB9828DBA5FD39F4DB45FC890A83F72@CYYPR02MB9828.namprd02.prod.outlook.com>

On Tue, 30 Jun 2026 04:16:24 +0000 Selvamani Rajagopal wrote:
> > On Fri, 26 Jun 2026 08:35:18 -0700 Selvamani Rajagopal via B4 Relay
> > wrote:  
> > > Threaded IRQ uses waiting_tx_skb. Transmit path also uses
> > > this pointer without any mutual exclusion protection. As a
> > > result, it might leak skb buffer, particularly if the threaded IRQ
> > > runs in the middle of the transmit path, near skb_linearize.
> > 
> > Can you say more ? only xmit sets waiting_tx_skb, the IRQ
> > clears it. So why is IRQ racing with xmit leading to drops?  
> 
> I believe xmit path and IRQ thread would run in different kernel
> instances. Imagine oa_tc6_try_spi_transfer call fails in threaded
> IRQ. It would set disable_irq. If xmit function didn't see that when
> it checked, but it is set before placing skb buffer in the
> waiting_tx_skb pointer (due to skb_linearize for example), the skb
> would be stuck in waiting_tx_skb.

Perhaps, but wouldn't that cause a stall not a leak?

Please do your digging and submit high quality patches which don't
require research. We get 150 patches a day in netdev, and all
maintainers have day jobs (contrary to popular belief)

^ permalink raw reply

* Re: [PATCH] net: usb: cx82310_eth: stop parsing reboot marker as packet
From: Jakub Kicinski @ 2026-06-30 22:42 UTC (permalink / raw)
  To: Tianchu Chen; +Cc: andrew+netdev, davem, edumazet, pabeni, linux-usb, netdev
In-Reply-To: <e87b8ecf4bbcf87635d144508bf35377dd5397b3@linux.dev>

On Tue, 30 Jun 2026 10:30:53 +0000 Tianchu Chen wrote:
> June 30, 2026 at 8:44 AM, "Jakub Kicinski" <kuba@kernel.org> wrote:
> > On Thu, 25 Jun 2026 15:32:04 +0000 Tianchu Chen wrote:
> > > From: Tianchu Chen <flynnnchen@tencent.com>
> > >  
> > >  Discovered by Atuin - Automated Vulnerability Discovery Engine.
> > >  
> > >  cx82310_rx_fixup() treats an RX length of 0xffff as a device reboot
> > >  marker and schedules work to re-enable ethernet mode, but then continues
> > >  processing the marker as a normal packet length. This is an out-of-bounds
> > >  heap write controlled by the usb device.
> > >   
> > Where? Can you be more specific in the commit message? At a glance 
> > the accesses seem to be bound-checked with skb->len.
> > -- 
> > pw-bot: cr
> >  
> 
> 
> The "len > skb->len" check bounds the source read, but the overflow is on the
> destination buffer.
> 
> The buggy path is:
> 
> 	if (len == 0xffff) {
> 		netdev_info(dev->net, "router was rebooted, re-enabling ethernet mode");
> 		schedule_work(&priv->reenable_work);
> 		/* <- BUG: missing return; 0xffff bypasses the oversized-length reject */
> 	} else if (len > CX82310_MTU) {
> 		netdev_err(dev->net, "RX packet too long: %d B\n", len);
> 		return 0;
> 	}
> 	if (len > skb->len) {
> 		dev->partial_len = skb->len; // skb->len is bounded by the USB transfer size (4K)
> 		dev->partial_rem = len - skb->len;
> 		memcpy((void *)dev->partial_data, skb->data,
> 		       dev->partial_len); /* <- TRIGGER: can copy 4K bytes into 1516-byte partial_data */

If skb->len (== dev->partial_len) is not bound-checked to the size
of dev->partial_data - aren't there more paths that could hit this
overflow? Are you fixing the right thing?

> ...
> ...
> 	}
> 
> For normal oversized lengths, the len > CX82310_MTU branch rejects
> before this copy. But 0xffff is special-cased and falls through. With a
> 4096-byte RX URB, after the 2-byte length header is pulled, skb->len can
> be 4094, while partial_data is allocated as dev->hard_mtu
> (CX82310_MTU + 2, 1516 bytes).
> 
> So the source read is bounded by skb->len; the destination write is not.
> 
> I am happy to send a v2 with this spelled out more clearly in the commit message
> if needed.
> 
> Best regards,
> Tianchu Chen
> 
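
The length handling in the quoted excerpt can be replayed with a small
model. It covers only the lines quoted above, not the rest of
cx82310_rx_fixup(); the function name and the fixed switch (an early
return after the reboot marker, as the patch title suggests) are
assumptions made for the example. The sizes come from the message above.

```python
CX82310_MTU = 1514
PARTIAL_DATA_SIZE = CX82310_MTU + 2   # 1516 bytes, per the quoted text
REBOOT_MARKER = 0xFFFF


def partial_copy_size(length, skb_len, fixed):
    """Return how many bytes would be copied into partial_data.

    None means the quoted code never reaches the memcpy for this input.
    """
    if length == REBOOT_MARKER:
        if fixed:
            return None        # assumed fix: stop after scheduling the work
        # unfixed: falls through and is treated as a packet length
    elif length > CX82310_MTU:
        return None            # oversized packet is rejected
    if length > skb_len:
        return skb_len         # memcpy of skb->len bytes
    return None


copied = partial_copy_size(REBOOT_MARKER, 4094, fixed=False)
print(copied, copied > PARTIAL_DATA_SIZE)
```

In this model a normal length can never trigger a copy larger than
CX82310_MTU - 1 bytes, because the copy only happens when skb_len is
smaller than a length that already passed the MTU check. Whether other
paths in the real function share the problem is the open question raised
in the reply above.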


^ permalink raw reply

* Re: [PATCH v4 5/7] ARM: dts: microchip: sama5d27_wlsom1: use fixed-partitions for QSPI flash
From: Linus Walleij @ 2026-06-30 22:27 UTC (permalink / raw)
  To: Manikandan Muralidharan
  Cc: pratyush, mwalle, takahiro.kuwano, miquel.raynal, richard,
	vigneshr, robh, krzk+dt, conor+dt, srini, nicolas.ferre,
	alexandre.belloni, claudiu.beznea, linux, richardcochran, arnd,
	michael, linux-mtd, devicetree, linux-kernel, linux-arm-kernel,
	netdev
In-Reply-To: <20260630092406.150587-6-manikandan.m@microchip.com>

On Tue, Jun 30, 2026 at 11:26 AM Manikandan Muralidharan
<manikandan.m@microchip.com> wrote:

> Move the QSPI flash partitions under a "partitions" node with the
> "fixed-partitions" compatible, as required by the current MTD partition
> binding, instead of declaring them as direct children of the flash node.
> No functional change.
>
> Signed-off-by: Manikandan Muralidharan <manikandan.m@microchip.com>

Reviewed-by: Linus Walleij <linusw@kernel.org>

Yours,
Linus Walleij

^ permalink raw reply

