Netdev List


* Re: [PATCH net-next 1/3] net: ipv4: report multicast group user count
From: Vadim Fedorenko @ 2026-07-01 20:54 UTC
  To: Yuyang Huang
  Cc: David S. Miller, Andrew Lunn, David Ahern, Donald Hunter,
	Eric Dumazet, Ido Schimmel, Jakub Kicinski, Paolo Abeni,
	Shuah Khan, Simon Horman, linux-kernel, linux-kselftest, netdev
In-Reply-To: <20260630110207.37841-2-sigefriedhyy@gmail.com>

On 30/06/2026 12:02, Yuyang Huang wrote:
> RTM_GETMULTICAST has been part of the rtnetlink ABI for a long time
> and already reports IPv4 multicast group membership through
> IFA_MULTICAST and IFA_CACHEINFO. It does not report how many consumers
> hold each membership, so userspace still has to parse /proc/net/igmp to
> get the Users column.
> 
> Add IFA_MC_USERS as a u32 attribute carrying ip_mc_list::users in
> RTM_GETMULTICAST replies and entry-lifecycle notifications.
> 
> This gives iproute2 enough information to migrate the IPv4 part of
> "ip maddr show" from procfs parsing to rtnetlink.
> 
> Signed-off-by: Yuyang Huang <sigefriedhyy@gmail.com>
> ---
>   Documentation/netlink/specs/rt-addr.yaml | 4 ++++
>   include/uapi/linux/if_addr.h             | 1 +
>   net/ipv4/igmp.c                          | 2 ++
>   3 files changed, 7 insertions(+)
> 

Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
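
For context, the procfs parsing that the quoted commit message wants to make unnecessary can be sketched as below. This is only an illustration, not iproute2's code: the SAMPLE text is a made-up snippet in the usual /proc/net/igmp layout, parse_igmp() is a hypothetical helper, and the group decoding assumes a little-endian host (the kernel prints the network-order address as a host integer).

```python
import socket
import struct

# Made-up sample in the /proc/net/igmp layout (assumption, not captured output).
SAMPLE = (
    "Idx\tDevice    : Count Querier\tGroup    Users Timer\tReporter\n"
    "1\tlo        :     1      V3\n"
    "\t\t\t\t010000E0     1 0:00000000\t\t0\n"
    "2\teth0      :     2      V3\n"
    "\t\t\t\tFB0000E0     3 0:00000000\t\t0\n"
    "\t\t\t\t010000E0     1 0:00000000\t\t0\n"
)


def parse_igmp(text):
    """Return (device, group, users) tuples from /proc/net/igmp-style text."""
    entries = []
    device = None
    for line in text.splitlines()[1:]:  # skip the header line
        if not line.strip():
            continue
        if not line[0].isspace():
            # Device line: "<idx> <name> : <count> <querier>"
            device = line.split()[1].rstrip(":")
            continue
        # Group line: "<group hex> <users> <timer> <reporter>"
        fields = line.split()
        group_hex, users = fields[0], int(fields[1])
        # Little-endian host assumed: 010000E0 decodes to 224.0.0.1.
        group = socket.inet_ntoa(struct.pack("<I", int(group_hex, 16)))
        entries.append((device, group, users))
    return entries


if __name__ == "__main__":
    for dev, group, users in parse_igmp(SAMPLE):
        print(dev, group, users)
```

With IFA_MC_USERS in RTM_GETMULTICAST replies, the Users column would instead arrive as a u32 netlink attribute and this kind of text parsing could be dropped.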


* Re: [PATCH net 3/9] netfilter: ipset: fix race between dump and ip_set_list resize
From: Xiang Mei @ 2026-07-01 20:33 UTC
  To: Florian Westphal; +Cc: netdev, netfilter-devel, kadlec
In-Reply-To: <akShGOr3YKg1bs3r@strlen.de>

Thanks, Florian,

I'll deliver an independent patch for that issue.

Xiang

On Tue, Jun 30, 2026 at 10:09 PM Florian Westphal <fw@strlen.de> wrote:
>
> Florian Westphal <fw@strlen.de> wrote:
> > From: Xiang Mei <xmei5@asu.edu>
>
> Xiang, Jozsef, could you please have a look at
>
> https://sashiko.dev/#/patchset/20260630045243.2657-1-fw%40strlen.de
>
> AFAICS it's correct but should be handled in a followup patch rather
> than a v2.
>
> Thanks!


* Re: RTL8159 firmware
From: Birger Koblitz @ 2026-07-01 20:32 UTC
  To: Andrew Lunn
  Cc: Jan Hendrik Farr, andrew+netdev, davem, edumazet, hsu.chih.kai,
	kuba, linux-kernel, linux-usb, netdev, olek2, pabeni
In-Reply-To: <8f349e00-1710-428d-8c70-5d1b1f4d42f3@lunn.ch>

On 01/07/2026 21:29, Andrew Lunn wrote:
> On Wed, Jul 01, 2026 at 07:24:13PM +0200, Birger Koblitz wrote:
>> Hi Jan,
>>
>> On 7/1/26 19:13, Jan Hendrik Farr wrote:
>>> Hi Birger,
>>>
>>> it looks like the firmware file rtl_nic/rtl8159-1.fw isn't in linux-firmware yet.
>>> Could you send it for people to potentially test?
>>>
>>> Jan
>>>
>> The code to create the binary firmware file is at:
>> https://gitlab.com/koblitz-rtlnic/rtlnic_fw
>> But I cannot submit the firmware itself to linux-firmware, as the sourcecode from
>> which the binary data is extracted is published by Realtek under the GPL.
> 
> The obvious work around is to not convert it to binary.
> 
> 8152.c has a clear GPL-2.0 header. So you can edit that file, extract
> the two arrays, and it would still be GPL. You can make the firmware
> loader in the driver parse the ASCII array as it is. It also looks
> like you can .xz or .zst compress it, and the kernel
> _request_firmware() will handle the decompression for you.

The binary firmware file that the driver currently expects has the same
structure as the files for the other supported chips, e.g. the
RTL8156, which were provided by Realtek and are already in the
rtl_nic directory of linux-firmware. Those files carry a bit more
information, such as a pointer to the right flash update registers
and version information so that newer chips are not updated.
Another solution would be to publish the binaries with linux-firmware
alongside the bit of source code that creates the binaries. Firmware
file and source-code would be licensed under the GPL.
I still hope for a reaction from Realtek, though.

Birger


* Re: [PATCH bpf-next v2 13/15] libbpf: Support attaching struct_ops to a cgroup
From: Andrii Nakryiko @ 2026-07-01 20:32 UTC
  To: Amery Hung
  Cc: bpf, netdev, alexei.starovoitov, andrii, daniel, eddyz87, memxor,
	martin.lau, shakeel.butt, roman.gushchin, kuniyu, kerneljasonxing,
	kernel-team
In-Reply-To: <20260623175006.3136053-14-ameryhung@gmail.com>

On Tue, Jun 23, 2026 at 10:50 AM Amery Hung <ameryhung@gmail.com> wrote:
>
> From: Martin KaFai Lau <martin.lau@kernel.org>
>
> Add bpf_map__attach_cgroup_opts() to attach a struct_ops map to a cgroup
> through a BPF link.
>
> Also extend struct bpf_prog_query_opts with a type_id field so a
> BPF_STRUCT_OPS query on a cgroup can select the struct_ops type to
> enumerate.
>
> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
> Signed-off-by: Amery Hung <ameryhung@gmail.com>
> ---
>  tools/lib/bpf/bpf.c            |  2 ++
>  tools/lib/bpf/bpf.h            |  3 +-
>  tools/lib/bpf/libbpf.c         | 59 ++++++++++++++++++++++++++++++++++
>  tools/lib/bpf/libbpf.h         |  3 ++
>  tools/lib/bpf/libbpf.map       |  5 +++
>  tools/lib/bpf/libbpf_version.h |  2 +-
>  6 files changed, 72 insertions(+), 2 deletions(-)
>

[...]

> diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
> index b965ad571540..0e5f4e9bba41 100644
> --- a/tools/lib/bpf/libbpf.h
> +++ b/tools/lib/bpf/libbpf.h
> @@ -960,6 +960,9 @@ bpf_program__attach_cgroup_opts(const struct bpf_program *prog, int cgroup_fd,
>  struct bpf_map;
>
>  LIBBPF_API struct bpf_link *bpf_map__attach_struct_ops(const struct bpf_map *map);
> +LIBBPF_API struct bpf_link *bpf_map__attach_cgroup_opts(const struct bpf_map *map,
> +                                                       int cgroup_fd,
> +                                                       const struct bpf_cgroup_opts *opts);
>  LIBBPF_API int bpf_link__update_map(struct bpf_link *link, const struct bpf_map *map);
>
>  struct bpf_iter_attach_opts {
> diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
> index b731df19ae69..1b01d49e58eb 100644
> --- a/tools/lib/bpf/libbpf.map
> +++ b/tools/lib/bpf/libbpf.map
> @@ -462,3 +462,8 @@ LIBBPF_1.8.0 {
>                 bpf_program__clone;
>                 btf__new_empty_opts;
>  } LIBBPF_1.7.0;
> +
> +LIBBPF_1.9.0 {
> +       global:
> +               bpf_map__attach_cgroup_opts;
> +} LIBBPF_1.8.0;

v1.8 hasn't been released yet, so this new API should go into the
existing LIBBPF_1.8.0 section.

> diff --git a/tools/lib/bpf/libbpf_version.h b/tools/lib/bpf/libbpf_version.h
> index c446c0cd8cf9..57b74ef3618c 100644
> --- a/tools/lib/bpf/libbpf_version.h
> +++ b/tools/lib/bpf/libbpf_version.h
> @@ -4,6 +4,6 @@
>  #define __LIBBPF_VERSION_H
>
>  #define LIBBPF_MAJOR_VERSION 1
> -#define LIBBPF_MINOR_VERSION 8
> +#define LIBBPF_MINOR_VERSION 9
>
>  #endif /* __LIBBPF_VERSION_H */
> --
> 2.53.0-Meta
>


* Re: RTL8159 firmware
From: Jan Hendrik Farr @ 2026-07-01 20:13 UTC
  To: Andrew Lunn
  Cc: Birger Koblitz, andrew+netdev, davem, edumazet, hsu.chih.kai,
	kuba, linux-kernel, linux-usb, netdev, olek2, pabeni
In-Reply-To: <8f349e00-1710-428d-8c70-5d1b1f4d42f3@lunn.ch>

On 01 21:29:20, Andrew Lunn wrote:
> On Wed, Jul 01, 2026 at 07:24:13PM +0200, Birger Koblitz wrote:
> > Hi Jan,
> > 
> > On 7/1/26 19:13, Jan Hendrik Farr wrote:
> > > Hi Birger,
> > > 
> > > it looks like the firmware file rtl_nic/rtl8159-1.fw isn't in linux-firmware yet.
> > > Could you send it for people to potentially test?
> > > 
> > > Jan
> > > 
> > The code to create the binary firmware file is at:
> > https://gitlab.com/koblitz-rtlnic/rtlnic_fw
> > But I cannot submit the firmware itself to linux-firmware, as the sourcecode from
> > which the binary data is extracted is published by Realtek under the GPL.
> 
> The obvious work around is to not convert it to binary.
> 
> 8152.c has a clear GPL-2.0 header. So you can edit that file, extract
> the two arrays, and it would still be GPL. You can make the firmware
> loader in the driver parse the ASCII array as it is. It also looks
> like you can .xz or .zst compress it, and the kernel
> _request_firmware() will handle the decompression for you.
> 

Not a GPL expert, but aren't those "binary" arrays in 8152.c a GPL violation,
unless they were created directly in that form? I "highly suspect" they are
compiled from some source, in which case the "preferred form for
modification" has to be provided.

Jan



* Re: [PATCH bpf-next v2] selftests/bpf: Mask socket type flags in mptcpify prog
From: patchwork-bot+netdevbpf @ 2026-07-01 20:00 UTC
  To: Guillaume Maudoux
  Cc: ast, daniel, andrii, martin.lau, eddyz87, matttbe, martineau,
	geliang, shuah, bpf, mptcp, netdev, linux-kselftest, linux-kernel
In-Reply-To: <20260630095723.564392-1-layus.on@gmail.com>

Hello:

This patch was applied to bpf/bpf-next.git (master)
by Andrii Nakryiko <andrii@kernel.org>:

On Tue, 30 Jun 2026 11:57:23 +0200 you wrote:
> The mptcpify BPF prog upgrades eligible TCP sockets to MPTCP, but only
> when the socket type is exactly SOCK_STREAM. Its update_socket_protocol()
> hook runs on the raw type from userspace, before the socket core masks
> it with SOCK_TYPE_MASK, so the type may still carry SOCK_CLOEXEC or
> SOCK_NONBLOCK in its upper bits and the equality check fails.
> 
> As a result, a socket(AF_INET, SOCK_STREAM | SOCK_CLOEXEC, 0) -- what
> common libraries do by default -- is silently left as plain TCP. This
> was hit in practice with curl. Since mptcpify.c is referenced as example
> code for enabling MPTCP transparently, the same mistake is likely to be
> copied into real deployments where it fails the same way and is hard to
> diagnose.
> 
> [...]

Here is the summary with links:
  - [bpf-next,v2] selftests/bpf: Mask socket type flags in mptcpify prog
    https://git.kernel.org/bpf/bpf-next/c/b4b8b334f6b5

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
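
The failure mode described in the quoted commit message can be reproduced in a few lines. This is a minimal sketch, not the mptcpify BPF program itself: the constants are written out by hand with the values used on x86 Linux (an assumption; SOCK_NONBLOCK differs on a few architectures), and the two helper functions are hypothetical names for the check before and after masking.

```python
# Values as on x86 Linux (assumption); SOCK_TYPE_MASK is the kernel's 0xf mask.
SOCK_STREAM = 1
SOCK_NONBLOCK = 0o4000
SOCK_CLOEXEC = 0o2000000
SOCK_TYPE_MASK = 0xF


def is_stream_unmasked(sock_type):
    """Exact comparison on the raw type, as described for the old check."""
    return sock_type == SOCK_STREAM


def is_stream_masked(sock_type):
    """Drop the flag bits first, then compare the base socket type."""
    return (sock_type & SOCK_TYPE_MASK) == SOCK_STREAM


if __name__ == "__main__":
    raw = SOCK_STREAM | SOCK_CLOEXEC  # what many libraries pass to socket()
    print(is_stream_unmasked(raw), is_stream_masked(raw))
```

The unmasked comparison rejects a type that carries SOCK_CLOEXEC or SOCK_NONBLOCK, which is why such sockets were left as plain TCP.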




* Re: [PATCH v5] bpf: Fix smp_processor_id() call trace for preemptible kernels
From: Andrii Nakryiko @ 2026-07-01 19:57 UTC
  To: Edward Adam Davis
  Cc: sashiko-bot, jiayuan.chen, sashiko-reviews, andrii, ast, bpf,
	daniel, eddyz87, emil, jolsa, linux-kernel, martin.lau, memxor,
	netdev, song, syzkaller-bugs, yonghong.song
In-Reply-To: <tencent_70F825FB8D959232DDCB5DDC991ACFB40D07@qq.com>

On Tue, Jun 30, 2026 at 5:27 PM Edward Adam Davis <eadavis@qq.com> wrote:
>
> bpf_mem_cache_free_rcu() maybe called in preemptible context, this
> will trigger the below warning message:
>
> BUG: using smp_processor_id() in preemptible [00000000] code: syz.0.17/5820
> caller is bpf_mem_cache_free_rcu+0x48/0xc0 kernel/bpf/memalloc.c:954
> Call Trace:
>  check_preemption_disabled+0xd3/0xe0 lib/smp_processor_id.c:47
>  bpf_mem_cache_free_rcu+0x48/0xc0 kernel/bpf/memalloc.c:954
>  rhtab_delete_elem+0x185a/0x1b30 kernel/bpf/hashtab.c:2969
>  __rhtab_map_lookup_and_delete_batch+0x935/0xcb0 kernel/bpf/hashtab.c:3349
>  bpf_map_do_batch+0x445/0x630 kernel/bpf/syscall.c:-1
>  __sys_bpf+0x906/0xd90 kernel/bpf/syscall.c:-1
>
> this_cpu_ptr() requires the caller to prevent task migration.
> These helpers currently do not enforce that requirement and may
> be invoked from preemptible contexts, leading to accesses to another
> CPU's per-CPU cache after migration. Use get_cpu_ptr()/put_cpu_ptr()
> to pin the task while accessing the per-CPU allocator state.
>
> Fixes: 5af6807bdb10 ("bpf: Introduce bpf_mem_free_rcu() similar to kfree_rcu().")
> Fixes: 7c8199e24fa0 ("bpf: Introduce any context BPF specific memory allocator.")
> Reported-by: syzbot+fd7e415d891073b83e1f@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=fd7e415d891073b83e1f
> Signed-off-by: Edward Adam Davis <eadavis@qq.com>
> ---

from what I can see, bpf_mem_free() is called through bpf kfuncs only,
and all BPF programs run with migration disabled. So this seems like a
false positive. For per-cpu checking, it should probably check that
either preemption is disabled or migration is disabled. tl;dr, there
doesn't seem to be any

pw-bot: cr

> v1 -> v2: using guard against preemption
> v2 -> v3: replace get/put_cpu() to bpf_disable/enable_instrumentation()
> v3 -> v4: disable preempt to make this_cpu_ptr() work
> v4 -> v5: in mem free disable preemption
>

maybe throttle your patch resubmission spree a bit?..


* Re: [RFC] connectat()/bindat() or an alternative design
From: John Ericson @ 2026-07-01 19:41 UTC
  To: Cong Wang
  Cc: Li Chen, Andy Lutomirski, Christian Brauner, Jens Axboe,
	network dev, linux-fsdevel
In-Reply-To: <e396ce86-ec84-4189-9da2-98af7cfa6c41@app.fastmail.com>

Two small addendums to my previous email:

First, I linked these two unresponded-to emails for prior art in the
opening email of the thread:

> https://lore.kernel.org/netdev/4FCF171B.8000207@parallels.com/
> https://lore.kernel.org/all/CAEnbY+co6YLXANfeMnfBOBs8Ba_Sbdqz0Ahm8RzAhRF7MrxL4Q@mail.gmail.com/

Since then, I found one more:

https://lore.kernel.org/all/20120815161141.7598.16682.stgit@localhost.localdomain/#t

which is an actual patch by the same person that wrote the first linked
email, with replies. Hopefully that is useful.

Second, it occurred to me that regarding

> (Maybe `bind_unix_anon` should furthermore `listen` right away on
> `lfd` too?)

a useful thing to first decide is whether `bind_unix_anon` is
helpful and/or should be allowed for `SOCK_DGRAM`.

John


* Re: [PATCH net] selftests: net: bump default cmd() timeout to 20 seconds
From: Tariq Toukan @ 2026-07-01 19:34 UTC
  To: Jakub Kicinski, davem
  Cc: netdev, edumazet, pabeni, andrew+netdev, horms, shuah, petrm,
	leitao, dw, noren, gal, linux-kselftest
In-Reply-To: <20260629233348.2145841-1-kuba@kernel.org>



On 30/06/2026 2:33, Jakub Kicinski wrote:
> We always used 5 sec as the default command timeout. But soon after
> it was introduced, David effectively made us ignore the timeout
> (it was passed to process.communicate() as the wrong argument).
> Gal recently fixed that, but turns out the 5 sec is not enough
> for a lot of tests and setups. The fix regressed regressions.
> 
> In particular running reconfig commands (e.g. XDP attach) on mlx5
> with 32 rings and 9k MTU, on a heavily-debug-enabled kernel takes
> more than 5 sec. The XDP installation command will time out after
> 5 sec but since the sleeps in the kernel are non interruptible
> the command finishes anyway, leaving the XDP program attached,
> but with non-zero exit code. defer()ed cleanups are not installed,
> breaking the environment for subsequent tests.
> 

Hi Jakub,

We've improved the performance of configuration change operations over 
the past 2 years. We have more patches to be submitted soon, in addition 
to planned ones.

I'd be glad if you could share some details about the NIC and FW version 
for which you hit this 5sec timeout.

Patch LGTM.
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>

Thanks.

> Since "install XDP" is a pretty normal command a "point fix"
> does not seem appropriate. 32 rings is a fairly reasonable
> config, too, so we should just increase the timeout to 20 sec.
> 
> There's no real reason behind the value of 20.
> 
> Fixes: 1cf270424218 ("net: selftest: add test for netdev netlink queue-get API")
> Fixes: f0bd19316663 ("selftests: net: fix timeout passed as positional argument to communicate()")
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
> CC: shuah@kernel.org
> CC: petrm@nvidia.com
> CC: leitao@debian.org
> CC: dw@davidwei.uk
> CC: noren@nvidia.com
> CC: gal@nvidia.com
> CC: linux-kselftest@vger.kernel.org
> ---
>   tools/testing/selftests/net/lib/py/utils.py | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/testing/selftests/net/lib/py/utils.py b/tools/testing/selftests/net/lib/py/utils.py
> index 308c91833239..9b40049e2dbb 100644
> --- a/tools/testing/selftests/net/lib/py/utils.py
> +++ b/tools/testing/selftests/net/lib/py/utils.py
> @@ -44,7 +44,7 @@ import time
>       Use bkg() instead to run a command in the background.
>       """
>       def __init__(self, comm, shell=None, fail=True, expect_fail=False, ns=None,
> -                 background=False, host=None, timeout=5, ksft_ready=None,
> +                 background=False, host=None, timeout=20, ksft_ready=None,
>                    ksft_wait=None):
>           if ns:
>               if hasattr(ns, 'user_ns_path'):
> @@ -113,7 +113,7 @@ import time
>   
>           return stdout, stderr
>   
> -    def process(self, terminate=True, fail=None, expect_fail=False, timeout=5):
> +    def process(self, terminate=True, fail=None, expect_fail=False, timeout=20):
>           if fail is None:
>               fail = not terminate
>   
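
As background on the earlier bug mentioned in the quoted commit message (the timeout reaching communicate() as the wrong argument), here is a small sketch of the keyword form. It is not the selftest library's code: run_with_timeout() is a hypothetical helper, and it kills the child on expiry, which the real cmd() class may handle differently.

```python
import subprocess
import sys


def run_with_timeout(argv, timeout=20):
    """Run argv and return (timed_out, returncode)."""
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    try:
        # The first positional parameter of communicate() is `input`,
        # so the timeout has to be passed by keyword to take effect.
        proc.communicate(timeout=timeout)
        return False, proc.returncode
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.communicate()  # reap the child and drain its pipes
        return True, proc.returncode


if __name__ == "__main__":
    print(run_with_timeout([sys.executable, "-c", "pass"]))
```

A command that is stuck in a non-interruptible sleep in the kernel, as in the XDP case above, can still complete its work after the timeout fires, so a longer default avoids the half-applied state rather than fixing it.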



* RE: [PATCH net-next v6 06/15] net: ethernet: oa_tc6: Support for hardware timestamp
From: Jerry.Ray @ 2026-07-01 19:33 UTC
  To: Selvamani.Rajagopal, andrew, pier.beruto, hkallweit1, linux,
	davem, edumazet, kuba, pabeni, andrew+netdev,
	Parthiban.Veerasooran, richardcochran, robh, krzk+dt, conor+dt,
	horms, corbet, skhan
  Cc: netdev, linux-kernel, devicetree, linux-doc
In-Reply-To: <20260629-s2500-mac-phy-support-v6-6-18ce79500371@onsemi.com>


> From: Selvamani Rajagopal <Selvamani.Rajagopal@onsemi.com>
> 
> PTP register/unregister calls are implemented in oa_tc6_ptp.c.
> The APIs that work with the hardware for timestamp is provided
> by vendor code as it may be vendor dependent.
> 
> Interface for ndo_hwtstamp_set/get, ioctl, control and status
> callback for ethtool are provided to support hardware timestamp
> feature.
> 
> Besides ioctl interface, hardware timestamp functions that handles
> header and footer data are in oa_tc6.c. Helper functions are in
> oa_tc6_tstamp.c.
> 
> Signed-off-by: Selvamani Rajagopal <Selvamani.Rajagopal@onsemi.com>
> 
> ---
> changes in v6
>   - Fixed the issue of function parameter in oa_tc6_get_ts_stats
>     not described in comments section for documentation.
>   - Avoided typecasting __be32 as u32
> changes in v5
>   - As subtracting skb len by FCS size is considered bug, changes
>     are removed. Will be fixed in stable branch (net repo)
> changes in v4
>   - Fixed the condition check for subtracting the FCS size
>     from skb len.
> changes in v3
>   - Replaced warning printk with ratelimited printk
>   - Checking the hardware register before enabling hardware
>     timestamp
> changes in v1
>   - Added hardware timestamp support to the OA TC6 framework.
> ---
>  MAINTAINERS                                  |   1 +
>  drivers/net/ethernet/oa_tc6/Makefile         |   2 +-
>  drivers/net/ethernet/oa_tc6/oa_tc6.c         | 217 +++++++++++++++++++++++++--
>  drivers/net/ethernet/oa_tc6/oa_tc6_ptp.c     |  67 +++++++++
>  drivers/net/ethernet/oa_tc6/oa_tc6_std_def.h |  33 ++++
>  drivers/net/ethernet/oa_tc6/oa_tc6_tstamp.c  | 202 +++++++++++++++++++++++++
>  include/linux/oa_tc6.h                       |  12 ++
>  7 files changed, 519 insertions(+), 15 deletions(-)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index ff1295d37ae2..ca9f39b46b96 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -20214,6 +20214,7 @@ F:      drivers/rtc/rtc-optee.c
> 
>  OPEN ALLIANCE 10BASE-T1S MACPHY SERIAL INTERFACE FRAMEWORK
>  M:     Parthiban Veerasooran <parthiban.veerasooran@microchip.com>
> +M:     Selva Rajagopal <selvamani.rajagopal@onsemi.com> (timestamp support)
>  L:     netdev@vger.kernel.org
>  S:     Maintained
>  F:     Documentation/networking/oa-tc6-framework.rst
> diff --git a/drivers/net/ethernet/oa_tc6/Makefile b/drivers/net/ethernet/oa_tc6/Makefile
> index f24aae852ef2..964f668efc2d 100644
> --- a/drivers/net/ethernet/oa_tc6/Makefile
> +++ b/drivers/net/ethernet/oa_tc6/Makefile
> @@ -4,4 +4,4 @@
>  #
> 
>  obj-$(CONFIG_OA_TC6) := oa_tc6_mod.o
> -oa_tc6_mod-objs := oa_tc6.o
> +oa_tc6_mod-objs := oa_tc6.o oa_tc6_ptp.o oa_tc6_tstamp.o
> diff --git a/drivers/net/ethernet/oa_tc6/oa_tc6.c b/drivers/net/ethernet/oa_tc6/oa_tc6.c
> index bf96e8d1ccb9..6cc7c76d1d3c 100644
> --- a/drivers/net/ethernet/oa_tc6/oa_tc6.c
> +++ b/drivers/net/ethernet/oa_tc6/oa_tc6.c
> @@ -14,6 +14,15 @@
> 
>  #include "oa_tc6_std_def.h"
> 
> +struct oa_tc6_ts_info_rx {
> +       bool rtsa;
> +       bool rtsp;
> +};
> +
> +struct oa_tc6_ts_info_tx {
> +       u8 tsc;
> +};
> +
>  static int oa_tc6_spi_transfer(struct oa_tc6 *tc6,
>                                enum oa_tc6_header_type header_type, u16 length)
>  {
> @@ -48,6 +57,156 @@ static int oa_tc6_get_parity(u32 p)
>         return !((p >> 28) & 1);
>  }
> 
> +static struct oa_tc6_ts_info_tx *oa_tc6_tsinfo_tx(struct sk_buff *skb)
> +{
> +       return (struct oa_tc6_ts_info_tx *)(skb->cb);
> +}
> +
> +static struct oa_tc6_ts_info_rx *oa_tc6_tsinfo_rx(struct sk_buff *skb)
> +{
> +       return (struct oa_tc6_ts_info_rx *)(skb->cb);
> +}
> +
> +static void oa_tc6_defer_for_hwtstamp(struct oa_tc6 *tc6,
> +                                     struct sk_buff *skb)
> +{
> +       if (!tc6->hw_tstamp_enabled)
> +               return;
> +       if (!skb || (skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP) == 0)
> +               return;
> +       if (tc6->ts_config.tx_type != HWTSTAMP_TX_ON) {
> +               tc6->tx_hwtstamp_lost++;
> +               return;
> +       }
> +
> +       skb_shinfo(skb)->tx_flags |= SKBTX_IN_PROGRESS;
> +       u8 ret = tc6->tx_ts_idx++;
> +
> +       if (ret == OA_TC6_TTSCC_REG_ID)
> +               tc6->tx_ts_idx = OA_TC6_TTSCA_REG_ID;
> +       oa_tc6_tsinfo_tx(skb)->tsc = ret;
> +
> +       list_add_tail(&skb->list, &tc6->tx_ts_skb_q);
> +}
> +
> +static int oa_tc6_process_deferred_skb(struct oa_tc6 *tc6, u8 tsc)
> +{
> +       struct skb_shared_hwtstamps tstamp;
> +       struct oa_tc6_ts_info_tx *ski;
> +       struct sk_buff *skb, *tmp;
> +       bool found = false;
> +       int ret = 0;
> +
> +       /* Size of data must match OA_TC6_TSTAMP_SZ */
> +       u32 data[2];
> +
> +       list_for_each_entry_safe(skb, tmp, &tc6->tx_ts_skb_q, list) {
> +               ski = oa_tc6_tsinfo_tx(skb);
> +               if (ski->tsc != tsc)
> +                       continue;
> +               if (found) {
> +                       dev_warn_ratelimited(&tc6->spi->dev,
> +                                            "Multiple skbs. tsc = %d\n",
> +                                            tsc);
> +                       tc6->tx_hwtstamp_err++;
> +               }
> +               found = true;
> +               list_del(&skb->list);
> +
> +               /* Retrieve the timestamping info */
> +               ret = oa_tc6_read_registers(tc6,
> +                                           OA_TC6_REG_TTSCA_HIGH +
> +                                           2 * (tsc - 1), &data[0], 2);
> +
> +               if (!ret) {
> +                       tstamp.hwtstamp = ktime_set(data[0], data[1]);
> +                       skb_tstamp_tx(skb, &tstamp);
> +                       tc6->tx_hwtstamp_pkts++;
> +               }
> +
> +               dev_kfree_skb(skb);
> +       }
> +       return ret;
> +}
> +
> +static void oa_tc6_events_handle(struct oa_tc6 *tc6, u32 val)
> +{
> +       /* Check TX timestamping */
> +       if (val & STATUS0_TTSCAA)
> +               oa_tc6_process_deferred_skb(tc6, OA_TC6_TTSCA_REG_ID);
> +
> +       if (val & STATUS0_TTSCAB)
> +               oa_tc6_process_deferred_skb(tc6, OA_TC6_TTSCB_REG_ID);
> +
> +       if (val & STATUS0_TTSCAC)
> +               oa_tc6_process_deferred_skb(tc6, OA_TC6_TTSCC_REG_ID);
> +}
> +
> +static void oa_tc6_update_ts_in_rx_skb(struct oa_tc6 *tc6)
> +{
> +       struct sk_buff *skb = tc6->rx_skb;
> +       struct oa_tc6_ts_info_rx *ski;
> +       __be32 ts_val[2];
> +       u32 ts[2];
> +
> +       if (!tc6->hw_tstamp_enabled)
> +               return;
> +       ski = oa_tc6_tsinfo_rx(skb);
> +       if (!ski->rtsa)
> +               return;
> +
> +       memcpy(&ts_val[0], skb->data, 4);
> +       memcpy(&ts_val[1], (u32 *)skb->data + 1, 4);
> +
> +       ts[0] = be32_to_cpu(ts_val[0]);
> +       ts[1] = be32_to_cpu(ts_val[1]);
> +
> +       /* Check parity */
> +       if ((oa_tc6_get_parity(ts[0]) ^ oa_tc6_get_parity(ts[1])) ==
> +           !ski->rtsp) {
> +               struct skb_shared_hwtstamps *hw_ts;
> +
> +               /* Report timestamp to the upper layers */
> +               hw_ts = skb_hwtstamps(skb);
> +               memset(hw_ts, 0, sizeof(*hw_ts));
> +               hw_ts->hwtstamp = ktime_set(ts[0], ts[1]);
> +       }
> +       skb_pull(skb, sizeof(ts));

The receive path here unconditionally consumes a 64-bit (OA_TC6_TSTAMP_SZ = 8
byte) frame timestamp: it is gated only on the footer RTSA bit, always copies
two 32-bit words, and pulls sizeof(ts) from the skb. The RX buffer is likewise
sized with a fixed + OA_TC6_TSTAMP_SZ. Nothing consults the configured timestamp
width.
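
To illustrate the concern in userspace terms, here is a rough sketch of what happens when a fixed 8-byte pull meets a shorter timestamp. It is a model only, not driver code: pull_timestamp() is a hypothetical helper, and the 4-byte case assumes the device can be configured for 32-bit frame timestamps, which this excerpt does not show.

```python
import struct

TSTAMP_SZ_64 = 8  # mirrors OA_TC6_TSTAMP_SZ from the patch
TSTAMP_SZ_32 = 4  # assumed size of a 32-bit frame timestamp


def pull_timestamp(frame, width=TSTAMP_SZ_64):
    """Split `width` bytes of big-endian timestamp words off the frame head.

    Returns (words, payload), where words is a tuple of raw 32-bit values.
    """
    if width not in (TSTAMP_SZ_32, TSTAMP_SZ_64):
        raise ValueError("unsupported timestamp width")
    if len(frame) < width:
        raise ValueError("frame shorter than timestamp")
    words = struct.unpack(">%dI" % (width // 4), frame[:width])
    return words, frame[width:]


if __name__ == "__main__":
    # Device prepends a 4-byte timestamp, host pulls a fixed 8 bytes.
    frame = struct.pack(">I", 0x11223344) + b"ETHERNET"
    print(pull_timestamp(frame, TSTAMP_SZ_32))
    print(pull_timestamp(frame, TSTAMP_SZ_64))
```

In the mismatched case the second "timestamp" word is really the first four payload bytes, and the payload handed up the stack is four bytes short.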

But oa_tc6_set_hwtstamp_settings() only sets CONFIG0.FTSE:

> +}
> +
> +static int oa_tc6_update_standard_capability(struct oa_tc6 *tc6)
> +{
> +       u32 regval = 0;
> +       int ret;
> +
> +       ret = oa_tc6_read_register(tc6, OA_TC6_REG_STDCAP, &regval);
> +       if (ret)
> +               return ret;
> +       if (regval & STDCAP_FRAME_TIMESTAMP_CAPABILITY)
> +               tc6->hw_tstamp_supported = true;
> +       return 0;
> +}
> +
> +/**
> + * oa_tc6_ioctl - generic ioctl interface for MAC-PHY drivers.
> + * @tc6: oa_tc6 struct.
> + * @rq: request from socket interface
> + * @cmd: value to set/get timestamp configuration
> + *
> + * Return: 0 on success otherwise failed.
> + */
> +int oa_tc6_ioctl(struct oa_tc6 *tc6, struct ifreq *rq, int cmd)
> +{
> +       if (!netif_running(tc6->netdev))
> +               return -EINVAL;
> +
> +       if (cmd == SIOCSHWTSTAMP || cmd == SIOCGHWTSTAMP)
> +               return oa_tc6_tstamp_ioctl(tc6, rq, cmd);
> +       else
> +               return phy_do_ioctl_running(tc6->netdev, rq, cmd);
> +}
> +EXPORT_SYMBOL_GPL(oa_tc6_ioctl);
> +
>  static __be32 oa_tc6_prepare_ctrl_header(u32 addr, u8 length,
>                                          enum oa_tc6_register_op reg_op)
>  {
> @@ -571,6 +730,9 @@ static int oa_tc6_process_extended_status(struct oa_tc6 *tc6)
>                 return ret;
>         }
> 
> +       if ((value & STATUS0_TTSCA_MASK) != 0)
> +               oa_tc6_events_handle(tc6, value & STATUS0_TTSCA_MASK);
> +
>         /* Clear the error interrupts status */
>         ret = oa_tc6_write_register(tc6, OA_TC6_REG_STATUS0, value);
>         if (ret) {
> @@ -653,6 +815,7 @@ static void oa_tc6_submit_rx_skb(struct oa_tc6 *tc6)
>             tc6->rx_skb->len > ETH_FCS_LEN)
>                 skb_trim(tc6->rx_skb, tc6->rx_skb->len - ETH_FCS_LEN);
> 
> +       oa_tc6_update_ts_in_rx_skb(tc6);
>         tc6->rx_skb->protocol = eth_type_trans(tc6->rx_skb, tc6->netdev);
>         tc6->netdev->stats.rx_packets++;
>         tc6->netdev->stats.rx_bytes += tc6->rx_skb->len;
> @@ -667,24 +830,29 @@ static void oa_tc6_update_rx_skb(struct oa_tc6 *tc6, u8 *payload, u8 length)
>         memcpy(skb_put(tc6->rx_skb, length), payload, length);
>  }
> 
> -static int oa_tc6_allocate_rx_skb(struct oa_tc6 *tc6)
> +static int oa_tc6_allocate_rx_skb(struct oa_tc6 *tc6, u32 footer)
>  {
> +       struct oa_tc6_ts_info_rx *ski;
> +
>         tc6->rx_skb = netdev_alloc_skb_ip_align(tc6->netdev, tc6->netdev->mtu +
> -                                               ETH_HLEN + ETH_FCS_LEN);
> +                                               ETH_HLEN + ETH_FCS_LEN + OA_TC6_TSTAMP_SZ);
>         if (!tc6->rx_skb) {
>                 tc6->netdev->stats.rx_dropped++;
>                 return -ENOMEM;
>         }
> 
> +       ski = oa_tc6_tsinfo_rx(tc6->rx_skb);
> +       ski->rtsa = FIELD_GET(OA_TC6_DATA_FOOTER_RTSA_VALID, footer);
> +       ski->rtsp = FIELD_GET(OA_TC6_DATA_FOOTER_RTSP_VALID, footer);
>         return 0;
>  }
> 
>  static int oa_tc6_prcs_complete_rx_frame(struct oa_tc6 *tc6, u8 *payload,
> -                                        u16 size)
> +                                        u16 size, u32 footer)
>  {
>         int ret;
> 
> -       ret = oa_tc6_allocate_rx_skb(tc6);
> +       ret = oa_tc6_allocate_rx_skb(tc6, footer);
>         if (ret)
>                 return ret;
> 
> @@ -695,11 +863,11 @@ static int oa_tc6_prcs_complete_rx_frame(struct oa_tc6 *tc6, u8 *payload,
>         return 0;
>  }
> 
> -static int oa_tc6_prcs_rx_frame_start(struct oa_tc6 *tc6, u8 *payload, u16 size)
> +static int oa_tc6_prcs_rx_frame_start(struct oa_tc6 *tc6, u8 *payload, u16 size, u32 footer)
>  {
>         int ret;
> 
> -       ret = oa_tc6_allocate_rx_skb(tc6);
> +       ret = oa_tc6_allocate_rx_skb(tc6, footer);
>         if (ret)
>                 return ret;
> 
> @@ -744,7 +912,7 @@ static int oa_tc6_prcs_rx_chunk_payload(struct oa_tc6 *tc6, u8 *data,
>                 size = end_byte_offset + 1 - start_byte_offset;
>                 return oa_tc6_prcs_complete_rx_frame(tc6,
>                                                      &data[start_byte_offset],
> -                                                    size);
> +                                                    size, footer);
>         }
> 
>         /* Process the chunk with only rx frame start */
> @@ -752,7 +920,7 @@ static int oa_tc6_prcs_rx_chunk_payload(struct oa_tc6 *tc6, u8 *data,
>                 size = OA_TC6_CHUNK_PAYLOAD_SIZE - start_byte_offset;
>                 return oa_tc6_prcs_rx_frame_start(tc6,
>                                                   &data[start_byte_offset],
> -                                                 size);
> +                                                 size, footer);
>         }
> 
>         /* Process the chunk with only rx frame end */
> @@ -777,7 +945,7 @@ static int oa_tc6_prcs_rx_chunk_payload(struct oa_tc6 *tc6, u8 *data,
>                 size = OA_TC6_CHUNK_PAYLOAD_SIZE - start_byte_offset;
>                 return oa_tc6_prcs_rx_frame_start(tc6,
>                                                   &data[start_byte_offset],
> -                                                 size);
> +                                                 size, footer);
>         }
> 
>         /* Process the chunk with ongoing rx frame data */
> @@ -831,13 +999,15 @@ static int oa_tc6_process_spi_data_rx_buf(struct oa_tc6 *tc6, u16 length)
>  }
> 
>  static __be32 oa_tc6_prepare_data_header(bool data_valid, bool start_valid,
> -                                        bool end_valid, u8 end_byte_offset)
> +                                        bool end_valid, u8 end_byte_offset,
> +                                        u8 tsc)
>  {
>         u32 header = FIELD_PREP(OA_TC6_DATA_HEADER_DATA_NOT_CTRL,
>                                 OA_TC6_DATA_HEADER) |
>                      FIELD_PREP(OA_TC6_DATA_HEADER_DATA_VALID, data_valid) |
>                      FIELD_PREP(OA_TC6_DATA_HEADER_START_VALID, start_valid) |
>                      FIELD_PREP(OA_TC6_DATA_HEADER_END_VALID, end_valid) |
> +                    FIELD_PREP(OA_TC6_DATA_HEADER_TSC_OFFSET, tsc) |
>                      FIELD_PREP(OA_TC6_DATA_HEADER_END_BYTE_OFFSET,
>                                 end_byte_offset);
> 
> @@ -856,6 +1026,7 @@ static void oa_tc6_add_tx_skb_to_spi_buf(struct oa_tc6 *tc6)
>         enum oa_tc6_data_start_valid_info start_valid;
>         u8 end_byte_offset = 0;
>         u16 length_to_copy;
> +       u8 tsc = 0;
> 
>         /* Initial value is assigned here to avoid more than 80 characters in
>          * the declaration place.
> @@ -865,8 +1036,10 @@ static void oa_tc6_add_tx_skb_to_spi_buf(struct oa_tc6 *tc6)
>         /* Set start valid if the current tx chunk contains the start of the tx
>          * ethernet frame.
>          */
> -       if (!tc6->tx_skb_offset)
> +       if (!tc6->tx_skb_offset) {
>                 start_valid = OA_TC6_DATA_START_VALID;
> +               tsc = oa_tc6_tsinfo_tx(tc6->ongoing_tx_skb)->tsc;
> +       }
> 
>         /* If the remaining tx skb length is more than the chunk payload size of
>          * 64 bytes then copy only 64 bytes and leave the ongoing tx skb for
> @@ -887,12 +1060,17 @@ static void oa_tc6_add_tx_skb_to_spi_buf(struct oa_tc6 *tc6)
>                 tc6->tx_skb_offset = 0;
>                 tc6->netdev->stats.tx_bytes += tc6->ongoing_tx_skb->len;
>                 tc6->netdev->stats.tx_packets++;
> -               kfree_skb(tc6->ongoing_tx_skb);
> +
> +               /* Free the ones that are not saved for later processing,
> +                * like timestamping.
> +                */
> +               if (!(skb_shinfo(tc6->ongoing_tx_skb)->tx_flags & SKBTX_IN_PROGRESS))
> +                       kfree_skb(tc6->ongoing_tx_skb);
>                 tc6->ongoing_tx_skb = NULL;
>         }
> 
>         *tx_buf = oa_tc6_prepare_data_header(OA_TC6_DATA_VALID, start_valid,
> -                                            end_valid, end_byte_offset);
> +                                            end_valid, end_byte_offset, tsc);
>         tc6->spi_data_tx_buf_offset += OA_TC6_CHUNK_SIZE;
>  }
> 
> @@ -910,6 +1088,8 @@ static u16 oa_tc6_prepare_spi_tx_buf_for_tx_skbs(struct oa_tc6 *tc6)
>                         tc6->ongoing_tx_skb = tc6->waiting_tx_skb;
>                         tc6->waiting_tx_skb = NULL;
>                         spin_unlock_bh(&tc6->tx_skb_lock);
> +                       oa_tc6_defer_for_hwtstamp(tc6,
> +                                                 tc6->ongoing_tx_skb);
>                 }
>                 if (!tc6->ongoing_tx_skb)
>                         break;
> @@ -926,7 +1106,7 @@ static void oa_tc6_add_empty_chunks_to_spi_buf(struct oa_tc6 *tc6,
> 
>         header = oa_tc6_prepare_data_header(OA_TC6_DATA_INVALID,
>                                             OA_TC6_DATA_START_INVALID,
> -                                           OA_TC6_DATA_END_INVALID, 0);
> +                                           OA_TC6_DATA_END_INVALID, 0, false);
> 
>         while (needed_empty_chunks--) {
>                 __be32 *tx_buf = tc6->spi_data_tx_buf +
> @@ -1118,6 +1298,7 @@ netdev_tx_t oa_tc6_start_xmit(struct oa_tc6 *tc6, struct sk_buff *skb)
>                 return NETDEV_TX_OK;
>         }
> 
> +       oa_tc6_tsinfo_tx(skb)->tsc = 0;
>         spin_lock_bh(&tc6->tx_skb_lock);
>         tc6->waiting_tx_skb = skb;
>         spin_unlock_bh(&tc6->tx_skb_lock);
> @@ -1151,6 +1332,8 @@ struct oa_tc6 *oa_tc6_init(struct spi_device *spi, struct net_device *netdev)
>         SET_NETDEV_DEV(netdev, &spi->dev);
>         mutex_init(&tc6->spi_ctrl_lock);
>         spin_lock_init(&tc6->tx_skb_lock);
> +       tc6->tx_ts_idx = OA_TC6_TTSCA_REG_ID;
> +       INIT_LIST_HEAD(&tc6->tx_ts_skb_q);
> 
>         /* Set the SPI controller to pump at realtime priority */
>         tc6->spi->rt = true;
> @@ -1216,6 +1399,12 @@ struct oa_tc6 *oa_tc6_init(struct spi_device *spi, struct net_device *netdev)
>                 goto phy_exit;
>         }
> 
> +       ret = oa_tc6_update_standard_capability(tc6);
> +       if (ret) {
> +               dev_err(&tc6->spi->dev, "Failed to read capability\n");
> +               goto phy_exit;
> +       }
> +
>         ret = devm_request_threaded_irq(&tc6->spi->dev, tc6->spi->irq,
>                                         oa_tc6_macphy_isr,
>                                         oa_tc6_macphy_threaded_irq,
> diff --git a/drivers/net/ethernet/oa_tc6/oa_tc6_ptp.c b/drivers/net/ethernet/oa_tc6/oa_tc6_ptp.c
> new file mode 100644
> index 000000000000..921191ec6829
> --- /dev/null
> +++ b/drivers/net/ethernet/oa_tc6/oa_tc6_ptp.c
> @@ -0,0 +1,67 @@
> +// SPDX-License-Identifier: GPL-2.0+
> +/*
> + * Support for hardware timestamping feature for OPEN Alliance
> + * 10BASE‑T1x MAC‑PHY Serial Interface framework
> + *
> + * Author: Selva Rajagopal <selvamani.rajagopal@onsemi.com>
> + */
> +
> +#include <linux/hrtimer.h>
> +#include <linux/irq.h>
> +#include <linux/irqdomain.h>
> +#include <linux/kernel.h>
> +#include <linux/netdevice.h>
> +#include <linux/phylink.h>
> +#include <linux/spi/spi.h>
> +#include <linux/oa_tc6.h>
> +#include <linux/net_tstamp.h>
> +#include <linux/ptp_clock_kernel.h>
> +#include <linux/delay.h>
> +#include <linux/mutex.h>
> +#include <linux/ktime.h>
> +#include <linux/errno.h>
> +
> +#include "oa_tc6_std_def.h"
> +
> +/**
> + * oa_tc6_ptp_register - Registers clock related callbacks
> + * @tc6: oa_tc6 struct.
> + * @info: Describes a PTP hardware clock
> + *
> + * Description: Vendors are expected to set the hardware timestamp
> + * related callbacks before calling this function.
> + */
> +int oa_tc6_ptp_register(struct oa_tc6 *tc6, struct ptp_clock_info *info)
> +{
> +       /* Not supporting hardware timestamp isn't an error */
> +       if (!tc6->hw_tstamp_supported)
> +               return 0;
> +
> +       snprintf(info->name, sizeof(info->name), "%s",
> +                "OA TC6 PTP clock");
> +       tc6->ptp_clock = ptp_clock_register(info, &tc6->spi->dev);
> +       if (IS_ERR(tc6->ptp_clock)) {
> +               dev_err(&tc6->spi->dev, "Registration of %s failed",
> +                       info->name);
> +               return -EFAULT;
> +       }
> +       dev_info(&tc6->spi->dev, "%s registered. index %d", info->name,
> +                ptp_clock_index(tc6->ptp_clock));
> +       return 0;
> +}
> +EXPORT_SYMBOL_GPL(oa_tc6_ptp_register);
> +
> +/**
> + * oa_tc6_ptp_unregister - Unregisters clock related callbacks
> + * @tc6: oa_tc6 struct.
> + */
> +void oa_tc6_ptp_unregister(struct oa_tc6 *tc6)
> +{
> +       if (tc6->ptp_clock)
> +               ptp_clock_unregister(tc6->ptp_clock);
> +}
> +EXPORT_SYMBOL_GPL(oa_tc6_ptp_unregister);
> +
> +MODULE_DESCRIPTION("OPEN Alliance 10BASE‑T1x MAC‑PHY Serial Interface Lib");
> +MODULE_AUTHOR("Selva Rajagopal <selvamani.rajagopal@onsemi.com>");
> +MODULE_LICENSE("GPL");
> diff --git a/drivers/net/ethernet/oa_tc6/oa_tc6_std_def.h b/drivers/net/ethernet/oa_tc6/oa_tc6_std_def.h
> index bc58834a3368..e8ec379dd60d 100644
> --- a/drivers/net/ethernet/oa_tc6/oa_tc6_std_def.h
> +++ b/drivers/net/ethernet/oa_tc6/oa_tc6_std_def.h
> @@ -22,6 +22,7 @@
>  /* Standard Capabilities Register */
>  #define OA_TC6_REG_STDCAP                      0x0002
>  #define STDCAP_DIRECT_PHY_REG_ACCESS           BIT(8)
> +#define STDCAP_FRAME_TIMESTAMP_CAPABILITY      BIT(6)
> 
>  /* Reset Control and Status Register */
>  #define OA_TC6_REG_RESET                       0x0003
> @@ -31,9 +32,14 @@
>  #define OA_TC6_REG_CONFIG0                     0x0004
>  #define CONFIG0_SYNC                           BIT(15)
>  #define CONFIG0_ZARFE_ENABLE                   BIT(12)
> +#define CONFIG0_FTSE_ENABLE                    BIT(7)
> 
>  /* Status Register #0 */
>  #define OA_TC6_REG_STATUS0                     0x0008
> +#define STATUS0_TTSCAC                         BIT(10)
> +#define STATUS0_TTSCAB                         BIT(9)
> +#define STATUS0_TTSCAA                         BIT(8)
> +#define STATUS0_TTSCA_MASK             GENMASK(10, 8)
>  #define STATUS0_RESETC                         BIT(6)  /* Reset Complete */
>  #define STATUS0_HEADER_ERROR                   BIT(5)
>  #define STATUS0_LOSS_OF_FRAME_ERROR            BIT(4)
> @@ -47,6 +53,7 @@
> 
>  /* Interrupt Mask Register #0 */
>  #define OA_TC6_REG_INT_MASK0                   0x000C
> +#define INT_MASK0_TTSCA_MASK                   GENMASK(10, 8)
>  #define INT_MASK0_HEADER_ERR_MASK              BIT(5)
>  #define INT_MASK0_LOSS_OF_FRAME_ERR_MASK       BIT(4)
>  #define INT_MASK0_RX_BUFFER_OVERFLOW_ERR_MASK  BIT(3)
> @@ -58,6 +65,9 @@
>  #define OA_TC6_PHY_STD_REG_ADDR_BASE           0xFF00
>  #define OA_TC6_PHY_STD_REG_ADDR_MASK           0x1F
> 
> +/* Tx timestamp capture register A (high) */
> +#define OA_TC6_REG_TTSCA_HIGH                  (0x1010)
> +

Please fix the value of OA_TC6_REG_TTSCA_HIGH to 0x0010 in patch 6 where it is
introduced rather than correcting it in patch 12.

>  /* Control command header */
>  #define OA_TC6_CTRL_HEADER_DATA_NOT_CTRL       BIT(31)
>  #define OA_TC6_CTRL_HEADER_WRITE_NOT_READ      BIT(29)
> @@ -73,6 +83,7 @@
>  #define OA_TC6_DATA_HEADER_START_WORD_OFFSET   GENMASK(19, 16)
>  #define OA_TC6_DATA_HEADER_END_VALID           BIT(14)
>  #define OA_TC6_DATA_HEADER_END_BYTE_OFFSET     GENMASK(13, 8)
> +#define OA_TC6_DATA_HEADER_TSC_OFFSET          GENMASK(7, 6)
>  #define OA_TC6_DATA_HEADER_PARITY              BIT(0)
> 
>  /* Data footer */
> @@ -84,6 +95,8 @@
>  #define OA_TC6_DATA_FOOTER_START_VALID         BIT(20)
>  #define OA_TC6_DATA_FOOTER_START_WORD_OFFSET   GENMASK(19, 16)
>  #define OA_TC6_DATA_FOOTER_END_VALID           BIT(14)
> +#define OA_TC6_DATA_FOOTER_RTSA_VALID          BIT(7)
> +#define OA_TC6_DATA_FOOTER_RTSP_VALID          BIT(6)
>  #define OA_TC6_DATA_FOOTER_END_BYTE_OFFSET     GENMASK(13, 8)
>  #define OA_TC6_DATA_FOOTER_TX_CREDITS          GENMASK(5, 1)
> 
> @@ -105,6 +118,12 @@
>  #define STATUS0_RESETC_POLL_DELAY              1000
>  #define STATUS0_RESETC_POLL_TIMEOUT            1000000
> 
> +#define OA_TC6_TSTAMP_SZ                       8
> +
> +#define OA_TC6_TTSCA_REG_ID                    1
> +#define OA_TC6_TTSCB_REG_ID                    2
> +#define OA_TC6_TTSCC_REG_ID                    3
> +
>  /* Internal structure for MAC-PHY drivers */
>  struct oa_tc6 {
>         struct net_device *netdev;
> @@ -127,6 +146,17 @@ struct oa_tc6 {
>         bool rx_buf_overflow;
>         bool int_flag;
>         bool disable_traffic;
> +       struct ptp_clock_info ptp_clock_info;
> +       struct hwtstamp_config ts_config;
> +       struct list_head tx_ts_skb_q;
> +       struct ptp_clock *ptp_clock;
> +       bool hw_tstamp_supported;
> +       bool hw_tstamp_enabled;
> +       u32 tx_hwtstamp_pkts;
> +       u32 tx_hwtstamp_lost;
> +       u32 tx_hwtstamp_err;
> +       int vend1_mms;
> +       u8 tx_ts_idx;
>  };
> 
>  enum oa_tc6_header_type {
> @@ -153,5 +183,8 @@ enum oa_tc6_data_end_valid_info {
>         OA_TC6_DATA_END_INVALID,
>         OA_TC6_DATA_END_VALID,
>  };
> +
> +int oa_tc6_tstamp_ioctl(struct oa_tc6 *tc6, struct ifreq *rq, int cmd);
> +
>  #endif /* OA_TC6_STD_DEF_H */
> 
> diff --git a/drivers/net/ethernet/oa_tc6/oa_tc6_tstamp.c b/drivers/net/ethernet/oa_tc6/oa_tc6_tstamp.c
> new file mode 100644
> index 000000000000..272701a4081d
> --- /dev/null
> +++ b/drivers/net/ethernet/oa_tc6/oa_tc6_tstamp.c
> @@ -0,0 +1,202 @@
> +// SPDX-License-Identifier: GPL-2.0+
> +/*
> + * OPEN Alliance 10BASE‑T1x MAC‑PHY Serial Interface framework
> + *
> + * Author: Selva Rajagopal <selvamani.rajagopal@onsemi.com>
> + */
> +
> +#include <linux/bitfield.h>
> +#include <linux/iopoll.h>
> +#include <linux/mdio.h>
> +#include <linux/phy.h>
> +#include <linux/oa_tc6.h>
> +
> +#include "oa_tc6_std_def.h"
> +
> +static int oa_tc6_set_hwtstamp_settings(struct oa_tc6 *tc6)
> +{
> +       u32 cfg0, irqm, status0;
> +       int ret;
> +
> +       ret = oa_tc6_read_register(tc6, OA_TC6_REG_CONFIG0, &cfg0);
> +       if (ret) {
> +               dev_err(&tc6->spi->dev, "Failed to read CFG0 register\n");
> +               goto out;
> +       }
> +
> +       ret = oa_tc6_read_register(tc6, OA_TC6_REG_INT_MASK0, &irqm);
> +       if (ret) {
> +               dev_err(&tc6->spi->dev, "failed to read IRQM register\n");
> +               goto out;
> +       }
> +
> +       if (tc6->ts_config.tx_type == HWTSTAMP_TX_ON ||
> +           tc6->ts_config.rx_filter == HWTSTAMP_FILTER_ALL)
> +               cfg0 |= CONFIG0_FTSE_ENABLE;
> +       else
> +               cfg0 &= ~CONFIG0_FTSE_ENABLE;

It never sets the 64-bit frame-timestamp-select bit (CONFIG0 bit 6), so the
framework enables timestamping with 32-bit timestamps while the receive path
strips 8 bytes.

This happens to work for the S2500 only because the S2500 driver forces 64-bit
independently in its own SPI config (patch 12/15):

   (CONFIG0_FTSE_ENABLE | CONFIG0_FTSS_64BIT)

Please review my feedback against your v3 patch series on 5-Jun.
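
To illustrate the point, here is a rough userspace sketch of toggling both
CONFIG0 bits together, assuming FTSS sits at bit 6 as noted above. The helper
name cfg0_apply_tstamp() is hypothetical, and clearing FTSS again when
timestamping is switched off is an assumption on my side, not a requirement
taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define CONFIG0_FTSE_ENABLE	(1u << 7)	/* bit 7, as defined in the patch */
#define CONFIG0_FTSS_64BIT	(1u << 6)	/* bit 6; value assumed from the comment above */

/* Hypothetical helper: enable or disable frame timestamping and the
 * 64-bit timestamp select in one step, leaving all other bits untouched.
 */
static uint32_t cfg0_apply_tstamp(uint32_t cfg0, bool enable)
{
	const uint32_t mask = CONFIG0_FTSE_ENABLE | CONFIG0_FTSS_64BIT;

	if (enable)
		return cfg0 | mask;

	return cfg0 & ~mask;
}
```

With something along these lines the framework and the 8-byte strip in the
receive path would agree on the timestamp width without relying on the
vendor driver.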

> +
> +       if (tc6->ts_config.tx_type == HWTSTAMP_TX_ON)
> +               irqm &= ~INT_MASK0_TTSCA_MASK;
> +       else
> +               irqm |= INT_MASK0_TTSCA_MASK;
> +
> +       /* Clear timestamp related IRQs */
> +       status0 = STATUS0_TTSCA_MASK;
> +       ret = oa_tc6_write_register(tc6, OA_TC6_REG_STATUS0, status0);
> +       if (ret) {
> +               dev_err(&tc6->spi->dev, "failed to write STATUS0 register\n");
> +               goto out;
> +       }
> +
> +       ret = oa_tc6_write_register(tc6, OA_TC6_REG_INT_MASK0, irqm);
> +       if (ret) {
> +               dev_err(&tc6->spi->dev, "failed to write IRQM register\n");
> +               goto out;
> +       }
> +
> +       ret = oa_tc6_write_register(tc6, OA_TC6_REG_CONFIG0, cfg0);
> +       if (ret) {
> +               dev_err(&tc6->spi->dev, "failed to write CFG0 register\n");
> +               goto out;
> +       }
> +       if (cfg0 & CONFIG0_FTSE_ENABLE)
> +               tc6->hw_tstamp_enabled = true;
> +       else
> +               tc6->hw_tstamp_enabled = false;
> +out:
> +       return ret;
> +}
> +
> +/**
> + * oa_tc6_hwtstamp_get - gets hardware timestamp config
> + * @tc6: oa_tc6 struct.
> + * @cfg: kernel copy of hardware timestamp config
> + */
> +void oa_tc6_hwtstamp_get(struct oa_tc6 *tc6,
> +                        struct kernel_hwtstamp_config *cfg)
> +{
> +       hwtstamp_config_to_kernel(cfg, &tc6->ts_config);
> +}
> +EXPORT_SYMBOL_GPL(oa_tc6_hwtstamp_get);
> +
> +/**
> + * oa_tc6_hwtstamp_set - sets hardware timestamp config
> + * @tc6: oa_tc6 struct.
> + * @cfg: kernel copy of hardware timestamp config
> + *
> + * Return: 0 on success otherwise failed.
> + */
> +int oa_tc6_hwtstamp_set(struct oa_tc6 *tc6,
> +                       struct kernel_hwtstamp_config *cfg)
> +{
> +       if (!netif_running(tc6->netdev))
> +               return -EIO;
> +
> +       if (!tc6->hw_tstamp_supported)
> +               return -EOPNOTSUPP;
> +
> +       switch (cfg->tx_type) {
> +       case HWTSTAMP_TX_OFF:
> +       case HWTSTAMP_TX_ON:
> +               break;
> +       default:
> +               return -ERANGE;
> +       }
> +
> +       switch (cfg->rx_filter) {
> +       case HWTSTAMP_FILTER_NONE:
> +       case HWTSTAMP_FILTER_ALL:
> +       case HWTSTAMP_FILTER_SOME:
> +       case HWTSTAMP_FILTER_PTP_V1_L4_EVENT:
> +       case HWTSTAMP_FILTER_PTP_V1_L4_SYNC:
> +       case HWTSTAMP_FILTER_PTP_V1_L4_DELAY_REQ:
> +       case HWTSTAMP_FILTER_PTP_V2_L4_EVENT:
> +       case HWTSTAMP_FILTER_PTP_V2_L4_SYNC:
> +       case HWTSTAMP_FILTER_PTP_V2_L4_DELAY_REQ:
> +       case HWTSTAMP_FILTER_PTP_V2_L2_EVENT:
> +       case HWTSTAMP_FILTER_PTP_V2_L2_SYNC:
> +       case HWTSTAMP_FILTER_PTP_V2_L2_DELAY_REQ:
> +       case HWTSTAMP_FILTER_PTP_V2_EVENT:
> +       case HWTSTAMP_FILTER_PTP_V2_SYNC:
> +       case HWTSTAMP_FILTER_PTP_V2_DELAY_REQ:
> +       case HWTSTAMP_FILTER_NTP_ALL:
> +               break;
> +       default:
> +               return -ERANGE;
> +       }
> +       hwtstamp_config_from_kernel(&tc6->ts_config, cfg);
> +
> +       /* Supports timestamping all traffic */
> +       if (cfg->rx_filter != HWTSTAMP_FILTER_NONE)
> +               tc6->ts_config.rx_filter = HWTSTAMP_FILTER_ALL;
> +       return oa_tc6_set_hwtstamp_settings(tc6);
> +}
> +EXPORT_SYMBOL_GPL(oa_tc6_hwtstamp_set);
> +
> +/**
> + * oa_tc6_get_ts_stats - Provides timestamping stats
> + * @tc6: oa_tc6 struct.
> + * @ts_stats: ethtool data structure to fill in
> + */
> +void oa_tc6_get_ts_stats(struct oa_tc6 *tc6,
> +                        struct ethtool_ts_stats *stats)
> +{
> +       stats->pkts = tc6->tx_hwtstamp_pkts;
> +       stats->err = tc6->tx_hwtstamp_err;
> +       stats->lost = tc6->tx_hwtstamp_lost;
> +}
> +EXPORT_SYMBOL_GPL(oa_tc6_get_ts_stats);
> +
> +int oa_tc6_tstamp_ioctl(struct oa_tc6 *tc6, struct ifreq *rq, int cmd)
> +{
> +       struct kernel_hwtstamp_config kcfg;
> +       struct hwtstamp_config tscfg;
> +       int ret = 0;
> +
> +       if (!tc6->hw_tstamp_supported)
> +               return -EOPNOTSUPP;
> +
> +       if (cmd == SIOCSHWTSTAMP) {
> +               if (copy_from_user(&tscfg, rq->ifr_data,
> +                                  sizeof(tscfg)))
> +                       return -EFAULT;
> +
> +               if (tscfg.flags)
> +                       return -EINVAL;
> +               hwtstamp_config_to_kernel(&kcfg, &tscfg);
> +               ret = oa_tc6_hwtstamp_set(tc6, &kcfg);
> +               if (ret)
> +                       return ret;
> +       }
> +       if (copy_to_user(rq->ifr_data, &tc6->ts_config,
> +                        sizeof(tc6->ts_config)))
> +               ret = -EFAULT;
> +       return ret;
> +}
> +
> +/**
> + * oa_tc6_get_ts_info - Provides timestamp info for ethtool
> + * @tc6: oa_tc6 struct.
> + * @info: ethtool timestamping info structure
> + * @ts_stats: ethtool data structure to fill in
> + */
> +int oa_tc6_get_ts_info(struct oa_tc6 *tc6,
> +                      struct kernel_ethtool_ts_info *info)
> +{
> +       if (!tc6->ptp_clock)
> +               return ethtool_op_get_ts_info(tc6->netdev, info);
> +
> +       info->so_timestamping = SOF_TIMESTAMPING_RAW_HARDWARE |
> +                               SOF_TIMESTAMPING_TX_HARDWARE |
> +                               SOF_TIMESTAMPING_RX_HARDWARE;
> +       info->phc_index = ptp_clock_index(tc6->ptp_clock);
> +       info->tx_types = BIT(HWTSTAMP_TX_ON);
> +       info->rx_filters = BIT(HWTSTAMP_FILTER_ALL);
> +       return 0;
> +}
> +EXPORT_SYMBOL_GPL(oa_tc6_get_ts_info);
> +
> +MODULE_DESCRIPTION("OPEN Alliance 10BASE‑T1x MAC‑PHY Serial Interface Lib");
> +MODULE_AUTHOR("Selva Rajagopal <selvamani.rajagopal@onsemi.com>");
> +MODULE_LICENSE("GPL");
> diff --git a/include/linux/oa_tc6.h b/include/linux/oa_tc6.h
> index 39b80033dfa9..4047c22a366a 100644
> --- a/include/linux/oa_tc6.h
> +++ b/include/linux/oa_tc6.h
> @@ -12,6 +12,7 @@
> 
>  #include <linux/etherdevice.h>
>  #include <linux/spi/spi.h>
> +#include <linux/ptp_clock_kernel.h>
> 
>  /* PHY – Clause 45 registers memory map selector (MMS) as per table 6 in
>   * the OPEN Alliance specification.
> @@ -36,4 +37,15 @@ int oa_tc6_read_registers(struct oa_tc6 *tc6, u32 address, u32 value[],
>                           u8 length);
>  netdev_tx_t oa_tc6_start_xmit(struct oa_tc6 *tc6, struct sk_buff *skb);
>  int oa_tc6_zero_align_receive_frame_enable(struct oa_tc6 *tc6);
> +int oa_tc6_ptp_register(struct oa_tc6 *tc6, struct ptp_clock_info *info);
> +int oa_tc6_ioctl(struct oa_tc6 *tc6, struct ifreq *rq, int cmd);
> +int oa_tc6_get_ts_info(struct oa_tc6 *tc6,
> +                      struct kernel_ethtool_ts_info *ts_info);
> +void oa_tc6_hwtstamp_get(struct oa_tc6 *tc6,
> +                        struct kernel_hwtstamp_config *cfg);
> +void oa_tc6_get_ts_stats(struct oa_tc6 *tc6,
> +                        struct ethtool_ts_stats *ts_stats);
> +int oa_tc6_hwtstamp_set(struct oa_tc6 *tc6,
> +                       struct kernel_hwtstamp_config *cfg);
> +void oa_tc6_ptp_unregister(struct oa_tc6 *tc6);
>  #endif /* _LINUX_OA_TC6_H */
> 
> --
> 2.43.0

^ permalink raw reply

* Re: [PATCH] qede: Prevent possible snprintf() truncation by bounding %s string format
From: David Laight @ 2026-07-01 19:33 UTC (permalink / raw)
  To: Baran Tuna
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Breno Leitao,
	open list:QLOGIC QL4xxx ETHERNET DRIVER, open list
In-Reply-To: <20260701144713.197557-1-barant@fastmail.com>

On Wed,  1 Jul 2026 17:47:11 +0300
Baran Tuna <barant@fastmail.com> wrote:

> GCC warning shows that formatted strings may
> exceed the fixed-size destination buffers.
> 
> Bounding the %s string format
> so the maximum formatted output always fits.
> 
> This eliminates the -Wformat-truncation warning.
> 
> Signed-off-by: Baran Tuna <barant@fastmail.com>
> ---
>  drivers/net/ethernet/qlogic/qede/qede_ethtool.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/ethernet/qlogic/qede/qede_ethtool.c b/drivers/net/ethernet/qlogic/qede/qede_ethtool.c
> index 647f30a16a94..5428f53150a0 100644
> --- a/drivers/net/ethernet/qlogic/qede/qede_ethtool.c
> +++ b/drivers/net/ethernet/qlogic/qede/qede_ethtool.c
> @@ -618,10 +618,10 @@ static void qede_get_drvinfo(struct net_device *ndev,
>  	if ((strlen(storm) + strlen("[storm]")) <
>  	    sizeof(info->version))
>  		snprintf(info->version, sizeof(info->version),
> -			 "[storm %s]", storm);
> +			 "[storm %.16s]", storm);
>  	else
>  		snprintf(info->version, sizeof(info->version),
> -			 "%s", storm);
> +			 "%.16s", storm);

That looks wrong.
The code is using two different formats based on the length
of 'storm' but you are truncating it to the same length in
both cases.
I think this will work:
	if (snprintf(info->version, sizeof(info->version),
		     "[storm %s]", storm) >= sizeof(info->version))
		strscpy(info->version, storm);
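
The same idea as a small userspace sketch, relying on snprintf() returning
the length it would have written had the buffer been large enough. The
function name format_version() is made up for the example, and a plain
snprintf() with "%s" stands in for strscpy(), which does not exist in
userspace.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: try the decorated format first and fall back to
 * the bare string when the decorated form would not fit.
 */
static void format_version(char *dst, size_t dst_size, const char *storm)
{
	int n = snprintf(dst, dst_size, "[storm %s]", storm);

	/* n >= dst_size means the output was truncated */
	if (n < 0 || (size_t)n >= dst_size)
		snprintf(dst, dst_size, "%s", storm);	/* stand-in for strscpy() */
}
```

That keeps a single length check and avoids truncating 'storm' to the same
width in both cases.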

-- David

>  
>  	if (edev->dev_info.common.mbi_version) {
>  		snprintf(mbi, ETHTOOL_FWVERS_LEN, "%d.%d.%d",
> @@ -632,10 +632,10 @@ static void qede_get_drvinfo(struct net_device *ndev,
>  			 (edev->dev_info.common.mbi_version &
>  			  QED_MBI_VERSION_0_MASK) >> QED_MBI_VERSION_0_OFFSET);
>  		snprintf(info->fw_version, sizeof(info->fw_version),
> -			 "mbi %s [mfw %s]", mbi, mfw);
> +			 "mbi %.10s [mfw %.10s]", mbi, mfw);
>  	} else {
>  		snprintf(info->fw_version, sizeof(info->fw_version),
> -			 "mfw %s", mfw);
> +			 "mfw %.16s", mfw);
>  	}
>  
>  	strscpy(info->bus_info, pci_name(edev->pdev), sizeof(info->bus_info));


^ permalink raw reply

* Re: [PATCH 02/15] dt-bindings: clock: mediatek: regroup MT8188 dt-bindings into MT8186
From: Rob Herring @ 2026-07-01 19:33 UTC (permalink / raw)
  To: Louis-Alexis Eyraud
  Cc: Michael Turquette, Stephen Boyd, Brian Masney,
	Krzysztof Kozlowski, Conor Dooley, Matthias Brugger,
	AngeloGioacchino Del Regno, Chun-Jie Chen, Philipp Zabel,
	Edward-JW Yang, Richard Cochran, kernel, linux-clk, devicetree,
	linux-kernel, linux-arm-kernel, linux-mediatek, netdev
In-Reply-To: <20260701-mt8189-clocks-system-base-v1-2-2b048feea50a@collabora.com>

On Wed, Jul 01, 2026 at 03:11:07PM +0200, Louis-Alexis Eyraud wrote:
> Regroup the MT8188 clock and system clock dt-bindings into MT8186 ones
> to ease maintainability and have common files for several currently
> supported SoC or new future ones, that have the same kind of clock
> controller design.
> 
> Note:
> The `#clock-cells` property is a required property for all compatibles
> declared in MT8188 clock and system clock dt-bindings but not in MT8186
> ones.
> To avoid ABI breakage, conditional blocks to check this requirement
> for MT8188 compatibles are added, rather than enforcing it for MT8186
> compatibles.

If the existing DTs are just wrong, then I would just make #clock-cells 
required. But please update the .dts files so the warnings don't grow.

The grouping I would do here is:

- clock controller only
- reset controller only
- both clock and reset controller

That should avoid any if/then schemas.

Rob

^ permalink raw reply

* Re: [PATCH nf] netfilter: nft_set_rbtree: reject interval-end get for open intervals
From: Florian Westphal @ 2026-07-01 19:31 UTC (permalink / raw)
  To: Melbin K Mathew; +Cc: pablo, netfilter-devel, coreteam, netdev, linux-kernel
In-Reply-To: <20260630155507.92815-1-mlbnkm1@gmail.com>

Melbin K Mathew <mlbnkm1@gmail.com> wrote:
> nft_rbtree_get() uses the interval endpoint selected by
> nft_array_get_cmp(). For NFT_SET_ELEM_INTERVAL_END requests, the function
> uses interval->to to recover struct nft_rbtree_elem.
> 
> Open-ended intervals can have a NULL end endpoint. In that case,
> nft_array_get_cmp() treats the missing endpoint as b = -1, which can
> still match an interval-end query. Avoid deriving an element pointer
> from a NULL endpoint and report the element as not found instead.
> 
> Return -ENOENT for interval-end requests against open-ended intervals.
> 
> Fixes: 2aa34191f06f ("netfilter: nft_set_rbtree: use binary search array in get command")
> Signed-off-by: Melbin K Mathew <mlbnkm1@gmail.com>
> ---
> Notes:
>   A reduced userspace model confirms the comparator returns match for a
>   NULL-ended interval when NFT_SET_ELEM_INTERVAL_END is set, and that
>   container_of(NULL, ext) produces a garbage pointer (UBSAN fires).
> 
>   I have not reproduced an end-to-end crash through normal nft CLI usage.
>   An instrumented WARN in this branch did not fire during interval-set
>   tests with nft add/get/list. The patch is a defensive fix for the NULL
>   endpoint case.
> 
>   Tested on 7.2-rc1 with KASAN and UBSAN enabled. Function tracing
>   confirms nft_rbtree_get() is reached via nft get element. The added
>   guard returns -ENOENT for a NULL interval endpoint in the instrumented
>   test case.
> ---
>  net/netfilter/nft_set_rbtree.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/net/netfilter/nft_set_rbtree.c b/net/netfilter/nft_set_rbtree.c
> index 018bbb6df4..024a2cd3a6 100644
> --- a/net/netfilter/nft_set_rbtree.c
> +++ b/net/netfilter/nft_set_rbtree.c
> @@ -184,10 +184,13 @@ nft_rbtree_get(const struct net *net, const struct nft_set *set,
>  	if (!interval || nft_set_elem_expired(interval->from))
>  		return ERR_PTR(-ENOENT);
>  
> -	if (flags & NFT_SET_ELEM_INTERVAL_END)
> +	if (flags & NFT_SET_ELEM_INTERVAL_END) {
> +		if (!interval->to)
> +			return ERR_PTR(-ENOENT);
>  		rbe = container_of(interval->to, struct nft_rbtree_elem, ext);
> -	else
> +	} else {
>  		rbe = container_of(interval->from, struct nft_rbtree_elem, ext);
> +	}

Hmm, I don't think the query should have returned a match in the first
place, i.e. we should have left via the (!interval || ...) condition.

Pablo, could you please have a look?

I suspect we want something like this:

diff --git a/net/netfilter/nft_set_rbtree.c b/net/netfilter/nft_set_rbtree.c
--- a/net/netfilter/nft_set_rbtree.c
+++ b/net/netfilter/nft_set_rbtree.c
@@ -150,10 +150,12 @@ static int nft_array_get_cmp(const void *pkey, const void *entry)
 		b = memcmp(ctx->key, nft_set_ext_key(interval->to), ctx->klen);
 
 	if (a >= 0) {
-		if (ctx->flags & NFT_SET_ELEM_INTERVAL_END && b <= 0)
-			return 0;
-		else if (b < 0)
+		if (ctx->flags & NFT_SET_ELEM_INTERVAL_END && b <= 0) {
+			if (interval->to)
+				return 0;
+		} else if (b < 0) {
 			return 0;
+		}
 	}
 
 	if (a < 0)

When userspace asks for the interval end but we have an open interval,
the cmp callback shouldn't indicate a match.
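
A reduced userspace model of the comparator with that change, for
illustration only. It uses plain ints instead of set keys, and the names
struct interval and get_cmp() are made up here; only the branch structure
follows the diff above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for an interval entry; to == NULL means open-ended. */
struct interval {
	const int *from;
	const int *to;
};

/* Three-way compare returning -1, 0 or 1, like memcmp() on equal-length keys. */
static int cmp_int(int x, int y)
{
	return (x > y) - (x < y);
}

/* Model of the comparator: 0 means "match", otherwise search left or right. */
static int get_cmp(int key, bool want_end, const struct interval *iv)
{
	int a = cmp_int(key, *iv->from);
	int b = iv->to ? cmp_int(key, *iv->to) : -1;

	if (a >= 0) {
		if (want_end && b <= 0) {
			/* an open interval has no end element to return */
			if (iv->to)
				return 0;
		} else if (b < 0) {
			return 0;
		}
	}

	if (a < 0)
		return -1;

	return 1;
}
```

In this model an interval-end query against an open interval no longer
reports a match, so the caller never gets to dereference a NULL 'to'.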

^ permalink raw reply

* Re: RTL8159 firmware
From: Andrew Lunn @ 2026-07-01 19:29 UTC (permalink / raw)
  To: Birger Koblitz
  Cc: Jan Hendrik Farr, andrew+netdev, davem, edumazet, hsu.chih.kai,
	kuba, linux-kernel, linux-usb, netdev, olek2, pabeni
In-Reply-To: <5dc0e654-0bdb-422c-9049-94ee6d8867e4@birger-koblitz.de>

On Wed, Jul 01, 2026 at 07:24:13PM +0200, Birger Koblitz wrote:
> Hi Jan,
> 
> On 7/1/26 19:13, Jan Hendrik Farr wrote:
> > Hi Birger,
> > 
> > it looks like the firmware file rtl_nic/rtl8159-1.fw isn't in linux-firmware yet.
> > Could you send it for people to potentially test?
> > 
> > Jan
> > 
> The code to create the binary firmware file is at:
> https://gitlab.com/koblitz-rtlnic/rtlnic_fw
> But I cannot submit the firmware itself to linux-firmware, as the sourcecode from
> which the binary data is extracted is published by Realtek under the GPL.

The obvious workaround is to not convert it to binary.

8152.c has a clear GPL-2.0 header. So you can edit that file, extract
the two arrays, and it would still be GPL. You can make the firmware
loader in the driver parse the ASCII array as it is. It also looks
like you can .xz or .zst compress it, and the kernel
_request_firmware() will handle the decompression for you.

Just clearly document what you have done, and I think you are O.K. But
IANAL.

	Andrew

^ permalink raw reply

* Re: [PATCH] selftests: Open /dev/udmabuf O_RDONLY
From: T.J. Mercier @ 2026-07-01 19:23 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: kraxel, vivek.kasireddy, Shuah Khan, Andrew Lunn, David S. Miller,
	Eric Dumazet, Paolo Abeni, linux-kselftest, linux-kernel, netdev,
	bpf
In-Reply-To: <20260701115708.01213909@kernel.org>

On Wed, Jul 1, 2026 at 11:57 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Wed, 1 Jul 2026 11:53:15 -0700 T.J. Mercier wrote:
> > On Fri, Jun 26, 2026 at 6:09 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > >
> > > On Thu, 25 Jun 2026 11:15:55 -0700 T.J. Mercier wrote:
> > > > Write permissions on the /dev/udmabuf device file are not required to
> > > > issue ioctls and allocate udmabufs. Applications should be opening this
> > > > file as O_RDONLY. The BPF dmabuf_iter selftest already does this. [1]
> > > >
> > > > Remove the write access mode from the drivers/dma-buf/udmabuf.c and
> > > > drivers/net/hw/ncdevmem.c selftests.
> > >
> > > You need to explain "why", too. Why change it if it clearly
> > > worked for everyone running this test until now.
> > > --
> > > pw-bot: cr
> >
> > Principle of least privilege. Folks use or point to these selftests as
> > examples, and then wonder why O_RDWR doesn't work on systems where
> > write permissions are not available on /dev/udmabuf.
>
> Alright, pop that into the commit msg and repost please.

Done, thanks.
v2 here: https://lore.kernel.org/all/20260701192210.2997769-1-tjmercier@google.com/

^ permalink raw reply

* [PATCH net-next v4 3/3] selftests/net: devmem.py: add check_rx_large_niov
From: Bobby Eshleman @ 2026-07-01 19:22 UTC (permalink / raw)
  To: Donald Hunter, Jakub Kicinski, David S. Miller, Eric Dumazet,
	Paolo Abeni, Simon Horman, Andrew Lunn, Gerd Hoffmann,
	Vivek Kasireddy, Sumit Semwal, Christian König, Shuah Khan
  Cc: netdev, linux-kernel, dri-devel, linux-media, linaro-mm-sig,
	linux-kselftest, sdf, razor, daniel, almasrymina, matttbe,
	skhawaja, dw, Joe Damato, Bobby Eshleman
In-Reply-To: <20260701-tcpdm-large-niovs-v4-0-ca4654f37570@meta.com>

From: Bobby Eshleman <bobbyeshleman@meta.com>

Add a new devmem test case for binding the dmabuf with rx-buf-size=16K.
The test sweeps RX payload sizes straddling the niov boundary to cover
the sub-niov, exact-niov, and multi-niov RX paths.

Silence pylint invalid-name (`with open() as f`) and too-many-arguments
(ncdevmem_rx grew to 6 args) at file scope.

Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
---
 tools/testing/selftests/drivers/net/hw/devmem.py   | 12 ++++-
 .../testing/selftests/drivers/net/hw/devmem_lib.py | 59 +++++++++++++++++++++-
 .../testing/selftests/drivers/net/hw/nk_devmem.py  | 11 +++-
 3 files changed, 76 insertions(+), 6 deletions(-)

diff --git a/tools/testing/selftests/drivers/net/hw/devmem.py b/tools/testing/selftests/drivers/net/hw/devmem.py
index 031cf9905f65..47b54e18e7a6 100755
--- a/tools/testing/selftests/drivers/net/hw/devmem.py
+++ b/tools/testing/selftests/drivers/net/hw/devmem.py
@@ -2,7 +2,8 @@
 # SPDX-License-Identifier: GPL-2.0
 
 from os import path
-from devmem_lib import setup_test, run_rx, run_tx, run_tx_chunks, run_rx_hds
+from devmem_lib import (setup_test, run_rx, run_tx, run_tx_chunks, run_rx_hds,
+                        run_rx_large_niov)
 from lib.py import ksft_run, ksft_exit, ksft_disruptive
 from lib.py import NetDrvEpEnv
 
@@ -30,11 +31,18 @@ def check_rx_hds(cfg) -> None:
     run_rx_hds(cfg)
 
 
+@ksft_disruptive
+def check_rx_large_niov(cfg) -> None:
+    """Run the devmem RX test with rx-buf-size = 16 KiB."""
+    run_rx_large_niov(cfg)
+
+
 def main() -> None:
     """Run the devmem test cases."""
     with NetDrvEpEnv(__file__) as cfg:
         setup_test(cfg, path.abspath(path.dirname(__file__) + "/ncdevmem"))
-        ksft_run([check_rx, check_tx, check_tx_chunks, check_rx_hds],
+        ksft_run([check_rx, check_tx, check_tx_chunks, check_rx_hds,
+                  check_rx_large_niov],
                  args=(cfg,))
     ksft_exit()
 
diff --git a/tools/testing/selftests/drivers/net/hw/devmem_lib.py b/tools/testing/selftests/drivers/net/hw/devmem_lib.py
index 0921ff03eb81..7b8557959c40 100644
--- a/tools/testing/selftests/drivers/net/hw/devmem_lib.py
+++ b/tools/testing/selftests/drivers/net/hw/devmem_lib.py
@@ -1,4 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0
+# pylint: disable=invalid-name,too-many-arguments
 """Shared helpers for devmem TCP selftests."""
 
 import re
@@ -8,7 +9,7 @@ from lib.py import (bkg, cmd, defer, ethtool, rand_port, wait_port_listen,
                     NetdevFamily)
 
 
-def require_devmem(cfg):
+def require_devmem(cfg, rx_buf_size=0):
     """Probe ncdevmem on cfg.ifname and SKIP the test if devmem isn't supported."""
     if not hasattr(cfg, "devmem_probed"):
         probe_command = f"{cfg.bin_local} -f {cfg.ifname}"
@@ -18,6 +19,19 @@ def require_devmem(cfg):
     if not cfg.devmem_supported:
         raise KsftSkipEx("Test requires devmem support")
 
+    if rx_buf_size > 0:
+        if not hasattr(cfg, "devmem_rx_buf_size_probed"):
+            cfg.devmem_rx_buf_size_probed = {}
+
+        if rx_buf_size not in cfg.devmem_rx_buf_size_probed:
+            probe_command = f"{cfg.bin_local} -f {cfg.ifname} -b {rx_buf_size}"
+            cfg.devmem_rx_buf_size_probed[rx_buf_size] = \
+                cmd(probe_command, fail=False, shell=True).ret == 0
+
+        if not cfg.devmem_rx_buf_size_probed[rx_buf_size]:
+            raise KsftSkipEx(
+                f"Test requires devmem rx-buf-size={rx_buf_size} support")
+
 
 def configure_nic(cfg):
     """Channels, rings, RSS, queue lease for netkit devmem."""
@@ -76,7 +90,8 @@ def set_flow_rule(cfg, port):
     return int(re.search(r'ID (\d+)', output).group(1))
 
 
-def ncdevmem_rx(cfg, port, verify=True, fail_on_linear=False, flow_steer=False):
+def ncdevmem_rx(cfg, port, verify=True, fail_on_linear=False, flow_steer=False,
+                rx_buf_size=0):
     """Build the ncdevmem RX listener command."""
     if hasattr(cfg, 'netns'):
         flow_rule_id = set_flow_rule(cfg, port)
@@ -96,6 +111,8 @@ def ncdevmem_rx(cfg, port, verify=True, fail_on_linear=False, flow_steer=False):
         extras.append("-v 7")
     if fail_on_linear:
         extras.append("-L")
+    if rx_buf_size > 0:
+        extras.append(f"-b {rx_buf_size}")
 
     parts = [cfg.bin_local, "-l", f"-f {ifname}", f"-s {addr}",
              f"-p {port}", *extras]
@@ -202,6 +219,44 @@ def run_tx_chunks(cfg):
     ksft_eq(socat.stdout.strip(), "hello\nworld")
 
 
+def _restore_nr_hugepages(hp_file, nr_hugepages):
+    with open(hp_file, 'w', encoding='utf-8') as f:
+        f.write(str(nr_hugepages))
+
+
+def run_rx_large_niov(cfg):
+    """Run the devmem RX test with a large niov (rx-buf-size > PAGE_SIZE).
+
+    Sweep payload sizes that straddle the niov boundary: below, equal to,
+    and above rx_buf_size, to exercise sub-niov, exact-niov, and multi-niov
+    RX paths.
+    """
+    hp_file = "/proc/sys/vm/nr_hugepages"
+    with open(hp_file, 'r+', encoding='utf-8') as f:
+        nr_hugepages = int(f.read().strip())
+        if nr_hugepages < 64:
+            f.seek(0)
+            f.write("64")
+            defer(_restore_nr_hugepages, hp_file, nr_hugepages)
+    require_devmem(cfg, rx_buf_size=16384)
+    configure_nic(cfg)
+    netns = getattr(cfg, "netns", None)
+
+    for size in [1024, 4096, 8192, 16384, 32768, 65536]:
+        port = rand_port()
+        socat = socat_send(cfg, port)
+        listen_cmd = ncdevmem_rx(cfg, port,
+                                 flow_steer=not netns,
+                                 rx_buf_size=16384)
+        data_pipe = (f"yes $(echo -e \x01\x02\x03\x04\x05\x06) | "
+                     f"head -c {size} | {socat}")
+        with bkg(listen_cmd, exit_wait=True, ns=netns) as ncdevmem:
+            wait_port_listen(port, proto="tcp", ns=netns)
+            cmd(data_pipe, host=cfg.remote, shell=True)
+        ksft_eq(ncdevmem.ret, 0,
+                f"large-niov failed for payload size {size}")
+
+
 def run_rx_hds(cfg):
     """Run the HDS test by running devmem RX across a segment size sweep."""
     require_devmem(cfg)
diff --git a/tools/testing/selftests/drivers/net/hw/nk_devmem.py b/tools/testing/selftests/drivers/net/hw/nk_devmem.py
index 300ed2a70ab4..7f1867e4ff32 100755
--- a/tools/testing/selftests/drivers/net/hw/nk_devmem.py
+++ b/tools/testing/selftests/drivers/net/hw/nk_devmem.py
@@ -3,7 +3,8 @@
 """Test devmem TCP with netkit."""
 
 import os
-from devmem_lib import setup_test, run_rx, run_tx, run_tx_chunks, run_rx_hds
+from devmem_lib import (setup_test, run_rx, run_tx, run_tx_chunks, run_rx_hds,
+                        run_rx_large_niov)
 from lib.py import ksft_run, ksft_exit, ksft_disruptive
 from lib.py import NetDrvContEnv
 
@@ -31,6 +32,12 @@ def check_nk_rx_hds(cfg) -> None:
     run_rx_hds(cfg)
 
 
+@ksft_disruptive
+def check_nk_rx_large_niov(cfg) -> None:
+    """Run the devmem RX large-niov test through netkit."""
+    run_rx_large_niov(cfg)
+
+
 def main() -> None:
     """Run the netkit devmem test cases."""
     with NetDrvContEnv(__file__, rxqueues=2, primary_rx_redirect=True) as cfg:
@@ -38,7 +45,7 @@ def main() -> None:
                    os.path.join(os.path.dirname(os.path.abspath(__file__)),
                                 "ncdevmem"))
         ksft_run([check_nk_rx, check_nk_tx, check_nk_tx_chunks,
-                  check_nk_rx_hds], args=(cfg,))
+                  check_nk_rx_hds, check_nk_rx_large_niov], args=(cfg,))
     ksft_exit()
 
 

-- 
2.53.0-Meta



* [PATCH net-next v4 2/3] selftests/net: ncdevmem: add -b option to set rx-buf-size on bind
From: Bobby Eshleman @ 2026-07-01 19:22 UTC (permalink / raw)
  To: Donald Hunter, Jakub Kicinski, David S. Miller, Eric Dumazet,
	Paolo Abeni, Simon Horman, Andrew Lunn, Gerd Hoffmann,
	Vivek Kasireddy, Sumit Semwal, Christian König, Shuah Khan
  Cc: netdev, linux-kernel, dri-devel, linux-media, linaro-mm-sig,
	linux-kselftest, sdf, razor, daniel, almasrymina, matttbe,
	skhawaja, dw, Joe Damato, Bobby Eshleman
In-Reply-To: <20260701-tcpdm-large-niovs-v4-0-ca4654f37570@meta.com>

From: Bobby Eshleman <bobbyeshleman@meta.com>

Add -b <bytes> to request a non-default niov size via
NETDEV_A_DMABUF_RX_BUF_SIZE. When the value exceeds PAGE_SIZE,
udmabuf_alloc() switches to an MFD_HUGETLB-backed memfd so each 2 MB
hugepage produces one naturally-aligned sg entry.
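
The sizing logic can be sketched in Python as follows. plan_memfd() is a
hypothetical helper mirroring the roundup in udmabuf_alloc(); the input of
16000 pages of 4 KiB is an assumption based on NUM_PAGES and PAGE_SHIFT in
ncdevmem.c, not something the patch states.

```python
MB = 1 << 20
HUGEPAGE_SIZE = 2 * MB  # matches MFD_HUGE_2MB


def roundup(value, align):
    """Round value up to the next multiple of align."""
    return ((value + align - 1) // align) * align


def plan_memfd(size, rx_buf_size, page_size=4096):
    """Return (use_hugetlb, memfd_size) for a requested buffer size."""
    use_hugetlb = rx_buf_size > page_size
    if use_hugetlb:
        # hugetlb-backed memfds must be sized in whole hugepages
        size = roundup(size, HUGEPAGE_SIZE)
    return use_hugetlb, size


# Assumed input: 16000 pages of 4 KiB, bound with 16 KiB niovs.
use_hugetlb, size = plan_memfd(16000 * 4096, 16384)
hugepages_needed = size // HUGEPAGE_SIZE
```

Under that assumption the buffer grows from 65536000 bytes to 32 hugepages,
which is why the test needs hugepages reserved before it runs.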

Add CONFIG_HUGETLBFS=y to drivers/net/hw/config so the new path is
reachable in the CI kernels built for these tests.

Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
---
 tools/testing/selftests/drivers/net/hw/ncdevmem.c | 36 +++++++++++++++++++++--
 1 file changed, 33 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/drivers/net/hw/ncdevmem.c b/tools/testing/selftests/drivers/net/hw/ncdevmem.c
index d96e8a3b5a65..a16e55af51ee 100644
--- a/tools/testing/selftests/drivers/net/hw/ncdevmem.c
+++ b/tools/testing/selftests/drivers/net/hw/ncdevmem.c
@@ -40,6 +40,7 @@
 
 #include <linux/uio.h>
 #include <stdarg.h>
+#include <stdint.h>
 #include <stdio.h>
 #include <stdlib.h>
 #include <unistd.h>
@@ -61,6 +62,7 @@
 #include <sys/time.h>
 
 #include <linux/memfd.h>
+#include <sys/param.h>
 #include <linux/dma-buf.h>
 #include <linux/errqueue.h>
 #include <linux/udmabuf.h>
@@ -79,6 +81,7 @@
 #define PAGE_SHIFT 12
 #define TEST_PREFIX "ncdevmem"
 #define NUM_PAGES 16000
+#define MB(x) ((x) << 20)
 
 #ifndef MSG_SOCK_DEVMEM
 #define MSG_SOCK_DEVMEM 0x2000000
@@ -100,6 +103,7 @@ static unsigned int dmabuf_id;
 static uint32_t tx_dmabuf_id;
 static int waittime_ms = 500;
 static bool fail_on_linear;
+static uint32_t rx_buf_size;
 
 /* System state loaded by current_config_load() */
 #define MAX_FLOWS	8
@@ -142,6 +146,7 @@ static struct memory_buffer *udmabuf_alloc(size_t size)
 {
 	struct udmabuf_create create;
 	struct memory_buffer *ctx;
+	unsigned int memfd_flags;
 	int ret;
 
 	ctx = malloc(sizeof(*ctx));
@@ -156,9 +161,14 @@ static struct memory_buffer *udmabuf_alloc(size_t size)
 		goto err_free_ctx;
 	}
 
-	ctx->memfd = memfd_create("udmabuf-test", MFD_ALLOW_SEALING);
+	memfd_flags = MFD_ALLOW_SEALING;
+	if (rx_buf_size > getpagesize())
+		memfd_flags |= MFD_HUGETLB | MFD_HUGE_2MB;
+
+	ctx->memfd = memfd_create("udmabuf-test", memfd_flags);
 	if (ctx->memfd < 0) {
-		pr_err("[skip,no-memfd]");
+		pr_err("[skip,no-memfd%s]",
+		       (memfd_flags & MFD_HUGETLB) ? " (need hugepages)" : "");
 		goto err_close_dev;
 	}
 
@@ -168,6 +178,11 @@ static struct memory_buffer *udmabuf_alloc(size_t size)
 		goto err_close_memfd;
 	}
 
+	if (memfd_flags & MFD_HUGETLB) {
+		size = roundup(size, MB(2));
+		ctx->size = size;
+	}
+
 	ret = ftruncate(ctx->memfd, size);
 	if (ret == -1) {
 		pr_err("[FAIL,memfd-truncate]");
@@ -699,6 +714,8 @@ static int bind_rx_queue(unsigned int ifindex, unsigned int dmabuf_fd,
 	netdev_bind_rx_req_set_ifindex(req, ifindex);
 	netdev_bind_rx_req_set_fd(req, dmabuf_fd);
 	__netdev_bind_rx_req_set_queues(req, queues, n_queue_index);
+	if (rx_buf_size)
+		netdev_bind_rx_req_set_rx_buf_size(req, rx_buf_size);
 
 	rsp = netdev_bind_rx(*ys, req);
 	if (!rsp) {
@@ -1411,7 +1428,7 @@ int main(int argc, char *argv[])
 	int is_server = 0, opt;
 	int ret, err = 1;
 
-	while ((opt = getopt(argc, argv, "Lls:c:p:v:q:t:f:z:n")) != -1) {
+	while ((opt = getopt(argc, argv, "Lls:c:p:v:q:t:f:z:nb:")) != -1) {
 		switch (opt) {
 		case 'L':
 			fail_on_linear = true;
@@ -1446,6 +1463,19 @@ int main(int argc, char *argv[])
 		case 'n':
 			skip_config = 1;
 			break;
+		case 'b': {
+			unsigned long val;
+
+			errno = 0;
+			val = strtoul(optarg, NULL, 0);
+			if ((val == ULONG_MAX && errno == ERANGE) ||
+			    val > UINT32_MAX) {
+				pr_err("invalid rx_buf_size: %s", optarg);
+				return 1;
+			}
+			rx_buf_size = val;
+			break;
+		}
 		case '?':
 			fprintf(stderr, "unknown option: %c\n", optopt);
 			break;

-- 
2.53.0-Meta



* [PATCH net-next v4 1/3] net: devmem: allow rx-buf-size > PAGE_SIZE per dmabuf binding
From: Bobby Eshleman @ 2026-07-01 19:22 UTC (permalink / raw)
  To: Donald Hunter, Jakub Kicinski, David S. Miller, Eric Dumazet,
	Paolo Abeni, Simon Horman, Andrew Lunn, Gerd Hoffmann,
	Vivek Kasireddy, Sumit Semwal, Christian König, Shuah Khan
  Cc: netdev, linux-kernel, dri-devel, linux-media, linaro-mm-sig,
	linux-kselftest, sdf, razor, daniel, almasrymina, matttbe,
	skhawaja, dw, Joe Damato, Bobby Eshleman
In-Reply-To: <20260701-tcpdm-large-niovs-v4-0-ca4654f37570@meta.com>

From: Bobby Eshleman <bobbyeshleman@meta.com>

Every devmem dmabuf binding today hands the page_pool PAGE_SIZE niovs.
This caps a single RX descriptor at PAGE_SIZE, burning CPU on buffer
churn for large flows.

Add a bind-time netlink attribute, NETDEV_A_DMABUF_RX_BUF_SIZE, that
lets userspace request a larger niov size. The value must be a power of
two >= PAGE_SIZE.
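
A minimal Python sketch of that rule, modelled on the check added to
netdev_nl_bind_rx_doit(). niov_shift_for() is a hypothetical name and the
4 KiB default page size is an assumption; the kernel uses is_power_of_2()
and ilog2() for the same purpose.

```python
def niov_shift_for(rx_buf_size, page_size=4096):
    """Validate rx_buf_size and return log2 of it, as the bind path would."""
    is_pow2 = rx_buf_size > 0 and (rx_buf_size & (rx_buf_size - 1)) == 0
    if not is_pow2 or rx_buf_size < page_size:
        raise ValueError(
            f"rx_buf_size {rx_buf_size} must be a power of 2 >= page size ({page_size})")
    return rx_buf_size.bit_length() - 1  # ilog2 for an exact power of two
```

For example, 16384 maps to a shift of 14, while 12288 (not a power of two)
and 2048 (below the page size) are rejected.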

Measurements
------------
Setup: kperf in devmem RX/TX cuda mode, 4 flows, 64 MB messages, 60s,
dctcp, num-rx-queues=4, dmabuf-rx/tx-size-mb=2048, 10 runs per niov
size, mlx5.

CPU Util:

   niov        net sirq %        net idle %         app sys %        app idle %
  -----  ----------------  ----------------  ----------------  ----------------
     4K   62.38 +/-  8.27   33.40 +/-  7.51   54.15 +/- 10.23   43.67 +/- 10.53
    16K   58.91 +/-  5.35   35.23 +/-  5.88   41.05 +/-  8.87   56.42 +/-  9.24
    32K   64.12 +/-  0.68   31.09 +/-  1.48   44.54 +/-  3.51   52.63 +/-  3.65
    64K   54.69 +/-  5.54   39.67 +/-  5.81   35.47 +/-  3.11   61.97 +/-  3.27

RX app sys % drops ~19 percentage points from 4K to 64K (54.15 -> 35.47).

Throughput:

   niov       RX dev Gbps   RX flow avg Gbps
  -----  ----------------  -----------------
     4K  300.63 +/- 53.21    75.16 +/- 13.30
    16K  321.35 +/- 28.20    80.34 +/-  7.05
    32K  347.63 +/-  2.20    86.91 +/-  0.55
    64K  332.11 +/- 14.26    83.03 +/-  3.56

Throughput seems to increase, but the stdev is pretty wide, so this could
just be noise.
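
One way to see why: treating each +/- value in the RX dev Gbps column as a
one-stdev band (an assumption about how the table was produced), every
larger niov size overlaps the 4K band. The sketch below only restates the
table; it is not a significance test.

```python
# (mean, stdev) for RX dev Gbps, copied from the throughput table
RX_DEV_GBPS = {
    "4K": (300.63, 53.21),
    "16K": (321.35, 28.20),
    "32K": (347.63, 2.20),
    "64K": (332.11, 14.26),
}


def band(mean, stdev):
    """Return the [mean - stdev, mean + stdev] interval."""
    return mean - stdev, mean + stdev


def overlaps(a, b):
    """True if two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


base = band(*RX_DEV_GBPS["4K"])
overlap_with_4k = {
    niov: overlaps(base, band(*stats))
    for niov, stats in RX_DEV_GBPS.items() if niov != "4K"
}
```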

kperf support (not yet merged):
https://github.com/facebookexperimental/kperf/commit/8837577f920876bce6986ec18869ac04439ebcd2

Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
---
 Documentation/netlink/specs/netdev.yaml |  8 +++++
 include/uapi/linux/netdev.h             |  1 +
 net/core/devmem.c                       | 55 +++++++++++++++++++--------------
 net/core/devmem.h                       | 13 +++++---
 net/core/netdev-genl-gen.c              |  5 +--
 net/core/netdev-genl.c                  | 19 ++++++++++--
 tools/include/uapi/linux/netdev.h       |  1 +
 7 files changed, 71 insertions(+), 31 deletions(-)

diff --git a/Documentation/netlink/specs/netdev.yaml b/Documentation/netlink/specs/netdev.yaml
index 5f143da7458c..70b902008bd3 100644
--- a/Documentation/netlink/specs/netdev.yaml
+++ b/Documentation/netlink/specs/netdev.yaml
@@ -598,6 +598,13 @@ attribute-sets:
         type: u32
         checks:
           min: 1
+      -
+        name: rx-buf-size
+        doc: |
+          Size in bytes of each RX buffer the NIC writes into from the bound
+          dmabuf. Must be a power of two and >= PAGE_SIZE; defaults to
+          PAGE_SIZE.
+        type: u32
 
 operations:
   list:
@@ -812,6 +819,7 @@ operations:
             - ifindex
             - fd
             - queues
+            - rx-buf-size
         reply:
           attributes:
             - id
diff --git a/include/uapi/linux/netdev.h b/include/uapi/linux/netdev.h
index 2f3ab75e8cc0..85e1d20c6268 100644
--- a/include/uapi/linux/netdev.h
+++ b/include/uapi/linux/netdev.h
@@ -219,6 +219,7 @@ enum {
 	NETDEV_A_DMABUF_QUEUES,
 	NETDEV_A_DMABUF_FD,
 	NETDEV_A_DMABUF_ID,
+	NETDEV_A_DMABUF_RX_BUF_SIZE,
 
 	__NETDEV_A_DMABUF_MAX,
 	NETDEV_A_DMABUF_MAX = (__NETDEV_A_DMABUF_MAX - 1)
diff --git a/net/core/devmem.c b/net/core/devmem.c
index 957d6b96216b..3d6cf35e50f3 100644
--- a/net/core/devmem.c
+++ b/net/core/devmem.c
@@ -46,7 +46,7 @@ static dma_addr_t net_devmem_get_dma_addr(const struct net_iov *niov)
 
 	owner = net_devmem_iov_to_chunk_owner(niov);
 	return owner->base_dma_addr +
-	       ((dma_addr_t)net_iov_idx(niov) << PAGE_SHIFT);
+	       ((dma_addr_t)net_iov_idx(niov) << owner->binding->niov_shift);
 }
 
 static void net_devmem_dmabuf_binding_release(struct percpu_ref *ref)
@@ -90,16 +90,17 @@ net_devmem_alloc_dmabuf(struct net_devmem_dmabuf_binding *binding)
 	struct dmabuf_genpool_chunk_owner *owner;
 	unsigned long dma_addr;
 	struct net_iov *niov;
-	ssize_t offset;
-	ssize_t index;
+	size_t offset;
+	size_t index;
 
-	dma_addr = gen_pool_alloc_owner(binding->chunk_pool, PAGE_SIZE,
+	dma_addr = gen_pool_alloc_owner(binding->chunk_pool,
+					1UL << binding->niov_shift,
 					(void **)&owner);
 	if (!dma_addr)
 		return NULL;
 
 	offset = dma_addr - owner->base_dma_addr;
-	index = offset / PAGE_SIZE;
+	index = offset >> binding->niov_shift;
 	niov = &owner->area.niovs[index];
 
 	niov->desc.pp_magic = 0;
@@ -113,12 +114,13 @@ void net_devmem_free_dmabuf(struct net_iov *niov)
 {
 	struct net_devmem_dmabuf_binding *binding = net_devmem_iov_binding(niov);
 	unsigned long dma_addr = net_devmem_get_dma_addr(niov);
+	size_t niov_size = 1UL << binding->niov_shift;
 
 	if (WARN_ON(!gen_pool_has_addr(binding->chunk_pool, dma_addr,
-				       PAGE_SIZE)))
+				       niov_size)))
 		return;
 
-	gen_pool_free(binding->chunk_pool, dma_addr, PAGE_SIZE);
+	gen_pool_free(binding->chunk_pool, dma_addr, niov_size);
 }
 
 void net_devmem_unbind_dmabuf(struct net_devmem_dmabuf_binding *binding)
@@ -163,6 +165,9 @@ int net_devmem_bind_dmabuf_to_queue(struct net_device *dev, u32 rxq_idx,
 	u32 xa_idx;
 	int err;
 
+	if (binding->niov_shift != PAGE_SHIFT)
+		mp_params.rx_page_size = 1U << binding->niov_shift;
+
 	err = netif_mp_open_rxq(dev, rxq_idx, &mp_params, extack);
 	if (err)
 		return err;
@@ -184,14 +189,16 @@ struct net_devmem_dmabuf_binding *
 net_devmem_bind_dmabuf(struct net_device *dev, void *vdev,
 		       struct device *dma_dev,
 		       enum dma_data_direction direction,
-		       unsigned int dmabuf_fd, struct netdev_nl_sock *priv,
+		       unsigned int dmabuf_fd, unsigned int niov_shift,
+		       struct netdev_nl_sock *priv,
 		       struct netlink_ext_ack *extack)
 {
 	struct net_devmem_dmabuf_binding *binding;
+	size_t niov_size = 1UL << niov_shift;
 	static u32 id_alloc_next;
+	unsigned int sg_idx, i;
 	struct scatterlist *sg;
 	struct dma_buf *dmabuf;
-	unsigned int sg_idx, i;
 	unsigned long virtual;
 	int err;
 
@@ -213,6 +220,7 @@ net_devmem_bind_dmabuf(struct net_device *dev, void *vdev,
 
 	binding->dev = dev;
 	binding->vdev = vdev;
+	binding->niov_shift = niov_shift;
 	xa_init_flags(&binding->bound_rxqs, XA_FLAGS_ALLOC);
 
 	err = percpu_ref_init(&binding->ref,
@@ -248,18 +256,14 @@ net_devmem_bind_dmabuf(struct net_device *dev, void *vdev,
 			goto err_unmap;
 		}
 		binding->tx_vec = kvmalloc_objs(struct net_iov *,
-						dmabuf->size / PAGE_SIZE);
+						dmabuf->size >> niov_shift);
 		if (!binding->tx_vec) {
 			err = -ENOMEM;
 			goto err_unmap;
 		}
 	}
 
-	/* For simplicity we expect to make PAGE_SIZE allocations, but the
-	 * binding can be much more flexible than that. We may be able to
-	 * allocate MTU sized chunks here. Leave that for future work...
-	 */
-	binding->chunk_pool = gen_pool_create(PAGE_SHIFT,
+	binding->chunk_pool = gen_pool_create(niov_shift,
 					      dev_to_node(&dev->dev));
 	if (!binding->chunk_pool) {
 		err = -ENOMEM;
@@ -273,9 +277,12 @@ net_devmem_bind_dmabuf(struct net_device *dev, void *vdev,
 		size_t len = sg_dma_len(sg);
 		struct net_iov *niov;
 
-		if (!IS_ALIGNED(len, PAGE_SIZE)) {
+		if (!IS_ALIGNED(dma_addr, niov_size) ||
+		    !IS_ALIGNED(len, niov_size)) {
 			err = -EINVAL;
-			NL_SET_ERR_MSG(extack, "dma-buf SG length must be PAGE_SIZE aligned");
+			NL_SET_ERR_MSG_FMT(extack,
+					   "dmabuf sg entry (addr=%pad, len=%zu) not aligned to niov size %zu",
+					   &dma_addr, len, niov_size);
 			goto err_free_chunks;
 		}
 
@@ -288,7 +295,7 @@ net_devmem_bind_dmabuf(struct net_device *dev, void *vdev,
 
 		owner->area.base_virtual = virtual;
 		owner->base_dma_addr = dma_addr;
-		owner->area.num_niovs = len / PAGE_SIZE;
+		owner->area.num_niovs = len >> niov_shift;
 		owner->binding = binding;
 
 		err = gen_pool_add_owner(binding->chunk_pool, dma_addr,
@@ -313,7 +320,7 @@ net_devmem_bind_dmabuf(struct net_device *dev, void *vdev,
 			page_pool_set_dma_addr_netmem(net_iov_to_netmem(niov),
 						      net_devmem_get_dma_addr(niov));
 			if (direction == DMA_TO_DEVICE)
-				binding->tx_vec[owner->area.base_virtual / PAGE_SIZE + i] = niov;
+				binding->tx_vec[(owner->area.base_virtual >> niov_shift) + i] = niov;
 		}
 
 		virtual += len;
@@ -430,13 +437,15 @@ struct net_iov *
 net_devmem_get_niov_at(struct net_devmem_dmabuf_binding *binding,
 		       size_t virt_addr, size_t *off, size_t *size)
 {
+	size_t niov_size = 1UL << binding->niov_shift;
+
 	if (virt_addr >= binding->dmabuf->size)
 		return NULL;
 
-	*off = virt_addr % PAGE_SIZE;
-	*size = PAGE_SIZE - *off;
+	*off = virt_addr & (niov_size - 1);
+	*size = niov_size - *off;
 
-	return binding->tx_vec[virt_addr / PAGE_SIZE];
+	return binding->tx_vec[virt_addr >> binding->niov_shift];
 }
 
 /*** "Dmabuf devmem memory provider" ***/
@@ -454,7 +463,7 @@ int mp_dmabuf_devmem_init(struct page_pool *pool)
 	pool->dma_sync = false;
 	pool->dma_sync_for_cpu = false;
 
-	if (pool->p.order != 0)
+	if (pool->p.order != binding->niov_shift - PAGE_SHIFT)
 		return -E2BIG;
 
 	net_devmem_dmabuf_binding_get(binding);
diff --git a/net/core/devmem.h b/net/core/devmem.h
index 3852a56036cb..4a293a7d1149 100644
--- a/net/core/devmem.h
+++ b/net/core/devmem.h
@@ -71,6 +71,8 @@ struct net_devmem_dmabuf_binding {
 	 */
 	struct net_iov **tx_vec;
 
+	unsigned int niov_shift;
+
 	struct work_struct unbind_w;
 };
 
@@ -93,7 +95,8 @@ struct net_devmem_dmabuf_binding *
 net_devmem_bind_dmabuf(struct net_device *dev, void *vdev,
 		       struct device *dma_dev,
 		       enum dma_data_direction direction,
-		       unsigned int dmabuf_fd, struct netdev_nl_sock *priv,
+		       unsigned int dmabuf_fd, unsigned int niov_shift,
+		       struct netdev_nl_sock *priv,
 		       struct netlink_ext_ack *extack);
 struct net_devmem_dmabuf_binding *net_devmem_lookup_dmabuf(u32 id);
 void net_devmem_unbind_dmabuf(struct net_devmem_dmabuf_binding *binding);
@@ -122,10 +125,11 @@ static inline u32 net_devmem_iov_binding_id(const struct net_iov *niov)
 
 static inline unsigned long net_iov_virtual_addr(const struct net_iov *niov)
 {
-	struct net_iov_area *owner = net_iov_owner(niov);
+	struct dmabuf_genpool_chunk_owner *co =
+		net_devmem_iov_to_chunk_owner(niov);
 
-	return owner->base_virtual +
-	       ((unsigned long)net_iov_idx(niov) << PAGE_SHIFT);
+	return net_iov_owner(niov)->base_virtual +
+	       ((unsigned long)net_iov_idx(niov) << co->binding->niov_shift);
 }
 
 static inline bool
@@ -175,6 +179,7 @@ net_devmem_bind_dmabuf(struct net_device *dev, void *vdev,
 		       struct device *dma_dev,
 		       enum dma_data_direction direction,
 		       unsigned int dmabuf_fd,
+		       unsigned int niov_shift,
 		       struct netdev_nl_sock *priv,
 		       struct netlink_ext_ack *extack)
 {
diff --git a/net/core/netdev-genl-gen.c b/net/core/netdev-genl-gen.c
index d18c89b5a6c7..447ed06d8c74 100644
--- a/net/core/netdev-genl-gen.c
+++ b/net/core/netdev-genl-gen.c
@@ -106,10 +106,11 @@ static const struct nla_policy netdev_qstats_get_nl_policy[NETDEV_A_QSTATS_SCOPE
 };
 
 /* NETDEV_CMD_BIND_RX - do */
-static const struct nla_policy netdev_bind_rx_nl_policy[NETDEV_A_DMABUF_FD + 1] = {
+static const struct nla_policy netdev_bind_rx_nl_policy[NETDEV_A_DMABUF_RX_BUF_SIZE + 1] = {
 	[NETDEV_A_DMABUF_IFINDEX] = NLA_POLICY_MIN(NLA_U32, 1),
 	[NETDEV_A_DMABUF_FD] = { .type = NLA_U32, },
 	[NETDEV_A_DMABUF_QUEUES] = NLA_POLICY_NESTED(netdev_queue_id_nl_policy),
+	[NETDEV_A_DMABUF_RX_BUF_SIZE] = { .type = NLA_U32, },
 };
 
 /* NETDEV_CMD_NAPI_SET - do */
@@ -219,7 +220,7 @@ static const struct genl_split_ops netdev_nl_ops[] = {
 		.cmd		= NETDEV_CMD_BIND_RX,
 		.doit		= netdev_nl_bind_rx_doit,
 		.policy		= netdev_bind_rx_nl_policy,
-		.maxattr	= NETDEV_A_DMABUF_FD,
+		.maxattr	= NETDEV_A_DMABUF_RX_BUF_SIZE,
 		.flags		= GENL_UNS_ADMIN_PERM | GENL_CMD_CAP_DO,
 	},
 	{
diff --git a/net/core/netdev-genl.c b/net/core/netdev-genl.c
index c15d8d4ca1f8..82089dac000f 100644
--- a/net/core/netdev-genl.c
+++ b/net/core/netdev-genl.c
@@ -1013,6 +1013,7 @@ netdev_nl_get_dma_dev(struct net_device *netdev, unsigned long *rxq_bitmap,
 int netdev_nl_bind_rx_doit(struct sk_buff *skb, struct genl_info *info)
 {
 	struct net_devmem_dmabuf_binding *binding;
+	unsigned int niov_shift = PAGE_SHIFT;
 	u32 ifindex, dmabuf_fd, rxq_idx;
 	struct netdev_nl_sock *priv;
 	struct net_device *netdev;
@@ -1030,6 +1031,19 @@ int netdev_nl_bind_rx_doit(struct sk_buff *skb, struct genl_info *info)
 	ifindex = nla_get_u32(info->attrs[NETDEV_A_DEV_IFINDEX]);
 	dmabuf_fd = nla_get_u32(info->attrs[NETDEV_A_DMABUF_FD]);
 
+	if (info->attrs[NETDEV_A_DMABUF_RX_BUF_SIZE]) {
+		u32 rx_buf_size = nla_get_u32(info->attrs[NETDEV_A_DMABUF_RX_BUF_SIZE]);
+
+		if (!rx_buf_size || !is_power_of_2(rx_buf_size) ||
+		    rx_buf_size < PAGE_SIZE) {
+			NL_SET_ERR_MSG_FMT(info->extack,
+					   "rx_buf_size %u must be a power of 2 >= page size (%lu)",
+					   rx_buf_size, PAGE_SIZE);
+			return -EINVAL;
+		}
+		niov_shift = ilog2(rx_buf_size);
+	}
+
 	priv = genl_sk_priv_get(&netdev_nl_family, NETLINK_CB(skb).sk);
 	if (IS_ERR(priv))
 		return PTR_ERR(priv);
@@ -1080,7 +1094,8 @@ int netdev_nl_bind_rx_doit(struct sk_buff *skb, struct genl_info *info)
 	}
 
 	binding = net_devmem_bind_dmabuf(netdev, NULL, dma_dev, DMA_FROM_DEVICE,
-					 dmabuf_fd, priv, info->extack);
+					 dmabuf_fd, niov_shift, priv,
+					 info->extack);
 	if (IS_ERR(binding)) {
 		err = PTR_ERR(binding);
 		goto err_rxq_bitmap;
@@ -1221,7 +1236,7 @@ int netdev_nl_bind_tx_doit(struct sk_buff *skb, struct genl_info *info)
 	binding = net_devmem_bind_dmabuf(bind_dev,
 					 bind_dev != netdev ? netdev : NULL,
 					 dma_dev, DMA_TO_DEVICE, dmabuf_fd,
-					 priv, info->extack);
+					 PAGE_SHIFT, priv, info->extack);
 	if (IS_ERR(binding)) {
 		err = PTR_ERR(binding);
 		goto err_unlock_bind_dev;
diff --git a/tools/include/uapi/linux/netdev.h b/tools/include/uapi/linux/netdev.h
index 2f3ab75e8cc0..85e1d20c6268 100644
--- a/tools/include/uapi/linux/netdev.h
+++ b/tools/include/uapi/linux/netdev.h
@@ -219,6 +219,7 @@ enum {
 	NETDEV_A_DMABUF_QUEUES,
 	NETDEV_A_DMABUF_FD,
 	NETDEV_A_DMABUF_ID,
+	NETDEV_A_DMABUF_RX_BUF_SIZE,
 
 	__NETDEV_A_DMABUF_MAX,
 	NETDEV_A_DMABUF_MAX = (__NETDEV_A_DMABUF_MAX - 1)

-- 
2.53.0-Meta



* [PATCH net-next v4 0/3] net: devmem: allow rx-buf-size > PAGE_SIZE per binding
From: Bobby Eshleman @ 2026-07-01 19:22 UTC (permalink / raw)
  To: Donald Hunter, Jakub Kicinski, David S. Miller, Eric Dumazet,
	Paolo Abeni, Simon Horman, Andrew Lunn, Gerd Hoffmann,
	Vivek Kasireddy, Sumit Semwal, Christian König, Shuah Khan
  Cc: netdev, linux-kernel, dri-devel, linux-media, linaro-mm-sig,
	linux-kselftest, sdf, razor, daniel, almasrymina, matttbe,
	skhawaja, dw, Joe Damato, Bobby Eshleman

Every devmem dmabuf binding hands the page_pool PAGE_SIZE niovs today.
On NICs that consume one descriptor per netmem, this caps a single RX
descriptor at PAGE_SIZE and burns CPU on buffer churn.

In this series, we add a bind-time netlink attribute,
NETDEV_A_DMABUF_RX_BUF_SIZE, that lets userspace request a larger niov size
(power of two >= PAGE_SIZE). Drivers must opt in via
queue_mgmt_ops.QCFG_RX_PAGE_SIZE.

Selftests use udmabuf, but udmabuf sgtables were previously hardcoded to
PAGE_SIZE. This series modifies udmabuf to respect folio sizes in its exported
sgtable. The result is that when backing udmabuf with MFD_HUGETLB 2MB pages,
the sgtable is populated with 2MB entries, allowing devmem's gen_pool to carve
out large (e.g. 64K) niovs.
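
The carving can be sketched as below. carve_niovs() is a hypothetical
helper, not kernel code; it models the per-sg-entry alignment check and the
"base + (index << niov_shift)" address computation from patch 1, using
plain integers in place of DMA addresses.

```python
def carve_niovs(sg_entries, niov_shift):
    """Split (dma_addr, length) sg entries into niov-sized DMA addresses."""
    niov_size = 1 << niov_shift
    niovs = []
    for dma_addr, length in sg_entries:
        # both the start and the length must be niov-aligned
        if dma_addr % niov_size or length % niov_size:
            raise ValueError(
                f"sg entry (addr={dma_addr:#x}, len={length}) "
                f"not aligned to niov size {niov_size}")
        for idx in range(length >> niov_shift):
            niovs.append(dma_addr + (idx << niov_shift))
    return niovs


# One 2 MB sg entry carved into 64K niovs (made-up base address).
niovs_64k = carve_niovs([(0x200000, 2 << 20)], niov_shift=16)
```

A single 2 MB entry yields 32 niovs of 64K, whereas a table of 4K entries
would fail the alignment check for any niov size above 4K.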

Measurements
------------

Setup: kperf devmem RX/TX cuda, 4 flows, 64 MB messages, 60s, dctcp,
num-rx-queues=4, dmabuf-rx/tx-size-mb=2048, 10 runs per niov size,
mlx5.

   niov       RX dev Gbps   RX flow avg Gbps         app sys %
  -----  ----------------  -----------------  ----------------
     4K  300.63 +/- 53.21    75.16 +/- 13.30   54.15 +/- 10.23
    16K  321.35 +/- 28.20    80.34 +/-  7.05   41.05 +/-  8.87
    32K  347.63 +/-  2.20    86.91 +/-  0.55   44.54 +/-  3.51
    64K  332.11 +/- 14.26    83.03 +/-  3.56   35.47 +/-  3.11

RX app sys % drops ~19 percentage points from 4K to 64K (54.15 -> 35.47).
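
For reference, the 4K and 64K rows of the app sys % column work out as
follows. The sketch only redoes the arithmetic on the table means and
ignores the +/- spread.

```python
APP_SYS_4K = 54.15   # mean app sys % with 4K niovs
APP_SYS_64K = 35.47  # mean app sys % with 64K niovs

absolute_drop = APP_SYS_4K - APP_SYS_64K           # in percentage points
relative_drop = absolute_drop / APP_SYS_4K * 100   # as a share of the 4K value

print(f"absolute: {absolute_drop:.2f} points, relative: {relative_drop:.1f}%")
```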

kperf support (not yet merged):
https://github.com/facebookexperimental/kperf/commit/8837577f920876bce6986ec18869ac04439ebcd2

Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
---
Changes in v4:
- ncdevmem: fix the possible overflow (Sashiko)
- drop the udmabuf patch because the fix is now already in net-next
- silenced two pylint complaints in devmem_lib.py
- Link to v3: https://lore.kernel.org/r/20260612-tcpdm-large-niovs-v3-0-a3b693e76fcb@meta.com

Changes in v3:
- fix a bunch of non-reverse christmas tree declarations (Stan)
- remove extra uint32 cast for getpagesize() (Stan)
- remove overzealous strtoul checking (Stan)
- remove value checks that the kernel already performs on rx_buf_size
  (Stan)
- Link to v2: https://lore.kernel.org/r/20260611-tcpdm-large-niovs-v2-0-ee2bf15e7523@meta.com

Changes in v2:
- Use NL_SET_ERR_MSG_FMT for sg alignment failure details (Stan)
- Keep -E2BIG (not a direct ask, but seemed preferred, Stan)
- Update udmabuf commit message and comments explaining why
  "one sg ent per folio" is useful (Christian)
- Set/restore nr_hugepages in py harness (Stan)
- Link to v1: https://lore.kernel.org/r/20260603-tcpdm-large-niovs-v1-0-f37a4ac6726c@meta.com

---
Bobby Eshleman (3):
      net: devmem: allow rx-buf-size > PAGE_SIZE per dmabuf binding
      selftests/net: ncdevmem: add -b option to set rx-buf-size on bind
      selftests/net: devmem.py: add check_rx_large_niov

 Documentation/netlink/specs/netdev.yaml            |  8 +++
 include/uapi/linux/netdev.h                        |  1 +
 net/core/devmem.c                                  | 55 +++++++++++---------
 net/core/devmem.h                                  | 13 +++--
 net/core/netdev-genl-gen.c                         |  5 +-
 net/core/netdev-genl.c                             | 19 ++++++-
 tools/include/uapi/linux/netdev.h                  |  1 +
 tools/testing/selftests/drivers/net/hw/devmem.py   | 12 ++++-
 .../testing/selftests/drivers/net/hw/devmem_lib.py | 59 +++++++++++++++++++++-
 tools/testing/selftests/drivers/net/hw/ncdevmem.c  | 36 +++++++++++--
 .../testing/selftests/drivers/net/hw/nk_devmem.py  | 11 +++-
 11 files changed, 180 insertions(+), 40 deletions(-)
---
base-commit: 805185b7c7a1069e407b6f7b3bc98e44d415f484
change-id: 20260602-tcpdm-large-niovs-56523a3a1077

Best regards,
-- 
Bobby Eshleman <bobbyeshleman@meta.com>



* [PATCH v2] selftests: Open /dev/udmabuf O_RDONLY
From: T.J. Mercier @ 2026-07-01 19:22 UTC (permalink / raw)
  To: kraxel, vivek.kasireddy, kuba, Shuah Khan, Andrew Lunn,
	David S. Miller, Eric Dumazet, Paolo Abeni
  Cc: T.J. Mercier, linux-kselftest, linux-kernel, netdev, bpf

Write permissions on the /dev/udmabuf device file are not required to
issue ioctls and allocate udmabufs. Applications should be opening this
file as O_RDONLY. The BPF dmabuf_iter selftest already does this. [1]

Users are pointing to these selftests as examples of how to use udmabuf,
and encountering permission errors on systems where write permissions
are not available on /dev/udmabuf. Apply the principle of least
privilege to selftests which use udmabuf by removing the write access
mode from drivers/dma-buf/udmabuf.c and drivers/net/hw/ncdevmem.c.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/testing/selftests/bpf/prog_tests/dmabuf_iter.c?h=v7.1#n49
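
For readers who want to try this outside the C selftests, here is a hedged
Python sketch of the same idea. try_create_udmabuf() is a hypothetical
helper; the ioctl number is derived from the generic Linux _IOW encoding
(as on x86 and arm64) and the struct udmabuf_create layout in
linux/udmabuf.h, so treat both as assumptions on other architectures.

```python
import fcntl
import os
import struct

# struct udmabuf_create { __u32 memfd; __u32 flags; __u64 offset; __u64 size; }
UDMABUF_CREATE_FMT = "=IIQQ"
# _IOW('u', 0x42, struct udmabuf_create), generic ioctl encoding (assumption)
UDMABUF_CREATE = ((1 << 30) | (struct.calcsize(UDMABUF_CREATE_FMT) << 16)
                  | (ord("u") << 8) | 0x42)


def try_create_udmabuf(dev="/dev/udmabuf"):
    """Return a dmabuf fd, or None if udmabuf cannot be used here."""
    try:
        devfd = os.open(dev, os.O_RDONLY)  # read-only is enough for the ioctl
    except OSError:
        return None
    memfd = -1
    try:
        size = os.sysconf("SC_PAGE_SIZE")
        memfd = os.memfd_create("udmabuf-sketch", os.MFD_ALLOW_SEALING)
        os.ftruncate(memfd, size)
        # udmabuf requires the memfd to be sealed against shrinking
        fcntl.fcntl(memfd, fcntl.F_ADD_SEALS, fcntl.F_SEAL_SHRINK)
        req = bytearray(struct.pack(UDMABUF_CREATE_FMT, memfd, 0, 0, size))
        return fcntl.ioctl(devfd, UDMABUF_CREATE, req)
    except OSError:
        return None
    finally:
        if memfd >= 0:
            os.close(memfd)
        os.close(devfd)
```

The caller owns the returned fd and should close it when done.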

Signed-off-by: T.J. Mercier <tjmercier@google.com>
---
 tools/testing/selftests/drivers/dma-buf/udmabuf.c | 2 +-
 tools/testing/selftests/drivers/net/hw/ncdevmem.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/drivers/dma-buf/udmabuf.c b/tools/testing/selftests/drivers/dma-buf/udmabuf.c
index d78aec662586..ced0b95c876c 100644
--- a/tools/testing/selftests/drivers/dma-buf/udmabuf.c
+++ b/tools/testing/selftests/drivers/dma-buf/udmabuf.c
@@ -140,7 +140,7 @@ int main(int argc, char *argv[])
 	ksft_print_header();
 	ksft_set_plan(7);
 
-	devfd = open("/dev/udmabuf", O_RDWR);
+	devfd = open("/dev/udmabuf", O_RDONLY);
 	if (devfd < 0) {
 		ksft_print_msg(
 			"%s: [skip,no-udmabuf: Unable to access DMA buffer device file]\n",
diff --git a/tools/testing/selftests/drivers/net/hw/ncdevmem.c b/tools/testing/selftests/drivers/net/hw/ncdevmem.c
index e098d6534c3c..8114a29692fd 100644
--- a/tools/testing/selftests/drivers/net/hw/ncdevmem.c
+++ b/tools/testing/selftests/drivers/net/hw/ncdevmem.c
@@ -149,7 +149,7 @@ static struct memory_buffer *udmabuf_alloc(size_t size)
 
 	ctx->size = size;
 
-	ctx->devfd = open("/dev/udmabuf", O_RDWR);
+	ctx->devfd = open("/dev/udmabuf", O_RDONLY);
 	if (ctx->devfd < 0) {
 		pr_err("[skip,no-udmabuf: Unable to access DMA buffer device file]");
 		goto err_free_ctx;

base-commit: fbb7ad31ab376c5101b2ac7205fad0344fd2de60
-- 
2.55.0.rc0.799.gd6f94ed593-goog



* Re: [PATCH v2 0/7] vmsplice: fix some problems in my previous vmsplice patchset
From: Andrei Vagin @ 2026-07-01 19:16 UTC (permalink / raw)
  To: Askar Safin
  Cc: Christian Brauner, David Hildenbrand (Arm), akpm, axboe,
	collin.funk1, david.laight.linux, dhowells, fuse-devel, hch, jack,
	joannelkoong, kernel, linux-api, linux-fsdevel, linux-kernel,
	linux-mm, luto, metze, miklos, netdev, patches, pfalcato,
	torvalds, val, viro, w, willy
In-Reply-To: <CAPnZJGAJROqfCWSeeBu31HsE6nmgxVqHTNeC554S5y1Y-VN19w@mail.gmail.com>

On Tue, Jun 30, 2026 at 9:01 PM Askar Safin <safinaskar@gmail.com> wrote:
>
> On Mon, Jun 29, 2026 at 11:56 AM Christian Brauner <brauner@kernel.org> wrote:
> > The amount of regression reports that we got in short succession doesn't
> > make it likely that we can merge a plain degradation.
>
> Let me repeat: this v2 patchset fixes all regressions found so far,
> except for major CRIU performance regression

Askar,

As previously mentioned, this isn't just a major performance issue for
CRIU; it completely breaks the current pre-dump implementation.

The other proposed fixes look like workarounds that target specific
projects. Since few projects run tests on linux-next kernels, we should
expect more projects to be affected by these changes.

Thanks,
Andrei

^ permalink raw reply

* [REGRESSION][BISECTED] tun/tap & vhost-net: multi-threaded network performance
From: Brett Sheffield @ 2026-07-01 19:16 UTC (permalink / raw)
  To: regressions, netdev
  Cc: Jakub Kicinski, Michael S. Tsirkin, Simon Schippers, Tim Gebauer,
	Willem de Bruijn, Jason Wang, Andrew Lunn, David S. Miller,
	Eric Dumazet, Paolo Abeni, linux-kernel

TL;DR - Commit 1d6e569b7d0c0b2736636749e4be0a27f3cefcb3 causes
significant performance regressions with TAP interfaces and multithreaded
network code. Please revert.


Librecast is an IPv6 multicast library. One of the tests (0055) fails under
Linux 7.2-rc1. The test performs data synchronization over IPv6 multicast using a TAP
interface. This test has run successfully on every stable, LTS and mainline RC
released in the past year. Every kernel with my Tested-by has run this test.

There have been a bunch of changes to MLDv2 so I started bisecting there, but
the culprit is actually 1d6e569b7d0c0b2736636749e4be0a27f3cefcb3 "tun/tap &
vhost-net: avoid ptr_ring tail-drop when a qdisc is present"

Reverting this commit fixes the test.

To eliminate my code and any multicast weirdness, I ran tests with iperf3
comparing the same host running 7.2-rc1 both with and without 1d6e569b7d0
reverted.

CPU: AMD Ryzen 9 9950X

[ host ] - [ bridge ] - [ tap ] - [ guest (qemu) ]

Running matching kernels on host and guest, I started iperf3 in server mode on
the guest and tested from the host so traffic passes through the tap interface.

iperf3 -s -V                 # server
iperf3 -c guest -P nthreads  # client

7.2.0-rc1 (threads 1):

[  5]   0.00-10.00  sec  20.2 GBytes  17.4 Gbits/sec    0            sender
[  5]   0.00-10.00  sec  2.00 GBytes  1.72 Gbits/sec                  receiver

7.2.0-rc1 (threads 1, reverted):

[  5]   0.00-10.00  sec  15.3 GBytes  13.1 Gbits/sec  368            sender
[  5]   0.00-10.00  sec  2.00 GBytes  1.72 Gbits/sec                  receiver

7.2.0-rc1 (threads 2):

[SUM]   0.00-10.00  sec  10.9 GBytes  9.33 Gbits/sec    0             sender
[SUM]   0.00-10.00  sec  4.00 GBytes  3.43 Gbits/sec                  receiver

7.2.0-rc1 (threads 2, reverted):

[SUM]   0.00-10.00  sec  15.9 GBytes  13.7 Gbits/sec  1567             sender
[SUM]   0.00-10.00  sec  4.00 GBytes  3.43 Gbits/sec                  receiver

7.2.0-rc1 (threads 4):

[SUM]   0.00-10.00  sec  10.9 GBytes  9.33 Gbits/sec    0             sender
[SUM]   0.00-10.00  sec  8.00 GBytes  6.87 Gbits/sec                  receiver

7.2.0-rc1 (threads 4, reverted):

[SUM]   0.00-10.00  sec  16.5 GBytes  14.1 Gbits/sec  6701             sender
[SUM]   0.00-10.00  sec  8.00 GBytes  6.87 Gbits/sec                  receiver

7.2.0-rc1 (threads 8):

[SUM]   0.00-10.00  sec  10.7 GBytes  9.15 Gbits/sec    0             sender
[SUM]   0.00-10.01  sec  10.6 GBytes  9.13 Gbits/sec                  receiver

7.2.0-rc1 (threads 8, reverted):

[SUM]   0.00-10.00  sec  16.2 GBytes  14.0 Gbits/sec  19319             sender
[SUM]   0.00-10.00  sec  15.7 GBytes  13.5 Gbits/sec                  receiver

7.2.0-rc1 (threads 16):

[SUM]   0.00-10.00  sec  10.9 GBytes  9.35 Gbits/sec    0             sender
[SUM]   0.00-10.01  sec  10.9 GBytes  9.32 Gbits/sec                  receiver

7.2.0-rc1 (threads 16, reverted):

[SUM]   0.00-10.00  sec  14.4 GBytes  12.4 Gbits/sec  43593             sender
[SUM]   0.00-10.00  sec  14.4 GBytes  12.4 Gbits/sec                  receiver


As you can see, the new code works for the single-threaded case, but in all
other cases there's a significant performance drop. I see this trade-off is
mentioned in the commit, but with the current patch the drop-off is much worse
than the commit suggests.
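
To put a number on "significant", here is a small sketch that computes the
relative change in sender throughput (7.2-rc1 as-is versus reverted) from the
Gbits/sec figures above. The helper names pct_change() and
print_sender_deltas() are made up for this illustration; only the throughput
values come from the iperf3 runs.

```c
#include <stdio.h>

/* Relative change of the patched kernel versus the reverted one, in percent.
 * Negative values mean the patched kernel is slower. */
double pct_change(double patched, double reverted)
{
	return (patched - reverted) / reverted * 100.0;
}

/* Sender throughput in Gbits/sec, copied from the iperf3 results above. */
void print_sender_deltas(void)
{
	static const struct {
		int threads;
		double patched;   /* 7.2.0-rc1 */
		double reverted;  /* 7.2.0-rc1 with 1d6e569b7d0 reverted */
	} runs[] = {
		{  1, 17.4, 13.1 },
		{  2, 9.33, 13.7 },
		{  4, 9.33, 14.1 },
		{  8, 9.15, 14.0 },
		{ 16, 9.35, 12.4 },
	};
	size_t i;

	for (i = 0; i < sizeof(runs) / sizeof(runs[0]); i++)
		printf("%2d threads: %+.1f%%\n", runs[i].threads,
		       pct_change(runs[i].patched, runs[i].reverted));
}
```

Calling print_sender_deltas() from a main() prints one line per thread count.
This only compares the sender rows; the retransmit counts are not taken into
account.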

In our multicast use case, data is sent by multiple threads to multiple groups
simultaneously. This breaks things to the extent that a <2 second test
times out after 5 minutes.


git bisect start
# status: waiting for both good and bad commits
# bad: [dc59e4fea9d83f03bad6bddf3fa2e52491777482] Linux 7.2-rc1
git bisect bad dc59e4fea9d83f03bad6bddf3fa2e52491777482
# status: waiting for good commit(s), bad commit known
# good: [36bdc0e815b4e8a05b9028d8ef8a25e1ead35cc1] net: usb: asix: ax88772: re-add usbnet_link_change() in phylink callbacks
git bisect good 36bdc0e815b4e8a05b9028d8ef8a25e1ead35cc1
# good: [db314398f618a3a23315f73c87f7d318eaf06c1b] Merge branch 'net-bridge-mcast-support-exponential-field-encoding'
git bisect good db314398f618a3a23315f73c87f7d318eaf06c1b
# bad: [079a028d6327e68cfa5d38b36123637b321c19a7] string: Remove strncpy() from the kernel
git bisect bad 079a028d6327e68cfa5d38b36123637b321c19a7
# bad: [f396f4005180928cd9e15e352a6512865d3bc908] Bluetooth: btmtk: fix URB leak in alloc_mtk_intr_urb error path
git bisect bad f396f4005180928cd9e15e352a6512865d3bc908
# bad: [ec1806a730a1c0b3d68a7f9afe81514fb0dd7991] netfilter: x_tables: disable 32bit compat interface in user namespaces
git bisect bad ec1806a730a1c0b3d68a7f9afe81514fb0dd7991
# good: [50c2d91c5dfa0e465826ec1f8dbad9cdc254bd85] mptcp: do not drop partial packets
git bisect good 50c2d91c5dfa0e465826ec1f8dbad9cdc254bd85
# good: [68993ced0f618e36cf33388f1e50223e5e6e78cc] Merge tag 'net-7.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
git bisect good 68993ced0f618e36cf33388f1e50223e5e6e78cc
# good: [34c78dff59a25110a4ce50c208e42a91490fe615] Merge branch 'net-use-ip_outnoroutes-drop-reason'
git bisect good 34c78dff59a25110a4ce50c208e42a91490fe615
# bad: [9587ed8137fb83d93f84b858337412f4500b21e9] Merge branch 'gve-add-support-for-ptp-gettimex64'
git bisect bad 9587ed8137fb83d93f84b858337412f4500b21e9
# bad: [83ea7fd73b11dd8cbf4416507a5eac3890b49fb0] net: dsa: microchip: remove unused phylink_mac_link_up() callback
git bisect bad 83ea7fd73b11dd8cbf4416507a5eac3890b49fb0
# bad: [f0de88303d5e7e04a1224bc7a00512b5a1c4fe7a] net: make is_skb_wmem() available to modules
git bisect bad f0de88303d5e7e04a1224bc7a00512b5a1c4fe7a
# bad: [c411baa463e85a779a7e68a00ba6298770b58c4c] netconsole: move push_ipv6() from netpoll
git bisect bad c411baa463e85a779a7e68a00ba6298770b58c4c
# good: [fba362c17d9d9211fc51f272156bb84fc23bdf98] ptr_ring: move free-space check into separate helper
git bisect good fba362c17d9d9211fc51f272156bb84fc23bdf98
# bad: [d0273dbe8be1640e597552f81faf1d6c9997d3e3] ipvlan: use netif_receive_skb() in ipvlan_process_multicast()
git bisect bad d0273dbe8be1640e597552f81faf1d6c9997d3e3
# bad: [3803065cd6b0630d4161d86aa04e2d1db0f3a0b5] Merge branch 'tun-tap-vhost-net-apply-qdisc-backpressure-on-full-ptr_ring-to-reduce-tx-drops'
git bisect bad 3803065cd6b0630d4161d86aa04e2d1db0f3a0b5
# bad: [1d6e569b7d0c0b2736636749e4be0a27f3cefcb3] tun/tap & vhost-net: avoid ptr_ring tail-drop when a qdisc is present
git bisect bad 1d6e569b7d0c0b2736636749e4be0a27f3cefcb3
# first bad commit: [1d6e569b7d0c0b2736636749e4be0a27f3cefcb3] tun/tap & vhost-net: avoid ptr_ring tail-drop when a qdisc is present

-- 
Brett Sheffield (he/him)
Librecast - Decentralising the Internet with Multicast
https://librecast.net/
https://blog.brettsheffield.com/

^ permalink raw reply

* Re: [PATCH net] net/mlx5: HWS, fix matcher leak on resize target setup failure
From: Tariq Toukan @ 2026-07-01 19:02 UTC (permalink / raw)
  To: Paolo Abeni, saeedm, tariqt, mbloch, leon
  Cc: andrew+netdev, davem, edumazet, kuba, kliteyn, vdogaru, horms,
	kees, stable, netdev, linux-rdma, linux-kernel, jianhao.xu, zilin,
	Dawei Feng
In-Reply-To: <8138f145-6a4d-465e-a45c-b8ffbf9e05bc@redhat.com>



On 01/07/2026 17:38, Paolo Abeni wrote:
> On 6/29/26 8:40 AM, Dawei Feng wrote:
>> hws_bwc_matcher_move() allocates a replacement matcher before setting it
>> as the resize target. If mlx5hws_matcher_resize_set_target() fails, the
>> replacement matcher is not attached anywhere and is leaked.
>>
>> Fix the leak by destroying the replacement matcher before returning from
>> the resize-target failure path.
>>
>> The bug was first flagged by an experimental analysis tool we are
>> developing for kernel memory-management bugs while analyzing
>> v6.13-rc1. The tool is still under development and is not yet publicly
>> available. Manual inspection confirms that the bug is still
>> present in v7.1.1.
>>
>> An x86_64 allyesconfig build showed no new warnings. As we do not have a
>> mlx5 HWS-capable device to test with, no runtime testing was able to be
>> performed.
>>
>> Fixes: 2111bb970c78 ("net/mlx5: HWS, added backward-compatible API handling")
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Dawei Feng <dawei.feng@seu.edu.cn>
> 
> @nvidia team, double checking I did not miss any relevant communication.
> The last process update I recall is that one of the people listed in
> maintainer file will ack patches for us to merge directly into the
> net/net-next trees.
> 

Ack.

> Should we consider any ack from @nvidia sufficient to take over?
> 
> Thanks,
> 
> Paolo
> 
> 

Acked-by: Tariq Toukan <tariqt@nvidia.com>
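
For readers skimming the thread, the leak pattern described in the quoted
commit message can be sketched in isolation as below. This is not the mlx5 HWS
code: struct matcher, matcher_create(), matcher_destroy(),
set_resize_target(), matcher_move() and the live_matchers counter are all
hypothetical stand-ins, and the failure is simulated with a flag. It only
illustrates freeing the replacement object on the failure path before
returning.

```c
#include <stdlib.h>

/* Hypothetical stand-in type; not the real mlx5 HWS matcher. */
struct matcher {
	int id;
};

/* Counts matchers that have been created but not yet destroyed. */
int live_matchers;

struct matcher *matcher_create(void)
{
	struct matcher *m = malloc(sizeof(*m));

	if (m)
		live_matchers++;
	return m;
}

void matcher_destroy(struct matcher *m)
{
	if (!m)
		return;
	free(m);
	live_matchers--;
}

/* Simulated resize-target setup: fails when should_fail is non-zero. */
int set_resize_target(struct matcher *old, struct matcher *repl, int should_fail)
{
	(void)old;
	(void)repl;
	return should_fail ? -1 : 0;
}

/* Replace *cur with a freshly allocated matcher. */
int matcher_move(struct matcher **cur, int should_fail)
{
	struct matcher *repl = matcher_create();
	int ret;

	if (!repl)
		return -1;

	ret = set_resize_target(*cur, repl, should_fail);
	if (ret) {
		/* The replacement is not attached anywhere yet, so it must
		 * be destroyed here or it is leaked. */
		matcher_destroy(repl);
		return ret;
	}

	matcher_destroy(*cur);
	*cur = repl;
	return 0;
}
```

With the matcher_destroy(repl) call in the failure branch, live_matchers stays
unchanged after a failed move; without it, the counter would grow by one per
failure.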


^ permalink raw reply

* Re: RTL8159 firmware
From: Birger Koblitz @ 2026-07-01 18:58 UTC (permalink / raw)
  To: Jan Hendrik Farr
  Cc: andrew+netdev, davem, edumazet, hsu.chih.kai, kuba, linux-kernel,
	linux-usb, netdev, olek2, pabeni
In-Reply-To: <akVZUQX1V-jAM06U@archlinux>



On 7/1/26 20:15, Jan Hendrik Farr wrote:
> On 01 19:24:13, Birger Koblitz wrote:
>> Hi Jan,
>>
>> On 7/1/26 19:13, Jan Hendrik Farr wrote:
>>> Hi Birger,
>>>
>>> it looks like the firmware file rtl_nic/rtl8159-1.fw isn't in linux-firmware yet.
>>> Could you send it for people to potentially test?
>>>
>>> Jan
>>>
>> The code to create the binary firmware file is at:
>> https://gitlab.com/koblitz-rtlnic/rtlnic_fw
> 
> I'm getting a 404.
That was my mistake, I forgot to make the repository public. Sorry about that, first
time using GitLab...
Here is the link again, I hope it works now:
https://gitlab.com/koblitz-rtlnic/rtlnic_fw

Birger

^ permalink raw reply

* Re: [PATCH] selftests: Open /dev/udmabuf O_RDONLY
From: Jakub Kicinski @ 2026-07-01 18:57 UTC (permalink / raw)
  To: T.J. Mercier
  Cc: kraxel, vivek.kasireddy, Shuah Khan, Andrew Lunn, David S. Miller,
	Eric Dumazet, Paolo Abeni, linux-kselftest, linux-kernel, netdev,
	bpf
In-Reply-To: <CABdmKX2oG9m316hiJpSXbujrT3vgE5hUpzH_WHfjNxBJ1_+BdA@mail.gmail.com>

On Wed, 1 Jul 2026 11:53:15 -0700 T.J. Mercier wrote:
> On Fri, Jun 26, 2026 at 6:09 PM Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > On Thu, 25 Jun 2026 11:15:55 -0700 T.J. Mercier wrote:  
> > > Write permissions on the /dev/udmabuf device file are not required to
> > > issue ioctls and allocate udmabufs. Applications should be opening this
> > > file as O_RDONLY. The BPF dmabuf_iter selftest already does this. [1]
> > >
> > > Remove the write access mode from the drivers/dma-buf/udmabuf.c and
> > > drivers/net/hw/ncdevmem.c selftests.  
> >
> > You need to explain "why", too. Why change it if it clearly
> > worked for everyone running this test until now.
> > --
> > pw-bot: cr  
> 
> Principle of least privilege. Folks use or point to these selftests as
> examples, and then wonder why O_RDWR doesn't work on systems where
> write permissions are not available on /dev/udmabuf.

Alright, pop that into the commit msg and repost please.
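
For anyone pointing at these selftests as an example, a minimal sketch of the
read-only pattern discussed above might look like this. The helper name
udmabuf_create_ro() is made up for illustration. The memfd setup (sealing with
F_SEAL_SHRINK before UDMABUF_CREATE) follows what the existing udmabuf
selftest does as far as I know, and size is assumed to be a multiple of the
page size.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/udmabuf.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: returns a dmabuf fd on success, -1 on failure.
 * The device node is opened O_RDONLY; write access is not needed to
 * issue the ioctl. */
int udmabuf_create_ro(size_t size)
{
	struct udmabuf_create create = { 0 };
	int devfd, memfd, buffd = -1;

	devfd = open("/dev/udmabuf", O_RDONLY);
	if (devfd < 0)
		return -1;

	/* Backing memory comes from a sealable memfd. */
	memfd = memfd_create("udmabuf-example", MFD_ALLOW_SEALING);
	if (memfd < 0)
		goto out_dev;

	if (ftruncate(memfd, size) < 0)
		goto out_mem;

	/* The memfd must not be able to shrink under the dmabuf. */
	if (fcntl(memfd, F_ADD_SEALS, F_SEAL_SHRINK) < 0)
		goto out_mem;

	create.memfd = memfd;
	create.flags = UDMABUF_FLAGS_CLOEXEC;
	create.offset = 0;
	create.size = size;

	buffd = ioctl(devfd, UDMABUF_CREATE, &create);
	if (buffd < 0)
		buffd = -1;

out_mem:
	close(memfd);
out_dev:
	close(devfd);
	return buffd;
}
```

The caller owns the returned fd and should close() it when done. On a system
without /dev/udmabuf, or without read permission on it, the helper simply
returns -1.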

^ permalink raw reply
