Netdev List
* Landlock: LANDLOCK_ACCESS_NET_CONNECT_TCP bypass via TCP Fast Open
From: Bryam Vargas @ 2026-06-16 20:16 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Günther Noack, Matthieu Buffet, Paul Moore, Eric Dumazet,
	Neal Cardwell, linux-security-module, netdev, linux-kernel

Hello Mickaël, and Landlock folks,

A task confined by a Landlock ruleset that handles
LANDLOCK_ACCESS_NET_CONNECT_TCP and is denied connecting to a given port can
still establish a TCP connection to that port by using TCP Fast Open, i.e.
sendto(fd, ..., MSG_FASTOPEN, &dst, dstlen) on a fresh stream socket. The
network-egress confinement for TCP connect is silently bypassed.

Affected
--------
Any kernel with CONFIG_SECURITY_LANDLOCK=y and Landlock enabled that supports
the TCP network access rights (Landlock ABI >= 4, since Linux 6.7). Confirmed by
source inspection on mainline (v7.1-rc7) and reproduced on Linux 7.0.11
(Landlock ABI 8). No config options beyond Landlock and IPv4/IPv6 TCP are needed;
the TCP Fast Open client is enabled by the per-netns default
(net.ipv4.tcp_fastopen has TFO_CLIENT_ENABLE set), so no sysctl change and no
setsockopt are required.

Root cause
----------
LANDLOCK_ACCESS_NET_CONNECT_TCP is enforced only by the socket_connect LSM hook
(hook_socket_connect -> current_check_access_socket). security_socket_connect()
has exactly one call site in the tree, net/socket.c (the connect(2) syscall).

TCP Fast Open performs an implicit connect inside sendmsg:

  tcp_sendmsg_locked()            net/ipv4/tcp.c  (MSG_FASTOPEN branch)
   -> tcp_sendmsg_fastopen()      net/ipv4/tcp.c
   -> __inet_stream_connect(..., is_sendmsg=1)  net/ipv4/af_inet.c
   -> sk->sk_prot->connect()      net/ipv4/af_inet.c  -> tcp_v4_connect()

This path establishes the connection to the address taken from msg_name but
never calls security_socket_connect(). The only LSM hook fired on the sendmsg
path is security_socket_sendmsg(), and Landlock registers no socket_sendmsg
hook, so LANDLOCK_ACCESS_NET_CONNECT_TCP is never re-checked. __inet_stream_connect()
itself carries no LSM hook (only the cgroup-BPF pre_connect, a different
mechanism).

Notably the kernel already mediates the analogous AF_UNIX implicit-connect on the
send path via the unix_may_send hook, which Landlock does register
(hook_unix_may_send) -- so the sendmsg-implies-connect pattern is recognized, but
the TCP Fast Open case has no equivalent coverage. The MPTCP fast-open path
(mptcp_sendmsg_fastopen -> __inet_stream_connect) is a second producer of the
same unmediated connect (by source inspection; not separately reproduced).

Reproducer
----------
A self-contained, fully unprivileged PoC is available on request. It forks an
unconfined TFO-capable loopback listener, then in a child applies a Landlock
ruleset handling LANDLOCK_ACCESS_NET_CONNECT_TCP with no allow rule
(landlock_create_ruleset() with handled_access_net =
LANDLOCK_ACCESS_NET_CONNECT_TCP, no landlock_add_rule(), then
landlock_restrict_self(); every TCP connect is denied) and tries the forbidden
port two ways:

  (1) connect(fd, &dst)                 -> -EACCES   (Landlock enforces CONNECT_TCP)
  (2) sendto(fd2, buf, len, MSG_FASTOPEN, &dst, dstlen)
                                        -> succeeds; the listener accepts the
                                           connection and reads the payload.

Observed on Linux 7.0.11 (Landlock ABI 8):

  [1] connect(2)            -> ret=-1 errno=13 (Permission denied)
  [2] sendto(MSG_FASTOPEN)  -> ret=14 errno=0 (OK/queued)
  [+] listener ACCEPTED the confined child's connection; payload="..."

connect(2) to the port is denied while sendto(MSG_FASTOPEN) reaches the identical
port and delivers data.

Impact
------
A sandbox that uses LANDLOCK_ACCESS_NET_CONNECT_TCP to restrict outbound TCP
(e.g. to keep a confined component from reaching an internal service or a
metadata endpoint) can be escaped by an unprivileged, self-confined task with no
capabilities and no namespace transition -- for any destination port, since the
implicit-connect path never consults the connect hook regardless of address (the
run above shows one port). It is an integrity bypass of the
network-confinement property; no memory safety is involved.
I score it CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:C/C:N/I:H/A:N (6.5 Medium) -- the
confined task escapes the policy authority that defined its sandbox, a scope
change; 5.5 if you treat the Landlock boundary as the same authority (S:U).

Note on the in-flight UDP series
--------------------------------
The "landlock: Add UDP access control support" series (v5, Matthieu Buffet,
https://lore.kernel.org/r/20260611162107.49278-3-matthieu@buffet.re) adds a
socket_sendmsg hook, hook_socket_sendmsg(), but it returns 0 for non-UDP
sockets:

    if (sk_is_udp(sock->sk))
            access_request = LANDLOCK_ACCESS_NET_CONNECT_SEND_UDP;
    else
            return 0;

so a TCP socket using MSG_FASTOPEN still bypasses LANDLOCK_ACCESS_NET_CONNECT_TCP
even after that series lands. It may be most convenient to fix this there.

Suggested direction
-------------------
Re-check LANDLOCK_ACCESS_NET_CONNECT_TCP on the implicit-connect path: either have
the socket_sendmsg hook evaluate CONNECT_TCP for stream sockets when the call
performs an implicit connect (mirroring the AF_UNIX unix_may_send handling), or
place the check inside __inet_stream_connect() so a single chokepoint covers
connect(2), TCP Fast Open, and the MPTCP fast-open sibling.
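
To make the difference between the two options concrete, here is a minimal
user-space toy model, not kernel code. Every toy_* name is hypothetical and
only stands in for the real function named in its comment; the
check_at_chokepoint flag is an assumption used to compare the current
behaviour with a check placed in the shared connect step.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy user-space model; all toy_* names are hypothetical, not kernel symbols. */
struct toy_ruleset {
	bool handles_connect_tcp;	/* ruleset handles CONNECT_TCP */
	unsigned short allowed_port;	/* 0 means "no allow rule" */
};

/* Stand-in for the Landlock CONNECT_TCP decision. */
int toy_check_connect_tcp(const struct toy_ruleset *rs, unsigned short port)
{
	if (!rs->handles_connect_tcp)
		return 0;
	return (rs->allowed_port != 0 && rs->allowed_port == port) ? 0 : -EACCES;
}

/* Stand-in for __inet_stream_connect(): the step both callers share. */
int toy_stream_connect(const struct toy_ruleset *rs, unsigned short port,
		       bool check_at_chokepoint)
{
	if (check_at_chokepoint) {
		int rc = toy_check_connect_tcp(rs, port);

		if (rc)
			return rc;
	}
	return 0;	/* connection established */
}

/* Stand-in for connect(2): the hook runs before the shared step. */
int toy_connect(const struct toy_ruleset *rs, unsigned short port,
		bool check_at_chokepoint)
{
	int rc = toy_check_connect_tcp(rs, port);

	if (rc)
		return rc;
	return toy_stream_connect(rs, port, check_at_chokepoint);
}

/* Stand-in for sendto(MSG_FASTOPEN): goes straight to the shared step. */
int toy_sendto_fastopen(const struct toy_ruleset *rs, unsigned short port,
			bool check_at_chokepoint)
{
	return toy_stream_connect(rs, port, check_at_chokepoint);
}
```

With a ruleset that handles CONNECT_TCP and has no allow rule, the model
denies toy_connect() in both modes, while toy_sendto_fastopen() is denied only
when the check sits at the chokepoint. A real fix would also have to decide
how to avoid checking connect(2) twice, which the model does not address.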

I am happy to send a patch for this if you would like me to.

Best regards,

Bryam Vargas
Independent security researcher, HEXLAB S.A.S., Cali, Colombia
hexlabsecurity@proton.me


^ permalink raw reply

* Re: [PATCH v27 4/5] sfc: obtain and map cxl range using devm_cxl_probe_mem
From: Dan Williams (nvidia) @ 2026-06-16 19:51 UTC (permalink / raw)
  To: Alejandro Lucero Palau, Dan Williams (nvidia),
	alejandro.lucero-palau, linux-cxl, netdev, edward.cree, davem,
	kuba, pabeni, edumazet, dave.jiang
In-Reply-To: <50d8e423-8248-4e26-901b-010d14d22e67@amd.com>

Alejandro Lucero Palau wrote:
> 
> On 6/10/26 14:56, Alejandro Lucero Palau wrote:
> >
> > On 6/10/26 07:10, Alejandro Lucero Palau wrote:
> >>
> >> On 6/10/26 00:30, Dan Williams (nvidia) wrote:
> >>> alejandro.lucero-palau@ wrote:
> >>>> From: Alejandro Lucero <alucerop@amd.com>
> >>>>
> >>>> Use core API for safely obtain the CXL range linked to an HDM 
> >>>> committed
> >>>> by the BIOS. Map such a range for being used as the ctpio buffer.
> >>>>
> >>>> A potential user space action through sysfs unbinding or core cxl
> >>>> modules remove will trigger sfc driver device detachment, with that 
> >>>> case
> >>>> not racing with this mapping as this is done during driver probe and
> >>>> therefore protected with device lock against those user space actions.
> >>>>
> >>>> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> >>>> ---
> >>>>   drivers/net/ethernet/sfc/efx.c     |  1 +
> >>>>   drivers/net/ethernet/sfc/efx_cxl.c | 24 ++++++++++++++++++++++++
> >>>>   drivers/net/ethernet/sfc/efx_cxl.h |  3 +++
> >>>>   3 files changed, 28 insertions(+)
> >>>>
> >>>> diff --git a/drivers/net/ethernet/sfc/efx.c 
> >>>> b/drivers/net/ethernet/sfc/efx.c
> >>>> index 90ccbe310386..578054c21e79 100644
> >>>> --- a/drivers/net/ethernet/sfc/efx.c
> >>>> +++ b/drivers/net/ethernet/sfc/efx.c
> >>>> @@ -984,6 +984,7 @@ static void efx_pci_remove(struct pci_dev 
> >>>> *pci_dev)
> >>>>       efx_fini_io(efx);
> >>>>         probe_data = container_of(efx, struct efx_probe_data, efx);
> >>>> +    efx_cxl_exit(probe_data);
> >>>>         pci_dbg(efx->pci_dev, "shutdown successful\n");
> >>>>   diff --git a/drivers/net/ethernet/sfc/efx_cxl.c 
> >>>> b/drivers/net/ethernet/sfc/efx_cxl.c
> >>>> index 4d55c08cf2a1..d5766a40e2cf 100644
> >>>> --- a/drivers/net/ethernet/sfc/efx_cxl.c
> >>>> +++ b/drivers/net/ethernet/sfc/efx_cxl.c
> >>>> @@ -18,6 +18,7 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
> >>>>   {
> >>>>       struct efx_nic *efx = &probe_data->efx;
> >>>>       struct pci_dev *pci_dev = efx->pci_dev;
> >>>> +    struct range cxl_pio_range;
> >>>>       struct efx_cxl *cxl;
> >>>>       u16 dvsec;
> >>>>       int rc;
> >>>> @@ -75,9 +76,32 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
> >>>>           return -ENODEV;
> >>>>       }
> >>>>   +    cxl->cxlmd = devm_cxl_probe_mem(&cxl->cxlds, &cxl_pio_range);
> >>>> +    if (IS_ERR(cxl->cxlmd)) {
> >>>> +        pci_err(pci_dev, "CXL accel memdev creation failed\n");
> >>>> +        return PTR_ERR(cxl->cxlmd);
> >>>> +    }
> >>>> +
> >>>> +    cxl->ctpio_cxl = ioremap_wc(cxl_pio_range.start,
> >>>> +                    range_len(&cxl_pio_range));
> >>>> +    if (!cxl->ctpio_cxl) {
> >>>> +        pci_err(pci_dev, "CXL ioremap region (%pra) failed\n",
> >>>> +            &cxl_pio_range);
> >>>> +        return -ENOMEM;
> >>> Dave caught the iounmap leak, but another concern is since you want to
> >>> continue operation if efx_cxl_init() fails then you probably also want
> >>> to release the successful attachment to the CXL domain if this happens.
> >>
> >>
> >> I will do that.
> >>
> >
> > Looking at this issue, I think an error when creating the memdev or 
> > during the region attach triggers the memdev removal, but ...
> >
> >
> >>
> >>> Minor since something else is likely to fail if ioremap is not 
> >>> reliable.
> >
> >
> > .. if we want to specifically do that with an unlikely (but possible) 
> > ioremap error something else needs to be exported like 
> > cxl_memdev_unregister(). Are you happy with that approach?
> >
> 
> I have just tested with this:
> 
> +void cxl_memdev_remove(void *_cxlmd)
> +{
> +       struct cxl_memdev *cxlmd = _cxlmd;
> +       struct device *dev = &cxlmd->dev;
> +
> +       devm_remove_action_nowarn(cxlmd->cxlds->dev, cxl_memdev_unregister,
> +                                 cxlmd);
> +
> +       cdev_device_del(&cxlmd->cdev, dev);
> +       cxl_memdev_shutdown(dev);
> +       put_device(dev);
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_memdev_remove, "CXL");
> 
> 
> only called if the ioremap fails.
> 
> 
> Please, let me know if you like this approach before sending another 
> version.

A devres group can automatically clean up after devm_cxl_probe_mem()
in the error path with no new exports needed from the CXL core.
Something like:

        void *group = devres_open_group(cxl->cxlds.dev, NULL, GFP_KERNEL);
        int rc = 0;

        if (!group)
                return -ENOMEM;
        
        cxl->cxlmd = devm_cxl_probe_mem(&cxl->cxlds, &cxl_pio_range);
        if (IS_ERR(cxl->cxlmd)) {
                pci_err(pci_dev, "CXL accel memdev creation failed\n");
                rc = PTR_ERR(cxl->cxlmd);
                goto out;
        }

        cxl->ctpio_cxl =
                ioremap_wc(cxl_pio_range.start, range_len(&cxl_pio_range));
        if (!cxl->ctpio_cxl) {
                pci_err(pci_dev, "CXL ioremap region (%pra) failed\n",
                        &cxl_pio_range);
                rc = -ENOMEM;
        }

out:
        if (rc)
                devres_release_group(group);
        else
                devres_remove_group(group);
        return rc;
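
For readers unfamiliar with devres groups, the following user-space toy model
may help. It is only a sketch of the idea: the toy_* names are hypothetical,
and the real devres_open_group(), devres_release_group() and
devres_remove_group() differ in detail. The group is modelled as a marker in a
list of cleanup actions: releasing the group runs everything registered after
the marker, newest first, while removing the group keeps the actions
registered for normal teardown.

```c
#include <stddef.h>

#define TOY_MAX_ACTIONS 16

typedef void (*toy_action_fn)(void *data);

/* Toy stand-in for a device's list of managed cleanup actions. */
struct toy_devres {
	toy_action_fn fn[TOY_MAX_ACTIONS];
	void *data[TOY_MAX_ACTIONS];
	size_t count;
};

/* Open a group: remember how many actions existed at this point. */
size_t toy_open_group(const struct toy_devres *dr)
{
	return dr->count;
}

/* Register a cleanup action, as a devm_*() helper would do internally. */
int toy_add_action(struct toy_devres *dr, toy_action_fn fn, void *data)
{
	if (dr->count == TOY_MAX_ACTIONS)
		return -1;
	dr->fn[dr->count] = fn;
	dr->data[dr->count] = data;
	dr->count++;
	return 0;
}

/* Error path: undo everything registered since the group was opened. */
void toy_release_group(struct toy_devres *dr, size_t group)
{
	while (dr->count > group) {
		dr->count--;
		dr->fn[dr->count](dr->data[dr->count]);
	}
}

/* Success path: drop the marker only; the actions stay registered. */
void toy_remove_group(struct toy_devres *dr, size_t group)
{
	(void)dr;
	(void)group;
}

/* Example cleanup action: counts how many times it has run. */
void toy_count_undo(void *data)
{
	++*(int *)data;
}
```

In the toy, an action added inside a group that is later released runs
immediately, whereas an action added inside a group that is removed is left
in place until the whole list is torn down.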

^ permalink raw reply

* Re: [PATCH net-next v6 2/2] dinghai: add hardware register access and PCI capability scanning
From: Andrew Lunn @ 2026-06-16 19:49 UTC (permalink / raw)
  To: han.junyang
  Cc: andrew+netdev, davem, edumazet, kuba, pabeni, horms, linux-kernel,
	netdev, ran.ming, han.chengfei, zhang.yanze
In-Reply-To: <20260616213550502kLzSZF2DiQyd9Dl0Dv0Gz@zte.com.cn>

> +int zxdh_pf_common_cfg_init(struct dh_core_dev *dh_dev)
> +{
> +	struct zxdh_pf_device *pf_dev = dh_dev->priv;
> +	struct pci_dev *pdev = dh_dev->pdev;
> +	int common;
> +
> +	/* check for a common config: if not, use legacy mode (bar 0). */
> +	common = zxdh_pf_pci_find_capability(pdev, ZXDH_PCI_CAP_COMMON_CFG,
> +					     IORESOURCE_IO | IORESOURCE_MEM,
> +					     &pf_dev->modern_bars);
> +	if (common == 0) {
> +		dev_err(dh_dev->device,
> +			"missing capabilities %i, leaving for legacy driver\n",
> +			common);

That looks doubly odd. Normally you would use !common. Also, you know
common is 0, so why use "%i" when it could just be '0'?

> +int zxdh_pf_notify_cfg_init(struct dh_core_dev *dh_dev)
> +{
> +	struct zxdh_pf_device *pf_dev = dh_dev->priv;
> +	struct pci_dev *pdev = dh_dev->pdev;
> +	u32 notify_length;
> +	u32 notify_offset;
> +	int notify;
> +
> +	/* If common is there, these should be too... */
> +	notify = zxdh_pf_pci_find_capability(pdev, ZXDH_PCI_CAP_NOTIFY_CFG,
> +					     IORESOURCE_IO | IORESOURCE_MEM,
> +					     &pf_dev->modern_bars);
> +	if (notify == 0) {
> +		dev_err(dh_dev->device, "missing capabilities %i\n", notify);
> +		return -EINVAL;
> +	}
> +

Same again.

    Andrew

---
pw-bot: cr

^ permalink raw reply

* Re: [PATCH net-next v6 1/2] dinghai: add ZTE network driver support
From: Andrew Lunn @ 2026-06-16 19:39 UTC (permalink / raw)
  To: han.junyang
  Cc: andrew+netdev, davem, edumazet, kuba, pabeni, horms, linux-kernel,
	netdev, ran.ming, han.chengfei, zhang.yanze
In-Reply-To: <20260616213057452I2KLm3mVgWYl_SUTy_YYS@zte.com.cn>

> +++ b/drivers/net/ethernet/zte/dinghai/en_pf.h
> +static inline void *dh_core_alloc_priv(struct dh_core_dev *dh_dev,
> +				       size_t size)
> +{
> +	void *priv = kzalloc(size, GFP_KERNEL);
> +
> +	if (priv)
> +		dh_dev->priv = priv;
> +	return priv;
> +}
> +
> +static inline void dh_core_free_priv(struct dh_core_dev *dh_dev)
> +{
> +	kfree(dh_dev->priv);
> +}

It is unusual for these to be inline functions in a header. Why is
this?

	Andrew

^ permalink raw reply

* Re: [PATCH bpf-next 1/2] bpf: Guard conntrack opts error writes
From: Alexei Starovoitov @ 2026-06-16 19:36 UTC (permalink / raw)
  To: Yiyang Chen, bpf, netfilter-devel
  Cc: pablo, fw, phil, davem, edumazet, kuba, pabeni, horms, andrii,
	eddyz87, ast, daniel, memxor, martin.lau, song, yonghong.song,
	jolsa, emil, shuah, kartikey406, coreteam, netdev, linux-kernel,
	linux-kselftest
In-Reply-To: <70aeec0ab762aebe65129cf6052e132c7329edc2.1781586477.git.chenyy23@mails.tsinghua.edu.cn>

On Mon Jun 15, 2026 at 10:42 PM PDT, Yiyang Chen wrote:
> The conntrack lookup and allocation kfuncs take an opts pointer
> together with an opts__sz argument. The verifier checks only the memory
> range described by opts__sz, but the wrappers unconditionally write
> opts->error whenever the internal lookup or allocation helper returns an
> error.
>
> For an invalid size smaller than the end of opts->error, that write can
> land outside the verifier-checked range. Keep returning NULL for invalid
> arguments, but only report the error through opts->error when the
> supplied size includes the field.
>
> This preserves error reporting for the supported 12-byte and 16-byte
> layouts, and for other invalid sizes that still include opts->error.
>
> Fixes: b4c2b9593a1c ("net/netfilter: Add unstable CT lookup helpers for XDP and TC-BPF")
> Fixes: d7e79c97c00c ("net: netfilter: Add kfuncs to allocate and insert CT")
> Signed-off-by: Yiyang Chen <chenyy23@mails.tsinghua.edu.cn>
> ---
>  net/netfilter/nf_conntrack_bpf.c | 17 +++++++++++++----
>  1 file changed, 13 insertions(+), 4 deletions(-)
>
> diff --git a/net/netfilter/nf_conntrack_bpf.c b/net/netfilter/nf_conntrack_bpf.c
> index 40c261cd0af38..3c182024ec509 100644
> --- a/net/netfilter/nf_conntrack_bpf.c
> +++ b/net/netfilter/nf_conntrack_bpf.c
> @@ -65,6 +65,11 @@ enum {
>  	NF_BPF_CT_OPTS_SZ = 16,
>  };
>  
> +static bool bpf_ct_opts_has_error(u32 opts_len)
> +{
> +	return opts_len >= offsetofend(struct bpf_ct_opts, error);
> +}
> +
>  static int bpf_nf_ct_tuple_parse(struct bpf_sock_tuple *bpf_tuple,
>  				 u32 tuple_len, u8 protonum, u8 dir,
>  				 struct nf_conntrack_tuple *tuple)
> @@ -298,7 +303,8 @@ bpf_xdp_ct_alloc(struct xdp_md *xdp_ctx, struct bpf_sock_tuple *bpf_tuple,
>  	nfct = __bpf_nf_ct_alloc_entry(dev_net(ctx->rxq->dev), bpf_tuple, tuple__sz,
>  				       opts, opts__sz, 10);
>  	if (IS_ERR(nfct)) {
> -		opts->error = PTR_ERR(nfct);
> +		if (bpf_ct_opts_has_error(opts__sz))
> +			opts->error = PTR_ERR(nfct);

LLMs have no taste.

The two lines above could have been one helper:
   bpf_ct_opts_set_error(opts, opts__sz, PTR_ERR(nfct));

Or we can go a step further and simplify the code more.
Turn this:
   if (IS_ERR(nfct)) {
           opts->error = PTR_ERR(nfct);
           return NULL;
   }
   return (struct nf_conn___init *)nfct;
into:
   return (struct nf_conn___init *)bpf_ct_opts_result(opts, opts__sz, nfct);

static void *bpf_ct_opts_result(struct bpf_ct_opts *opts, u32 opts__sz, void *ret)
{
  if (!IS_ERR(ret))
    return ret;
  if (opts__sz >= offsetofend(struct bpf_ct_opts, error))
    opts->error = PTR_ERR(ret);
  return NULL;
}
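
The size guard itself can be tried out in user space. The sketch below is a
toy, under stated assumptions: struct toy_opts is a hypothetical layout, not
the real struct bpf_ct_opts, and TOY_OFFSETOFEND only mirrors what the
kernel's offsetofend() computes. It shows the write being skipped when the
caller-declared size does not cover the error field.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Offset of the first byte past MEMBER, mirroring the kernel's offsetofend(). */
#define TOY_OFFSETOFEND(type, member) \
	(offsetof(type, member) + sizeof(((type *)0)->member))

/* Hypothetical options layout, for illustration only. */
struct toy_opts {
	int32_t netns_id;
	int32_t error;
	uint8_t l4proto;
	uint8_t pad[3];
};

/*
 * Record err only if the caller-declared size covers the error field.
 * Returns 1 if the field was written, 0 if the write was skipped.
 */
int toy_opts_set_error(struct toy_opts *opts, uint32_t opts_sz, int err)
{
	if (opts_sz < TOY_OFFSETOFEND(struct toy_opts, error))
		return 0;
	opts->error = err;
	return 1;
}
```

With this layout the error field ends at byte 8, so a declared size of 4
skips the write while a declared size of 8 or more performs it.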

These kinds of small improvements should be obvious to any human developer.
Please do NOT send us patches straight out of an LLM.
Review them first and think about how to improve them.

pw-bot: cr

^ permalink raw reply

* Re: [PATCH v27 3/5] cxl/sfc: Initialize dpa without a mailbox
From: Dan Williams (nvidia) @ 2026-06-16 19:35 UTC (permalink / raw)
  To: Alejandro Lucero Palau, Dan Williams (nvidia),
	alejandro.lucero-palau, linux-cxl, netdev, edward.cree, davem,
	kuba, pabeni, edumazet, dave.jiang
  Cc: Dan Williams, Ben Cheatham, Jonathan Cameron
In-Reply-To: <17b68fb1-768e-49f6-884d-49e0952621b8@amd.com>

Alejandro Lucero Palau wrote:
> 
> On 6/10/26 00:24, Dan Williams (nvidia) wrote:
> > alejandro.lucero-palau@ wrote:
> >> From: Alejandro Lucero <alucerop@amd.com>
> >>
> >> Type3 relies on mailbox CXL_MBOX_OP_IDENTIFY command for initializing
> >> memdev state params which end up being used for DPA initialization.
> >>
> >> Allow a Type2 driver to initialize DPA simply by giving the size of its
> >> volatile hardware partition.
> >>
> >> Move related functions to memdev.
> > The code movement is not strictly necessary. Just add cxl_set_capacity()
> > and we can consider a move later if mbox.o and memdev.o are ever not
> > both included in cxl_core.o by default.
> 
> 
> I think it is the right thing to do as the new function uses add_part() 
> (moved) and the other add_part() client is the other function moved, 
> cxl_mem_dpa_fetch().
> 
> Note cxl_mem_get_partition() used by cxl_mem_dpa_fetch() is the one 
> working with mbox commands and it remains in the same place inside 
> core/mbox.c and the only cxl_mem_dpa_fetch() client is cxl/pci.c
> 
> 
> This was reviewed and accepted so no reason for not doing it ...

Sure, I am ok to let it go as is.

^ permalink raw reply

* Re: [Bug] incompatibility between 'e1000e' and Aruba AOS-CX switches (too small inter-packet gap)
From: Andrew Lunn @ 2026-06-16 19:34 UTC (permalink / raw)
  To: Philippe Andersson; +Cc: netdev, Ludovic Calmant, Fabian Noël
In-Reply-To: <457d1617-bd7f-44c5-a9af-7ba8aa9250f4@iba-group.com>

> A support ticket has already been opened with Aruba, but it's unclear at
> this stage that the problem is on their side.

How easy is it to reproduce? Can you run a git bisect from the last
known good kernel version to the first known bad version?

      Andrew

^ permalink raw reply

* [PATCH v1 net-next] ipv4: fib_rule: Move fib4_rules_exit() to ->exit().
From: Kuniyuki Iwashima @ 2026-06-16 19:13 UTC (permalink / raw)
  To: David Ahern, Ido Schimmel, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, Kuniyuki Iwashima, Kuniyuki Iwashima, netdev,
	syzbot+965506b59a2de0b6905c

syzbot reported use-after-free of net->ipv4.rules_ops. [0]

It can be reproduced with these commands:

  while true; do
  	ip netns add ns1
  	ip -n ns1 link set dev lo up
  	ip -n ns1 address add 192.0.2.1/24 dev lo
  	ip -n ns1 link add name dummy1 up type dummy
  	ip -n ns1 address add 198.51.100.1/24 dev dummy1
  	ip -n ns1 rule add ipproto tcp sport 12345 table 12345
  	ip -n ns1 fou add port 5555 ipproto 47 local 192.0.2.1 peer 198.51.100.2 peer_port 54321
  	ip netns del ns1
  done

The cited commit moved fib4_rules_exit() earlier to ->exit_rtnl(),
but the kernel socket destroyed in ->exit() could eventually reach
__fib_lookup().

I left fib4_rules_exit() in ->exit_rtnl() because fib4_rule_delete()
calls fib_unmerge(), which requires RTNL.

However, when ->delete() is called, ->configure() has already been
called, thus fib_unmerge() in ->delete() has no effect.

Let's remove fib_unmerge() in fib4_rule_delete() and move
fib4_rules_exit() to ->exit().

Many thanks to Ido Schimmel for providing the nice repro very quickly.

Note that we can make fib_rules_ops.delete() return void once
net-next opens.

[0]:
BUG: KASAN: slab-use-after-free in fib_rules_lookup+0x15e/0xeb0 net/core/fib_rules.c:321
Read of size 8 at addr ffff88804ec4c680 by task kworker/u8:21/12641

CPU: 0 UID: 0 PID: 12641 Comm: kworker/u8:21 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/09/2026
Workqueue: netns cleanup_net
Call Trace:
 <TASK>
 dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120
 print_address_description+0x55/0x1e0 mm/kasan/report.c:378
 print_report+0x58/0x70 mm/kasan/report.c:482
 kasan_report+0x117/0x150 mm/kasan/report.c:595
 fib_rules_lookup+0x15e/0xeb0 net/core/fib_rules.c:321
 __fib_lookup+0x106/0x210 net/ipv4/fib_rules.c:96
 ip_route_output_key_hash_rcu+0x294/0x2720 net/ipv4/route.c:2811
 ip_route_output_key_hash+0x18d/0x2a0 net/ipv4/route.c:2702
 __ip_route_output_key include/net/route.h:169 [inline]
 ip_route_output_flow+0x2a/0x150 net/ipv4/route.c:2929
 ip4_datagram_release_cb+0x89d/0xbe0 net/ipv4/datagram.c:118
 release_sock+0x206/0x260 net/core/sock.c:3861
 inet_shutdown+0x2b1/0x390 net/ipv4/af_inet.c:950
 udp_tunnel_sock_release+0x6d/0x80 net/ipv4/udp_tunnel_core.c:197
 fou_release net/ipv4/fou_core.c:562 [inline]
 fou_exit_net+0x17d/0x1f0 net/ipv4/fou_core.c:1230
 ops_exit_list net/core/net_namespace.c:199 [inline]
 ops_undo_list+0x43d/0x8d0 net/core/net_namespace.c:252
 cleanup_net+0x572/0x810 net/core/net_namespace.c:702
 process_one_work kernel/workqueue.c:3314 [inline]
 process_scheduled_works+0xa8e/0x14e0 kernel/workqueue.c:3397
 worker_thread+0xa47/0xfb0 kernel/workqueue.c:3478
 kthread+0x389/0x470 kernel/kthread.c:436
 ret_from_fork+0x514/0xb70 arch/x86/kernel/process.c:158
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
 </TASK>

Fixes: 759923cf03b0 ("ipv4: fib: Convert fib_net_exit_batch() to ->exit_rtnl().")
Reported-by: syzbot+965506b59a2de0b6905c@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/6a315824.b0403584.28d0ff.0000.GAE@google.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
---
 net/ipv4/fib_frontend.c | 10 ++++++----
 net/ipv4/fib_rules.c    | 11 ++---------
 2 files changed, 8 insertions(+), 13 deletions(-)

diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index c7d1f31650d7..42212970d735 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -1612,10 +1612,6 @@ static void ip_fib_net_exit(struct net *net)
 			fib_free_table(tb);
 		}
 	}
-
-#ifdef CONFIG_IP_MULTIPLE_TABLES
-	fib4_rules_exit(net);
-#endif
 }
 
 static int __net_init fib_net_init(struct net *net)
@@ -1652,6 +1648,9 @@ static int __net_init fib_net_init(struct net *net)
 	ip_fib_net_exit(net);
 	rtnl_net_unlock(net);
 
+#ifdef CONFIG_IP_MULTIPLE_TABLES
+	fib4_rules_exit(net);
+#endif
 	kfree(net->ipv4.fib_table_hash);
 	fib4_notifier_exit(net);
 	goto out;
@@ -1671,6 +1670,9 @@ static void __net_exit fib_net_exit_rtnl(struct net *net,
 
 static void __net_exit fib_net_exit(struct net *net)
 {
+#ifdef CONFIG_IP_MULTIPLE_TABLES
+	fib4_rules_exit(net);
+#endif
 	kfree(net->ipv4.fib_table_hash);
 	fib4_notifier_exit(net);
 	fib4_semantics_exit(net);
diff --git a/net/ipv4/fib_rules.c b/net/ipv4/fib_rules.c
index 51f0193092f0..e068a5bace73 100644
--- a/net/ipv4/fib_rules.c
+++ b/net/ipv4/fib_rules.c
@@ -352,24 +352,17 @@ static int fib4_rule_configure(struct fib_rule *rule, struct sk_buff *skb,
 static int fib4_rule_delete(struct fib_rule *rule)
 {
 	struct net *net = rule->fr_net;
-	int err;
-
-	/* split local/main if they are not already split */
-	err = fib_unmerge(net);
-	if (err)
-		goto errout;
 
 #ifdef CONFIG_IP_ROUTE_CLASSID
 	if (((struct fib4_rule *)rule)->tclassid)
 		atomic_dec(&net->ipv4.fib_num_tclassid_users);
 #endif
-	net->ipv4.fib_has_custom_rules = true;
 
 	if (net->ipv4.fib_rules_require_fldissect &&
 	    fib_rule_requires_fldissect(rule))
 		net->ipv4.fib_rules_require_fldissect--;
-errout:
-	return err;
+
+	return 0;
 }
 
 static int fib4_rule_compare(struct fib_rule *rule, struct fib_rule_hdr *frh,
-- 
2.54.0.1136.gdb2ca164c4-goog


^ permalink raw reply related

* [PATCH 6.1] net: gro: don't merge zcopy skbs
From: Alexander Martyniuk @ 2026-06-16 22:00 UTC (permalink / raw)
  To: stable, Greg Kroah-Hartman
  Cc: Alexander Martyniuk, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Sasha Levin, Sabrina Dubroca,
	Hyunwoo Kim, Pavel Begunkov, netdev, linux-kernel, lvc-project,
	Huzaifa Sidhpurwala, Willem de Bruijn

From: Sabrina Dubroca <sd@queasysnail.net>

commit 4db79a322db8c97f7b73b8a347395ef4d685eb40 upstream.

skb_gro_receive() can currently copy frags between the source and GRO
skb, without checking the zerocopy status, and in particular the
SKBFL_MANAGED_FRAG_REFS flag.

When SKBFL_MANAGED_FRAG_REFS is set, the skb doesn't hold a reference
on the pages in shinfo->frags. Appending those frags to another skb's
frags without fixing up the page refcount can lead to UAF.

When either the last skb in the GRO chain (the one we would append
frags to) or the source skb is zerocopy, don't merge the skbs.

Fixes: 753f1ca4e1e5 ("net: introduce managed frags infrastructure")
Reported-by: Huzaifa Sidhpurwala <huzaifas@redhat.com>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/c3b7f906bbfcbdfd7b4fa9d6c18a438870df85be.1779307748.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Alexander Martyniuk <alexevgmart@gmail.com>
---
Backport fix for CVE-2026-46323
 net/core/gro.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/core/gro.c b/net/core/gro.c
index ea6571c01faa..c5a9733d929a 100644
--- a/net/core/gro.c
+++ b/net/core/gro.c
@@ -171,6 +171,9 @@ int skb_gro_receive(struct sk_buff *p, struct sk_buff *skb)
 	if (p->pp_recycle != skb->pp_recycle)
 		return -ETOOMANYREFS;
 
+	if (skb_zcopy(p) || skb_zcopy(skb))
+		return -ETOOMANYREFS;
+
 	/* pairs with WRITE_ONCE() in netif_set_gro_max_size() */
 	gro_max_size = READ_ONCE(p->dev->gro_max_size);
 
-- 
2.30.2


^ permalink raw reply related

* [PATCH] octeontx2-pf: Clear stats of all resources when freeing resources
From: Subbaraya Sundeep @ 2026-06-16 19:00 UTC (permalink / raw)
  To: andrew+netdev, davem, edumazet, kuba, pabeni, sgoutham, gakula,
	bbhushan2, rkannoth
  Cc: netdev, linux-kernel, Subbaraya Sundeep
In-Reply-To: <1781636420-19816-1-git-send-email-sbhatta@marvell.com>

When all MCS resources mapped to a PF are being freed, clear the
stats of all those resources too.

Fixes: 815debbbf7b5 ("octeontx2-pf: mcs: Clear stats before freeing resource")
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
---
 drivers/net/ethernet/marvell/octeontx2/nic/cn10k_macsec.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/cn10k_macsec.c b/drivers/net/ethernet/marvell/octeontx2/nic/cn10k_macsec.c
index 4d3a7f4be962..9524d38f1582 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/cn10k_macsec.c
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/cn10k_macsec.c
@@ -182,6 +182,7 @@ static void cn10k_mcs_free_rsrc(struct otx2_nic *pfvf, enum mcs_direction dir,
 	clear_req->id = hw_rsrc_id;
 	clear_req->type = type;
 	clear_req->dir = dir;
+	clear_req->all = all;
 
 	req = otx2_mbox_alloc_msg_mcs_free_resources(mbox);
 	if (!req)
-- 
2.48.1


^ permalink raw reply related

* [net PATCH v2] octeontx2-af: mcs: Fix unsupported secy stats read
From: Subbaraya Sundeep @ 2026-06-16 19:00 UTC (permalink / raw)
  To: andrew+netdev, davem, edumazet, kuba, pabeni, sgoutham, gakula,
	bbhushan2, rkannoth
  Cc: netdev, linux-kernel, Subbaraya Sundeep

From: Geetha sowjanya <gakula@marvell.com>

The secy control stats counter doesn't exist on the CNF10KB platform.
Skip reading the corresponding register on CNF10KB silicon while
fetching secy stats.

Fixes: 9312150af8da ("octeontx2-af: cn10k: mcs: Support for stats collection")
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
---
v2 changes:
 Addressed AI review feedback by also changing debugfs NOT to access
 the secy control stats counter

 drivers/net/ethernet/marvell/octeontx2/af/mcs.c         | 6 +++---
 drivers/net/ethernet/marvell/octeontx2/af/rvu_debugfs.c | 3 ++-
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/marvell/octeontx2/af/mcs.c b/drivers/net/ethernet/marvell/octeontx2/af/mcs.c
index c1775bd01c2b..a07e0b3d8d00 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/mcs.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/mcs.c
@@ -120,13 +120,13 @@ void mcs_get_rx_secy_stats(struct mcs *mcs, struct mcs_secy_stats *stats, int id
 	reg = MCSX_CSE_RX_MEM_SLAVE_INPKTSSECYUNTAGGEDX(id);
 	stats->pkt_untaged_cnt = mcs_reg_read(mcs, reg);
 
-	reg = MCSX_CSE_RX_MEM_SLAVE_INPKTSSECYCTLX(id);
-	stats->pkt_ctl_cnt = mcs_reg_read(mcs, reg);
-
 	if (mcs->hw->mcs_blks > 1) {
 		reg = MCSX_CSE_RX_MEM_SLAVE_INPKTSSECYNOTAGX(id);
 		stats->pkt_notag_cnt = mcs_reg_read(mcs, reg);
+		return;
 	}
+	reg = MCSX_CSE_RX_MEM_SLAVE_INPKTSSECYCTLX(id);
+	stats->pkt_ctl_cnt = mcs_reg_read(mcs, reg);
 }
 
 void mcs_get_flowid_stats(struct mcs *mcs, struct mcs_flowid_stats *stats,
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_debugfs.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_debugfs.c
index fa461489acdd..ca2704b188a5 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_debugfs.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_debugfs.c
@@ -482,10 +482,11 @@ static int rvu_dbg_mcs_rx_secy_stats_display(struct seq_file *filp, void *unused
 		seq_printf(filp, "secy%d: Tagged ctrl pkts: %lld\n", secy_id,
 			   stats.pkt_tagged_ctl_cnt);
 		seq_printf(filp, "secy%d: Untaged pkts: %lld\n", secy_id, stats.pkt_untaged_cnt);
-		seq_printf(filp, "secy%d: Ctrl pkts: %lld\n", secy_id, stats.pkt_ctl_cnt);
 		if (mcs->hw->mcs_blks > 1)
 			seq_printf(filp, "secy%d: pkts notag: %lld\n", secy_id,
 				   stats.pkt_notag_cnt);
+		else
+			seq_printf(filp, "secy%d: Ctrl pkts: %lld\n", secy_id, stats.pkt_ctl_cnt);
 	}
 	mutex_unlock(&mcs->stats_lock);
 	return 0;
-- 
2.48.1


^ permalink raw reply related

* [net PATCH v2] octeontx2-pf: mcs: Fix mcs resources free on PF shutdown
From: Subbaraya Sundeep @ 2026-06-16 19:00 UTC (permalink / raw)
  To: andrew+netdev, davem, edumazet, kuba, pabeni, sgoutham, gakula,
	bbhushan2, rkannoth
  Cc: netdev, linux-kernel, Subbaraya Sundeep
In-Reply-To: <1781636420-19816-1-git-send-email-sbhatta@marvell.com>

From: Geetha sowjanya <gakula@marvell.com>

On PF shutdown, the current driver frees MCS hardware
resources even though no MCS resources are allocated to it.
This patch checks the MCS resource status and sends the mailbox
message to free them only if resources are allocated.

Fixes: c54ffc73601c ("octeontx2-pf: mcs: Introduce MACSEC hardware offloading")
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
---
v2 changes:
 Fixed AI review so that pfvf->macsec_cfg is freed correctly

 .../net/ethernet/marvell/octeontx2/nic/cn10k_macsec.c    | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/cn10k_macsec.c b/drivers/net/ethernet/marvell/octeontx2/nic/cn10k_macsec.c
index 2cc1bdfd9b2e..4d3a7f4be962 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/cn10k_macsec.c
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/cn10k_macsec.c
@@ -1776,11 +1776,16 @@ int cn10k_mcs_init(struct otx2_nic *pfvf)
 
 void cn10k_mcs_free(struct otx2_nic *pfvf)
 {
+	struct cn10k_mcs_cfg *cfg = pfvf->macsec_cfg;
+
 	if (!test_bit(CN10K_HW_MACSEC, &pfvf->hw.cap_flag))
 		return;
 
-	cn10k_mcs_free_rsrc(pfvf, MCS_TX, MCS_RSRC_TYPE_SECY, 0, true);
-	cn10k_mcs_free_rsrc(pfvf, MCS_RX, MCS_RSRC_TYPE_SECY, 0, true);
+	if (!list_empty(&cfg->txsc_list)) {
+		cn10k_mcs_free_rsrc(pfvf, MCS_TX, MCS_RSRC_TYPE_SECY, 0, true);
+		cn10k_mcs_free_rsrc(pfvf, MCS_RX, MCS_RSRC_TYPE_SECY, 0, true);
+	}
+
 	kfree(pfvf->macsec_cfg);
 	pfvf->macsec_cfg = NULL;
 }
-- 
2.48.1


* Re: [PATCH bpf] bpf, sockmap: fix lock inversion between stab->lock and sk_callback_lock
From: Sechang Lim @ 2026-06-16 18:40 UTC (permalink / raw)
  To: Jiayuan Chen
  Cc: John Fastabend, Jakub Sitnicki, Alexei Starovoitov,
	Daniel Borkmann, Eric Dumazet, Kuniyuki Iwashima, Paolo Abeni,
	Willem de Bruijn, David S . Miller, Jakub Kicinski, Simon Horman,
	netdev, bpf, linux-kernel
In-Reply-To: <575a878e-6d37-4337-a821-4883d3dd3a63@linux.dev>

On Tue, Jun 16, 2026 at 06:17:48PM +0800, Jiayuan Chen wrote:
>
>On 6/16/26 5:11 PM, Sechang Lim wrote:
>>sock_map_update_common() and __sock_map_delete() hold stab->lock and call
>>sock_map_unref() -> sock_map_del_link() under it. sock_map_del_link() takes
>>sk_callback_lock for write to stop the strparser and verdict, giving the
>>lock order stab->lock -> sk_callback_lock.
>>
>>The opposite order comes from an SK_SKB stream parser. On RX,
>>sk_psock_strp_data_ready() holds sk_callback_lock for read while running
>>the parser. The verdict redirects the skb to egress, where a sched_cls
>
>
>The commit message is wrong. A verdict does not redirect to egress
>synchronously — sk_psock_skb_redirect() only queues the skb and
>schedule_delayed_work()s sk_psock_backlog, so egress runs in workqueue
>context, not under sk_callback_lock.
>

Thanks, you're right. It's the inline ACK, not the redirect. Sorry for
the misleading changelog; I'll fix it in v2.

>
>>program calls bpf_map_delete_elem() on a sockmap, which takes stab->lock:
>>
>>   WARNING: possible circular locking dependency detected
>>   7.1.0-rc6 Not tainted
>>   ------------------------------------------------------
>>   syz.9.8824 is trying to acquire lock:
>>   (&stab->lock){+.-.}-{3:3}, at: __sock_map_delete net/core/sock_map.c:421
>>   but task is already holding lock:
>>   (clock-AF_INET){++.-}-{3:3}, at: sk_psock_strp_data_ready net/core/skmsg.c:1173
>>
>>   -> #1 (clock-AF_INET){++.-}-{3:3}:
>>          _raw_write_lock_bh
>>          sock_map_del_link net/core/sock_map.c:167
>>          sock_map_unref net/core/sock_map.c:184
>>          sock_map_update_common net/core/sock_map.c:509
>>          sock_map_update_elem_sys net/core/sock_map.c:588
>>          map_update_elem kernel/bpf/syscall.c:1805
>>
>>   -> #0 (&stab->lock){+.-.}-{3:3}:
>>          _raw_spin_lock_bh
>>          __sock_map_delete net/core/sock_map.c:421
>>          sock_map_delete_elem net/core/sock_map.c:452
>>          bpf_prog_06044d24140080b6
>>          tcx_run net/core/dev.c:4451
>>          sch_handle_egress net/core/dev.c:4541
>>          __dev_queue_xmit net/core/dev.c:4808
>>          ...
>>          tcp_bpf_strp_read_sock net/ipv4/tcp_bpf.c:701
>
>
>I guess it is an ACK. What is the actual purpose of a sched_cls program
>calling sockmap delete on the TX path of an ACK? If there is no real use
>case for it, this is just broken BPF usage, not a kernel bug worth this
>change.
>
>

I don't have a real use case for that exact program. But the verifier
allows sockmap delete from tc, and it deadlocks when the strparser's
socket is concurrently removed from the same map. The fix only moves
sock_map_unref() out from under stab->lock.

Best,
Sechang

* [PATCH nf-next v3 0/4] netfilter: replace u_int*_t with kernel int types
From: Carlos Grillet @ 2026-06-16 18:29 UTC (permalink / raw)
  To: Pablo Neira Ayuso, Florian Westphal, Phil Sutter
  Cc: netfilter-devel, coreteam, linux-kernel, netdev

Hi all! This is my first patch series of many, I hope :)
I'd like to start contributing by helping out with janitor work,
standardizing code and cleaning up.

This patch series replaces POSIX u_int8_t/u_int16_t with the preferred
kernel types u8/u16 across several netfilter files.

u_int*_t appears in many other files, but I wanted to keep this series
small, unless advised otherwise.

No functional changes.

Changes in v3:
- drop the changes to nf_log and xt_DSCP (I need a deeper understanding of
  the subsystem before converting these correctly)
- link to v2: https://lore.kernel.org/all/20260615133835.51273-1-carlos@carlosgrillet.me

Changes in v2:
- addresses sashiko comments https://sashiko.dev/#/patchset/32368
  - nf_sockopt: update function prototypes and struct definitions
  - nf_log: update the corresponding function declarations and the
    nf_logfn typedef
- link to v1: https://lore.kernel.org/all/20260612125146.75672-1-carlos@carlosgrillet.me

Carlos Grillet (4):
  netfilter: nf_nat_ftp: replace u_int16_t with u16
  netfilter: nf_nat_irc: replace u_int16_t with u16
  netfilter: nf_sockopt: replace u_int8_t with u8
  netfilter: xt_TCPOPTSTRIP: replace u_int8_t and u_int16_t with u8 and u16

 include/linux/netfilter.h      | 6 +++---
 net/netfilter/nf_nat_ftp.c     | 2 +-
 net/netfilter/nf_nat_irc.c     | 2 +-
 net/netfilter/nf_sockopt.c     | 8 ++++----
 net/netfilter/xt_TCPOPTSTRIP.c | 8 ++++----
 5 files changed, 13 insertions(+), 13 deletions(-)

-- 
2.54.0


* [PATCH nf-next v3 2/4] netfilter: nf_nat_irc: replace u_int16_t with u16
From: Carlos Grillet @ 2026-06-16 18:29 UTC (permalink / raw)
  To: Pablo Neira Ayuso, Florian Westphal, Phil Sutter
  Cc: netfilter-devel, coreteam, netdev, linux-kernel
In-Reply-To: <20260616182948.96865-1-carlos@carlosgrillet.me>

Replace POSIX u_int16_t with the preferred kernel type u16.

No functional changes.

Signed-off-by: Carlos Grillet <carlos@carlosgrillet.me>
---
 net/netfilter/nf_nat_irc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netfilter/nf_nat_irc.c b/net/netfilter/nf_nat_irc.c
index 19c4fcc60c50..14b79cb0171b 100644
--- a/net/netfilter/nf_nat_irc.c
+++ b/net/netfilter/nf_nat_irc.c
@@ -39,7 +39,7 @@ static unsigned int help(struct sk_buff *skb,
 	char buffer[sizeof("4294967296 65635")];
 	struct nf_conn *ct = exp->master;
 	union nf_inet_addr newaddr;
-	u_int16_t port;
+	u16 port;
 
 	/* Reply comes from server. */
 	newaddr = ct->tuplehash[IP_CT_DIR_REPLY].tuple.dst.u3;
-- 
2.54.0


* [PATCH nf-next v3 4/4] netfilter: xt_TCPOPTSTRIP: replace u_int8_t and u_int16_t with u8 and u16
From: Carlos Grillet @ 2026-06-16 18:29 UTC (permalink / raw)
  To: Pablo Neira Ayuso, Florian Westphal, Phil Sutter
  Cc: netfilter-devel, coreteam, netdev, linux-kernel
In-Reply-To: <20260616182948.96865-1-carlos@carlosgrillet.me>

Replace POSIX u_int8_t/u_int16_t with the preferred kernel types u8/u16.

No functional changes.

Signed-off-by: Carlos Grillet <carlos@carlosgrillet.me>
---
 net/netfilter/xt_TCPOPTSTRIP.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/netfilter/xt_TCPOPTSTRIP.c b/net/netfilter/xt_TCPOPTSTRIP.c
index 93f064306901..265d21697847 100644
--- a/net/netfilter/xt_TCPOPTSTRIP.c
+++ b/net/netfilter/xt_TCPOPTSTRIP.c
@@ -16,7 +16,7 @@
 #include <linux/netfilter/x_tables.h>
 #include <linux/netfilter/xt_TCPOPTSTRIP.h>
 
-static inline unsigned int optlen(const u_int8_t *opt, unsigned int offset)
+static inline unsigned int optlen(const u8 *opt, unsigned int offset)
 {
 	/* Beware zero-length options: make finite progress */
 	if (opt[offset] <= TCPOPT_NOP || opt[offset+1] == 0)
@@ -33,8 +33,8 @@ tcpoptstrip_mangle_packet(struct sk_buff *skb,
 	const struct xt_tcpoptstrip_target_info *info = par->targinfo;
 	struct tcphdr *tcph, _th;
 	unsigned int optl, i, j;
-	u_int16_t n, o;
-	u_int8_t *opt;
+	u16 n, o;
+	u8 *opt;
 	int tcp_hdrlen;
 
 	/* This is a fragment, no TCP header is available */
@@ -97,7 +97,7 @@ tcpoptstrip_tg6(struct sk_buff *skb, const struct xt_action_param *par)
 {
 	struct ipv6hdr *ipv6h = ipv6_hdr(skb);
 	int tcphoff;
-	u_int8_t nexthdr;
+	u8 nexthdr;
 	__be16 frag_off;
 
 	nexthdr = ipv6h->nexthdr;
-- 
2.54.0


* [PATCH nf-next v3 3/4] netfilter: nf_sockopt: replace u_int8_t with u8
From: Carlos Grillet @ 2026-06-16 18:29 UTC (permalink / raw)
  To: Pablo Neira Ayuso, Florian Westphal, Phil Sutter
  Cc: netfilter-devel, coreteam, linux-kernel, netdev
In-Reply-To: <20260616182948.96865-1-carlos@carlosgrillet.me>

Replace POSIX u_int8_t with the preferred kernel type u8, and update the
function prototypes and struct definition to match.

No functional changes.

Signed-off-by: Carlos Grillet <carlos@carlosgrillet.me>
---
 include/linux/netfilter.h  | 6 +++---
 net/netfilter/nf_sockopt.c | 8 ++++----
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/include/linux/netfilter.h b/include/linux/netfilter.h
index efbbfa770d66..91b68bdba3f5 100644
--- a/include/linux/netfilter.h
+++ b/include/linux/netfilter.h
@@ -181,7 +181,7 @@ static inline void nf_hook_state_init(struct nf_hook_state *p,
 struct nf_sockopt_ops {
 	struct list_head list;
 
-	u_int8_t pf;
+	u8 pf;
 
 	/* Non-inclusive ranges: use 0/0/NULL to never get called. */
 	int set_optmin;
@@ -357,9 +357,9 @@ NF_HOOK_LIST(uint8_t pf, unsigned int hook, struct net *net, struct sock *sk,
 }
 
 /* Call setsockopt() */
-int nf_setsockopt(struct sock *sk, u_int8_t pf, int optval, sockptr_t opt,
+int nf_setsockopt(struct sock *sk, u8 pf, int optval, sockptr_t opt,
 		  unsigned int len);
-int nf_getsockopt(struct sock *sk, u_int8_t pf, int optval, char __user *opt,
+int nf_getsockopt(struct sock *sk, u8 pf, int optval, char __user *opt,
 		  int *len);
 
 struct flowi;
diff --git a/net/netfilter/nf_sockopt.c b/net/netfilter/nf_sockopt.c
index 34afcd03b6f6..19a1d028158c 100644
--- a/net/netfilter/nf_sockopt.c
+++ b/net/netfilter/nf_sockopt.c
@@ -59,8 +59,8 @@ void nf_unregister_sockopt(struct nf_sockopt_ops *reg)
 }
 EXPORT_SYMBOL(nf_unregister_sockopt);
 
-static struct nf_sockopt_ops *nf_sockopt_find(struct sock *sk, u_int8_t pf,
-		int val, int get)
+static struct nf_sockopt_ops *nf_sockopt_find(struct sock *sk, u8 pf,
+					      int val, int get)
 {
 	struct nf_sockopt_ops *ops;
 
@@ -89,7 +89,7 @@ static struct nf_sockopt_ops *nf_sockopt_find(struct sock *sk, u_int8_t pf,
 	return ops;
 }
 
-int nf_setsockopt(struct sock *sk, u_int8_t pf, int val, sockptr_t opt,
+int nf_setsockopt(struct sock *sk, u8 pf, int val, sockptr_t opt,
 		  unsigned int len)
 {
 	struct nf_sockopt_ops *ops;
@@ -104,7 +104,7 @@ int nf_setsockopt(struct sock *sk, u_int8_t pf, int val, sockptr_t opt,
 }
 EXPORT_SYMBOL(nf_setsockopt);
 
-int nf_getsockopt(struct sock *sk, u_int8_t pf, int val, char __user *opt,
+int nf_getsockopt(struct sock *sk, u8 pf, int val, char __user *opt,
 		  int *len)
 {
 	struct nf_sockopt_ops *ops;
-- 
2.54.0


* [PATCH nf-next v3 1/4] netfilter: nf_nat_ftp: replace u_int16_t with u16
From: Carlos Grillet @ 2026-06-16 18:29 UTC (permalink / raw)
  To: Pablo Neira Ayuso, Florian Westphal, Phil Sutter
  Cc: netfilter-devel, coreteam, netdev, linux-kernel
In-Reply-To: <20260616182948.96865-1-carlos@carlosgrillet.me>

Use preferred kernel integer type u16 instead of the POSIX u_int16_t
variant.

No functional change.

Signed-off-by: Carlos Grillet <carlos@carlosgrillet.me>
---
 net/netfilter/nf_nat_ftp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netfilter/nf_nat_ftp.c b/net/netfilter/nf_nat_ftp.c
index c92a436d9c48..ab714629e2b1 100644
--- a/net/netfilter/nf_nat_ftp.c
+++ b/net/netfilter/nf_nat_ftp.c
@@ -69,7 +69,7 @@ static unsigned int nf_nat_ftp(struct sk_buff *skb,
 			       struct nf_conntrack_expect *exp)
 {
 	union nf_inet_addr newaddr;
-	u_int16_t port;
+	u16 port;
 	int dir = CTINFO2DIR(ctinfo);
 	struct nf_conn *ct = exp->master;
 	char buffer[sizeof("|1||65535|") + INET6_ADDRSTRLEN];
-- 
2.54.0


* Re: [syzbot] [net?] KASAN: slab-use-after-free Read in fib_rules_lookup
From: Kuniyuki Iwashima @ 2026-06-16 17:59 UTC (permalink / raw)
  To: kuniyu
  Cc: davem, dsahern, edumazet, horms, idosch, kuba, linux-kernel,
	netdev, pabeni, syzbot+965506b59a2de0b6905c, syzkaller-bugs
In-Reply-To: <CAAVpQUB8W6nXOq-OQfSArKC_xzFbQ=dg62Ee3R=0nuX0sW0fMg@mail.gmail.com>

From: Kuniyuki Iwashima <kuniyu@google.com>
Date: Tue, 16 Jun 2026 10:06:55 -0700
> On Tue, Jun 16, 2026 at 8:55 AM Eric Dumazet <edumazet@google.com> wrote:
> >
> > On Tue, Jun 16, 2026 at 8:31 AM Ido Schimmel <idosch@nvidia.com> wrote:
> > >
> > > On Tue, Jun 16, 2026 at 07:05:24AM -0700, syzbot wrote:
> > > > Hello,
> > > >
> > > > syzbot found the following issue on:
> > > >
> > > > HEAD commit:    72dfa4700f78 net: dsa: sja1105: fix lastused timestamp in ..
> > >
> > > This includes commit 759923cf03b0 ("ipv4: fib: Convert
> > > fib_net_exit_batch() to ->exit_rtnl().") that moved ip_fib_net_exit()
> > > (and therefore fib4_rules_exit()) earlier in the netns dismantle path.
> > >
> > > Kuniyuki, can you please take a look?
> > >
> > > You can use this to reproduce:
> > >
> > > #!/bin/bash
> > >
> > > while true; do
> > >         ip netns add ns1
> > >         ip -n ns1 link set dev lo up
> > >         ip -n ns1 address add 192.0.2.1/24 dev lo
> > >         ip -n ns1 link add name dummy1 up type dummy
> > >         ip -n ns1 address add 198.51.100.1/24 dev dummy1
> > >         ip -n ns1 rule add ipproto tcp sport 12345 table 12345
> > >         ip -n ns1 fou add port 5555 ipproto 47 local 192.0.2.1 peer 198.51.100.2 peer_port 54321
> > >         ip netns del ns1
> > > done
> > >
> >
> > Oh right.
> >
> > While looking at this syzbot report I also found an old issue.
> >
> > https://lore.kernel.org/netdev/20260616141317.407791-1-edumazet@google.com/T/#u
> >
> > I guess adding some delays in enqueue_to_backlog() could trigger a
> > similar bug even if we revert Kuniyuki's patch.
> 
> I'll look into it, thank you both !

I'll move fib4_rules_exit() to ->exit().

fib_unmerge() requires RTNL, but it is not needed in ->delete()
in the first place since it's already called in ->configure().

---8<---
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index c7d1f31650d7..42212970d735 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -1612,10 +1612,6 @@ static void ip_fib_net_exit(struct net *net)
 			fib_free_table(tb);
 		}
 	}
-
-#ifdef CONFIG_IP_MULTIPLE_TABLES
-	fib4_rules_exit(net);
-#endif
 }
 
 static int __net_init fib_net_init(struct net *net)
@@ -1652,6 +1648,9 @@ static int __net_init fib_net_init(struct net *net)
 	ip_fib_net_exit(net);
 	rtnl_net_unlock(net);
 
+#ifdef CONFIG_IP_MULTIPLE_TABLES
+	fib4_rules_exit(net);
+#endif
 	kfree(net->ipv4.fib_table_hash);
 	fib4_notifier_exit(net);
 	goto out;
@@ -1671,6 +1670,9 @@ static void __net_exit fib_net_exit_rtnl(struct net *net,
 
 static void __net_exit fib_net_exit(struct net *net)
 {
+#ifdef CONFIG_IP_MULTIPLE_TABLES
+	fib4_rules_exit(net);
+#endif
 	kfree(net->ipv4.fib_table_hash);
 	fib4_notifier_exit(net);
 	fib4_semantics_exit(net);
diff --git a/net/ipv4/fib_rules.c b/net/ipv4/fib_rules.c
index 51f0193092f0..0bf6204468c5 100644
--- a/net/ipv4/fib_rules.c
+++ b/net/ipv4/fib_rules.c
@@ -352,12 +352,6 @@ static int fib4_rule_configure(struct fib_rule *rule, struct sk_buff *skb,
 static int fib4_rule_delete(struct fib_rule *rule)
 {
 	struct net *net = rule->fr_net;
-	int err;
-
-	/* split local/main if they are not already split */
-	err = fib_unmerge(net);
-	if (err)
-		goto errout;
 
 #ifdef CONFIG_IP_ROUTE_CLASSID
 	if (((struct fib4_rule *)rule)->tclassid)
@@ -368,8 +362,8 @@ static int fib4_rule_delete(struct fib_rule *rule)
 	if (net->ipv4.fib_rules_require_fldissect &&
 	    fib_rule_requires_fldissect(rule))
 		net->ipv4.fib_rules_require_fldissect--;
-errout:
-	return err;
+
+	return 0;
 }
 
 static int fib4_rule_compare(struct fib_rule *rule, struct fib_rule_hdr *frh,
---8<---



> 
> >
> >
> >
> >
> > > Thanks
> > >
> > > > git tree:       net-next
> > > > console output: https://syzkaller.appspot.com/x/log.txt?x=15794bd2580000
> > > > kernel config:  https://syzkaller.appspot.com/x/.config?x=a0842261b62cdea8
> > > > dashboard link: https://syzkaller.appspot.com/bug?extid=965506b59a2de0b6905c
> > > > compiler:       Debian clang version 22.1.6 (++20260514074242+fc4aad7b5db3-1~exp1~20260514074407.73), Debian LLD 22.1.6
> > > >
> > > > Unfortunately, I don't have any reproducer for this issue yet.
> > > >
> > > > Downloadable assets:
> > > > disk image: https://storage.googleapis.com/syzbot-assets/d4e16f50a97c/disk-72dfa470.raw.xz
> > > > vmlinux: https://storage.googleapis.com/syzbot-assets/6cd4a736e796/vmlinux-72dfa470.xz
> > > > kernel image: https://storage.googleapis.com/syzbot-assets/548b0011c8e8/bzImage-72dfa470.xz
> > > >
> > > > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > > > Reported-by: syzbot+965506b59a2de0b6905c@syzkaller.appspotmail.com
> > > >
> > > > bond0 (unregistering): Released all slaves
> > > > bond1 (unregistering): Released all slaves
> > > > bond2 (unregistering): (slave dummy0): Releasing active interface
> > > > bond2 (unregistering): Released all slaves
> > > > ==================================================================
> > > > BUG: KASAN: slab-use-after-free in fib_rules_lookup+0x15e/0xeb0 net/core/fib_rules.c:321
> > > > Read of size 8 at addr ffff88804ec4c680 by task kworker/u8:21/12641
> > > >
> > > > CPU: 0 UID: 0 PID: 12641 Comm: kworker/u8:21 Not tainted syzkaller #0 PREEMPT(full)
> > > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/09/2026
> > > > Workqueue: netns cleanup_net
> > > > Call Trace:
> > > >  <TASK>
> > > >  dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120
> > > >  print_address_description+0x55/0x1e0 mm/kasan/report.c:378
> > > >  print_report+0x58/0x70 mm/kasan/report.c:482
> > > >  kasan_report+0x117/0x150 mm/kasan/report.c:595
> > > >  fib_rules_lookup+0x15e/0xeb0 net/core/fib_rules.c:321
> > > >  __fib_lookup+0x106/0x210 net/ipv4/fib_rules.c:96
> > > >  ip_route_output_key_hash_rcu+0x294/0x2720 net/ipv4/route.c:2811
> > > >  ip_route_output_key_hash+0x18d/0x2a0 net/ipv4/route.c:2702
> > > >  __ip_route_output_key include/net/route.h:169 [inline]
> > > >  ip_route_output_flow+0x2a/0x150 net/ipv4/route.c:2929
> > > >  ip4_datagram_release_cb+0x89d/0xbe0 net/ipv4/datagram.c:118
> > > >  release_sock+0x206/0x260 net/core/sock.c:3861
> > > >  inet_shutdown+0x2b1/0x390 net/ipv4/af_inet.c:950
> > > >  udp_tunnel_sock_release+0x6d/0x80 net/ipv4/udp_tunnel_core.c:197
> > > >  fou_release net/ipv4/fou_core.c:562 [inline]
> > > >  fou_exit_net+0x17d/0x1f0 net/ipv4/fou_core.c:1230
> > > >  ops_exit_list net/core/net_namespace.c:199 [inline]
> > > >  ops_undo_list+0x43d/0x8d0 net/core/net_namespace.c:252
> > > >  cleanup_net+0x572/0x810 net/core/net_namespace.c:702
> > > >  process_one_work kernel/workqueue.c:3314 [inline]
> > > >  process_scheduled_works+0xa8e/0x14e0 kernel/workqueue.c:3397
> > > >  worker_thread+0xa47/0xfb0 kernel/workqueue.c:3478
> > > >  kthread+0x389/0x470 kernel/kthread.c:436
> > > >  ret_from_fork+0x514/0xb70 arch/x86/kernel/process.c:158
> > > >  ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
> > > >  </TASK>
> > > >
> > > > Allocated by task 19121:
> > > >  kasan_save_stack mm/kasan/common.c:57 [inline]
> > > >  kasan_save_track+0x3e/0x80 mm/kasan/common.c:78
> > > >  poison_kmalloc_redzone mm/kasan/common.c:398 [inline]
> > > >  __kasan_kmalloc+0x93/0xb0 mm/kasan/common.c:415
> > > >  kasan_kmalloc include/linux/kasan.h:263 [inline]
> > > >  __do_kmalloc_node mm/slub.c:5296 [inline]
> > > >  __kmalloc_node_track_caller_noprof+0x4d7/0x7b0 mm/slub.c:5408
> > > >  kmemdup_noprof+0x2b/0x70 mm/util.c:138
> > > >  kmemdup_noprof include/linux/fortify-string.h:763 [inline]
> > > >  fib_rules_register+0x2f/0x400 net/core/fib_rules.c:170
> > > >  fib4_rules_init+0x21/0x160 net/ipv4/fib_rules.c:508
> > > >  ip_fib_net_init net/ipv4/fib_frontend.c:1578 [inline]
> > > >  fib_net_init+0x17a/0x3e0 net/ipv4/fib_frontend.c:1628
> > > >  ops_init+0x35d/0x5d0 net/core/net_namespace.c:137
> > > >  setup_net+0x118/0x350 net/core/net_namespace.c:446
> > > >  copy_net_ns+0x4f9/0x720 net/core/net_namespace.c:579
> > > >  create_new_namespaces+0x3f0/0x6b0 kernel/nsproxy.c:132
> > > >  unshare_nsproxy_namespaces+0x149/0x190 kernel/nsproxy.c:234
> > > >  ksys_unshare+0x57d/0xa00 kernel/fork.c:3242
> > > >  __do_sys_unshare kernel/fork.c:3316 [inline]
> > > >  __se_sys_unshare kernel/fork.c:3314 [inline]
> > > >  __x64_sys_unshare+0x38/0x50 kernel/fork.c:3314
> > > >  do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
> > > >  do_syscall_64+0x174/0x580 arch/x86/entry/syscall_64.c:94
> > > >  entry_SYSCALL_64_after_hwframe+0x77/0x7f
> > > >
> > > > Freed by task 12641:
> > > >  kasan_save_stack mm/kasan/common.c:57 [inline]
> > > >  kasan_save_track+0x3e/0x80 mm/kasan/common.c:78
> > > >  kasan_save_free_info+0x40/0x50 mm/kasan/generic.c:584
> > > >  poison_slab_object mm/kasan/common.c:253 [inline]
> > > >  __kasan_slab_free+0x5c/0x80 mm/kasan/common.c:285
> > > >  kasan_slab_free include/linux/kasan.h:235 [inline]
> > > >  slab_free_hook mm/slub.c:2689 [inline]
> > > >  __rcu_free_sheaf_prepare+0x12d/0x2a0 mm/slub.c:2940
> > > >  rcu_free_sheaf+0x31/0x200 mm/slub.c:5850
> > > >  rcu_do_batch kernel/rcu/tree.c:2617 [inline]
> > > >  rcu_core+0x78b/0x10a0 kernel/rcu/tree.c:2869
> > > >  handle_softirqs+0x225/0x840 kernel/softirq.c:622
> > > >  do_softirq+0x76/0xd0 kernel/softirq.c:523
> > > >  __local_bh_enable_ip+0xf8/0x130 kernel/softirq.c:450
> > > >  unregister_netdevice_many_notify+0x1874/0x2150 net/core/dev.c:12445
> > > >  ops_exit_rtnl_list net/core/net_namespace.c:187 [inline]
> > > >  ops_undo_list+0x391/0x8d0 net/core/net_namespace.c:248
> > > >  cleanup_net+0x572/0x810 net/core/net_namespace.c:702
> > > >  process_one_work kernel/workqueue.c:3314 [inline]
> > > >  process_scheduled_works+0xa8e/0x14e0 kernel/workqueue.c:3397
> > > >  worker_thread+0xa47/0xfb0 kernel/workqueue.c:3478
> > > >  kthread+0x389/0x470 kernel/kthread.c:436
> > > >  ret_from_fork+0x514/0xb70 arch/x86/kernel/process.c:158
> > > >  ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
> > > >
> > > > The buggy address belongs to the object at ffff88804ec4c600
> > > >  which belongs to the cache kmalloc-192 of size 192
> > > > The buggy address is located 128 bytes inside of
> > > >  freed 192-byte region [ffff88804ec4c600, ffff88804ec4c6c0)
> > > >
> > > > The buggy address belongs to the physical page:
> > > > page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x4ec4c
> > > > flags: 0xfff00000000000(node=0|zone=1|lastcpupid=0x7ff)
> > > > page_type: f5(slab)
> > > > raw: 00fff00000000000 ffff88813fe163c0 dead000000000100 dead000000000122
> > > > raw: 0000000000000000 0000000800100010 00000000f5000000 0000000000000000
> > > > page dumped because: kasan: bad access detected
> > > > page_owner tracks the page as allocated
> > > > page last allocated via order 0, migratetype Unmovable, gfp_mask 0xd2cc0(GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 13856, tgid 13853 (syz.3.2144), ts 351172300879, free_ts 351133053454
> > > >  set_page_owner include/linux/page_owner.h:32 [inline]
> > > >  post_alloc_hook+0x22d/0x280 mm/page_alloc.c:1853
> > > >  prep_new_page mm/page_alloc.c:1861 [inline]
> > > >  get_page_from_freelist+0x24ae/0x2530 mm/page_alloc.c:3941
> > > >  __alloc_frozen_pages_noprof+0x18d/0x380 mm/page_alloc.c:5221
> > > >  alloc_slab_page mm/slub.c:3278 [inline]
> > > >  allocate_slab+0x77/0x660 mm/slub.c:3467
> > > >  new_slab mm/slub.c:3525 [inline]
> > > >  refill_objects+0x336/0x3d0 mm/slub.c:7272
> > > >  refill_sheaf mm/slub.c:2816 [inline]
> > > >  __pcs_replace_empty_main+0x320/0x720 mm/slub.c:4652
> > > >  alloc_from_pcs mm/slub.c:4750 [inline]
> > > >  slab_alloc_node mm/slub.c:4884 [inline]
> > > >  __do_kmalloc_node mm/slub.c:5295 [inline]
> > > >  __kmalloc_noprof+0x464/0x750 mm/slub.c:5308
> > > >  kmalloc_noprof include/linux/slab.h:954 [inline]
> > > >  kzalloc_noprof include/linux/slab.h:1188 [inline]
> > > >  new_dir fs/proc/proc_sysctl.c:966 [inline]
> > > >  get_subdir fs/proc/proc_sysctl.c:1010 [inline]
> > > >  sysctl_mkdir_p fs/proc/proc_sysctl.c:1320 [inline]
> > > >  __register_sysctl_table+0xc02/0x1370 fs/proc/proc_sysctl.c:1395
> > > >  neigh_sysctl_register+0x9b1/0xa90 net/core/neighbour.c:3915
> > > >  addrconf_sysctl_register+0xb3/0x1c0 net/ipv6/addrconf.c:7396
> > > >  ipv6_add_dev+0xd26/0x13a0 net/ipv6/addrconf.c:460
> > > >  addrconf_notify+0x771/0x1050 net/ipv6/addrconf.c:3679
> > > >  notifier_call_chain+0x1a5/0x3d0 kernel/notifier.c:85
> > > >  call_netdevice_notifiers_extack net/core/dev.c:2288 [inline]
> > > >  call_netdevice_notifiers net/core/dev.c:2302 [inline]
> > > >  register_netdevice+0x18db/0x1f00 net/core/dev.c:11474
> > > >  macsec_newlink+0x706/0x1200 drivers/net/macsec.c:4218
> > > >  rtnl_newlink_create+0x310/0xb00 net/core/rtnetlink.c:3905
> > > > page last free pid 12657 tgid 12657 stack trace:
> > > >  reset_page_owner include/linux/page_owner.h:25 [inline]
> > > >  __free_pages_prepare mm/page_alloc.c:1397 [inline]
> > > >  __free_frozen_pages+0xc0d/0xd20 mm/page_alloc.c:2938
> > > >  __tlb_remove_table_free mm/mmu_gather.c:228 [inline]
> > > >  tlb_remove_table_rcu+0x85/0x100 mm/mmu_gather.c:291
> > > >  rcu_do_batch kernel/rcu/tree.c:2617 [inline]
> > > >  rcu_core+0x78b/0x10a0 kernel/rcu/tree.c:2869
> > > >  handle_softirqs+0x225/0x840 kernel/softirq.c:622
> > > >  __do_softirq kernel/softirq.c:656 [inline]
> > > >  invoke_softirq kernel/softirq.c:496 [inline]
> > > >  __irq_exit_rcu+0xca/0x220 kernel/softirq.c:735
> > > >  irq_exit_rcu+0x9/0x30 kernel/softirq.c:752
> > > >  instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1061 [inline]
> > > >  sysvec_apic_timer_interrupt+0xa6/0xc0 arch/x86/kernel/apic/apic.c:1061
> > > >  asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:697
> > > >
> > > > Memory state around the buggy address:
> > > >  ffff88804ec4c580: 00 00 00 fc fc fc fc fc fc fc fc fc fc fc fc fc
> > > >  ffff88804ec4c600: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> > > > >ffff88804ec4c680: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
> > > >                    ^
> > > >  ffff88804ec4c700: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > >  ffff88804ec4c780: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc
> > > > ==================================================================
> > > >
> > > >
> > > > ---
> > > > This report is generated by a bot. It may contain errors.
> > > > See https://goo.gl/tpsmEJ for more information about syzbot.
> > > > syzbot engineers can be reached at syzkaller@googlegroups.com.
> > > >
> > > > syzbot will keep track of this issue. See:
> > > > https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
> > > >
> > > > If the report is already addressed, let syzbot know by replying with:
> > > > #syz fix: exact-commit-title
> > > >
> > > > If you want to overwrite report's subsystems, reply with:
> > > > #syz set subsystems: new-subsystem
> > > > (See the list of subsystem names on the web dashboard)
> > > >
> > > > If the report is a duplicate of another one, reply with:
> > > > #syz dup: exact-subject-of-another-report
> > > >
> > > > If you want to undo deduplication, reply with:
> > > > #syz undup
> 

* [PATCH net v2] tipc: free bearer discoverer via RCU to fix tipc_disc_rcv UAF
From: Samuel Page @ 2026-06-16 17:53 UTC (permalink / raw)
  To: Jon Maloy
  Cc: Tung Quang Nguyen, David S . Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, netdev, tipc-discussion, linux-kernel,
	Samuel Page, stable

bearer_disable() tears down a bearer's discovery object with
tipc_disc_delete(), which frees the struct tipc_discoverer with a plain,
synchronous kfree(). The discovery receive path, however, still reads
that object under RCU in softirq context:

  tipc_udp_recv()            // udp_media.c, rcu_dereference(ub->bearer)
    -> tipc_rcv()            // node.c
      -> tipc_disc_rcv()     // discover.c
        -> tipc_disc_addr_trial_msg(b->disc, ...)  // reads d->net etc.

tipc_udp_recv() only gates this path on test_bit(0, &b->up), which is a
TOCTOU check: an RX softirq that observes b->up == 1 before
bearer_disable() does clear_bit_unlock(0, &b->up) can still be executing
inside tipc_disc_rcv() when bearer_disable() reaches

	if (b->disc)
		tipc_disc_delete(b->disc);

and kfree()s the discoverer. The reader then dereferences freed memory
(d->net, inlined into tipc_disc_rcv()) in softirq context [0].

The bearer itself is freed RCU-safely (tipc_bearer_put() ->
kfree_rcu(b, rcu)) because the RX path runs under RCU, but the discoverer
hanging off b->disc is freed synchronously. The same b->disc is also
touched under rcu_read_lock() by
tipc_disc_add_dest()/tipc_disc_remove_dest().

Free the discoverer with the same RCU lifetime as its bearer. Add an
rcu_head to struct tipc_discoverer and defer the kfree_skb()/kfree() to
an RCU callback so any in-flight reader that already loaded b->disc
completes before the memory is released. The timer is still shut down
synchronously up front with timer_shutdown_sync() (which can sleep and
must not run from the RCU callback), and shutting it down before the
grace period prevents the periodic LINK_REQUEST timer from rearming or
re-entering the object.

This mirrors the existing TIPC pattern of pairing call_rcu() with a
cleanup callback (see tipc_node_free()/tipc_aead_free()).

[0]: (trailing page/memory-state dump trimmed)
BUG: KASAN: slab-use-after-free in tipc_disc_addr_trial_msg net/tipc/discover.c:149 [inline]
BUG: KASAN: slab-use-after-free in tipc_disc_rcv+0xe7c/0x103c net/tipc/discover.c:236
Read of size 8 at addr ffff000028f07428 by task ksoftirqd/0/15

CPU: 0 UID: 0 PID: 15 Comm: ksoftirqd/0 Not tainted 7.0.11 #3 PREEMPT
Hardware name: linux,dummy-virt (DT)
Call trace:
 show_stack+0x2c/0x3c arch/arm64/kernel/stacktrace.c:499 (C)
 __dump_stack lib/dump_stack.c:94 [inline]
 dump_stack_lvl+0xb4/0xd4 lib/dump_stack.c:120
 print_address_description mm/kasan/report.c:378 [inline]
 print_report+0x118/0x5d8 mm/kasan/report.c:482
 kasan_report+0xb0/0xf4 mm/kasan/report.c:595
 __asan_report_load8_noabort+0x20/0x2c mm/kasan/report_generic.c:381
 tipc_disc_addr_trial_msg net/tipc/discover.c:149 [inline]
 tipc_disc_rcv+0xe7c/0x103c net/tipc/discover.c:236
 tipc_rcv+0x1884/0x2b1c net/tipc/node.c:2126
 tipc_udp_recv+0x22c/0x684 net/tipc/udp_media.c:393
 udp_queue_rcv_one_skb+0x898/0x1798 net/ipv4/udp.c:2441
 udp_queue_rcv_skb+0x1b0/0xa44 net/ipv4/udp.c:2518
 udp_unicast_rcv_skb+0x13c/0x348 net/ipv4/udp.c:2678
 __udp4_lib_rcv+0x1aec/0x246c net/ipv4/udp.c:2754
 udp_rcv+0x78/0xa0 net/ipv4/udp.c:2936
 ip_protocol_deliver_rcu+0x68/0x410 net/ipv4/ip_input.c:207
 ip_local_deliver_finish+0x28c/0x4b4 net/ipv4/ip_input.c:241
 NF_HOOK include/linux/netfilter.h:318 [inline]
 NF_HOOK include/linux/netfilter.h:312 [inline]
 ip_local_deliver+0x29c/0x2ec net/ipv4/ip_input.c:262
 dst_input include/net/dst.h:480 [inline]
 ip_rcv_finish net/ipv4/ip_input.c:453 [inline]
 ip_rcv_finish net/ipv4/ip_input.c:439 [inline]
 NF_HOOK include/linux/netfilter.h:318 [inline]
 NF_HOOK include/linux/netfilter.h:312 [inline]
 ip_rcv+0x21c/0x258 net/ipv4/ip_input.c:573
 __netif_receive_skb_one_core+0x110/0x184 net/core/dev.c:6195
 __netif_receive_skb+0x2c/0x170 net/core/dev.c:6308
 process_backlog+0x178/0x488 net/core/dev.c:6659
 __napi_poll+0xa8/0x540 net/core/dev.c:7726
 napi_poll net/core/dev.c:7789 [inline]
 net_rx_action+0x360/0x964 net/core/dev.c:7946
 handle_softirqs+0x2f0/0x7b0 kernel/softirq.c:622
 run_ksoftirqd kernel/softirq.c:1063 [inline]
 run_ksoftirqd+0x6c/0x88 kernel/softirq.c:1055
 smpboot_thread_fn+0x65c/0x958 kernel/smpboot.c:160
 kthread+0x39c/0x444 kernel/kthread.c:436
 ret_from_fork+0x10/0x20 arch/arm64/kernel/entry.S:860

Allocated by task 68873:
 kasan_save_stack+0x3c/0x64 mm/kasan/common.c:57
 kasan_save_track+0x20/0x3c mm/kasan/common.c:78
 kasan_save_alloc_info+0x40/0x54 mm/kasan/generic.c:570
 poison_kmalloc_redzone mm/kasan/common.c:398 [inline]
 __kasan_kmalloc+0xd4/0xd8 mm/kasan/common.c:415
 kasan_kmalloc include/linux/kasan.h:263 [inline]
 __kmalloc_cache_noprof+0x1b0/0x458 mm/slub.c:5385
 kmalloc_noprof include/linux/slab.h:950 [inline]
 tipc_disc_create+0xdc/0x5e0 net/tipc/discover.c:356
 tipc_enable_bearer+0x8b8/0xf94 net/tipc/bearer.c:348
 __tipc_nl_bearer_enable+0x2a8/0x398 net/tipc/bearer.c:1047
 tipc_nl_bearer_enable+0x2c/0x48 net/tipc/bearer.c:1056
 genl_family_rcv_msg_doit+0x1e4/0x2c0 net/netlink/genetlink.c:1114
 genl_family_rcv_msg net/netlink/genetlink.c:1194 [inline]
 genl_rcv_msg+0x4e8/0x750 net/netlink/genetlink.c:1209
 netlink_rcv_skb+0x204/0x3cc net/netlink/af_netlink.c:2550
 genl_rcv+0x3c/0x54 net/netlink/genetlink.c:1218
 netlink_unicast_kernel net/netlink/af_netlink.c:1318 [inline]
 netlink_unicast+0x638/0x930 net/netlink/af_netlink.c:1344
 netlink_sendmsg+0x798/0xc68 net/netlink/af_netlink.c:1894
 sock_sendmsg_nosec net/socket.c:727 [inline]
 __sock_sendmsg+0xe0/0x128 net/socket.c:742
 __sys_sendto+0x230/0x2f4 net/socket.c:2206
 __do_sys_sendto net/socket.c:2213 [inline]
 __se_sys_sendto net/socket.c:2209 [inline]
 __arm64_sys_sendto+0xc4/0x13c net/socket.c:2209
 __invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
 invoke_syscall+0x84/0x2a8 arch/arm64/kernel/syscall.c:49
 el0_svc_common.constprop.0+0xe4/0x294 arch/arm64/kernel/syscall.c:132
 do_el0_svc+0x44/0x5c arch/arm64/kernel/syscall.c:151
 el0_svc+0x38/0xac arch/arm64/kernel/entry-common.c:724
 el0t_64_sync_handler+0xa0/0xe4 arch/arm64/kernel/entry-common.c:743
 el0t_64_sync+0x198/0x19c arch/arm64/kernel/entry.S:596

Freed by task 60072:
 kasan_save_stack+0x3c/0x64 mm/kasan/common.c:57
 kasan_save_track+0x20/0x3c mm/kasan/common.c:78
 kasan_save_free_info+0x4c/0x74 mm/kasan/generic.c:584
 poison_slab_object mm/kasan/common.c:253 [inline]
 __kasan_slab_free+0x88/0xb8 mm/kasan/common.c:285
 kasan_slab_free include/linux/kasan.h:235 [inline]
 slab_free_hook mm/slub.c:2685 [inline]
 slab_free mm/slub.c:6170 [inline]
 kfree+0x14c/0x458 mm/slub.c:6488
 tipc_disc_delete+0x50/0x68 net/tipc/discover.c:393
 bearer_disable+0x18c/0x278 net/tipc/bearer.c:418
 tipc_bearer_stop+0xe0/0x198 net/tipc/bearer.c:757
 tipc_net_stop+0x110/0x178 net/tipc/net.c:159
 tipc_exit_net+0x80/0x19c net/tipc/core.c:112
 ops_exit_list net/core/net_namespace.c:199 [inline]
 ops_undo_list+0x244/0x694 net/core/net_namespace.c:252
 cleanup_net+0x3a0/0x830 net/core/net_namespace.c:702
 process_one_work+0x628/0xd38 kernel/workqueue.c:3289
 process_scheduled_works kernel/workqueue.c:3372 [inline]
 worker_thread+0x7a8/0xac0 kernel/workqueue.c:3453
 kthread+0x39c/0x444 kernel/kthread.c:436
 ret_from_fork+0x10/0x20 arch/arm64/kernel/entry.S:860

Fixes: 25b0b9c4e835 ("tipc: handle collisions of 32-bit node address hash values")
Cc: stable@vger.kernel.org
Assisted-by: Bynario AI
Signed-off-by: Samuel Page <sam@bynar.io>
---
v2:
 - Wrap the over-80-column container_of() line in tipc_disc_free_rcu()
   to fix the coding-style issue raised in review.

v1: https://lore.kernel.org/netdev/20260615144233.1730935-1-sam@bynar.io/
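
For readers less familiar with the pattern, here is a minimal userspace sketch of
the deferred-free idea used by this patch. It is only a model: the mock_* names
are hypothetical stand-ins, the "grace period" is a plain callback list that the
caller drains by hand, and none of it is the kernel's RCU implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-in for struct rcu_head; NOT the kernel type. */
struct mock_rcu_head {
	struct mock_rcu_head *next;
	void (*func)(struct mock_rcu_head *head);
};

/* Recover the enclosing object from a pointer to its embedded member. */
#define mock_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static struct mock_rcu_head *pending;	/* callbacks waiting for the grace period */
static int freed_count;			/* number of objects actually freed */

/* Queue a callback instead of freeing right away (models call_rcu()). */
static void mock_call_rcu(struct mock_rcu_head *head,
			  void (*func)(struct mock_rcu_head *head))
{
	assert(head && func);
	head->func = func;
	head->next = pending;
	pending = head;
}

/* Run every queued callback. This models the end of a grace period, i.e.
 * the point after which no earlier reader can still hold the pointer.
 */
static void mock_grace_period_end(void)
{
	while (pending) {
		struct mock_rcu_head *head = pending;

		pending = head->next;
		head->func(head);
	}
}

/* Hypothetical, reduced version of the discoverer object. */
struct mock_discoverer {
	struct mock_rcu_head rcu;
	int bearer_id;
};

/* Callback: free the object once the grace period has ended. */
static void mock_disc_free(struct mock_rcu_head *rp)
{
	struct mock_discoverer *d;

	d = mock_container_of(rp, struct mock_discoverer, rcu);
	free(d);
	freed_count++;
}

/* Delete path: defer the free rather than calling free() directly. */
static void mock_disc_delete(struct mock_discoverer *d)
{
	mock_call_rcu(&d->rcu, mock_disc_free);
}
```

In this model the object stays readable between mock_disc_delete() and
mock_grace_period_end(), which is the window a concurrent tipc_disc_rcv()
reader relies on in the real code.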

 net/tipc/discover.c | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/net/tipc/discover.c b/net/tipc/discover.c
index 3e54d2df5683..761b625bba5a 100644
--- a/net/tipc/discover.c
+++ b/net/tipc/discover.c
@@ -49,6 +49,7 @@
 
 /**
  * struct tipc_discoverer - information about an ongoing link setup request
+ * @rcu: RCU head used to free the structure after a grace period
  * @bearer_id: identity of bearer issuing requests
  * @net: network namespace instance
  * @dest: destination address for request messages
@@ -60,6 +61,7 @@
  * @timer_intv: current interval between requests (in ms)
  */
 struct tipc_discoverer {
+	struct rcu_head rcu;
 	u32 bearer_id;
 	struct tipc_media_addr dest;
 	struct net *net;
@@ -382,6 +384,18 @@ int tipc_disc_create(struct net *net, struct tipc_bearer *b,
 	return 0;
 }
 
+/* RCU callback: free the discoverer only after any concurrent
+ * tipc_disc_rcv() softirq reader of bearer->disc has finished.
+ */
+static void tipc_disc_free_rcu(struct rcu_head *rp)
+{
+	struct tipc_discoverer *d;
+
+	d = container_of(rp, struct tipc_discoverer, rcu);
+	kfree_skb(d->skb);
+	kfree(d);
+}
+
 /**
  * tipc_disc_delete - destroy object sending periodic link setup requests
  * @d: ptr to link dest structure
@@ -389,8 +403,7 @@ int tipc_disc_create(struct net *net, struct tipc_bearer *b,
 void tipc_disc_delete(struct tipc_discoverer *d)
 {
 	timer_shutdown_sync(&d->timer);
-	kfree_skb(d->skb);
-	kfree(d);
+	call_rcu(&d->rcu, tipc_disc_free_rcu);
 }
 
 /**

base-commit: 47186409c092cd7dd70350999186c700233e854d
-- 
2.54.0


^ permalink raw reply related

* [PATCH net] net: thunderbolt: Fix frags[] overflow by bounding frame_count
From: Maoyi Xie @ 2026-06-16 17:38 UTC (permalink / raw)
  To: Mika Westerberg, Yehezkel Bernat, Andrew Lunn, Jakub Kicinski,
	Paolo Abeni
  Cc: David S. Miller, Eric Dumazet, netdev, linux-kernel

tbnet_poll() assembles a multi-frame ThunderboltIP packet into one skb. The
first frame goes into the skb linear area and every further frame is added as
a page fragment.

	skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags,
			page, hdr_size, frame_size,
			TBNET_RX_PAGE_SIZE - hdr_size);

A packet of frame_count frames therefore ends up with frame_count - 1
fragments. tbnet_check_frame() only bounds the peer-supplied frame_count to
TBNET_RING_SIZE / 4 (64), which is far above MAX_SKB_FRAGS (17 by default). A
peer that sends a packet of 19 or more small frames pushes nr_frags past
MAX_SKB_FRAGS, so skb_add_rx_frag() writes past skb_shinfo()->frags[] and
corrupts memory after the shared info.

Tighten the start-of-packet bound to MAX_SKB_FRAGS + 1 so a packet can never
produce more fragments than frags[] can hold. This matches the recent skb
frags overflow fixes in other receive paths, for example f0813bcd2d9d ("net:
wwan: t7xx: fix potential skb->frags overflow in RX path") and 600dc40554dc
("net: usb: cdc-phonet: fix skb frags[] overflow in rx_complete()").

Fixes: e69b6c02b4c3 ("net: Add support for networking over Thunderbolt cable")
Cc: stable@vger.kernel.org
Signed-off-by: Maoyi Xie <maoyixie.tju@gmail.com>
---
Mika preferred the bound in tbnet_check_frame() over the nr_frags <
MAX_SKB_FRAGS guard in tbnet_poll() that I first floated on the list, so this
rejects the oversized packet up front. Reproduced under KASAN with a harness
that mirrors the per-frame skb_add_rx_frag() loop.
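
The arithmetic behind the new bound can be checked with a small standalone
sketch. This is not the KASAN harness mentioned above; it only models the frame
counting, and it assumes the values quoted in the commit message (MAX_SKB_FRAGS
of 17, old limit of 64). The helper names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed values: the default MAX_SKB_FRAGS and TBNET_RING_SIZE / 4 as
 * quoted in the commit message; both are configuration dependent.
 */
#define MAX_SKB_FRAGS		17
#define OLD_FRAME_COUNT_LIMIT	64

/* Fragments needed for a packet of frame_count frames: the first frame
 * goes into the linear area, every further frame takes one frags[] slot.
 */
static unsigned int frags_needed(unsigned int frame_count)
{
	assert(frame_count > 0);
	return frame_count - 1;
}

/* Start-of-packet check before the patch. */
static bool old_check(unsigned int frame_count)
{
	return frame_count != 0 && frame_count <= OLD_FRAME_COUNT_LIMIT;
}

/* Start-of-packet check after the patch. */
static bool new_check(unsigned int frame_count)
{
	return frame_count != 0 && frame_count <= MAX_SKB_FRAGS + 1;
}

/* True if the packet would need more slots than frags[] provides. */
static bool overflows(unsigned int frame_count)
{
	return frags_needed(frame_count) > MAX_SKB_FRAGS;
}
```

With these definitions a 19-frame packet passes old_check() and overflows,
while new_check() accepts at most 18 frames, which need exactly MAX_SKB_FRAGS
fragments.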

 drivers/net/thunderbolt/main.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/net/thunderbolt/main.c b/drivers/net/thunderbolt/main.c
index 7aae5d915a1e..ac016890646c 100644
--- a/drivers/net/thunderbolt/main.c
+++ b/drivers/net/thunderbolt/main.c
@@ -787,8 +787,12 @@ static bool tbnet_check_frame(struct tbnet *net, const struct tbnet_frame *tf,
 		return true;
 	}
 
-	/* Start of packet, validate the frame header */
-	if (frame_count == 0 || frame_count > TBNET_RING_SIZE / 4) {
+	/* Start of packet, validate the frame header. tbnet_poll() puts the
+	 * first frame in the skb linear area and every further frame in a page
+	 * fragment, so a packet may not span more than MAX_SKB_FRAGS + 1 frames
+	 * without overflowing skb_shinfo()->frags[].
+	 */
+	if (frame_count == 0 || frame_count > MAX_SKB_FRAGS + 1) {
 		net->stats.rx_length_errors++;
 		return false;
 	}
-- 
2.34.1


^ permalink raw reply related

* Re: [PATCH net-next V3 2/7] netdevsim: Register devlink after device init
From: Mark Bloch @ 2026-06-16 17:29 UTC (permalink / raw)
  To: Jakub Kicinski, Jiri Pirko
  Cc: Eric Dumazet, Paolo Abeni, Andrew Lunn, David S. Miller,
	Jonathan Corbet, Shuah Khan, Jiri Pirko, Simon Horman,
	Sunil Goutham, Linu Cherian, Geetha sowjanya, hariprasad,
	Subbaraya Sundeep, Bharat Bhushan, Saeed Mahameed,
	Leon Romanovsky, Tariq Toukan, Ethan Nelson-Moore, linux-doc,
	netdev, linux-rdma
In-Reply-To: <f266dfa5-0c6c-4be0-b73e-b2185dadd6a7@nvidia.com>



On 11/06/2026 20:43, Mark Bloch wrote:
> 
> 
> On 11/06/2026 18:54, Jakub Kicinski wrote:
>> On Thu, 11 Jun 2026 09:02:03 +0300 Mark Bloch wrote:
>>> On 11/06/2026 2:50, Jakub Kicinski wrote:
>>>> On Fri, 5 Jun 2026 21:10:25 +0300 Mark Bloch wrote:  
>>>>> devl_register() makes the devlink instance visible to userspace. A later
>>>>> patch also makes registration the point where devlink core may call
>>>>> eswitch_mode_set() to apply a boot-time default eswitch mode.
>>>>>
>>>>> Move netdevsim registration after all objects (resources, params, regions,
>>>>> traps, debugfs etc) are initialized, and after the initial eswitch mode is
>>>>> set to legacy.
>>>>>
>>>>> Move devl_unregister() to the beginning of nsim_drv_remove(), before those
>>>>> devlink objects are torn down. This keeps devlink register/unregister as
>>>>> the notification barrier and makes the later object teardown paths run
>>>>> after devlink is no longer registered, so they do not emit their own
>>>>> netlink DEL notifications.  
>>>>
>>>> This is going backwards. At some point someone from nVidia thought that
>>>> we can order our way out of locking, so mlx5 is likely ordered this way,
>>>> but this must not be required, or in any way normalized.
>>>> We (syzbot) quickly discovered that it doesn't cover all corner cases.
>>>> devl_lock() is exposed specifically to allow the driver to finish
>>>> whatever init it needs without letting user space invoke callbacks, yet.
>>>> Almost (?) all driver callbacks hold devl_lock(), so maybe the devlink
>>>> instance is "visible" to user space but that should not matter.  
>>>
>>> Let me clarify.
>>>
>>> No locking is changed here, and I don't want to make register/unregister
>>> ordering a substitute for devl_lock().
>>>
>>> The only requirement I have for this series is that devl_register() is called
>>> only once the driver is ready for devlink core to call eswitch_mode_set().
>>> That follows from the earlier direction to have the core apply the default
>>> mode from devl_register() instead of adding an explicit driver call.
>>
>> This is exactly what I'm objecting to. AFAIU we are trading off
>> explicit call to get the default value for an implicit behavior
>> depending on order of calls. We want to optimize for how easy it
>> is to get the API wrong, not for LoC.
> 
> Right, the reason I moved in this direction is that in v1 I had
> the explicit driver call, and Jiri asked to make this transparent
> from devlink core instead.
> 
>>
>> If we don't have a clean way to implement this without driver
>> changes let's add the explicit API to get the default value.
>> If driver doesn't call it schedule a work to go via the callback
>> once devl_lock() is dropped. That way drivers which care can optimize
>> themselves by reading the default value upfront. Drivers which don't 
>> care will work correctly, and there's no API call order trap.
> 
> The workqueue fallback is possible, but I think it makes the semantics
> more complicated.
> 
> We would need to track devlink instances which still need the default
> applied, and the worker would have to skip/remove them once handled.
> 
> More importantly, the worker can race with userspace setting the
> eswitch mode, so we would also need some state to tell whether the user
> already changed the mode. That feels more fragile than an explicit
> driver call.
> 
>>
>> Not ideal, but isn't that the best we can do here?
>> I still have flashbacks of the fallout from the call ordering games, 
>> we have too many drivers to keep this straight...
> 
> That's why I started with the explicit call in the first place.
> 
> I can switch back to this model: drivers which support boot time eswitch
> defaults will opt in and call the helper once they are ready. This keeps
> the support explicit per driver and avoids making it depend on where
> devl_register() happens in the init path.
> 
> With that, devlink can tell at register time whether the instance supports
> boot time eswitch defaults. If the user configured a default for an instance
> whose driver did not opt in, devlink can write to dmesg from
> devl_register().
> 
> Not perfect, but at least the user gets a visible failure instead of the
> config being silently ignored.
> 
> Mark

Jakub, Jiri, any thoughts?

I think the explicit helper is the cleanest option here, without any
workqueue fallback inside devlink. It avoids depending on devl_register()
ordering, and makes the support explicit per driver.

Does that sound like an acceptable direction?

Mark

> 
>>
>>> So if the objection is to the commit message wording, I can fix that and drop
>>> the "notification barrier" language.
>>>
>>> For unregister, I can probably leave the old ordering as-is. I moved it only
>>> to mirror the register path, which felt cleaner, but it is not required for
>>> the default-mode change and as the lock is held I see no issue with doing
>>> that.
> 
> 


^ permalink raw reply

* Re: [PATCH bpf v2 1/2] bpf: Fix partial copy of non-linear test_run output
From: sun jian @ 2026-06-16 17:16 UTC (permalink / raw)
  To: Paul Chaignon
  Cc: bpf, netdev, linux-kselftest, linux-kernel, ast, daniel, andrii,
	martin.lau, eddyz87, memxor, song, yonghong.song, jolsa, davem,
	edumazet, kuba, pabeni, horms, shuah, hawk, john.fastabend, sdf,
	toke, lorenzo
In-Reply-To: <ajFQvedGURQuKqbX@mail.gmail.com>

On Tue, Jun 16, 2026 at 9:33 PM Paul Chaignon <paul.chaignon@gmail.com> wrote:
>
> On Tue, Jun 16, 2026 at 05:31:02PM +0800, Sun Jian wrote:
> > For non-linear test_run output, bpf_test_finish() derives the linear
> > data copy length from copy_size - frag_size. This only matches the
> > linear data length when copy_size is the full packet size.
> >
> > When userspace provides a short data_out buffer, copy_size is clamped to
> > that buffer size. If copy_size is smaller than frag_size, the computed
> > length becomes negative and bpf_test_finish() returns -ENOSPC before
> > copying the packet prefix or updating data_size_out.
> >
> > Compute the linear data length from the packet layout instead, and clamp
> > the linear copy length to copy_size. This preserves the expected
> > partial-copy semantics: return -ENOSPC, copy the packet prefix that fits
> > in data_out, and report the full packet length through data_size_out.
> >
> > Fixes: 7855e0db150ad ("bpf: test_run: add xdp_shared_info pointer in bpf_test_finish signature")
> > Signed-off-by: Sun Jian <sun.jian.kdev@gmail.com>
> > ---
> >  net/bpf/test_run.c | 11 ++++-------
> >  1 file changed, 4 insertions(+), 7 deletions(-)
> >
> > diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
> > index 2bc04feadfab..976e8fa31bc9 100644
> > --- a/net/bpf/test_run.c
> > +++ b/net/bpf/test_run.c
> > @@ -453,19 +453,16 @@ static int bpf_test_finish(const union bpf_attr *kattr,
> >       }
> >
> >       if (data_out) {
> > -             int len = sinfo ? copy_size - frag_size : copy_size;
> > -
> > -             if (len < 0) {
> > -                     err = -ENOSPC;
> > -                     goto out;
> > -             }
> > +             u32 head_len = size - frag_size;
> > +             u32 len = min(copy_size, head_len);
> >
> >               if (copy_to_user(data_out, data, len))
> >                       goto out;
> >
> >               if (sinfo) {
> > -                     int i, offset = len;
> > +                     u32 offset = len;
> >                       u32 data_len;
> > +                     int i;
>
> That doesn't look needed.
>
> >
> >                       for (i = 0; i < sinfo->nr_frags; i++) {
> >                               skb_frag_t *frag = &sinfo->frags[i];
> > --
> > 2.43.0
> >

Hi Paul,

Thanks for taking another look.

Agreed, I'll keep the fix patch minimal and leave offset as-is.

For the selftest patch, I'll try to reuse pkt_v4 and the existing TC
program where possible, and keep only the minimal XDP frags program for the
XDP case.
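
To make the intended partial-copy semantics from the quoted commit message
concrete, here is a rough userspace model of the length computation. It is a
sketch only: struct copy_plan and plan_copy() are hypothetical names, the
"zero hint means no clamp" rule is my reading of the existing code, and nothing
here touches user memory the way bpf_test_finish() does.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical summary of what one test_run output copy would do. */
struct copy_plan {
	uint32_t head_copy;	/* bytes copied from the linear area */
	uint32_t frag_copy;	/* bytes copied from the fragments */
	uint32_t size_out;	/* value reported through data_size_out */
	int err;		/* 0 or -ENOSPC */
};

static uint32_t min_u32(uint32_t a, uint32_t b)
{
	return a < b ? a : b;
}

/* size: full packet length, frag_size: bytes held in fragments,
 * out_hint: size of the user buffer (0 is assumed to mean "no hint").
 */
static struct copy_plan plan_copy(uint32_t size, uint32_t frag_size,
				  uint32_t out_hint)
{
	struct copy_plan p = {0};
	uint32_t copy_size = size;
	uint32_t head_len;

	assert(frag_size <= size);

	/* A short buffer clamps the copy and flags -ENOSPC. */
	if (out_hint && copy_size > out_hint) {
		copy_size = out_hint;
		p.err = -ENOSPC;
	}

	/* Linear length comes from the packet layout, not from copy_size. */
	head_len = size - frag_size;
	p.head_copy = min_u32(copy_size, head_len);
	p.frag_copy = copy_size - p.head_copy;
	p.size_out = size;	/* always report the full packet length */
	return p;
}
```

For a 100-byte packet with 80 bytes in fragments and a 10-byte buffer, the old
expression copy_size - frag_size would be negative, whereas this model copies
the 10-byte prefix, reports 100, and still returns -ENOSPC.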

Thanks,
Sun Jian

^ permalink raw reply

* Re: [syzbot] [net?] KASAN: slab-use-after-free Read in fib_rules_lookup
From: Kuniyuki Iwashima @ 2026-06-16 17:06 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Ido Schimmel, syzbot, davem, dsahern, horms, kuba, linux-kernel,
	netdev, pabeni, syzkaller-bugs
In-Reply-To: <CANn89iJ7S1op9FJeaEqdR0KDiPu08PbFP7CqJ8NLVRgcPt370A@mail.gmail.com>

On Tue, Jun 16, 2026 at 8:55 AM Eric Dumazet <edumazet@google.com> wrote:
>
> On Tue, Jun 16, 2026 at 8:31 AM Ido Schimmel <idosch@nvidia.com> wrote:
> >
> > On Tue, Jun 16, 2026 at 07:05:24AM -0700, syzbot wrote:
> > > Hello,
> > >
> > > syzbot found the following issue on:
> > >
> > > HEAD commit:    72dfa4700f78 net: dsa: sja1105: fix lastused timestamp in ..
> >
> > This includes commit 759923cf03b0 ("ipv4: fib: Convert
> > fib_net_exit_batch() to ->exit_rtnl().") that moved ip_fib_net_exit()
> > (and therefore fib4_rules_exit()) earlier in the netns dismantle path.
> >
> > Kuniyuki, can you please take a look?
> >
> > You can use this to reproduce:
> >
> > #!/bin/bash
> >
> > while true; do
> >         ip netns add ns1
> >         ip -n ns1 link set dev lo up
> >         ip -n ns1 address add 192.0.2.1/24 dev lo
> >         ip -n ns1 link add name dummy1 up type dummy
> >         ip -n ns1 address add 198.51.100.1/24 dev dummy1
> >         ip -n ns1 rule add ipproto tcp sport 12345 table 12345
> >         ip -n ns1 fou add port 5555 ipproto 47 local 192.0.2.1 peer 198.51.100.2 peer_port 54321
> >         ip netns del ns1
> > done
> >
>
> Oh right.
>
> While looking at this syzbot report I also found an old issue.
>
> https://lore.kernel.org/netdev/20260616141317.407791-1-edumazet@google.com/T/#u
>
> I guess adding some delays in enqueue_to_backlog() could trigger a
> similar bug even if we revert Kuniyuki's patch.

I'll look into it, thank you both!

>
>
>
>
> > Thanks
> >
> > > git tree:       net-next
> > > console output: https://syzkaller.appspot.com/x/log.txt?x=15794bd2580000
> > > kernel config:  https://syzkaller.appspot.com/x/.config?x=a0842261b62cdea8
> > > dashboard link: https://syzkaller.appspot.com/bug?extid=965506b59a2de0b6905c
> > > compiler:       Debian clang version 22.1.6 (++20260514074242+fc4aad7b5db3-1~exp1~20260514074407.73), Debian LLD 22.1.6
> > >
> > > Unfortunately, I don't have any reproducer for this issue yet.
> > >
> > > Downloadable assets:
> > > disk image: https://storage.googleapis.com/syzbot-assets/d4e16f50a97c/disk-72dfa470.raw.xz
> > > vmlinux: https://storage.googleapis.com/syzbot-assets/6cd4a736e796/vmlinux-72dfa470.xz
> > > kernel image: https://storage.googleapis.com/syzbot-assets/548b0011c8e8/bzImage-72dfa470.xz
> > >
> > > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > > Reported-by: syzbot+965506b59a2de0b6905c@syzkaller.appspotmail.com
> > >
> > > bond0 (unregistering): Released all slaves
> > > bond1 (unregistering): Released all slaves
> > > bond2 (unregistering): (slave dummy0): Releasing active interface
> > > bond2 (unregistering): Released all slaves
> > > ==================================================================
> > > BUG: KASAN: slab-use-after-free in fib_rules_lookup+0x15e/0xeb0 net/core/fib_rules.c:321
> > > Read of size 8 at addr ffff88804ec4c680 by task kworker/u8:21/12641
> > >
> > > CPU: 0 UID: 0 PID: 12641 Comm: kworker/u8:21 Not tainted syzkaller #0 PREEMPT(full)
> > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/09/2026
> > > Workqueue: netns cleanup_net
> > > Call Trace:
> > >  <TASK>
> > >  dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120
> > >  print_address_description+0x55/0x1e0 mm/kasan/report.c:378
> > >  print_report+0x58/0x70 mm/kasan/report.c:482
> > >  kasan_report+0x117/0x150 mm/kasan/report.c:595
> > >  fib_rules_lookup+0x15e/0xeb0 net/core/fib_rules.c:321
> > >  __fib_lookup+0x106/0x210 net/ipv4/fib_rules.c:96
> > >  ip_route_output_key_hash_rcu+0x294/0x2720 net/ipv4/route.c:2811
> > >  ip_route_output_key_hash+0x18d/0x2a0 net/ipv4/route.c:2702
> > >  __ip_route_output_key include/net/route.h:169 [inline]
> > >  ip_route_output_flow+0x2a/0x150 net/ipv4/route.c:2929
> > >  ip4_datagram_release_cb+0x89d/0xbe0 net/ipv4/datagram.c:118
> > >  release_sock+0x206/0x260 net/core/sock.c:3861
> > >  inet_shutdown+0x2b1/0x390 net/ipv4/af_inet.c:950
> > >  udp_tunnel_sock_release+0x6d/0x80 net/ipv4/udp_tunnel_core.c:197
> > >  fou_release net/ipv4/fou_core.c:562 [inline]
> > >  fou_exit_net+0x17d/0x1f0 net/ipv4/fou_core.c:1230
> > >  ops_exit_list net/core/net_namespace.c:199 [inline]
> > >  ops_undo_list+0x43d/0x8d0 net/core/net_namespace.c:252
> > >  cleanup_net+0x572/0x810 net/core/net_namespace.c:702
> > >  process_one_work kernel/workqueue.c:3314 [inline]
> > >  process_scheduled_works+0xa8e/0x14e0 kernel/workqueue.c:3397
> > >  worker_thread+0xa47/0xfb0 kernel/workqueue.c:3478
> > >  kthread+0x389/0x470 kernel/kthread.c:436
> > >  ret_from_fork+0x514/0xb70 arch/x86/kernel/process.c:158
> > >  ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
> > >  </TASK>
> > >
> > > Allocated by task 19121:
> > >  kasan_save_stack mm/kasan/common.c:57 [inline]
> > >  kasan_save_track+0x3e/0x80 mm/kasan/common.c:78
> > >  poison_kmalloc_redzone mm/kasan/common.c:398 [inline]
> > >  __kasan_kmalloc+0x93/0xb0 mm/kasan/common.c:415
> > >  kasan_kmalloc include/linux/kasan.h:263 [inline]
> > >  __do_kmalloc_node mm/slub.c:5296 [inline]
> > >  __kmalloc_node_track_caller_noprof+0x4d7/0x7b0 mm/slub.c:5408
> > >  kmemdup_noprof+0x2b/0x70 mm/util.c:138
> > >  kmemdup_noprof include/linux/fortify-string.h:763 [inline]
> > >  fib_rules_register+0x2f/0x400 net/core/fib_rules.c:170
> > >  fib4_rules_init+0x21/0x160 net/ipv4/fib_rules.c:508
> > >  ip_fib_net_init net/ipv4/fib_frontend.c:1578 [inline]
> > >  fib_net_init+0x17a/0x3e0 net/ipv4/fib_frontend.c:1628
> > >  ops_init+0x35d/0x5d0 net/core/net_namespace.c:137
> > >  setup_net+0x118/0x350 net/core/net_namespace.c:446
> > >  copy_net_ns+0x4f9/0x720 net/core/net_namespace.c:579
> > >  create_new_namespaces+0x3f0/0x6b0 kernel/nsproxy.c:132
> > >  unshare_nsproxy_namespaces+0x149/0x190 kernel/nsproxy.c:234
> > >  ksys_unshare+0x57d/0xa00 kernel/fork.c:3242
> > >  __do_sys_unshare kernel/fork.c:3316 [inline]
> > >  __se_sys_unshare kernel/fork.c:3314 [inline]
> > >  __x64_sys_unshare+0x38/0x50 kernel/fork.c:3314
> > >  do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
> > >  do_syscall_64+0x174/0x580 arch/x86/entry/syscall_64.c:94
> > >  entry_SYSCALL_64_after_hwframe+0x77/0x7f
> > >
> > > Freed by task 12641:
> > >  kasan_save_stack mm/kasan/common.c:57 [inline]
> > >  kasan_save_track+0x3e/0x80 mm/kasan/common.c:78
> > >  kasan_save_free_info+0x40/0x50 mm/kasan/generic.c:584
> > >  poison_slab_object mm/kasan/common.c:253 [inline]
> > >  __kasan_slab_free+0x5c/0x80 mm/kasan/common.c:285
> > >  kasan_slab_free include/linux/kasan.h:235 [inline]
> > >  slab_free_hook mm/slub.c:2689 [inline]
> > >  __rcu_free_sheaf_prepare+0x12d/0x2a0 mm/slub.c:2940
> > >  rcu_free_sheaf+0x31/0x200 mm/slub.c:5850
> > >  rcu_do_batch kernel/rcu/tree.c:2617 [inline]
> > >  rcu_core+0x78b/0x10a0 kernel/rcu/tree.c:2869
> > >  handle_softirqs+0x225/0x840 kernel/softirq.c:622
> > >  do_softirq+0x76/0xd0 kernel/softirq.c:523
> > >  __local_bh_enable_ip+0xf8/0x130 kernel/softirq.c:450
> > >  unregister_netdevice_many_notify+0x1874/0x2150 net/core/dev.c:12445
> > >  ops_exit_rtnl_list net/core/net_namespace.c:187 [inline]
> > >  ops_undo_list+0x391/0x8d0 net/core/net_namespace.c:248
> > >  cleanup_net+0x572/0x810 net/core/net_namespace.c:702
> > >  process_one_work kernel/workqueue.c:3314 [inline]
> > >  process_scheduled_works+0xa8e/0x14e0 kernel/workqueue.c:3397
> > >  worker_thread+0xa47/0xfb0 kernel/workqueue.c:3478
> > >  kthread+0x389/0x470 kernel/kthread.c:436
> > >  ret_from_fork+0x514/0xb70 arch/x86/kernel/process.c:158
> > >  ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
> > >
> > > The buggy address belongs to the object at ffff88804ec4c600
> > >  which belongs to the cache kmalloc-192 of size 192
> > > The buggy address is located 128 bytes inside of
> > >  freed 192-byte region [ffff88804ec4c600, ffff88804ec4c6c0)
> > >
> > > The buggy address belongs to the physical page:
> > > page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x4ec4c
> > > flags: 0xfff00000000000(node=0|zone=1|lastcpupid=0x7ff)
> > > page_type: f5(slab)
> > > raw: 00fff00000000000 ffff88813fe163c0 dead000000000100 dead000000000122
> > > raw: 0000000000000000 0000000800100010 00000000f5000000 0000000000000000
> > > page dumped because: kasan: bad access detected
> > > page_owner tracks the page as allocated
> > > page last allocated via order 0, migratetype Unmovable, gfp_mask 0xd2cc0(GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 13856, tgid 13853 (syz.3.2144), ts 351172300879, free_ts 351133053454
> > >  set_page_owner include/linux/page_owner.h:32 [inline]
> > >  post_alloc_hook+0x22d/0x280 mm/page_alloc.c:1853
> > >  prep_new_page mm/page_alloc.c:1861 [inline]
> > >  get_page_from_freelist+0x24ae/0x2530 mm/page_alloc.c:3941
> > >  __alloc_frozen_pages_noprof+0x18d/0x380 mm/page_alloc.c:5221
> > >  alloc_slab_page mm/slub.c:3278 [inline]
> > >  allocate_slab+0x77/0x660 mm/slub.c:3467
> > >  new_slab mm/slub.c:3525 [inline]
> > >  refill_objects+0x336/0x3d0 mm/slub.c:7272
> > >  refill_sheaf mm/slub.c:2816 [inline]
> > >  __pcs_replace_empty_main+0x320/0x720 mm/slub.c:4652
> > >  alloc_from_pcs mm/slub.c:4750 [inline]
> > >  slab_alloc_node mm/slub.c:4884 [inline]
> > >  __do_kmalloc_node mm/slub.c:5295 [inline]
> > >  __kmalloc_noprof+0x464/0x750 mm/slub.c:5308
> > >  kmalloc_noprof include/linux/slab.h:954 [inline]
> > >  kzalloc_noprof include/linux/slab.h:1188 [inline]
> > >  new_dir fs/proc/proc_sysctl.c:966 [inline]
> > >  get_subdir fs/proc/proc_sysctl.c:1010 [inline]
> > >  sysctl_mkdir_p fs/proc/proc_sysctl.c:1320 [inline]
> > >  __register_sysctl_table+0xc02/0x1370 fs/proc/proc_sysctl.c:1395
> > >  neigh_sysctl_register+0x9b1/0xa90 net/core/neighbour.c:3915
> > >  addrconf_sysctl_register+0xb3/0x1c0 net/ipv6/addrconf.c:7396
> > >  ipv6_add_dev+0xd26/0x13a0 net/ipv6/addrconf.c:460
> > >  addrconf_notify+0x771/0x1050 net/ipv6/addrconf.c:3679
> > >  notifier_call_chain+0x1a5/0x3d0 kernel/notifier.c:85
> > >  call_netdevice_notifiers_extack net/core/dev.c:2288 [inline]
> > >  call_netdevice_notifiers net/core/dev.c:2302 [inline]
> > >  register_netdevice+0x18db/0x1f00 net/core/dev.c:11474
> > >  macsec_newlink+0x706/0x1200 drivers/net/macsec.c:4218
> > >  rtnl_newlink_create+0x310/0xb00 net/core/rtnetlink.c:3905
> > > page last free pid 12657 tgid 12657 stack trace:
> > >  reset_page_owner include/linux/page_owner.h:25 [inline]
> > >  __free_pages_prepare mm/page_alloc.c:1397 [inline]
> > >  __free_frozen_pages+0xc0d/0xd20 mm/page_alloc.c:2938
> > >  __tlb_remove_table_free mm/mmu_gather.c:228 [inline]
> > >  tlb_remove_table_rcu+0x85/0x100 mm/mmu_gather.c:291
> > >  rcu_do_batch kernel/rcu/tree.c:2617 [inline]
> > >  rcu_core+0x78b/0x10a0 kernel/rcu/tree.c:2869
> > >  handle_softirqs+0x225/0x840 kernel/softirq.c:622
> > >  __do_softirq kernel/softirq.c:656 [inline]
> > >  invoke_softirq kernel/softirq.c:496 [inline]
> > >  __irq_exit_rcu+0xca/0x220 kernel/softirq.c:735
> > >  irq_exit_rcu+0x9/0x30 kernel/softirq.c:752
> > >  instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1061 [inline]
> > >  sysvec_apic_timer_interrupt+0xa6/0xc0 arch/x86/kernel/apic/apic.c:1061
> > >  asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:697
> > >
> > > Memory state around the buggy address:
> > >  ffff88804ec4c580: 00 00 00 fc fc fc fc fc fc fc fc fc fc fc fc fc
> > >  ffff88804ec4c600: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> > > >ffff88804ec4c680: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
> > >                    ^
> > >  ffff88804ec4c700: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > >  ffff88804ec4c780: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc
> > > ==================================================================
> > >
> > >
> > > ---
> > > This report is generated by a bot. It may contain errors.
> > > See https://goo.gl/tpsmEJ for more information about syzbot.
> > > syzbot engineers can be reached at syzkaller@googlegroups.com.
> > >
> > > syzbot will keep track of this issue. See:
> > > https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
> > >
> > > If the report is already addressed, let syzbot know by replying with:
> > > #syz fix: exact-commit-title
> > >
> > > If you want to overwrite report's subsystems, reply with:
> > > #syz set subsystems: new-subsystem
> > > (See the list of subsystem names on the web dashboard)
> > >
> > > If the report is a duplicate of another one, reply with:
> > > #syz dup: exact-subject-of-another-report
> > >
> > > If you want to undo deduplication, reply with:
> > > #syz undup

^ permalink raw reply

* Re: [PATCH net] netpoll: run NAPI poll in softirq context to avoid rq->lock self-deadlock
From: Peter Zijlstra @ 2026-06-16 17:02 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Jakub Kicinski, Petr Mladek, John Ogness, Sergey Senozhatsky,
	Vlad Poenaru, Thomas Gleixner, netdev, David S . Miller,
	Eric Dumazet, Paolo Abeni, Simon Horman, Breno Leitao,
	Clark Williams, Steven Rostedt, linux-rt-devel, linux-kernel,
	stable, Frederic Weisbecker, Ingo Molnar, Vincent Guittot,
	Dietmar Eggemann, K Prateek Nayak
In-Reply-To: <20260616103529.Yh9Dxsjp@linutronix.de>

On Tue, Jun 16, 2026 at 12:35:29PM +0200, Sebastian Andrzej Siewior wrote:

> So this is not an issue since commit 7eab73b18630e ("netconsole: convert
> to NBCON console infrastructure"). Because from then on writes are
> deferred to the nbcon thread. So this is purely about -stable in this case.

Hmm, I thought netconsole had some reserved skbs and could do writes
'atomic' like? That said, it was 2.6 era the last time I looked at
netconsole.

> Now. The scheduler usually does printk_deferred() because of the rq lock
> so it does not deadlock for various reasons. It is kind of a pity that
> the various WARN macros don't do that.

People have tried, last time was here:

  https://lkml.kernel.org/r/20260611074344.GG48970@noisy.programming.kicks-ass.net

and I hate deferred with a passion. It means you'll never see the
message when you wreck the machine.

> We could add printk_deferred_enter/exit() to all the rq_lock() variants.
> I think PeterZ loves this the most. And Greg will appreciate it too
> while backporting because of all the context changes.

No, not going to happen, ever, sorry. Instead printk should delete
console sem and have printk() itself be atomic safe.

As stated, printk deferred is an abomination and needs to die a horrible
painful death.

As described here:

  https://lkml.kernel.org/r/20260611191922.GK187714@noisy.programming.kicks-ass.net

"So printk should:

 - stick msg in buffer (lockless)
 - print to atomic consoles (lockless)
 - use irq_work to wake console kthreads (lockless)
 - each kthread then tries to flush buffer to its own non-atomic console
   in non-atomic context."




^ permalink raw reply

