Netdev List
* Re: [PATCH net-next] net: neigh: avoid calling neigh_forced_gc on every alloc when table is full
From: Kuniyuki Iwashima @ 2026-06-25 21:45 UTC (permalink / raw)
  To: avimalin; +Cc: edumazet, kuniyu, netdev, vimal.agrawal, kuba
In-Reply-To: <20260625102020.92814-1-vimal.agrawal@sophos.com>

From: Vimal Agrawal <avimalin@gmail.com>
Date: Thu, 25 Jun 2026 10:20:20 +0000
> Once the neighbour table exceeds gc_thresh3, neigh_forced_gc() is called
> on every allocation attempt with no rate limiting. In workloads with mostly
> active/reachable entries, the GC walk traverses a large portion of the
> neighbour table without reclaiming entries, holding tbl->lock for an
> extended period. This causes severe lock contention and allocation
> latencies exceeding 16ms under sustained neighbour creation.
> 
> Add a pre-lock check in neigh_forced_gc() to skip the GC run if one was
> performed within the last second, avoiding repeated full table scans and
> lock acquisitions on the hot allocation path.
> 
> Profiling of neigh_create() shows ~3 orders of magnitude latency
> improvement with this change.
> 
> Link:https://lore.kernel.org/netdev/CALkUMdSCpx_ywYCx_ePLdm6yioO1nQWx7sSM=AEgsq0kywHxTw@mail.gmail.com/

From the thread, these look misconfigured.

---8<---
net.ipv6.neigh.default.gc_thresh2 = 32768
net.ipv6.neigh.default.gc_thresh3 = 32768
---8<---

If gc_thresh3 is large enough, gc_thresh2 will give you 5s
rate limiting.

If the number of active neigh entries constantly exceeds
gc_thresh3, that number is the correct gc_thresh2 for you.
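
The interaction between the two thresholds can be sketched as a small
userspace model. This is only a sketch, assuming the allocation path forces
a GC run either when the entry count reaches gc_thresh3, or when it reaches
gc_thresh2 and more than 5 seconds have passed since the last flush. The
names should_force_gc(), ticks_after() and HZ_MODEL are hypothetical and
are not kernel API.

```c
#include <stdbool.h>

/* Hypothetical tick rate for this model; the kernel uses HZ. */
#define HZ_MODEL 1000UL

/* Wraparound-safe "a is later than b", in the spirit of time_after(). */
static bool ticks_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Assumed decision: hard limit at gc_thresh3, 5s window above gc_thresh2. */
static bool should_force_gc(int entries, int gc_thresh2, int gc_thresh3,
			    unsigned long now, unsigned long last_flush)
{
	if (entries >= gc_thresh3)
		return true;	/* hard limit: no rate limiting applies */

	return entries >= gc_thresh2 &&
	       ticks_after(now, last_flush + 5 * HZ_MODEL);
}
```

In this model, gc_thresh2 == gc_thresh3 means the first branch always wins
once the table is full, so the 5s window never takes effect.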

Also, I guess you want a new kernel param for the initial shift
passed to the first neigh_hash_alloc(), which is currently fixed
at 3 and is too small for some hosts.

50000 entries require neigh_hash_grow() 13 times.
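
The count of 13 can be checked with a short calculation. This is a sketch
under the assumption that the table starts at 2^shift buckets and doubles
whenever the number of entries exceeds the bucket count;
hash_grows_needed() is a hypothetical helper, not a kernel function.

```c
/*
 * Count the doublings needed to hold `entries`, starting from
 * 2^initial_shift buckets and doubling while entries > buckets.
 */
static unsigned int hash_grows_needed(unsigned long entries,
				      unsigned int initial_shift)
{
	const unsigned int max_shift = 8 * sizeof(unsigned long) - 1;
	unsigned int shift = initial_shift;

	/* The max_shift guard keeps the shift below the word size. */
	while (shift < max_shift && entries > (1UL << shift))
		shift++;

	return shift - initial_shift;
}
```

Starting at shift 3 (8 buckets), 50000 entries only fit at shift 16
(65536 buckets), which is 16 - 3 = 13 doublings; starting at shift 16
needs none.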

Can you test this on your real workload, starting from
neigh_hash_shift=16 and appropriate gc_thresh2/3 ?

---8<---
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 1349c0eedb64..a75b3750eec9 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -1817,6 +1817,22 @@ EXPORT_SYMBOL(neigh_parms_release);
 static struct lock_class_key neigh_table_proxy_queue_class;
 
 static struct neigh_table __rcu *neigh_tables[NEIGH_NR_TABLES] __read_mostly;
+static __initdata unsigned long neigh_hash_shift = 3;
+
+static int __init neigh_set_hash_shift(char *str)
+{
+	ssize_t ret;
+
+	if (!str)
+		return 0;
+
+	ret = kstrtoul(str, 0, &neigh_hash_shift);
+	if (ret)
+		return 0;
+
+	return 1;
+}
+__setup("neigh_hash_shift=", neigh_set_hash_shift);
 
 void neigh_table_init(int index, struct neigh_table *tbl)
 {
@@ -1843,7 +1859,7 @@ void neigh_table_init(int index, struct neigh_table *tbl)
 		panic("cannot create neighbour proc dir entry");
 #endif
 
-	RCU_INIT_POINTER(tbl->nht, neigh_hash_alloc(3));
+	RCU_INIT_POINTER(tbl->nht, neigh_hash_alloc(neigh_hash_shift));
 
 	phsize = (PNEIGH_HASHMASK + 1) * sizeof(struct pneigh_entry *);
 	tbl->phash_buckets = kzalloc(phsize, GFP_KERNEL);
---8<---



> Signed-off-by: Vimal Agrawal <vimal.agrawal@sophos.com>
> ---
>  net/core/neighbour.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/net/core/neighbour.c b/net/core/neighbour.c
> index 1349c0eedb64..078842db3c5f 100644
> --- a/net/core/neighbour.c
> +++ b/net/core/neighbour.c
> @@ -260,6 +260,9 @@ static int neigh_forced_gc(struct neigh_table *tbl)
>  	int shrunk = 0;
>  	int loop = 0;
>  
> +	if (!time_after(jiffies, READ_ONCE(tbl->last_flush) + HZ))
> +		return 0;
> +
>  	NEIGH_CACHE_STAT_INC(tbl, forced_gc_runs);
>  
>  	spin_lock_bh(&tbl->lock);
> -- 
> 2.17.1
> v


* Re: [PATCH iproute2-next] "ip help" wrong output, exit code.
From: Stephen Hemminger @ 2026-06-25 21:34 UTC (permalink / raw)
  To: Dmitri Seletski; +Cc: netdev
In-Reply-To: <65f53987-c992-41b9-9603-9e9a448e469d@gmail.com>

On Thu, 25 Jun 2026 16:54:29 +0100
Dmitri Seletski <drjoms@gmail.com> wrote:

> I am confused.
> 
> What's the next step here?
> 
> Regards
> 
> Dmitri
> 
> On 6/22/26 18:47, Dmitri Seletski wrote:
> > Hello David,
> >
> >
> > Based on change introduced:
> >
> > Two samples of "ip help" with demonstration of exit code and standard 
> > output are below.
> >
> > This is in line with what I expect.
> >
> >
> > dimkosPC~/compiled/iproute2-next #if ./ip/ip help a >>/dev/null  ; 
> > then echo help triggered  ; else echo error code triggered  ;fi  #this 
> > redirects standard output  to /dev/null, so text missing is not error,
> > but standard text
> > help triggered
> >
> > dimkosPC~/compiled/iproute2-next #if ./ip/ip help   ; then echo help 
> > triggered  ; else echo error code triggered  ;fi
> > Usage: ip [ OPTIONS ] OBJECT { COMMAND | help }
> >       ip [ -force ] -batch filename
> > where  OBJECT := { address | addrlabel | fou | help | ila | ioam | 
> > l2tp | link |
> >                   macsec | maddress | monitor | mptcp | mroute | mrule |
> >                   neighbor | neighbour | netconf | netns | nexthop | 
> > ntable |
> >                   ntbl | route | rule | sr | stats | tap | tcpmetrics |
> >                   token | tunnel | tuntap | vrf | xfrm }
> >       OPTIONS := { -V[ersion] | -s[tatistics] | -d[etails] | -r[esolve] |
> >                    -h[uman-readable] | -iec | -j[son] | -p[retty] |
> >                    -f[amily] { inet | inet6 | mpls | bridge | link } |
> >                    -4 | -6 | -M | -B | -0 |
> >                    -l[oops] { maximum-addr-flush-attempts } | -echo | 
> > -br[ief] |
> >                    -o[neline] | -t[imestamp] | -ts[hort] | -b[atch] 
> > [filename] |
> >                    -rc[vbuf] [size] | -n[etns] name | -N[umeric] | 
> > -a[ll] |
> >                    -c[olor]}
> > help triggered
> >
> > Two samples of command that is broken on purpose.
> >
> > dimkosPC~/compiled/iproute2-next #if ./ip/ip idontexist   ; then echo 
> > help triggered  ; else echo error code triggered  ;fi
> > Object "idontexist" is unknown, try "ip help".
> > error code triggered
> >
> > dimkosPC~/compiled/iproute2-next #if ./ip/ip idontexist  >>/dev/null 
> >  ; then echo help triggered  ; else echo error code triggered  ;fi 
> >  #this redirects standard output  to /dev/null, so text missing is not 
> > error, but standard text
> > Object "idontexist" is unknown, try "ip help".
> > error code triggered
> >
> > This works as expected as per my understanding.
> >
> >
> > Not everything is fixed, but a chunk of things fixed is better than none 
> > of it.
> >
> > for example:
> >
> > if ip  add help    ; then echo help triggered  ; else echo error code 
> > triggered  ;fi  #this redirects standard output  to /dev/null, so text 
> > missing is not error, but standard text
> > Usage: ip address {add|change|replace} IFADDR dev IFNAME [ LIFETIME ]
> >                                                      [ CONFFLAG-LIST ]
> >       ip address del IFADDR dev IFNAME [mngtmpaddr]
> >       ip address {save|flush} [ dev IFNAME ] [ scope SCOPE-ID ] [ to 
> > PREFIX ]
> >                            [ FLAG-LIST ] [ label LABEL ] [ { up | down 
> > } ]
> >       ip address [ show [ dev IFNAME ] [ scope SCOPE-ID ] [ master 
> > DEVICE ]
> >                         [ nomaster ]
> >                         [ type TYPE ] [ to PREFIX ] [ FLAG-LIST ]
> >                         [ label LABEL ] [ { up | down } ] [ vrf NAME ]
> >                         [ proto ADDRPROTO ] ]
> >       ip address {showdump|restore}
> > IFADDR := PREFIX | ADDR peer PREFIX
> >          [ broadcast ADDR ] [ anycast ADDR ]
> >          [ label IFNAME ] [ scope SCOPE-ID ] [ metric METRIC ]
> >          [ proto ADDRPROTO ]
> > SCOPE-ID := [ host | link | global | NUMBER ]
> > FLAG-LIST := [ FLAG-LIST ] FLAG
> > FLAG  := [ permanent | dynamic | secondary | primary |
> >           [-]tentative | [-]deprecated | [-]dadfailed | temporary |
> >           CONFFLAG-LIST ]
> > CONFFLAG-LIST := [ CONFFLAG-LIST ] CONFFLAG
> > CONFFLAG  := [ home | nodad | mngtmpaddr | noprefixroute | autojoin ]
> > LIFETIME := [ valid_lft LFT ] [ preferred_lft LFT ]
> > LFT := forever | SECONDS
> > ADDRPROTO := [ NAME | NUMBER ]
> > TYPE := { amt | bareudp | bond | bond_slave | bridge | bridge_slave |
> >          dsa | dummy | erspan | geneve | gre | gretap | gtp | hsr |
> >          ifb | ip6erspan | ip6gre | ip6gretap | ip6tnl |
> >          ipip | ipoib | ipvlan | ipvtap |
> >          macsec | macvlan | macvtap | netdevsim |
> >          netkit | nlmon | pfcp | rmnet | sit | team | team_slave |
> >          vcan | veth | vlan | vrf | vti | vxcan | vxlan | wwan |
> >          xfrm | virt_wifi }
> > error code triggered
> >
> > This is still problematic.
> >
> >
> > But so far code leaves "ip help" command/argument in better shape than 
> > it found it in.
> >
> >
> > I may try to improve things more, but let's submit what we already have 
> > "better", please.
> >
> > Kind Regards
> >
> > Dmitri Seletski
> >
> >
> > On 6/22/26 17:44, David Laight wrote:  
> >> On Mon, 22 Jun 2026 07:57:00 -0700
> >> Stephen Hemminger <stephen@networkplumber.org> wrote:
> >>  
> >>> On Sun, 21 Jun 2026 22:48:59 +0100
> >>> Dmitri Seletski <drjoms@gmail.com> wrote:
> >>>  
> >>>>  From 0805e07105cd15c5b94271a4706e50e3c65dbde5 Mon Sep 17 00:00:00 
> >>>> 2001
> >>>> From: Dmitri Seletski <drjoms@gmail.com>
> >>>> Date: Sun, 21 Jun 2026 22:12:43 +0100
> >>>> Subject: [PATCH iproute2-next]  "ip help" wrong output, exit code.
> >>>>
> >>>> Changed output of "ip help" from standard error to standard output. 
> >>>> And
> >>>> Exit is now 0 instead of -1. "ip help|grep bridge" - now gives bridge
> >>>> syntax instead of flooding user with everything from "ip help".
> >>>> ---
> >>>> ip/ip.c | 4 ++--
> >>>> 1 file changed, 2 insertions(+), 2 deletions(-)
> >>>>
> >>>> diff --git a/ip/ip.c b/ip/ip.c
> >>>> index e4b71bde..4627b61c 100644
> >>>> --- a/ip/ip.c
> >>>> +++ b/ip/ip.c
> >>>> @@ -56,7 +56,7 @@ static void usage(void) __attribute__((noreturn));
> >>>>
> >>>> static void usage(void)
> >>>> {
> >>>> -fprintf(stderr,
> >>>> +fprintf(stdout,
> >>>> "Usage: ip [ OPTIONS ] OBJECT { COMMAND | help }\n"
> >>>> "       ip [ -force ] -batch filename\n"
> >>>> "where  OBJECT := { address | addrlabel | fou | help | ila | ioam | 
> >>>> l2tp
> >>>> | link |\n"
> >>>> @@ -72,7 +72,7 @@ static void usage(void)
> >>>> "                    -o[neline] | -t[imestamp] | -ts[hort] | -b[atch]
> >>>> [filename] |\n"
> >>>> "                    -rc[vbuf] [size] | -n[etns] name | -N[umeric] |
> >>>> -a[ll] |\n"
> >>>> "                    -c[olor]}\n");
> >>>> -exit(-1);
> >>>> +exit(0);
> >>>> }  
> >>> Your mailer damages white space.
> >>>  
> >> The output also needs to depend on whether there is a 'usage' error or
> >> if 'help' is requested.
> >> The code is correct for the former - except it should do exit(1).
> >>
> >>     David
> >>
> >>  
> 

We need to have a broad solution that doesn't look ugly.
There are a couple of problems with the current code:
  1. Help should exit with 0 (ok); an invalid argument should exit with non-zero.
     By GNU convention that is 2, but other commands like git use 129.
  2. Help should go to stdout; usage on error should go to stderr.

The solution should work across iproute2 commands: ip, tc, dpll, tipc, bridge, ...
and the sub commands.

So far the mailing list patches were kind of messy and limited.
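
One possible shape for the two rules above, as a rough sketch only: a
single helper picks the stream and exit status from whether help was
requested. The names usage_target_for(), print_usage() and EXIT_USAGE are
hypothetical and do not exist in iproute2; the value 2 is the GNU
convention mentioned above.

```c
#include <stdio.h>

/* Hypothetical status for bad arguments (2 per the GNU convention). */
#define EXIT_USAGE 2

struct usage_target {
	FILE *stream;	/* where the usage text goes */
	int status;	/* what the caller should exit with */
};

/* Requested help: stdout and 0. Usage error: stderr and non-zero. */
static struct usage_target usage_target_for(int help_requested)
{
	struct usage_target t;

	if (help_requested) {
		t.stream = stdout;
		t.status = 0;
	} else {
		t.stream = stderr;
		t.status = EXIT_USAGE;
	}
	return t;
}

/* Print the text to the chosen stream and return the exit status. */
static int print_usage(const char *text, int help_requested)
{
	struct usage_target t = usage_target_for(help_requested);

	fputs(text, t.stream);
	return t.status;
}
```

A command would then call exit(print_usage(text, 1)) for "help" and
exit(print_usage(text, 0)) on a bad argument, so sub commands could share
the same rule.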


* Re: [PATCH net] octeontx2-pf: check DMAC extraction support before filtering
From: Harshitha Ramamurthy @ 2026-06-25 21:28 UTC (permalink / raw)
  To: nshettyj
  Cc: netdev, linux-kernel, sgoutham, gakula, sbhatta, hkelam,
	bbhushan2, andrew+netdev, davem, edumazet, kuba, pabeni, naveenm,
	tduszynski, sumang
In-Reply-To: <20260625172552.258631-1-nshettyj@marvell.com>

On Thu, Jun 25, 2026 at 10:30 AM <nshettyj@marvell.com> wrote:
>
> From: Suman Ghosh <sumang@marvell.com>
>
> Currently, configuring a VF MAC address via the PF (e.g., 'ip link
> set <pf> vf 0 mac <mac>') blindly attempts to install a DMAC-based
> hardware filter. However, the hardware parser profile might not
> support DMAC extraction.
>
> Check if the hardware parsing profile supports DMAC extraction
> before adding the filter. Additionally, emit a warning message
> to inform the operator if the MAC filter installation fails due
> to missing DMAC extraction support.
>
> Fixes: f0c2982aaf98 ("octeontx2-pf: Add support for SR-IOV management functions")
> Signed-off-by: Suman Ghosh <sumang@marvell.com>
> Signed-off-by: Nitin Shetty J <nshettyj@marvell.com>
> ---
>  .../ethernet/marvell/octeontx2/nic/otx2_pf.c  | 34 +++++++++++++++++++
>  1 file changed, 34 insertions(+)
>
> diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c
> index b63df5737ff2..8e4435d9e520 100644
> --- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c
> +++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c
> @@ -2546,6 +2546,8 @@ static int otx2_do_set_vf_mac(struct otx2_nic *pf, int vf, const u8 *mac)
>  static int otx2_set_vf_mac(struct net_device *netdev, int vf, u8 *mac)
>  {
>         struct otx2_nic *pf = netdev_priv(netdev);
> +       struct npc_get_field_status_req *req;
> +       struct npc_get_field_status_rsp *rsp;
>         struct pci_dev *pdev = pf->pdev;
>         struct otx2_vf_config *config;
>         int ret;
> @@ -2559,6 +2561,38 @@ static int otx2_set_vf_mac(struct net_device *netdev, int vf, u8 *mac)
>         if (!is_valid_ether_addr(mac))
>                 return -EINVAL;
>
> +       /* Skip installing the DMAC filter if the hardware parser profile
> +        * does not support DMAC extraction.
> +        */
> +       mutex_lock(&pf->mbox.lock);
> +       req = otx2_mbox_alloc_msg_npc_get_field_status(&pf->mbox);
> +       if (!req) {
> +               mutex_unlock(&pf->mbox.lock);
> +               return -ENOMEM;
> +       }
> +
> +       req->field = NPC_DMAC;
> +       if (otx2_sync_mbox_msg(&pf->mbox)) {
> +               mutex_unlock(&pf->mbox.lock);
> +               return -EINVAL;
> +       }
> +
> +       rsp = (struct npc_get_field_status_rsp *)otx2_mbox_get_rsp
> +              (&pf->mbox.mbox, 0, &req->hdr);
> +       if (IS_ERR(rsp)) {
> +               mutex_unlock(&pf->mbox.lock);
> +               return PTR_ERR(rsp);
> +       }
> +
> +       if (!rsp->enable) {
> +               mutex_unlock(&pf->mbox.lock);
> +               netdev_warn(netdev, "VF %d MAC filter not installed: DMAC extraction not supported by parser profile\n",
> +                           vf);
> +               return 0;

Is the intent to return success here even though the MAC address was
not programmed?

> +       }
> +
> +       mutex_unlock(&pf->mbox.lock);
> +

Why not move all these checks into the otx2_do_set_vf_mac() since that
anyway acquires the pf->mbox.lock? That way you could also fold all
the mutex_unlock() calls introduced in the error paths in this patch
into the existing goto-out in that function.
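
To illustrate the shape being suggested, here is a minimal userspace
sketch of the single-exit pattern: the lock is taken once and every error
path jumps to one label that releases it. Everything here is hypothetical
(struct fake_lock, set_filter() and its flags stand in for the mailbox
lock and the three failure points); returning -EOPNOTSUPP for the
unsupported case is an assumption of this sketch, not what the patch does.

```c
#include <errno.h>

/* Hypothetical stand-in for a mutex: tracks depth so balance is checkable. */
struct fake_lock {
	int depth;
};

static void fake_lock_acquire(struct fake_lock *l) { l->depth++; }
static void fake_lock_release(struct fake_lock *l) { l->depth--; }

/* The three flags model the allocation, sync and field-status checks. */
static int set_filter(struct fake_lock *lock, int alloc_ok, int sync_ok,
		      int field_supported)
{
	int err = 0;

	fake_lock_acquire(lock);

	if (!alloc_ok) {
		err = -ENOMEM;
		goto out;
	}
	if (!sync_ok) {
		err = -EINVAL;
		goto out;
	}
	if (!field_supported) {
		err = -EOPNOTSUPP;	/* assumption: report, do not return 0 */
		goto out;
	}

	/* ... the filter would be installed here ... */
out:
	fake_lock_release(lock);	/* the only unlock in the function */
	return err;
}
```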

>         config = &pf->vf_configs[vf];
>         ether_addr_copy(config->mac, mac);
>
> --
> 2.48.1
>
>


* Re: [PATCH v29 4/5] sfc: obtain and map cxl range using devm_cxl_probe_mem
From: Dan Williams (nvidia) @ 2026-06-25 20:34 UTC (permalink / raw)
  To: Alejandro Lucero Palau, Dan Williams (nvidia),
	alejandro.lucero-palau, linux-cxl, netdev, dan.j.williams,
	edward.cree, davem, kuba, pabeni, edumazet, dave.jiang
  Cc: Edward Cree
In-Reply-To: <b0a45e85-f42c-4a52-8223-f8318da10649@amd.com>

Alejandro Lucero Palau wrote:
[..]
> >> +{
> > If you are going to have an explicit efx_cxl_exit() then I would also
> > add an explicit unregistration of the memdev.
> 
> 
> This is necessary for undoing the mmap. Nothing else happens there 
> because it is all relying on devm ...
> 
> 
> I could change the ioremap_wc call to devm_ioremap_wc, but
> 
> 
> > This would also fix the
> > Sashiko report about pci_disable_device() running while the cxl_memdev
> > is still registered. Unfortunately, mixing devm and explicit unwind is
> > always fraught.
> 
> 
> I do not think there is a problem here. The cxl core does not need what 
> a type2 driver can do regarding PCI BAR mappings, or at least it is not 
> the case for sfc.
> 
> Any action through sysfs cxl will go through cxl core and the only thing 
> linked to the type device is the CXL registers which are mapped inside 
> cxl_map_component_regs() and those are managed resources.
> 
> 
> So, I can not see why this change is needed. If it is really necessary, 
> please describe the problem with more detail.
> 
> 
> It looks like you need reasons for delaying this further ...

What? Help with Sashiko reports is an act of malice? I assumed you
wanted help with those so that other maintainers would proceed with
these patches. 

I did do another run through to see if there are any paths that the CXL
core can reach if someone tried to fuzz the CXL ABIs or kernel paths
while SFC is unloading. I think Sashiko is hallucinating a sysfs path to
the BAR mapping given there is no mailbox and the EDAC capabilities are
usually not present on a type-2 device. The RAS path looks valid, but
that may also get lucky that most (all?) of the RAS use cases lock the
device before accessing the registers, so devres_release_all() would
become consistent with pci_disable_device() before any access attempt.
That does not seem like a clean design, but it also does not appear
to be immediately exploitable.

If you believe the patches are ready and the Sashiko reports are
invalid, please do say so, no more comments from me on this set from
this point forward.

> > Let me know if this passes your testing, and I can send it out as a
> > standalone patch. You could also use it to unwind if the ioremap()
> > fails.
> 
> 
> You did not read my comments on v28 ...
> 
> 
> I changed efx_cxl_init to make the driver probe to fail if cxl is 
> supported and enabled but the cxl initialization fails, including 
> ioremap_wc(). What you proposed to do, explicitly undo cxl 
> initialization bits, has the same outcome: device detached from the driver.

Right, I did read that and that motivated the devm_cxl_remove_mem()
helper to undo the memdev creation without unloading the driver. You are
free to ignore that helper.


* Re: [GIT PULL] Networking for v7.2-rc1
From: pr-tracker-bot @ 2026-06-25 19:57 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: torvalds, kuba, davem, netdev, linux-kernel, pabeni
In-Reply-To: <20260625174511.745883-1-kuba@kernel.org>

The pull request you sent on Thu, 25 Jun 2026 10:45:11 -0700:

> git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git tags/net-7.2-rc1

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/805185b7c7a1069e407b6f7b3bc98e44d415f484

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html


* Re: [PATCH bpf-next v2 1/4] bpf: Initialize the l3mdev field for the fib lookup flow
From: David Ahern @ 2026-06-25 19:51 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, Avinash Duduskar,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
  Cc: Eduard Zingerman, Kumar Kartikeya Dwivedi, Martin KaFai Lau,
	Song Liu, Yonghong Song, Jiri Olsa, Emil Tsalapatis,
	John Fastabend, Stanislav Fomichev, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Shuah Khan,
	Jesper Dangaard Brouer, Mykyta Yatsenko, Leon Hwang, KP Singh,
	Anton Protopopov, Amery Hung, Eyal Birger, Rong Tao, bpf, netdev,
	linux-kselftest, linux-kernel
In-Reply-To: <87bjd9h6yh.fsf@toke.dk>

On 6/17/26 3:06 AM, Toke Høiland-Jørgensen wrote:
>> The helper already initializes the other flow fields the rules path
>> consumes (flowi4_mark, flowi4_tun_key.tun_id, flowi4_uid and the v6
>> counterparts); flowi*_l3mdev was added to that set afterwards and this
>> helper was never updated to match. ip_route_input_slow() likewise zeroes
>> the field before its input lookup. Do the same here.
> 
> So how about we explicitly zero-init the whole struct instead of adding
> more fields ad-hoc like this? Otherwise this seems like something that
> is likely to happen again if we ever add another field to the struct?
> 
> -Toke
> 

+1. Piecemeal init of the flow struct has been a known source of bugs.
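
A small userspace sketch of the difference, for illustration only: struct
flow_key and both init helpers are hypothetical and only loosely echo the
field names in this thread.

```c
#include <string.h>

/* Hypothetical flow key; not the kernel's flowi4/flowi6 layout. */
struct flow_key {
	int oif;
	int iif;
	int l3mdev;
	unsigned int mark;
	unsigned int uid;
};

/* Piecemeal init: any field not listed keeps whatever was in memory. */
static void flow_init_piecemeal(struct flow_key *fl, int oif,
				unsigned int mark)
{
	fl->oif = oif;
	fl->mark = mark;
}

/* Whole-struct init: fields added later start at zero without new code. */
static void flow_init_zeroed(struct flow_key *fl, int oif,
			     unsigned int mark)
{
	memset(fl, 0, sizeof(*fl));
	fl->oif = oif;
	fl->mark = mark;
}
```

With the zeroed variant, a newly added member such as l3mdev is covered
automatically; with the piecemeal variant it stays uninitialized until
someone remembers to add an assignment.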


* Re: [PATCH net v2] seg6: validate SRH length before reading fixed fields
From: Andrea Mayer @ 2026-06-25 19:49 UTC (permalink / raw)
  To: Nuoqi Gui
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, netdev, bpf, linux-kernel, Mathieu Xhonneux,
	Daniel Borkmann, David Lebrun, stefano.salsano, Paolo Lungaroni,
	Andrea Mayer
In-Reply-To: <20260623-f01-17-seg6-srh-len-v2-1-2edc40e9e3e1@mails.tsinghua.edu.cn>

On Tue, 23 Jun 2026 18:32:31 +0800
Nuoqi Gui <gnq25@mails.tsinghua.edu.cn> wrote:

> seg6_validate_srh() reads fixed SRH fields such as srh->type and
> srh->hdrlen before checking that the supplied length covers the fixed
> struct ipv6_sr_hdr fields.
> 
> The BPF SEG6 encap path reaches this with a BPF program-supplied pointer
> and length: bpf_lwt_push_encap() and the SEG6 local BPF END_B6 and
> END_B6_ENCAP actions call bpf_push_seg6_encap(), which forwards the
> length to seg6_validate_srh() with no minimum-size guard.  A 2-byte SEG6
> encap header can therefore make the validator read srh->type at offset 2
> beyond the caller-supplied buffer.
> 
> Reject lengths shorter than the fixed SRH at the top of
> seg6_validate_srh(), before any field is read.  This fixes the BPF helper
> path and keeps the common validator robust.
> 
> Fixes: fe94cc290f53 ("bpf: Add IPv6 Segment Routing helpers")
> Signed-off-by: Nuoqi Gui <gnq25@mails.tsinghua.edu.cn>
> ---
> Changes in v2:
> - Narrowed the commit message to the BPF encap callers that can supply a
>   too-short SRH length.
> - Dropped the unnecessary cast in the minimum SRH length check.
> - Link to v1: https://patch.msgid.link/20260620-f01-17-seg6-srh-len-v1-1-36cbb29c12f1@mails.tsinghua.edu.cn  
> 
> To: Andrea Mayer <andrea.mayer@uniroma2.it>
> To: "David S. Miller" <davem@davemloft.net>
> To: Eric Dumazet <edumazet@google.com>
> To: Jakub Kicinski <kuba@kernel.org>
> To: Paolo Abeni <pabeni@redhat.com>
> To: Simon Horman <horms@kernel.org>
> To: Mathieu Xhonneux <m.xhonneux@gmail.com>
> To: Daniel Borkmann <daniel@iogearbox.net>
> To: David Lebrun <dlebrun@google.com>
> Cc: netdev@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Cc: bpf@vger.kernel.org
> ---
>  net/ipv6/seg6.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/net/ipv6/seg6.c b/net/ipv6/seg6.c
> index 1c3ad25700c4c..62a7eb7792026 100644
> --- a/net/ipv6/seg6.c
> +++ b/net/ipv6/seg6.c
> @@ -29,6 +29,9 @@ bool seg6_validate_srh(struct ipv6_sr_hdr *srh, int len, bool reduced)
>  	int max_last_entry;
>  	int trailing;
>  
> +	if (len < sizeof(*srh))
> +		return false;
> +

Thanks for the patch.

Looks good to me.

Reviewed-by: Andrea Mayer <andrea.mayer@uniroma2.it>

On a separate note: the AI review message seems correct. The reported
issue is a separate, pre-existing bug in the BPF SEG6 encap path, not
introduced by this patch.
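
For readers following along, the ordering the patch enforces can be
sketched in userspace: check the length against the fixed header size
first, and only then look at any field. struct srh_fixed is a simplified,
hypothetical mirror of the fixed SRH fields (8 bytes, with the type at
offset 2), and srh_fixed_ok() is not the kernel validator.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SRH_TYPE_4 4	/* routing type used for segment routing */

/* Simplified mirror of the fixed SRH fields; 8 bytes, no padding. */
struct srh_fixed {
	uint8_t nexthdr;
	uint8_t hdrlen;
	uint8_t type;
	uint8_t segments_left;
	uint8_t first_segment;
	uint8_t flags;
	uint16_t tag;
};

static bool srh_fixed_ok(const void *buf, size_t len)
{
	struct srh_fixed srh;

	if (len < sizeof(srh))
		return false;	/* too short: no field is read */

	/* Copy out so the caller's buffer needs no particular alignment. */
	memcpy(&srh, buf, sizeof(srh));

	return srh.type == SRH_TYPE_4;
}
```

With a 2-byte buffer the function returns before touching offset 2, which
is the read the commit message describes.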

Regards,
Andrea

>  	if (srh->type != IPV6_SRCRT_TYPE_4)
>  		return false;
>  
> 
> ---
> base-commit: 96e7f9122aae0ed000ee321f324b812a447906d9
> change-id: 20260619-f01-17-seg6-srh-len-a85f35427e0b
> 
> Best regards,
> --  
> Nuoqi Gui <gnq25@mails.tsinghua.edu.cn>
> 


* [mellanox/mlx5-next RFC 1/1] net/mlx5: RX, Fix refcount warning on frag page release
From: Nabil S. Alramli @ 2026-06-25 17:40 UTC (permalink / raw)
  To: saeedm, tariqt, mbloch, dtatulea
  Cc: dev, nalramli, leon, andrew+netdev, davem, edumazet, kuba, pabeni,
	netdev, linux-rdma, linux-kernel
In-Reply-To: <20260625174059.2879717-1-dev@nalramli.com>

Under memory pressure, the mlx5 driver hits a WARNING during fragmented page
release. This happens because there is a discrepancy between what mlx5
thinks the page fragment counter is and what the page_pool actually says it
is.

The cause of the issue is page allocations on concurrent cpus, which
increment the non-atomic u16 page counter mlx5e_frag_page.frags, while at
the same time the page reference counter net_iov.pp_ref_count is atomically
incremented. That sometimes leads to a difference in the counts and
therefore triggers the warning in page_pool_unref_netmem:

```
	ret = atomic_long_sub_return(nr, pp_ref_count);
	WARN_ON(ret < 0);
```
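
The arithmetic behind a negative result can be sketched as follows. This
is a simplified model, not driver code: it assumes the page starts with
PAGECNT_BIAS_MAX references (taken here to be USHRT_MAX), that every
fragment handed to the stack later drops one reference, and that the
driver drains the bias minus its own counter on release.
refs_after_release() and both parameter names are hypothetical.

```c
#include <limits.h>

/* Assumed bias: references pre-charged on the page by the driver. */
#define PAGECNT_BIAS_MAX USHRT_MAX

/*
 * handed_out: fragments really given to the stack (each drops one ref).
 * counted:    what the driver's frags counter says at release time.
 * Returns the modelled reference count after the driver's drain.
 */
static long refs_after_release(long handed_out, long counted)
{
	long pp_ref_count = PAGECNT_BIAS_MAX;
	long drain_count = PAGECNT_BIAS_MAX - counted;

	pp_ref_count -= handed_out;	/* stack returned every fragment */
	pp_ref_count -= drain_count;	/* driver drains its unused bias */

	return pp_ref_count;
}
```

In this model the result is counted - handed_out, so a single lost
increment (counted one below handed_out) leaves -1, the condition the
WARN_ON above checks for.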

The actual stack trace looks like this:

```
WARNING: CPU: 37 PID: 447795 at include/net/page_pool/helpers.h:277 mlx5e_page_release_fragmented.isra.0+0x51/0x60 [mlx5_core]
Tainted: [S]=CPU_OUT_OF_SPEC, [O]=OOT_MODULE
Hardware name: *
RIP: 0010:mlx5e_page_release_fragmented.isra.0+0x51/0x60 [mlx5_core]
RSP: 0018:ffffc90019814d98 EFLAGS: 00010293
RAX: 000000000000003f RBX: ffff88c0993d0a10 RCX: ffffea02424592c0
RDX: 0000000000000001 RSI: ffffea02424592c0 RDI: ffff88c090e20000
RBP: 000000000000000a R08: 0000000000001409 R09: 0000000000000006
R10: 0000000000000000 R11: ffff88c095fbc040 R12: 000000000000141f
R13: 0000000000000009 R14: ffff88c090e20000 R15: 0000000000000001
FS:  00007f34149fa6c0(0000) GS:ffff89200fa40000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ed0265eb000 CR3: 0000005091cbe000 CR4: 0000000000350ef0
Call Trace:
 <IRQ>
 mlx5e_free_rx_wqes+0x7b/0xa0 [mlx5_core]
 mlx5e_post_rx_wqes+0x1ac/0x5a0 [mlx5_core]
 mlx5e_napi_poll+0x5e5/0x6f0 [mlx5_core]
 __napi_poll+0x2b/0x1a0
 net_rx_action+0x30e/0x370
 ? sched_clock+0x9/0x10
 ? sched_clock_cpu+0xf/0x170
 handle_softirqs+0xe2/0x2a0
 common_interrupt+0x85/0xa0
 </IRQ>
 <TASK>
 asm_common_interrupt+0x26/0x40
RIP: 0010:page_counter_uncharge+0x34/0x90
RSP: 0018:ffffc900e728bb00 EFLAGS: 00000213
RAX: ffff88aff4762000 RBX: ffff88aff4762100 RCX: 0000000000000304
RDX: 0000000000000001 RSI: 00000000004e9e1a RDI: ffff88aff4762100
RBP: 0000000000000001 R08: ffff891ea0560048 R09: 00007ffffffff000
R10: 0000000000001000 R11: ffff891ae8061b00 R12: ffffffffffffffff
R13: ffff89107fcfd4c0 R14: ffff891ae8061b00 R15: ffff892002fe1400
 uncharge_batch+0x40/0xd0
```

The fix is to use an atomic page fragment counter, so it will always match
the number of references held in the page_pool.

Signed-off-by: Nabil S. Alramli <dev@nalramli.com>
Fixes: 6f5742846053 ("net/mlx5e: RX, Enable skb page recycling through the page_pool")
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |  2 +-
 .../net/ethernet/mellanox/mlx5/core/en_main.c |  2 +-
 .../net/ethernet/mellanox/mlx5/core/en_rx.c   | 39 ++++++++++---------
 3 files changed, 22 insertions(+), 21 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 2270e2e550dd..c164106eb85d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -568,7 +568,7 @@ struct mlx5e_icosq {
 
 struct mlx5e_frag_page {
 	netmem_ref netmem;
-	u16 frags;
+	atomic_long_t frags;
 };
 
 enum mlx5e_wqe_frag_flag {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 5a46870c4b74..571a0df9f604 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -400,7 +400,7 @@ static int mlx5e_rq_alloc_mpwqe_linear_info(struct mlx5e_rq *rq, int node,
 	rq->mpwqe.linear_info = li;
 
 	/* Set to max to force allocation on first run. */
-	li->frag_page.frags = li->max_frags;
+	atomic_long_set(&li->frag_page.frags, li->max_frags);
 
 	return 0;
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 5b60aa47c75b..ee360fa0c316 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -284,7 +284,7 @@ static int mlx5e_page_alloc_fragmented(struct page_pool *pp,
 
 	*frag_page = (struct mlx5e_frag_page) {
 		.netmem	= netmem,
-		.frags	= 0,
+		.frags	= ATOMIC_LONG_INIT(0),
 	};
 
 	return 0;
@@ -293,7 +293,7 @@ static int mlx5e_page_alloc_fragmented(struct page_pool *pp,
 static void mlx5e_page_release_fragmented(struct page_pool *pp,
 					  struct mlx5e_frag_page *frag_page)
 {
-	u16 drain_count = MLX5E_PAGECNT_BIAS_MAX - frag_page->frags;
+	u16 drain_count = MLX5E_PAGECNT_BIAS_MAX - atomic_long_read(&frag_page->frags);
 	netmem_ref netmem = frag_page->netmem;
 
 	if (page_pool_unref_netmem(netmem, drain_count) == 0)
@@ -304,7 +304,7 @@ static int mlx5e_mpwqe_linear_page_refill(struct mlx5e_rq *rq)
 {
 	struct mlx5e_mpw_linear_info *li = rq->mpwqe.linear_info;
 
-	if (likely(li->frag_page.frags < li->max_frags))
+	if (likely(atomic_long_read(&li->frag_page.frags) < li->max_frags))
 		return 0;
 
 	if (likely(li->frag_page.netmem)) {
@@ -323,7 +323,8 @@ static void *mlx5e_mpwqe_get_linear_page_frag(struct mlx5e_rq *rq)
 	if (unlikely(mlx5e_mpwqe_linear_page_refill(rq)))
 		return NULL;
 
-	frag_offset = li->frag_page.frags << MLX5E_XDP_LOG_MAX_LINEAR_SZ;
+	frag_offset = atomic_long_read(&li->frag_page.frags) <<
+		      MLX5E_XDP_LOG_MAX_LINEAR_SZ;
 	WARN_ON(frag_offset >= BIT(rq->mpwqe.page_shift));
 
 	return netmem_address(li->frag_page.netmem) + frag_offset;
@@ -568,7 +569,7 @@ mlx5e_add_skb_frag(struct mlx5e_rq *rq, struct sk_buff *skb,
 		return;
 	}
 
-	frag_page->frags++;
+	atomic_long_inc(&frag_page->frags);
 	skb_add_rx_frag_netmem(skb, next_frag, netmem,
 			       frag_offset, len, truesize);
 }
@@ -744,7 +745,7 @@ void mlx5e_mpwqe_dealloc_linear_page(struct mlx5e_rq *rq)
 	 * things in a good state for re-allocation.
 	 */
 	li->frag_page.netmem = 0;
-	li->frag_page.frags = li->max_frags;
+	atomic_long_set(&li->frag_page.frags, li->max_frags);
 }
 
 INDIRECT_CALLABLE_SCOPE bool mlx5e_post_rx_wqes(struct mlx5e_rq *rq)
@@ -1615,7 +1616,7 @@ mlx5e_skb_from_cqe_linear(struct mlx5e_rq *rq, struct mlx5e_wqe_frag_info *wi,
 
 	/* queue up for recycling/reuse */
 	skb_mark_for_recycle(skb);
-	frag_page->frags++;
+	atomic_long_inc(&frag_page->frags);
 
 	return skb;
 }
@@ -1683,7 +1684,7 @@ mlx5e_skb_from_cqe_nonlinear(struct mlx5e_rq *rq, struct mlx5e_wqe_frag_info *wi
 				struct mlx5e_wqe_frag_info *pwi;
 
 				for (pwi = head_wi; pwi < wi; pwi++)
-					pwi->frag_page->frags++;
+					atomic_long_inc(&pwi->frag_page->frags);
 			}
 			return NULL; /* page/packet was consumed by XDP */
 		}
@@ -1702,7 +1703,7 @@ mlx5e_skb_from_cqe_nonlinear(struct mlx5e_rq *rq, struct mlx5e_wqe_frag_info *wi
 		return NULL;
 
 	skb_mark_for_recycle(skb);
-	head_wi->frag_page->frags++;
+	atomic_long_inc(&head_wi->frag_page->frags);
 
 	if (xdp_buff_has_frags(&mxbuf->xdp)) {
 		/* sinfo->nr_frags is reset by build_skb, calculate again. */
@@ -1711,7 +1712,7 @@ mlx5e_skb_from_cqe_nonlinear(struct mlx5e_rq *rq, struct mlx5e_wqe_frag_info *wi
 					  xdp_buff_get_skb_flags(&mxbuf->xdp));
 
 		for (struct mlx5e_wqe_frag_info *pwi = head_wi + 1; pwi < wi; pwi++)
-			pwi->frag_page->frags++;
+			atomic_long_inc(&pwi->frag_page->frags);
 	}
 
 	return skb;
@@ -1760,7 +1761,7 @@ static void mlx5e_handle_rx_cqe(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe)
 	if (!skb) {
 		/* probably for XDP */
 		if (__test_and_clear_bit(MLX5E_RQ_FLAG_XDP_XMIT, rq->flags))
-			wi->frag_page->frags++;
+			atomic_long_inc(&wi->frag_page->frags);
 		goto wq_cyc_pop;
 	}
 
@@ -1808,7 +1809,7 @@ static void mlx5e_handle_rx_cqe_rep(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe)
 	if (!skb) {
 		/* probably for XDP */
 		if (__test_and_clear_bit(MLX5E_RQ_FLAG_XDP_XMIT, rq->flags))
-			wi->frag_page->frags++;
+			atomic_long_inc(&wi->frag_page->frags);
 		goto wq_cyc_pop;
 	}
 
@@ -2011,9 +2012,9 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
 				struct mlx5e_frag_page *pfp;
 
 				for (pfp = head_page; pfp < frag_page; pfp++)
-					pfp->frags++;
+					atomic_long_inc(&pfp->frags);
 
-				linear_page->frags++;
+				atomic_long_inc(&linear_page->frags);
 			}
 			return NULL; /* page/packet was consumed by XDP */
 		}
@@ -2035,7 +2036,7 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
 			return NULL;
 
 		skb_mark_for_recycle(skb);
-		linear_page->frags++;
+		atomic_long_inc(&linear_page->frags);
 
 		if (xdp_buff_has_frags(&mxbuf->xdp)) {
 			struct mlx5e_frag_page *pagep;
@@ -2048,7 +2049,7 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
 
 			pagep = head_page;
 			do
-				pagep->frags++;
+				atomic_long_inc(&pagep->frags);
 			while (++pagep < frag_page);
 
 			headlen = min_t(u16, MLX5E_RX_MAX_HEAD - len,
@@ -2068,7 +2069,7 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
 
 			pagep = frag_page - sinfo->nr_frags;
 			do
-				pagep->frags++;
+				atomic_long_inc(&pagep->frags);
 			while (++pagep < frag_page);
 		}
 		/* copy header */
@@ -2121,7 +2122,7 @@ mlx5e_skb_from_cqe_mpwrq_linear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *wi,
 				 cqe_bcnt, mxbuf);
 		if (mlx5e_xdp_handle(rq, prog, mxbuf)) {
 			if (__test_and_clear_bit(MLX5E_RQ_FLAG_XDP_XMIT, rq->flags))
-				frag_page->frags++;
+				atomic_long_inc(&frag_page->frags);
 			return NULL; /* page/packet was consumed by XDP */
 		}
 
@@ -2136,7 +2137,7 @@ mlx5e_skb_from_cqe_mpwrq_linear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *wi,
 
 	/* queue up for recycling/reuse */
 	skb_mark_for_recycle(skb);
-	frag_page->frags++;
+	atomic_long_inc(&frag_page->frags);
 
 	return skb;
 }
-- 
2.43.0



* [mellanox/mlx5-next RFC 1/1] net/mlx5: RX, Fix refcount warning on frag page release
From: Nabil S. Alramli @ 2026-06-25 17:40 UTC (permalink / raw)
  To: saeedm, tariqt, mbloch, dtatulea
  Cc: dev, nalramli, leon, andrew+netdev, davem, edumazet, kuba, pabeni,
	netdev, linux-rdma, linux-kernel

Hello mlx5 experts,

We have been seeing frequent WARNINGs in the mlx5 driver on frag page
release, and we suspect they are caused by a bug in mlx5. Could you
please review the attached patch and tell us whether our investigation
and assumptions are valid? If so, would it be possible to include this
fix in your next release?

Best Regards,

Nabil S. Alramli (1):
  net/mlx5: RX, Fix refcount warning on frag page release

 drivers/net/ethernet/mellanox/mlx5/core/en.h  |  2 +-
 .../net/ethernet/mellanox/mlx5/core/en_main.c |  2 +-
 .../net/ethernet/mellanox/mlx5/core/en_rx.c   | 39 ++++++++++---------
 3 files changed, 22 insertions(+), 21 deletions(-)

-- 
2.43.0



* [PATCH net v3 0/3] tcp: TCP-AO connect() fixes
From: Dmitry Safonov via B4 Relay @ 2026-06-25 18:21 UTC (permalink / raw)
  To: David Ahern, Eric Dumazet, Neal Cardwell, Kuniyuki Iwashima,
	David S. Miller, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Salam Noureddine
  Cc: Michael Bommarito, Qihang, netdev, linux-kernel, Dmitry Safonov,
	stable, Dmitry Safonov

Resending v3.

I've added credits to Qihang on patch 2, and a third patch/fix
for the static key decrement.

Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
---
Dmitry Safonov (1):
      tcp: Decrement tcp_md5_needed static branch

Michael Bommarito (2):
      tcp: restore RCU grace period in tcp_ao_destroy_sock
      tcp: defer md5sig_info kfree past RCU grace period in tcp_connect

 include/net/tcp_ao.h  | 1 +
 net/ipv4/tcp_ao.c     | 5 +++--
 net/ipv4/tcp_ipv4.c   | 4 ++--
 net/ipv4/tcp_output.c | 8 ++++++--
 4 files changed, 12 insertions(+), 6 deletions(-)
---
base-commit: 02f144fbb4c86c360495d33debe307cb46a57f95
change-id: 20260625-tcp-md5-connect-dc2369d7f414

Best regards,
--  
Dmitry Safonov <0x7f454c46@gmail.com>




* [PATCH net v3 2/3] tcp: defer md5sig_info kfree past RCU grace period in tcp_connect
From: Dmitry Safonov via B4 Relay @ 2026-06-25 18:21 UTC (permalink / raw)
  To: David Ahern, Eric Dumazet, Neal Cardwell, Kuniyuki Iwashima,
	David S. Miller, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Salam Noureddine
  Cc: Michael Bommarito, Qihang, netdev, linux-kernel, Dmitry Safonov,
	stable, Dmitry Safonov
In-Reply-To: <20260625-tcp-md5-connect-v3-0-1fd313d6c1e0@gmail.com>

From: Michael Bommarito <michael.bommarito@gmail.com>

The md5+ao reconciliation in tcp_connect() (net/ipv4/tcp_output.c)
has two symmetric branches:

	if (needs_md5) {
		tcp_ao_destroy_sock(sk, false);
	} else if (needs_ao) {
		tcp_clear_md5_list(sk);
		kfree(rcu_replace_pointer(tp->md5sig_info, NULL, ...));
	}

Both branches free a per-socket auth-info object while the socket is
in TCP_SYN_SENT and is already on the inet ehash (inserted by
inet_hash_connect() in tcp_v4_connect()). Both branches are reachable
by softirq RX-path readers that load the corresponding info pointer
via implicit RCU before bh_lock_sock_nested() is taken.

The needs_md5 branch is fixed in the prior patch by re-introducing
the call_rcu() free in tcp_ao_destroy_sock(): the equivalent per-key
loop runs inside tcp_ao_info_free_rcu(), the RCU callback, so by the
time it frees each tcp_ao_key all softirq readers that captured the
container have already completed rcu_read_unlock().

The needs_ao branch is not symmetric in the same way. The container
free can be deferred via kfree_rcu(md5sig, rcu) -- struct
tcp_md5sig_info already has the required rcu member
(include/net/tcp.h:1999-2002), and the rest of the tree already does
this in the tcp_md5sig_info_add() rollback paths
(net/ipv4/tcp_ipv4.c:1410, 1436). But the per-key teardown is done
by tcp_clear_md5_list() in process context BEFORE the container's
RCU grace period: it walks &md5sig->head and frees each
tcp_md5sig_key with bare hlist_del + kfree. A concurrent softirq
reader in __tcp_md5_do_lookup() / __tcp_md5_do_lookup_exact()
(tcp_ipv4.c:1253, 1298) walks the same list via
hlist_for_each_entry_rcu() and races with that bare kfree on the
keys themselves -- a per-key slab use-after-free of the same class
as the TCP-AO bug, on the same race window.

Fix this in two halves:

  1. Convert the bare kfree() in tcp_connect() to kfree_rcu() so the
     md5sig_info container joins the rest of the md5sig lifecycle.
     The local-variable lift is mechanical and required because
     kfree_rcu() is a macro that expects an lvalue.

  2. Make tcp_clear_md5_list() RCU-safe by replacing hlist_del +
     kfree(key) with hlist_del_rcu + kfree_rcu(key, rcu). struct
     tcp_md5sig_key already carries the rcu member
     (include/net/tcp.h:1995) and tcp_md5_do_del()
     (net/ipv4/tcp_ipv4.c:1456) already uses kfree_rcu, so this
     restores the lifecycle invariant the rest of the file follows
     rather than introducing a one-off.
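
The second half can be pictured with a toy, single-threaded userspace
model. This is only a sketch under loud assumptions: every toy_* name is
hypothetical, a manual flush stands in for the end of the RCU grace
period, and none of it is kernel code. It relies on one property of the
real helpers: hlist_del_rcu(), unlike hlist_del(), leaves the removed
node's forward pointer in place, so a reader already standing on a key
can finish its walk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy model only: a manual flush stands in for the end of a grace period. */
struct toy_key {
	int id;
	struct toy_key *next;		/* list linkage followed by readers */
	struct toy_key *free_next;	/* linkage on the deferred-free queue */
};

static struct toy_key *toy_head;	/* published key list */
static struct toy_key *toy_deferred;	/* unlinked keys awaiting release */

static int toy_add_key(int id)
{
	struct toy_key *key = malloc(sizeof(*key));

	if (!key)
		return -1;
	key->id = id;
	key->free_next = NULL;
	key->next = toy_head;
	toy_head = key;
	return 0;
}

/*
 * Unlink every key but leave key->next intact, so a reader already
 * standing on a key can finish its walk. The memory is queued, not freed.
 */
static void toy_clear_list(void)
{
	struct toy_key *key;

	while ((key = toy_head) != NULL) {
		toy_head = key->next;
		key->free_next = toy_deferred;
		toy_deferred = key;
	}
}

/* Release the queued keys once no reader can still hold them. */
static int toy_flush_deferred(void)
{
	int freed = 0;

	while (toy_deferred) {
		struct toy_key *key = toy_deferred;

		toy_deferred = key->free_next;
		free(key);
		freed++;
	}
	return freed;
}

/* Sum the ids reachable from pos, as a reader mid-walk would visit them. */
static int toy_walk_from(const struct toy_key *pos)
{
	int sum = 0;

	for (; pos; pos = pos->next)
		sum += pos->id;
	return sum;
}

/* Reader captures a position, writer clears the list, reader finishes. */
static int toy_clear_demo(void)
{
	const struct toy_key *reader_pos;

	if (toy_add_key(1) || toy_add_key(2) || toy_add_key(3))
		return -1;
	reader_pos = toy_head;		/* reader stands on the first key */
	toy_clear_list();		/* writer empties the list */
	assert(toy_head == NULL);
	return toy_walk_from(reader_pos);	/* keys are still valid memory */
}
```

In this model toy_clear_demo() lets the reader visit all three keys even
though the list is already empty, and toy_flush_deferred() releases them
afterwards. With a bare free() inside toy_clear_list() the same walk
would touch freed memory, which is the race described above.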

The other caller of tcp_clear_md5_list() is tcp_md5_destruct_sock()
(net/ipv4/tcp.c:412), which runs from the sock destructor when the
socket is already unhashed and unreachable; the extra grace period
there is unnecessary but harmless. Making the helper unconditionally
RCU-safe is the cleaner contract.

The needs_ao branch is not reachable by the userns reproducer used
to demonstrate the AO-side splat (the repro installs both keys but
ends up in the needs_md5 branch because the connect peer matches
the MD5 key, not the AO key); however the symmetric race exists
and a maintainer touching this code should not have to think about
which branch escapes RCU and which one does not.

Fixes: 51e547e8c89c ("tcp: Free TCP-AO/TCP-MD5 info/keys without RCU")
Cc: stable@vger.kernel.org # v6.18+
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
Assisted-by: Claude:claude-opus-4-7
Reviewed-by: Dmitry Safonov <dima@arista.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
[also credits to Qihang, who found that this races with tcp-diag]
Reported-by: Qihang <q.h.hack.winter@gmail.com>
Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
---
 net/ipv4/tcp_ipv4.c   | 4 ++--
 net/ipv4/tcp_output.c | 8 ++++++--
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index ec09f97cc9e6..209ef7522508 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1467,9 +1467,9 @@ void tcp_clear_md5_list(struct sock *sk)
 	md5sig = rcu_dereference_protected(tp->md5sig_info, 1);
 
 	hlist_for_each_entry_safe(key, n, &md5sig->head, node) {
-		hlist_del(&key->node);
+		hlist_del_rcu(&key->node);
 		atomic_sub(sizeof(*key), &sk->sk_omem_alloc);
-		kfree(key);
+		kfree_rcu(key, rcu);
 	}
 }
 
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 00ec4b5900f2..bc03809ca3af 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -4329,9 +4329,13 @@ int tcp_connect(struct sock *sk)
 		if (needs_md5) {
 			tcp_ao_destroy_sock(sk, false);
 		} else if (needs_ao) {
+			struct tcp_md5sig_info *md5sig;
+
 			tcp_clear_md5_list(sk);
-			kfree(rcu_replace_pointer(tp->md5sig_info, NULL,
-						  lockdep_sock_is_held(sk)));
+			md5sig = rcu_replace_pointer(tp->md5sig_info, NULL,
+						     lockdep_sock_is_held(sk));
+			if (md5sig)
+				kfree_rcu(md5sig, rcu);
 		}
 	}
 #endif

-- 
2.51.2




* [PATCH net v3 1/3] tcp: restore RCU grace period in tcp_ao_destroy_sock
From: Dmitry Safonov via B4 Relay @ 2026-06-25 18:21 UTC (permalink / raw)
  To: David Ahern, Eric Dumazet, Neal Cardwell, Kuniyuki Iwashima,
	David S. Miller, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Salam Noureddine
  Cc: Michael Bommarito, Qihang, netdev, linux-kernel, Dmitry Safonov,
	stable, Dmitry Safonov
In-Reply-To: <20260625-tcp-md5-connect-v3-0-1fd313d6c1e0@gmail.com>

From: Michael Bommarito <michael.bommarito@gmail.com>

Commit 51e547e8c89c ("tcp: Free TCP-AO/TCP-MD5 info/keys without RCU")
removed the call_rcu() callback from tcp_ao_destroy_sock(), arguing that
"the destruction of info/keys is delayed until the socket destructor"
and therefore "no one can discover it anymore".

That argument does not hold for the call site in tcp_connect()
(net/ipv4/tcp_output.c:4327-4332). At that point the socket is in
TCP_SYN_SENT, has already been inserted into the inet ehash by
inet_hash_connect() in tcp_v4_connect(), and is therefore very much
discoverable: any softirq running tcp_v4_rcv() on another CPU can take
the socket out of the ehash, walk into tcp_inbound_hash(), and load
tp->ao_info via implicit RCU before bh_lock_sock_nested() is taken on
the destroying CPU.

The reader path then enters __tcp_ao_do_lookup() (net/ipv4/tcp_ao.c:208)
which re-loads tp->ao_info via rcu_dereference_check(); the re-load can
still observe the (about-to-be-freed) pointer because there is no
synchronize_rcu() between rcu_assign_pointer(tp->ao_info, NULL) and
tcp_ao_info_free() in tcp_ao_destroy_sock(). The captured pointer is
then walked at line 223:

	hlist_for_each_entry_rcu(key, &ao->head, node, ...)

The writer's synchronous kfree() is free to complete between the line
218 re-fetch and the line 223 hlist iteration. The slab is reused
(or simply LIST_POISON1-stamped if not yet reused) and the iteration
walks attacker-controlled or poison memory in softirq context.

Reproducer (no debug shim, stock x86_64 v7.1-rc2 SMP+KASAN, QEMU+KVM):
an unprivileged uid=1000 process inside CLONE_NEWUSER|CLONE_NEWNET
installs TCP_MD5SIG + TCP_AO_ADD_KEY on a TCP socket, sprays forged
TCP-AO segments toward its eventual 4-tuple via raw sockets, then
calls connect(). The md5-wins reconciliation in tcp_connect() fires
tcp_ao_destroy_sock(); the softirq backlog reader on the loopback
NAPI path crashes on the freed ao->head.first walk:

  Oops: general protection fault, probably for non-canonical
    address 0xfbd59c000000002f
  KASAN: maybe wild-memory-access in range
    [0xdead000000000178-0xdead00000000017f]
  CPU: 0 UID: 1000 PID: 100 Comm: repro_userns
  RIP: 0010:__tcp_ao_do_lookup+0x107/0x1c0
  Call Trace: <IRQ>
    __tcp_ao_do_lookup+0x107/0x1c0
    tcp_ao_inbound_lookup.constprop.0+0x12a/0x200
    tcp_inbound_ao_hash+0x5ea/0x1520
    tcp_inbound_hash+0x7ce/0x1240
    tcp_v4_rcv+0x1e7a/0x3e10
    ...

Restore the RCU grace period: re-add struct rcu_head to tcp_ao_info
and replace the synchronous tcp_ao_info_free() with a call_rcu()
callback. Readers that captured tp->ao_info before rcu_assign_pointer
NULLed it now see the object remain valid until rcu_read_unlock().
With the patch applied the reproducer runs cleanly for 2000 iterations
on the same kernel build.
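
The ordering this patch restores can be pictured with a toy,
single-threaded userspace model. This is only a sketch under loud
assumptions: every toy_* name is hypothetical, a plain reader counter
stands in for the RCU grace period (it waits for all readers, which is
more conservative than real RCU), and none of it is the kernel
implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy model only: a reader counter stands in for the RCU grace period. */
struct toy_rcu_head {
	struct toy_rcu_head *next;
	void (*func)(struct toy_rcu_head *head);
};

struct toy_ao_info {
	int nkeys;
	struct toy_rcu_head rcu;	/* plays the role of the re-added rcu member */
};

static int toy_readers;			/* readers inside a read-side section */
static struct toy_rcu_head *toy_pending;	/* callbacks waiting for readers */
static int toy_freed;			/* objects released so far */
static int toy_freed_while_reading;	/* toy_freed as seen by the reader */

static void toy_run_callbacks(void)
{
	while (toy_pending) {
		struct toy_rcu_head *head = toy_pending;

		toy_pending = head->next;
		head->func(head);
	}
}

static void toy_read_lock(void)
{
	toy_readers++;
}

static void toy_read_unlock(void)
{
	assert(toy_readers > 0);
	if (--toy_readers == 0)
		toy_run_callbacks();	/* last reader gone: "grace period" over */
}

/* Queue the free instead of doing it synchronously. */
static void toy_call_rcu(struct toy_rcu_head *head,
			 void (*func)(struct toy_rcu_head *head))
{
	head->func = func;
	head->next = toy_pending;
	toy_pending = head;
	if (toy_readers == 0)
		toy_run_callbacks();	/* simplification: no readers, run at once */
}

static void toy_ao_info_free_rcu(struct toy_rcu_head *head)
{
	/* Open-coded container_of(): step back from the member to the object. */
	struct toy_ao_info *ao = (struct toy_ao_info *)
		((char *)head - offsetof(struct toy_ao_info, rcu));

	free(ao);
	toy_freed++;
}

/* A reader captures the pointer; the writer unpublishes and defers the free. */
static int toy_race_demo(void)
{
	struct toy_ao_info *published = malloc(sizeof(*published));
	struct toy_ao_info *captured;
	int seen;

	if (!published)
		return -1;
	published->nkeys = 3;

	toy_read_lock();
	captured = published;		/* reader loads the pointer */
	published = NULL;		/* writer clears it ... */
	toy_call_rcu(&captured->rcu, toy_ao_info_free_rcu);	/* ... defers the free */
	toy_freed_while_reading = toy_freed;
	seen = captured->nkeys;		/* still valid: the free has not run yet */
	toy_read_unlock();		/* the callback runs here */

	return seen;
}
```

In this model the reader still sees nkeys == 3 after the writer has
cleared the pointer, and the object is released only once the reader
leaves its read-side section. Replacing toy_call_rcu() with a direct
free() would turn the captured->nkeys load into a use-after-free, which
is the shape of the bug described above.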

Fixes: 51e547e8c89c ("tcp: Free TCP-AO/TCP-MD5 info/keys without RCU")
Cc: stable@vger.kernel.org # v6.18+
Reviewed-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
Assisted-by: Claude:claude-opus-4-7
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
---
 include/net/tcp_ao.h | 1 +
 net/ipv4/tcp_ao.c    | 5 +++--
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/include/net/tcp_ao.h b/include/net/tcp_ao.h
index 29fd7b735afa..9a2333e62e99 100644
--- a/include/net/tcp_ao.h
+++ b/include/net/tcp_ao.h
@@ -145,6 +145,7 @@ struct tcp_ao_info {
 	u32			snd_sne;
 	u32			rcv_sne;
 	refcount_t		refcnt;		/* Protects twsk destruction */
+	struct rcu_head		rcu;
 };
 
 #ifdef CONFIG_TCP_MD5SIG
diff --git a/net/ipv4/tcp_ao.c b/net/ipv4/tcp_ao.c
index a56bb79e15e0..e4ec60a33496 100644
--- a/net/ipv4/tcp_ao.c
+++ b/net/ipv4/tcp_ao.c
@@ -371,8 +371,9 @@ static void tcp_ao_key_free_rcu(struct rcu_head *head)
 	kfree_sensitive(key);
 }
 
-static void tcp_ao_info_free(struct tcp_ao_info *ao)
+static void tcp_ao_info_free_rcu(struct rcu_head *head)
 {
+	struct tcp_ao_info *ao = container_of(head, struct tcp_ao_info, rcu);
 	struct tcp_ao_key *key;
 	struct hlist_node *n;
 
@@ -411,7 +412,7 @@ void tcp_ao_destroy_sock(struct sock *sk, bool twsk)
 
 	if (!twsk)
 		tcp_ao_sk_omem_free(sk, ao);
-	tcp_ao_info_free(ao);
+	call_rcu(&ao->rcu, tcp_ao_info_free_rcu);
 }
 
 void tcp_ao_time_wait(struct tcp_timewait_sock *tcptw, struct tcp_sock *tp)

-- 
2.51.2




* [PATCH net v3 3/3] tcp: Decrement tcp_md5_needed static branch
From: Dmitry Safonov via B4 Relay @ 2026-06-25 18:21 UTC (permalink / raw)
  To: David Ahern, Eric Dumazet, Neal Cardwell, Kuniyuki Iwashima,
	David S. Miller, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Salam Noureddine
  Cc: Michael Bommarito, Qihang, netdev, linux-kernel, Dmitry Safonov,
	stable
In-Reply-To: <20260625-tcp-md5-connect-v3-0-1fd313d6c1e0@gmail.com>

From: Dmitry Safonov <0x7f454c46@gmail.com>

When an unwanted TCP-MD5 key is freed early on TCP-AO connect(),
md5sig_info is freed right away (and set to NULL). Later, at
socket destruction, the static branch counter is therefore
never decremented.

Add the missing decrement for the TCP-MD5 static branch.

Reported-by: Qihang <q.h.hack.winter@gmail.com>
Fixes: 0aadc73995d0 ("net/tcp: Prevent TCP-MD5 with TCP-AO being set")
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
---
 net/ipv4/tcp_output.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index bc03809ca3af..d7c1444b5e30 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -4334,8 +4334,8 @@ int tcp_connect(struct sock *sk)
 			tcp_clear_md5_list(sk);
 			md5sig = rcu_replace_pointer(tp->md5sig_info, NULL,
 						     lockdep_sock_is_held(sk));
-			if (md5sig)
-				kfree_rcu(md5sig, rcu);
+			kfree_rcu(md5sig, rcu);
+			static_branch_slow_dec_deferred(&tcp_md5_needed);
 		}
 	}
 #endif

-- 
2.51.2




* [PATCH] selftests: Open /dev/udmabuf O_RDONLY
From: T.J. Mercier @ 2026-06-25 18:15 UTC (permalink / raw)
  To: kraxel, vivek.kasireddy, Shuah Khan, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: T.J. Mercier, linux-kselftest, linux-kernel, netdev, bpf

Write permissions on the /dev/udmabuf device file are not required to
issue ioctls and allocate udmabufs. Applications should be opening this
file as O_RDONLY. The BPF dmabuf_iter selftest already does this. [1]

Remove the write access mode from the drivers/dma-buf/udmabuf.c and
drivers/net/hw/ncdevmem.c selftests.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/testing/selftests/bpf/prog_tests/dmabuf_iter.c?h=v7.1#n49
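
As a rough illustration of the access-mode point, the sketch below opens
a device node without write access and confirms the mode attached to the
descriptor. The helper names are hypothetical and the udmabuf ioctl
itself is deliberately left out; this only shows the open() side, not
how either selftest is written.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Open a device node without requesting write access. */
static int open_dev_readonly(const char *path)
{
	assert(path != NULL);
	return open(path, O_RDONLY);
}

/* 1 if fd was opened read-only, 0 if not, -1 if the flags cannot be read. */
static int fd_is_readonly(int fd)
{
	int flags = fcntl(fd, F_GETFL);

	if (flags < 0)
		return -1;
	return (flags & O_ACCMODE) == O_RDONLY;
}

/* Open path read-only and report the resulting access mode. */
static int path_opens_readonly(const char *path)
{
	int fd = open_dev_readonly(path);
	int ret;

	if (fd < 0)
		return -1;
	ret = fd_is_readonly(fd);
	close(fd);
	return ret;
}
```

Calling path_opens_readonly("/dev/udmabuf") should return 1 where the
device exists and is readable by the caller, and -1 where it is absent,
which corresponds to the skip path in both selftests.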

Signed-off-by: T.J. Mercier <tjmercier@google.com>
---
 tools/testing/selftests/drivers/dma-buf/udmabuf.c | 2 +-
 tools/testing/selftests/drivers/net/hw/ncdevmem.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/drivers/dma-buf/udmabuf.c b/tools/testing/selftests/drivers/dma-buf/udmabuf.c
index d78aec662586..ced0b95c876c 100644
--- a/tools/testing/selftests/drivers/dma-buf/udmabuf.c
+++ b/tools/testing/selftests/drivers/dma-buf/udmabuf.c
@@ -140,7 +140,7 @@ int main(int argc, char *argv[])
 	ksft_print_header();
 	ksft_set_plan(7);
 
-	devfd = open("/dev/udmabuf", O_RDWR);
+	devfd = open("/dev/udmabuf", O_RDONLY);
 	if (devfd < 0) {
 		ksft_print_msg(
 			"%s: [skip,no-udmabuf: Unable to access DMA buffer device file]\n",
diff --git a/tools/testing/selftests/drivers/net/hw/ncdevmem.c b/tools/testing/selftests/drivers/net/hw/ncdevmem.c
index e098d6534c3c..8114a29692fd 100644
--- a/tools/testing/selftests/drivers/net/hw/ncdevmem.c
+++ b/tools/testing/selftests/drivers/net/hw/ncdevmem.c
@@ -149,7 +149,7 @@ static struct memory_buffer *udmabuf_alloc(size_t size)
 
 	ctx->size = size;
 
-	ctx->devfd = open("/dev/udmabuf", O_RDWR);
+	ctx->devfd = open("/dev/udmabuf", O_RDONLY);
 	if (ctx->devfd < 0) {
 		pr_err("[skip,no-udmabuf: Unable to access DMA buffer device file]");
 		goto err_free_ctx;
-- 
2.55.0.rc0.799.gd6f94ed593-goog



* Re: [PATCH net v2] sctp: fix SCTP_RESET_STREAMS stream list length limit
From: Yousef Alhouseen @ 2026-06-25 18:14 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Marcelo Ricardo Leitner, Xin Long, David S. Miller, Eric Dumazet,
	Paolo Abeni, Simon Horman, linux-sctp, netdev, linux-kernel
In-Reply-To: <20260625081916.77a017f3@kernel.org>

Hi Jakub,

Understood, sorry for the extra mail. I will avoid reposting
networking patches only to add tags.

Thanks,
Yousef

On Thu, 25 Jun 2026 08:19:16 -0700, Jakub Kicinski <kuba@kernel.org> wrote:
> On Thu, 25 Jun 2026 16:23:54 +0200 Yousef Alhouseen wrote:
> > Changes in v2:
> > - Add Fixes and Acked-by tags from Xin Long.
> > - v1: https://lore.kernel.org/r/20260624122213.4052-1-alhouseenyousef@gmail.com
>
> You don't have to repost patches for networking just to add tags :/


* Re: [PATCH bpf-next v2] bpf, unix: Guard sk_msg-dependent code behind CONFIG_NET_SOCK_MSG
From: Jakub Sitnicki @ 2026-06-25 17:53 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Jiayuan Chen, Amery Hung, Kuniyuki Iwashima, bpf,
	Alexei Starovoitov, Daniel Borkmann, Jakub Kicinski,
	John Fastabend, Network Development, kernel-team
In-Reply-To: <DJHKW8E1F6PI.P3WUIG9DZE1K@gmail.com>

On Wed, Jun 24, 2026 at 01:57 PM -07, Alexei Starovoitov wrote:
> On Tue Jun 23, 2026 at 6:32 PM PDT, Jiayuan Chen wrote:
>>
>> Hi Alexei and Jakub,
>>
>> skmsg is actually still pretty useful for gateways.
>> I started with bpf by integrating skmsg into nginx as a module and
>> envoy has something similar.
>> The usual setup is cgroup/sk for L4 bypass (reject SYN), and skmsg for
>> L7, redirecting between local apps by looking at the payload. So there
>> are real users.
>
> ...
>
>> Agreed, just like we removed skmsg from KTLS, which is rarely used.
>
> ...
>
>> I hope skmsg will not be disabled by default.
>
> I wasn't suggesting to delete the whole skmsg,
> but to disable combinations that are causing issues.
> Like what was done for skmsg and ktls.
> I'd allow plain tcp and udp sockets only.
> Allowing unix sockets was fishy. I think we should reject it too.

For unix & vsock we know Bytedance built a proxy using it.
We've been showcasing it as one of sockmap use cases [1].
That said, I don't know if it's still being used or not.

If we don't want to go through the config-knob-then-deprecate process,
then I guess the only option is to kill it and see if anyone complains.

[1] Slide 117, https://github.com/sockmap-project/sockmap-project/blob/810d259af6e7a5793922af3991c9dc7ff502fe19/talks/2024-09%20-%20NDC%20TechTown%20-%20Splicing%20Sockets%20with%20SOCKMAP.pdf


* [GIT PULL] Networking for v7.2-rc1
From: Jakub Kicinski @ 2026-06-25 17:45 UTC (permalink / raw)
  To: torvalds; +Cc: kuba, davem, netdev, linux-kernel, pabeni

Hi Linus!

The following changes since commit b85966adbf5de0668a815c6e3527f87e0c387fb4:

  Merge tag 'net-next-7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next (2026-06-17 08:17:00 +0100)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git tags/net-7.2-rc1

for you to fetch changes up to fe9f4ee6c61a1410afd73bf011de5ae618004796:

  Merge branch 'net-avoid-nested-up-notifier-events' (2026-06-25 10:18:41 -0700)

----------------------------------------------------------------
Including fixes from netfilter and IPsec.

Current release - regressions:

 - net: do not acquire dev->tx_global_lock in netdev_watchdog_up()

 - net: ethtool: keep rtnl_lock for ops using ethtool_op_get_link()

 - net: fix deadlock in nested UP notifier events

Current release - new code bugs:

 - eth: cn20k: fix subbank free list indexing for search order

 - eth: airoha: fix BQL underflow in shared QDMA TX ring

Previous releases - regressions:

 - netfilter:
   - flowtable: fix offloaded ct timeout never being extended
   - nf_conncount: prevent connlimit drops for early confirmed ct

Previous releases - always broken:

 - require CAP_NET_ADMIN in the originating netns when modifying
   cross-netns devices

 - report NAPI thread PID in the caller's pid namespace

 - mac802154: fix dirty frag in in-place crypto for IOT radios

 - sctp: hold socket lock when dumping endpoints in sctp_diag,
   avoid an overflow

 - eth: gve: fix header buffer corruption with header-split and HW-GRO

 - af_key: initialize alg_key_len for IPComp states, prevent OOB read

Signed-off-by: Jakub Kicinski <kuba@kernel.org>

----------------------------------------------------------------
Abdun Nihaal (1):
      bnx2x: fix potential memory leak in bnx2x_alloc_mem_bp()

Adrian Bente (1):
      netfilter: flowtable: fix offloaded ct timeout never being extended

Aleksandr Nogikh (1):
      ieee802154: fix kernel-infoleak in dgram_recvmsg()

Aleksandrova Alyona (1):
      net: dsa: sja1105: round up PTP perout pin duration

Ankit Garg (1):
      gve: fix header buffer corruption with header-split and HW-GRO

Arnd Bergmann (1):
      eth: mlx5: fix macsec dependency

Breno Leitao (1):
      netconsole: don't drop the last byte of a full-sized message

Christian Marangi (1):
      net: ethernet: mtk_eth_soc: fix supported_interface set after phylink_create

Cosmin Ratiu (2):
      devlink: Fix parent ref leak in devl_rate_node_create()
      devlink: Fix parent ref leak on tc-bw failure

Daniel Golle (2):
      net: dsa: mxl862xx: avoid unaligned 16-bit access in api_wrap
      net: dsa: mxl862xx: fix use-after-free of DSA ports in crc_err_work

Daniel Zahka (1):
      eth: fbnic: take netif_addr_lock_bh() around rx mode address programming

David Howells (10):
      rxrpc: Fix leak of connection from OOB challenge
      rxrpc: Fix double unlock in rxrpc_recvmsg()
      afs: Fix further netns teardown to cancel the preallocation charger
      afs: Fix uncancelled rxrpc OOB message handler
      rxrpc: Fix the reception of a reply packet before data transmission
      rxrpc: Fix oob challenge leak in cleanup after notification failure
      rxrpc: Fix potential infinite loop in rxrpc_recvmsg()
      rxrpc: Fix socket notification race
      rxrpc: Fix leak of released call in recvmsg(MSG_PEEK)
      rxrpc: Fix rxrpc_rotate_tx_rotate() to check there's something to rotate

David Yang (1):
      net: dsa: realtek: fix memory leak in rtl8366rb_setup_led()

Dawei Feng (1):
      net: ena: clean up XDP TX queues when regular TX setup fails

Dawid Osuchowski (1):
      ice: fix FDIR CTRL VSI resource leak in ice_reset_all_vfs()

Dima Ruinskiy (1):
      e1000e: Reconfigure PLL clock gate timeout and re-enable K1 on Meteor Lake

Dong Chenchen (1):
      xfrm: Fix dev use-after-free in xfrm async resumption

Doruk Tan Ozturk (2):
      tipc: fix slab-use-after-free Read in tipc_aead_decrypt_done
      mac802154: llsec: add skb_cow_data() before in-place crypto

Eric Dumazet (7):
      xfrm: annotate data-races around xfrm_policy_count[] and xfrm_policy_default[]
      xfrm: validate selector family and prefixlen during match
      net: do not acquire dev->tx_global_lock in netdev_watchdog_up()
      veth: fix NAPI leak in XDP enable error path
      net: udp_tunnel: prevent double queueing in udp_tunnel_nic_device_sync
      tipc: fix UAF in cleanup_bearer() due to premature dst_cache_destroy()
      tipc: avoid busy looping in tipc_exit_net()

Erni Sri Satya Vennela (1):
      net: mana: Fall back to standard MTU when PF reports adapter_mtu of 0

Fan Wu (1):
      hdlc_ppp: sync per-proto timers before freeing hdlc state

Fernando Fernandez Mancera (7):
      netfilter: nf_conncount: prevent connlimit drops for early confirmed ct
      ipv6: fix error handling in disable_ipv6 sysctl
      ipv6: fix error handling in ignore_routes_with_linkdown sysctl
      ipv6: fix error handling in forwarding sysctl
      ipv6: fix error handling in disable_policy sysctl
      ipv6: fix state corruption during proxy_ndp sysctl restart
      ipv6: fix missing notification for ignore_routes_with_linkdown

Florian Westphal (9):
      netfilter: nft_payload: reject offsets exceeding 65535 bytes
      netfilter: nft_meta_bridge: add validate callback for get operations
      netfilter: nft_flow_offload: zero device address for non-ether case
      netfilter: nf_reject: skip iphdr options when looking for icmp header
      netfilter: nft_meta_bridge: fix NFT_META_BRI_IIFPVID stack leak
      netfilter: nft_compat: ebtables emulation must reject non-bridge targets
      selftests: nft_queue.sh: add a bridge queue test
      netfilter: conntrack: add deprecation warnings for irc and pptp trackers
      netfilter: nft_ct: expectation timeouts are passed in milliseconds

Geetha sowjanya (2):
      octeontx2-af: mcs: Fix unsupported secy stats read
      octeontx2-pf: mcs: Fix mcs resources free on PF shutdown

Greg Thelen (1):
      tools: ynl: build archives with $(AR)

HanQuan (1):
      net/tcp-ao: fix use-after-free of key in del_async path

Haoxiang Li (4):
      net: ixp4xx_hss: fix duplicate HDLC netdev allocation
      net: wwan: t7xx: destroy DMA pool on CLDMA late init failure
      octeontx2-af: Free BPID bitmap on setup failure
      net: sparx5: unregister blocking notifier on init failure

Haoze Xie (1):
      netfilter: nf_queue: pin bridge device while NFQUEUE holds fake dst

Herbert Xu (1):
      xfrm: Fix xfrm state cache insertion race

Ido Schimmel (1):
      selftests: vlan_bridge_binding: Fix flaky operational state check

Ilya Maximets (1):
      net: dst_metadata: fix false-positive memcpy overflow in tun_dst_unclone

Inochi Amaoto (2):
      net: stmmac: dwmac-spacemit: Fix wrong phy interface definition
      net: stmmac: dwmac-spacemit: Fix wrong irq definition

Ioana Ciornei (2):
      dpaa2-switch: fix VLAN upper check not rejecting bridge join
      dpaa2-switch: do not accept VLAN uppers while bridged

Ivan Abramov (3):
      ieee802154: Restore initial state on failed device_rename() in cfg802154_switch_netns()
      ieee802154: Avoid calling WARN_ON() on -ENOMEM in cfg802154_switch_netns()
      ieee802154: Remove WARN_ON() in cfg802154_pernet_exit()

Jakub Kicinski (27):
      Merge branch 'net-require-cap_net_admin-in-the-device-netns-for-tunnel-changelink'
      net: psample: fix info leak in PSAMPLE_ATTR_DATA
      Merge branch 'net-sched-act_ct-preserve-tc_skb_cb-across-defragmentation'
      Merge branch 'devlink-fix-a-couple-parent-ref-leaks'
      Merge tag 'batadv-net-pullrequest-20260619' of https://git.open-mesh.org/batadv
      Merge tag 'ieee802154-for-net-next-2026-06-20' of git://git.kernel.org/pub/scm/linux/kernel/git/wpan/wpan-next
      Merge branch 'ipv4-ipv6-account-for-fraggap-on-paged-allocation-paths'
      Merge tag 'nf-26-06-21' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
      eth: bnxt: improve the timing of stats
      Merge branch 'selftests-xsk-stabilize-timeout-test-behavior'
      Merge tag 'ipsec-2026-06-22' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec
      Merge branch 'drop-skb-metadata-before-lwt-encapsulation'
      Merge branch 'ipv6-fix-error-handling-in-disable_ipv6-sysctl'
      eth: fbnic: fix ordering of heartbeat vs ownership
      Merge branch 'airoha-fixes-for-sched-htb-offload-support'
      Merge branch 'net-stmmac-dwmac-spacemit-fix-wrong-macro-definition'
      Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
      Merge tag 'nf-26-06-23' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
      Merge branch 'tipc-syzbot-related-fixes'
      Merge branch 'net-hns3-fix-configuration-deadlocks-and-refactor-link-setup'
      Merge branch 'rxrpc-miscellaneous-fixes'
      net: ethtool: keep rtnl_lock for ops using ethtool_op_get_link()
      net: turn the rx_mode work into a generic netdev_work facility
      net: add the driver-facing netdev_work scheduling API
      vlan: defer real device state propagation to netdev_work
      selftests: bonding: add a test for VLAN propagation over a bonded real device
      Merge branch 'net-avoid-nested-up-notifier-events'

Jakub Sitnicki (2):
      net: lwtunnel: Drop skb metadata before LWT encapsulation
      selftests/bpf: Add LWT encap tests for skb metadata

Jamal Hadi Salim (1):
      net/sched: cls_api: Handle TC_ACT_CONSUMED in tcf_qevent_handle

Jan Klos (1):
      net: phy: realtek: Clear MDIO_AN_10GBT_CTRL_ADV10G bit

Jiayuan Chen (1):
      ipv6: ioam: fix type confusion of dst_entry

Jozsef Kadlecsik (4):
      netfilter: ipset: Don't use test_bit() in lockless RCU readers in hash types
      netfilter: ipset: Don't use test_bit() in lockless RCU readers in bitmap types
      netfilter: ipset: fix order of kfree_rcu() and rcu_assign_pointer()
      netfilter: ipset: make sure gc is properly stopped

Junrui Luo (1):
      octeontx2-af: cn10k: restrict VF LMTLINE sharing to its own PF

Krzysztof Kozlowski (1):
      net: ethernet: qualcomm: ppe: Demote from supported and fix maintainer addresses

Kuniyuki Iwashima (1):
      ipv4: fib: Don't ignore error route in local/main tables.

Li RongQing (1):
      net/mlx5: Remove broken and unused mlx5_query_mtppse()

Lorenzo Bianconi (5):
      netfilter: flowtable: fix and simplify IP6IP6 tunnel handling
      netfilter: flowtable: Validate iph->ihl in nf_flow_ip4_tunnel_proto()
      net: airoha: Fix off-by-one in airoha_tc_remove_htb_queue()
      net: airoha: fix netif_set_real_num_tx_queues for sparse QoS channels
      net: airoha: fix BQL underflow in shared QDMA TX ring

Lukasz Czapnik (1):
      ice: fix AQ error code comparison in ice_set_pauseparam()

Lukasz Raczylo (1):
      net: macb: add TX stall timeout callback to recover from lost TSTART write

Maoyi Xie (11):
      net: ip_gre: require CAP_NET_ADMIN in the device netns for changelink
      net: ipip: require CAP_NET_ADMIN in the device netns for changelink
      net: ip_vti: require CAP_NET_ADMIN in the device netns for changelink
      net: ip6_tunnel: require CAP_NET_ADMIN in the device netns for changelink
      net: ip6_gre: require CAP_NET_ADMIN in the device netns for changelink
      net: ip6_vti: require CAP_NET_ADMIN in the device netns for changelink
      xfrm: xfrm_interface: require CAP_NET_ADMIN in the device netns for changelink
      netdev-genl: report NAPI thread PID in the caller's pid namespace
      net: thunderbolt: Fix frags[] overflow by bounding frame_count
      net: sit: require CAP_NET_ADMIN in the device netns for changelink
      net: usb: kalmia: bound RX frame length in kalmia_rx_fixup()

Marcin Szycik (1):
      ice: call netif_keep_dst() once when entering switchdev mode

Mathias Krause (1):
      netfilter: nf_nat: avoid invalid nat_net pointer use on failed nf_nat_init()

Meghana Malladi (1):
      net: ti: icssg: Fix XSK zero copy TX during application wakeup

Michael Bommarito (4):
      net: rds: check cmsg_len before reading rds_rdma_args in size pass
      ieee802154: admin-gate legacy LLSEC dump operations
      ieee802154: allow legacy LLSEC ADD/DEL ops to pass strict validation
      net/sched: act_ct: fix nf_connlabels leak on two error paths

Mohamed Khalfella (1):
      i40e: Fix i40e_debug() to use struct i40e_hw argument

Nicolai Buchwitz (1):
      net: usb: lan78xx: restore VLAN and hash filters after link up

Nirmoy Das (1):
      selftests: tls: size splice_short pipe by page size

Pablo Neira Ayuso (5):
      netfilter: nf_conntrack_expect: use conntrack GC to reap expectations
      netfilter: ctnetlink: do not allow to reset helper on existing conntrack
      netfilter: nf_conntrack_expect: store master_tuple in expectation
      netfilter: nf_conntrack_expect: run expectation eviction with no helper
      netfilter: nf_conntrack_helper: cap maximum number of expectation at helper registration

Paul Greenwalt (1):
      ice: fix ice_init_link() error return preventing probe

Pengpeng Hou (1):
      net: ehea: unwind probe_port sysfs file on failure

Philippe Schenker (1):
      net: ethernet: ti: icssg: guard PA stat lookups

Randy Dunlap (1):
      netfilter: x_tables.h: fix all kernel-doc warnings

Ratheesh Kannoth (5):
      octeontx2-pf: Fix leak of SQ timestamp buffer on teardown
      octeontx2-af: npc: Log successful MCAM drop-on-non-hit install at debug level
      octeontx2-af: npc: cn20k: fix NPC defrag
      octeontx2-af: npc: cn20k: Fix subbank free list indexing for search order
      octeontx2-af: fix CGX debugfs RVU AF PCI reference leaks

Rob Herring (Arm) (1):
      dt-bindings: net: renesas,ether: Drop example "ethernet-phy-ieee802.3-c22" fallback

Robert Marko (1):
      net: pse-pd: set user byte command SUB2 field

Robertus Diawan Chris (1):
      mac802154: Prevent overwrite return code in mac802154_perform_association()

Rongguang Wei (1):
      net: wangxun: don't advertise IFF_SUPP_NOFCS

Rosen Penev (1):
      net: emac: Fix NULL pointer dereference in emac_probe

Ross Porter (1):
      selftests: net: fix file owner for broadcast_ether_dst test

Runyu Xiao (3):
      netfilter: nft_synproxy: stop bypassing the priv->info snapshot
      net: au1000: move free_irq out of the close-time spinlocked section
      openvswitch: conntrack: annotate ct limit hlist traversal

Ruoyu Wang (3):
      net: pch_gbe: handle TX skb allocation failure
      net: marvell: prestera: initialize err in prestera_port_sfp_bind
      net: sungem: fix probe error cleanup

Sabrina Dubroca (1):
      espintcp: use sk_msg_free_partial to fix partial send

Sanman Pradhan (1):
      xfrm: use compat translator only for u64 alignment mismatch

Shitalkumar Gandhi (3):
      ieee802154: ca8210: fix cas_ctl leak on spi_async failure
      ieee802154: ca8210: fix pointer truncation in kfifo on 64-bit
      net: ethernet: sunplus: spl2sw: fix phy_node refcount leak in remove

Shradha Gupta (1):
      net: mana: Optimize irq affinity for low vcpu configs

Shuaisong Yang (4):
      net: hns3: unify copper port ksettings configuration path
      net: hns3: refactor MAC autoneg and speed configuration
      net: hns3: fix permanent link down deadlock after reset
      net: hns3: differentiate autoneg default values between copper and fiber

Subbaraya Sundeep (2):
      octeontx2-pf: Clear stats of all resources when freeing resources
      octeontx2-af: Validate NIX maximum LFs correctly

Sven Eckelmann (15):
      batman-adv: gw: don't deselect gateway with active hardif
      batman-adv: ensure bcast is writable before modifying TTL
      batman-adv: fix (m|b)cast csum after decrementing TTL
      batman-adv: frag: ensure fragment is writable before modifying TTL
      batman-adv: frag: avoid underflow of TTL
      batman-adv: v: prevent OGM aggregation on disabled hardif
      batman-adv: tp_meter: restrict number of unacked list entries
      batman-adv: tp_meter: annotate last_recv_time access with READ/WRITE_ONCE
      batman-adv: tp_meter: prevent parallel modifications of last_recv
      batman-adv: tp_meter: handle overlapping packets
      batman-adv: tt: don't merge change entries with different VIDs
      batman-adv: tt: track roam count per VID
      batman-adv: dat: prevent false sharing between VLANs
      batman-adv: tvlv: enforce 2-byte alignment
      batman-adv: tvlv: avoid race of cifsnotfound handler state

Thorsten Leemhuis (1):
      tools/ynl: add missing uapi header deps in Makefile.deps

Tushar Vyavahare (3):
      selftests/xsk: make poll timeout mode explicit
      selftests/xsk: fix timeout thread harness sequencing
      selftests/xsk: restore shared_umem after POLL_TXQ_FULL

Wayen Yan (4):
      net: airoha: fix foe_check_time allocation size
      net: ethernet: mtk_ppe: Fix rhashtable leak in mtk_ppe_init error paths
      net: airoha: Fix skb->priority underflow in airoha_dev_select_queue()
      net: airoha: Fix TX scheduler queue mask loop upper bound

Wei Fang (1):
      net: enetc: fix potential divide-by-zero when num_vsi is zero

Weiming Shi (2):
      tipc: fix use-after-free of the discoverer in tipc_disc_rcv()
      ipv6: ndisc: fix NULL deref in accept_untracked_na()

Wells Lu (1):
      MAINTAINERS: Orphan SUNPLUS ETHERNET DRIVER

Wentao Guan (1):
      net: llc: make empty have static storage duration

Willem de Bruijn (1):
      selftests: drv-net: so_txtime: relax variance bounds

Wongi Lee (2):
      ipv4: account for fraggap on the paged allocation path
      ipv6: account for fraggap on the paged allocation path

Wyatt Feng (3):
      net: ipv4: bound TCP reordering sysctl writes and MTU probe sizes
      netfilter: xt_cluster: reject template conntracks in hash match
      rxrpc: Fix ACKALL packet handling

Xiang Mei (5):
      virtio-net: fix len check in receive_big()
      ipv6: Fix null-ptr-deref in fib6_nh_mtu_change().
      net, bpf: check master for NULL in xdp_master_redirect()
      geneve: gate GRO hint in geneve_gro_complete() on gs->gro_hint
      geneve: validate inner network offset in geneve_gro_complete()

Xin Long (2):
      sctp: hold socket lock when dumping endpoints in sctp_diag
      sctp: fix err_chunk memory leaks in INIT handling

Xingquan Liu (2):
      net/sched: dualpi2: fix GSO backlog accounting
      selftests/tc-testing: Add DualPI2 GSO backlog accounting test

Yi Chen (1):
      selftests: netfilter: conntrack_sctp_collision.sh: Introduce SCTP INIT collision test

Yun Zhou (2):
      flow_dissector: check device type before reading ETH_ADDRS
      net: mvneta: re-enable percpu interrupt on resume

ZhaoJinming (2):
      ice: dpll: set pointers to NULL after kfree in ice_dpll_deinit_info
      ice: dpll: fix memory leak in ice_dpll_init_info error paths

Zihan Xi (2):
      net/sched: act_ct: preserve tc_skb_cb across defragmentation
      selftests/tc-testing: act_ct: add TDC test for skb cb preservation across defrag

Zijing Yin (1):
      net: af_key: initialize alg_key_len for IPComp states

Ziran Zhang (1):
      rocker: Fix memory leak in ofdpa_port_fdb()

 .mailmap                                           |   3 +-
 .../bindings/clock/qcom,ipq9574-cmn-pll.yaml       |   2 +-
 .../bindings/clock/qcom,qca8k-nsscc.yaml           |   2 +-
 .../devicetree/bindings/net/qcom,ipq9574-ppe.yaml  |   2 +-
 .../devicetree/bindings/net/renesas,ether.yaml     |   3 +-
 Documentation/networking/netdevices.rst            |   2 +
 MAINTAINERS                                        |   7 +-
 drivers/net/dsa/mxl862xx/mxl862xx-host.c           |  18 +-
 drivers/net/dsa/realtek/rtl8366rb-leds.c           |   8 +-
 drivers/net/dsa/sja1105/sja1105_ptp.c              |   2 +-
 drivers/net/ethernet/airoha/airoha_eth.c           | 201 ++++++++++++---------
 drivers/net/ethernet/airoha/airoha_eth.h           |   3 +-
 drivers/net/ethernet/airoha/airoha_ppe.c           |   3 +-
 drivers/net/ethernet/amazon/ena/ena_netdev.c       |  23 ++-
 drivers/net/ethernet/amd/au1000_eth.c              |   3 +-
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c    |   3 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.c          |  48 ++++-
 drivers/net/ethernet/broadcom/bnxt/bnxt.h          |   5 +
 drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c  |   1 +
 drivers/net/ethernet/cadence/macb_main.c           |   8 +
 .../net/ethernet/freescale/dpaa2/dpaa2-switch.c    |  10 +-
 drivers/net/ethernet/freescale/enetc/enetc4_pf.c   |   3 +
 drivers/net/ethernet/google/gve/gve_ethtool.c      |   3 +-
 drivers/net/ethernet/google/gve/gve_rx_dqo.c       |  28 ++-
 drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c |  31 ++--
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c    | 108 ++++++++---
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.h    |   1 +
 drivers/net/ethernet/ibm/ehea/ehea_main.c          |   2 +
 drivers/net/ethernet/ibm/emac/core.c               |  13 +-
 drivers/net/ethernet/intel/e1000e/ich8lan.c        |   3 +
 drivers/net/ethernet/intel/e1000e/netdev.c         |  15 +-
 drivers/net/ethernet/intel/i40e/i40e_debug.h       |   2 +-
 drivers/net/ethernet/intel/iavf/iavf_ethtool.c     |   1 +
 drivers/net/ethernet/intel/ice/ice_common.c        |   1 -
 drivers/net/ethernet/intel/ice/ice_dpll.c          |  20 +-
 drivers/net/ethernet/intel/ice/ice_eswitch.c       |   4 +-
 drivers/net/ethernet/intel/ice/ice_ethtool.c       |  12 +-
 drivers/net/ethernet/intel/ice/ice_main.c          |  16 +-
 drivers/net/ethernet/intel/ice/ice_vf_lib.c        |   2 +-
 drivers/net/ethernet/marvell/mvneta.c              |   3 +
 .../net/ethernet/marvell/octeontx2/af/cn20k/npc.c  |  60 ++++--
 drivers/net/ethernet/marvell/octeontx2/af/mcs.c    |   6 +-
 .../net/ethernet/marvell/octeontx2/af/rvu_cn10k.c  |   9 +
 .../ethernet/marvell/octeontx2/af/rvu_debugfs.c    |  59 +++---
 .../ethernet/marvell/octeontx2/af/rvu_devlink.c    |  27 ++-
 .../net/ethernet/marvell/octeontx2/af/rvu_nix.c    |  11 +-
 .../net/ethernet/marvell/octeontx2/af/rvu_npc_fs.c |   2 +-
 .../ethernet/marvell/octeontx2/nic/cn10k_macsec.c  |  10 +-
 .../net/ethernet/marvell/octeontx2/nic/otx2_pf.c   |   1 +
 .../net/ethernet/marvell/prestera/prestera_main.c  |   2 +-
 drivers/net/ethernet/mediatek/mtk_eth_soc.c        |  10 +-
 drivers/net/ethernet/mediatek/mtk_ppe.c            |   4 +-
 drivers/net/ethernet/mellanox/mlx5/core/Kconfig    |   2 +-
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   |   3 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c   |   3 +-
 .../ethernet/mellanox/mlx5/core/ipoib/ethtool.c    |   4 +-
 .../net/ethernet/mellanox/mlx5/core/mlx5_core.h    |   1 -
 drivers/net/ethernet/mellanox/mlx5/core/port.c     |  19 --
 drivers/net/ethernet/meta/fbnic/fbnic_ethtool.c    |   3 +-
 drivers/net/ethernet/meta/fbnic/fbnic_fw.c         |   9 +-
 drivers/net/ethernet/meta/fbnic/fbnic_netdev.c     |   7 +-
 drivers/net/ethernet/meta/fbnic/fbnic_pci.c        |   4 +
 drivers/net/ethernet/meta/fbnic/fbnic_rpc.c        |   2 +
 .../ethernet/microchip/sparx5/sparx5_switchdev.c   |   4 +-
 drivers/net/ethernet/microsoft/mana/gdma_main.c    |  78 ++++++--
 drivers/net/ethernet/microsoft/mana/mana_bpf.c     |   3 +-
 drivers/net/ethernet/microsoft/mana/mana_en.c      |  16 +-
 drivers/net/ethernet/microsoft/mana/mana_ethtool.c |   3 +-
 .../net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c   |  38 +++-
 drivers/net/ethernet/rocker/rocker_ofdpa.c         |   3 +
 .../net/ethernet/stmicro/stmmac/dwmac-spacemit.c   |  13 +-
 drivers/net/ethernet/sun/sungem.c                  |  13 +-
 drivers/net/ethernet/sunplus/spl2sw_phy.c          |   6 +-
 drivers/net/ethernet/ti/icssg/icssg_common.c       |  72 ++++----
 drivers/net/ethernet/wangxun/ngbe/ngbe_main.c      |   1 -
 drivers/net/ethernet/wangxun/txgbe/txgbe_main.c    |   1 -
 drivers/net/geneve.c                               |  18 +-
 drivers/net/ieee802154/ca8210.c                    |   9 +-
 drivers/net/netconsole.c                           |  12 +-
 drivers/net/phy/realtek/realtek_main.c             |   3 +-
 drivers/net/pse-pd/pd692x0.c                       |   2 +-
 drivers/net/thunderbolt/main.c                     |   8 +-
 drivers/net/usb/kalmia.c                           |   8 +
 drivers/net/usb/lan78xx.c                          |  37 +++-
 drivers/net/veth.c                                 |   2 +
 drivers/net/virtio_net.c                           |   9 +-
 drivers/net/wan/hdlc_ppp.c                         |  15 +-
 drivers/net/wan/ixp4xx_hss.c                       |   4 +-
 drivers/net/wwan/t7xx/t7xx_hif_cldma.c             |   3 +
 fs/afs/cm_security.c                               |   3 +-
 fs/afs/rxrpc.c                                     |  10 +-
 include/linux/ethtool.h                            |   2 +
 include/linux/netdevice.h                          |  21 ++-
 include/linux/netfilter/x_tables.h                 |  29 ++-
 include/net/dst_metadata.h                         |   7 +-
 include/net/ip_fib.h                               |   7 +-
 include/net/netfilter/nf_conntrack_expect.h        |  17 +-
 include/net/netfilter/nf_conntrack_helper.h        |   4 +
 include/net/netfilter/nf_queue.h                   |   1 +
 include/net/netfilter/nft_meta.h                   |   2 +
 include/net/rtnetlink.h                            |   2 +
 include/net/sctp/sctp.h                            |   3 +-
 include/net/xfrm.h                                 |  15 +-
 include/uapi/linux/netfilter/nf_conntrack_common.h |   1 +
 net/8021q/vlan.c                                   |  76 +-------
 net/8021q/vlan.h                                   |  11 ++
 net/8021q/vlan_dev.c                               |  60 ++++++
 net/batman-adv/bat_iv_ogm.c                        |  11 +-
 net/batman-adv/bat_v.c                             |   1 +
 net/batman-adv/bat_v_ogm.c                         |  23 ++-
 net/batman-adv/distributed-arp-table.c             |  12 +-
 net/batman-adv/fragmentation.c                     |  22 ++-
 net/batman-adv/fragmentation.h                     |   3 +-
 net/batman-adv/hard-interface.c                    |  28 +--
 net/batman-adv/routing.c                           |  73 +++++++-
 net/batman-adv/tp_meter.c                          |  74 +++++---
 net/batman-adv/translation-table.c                 |  12 +-
 net/batman-adv/tvlv.c                              |  69 ++++++-
 net/batman-adv/types.h                             |  21 ++-
 net/bridge/netfilter/nft_meta_bridge.c             |  23 ++-
 net/core/Makefile                                  |   2 +-
 net/core/dev.c                                     |   2 +
 net/core/dev.h                                     |  11 +-
 net/core/dev_addr_lists.c                          |  77 +-------
 net/core/filter.c                                  |   2 +-
 net/core/flow_dissector.c                          |  12 +-
 net/core/lwtunnel.c                                |   6 +
 net/core/netdev-genl.c                             |   4 +-
 net/core/netdev_work.c                             | 162 +++++++++++++++++
 net/core/rtnetlink.c                               |   8 +
 net/devlink/rate.c                                 |  25 +--
 net/ethtool/common.h                               |   4 +
 net/ieee802154/core.c                              |  49 ++---
 net/ieee802154/header_ops.c                        |   9 +-
 net/ieee802154/ieee802154.h                        |  17 ++
 net/ieee802154/netlink.c                           |  36 ++--
 net/ipv4/ip_gre.c                                  |   6 +
 net/ipv4/ip_output.c                               |   7 +-
 net/ipv4/ip_vti.c                                  |   3 +
 net/ipv4/ipip.c                                    |   3 +
 net/ipv4/netfilter/nf_reject_ipv4.c                |   2 +-
 net/ipv4/sysctl_net_ipv4.c                         |  10 +-
 net/ipv4/tcp_ao.c                                  |   4 +
 net/ipv4/tcp_output.c                              |   4 +-
 net/ipv4/udp_tunnel_nic.c                          |   2 +-
 net/ipv4/xfrm4_input.c                             |   2 -
 net/ipv6/addrconf.c                                |  42 +++--
 net/ipv6/ioam6_iptunnel.c                          |   8 +-
 net/ipv6/ip6_gre.c                                 |   6 +
 net/ipv6/ip6_output.c                              |   9 +-
 net/ipv6/ip6_tunnel.c                              |  10 +
 net/ipv6/ip6_vti.c                                 |   3 +
 net/ipv6/ndisc.c                                   |   8 +-
 net/ipv6/route.c                                   |   3 +
 net/ipv6/sit.c                                     |   3 +
 net/ipv6/xfrm6_input.c                             |   2 -
 net/key/af_key.c                                   |   1 +
 net/llc/sysctl_net_llc.c                           |   2 +-
 net/mac802154/llsec.c                              |  14 ++
 net/mac802154/scan.c                               |   1 +
 net/netfilter/Kconfig                              |  11 +-
 net/netfilter/ipset/ip_set_bitmap_gen.h            |   4 +-
 net/netfilter/ipset/ip_set_bitmap_ip.c             |   2 +-
 net/netfilter/ipset/ip_set_bitmap_ipmac.c          |   2 +-
 net/netfilter/ipset/ip_set_bitmap_port.c           |   2 +-
 net/netfilter/ipset/ip_set_core.c                  |   4 +-
 net/netfilter/ipset/ip_set_hash_gen.h              |  12 +-
 net/netfilter/nf_conncount.c                       |  11 +-
 net/netfilter/nf_conntrack_broadcast.c             |   1 +
 net/netfilter/nf_conntrack_core.c                  |  33 +++-
 net/netfilter/nf_conntrack_expect.c                | 155 ++++++++--------
 net/netfilter/nf_conntrack_h323_main.c             |   4 +-
 net/netfilter/nf_conntrack_helper.c                |  19 +-
 net/netfilter/nf_conntrack_irc.c                   |   2 +
 net/netfilter/nf_conntrack_netlink.c               |  45 ++---
 net/netfilter/nf_conntrack_pptp.c                  |   2 +
 net/netfilter/nf_conntrack_sip.c                   |  13 +-
 net/netfilter/nf_flow_table_core.c                 |  13 +-
 net/netfilter/nf_flow_table_ip.c                   |  88 +++------
 net/netfilter/nf_flow_table_path.c                 |   4 +-
 net/netfilter/nf_nat_core.c                        |  10 +
 net/netfilter/nf_queue.c                           |  14 ++
 net/netfilter/nfnetlink_queue.c                    |   3 +
 net/netfilter/nft_compat.c                         |  24 ++-
 net/netfilter/nft_ct.c                             |  22 ++-
 net/netfilter/nft_meta.c                           |   5 +-
 net/netfilter/nft_payload.c                        |  16 +-
 net/netfilter/nft_synproxy.c                       |   9 +-
 net/netfilter/xt_cluster.c                         |   2 +-
 net/openvswitch/conntrack.c                        |   3 +-
 net/psample/psample.c                              |   6 +-
 net/rds/send.c                                     |   2 +
 net/rxrpc/ar-internal.h                            |   6 +-
 net/rxrpc/call_event.c                             |   5 +-
 net/rxrpc/call_object.c                            |   2 +
 net/rxrpc/conn_client.c                            |   2 +-
 net/rxrpc/conn_event.c                             |   9 +-
 net/rxrpc/input.c                                  |  39 +++-
 net/rxrpc/oob.c                                    |  12 +-
 net/rxrpc/recvmsg.c                                |  10 +-
 net/rxrpc/sendmsg.c                                |   3 +-
 net/sched/act_ct.c                                 |  13 +-
 net/sched/cls_api.c                                |   3 +
 net/sched/sch_dualpi2.c                            |  11 +-
 net/sched/sch_generic.c                            |   7 +-
 net/sctp/diag.c                                    |  67 ++++---
 net/sctp/sm_statefuns.c                            |   5 +
 net/sctp/socket.c                                  |  29 ++-
 net/tipc/core.c                                    |   9 +-
 net/tipc/crypto.c                                  |   9 +
 net/tipc/discover.c                                |  14 +-
 net/tipc/udp_media.c                               |  19 +-
 net/xfrm/espintcp.c                                |  34 +---
 net/xfrm/xfrm_input.c                              |  29 +--
 net/xfrm/xfrm_interface_core.c                     |   3 +
 net/xfrm/xfrm_policy.c                             |  27 +--
 net/xfrm/xfrm_state.c                              |  23 ++-
 net/xfrm/xfrm_user.c                               |  20 +-
 tools/net/ynl/Makefile                             |   2 +-
 tools/net/ynl/Makefile.deps                        |   2 +
 tools/net/ynl/generated/Makefile                   |   2 +-
 tools/net/ynl/lib/Makefile                         |   2 +-
 tools/testing/selftests/bpf/config                 |   3 +
 tools/testing/selftests/bpf/prog_tests/test_xsk.c  |  96 +++++-----
 tools/testing/selftests/bpf/prog_tests/test_xsk.h  |   2 +
 .../bpf/prog_tests/xdp_context_test_run.c          | 175 ++++++++++++++++++
 tools/testing/selftests/bpf/progs/test_xdp_meta.c  | 123 +++++++------
 .../testing/selftests/drivers/net/bonding/Makefile |   1 +
 .../drivers/net/bonding/bond_vlan_real_dev.sh      | 180 ++++++++++++++++++
 tools/testing/selftests/drivers/net/so_txtime.c    |   2 +-
 tools/testing/selftests/net/broadcast_ether_dst.sh |   2 +-
 .../net/netfilter/conntrack_sctp_collision.sh      |  89 ++++++---
 .../selftests/net/netfilter/nft_flowtable.sh       |   8 +-
 tools/testing/selftests/net/netfilter/nft_queue.sh |  66 ++++++-
 tools/testing/selftests/net/tls.c                  |   8 +-
 tools/testing/selftests/net/vlan_bridge_binding.sh |   2 +-
 .../selftests/tc-testing/tc-tests/actions/ct.json  |  38 ++++
 .../tc-testing/tc-tests/qdiscs/dualpi2.json        |  44 +++++
 tools/testing/selftests/tc-testing/tdc_gso.py      |  43 +++++
 239 files changed, 3041 insertions(+), 1290 deletions(-)
 create mode 100644 net/core/netdev_work.c
 create mode 100755 tools/testing/selftests/drivers/net/bonding/bond_vlan_real_dev.sh
 create mode 100755 tools/testing/selftests/tc-testing/tdc_gso.py

* Re: [PATCH net] xfrm: fix stack-out-of-bounds in xfrm_tmpl_resolve_one
From: Antony Antony @ 2026-06-25 17:43 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, Simon Horman,
	netdev, eric.dumazet, syzbot+0ac4d84afe1066a1f3e9,
	Steffen Klassert, Herbert Xu, Tobias Brunner, Christian Hopps
In-Reply-To: <20260625092417.890245-1-edumazet@google.com>

Hi Eric,

On Thu, Jun 25, 2026 at 09:24:17AM +0000, Eric Dumazet wrote:
> syzbot reported a stack-out-of-bounds read in xfrm_state_find()
> which flows from xfrm_tmpl_resolve_one().
> 
> The issue occurs when a policy has a mix of family-changing templates
> (e.g. BEET or IPTFS) and transport templates. If an optional
> family-changing template is skipped because no state is found, the
> current family of the flow (`family`) is not updated. The subsequent
> transport template is then evaluated using the unchanged family (e.g.
> AF_INET), but it uses the template's `encap_family` (e.g. AF_INET6)
> to perform the state lookup.

Thank you for the quick fix. I would like to look at it from a
different angle.

The commit message mentions BEET as a trigger, but I notice BEET
optional templates in outbound policies are already rejected since:

commit 3d776e31c841 ("xfrm: Reject optional tunnel/BEET mode templates in outbound policies")

Here is the effect of that check for BEET mode:

ip netns exec ns_a ip link add dummy0 type dummy
ip netns exec ns_a ip link set dummy0 up
ip netns exec ns_a ip addr add 10.1.1.1/24 dev dummy0
ip xfrm policy add src 10.1.1.1/32 dst 10.1.1.2/32 dir out tmpl \
  src fc00::dead:1 dst fc00::dead:2 proto esp reqid 1 mode beet \
  level use tmpl src fc00::dead:1 dst fc00::dead:2 proto esp reqid 2 \
  mode transport
Error: Mode in optional template not allowed in outbound policy.

However, IPTFS is still allowed. I think the fix should cover it too.
Does syzbot give a clue about which mode was used? I am new to syzbot
post-mortems! Is there any way to look at a core dump or similar to see
which mode was actually used?
I suspect it is IPTFS; other modes would not trigger this code path.

In practice, only IPTFS can still reach xfrm_tmpl_resolve_one() with
the family-mismatch condition, since xfrm_user.c has no equivalent
guard for XFRM_MODE_IPTFS.
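
To make the gap concrete, here is a rough userspace model of the
per-template check. It is only a sketch: the sketch_* names and the
guard_iptfs switch are made up for illustration and are not the code in
xfrm_user.c. With guard_iptfs false it mirrors what I describe above
(optional tunnel and BEET templates rejected, IPTFS let through); with
guard_iptfs true it shows IPTFS being treated the same way.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's XFRM_MODE_* constants. */
enum sketch_mode {
	SKETCH_MODE_TRANSPORT,
	SKETCH_MODE_TUNNEL,
	SKETCH_MODE_BEET,
	SKETCH_MODE_IPTFS,
};

/*
 * Model of the per-template check: an optional template whose mode can
 * change the address family is rejected in an outbound policy.
 * guard_iptfs is an illustration-only switch that selects whether IPTFS
 * is covered by the check.
 */
static int sketch_validate_tmpl(enum sketch_mode mode, bool optional,
				bool outbound, bool guard_iptfs)
{
	bool guarded = mode == SKETCH_MODE_TUNNEL ||
		       mode == SKETCH_MODE_BEET ||
		       (guard_iptfs && mode == SKETCH_MODE_IPTFS);

	if (guarded && optional && outbound)
		return -EINVAL;	/* "Mode in optional template not allowed" */

	return 0;
}
```

For example, sketch_validate_tmpl(SKETCH_MODE_IPTFS, true, true, false)
returns 0, while the same call with SKETCH_MODE_BEET returns -EINVAL.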

ip link add dummy0 type dummy
ip link set dummy0 up
ip addr add 10.1.1.1/24 dev dummy0
ip xfrm policy add src 10.1.1.1/32 dst 10.1.1.2/32 dir out tmpl  \
 src fc00::dead:1 dst fc00::dead:2 proto esp reqid 1 mode iptfs \
 level use tmpl src fc00::dead:1 dst fc00::dead:2 proto esp reqid 2 \
 mode transport

ping -W 1 -c 1 10.1.1.2
PING 10.1.1.2 (10.1.1.2) 56(84) bytes of data.

[    3.477151] Adding 998396k swap on /dev/vda5.  Priority:-1 extents:1 across:998396k
[   17.565672] ==================================================================
[   17.567270] BUG: KASAN: stack-out-of-bounds in __xfrm6_addr_hash+0x11e/0x170
[   17.567270] Read of size 4 at addr ffff88800f79fd20 by task ping/2777

[   17.567270] CPU: 1 UID: 0 PID: 2777 Comm: ping Not tainted 7.1.0-rc7-02029-gfb92cc029b34-dirty #94 PREEMPT(full)
[   17.567270] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[   17.567270] Call Trace:
[   17.567270]  <TASK>
[   17.567270]  dump_stack_lvl+0x47/0x70
[   17.567270]  ? __xfrm6_addr_hash+0x11e/0x170
[   17.567270]  print_report+0x152/0x4b0
[   17.567270]  ? ksys_mmap_pgoff+0x6d/0xa0
[   17.567270]  ? entry_SYSCALL_64_after_hwframe+0x76/0x7e
[   17.567270]  ? rcu_read_unlock_sched+0xa/0x20
[   17.567270]  ? __virt_addr_valid+0x21b/0x230
[   17.567270]  ? __xfrm6_addr_hash+0x11e/0x170
[   17.567270]  kasan_report+0xa8/0xd0
[   17.567270]  ? __xfrm6_addr_hash+0x11e/0x170
[   17.567270]  __xfrm6_addr_hash+0x11e/0x170
[   17.567270]  __xfrm_dst_hash+0x24/0xc0
[   17.567270]  xfrm_state_find+0xa2d/0x2f90
[   17.567270]  ? __pfx_xfrm_state_find+0x10/0x10
[   17.567270]  ? __pfx_ftrace_graph_ret_addr+0x10/0x10
[   17.567270]  ? __pfx_ftrace_graph_ret_addr+0x10/0x10
[   17.567270]  xfrm_tmpl_resolve_one+0x210/0x570
[   17.567270]  ? __pfx_xfrm_tmpl_resolve_one+0x10/0x10
[   17.567270]  ? __pfx_stack_trace_consume_entry+0x10/0x10
[   17.567270]  ? kernel_text_address+0x5b/0x80
[   17.567270]  ? __kernel_text_address+0xe/0x30
[   17.567270]  ? unwind_get_return_address+0x5e/0x90
[   17.567270]  ? arch_stack_walk+0x8c/0xe0
[   17.567270]  xfrm_tmpl_resolve+0x130/0x200
[   17.567270]  ? __pfx_xfrm_tmpl_resolve+0x10/0x10
[   17.567270]  ? __pfx_xfrm_policy_inexact_lookup_rcu+0x10/0x10
[   17.567270]  ? __refcount_add_not_zero.constprop.0+0xb2/0x110
[   17.567270]  ? __pfx___refcount_add_not_zero.constprop.0+0x10/0x10
[   17.567270]  xfrm_resolve_and_create_bundle+0xd5/0x310
[   17.567270]  ? __pfx_xfrm_resolve_and_create_bundle+0x10/0x10
[   17.567270]  ? __pfx_xfrm_policy_lookup_bytype+0x10/0x10
[   17.567270]  ? __pfx_xfrm_policy_lookup_bytype+0x10/0x10
[   17.567270]  xfrm_lookup_with_ifid+0x3d7/0xb80
[   17.567270]  ? __pfx_xfrm_lookup_with_ifid+0x10/0x10
[   17.567270]  ? ip_route_output_key_hash+0xc6/0x110
[   17.567270]  ? kasan_save_track+0x10/0x30
[   17.567270]  xfrm_lookup_route+0x18/0xe0
[   17.567270]  ip4_datagram_release_cb+0x4c9/0x530
[   17.567270]  ? __pfx_ip4_datagram_release_cb+0x10/0x10
[   17.567270]  ? do_raw_spin_lock+0x71/0xc0
[   17.567270]  ? __pfx_do_raw_spin_lock+0x10/0x10
[   17.567270]  release_sock+0xb0/0x170
[   17.567270]  udp_connect+0x43/0x50
[   17.567270]  __sys_connect+0xa6/0x100
[   17.567270]  ? alloc_fd+0x2e9/0x300
[   17.567270]  ? __pfx___sys_connect+0x10/0x10
[   17.567270]  ? preempt_latency_start+0x1f/0x70
[   17.567270]  ? fd_install+0x7e/0x150
[   17.567270]  ? rcu_read_unlock_sched+0xa/0x20
[   17.567270]  ? __sys_socket+0xdf/0x130
[   17.567270]  ? __pfx___sys_socket+0x10/0x10
[   17.567270]  ? vma_refcount_put+0x43/0xa0
[   17.567270]  __x64_sys_connect+0x7e/0x90
[   17.567270]  do_syscall_64+0x11b/0x2b0
[   17.567270]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[   17.567270] RIP: 0033:0x7f6604eb0570
[   17.567270] Code: 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 80 3d f9 ca 0d 00 00 74 17 b8 2a 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 48 83 ec 18 89 54
[   17.567270] RSP: 002b:00007ffd02bdf658 EFLAGS: 00000202 ORIG_RAX: 000000000000002a
[   17.567270] RAX: ffffffffffffffda RBX: 00007ffd02bdf690 RCX: 00007f6604eb0570
[   17.567270] RDX: 0000000000000010 RSI: 00007ffd02bdf690 RDI: 0000000000000005
[   17.567270] RBP: 0000000000000000 R08: 0000000000000003 R09: 0000000000000000
[   17.567270] R10: 0000000000000006 R11: 0000000000000202 R12: 0000000000000005
[   17.567270] R13: 0000000000000000 R14: 0000557fb777a340 R15: 0000000000000000
[   17.567270]  </TASK>

[   17.567270] The buggy address belongs to stack of task ping/2777
[   17.567270]  and is located at offset 88 in frame:
[   17.567270]  ip4_datagram_release_cb+0x0/0x530

[   17.567270] This frame has 1 object:
[   17.567270]  [32, 88) 'fl4'

[   17.567270] The buggy address belongs to the physical page:
[   17.567270] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0xf79f
[   17.567270] flags: 0x4000000000000000(zone=1)
[   17.567270] raw: 4000000000000000 0000000000000000 ffffea00003de7c8 0000000000000000
[   17.567270] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
[   17.567270] page dumped because: kasan: bad access detected

[   17.567270] Memory state around the buggy address:
[   17.567270]  ffff88800f79fc00: f2 f2 00 00 f3 f3 00 00 00 00 00 00 00 00 00 00
[   17.567270]  ffff88800f79fc80: 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00 00 00
[   17.567270] >ffff88800f79fd00: 00 00 00 00 f3 f3 f3 f3 f3 00 00 00 00 00 00 00
[   17.567270]                                ^
[   17.567270]  ffff88800f79fd80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1
[   17.567270]  ffff88800f79fe00: f1 f1 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   17.567270] ==================================================================
[   17.658376] Disabling lock debugging due to kernel taint


I have another proposed fix:
https://lore.kernel.org/all/20260625-xfrm-pol-out-tmpl-iptfs-reject-fix-v1-1-814861129086@secunet.com/

With this fix applied, IPTFS is rejected as well:

ip xfrm policy add src 10.1.1.1/32 dst 10.1.1.2/32 dir out tmpl \
 src fc00::dead:1 dst fc00::dead:2 proto esp reqid 1 mode iptfs \
 level use tmpl src fc00::dead:1 dst fc00::dead:2 proto esp reqid 2 mode transport

  Error: Mode in optional template not allowed in outbound policy.

> 
> This causes `xfrm_state_find()` to interpret the IPv4 flow addresses
> (allocated on the stack as `struct flowi4` in `raw_sendmsg` or
> `udp_sendmsg`) as IPv6 addresses (`xfrm_address_t`), leading to a
> 16-byte read from the 4-byte stack variables, triggering KASAN.
> 
> Fix this by tracking the active family of the flow (`cur_family`)
> during template resolution:
> 1. Initialize `cur_family` to the flow's original family.
> 2. For transport templates, verify that `tmpl->encap_family` matches
>    `cur_family`. If they mismatch, abort with -EINVAL.
> 3. When a template that can change the family (tunnel, beet, iptfs) is
>    successfully resolved, update `cur_family` to `tmpl->encap_family`.
> 4. If a template is skipped (optional), `cur_family` remains unchanged.
> 
> This prevents mismatched transport lookups and makes the resolution
> robust against any family-transition gaps.
> 
> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> Reported-by: syzbot+0ac4d84afe1066a1f3e9@syzkaller.appspotmail.com
> Closes: https://www.spinics.net/lists/netdev/msg1200923.html
> Assisted-by: Jetski:gemini-3.1-pro-preview
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---
> Cc: Steffen Klassert <steffen.klassert@secunet.com>
> Cc: Herbert Xu <herbert@gondor.apana.org.au>
> ---
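
The four steps quoted above can be sketched in user-space C roughly as
follows. This is only an illustration, not the code from the patch under
discussion: the sketch_* names, the has_state flag standing in for a
successful state lookup, the -EAGAIN return for a missing required state,
and the placement of the transport check before the lookup are all
assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's XFRM_MODE_* and AF_* values. */
enum sketch_mode { SK_TRANSPORT, SK_TUNNEL, SK_BEET, SK_IPTFS };
enum sketch_family { SK_FAM_V4 = 4, SK_FAM_V6 = 6 };

/* Hypothetical, trimmed-down template: only what the quoted steps mention. */
struct sketch_tmpl {
	enum sketch_mode mode;
	enum sketch_family encap_family;
	bool optional;
	bool has_state;		/* assumption: stands in for a state lookup hit */
};

/* Modes the quoted text lists as able to change the family. */
static bool sketch_mode_changes_family(enum sketch_mode mode)
{
	return mode == SK_TUNNEL || mode == SK_BEET || mode == SK_IPTFS;
}

/*
 * Returns 0 on success, -EINVAL on a transport family mismatch and
 * -EAGAIN (an assumption of this sketch) when a required template
 * has no state.
 */
static int sketch_resolve(const struct sketch_tmpl *tmpl, size_t n,
			  enum sketch_family flow_family)
{
	enum sketch_family cur_family = flow_family;	/* step 1 */
	size_t i;

	assert(tmpl != NULL || n == 0);

	for (i = 0; i < n; i++) {
		/* step 2: transport must match the currently active family */
		if (tmpl[i].mode == SK_TRANSPORT &&
		    tmpl[i].encap_family != cur_family)
			return -EINVAL;

		if (!tmpl[i].has_state) {
			if (tmpl[i].optional)
				continue;	/* step 4: cur_family unchanged */
			return -EAGAIN;
		}

		/* step 3: a resolved tunnel/beet/iptfs template switches family */
		if (sketch_mode_changes_family(tmpl[i].mode))
			cur_family = tmpl[i].encap_family;
	}

	return 0;
}
```

With an IPv4 flow, an optional IPv6 IPTFS template that finds no state,
and an IPv6 transport template after it, the sketch returns -EINVAL
instead of going on to a mismatched lookup.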



* Re: [PATCH 1/2] bpf: preserve rx_queue_index across XDP redirects
From: Mehdi Ben Hadj Khelifa @ 2026-06-25 18:35 UTC (permalink / raw)
  To: Alexei Starovoitov, Jakub Kicinski, Siddharth C
  Cc: ast, hawk, andrii, netdev, bpf, linux-kernel, linux-kselftest
In-Reply-To: <DJIA58WBML8S.A897NKI06DV9@gmail.com>

On 6/25/26 5:44 PM, Alexei Starovoitov wrote:
> On Wed Jun 24, 2026 at 6:54 PM PDT, Jakub Kicinski wrote:
>> On Sat, 20 Jun 2026 12:13:13 +0000 Siddharth C wrote:
>>> diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
>>> index 5e59ab896f05..8f2d7013620f 100644
>>> --- a/kernel/bpf/cpumap.c
>>> +++ b/kernel/bpf/cpumap.c
>>> @@ -197,7 +197,7 @@ static int cpu_map_bpf_prog_run_xdp(struct bpf_cpu_map_entry *rcpu,
>>>   
>>>   		rxq.dev = xdpf->dev_rx;
>>>   		rxq.mem.type = xdpf->mem_type;
>>> -		/* TODO: report queue_index to xdp_rxq_info */
>>> +		rxq.queue_index = xdpf->rx_queue_index;
>>
>> Do you actually need this or you're just trying to address the TODO?
> 
> It's a 3rd if not 4th attempt from various "people" to address this TODO.
> We should just remove this line instead.
> 

I was one of the people who tried this. I have also already sent a
patch that fixes the misleading comment, to prevent further wasted
effort. That patch is still on hold (link:
https://lore.kernel.org/all/20251021114714.1757372-1-mehdi.benhadjkhelifa@gmail.com/).

Best Regards,
Mehdi Ben Hadj Khelifa



* [PATCH net] octeontx2-pf: check DMAC extraction support before filtering
From: nshettyj @ 2026-06-25 17:25 UTC (permalink / raw)
  To: netdev, linux-kernel
  Cc: sgoutham, gakula, sbhatta, hkelam, bbhushan2, andrew+netdev,
	davem, edumazet, kuba, pabeni, naveenm, tduszynski, sumang,
	Nitin Shetty J

From: Suman Ghosh <sumang@marvell.com>

Currently, configuring a VF MAC address via the PF (e.g., 'ip link
set <pf> vf 0 mac <mac>') blindly attempts to install a DMAC-based
hardware filter. However, the hardware parser profile might not
support DMAC extraction.

Check if the hardware parsing profile supports DMAC extraction
before adding the filter. Additionally, emit a warning message
to inform the operator if the MAC filter installation fails due
to missing DMAC extraction support.

Fixes: f0c2982aaf98 ("octeontx2-pf: Add support for SR-IOV management functions")
Signed-off-by: Suman Ghosh <sumang@marvell.com>
Signed-off-by: Nitin Shetty J <nshettyj@marvell.com>
---
 .../ethernet/marvell/octeontx2/nic/otx2_pf.c  | 34 +++++++++++++++++++
 1 file changed, 34 insertions(+)

diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c
index b63df5737ff2..8e4435d9e520 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c
@@ -2546,6 +2546,8 @@ static int otx2_do_set_vf_mac(struct otx2_nic *pf, int vf, const u8 *mac)
 static int otx2_set_vf_mac(struct net_device *netdev, int vf, u8 *mac)
 {
 	struct otx2_nic *pf = netdev_priv(netdev);
+	struct npc_get_field_status_req *req;
+	struct npc_get_field_status_rsp *rsp;
 	struct pci_dev *pdev = pf->pdev;
 	struct otx2_vf_config *config;
 	int ret;
@@ -2559,6 +2561,38 @@ static int otx2_set_vf_mac(struct net_device *netdev, int vf, u8 *mac)
 	if (!is_valid_ether_addr(mac))
 		return -EINVAL;
 
+	/* Skip installing the DMAC filter if the hardware parser profile
+	 * does not support DMAC extraction.
+	 */
+	mutex_lock(&pf->mbox.lock);
+	req = otx2_mbox_alloc_msg_npc_get_field_status(&pf->mbox);
+	if (!req) {
+		mutex_unlock(&pf->mbox.lock);
+		return -ENOMEM;
+	}
+
+	req->field = NPC_DMAC;
+	if (otx2_sync_mbox_msg(&pf->mbox)) {
+		mutex_unlock(&pf->mbox.lock);
+		return -EINVAL;
+	}
+
+	rsp = (struct npc_get_field_status_rsp *)otx2_mbox_get_rsp
+	       (&pf->mbox.mbox, 0, &req->hdr);
+	if (IS_ERR(rsp)) {
+		mutex_unlock(&pf->mbox.lock);
+		return PTR_ERR(rsp);
+	}
+
+	if (!rsp->enable) {
+		mutex_unlock(&pf->mbox.lock);
+		netdev_warn(netdev, "VF %d MAC filter not installed: DMAC extraction not supported by parser profile\n",
+			    vf);
+		return 0;
+	}
+
+	mutex_unlock(&pf->mbox.lock);
+
 	config = &pf->vf_configs[vf];
 	ether_addr_copy(config->mac, mac);
 
-- 
2.48.1



* [PATCH nf-next v2 0/3] netfilter: replace u_int*_t with kernel int types (batch 2)
From: Carlos Grillet @ 2026-06-25 17:25 UTC (permalink / raw)
  To: Simon Horman, Julian Anastasov, David Ahern, Ido Schimmel,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Pablo Neira Ayuso, Florian Westphal, Phil Sutter
  Cc: netdev, lvs-devel, linux-kernel, netfilter-devel, coreteam

This patch series replaces POSIX u_int8_t/u_int16_t with the preferred
kernel types u8/u16 across several netfilter files and updates the
corresponding header definitions.

This continues the work started in:
https://lore.kernel.org/all/20260616182948.96865-1-carlos@carlosgrillet.me

No functional changes.

Changes in v2:
- Drop nf_conntrack_sane patch (Florian Westphal: ports[] removal pending)
- link to v1: https://lore.kernel.org/all/20260624184036.71051-1-carlos@carlosgrillet.me

Carlos Grillet (3):
  netfilter: nf_conntrack_h323_main: replace u_int8_t with u8
  netfilter: nf_conntrack_amanda: replace u_int16_t with u16
  netfilter: ip_vs_nfct: replace u_int8_t with u8

 include/net/ip_vs.h                    | 2 +-
 net/netfilter/ipvs/ip_vs_nfct.c        | 2 +-
 net/netfilter/nf_conntrack_amanda.c    | 2 +-
 net/netfilter/nf_conntrack_h323_main.c | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

-- 
2.54.0



* [PATCH nf-next v2 3/3] netfilter: ip_vs_nfct: replace u_int8_t with u8
From: Carlos Grillet @ 2026-06-25 17:25 UTC (permalink / raw)
  To: David Ahern, Ido Schimmel, Simon Horman, Julian Anastasov,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Pablo Neira Ayuso, Florian Westphal, Phil Sutter
  Cc: netdev, lvs-devel, linux-kernel, netfilter-devel, coreteam
In-Reply-To: <20260625172550.35781-1-carlos@carlosgrillet.me>

Use the preferred kernel integer type u8 instead of the POSIX u_int8_t
variant and update the header declaration to match the definition.

No functional change.

Signed-off-by: Carlos Grillet <carlos@carlosgrillet.me>
---
 include/net/ip_vs.h             | 2 +-
 net/netfilter/ipvs/ip_vs_nfct.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
index 49297fec448a..ed2e9bc1bb4e 100644
--- a/include/net/ip_vs.h
+++ b/include/net/ip_vs.h
@@ -2123,7 +2123,7 @@ void ip_vs_update_conntrack(struct sk_buff *skb, struct ip_vs_conn *cp,
 			    int outin);
 int ip_vs_confirm_conntrack(struct sk_buff *skb);
 void ip_vs_nfct_expect_related(struct sk_buff *skb, struct nf_conn *ct,
-			       struct ip_vs_conn *cp, u_int8_t proto,
+			       struct ip_vs_conn *cp, u8 proto,
 			       const __be16 port, int from_rs);
 void ip_vs_conn_drop_conntrack(struct ip_vs_conn *cp);
 
diff --git a/net/netfilter/ipvs/ip_vs_nfct.c b/net/netfilter/ipvs/ip_vs_nfct.c
index 81974f69e5bb..347185fd0c8c 100644
--- a/net/netfilter/ipvs/ip_vs_nfct.c
+++ b/net/netfilter/ipvs/ip_vs_nfct.c
@@ -208,7 +208,7 @@ static void ip_vs_nfct_expect_callback(struct nf_conn *ct,
  * Use port 0 to expect connection from any port.
  */
 void ip_vs_nfct_expect_related(struct sk_buff *skb, struct nf_conn *ct,
-			       struct ip_vs_conn *cp, u_int8_t proto,
+			       struct ip_vs_conn *cp, u8 proto,
 			       const __be16 port, int from_rs)
 {
 	struct nf_conntrack_expect *exp;
-- 
2.54.0



* [PATCH nf-next v2 2/3] netfilter: nf_conntrack_amanda: replace u_int16_t with u16
From: Carlos Grillet @ 2026-06-25 17:25 UTC (permalink / raw)
  To: Pablo Neira Ayuso, Florian Westphal, Phil Sutter, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman
  Cc: netfilter-devel, coreteam, netdev, linux-kernel
In-Reply-To: <20260625172550.35781-1-carlos@carlosgrillet.me>

Use preferred kernel integer type u16 instead of the POSIX u_int16_t
variant.

No functional change.

Signed-off-by: Carlos Grillet <carlos@carlosgrillet.me>
---
 net/netfilter/nf_conntrack_amanda.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netfilter/nf_conntrack_amanda.c b/net/netfilter/nf_conntrack_amanda.c
index ddafbdfc96dc..f10ac2c49f4b 100644
--- a/net/netfilter/nf_conntrack_amanda.c
+++ b/net/netfilter/nf_conntrack_amanda.c
@@ -89,7 +89,7 @@ static int amanda_help(struct sk_buff *skb,
 	struct nf_conntrack_tuple *tuple;
 	unsigned int dataoff, start, stop, off, i;
 	char pbuf[sizeof("65535")], *tmp;
-	u_int16_t len;
+	u16 len;
 	__be16 port;
 	int ret = NF_ACCEPT;
 	nf_nat_amanda_hook_fn *nf_nat_amanda;
-- 
2.54.0



* [PATCH nf-next v2 1/3] netfilter: nf_conntrack_h323_main: replace u_int8_t with u8
From: Carlos Grillet @ 2026-06-25 17:25 UTC (permalink / raw)
  To: Pablo Neira Ayuso, Florian Westphal, Phil Sutter, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman
  Cc: netfilter-devel, coreteam, netdev, linux-kernel
In-Reply-To: <20260625172550.35781-1-carlos@carlosgrillet.me>

Use preferred kernel integer type u8 instead of the POSIX u_int8_t
variant.

No functional change.

Signed-off-by: Carlos Grillet <carlos@carlosgrillet.me>
---
 net/netfilter/nf_conntrack_h323_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netfilter/nf_conntrack_h323_main.c b/net/netfilter/nf_conntrack_h323_main.c
index 7f189dceb3c4..68ecaf0daf95 100644
--- a/net/netfilter/nf_conntrack_h323_main.c
+++ b/net/netfilter/nf_conntrack_h323_main.c
@@ -671,7 +671,7 @@ static int expect_h245(struct sk_buff *skb, struct nf_conn *ct,
 static int callforward_do_filter(struct net *net,
 				 const union nf_inet_addr *src,
 				 const union nf_inet_addr *dst,
-				 u_int8_t family)
+				 u8 family)
 {
 	int ret = 0;
 
-- 
2.54.0



* [PATCH ipsec] xfrm: reject optional IPTFS templates in outbound policies
From: Antony Antony @ 2026-06-25 17:25 UTC (permalink / raw)
  To: Steffen Klassert, Herbert Xu, "David S. Miller",
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Christian Hopps
  Cc: Tobias Brunner, netdev, syzbot+0ac4d84afe1066a1f3e9,
	Antony Antony, Antony Antony

syzbot reported a stack-out-of-bounds read in xfrm_state_find()
which flows from xfrm_tmpl_resolve_one().

Commit 3d776e31c841 ("xfrm: Reject optional tunnel/BEET mode
templates in outbound policies") disallowed optional tunnel and
BEET in outbound policies to prevent this. Later, when IPTFS was
added, it was not covered by that fix and can still trigger
the out-of-bounds read.

Extend the check to disallow optional IPTFS in outbound policies
as well. IPTFS should be handled identically to tunnel mode here.
IN and FWD policies are not affected: xfrm_tmpl_resolve_one()
is only reachable via the outbound path.
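
As a rough illustration of why a family mismatch ends in an
out-of-bounds read: an IPv4 flow stores 4-byte addresses, while an IPv6
state lookup reads 16 bytes from the same location. The sketch below
uses a hypothetical demo_addr union in place of xfrm_address_t; the
demo_* names and helpers are assumptions made for illustration and do
not exist in the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for xfrm_address_t: one IPv4 word or four IPv6 words. */
union demo_addr {
	uint32_t a4;
	uint32_t a6[4];
};

/* Hypothetical family tags, not the kernel's AF_* values. */
enum demo_family { DEMO_V4 = 4, DEMO_V6 = 6 };

/* Number of address bytes a lookup keyed on `family` reads. */
static size_t demo_addr_len(enum demo_family family)
{
	return family == DEMO_V6 ? sizeof(union demo_addr) : sizeof(uint32_t);
}

/*
 * Non-zero when a lookup for `lookup_family` would read more bytes than
 * an address stored for `flow_family` actually provides.
 */
static int demo_read_overruns(enum demo_family flow_family,
			      enum demo_family lookup_family)
{
	return demo_addr_len(lookup_family) > demo_addr_len(flow_family);
}
```

Here demo_read_overruns(DEMO_V4, DEMO_V6) is non-zero, which is the
situation the reproducer below sets up: an IPv4 flow reaching an IPv6
template lookup.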

Reproducer, before:
ip link add dummy0 type dummy
ip link set dummy0 up
ip addr add 10.1.1.1/24 dev dummy0
ip xfrm policy add src 10.1.1.1/32 dst 10.1.1.2/32 dir out tmpl \
  src fc00::dead:1 dst fc00::dead:2 proto esp reqid 1 mode iptfs \
  level use tmpl src fc00::dead:1 dst fc00::dead:2 proto esp reqid \
  2 mode transport
ping -W 1 -c 1 10.1.1.2
PING 10.1.1.2 (10.1.1.2) 56(84) bytes of data.

[   64.168420] ==================================================================
[   64.169977] BUG: KASAN: stack-out-of-bounds in __xfrm6_addr_hash+0x11e/0x170
[   64.169977] Read of size 4 at addr ffff88800e1ffd20 by task ping/2844

[   64.169977] CPU: 2 UID: 0 PID: 2844 Comm: ping Not tainted 7.1.0-rc7-00180-geb23b588430a #98 PREEMPT(full)
[   64.169977] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[   64.169977] Call Trace:
[   64.169977]  <TASK>
[   64.169977]  dump_stack_lvl+0x47/0x70
[   64.169977]  ? __xfrm6_addr_hash+0x11e/0x170
[   64.169977]  print_report+0x152/0x4b0
[   64.169977]  ? ksys_mmap_pgoff+0x6d/0xa0
[   64.169977]  ? entry_SYSCALL_64_after_hwframe+0x76/0x7e
[   64.169977]  ? rcu_read_unlock_sched+0xa/0x20
[   64.169977]  ? __virt_addr_valid+0x21b/0x230
[   64.169977]  ? __xfrm6_addr_hash+0x11e/0x170
[   64.169977]  kasan_report+0xa8/0xd0
[   64.169977]  ? __xfrm6_addr_hash+0x11e/0x170
[   64.169977]  __xfrm6_addr_hash+0x11e/0x170
[   64.169977]  __xfrm_dst_hash+0x24/0xc0
[   64.169977]  xfrm_state_find+0xa2d/0x2f90
[   64.169977]  ? __pfx_xfrm_state_find+0x10/0x10
[   64.169977]  ? __pfx_ftrace_graph_ret_addr+0x10/0x10
[   64.169977]  ? __pfx_ftrace_graph_ret_addr+0x10/0x10
[   64.169977]  xfrm_tmpl_resolve_one+0x210/0x570
[   64.169977]  ? __pfx_xfrm_tmpl_resolve_one+0x10/0x10
[   64.169977]  ? __pfx_stack_trace_consume_entry+0x10/0x10
[   64.169977]  ? kernel_text_address+0x5b/0x80
[   64.169977]  ? __kernel_text_address+0xe/0x30
[   64.169977]  ? unwind_get_return_address+0x5e/0x90
[   64.169977]  ? arch_stack_walk+0x8c/0xe0
[   64.169977]  xfrm_tmpl_resolve+0x130/0x200
[   64.169977]  ? __pfx_xfrm_tmpl_resolve+0x10/0x10
[   64.169977]  ? __pfx_xfrm_policy_inexact_lookup_rcu+0x10/0x10
[   64.169977]  ? __refcount_add_not_zero.constprop.0+0xb2/0x110
[   64.169977]  ? __pfx___refcount_add_not_zero.constprop.0+0x10/0x10
[   64.169977]  xfrm_resolve_and_create_bundle+0xd5/0x310
[   64.169977]  ? __pfx_xfrm_resolve_and_create_bundle+0x10/0x10
[   64.169977]  ? __pfx_xfrm_policy_lookup_bytype+0x10/0x10
[   64.169977]  ? __pfx_xfrm_policy_lookup_bytype+0x10/0x10
[   64.169977]  xfrm_lookup_with_ifid+0x3d8/0xb80
[   64.169977]  ? __pfx_xfrm_lookup_with_ifid+0x10/0x10
[   64.169977]  ? ip_route_output_key_hash+0xc6/0x110
[   64.169977]  ? kasan_save_track+0x10/0x30
[   64.169977]  xfrm_lookup_route+0x18/0xe0
[   64.169977]  ip4_datagram_release_cb+0x4c9/0x530
[   64.169977]  ? __pfx_ip4_datagram_release_cb+0x10/0x10
[   64.169977]  ? do_raw_spin_lock+0x71/0xc0
[   64.169977]  ? __pfx_do_raw_spin_lock+0x10/0x10
[   64.169977]  release_sock+0xb0/0x170
[   64.169977]  udp_connect+0x43/0x50
[   64.169977]  __sys_connect+0xa6/0x100
[   64.169977]  ? alloc_fd+0x2e9/0x300
[   64.169977]  ? __pfx___sys_connect+0x10/0x10
[   64.169977]  ? preempt_latency_start+0x1f/0x70
[   64.169977]  ? fd_install+0x7e/0x150
[   64.169977]  ? rcu_read_unlock_sched+0xa/0x20
[   64.169977]  ? __sys_socket+0xdf/0x130
[   64.169977]  ? __pfx___sys_socket+0x10/0x10
[   64.169977]  ? vma_refcount_put+0x43/0xa0
[   64.169977]  __x64_sys_connect+0x7e/0x90
[   64.169977]  do_syscall_64+0x11b/0x2b0
[   64.169977]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[   64.169977] RIP: 0033:0x7f4851ecb570
[   64.169977] Code: 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 80 3d f9 ca 0d 00 00 74 17 b8 2a 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 48 83 ec 18 89 54
[   64.169977] RSP: 002b:00007ffc830e3498 EFLAGS: 00000202 ORIG_RAX: 000000000000002a
[   64.169977] RAX: ffffffffffffffda RBX: 00007ffc830e34d0 RCX: 00007f4851ecb570
[   64.169977] RDX: 0000000000000010 RSI: 00007ffc830e34d0 RDI: 0000000000000005
[   64.169977] RBP: 0000000000000000 R08: 0000000000000003 R09: 0000000000000000
[   64.169977] R10: 0000000000000006 R11: 0000000000000202 R12: 0000000000000005
[   64.169977] R13: 0000000000000000 R14: 00005619a863f340 R15: 0000000000000000
[   64.169977]  </TASK>

[   64.169977] The buggy address belongs to stack of task ping/2844
[   64.169977]  and is located at offset 88 in frame:
[   64.169977]  ip4_datagram_release_cb+0x0/0x530

[   64.169977] This frame has 1 object:
[   64.169977]  [32, 88) 'fl4'

[   64.169977] The buggy address belongs to the physical page:
[   64.169977] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0xe1ff
[   64.169977] flags: 0x4000000000000000(zone=1)
[   64.169977] raw: 4000000000000000 0000000000000000 ffffea0000387fc8 0000000000000000
[   64.169977] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
[   64.169977] page dumped because: kasan: bad access detected

[   64.169977] Memory state around the buggy address:
[   64.169977]  ffff88800e1ffc00: f2 f2 00 00 f3 f3 00 00 00 00 00 00 00 00 00 00
[   64.169977]  ffff88800e1ffc80: 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00 00 00
[   64.169977] >ffff88800e1ffd00: 00 00 00 00 f3 f3 f3 f3 f3 00 00 00 00 00 00 00
[   64.169977]                                ^
[   64.169977]  ffff88800e1ffd80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1
[   64.169977]  ffff88800e1ffe00: f1 f1 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   64.169977] ==================================================================
[   64.245153] Disabling lock debugging due to kernel taint

After the fix:

ip xfrm policy add src 10.1.1.1/32 dst 10.1.1.2/32 dir out tmpl \
 src fc00::dead:1 dst fc00::dead:2 proto esp reqid 1 mode iptfs \
 level use tmpl src fc00::dead:1 dst fc00::dead:2 proto esp reqid 2 \
 mode transport

Error: Mode in optional template not allowed in outbound policy.

Fixes: d1716d5a44c3 ("xfrm: add generic iptfs defines and functionality")
Reported-by: syzbot+0ac4d84afe1066a1f3e9@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/6a3ceb94.43b4ff68.30a095.0004.GAE@google.com/T/
Signed-off-by: Antony Antony <antony@phenome.org>
---
 net/xfrm/xfrm_user.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 61eb5de33b87..b36741c4ea3d 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -2077,13 +2077,12 @@ static int validate_tmpl(int nr, struct xfrm_user_tmpl *ut, u16 family,
 		switch (ut[i].mode) {
 		case XFRM_MODE_TUNNEL:
 		case XFRM_MODE_BEET:
+		case XFRM_MODE_IPTFS:
 			if (ut[i].optional && dir == XFRM_POLICY_OUT) {
 				NL_SET_ERR_MSG(extack, "Mode in optional template not allowed in outbound policy");
 				return -EINVAL;
 			}
 			break;
-		case XFRM_MODE_IPTFS:
-			break;
 		default:
 			if (ut[i].family != prev_family) {
 				NL_SET_ERR_MSG(extack, "Mode in template doesn't support a family change");

---
base-commit: 40f0b1047918539f0b0f795ac65e35336b4c2c78
change-id: 20260625-xfrm-pol-out-tmpl-iptfs-reject-fix-10373324a939

Best regards,
--  
Antony Antony <antony.antony@secunet.com>



