Netdev List

Netdev List
 help / color / mirror / Atom feed

* [PATCH v2 0/4] vhost/vsock: add support for VHOST_RESET_OWNER and CPR migration
From: Andrey Drobyshev @ 2026-06-22 17:58 UTC (permalink / raw)
  To: linux-kernel
  Cc: kvm, virtualization, netdev, sgarzare, mst, stefanha,
	dongli.zhang, maciej.szmigiero, bchaney, mark.kanda, ptikhomirov,
	den, andrey.drobyshev

v1 -> v2:

  * Patch 2 (suppress EHOSTUNREACH): replace 'cpr_paused' + backend check
    with a single 'started' latch;
  * Patch 3 (re-scan TX virtqueue): reword commit message;
  * Patch 4 (VHOST_RESET_OWNER):
      - fix a vhost_worker use-after-free / stuck VHOST_WORK_QUEUED stall
        against the lockless send path;
      - drop the no-op vsock_for_each_connected_socket() iteration;
  * Shuffle the patches, keep RESET_OWNER implementation last to preserve
    bisectability;
  * Reword the cover letter.

v1: https://lore.kernel.org/virtualization/20260612165718.433546-1-andrey.drobyshev@virtuozzo.com

Host<-->guest connections via AF_VSOCK sockets aren't supposed to
outlive VM migration, since VM is moving to another host.  However
there's a special case, which is QEMU live-update, or CPR
(checkpoint-restore) migration.  In this case, VM remains on the same
host, and we'd like such connections to persist.

For this to work, we need to be able to transfer device ownership from
source QEMU to dest QEMU.  Namely, source needs to reset ownership by
issuing VHOST_RESET_OWNER ioctl, and then target has to claim it by
calling VHOST_SET_OWNER.

Since VHOST_RESET_OWNER isn't yet implemented for vhost-vsock, let's add
such implementation.  Patch 1 is a preliminary helper.  Patches 2 and 3
fix the pre-existing issues which do manifest during CPR / RESET_OWNER.
Patch 4 is the ioctl's implementation itself - we keep it last to
preserve bisectability.

There's a complementary series for QEMU [0] adding support of vhost-vsock
devices during CPR migration.

I've tested this (patched QEMU + patched kernel) approximately as follows:

  * Run listener in the guest:
  socat -u VSOCK-LISTEN:9999 - >/tmp/recv.bin

  * Run data transfer from host to guest:
  socat -u FILE:/root/bigfile.bin VSOCK-CONNECT:CID:9999

  * Perform CPR migration during transfer (either cpr-exec or cpr-transfer)
  * Check that file hash sum matches

[0] https://lore.kernel.org/qemu-devel/20260619105514.128812-1-andrey.drobyshev@virtuozzo.com

Andrey Drobyshev (2):
  vhost/vsock: suppress EHOSTUNREACH fast-fail during CPR pause
  vhost/vsock: re-scan TX virtqueue on device start

Pavel Tikhomirov (2):
  vhost/vsock: split out vhost_vsock_drop_backends helper
  vhost/vsock: add VHOST_RESET_OWNER ioctl

 drivers/vhost/vsock.c | 96 ++++++++++++++++++++++++++++++++++---------
 1 file changed, 76 insertions(+), 20 deletions(-)

-- 
2.47.1

^ permalink raw reply

* Re: [PATCH net-next v3 1/2] net: dsa: realtek: rtl8365mb: add SGMII support for RTL8367S
From: Johan Alvarado @ 2026-06-22 17:53 UTC (permalink / raw)
  To: Luiz Angelo Daros de Luca
  Cc: netdev, linusw, alsi, andrew, olteanv, davem, edumazet, kuba,
	pabeni, linux, maxime.chevallier, namiltd, linux-kernel
In-Reply-To: <CAJq09z4cVBuRUMyCt8NopNePmjbcj=ycvq95gSXgh581kk4zDw@mail.gmail.com>

Hi Luiz,

Sorry for the slow reply — I wanted to test the changes on hardware
before answering rather than reply blind. I'm also adding the list
back to CC, since this is review of an on-list patch and the
discussion is useful for the archive; hope that's OK.

Thanks for the very thorough review. Almost everything is addressed in
v4, which I'll post once net-next reopens. Replies inline.

> You might want to omit any specific model as there are multiple
> possible supported models that have SGMII. Just use "only the RGMII
> and SGMII interface...". Determining which device supports what is a
> job for chip_info.

Done — the NOTE comment no longer names a model, it just lists the
implemented interfaces.

> > +#define   RTL8365MB_SDS_INDACS_CMD_INDEX_MASK  0x0007
>
> Isn't this MASK larger? I was expecting 0x003F.
>
> Please use GENMASK/BIT whenever possible. It makes it clearer when
> there are holes or overlaps in the reg.
> Once a register macro is added with a bunch of bits described, I think
> it is better to describe all bits we can, even if not in use in this
> driver. Realtek normally maps bits sequentially and, with GENMASK/BIT,
> it is visually easier to spot errors.

The CMD index field is gone in v4: I dropped the always-zero SerDes
index argument, so that mask was removed with it.

On GENMASK/BIT and documenting all bits: agreed in principle, but the
driver currently uses raw hex masks almost everywhere (~75 raw _MASK
defines vs ~22 BIT/GENMASK), so converting only the new SDS defines
would make the file more inconsistent rather than less. Options as I
see them: (1) convert just the new defines, (2) keep raw hex to match
the surrounding style, or (3) a separate file-wide cleanup patch on
top of this series. I'd lean towards 3, keeping this series focused on
the feature and doing the style modernization as its own patch, but
I'm happy to do 1 if you'd rather the new code lead by example. Same
question for documenting currently-unused bits. Which would you prefer?

> > +#define RTL8365MB_SDS_INDACS_ADR_REG           0x6601
>
> This reg is formed by two parts but, in this case, it might be
> pedantic to add the descriptions as well.
> PAGE_MASK    0x7E0
> REGAD_MASK    0x1F

Noted. Since the addresses are passed as whole values and the page/reg
split isn't used in the driver, I've left it as a single address for
now, but I can add the sub-field masks if you'd prefer them documented.

> > +#define   RTL8365MB_SDS_BMCR_DPRST_PHASE1      0x1401
> > +#define   RTL8365MB_SDS_BMCR_DPRST_PHASE2      0x1403
>
> I do not like magic numbers. You could do the BMCR_ANENABLE |
> BMCR_ISOLATE in the macro or code instead of just keeping it in a
> comment. It would give more semantics to the code.

Done — the phase values are now built from the standard bits:

  #define RTL8365MB_SDS_BMCR_DPRST_PHASE1 (BMCR_ANENABLE | BMCR_ISOLATE | 0x1)
  #define RTL8365MB_SDS_BMCR_DPRST_PHASE2 (BMCR_ANENABLE | BMCR_ISOLATE | 0x3)

> > +static const struct rtl8365mb_jam_tbl_entry rtl8365mb_sds_jam_sgmii[] = {
>
> I guess you got this from vendor's redData. However, that sequence is
> for the case when RTL8365MB_CHIP_OPTION_REG(0x13C1) == 0. In my tests,
> rtl8367s returns 1 for that reg, which would select the redDataSB
> variant in the vendor's code. Did you test both or check register
> 0x13C1 [...] HSGMII also has a similar test in vendor's code.

Good catch. I checked 0x13C1 on the MR80X (with the magic
unlock/relock via 0x13C0): it reads option = 1, so the committed
tables are already the SB/HB variants. For SGMII the only difference
between redData and redDataSB is reg 0x482 (0x21A2 for option 0 vs
0x2420 for option 1), and my table has 0x2420; the HSGMII table
matches redDataHB likewise. I captured the full vendor write sequence
on hardware by chainloading a patched U-Boot, so both tables are
confirmed against the live silicon.

v4 reads 0x13C1 at runtime and returns -EOPNOTSUPP for the option-0
variant rather than driving the SerDes with values I cannot verify on
available hardware.

> > +       if (extint->id != 1)
> > +               return -EOPNOTSUPP;
>
> The model RTL8370MB is also a member of RTL8367C [...] Can't you just
> check extint supported_interfaces? [...] This type of hardcoded
> assumption just makes the job harder.

In v4 this hardcoded check is gone. With the phylink_pcs conversion
(see below), the SerDes is gated through supported_interfaces in
get_caps() and mac_select_pcs(), so whether a port uses the SerDes is
driven by chip_info rather than a hardcoded id.

> > +       ret = regmap_update_bits(priv->map, RTL8365MB_BYPASS_LINE_RATE_REG,
> > +                                BIT(extint->id), 0);
>
> BYPASS_LINE_RATE is actually indexed by port number starting at 5
> [...] Describe [it] with a parametric macro, receiving the port number
> and returning the BIT(5-port) as mask [...] For RTL8367R [...] it uses
> bit 0 for int 1 and bit 1 for int 0 or 2.

Done — it's now a parametric macro:

  #define RTL8365MB_BYPASS_LINE_RATE_MASK(_port)  BIT((_port) - 5)

with a comment noting port 5 is the base and that other families (e.g.
the RTL8367R's (id + 1) % 2) index it differently, so this mapping
only holds for the RTL8367C-style parts the driver supports. One small
thing: your mail had BIT(5 - port), which inverts it (port 6 would be
BIT(-1)); I used BIT(port - 5) so port 5 maps to bit 0 — let me know if
you meant something different.

> > +       usleep_range(10, 50);
>
> An arbitrary wait is not ideal but Mieczyslaw already suggested a
> better solution.

Done — the usleep is gone. Writes are fire-and-forget and reads poll
the self-clearing BUSY bit with regmap_read_poll_timeout(), matching
the vendor's getAsicSdsReg. I instrumented it on the MR80X: the BUSY
bit is never even observed set (the access completes within the
register transaction over MDIO), so no sleep is needed.

> > +       ret = regmap_update_bits(
> > +               priv->map, RTL8365MB_DIGITAL_INTERFACE_SELECT_REG(extint->id),
> > +               [...]
>
> Sometimes it is just easier to use a temp variable instead of fighting
> with the 80-col limit.

Done, the mode value goes into a temporary now.

> The lack of test devices is holding me back from making further
> improvements. [...] I think that, with some limitations, the rtl8365mb
> driver [...] could support the full range from RTL8370/RTL8367 up to
> RTL8367D.

Makes sense, and I tried to keep v4 from baking in RTL8367S-specific
assumptions where I could (the SerDes gating and the bypass macro
above). I can only test on the MR80X (RTL8367S) myself, so I've kept
the scope to what I can verify, but I'm happy to keep the code friendly
to that wider range.

One more thing from the wider discussion: following Maxime's review I
converted the whole SerDes path to a phylink_pcs in v4, so the SerDes
handling now lives in pcs_config()/pcs_get_state()/pcs_link_up() rather
than the ext/sds split in mac_link_up/down you saw in v3. pcs_get_state()
now reads the real SerDes link status (reg 0x3d) instead of reporting
the forced value.

Thanks again — this review made the series substantially better.

Best regards,
Johan

^ permalink raw reply

* Re: [PATCH] net: stmmac: fix missed le32_to_cpu()
From: Maxime Chevallier @ 2026-06-22 17:51 UTC (permalink / raw)
  To: Ben Dooks, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Maxime Coquelin, Alexandre Torgue,
	Russell King (Oracle), netdev, linux-stm32, linux-arm-kernel,
	linux-kernel
In-Reply-To: <20260622143707.497198-1-ben.dooks@codethink.co.uk>

Hi Ben,

On 6/22/26 16:37, Ben Dooks wrote:
> The print in ndesc_display_ring() sends the des2 and des3
> to the pr_info() without passing them through the relevant
> conversion to cpu order.
> 
> Fix the (prototype) sparse warnings by using le32_to_cpu():
> drivers/net/ethernet/stmicro/stmmac/norm_desc.c:258:17: warning: incorrect type in argument 6 (different base types)
> drivers/net/ethernet/stmicro/stmmac/norm_desc.c:258:17:    expected unsigned int
> drivers/net/ethernet/stmicro/stmmac/norm_desc.c:258:17:    got restricted __le32 [usertype] des2
> drivers/net/ethernet/stmicro/stmmac/norm_desc.c:258:17: warning: incorrect type in argument 7 (different base types)
> drivers/net/ethernet/stmicro/stmmac/norm_desc.c:258:17:    expected unsigned int
> drivers/net/ethernet/stmicro/stmmac/norm_desc.c:258:17:    got restricted __le32 [usertype] des3
> 
> Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>

I agree on the principle, but this isn't a fix so this'll have to wait
until net-next re-opens :)

Thanks,

Maxime

> ---
>  drivers/net/ethernet/stmicro/stmmac/norm_desc.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/stmicro/stmmac/norm_desc.c b/drivers/net/ethernet/stmicro/stmmac/norm_desc.c
> index c4b613564f87..74c9b7b1fe8f 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/norm_desc.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/norm_desc.c
> @@ -258,7 +258,7 @@ static void ndesc_display_ring(void *head, unsigned int size, bool rx,
>  		pr_info("%03d [%pad]: 0x%x 0x%x 0x%x 0x%x",
>  			i, &dma_addr,
>  			(unsigned int)x, (unsigned int)(x >> 32),
> -			p->des2, p->des3);
> +			le32_to_cpu(p->des2), le32_to_cpu(p->des3));
>  		p++;
>  	}
>  	pr_info("\n");


^ permalink raw reply

* RE: [REGRESSION 6.12.90 -> 6.12.94] vsock/virtio: large AF_VSOCK transfers reset under backpressure
From: Brien Oberstein @ 2026-06-22 17:48 UTC (permalink / raw)
  To: 'Stefano Garzarella'; +Cc: netdev, regressions, stable
In-Reply-To: <ajkmjgGdJp9Dj6em@sgarzare-redhat>

Hi Stefano,

Confirmed -- the 16 MB buffer fixes it: with socat owning the VSOCK-LISTEN
and SO_VM_SOCKETS_BUFFER_MAX_SIZE/SIZE at 16 MB, a 6.12.94 guest passed
21/21 large transfers (1.5 MB x12 through 8 MB); the same 1.5 MB payload
failed every time without it. So the per-socket workaround covers the
bridges whose listen I control, but not vsock services I can't
reconfigure, which stay broken on 6.12.94.

Agreed the old behaviour was buggy in its own right -- it was
over-allocating past the advertised buffer. The practical effect for me is
just that a config that worked on 6.12.90 no longer does on 6.12.94.

A question mainly for stable@: until the merging work lands, would an
interim be acceptable -- something that keeps ordinary small-packet
workloads under the limit without reopening the DoS? I don't have the
kernel-side expertise to judge what's safe there, but I'm glad to prepare
and test whatever interim you think is right, and to test the merging
patch when it's ready.

Thanks,
Brien

-----Original Message-----
From: Stefano Garzarella <sgarzare@redhat.com> 
Sent: Monday, June 22, 2026 8:22 AM
To: Brien Oberstein <brienpub@gmail.com>
Cc: netdev@vger.kernel.org; regressions@lists.linux.dev;
stable@vger.kernel.org
Subject: Re: [REGRESSION 6.12.90 -> 6.12.94] vsock/virtio: large AF_VSOCK
transfers reset under backpressure

On Mon, Jun 22, 2026 at 07:55:30AM -0400, Brien Oberstein wrote:
>Hi Stefano,
>
>Thanks, that matches what I'm seeing: large transfers reset mid-stream
>instead of the sender being throttled (reliable above ~1.5 MB, fine below
>~90 KB).
>
>The bind for me: it's not just this mail bridge -- I use AF_VSOCK for a few
>host/guest services, some of which open their own sockets, so the
per-socket
>buffer workaround can't cover them all. That leaves pinning 6.12.90 (losing
>the DoS fix and further kernel updates) as the only blanket option.

Okay, but in that case did it work?

>
>A few quick questions:
>
>1. Is a -stable backport of the merging fix likely, and roughly when?

We don't have a fix yet.

>2. Could a smaller interim land in -stable sooner (e.g. more default
>   headroom) without reopening the DoS?

What we've merged so far is the best we can do for now, but anyone who 
wants to help improve the situation is welcome to submit patches.

>3. Will the fix guarantee backpressure for any packet size, or just widen
>   the margin?

It should fix STREAM sockets for any packet size.
SEQPACKET/DGRAM is a bit different since we need to keep boundaries, so 
it will come later if needed.

>
>Happy to test any patch

THanks, I'll ask you to test.

>I have a solid reproducer and can turn it around
>in a day. I'll also file this as a tracked regression so it's not lost.

Unfortunately, it's always been partially broken, using more memory than 
specified, so I don't know if this is actually a full regression, but I 
understand.

Thanks,
Stefano

^ permalink raw reply

* Re: [PATCH iproute2-next] "ip help" wrong output, exit code.
From: Dmitri Seletski @ 2026-06-22 17:47 UTC (permalink / raw)
  To: David Laight, Stephen Hemminger; +Cc: netdev
In-Reply-To: <20260622174454.576b3580@pumpkin>

Hello David,


Based on change introduced:

Two samples of "ip help" with demonstration of exit code and standard 
output are below.

This is in line with what expect.


dimkosPC~/compiled/iproute2-next #if ./ip/ip help a >>/dev/null  ; then 
echo help triggered  ; else echo error code triggered  ;fi  #this 
redirects standard output  to /dev/null, so text missing is not error,
but standard text
help triggered

dimkosPC~/compiled/iproute2-next #if ./ip/ip help   ; then echo help 
triggered  ; else echo error code triggered  ;fi
Usage: ip [ OPTIONS ] OBJECT { COMMAND | help }
       ip [ -force ] -batch filename
where  OBJECT := { address | addrlabel | fou | help | ila | ioam | l2tp 
| link |
                   macsec | maddress | monitor | mptcp | mroute | mrule |
                   neighbor | neighbour | netconf | netns | nexthop | 
ntable |
                   ntbl | route | rule | sr | stats | tap | tcpmetrics |
                   token | tunnel | tuntap | vrf | xfrm }
       OPTIONS := { -V[ersion] | -s[tatistics] | -d[etails] | -r[esolve] |
                    -h[uman-readable] | -iec | -j[son] | -p[retty] |
                    -f[amily] { inet | inet6 | mpls | bridge | link } |
                    -4 | -6 | -M | -B | -0 |
                    -l[oops] { maximum-addr-flush-attempts } | -echo | 
-br[ief] |
                    -o[neline] | -t[imestamp] | -ts[hort] | -b[atch] 
[filename] |
                    -rc[vbuf] [size] | -n[etns] name | -N[umeric] | -a[ll] |
                    -c[olor]}
help triggered

Two samples of command that is broken on purpose.

dimkosPC~/compiled/iproute2-next #if ./ip/ip idontexist   ; then echo 
help triggered  ; else echo error code triggered  ;fi
Object "idontexist" is unknown, try "ip help".
error code triggered

dimkosPC~/compiled/iproute2-next #if ./ip/ip idontexist  >>/dev/null  ; 
then echo help triggered  ; else echo error code triggered  ;fi  #this 
redirects standard output  to /dev/null, so text missing is not error, 
but standard text
Object "idontexist" is unknown, try "ip help".
error code triggered

This works as expected as per my understanding.


Not everything is fixed, but chunk of things fixed is better than non of it.

for example:

if ip  add help    ; then echo help triggered  ; else echo error code 
triggered  ;fi  #this redirects standard output  to /dev/null, so text 
missing is not error, but standard text
Usage: ip address {add|change|replace} IFADDR dev IFNAME [ LIFETIME ]
                                                      [ CONFFLAG-LIST ]
       ip address del IFADDR dev IFNAME [mngtmpaddr]
       ip address {save|flush} [ dev IFNAME ] [ scope SCOPE-ID ] [ to 
PREFIX ]
                            [ FLAG-LIST ] [ label LABEL ] [ { up | down } ]
       ip address [ show [ dev IFNAME ] [ scope SCOPE-ID ] [ master DEVICE ]
                         [ nomaster ]
                         [ type TYPE ] [ to PREFIX ] [ FLAG-LIST ]
                         [ label LABEL ] [ { up | down } ] [ vrf NAME ]
                         [ proto ADDRPROTO ] ]
       ip address {showdump|restore}
IFADDR := PREFIX | ADDR peer PREFIX
          [ broadcast ADDR ] [ anycast ADDR ]
          [ label IFNAME ] [ scope SCOPE-ID ] [ metric METRIC ]
          [ proto ADDRPROTO ]
SCOPE-ID := [ host | link | global | NUMBER ]
FLAG-LIST := [ FLAG-LIST ] FLAG
FLAG  := [ permanent | dynamic | secondary | primary |
           [-]tentative | [-]deprecated | [-]dadfailed | temporary |
           CONFFLAG-LIST ]
CONFFLAG-LIST := [ CONFFLAG-LIST ] CONFFLAG
CONFFLAG  := [ home | nodad | mngtmpaddr | noprefixroute | autojoin ]
LIFETIME := [ valid_lft LFT ] [ preferred_lft LFT ]
LFT := forever | SECONDS
ADDRPROTO := [ NAME | NUMBER ]
TYPE := { amt | bareudp | bond | bond_slave | bridge | bridge_slave |
          dsa | dummy | erspan | geneve | gre | gretap | gtp | hsr |
          ifb | ip6erspan | ip6gre | ip6gretap | ip6tnl |
          ipip | ipoib | ipvlan | ipvtap |
          macsec | macvlan | macvtap | netdevsim |
          netkit | nlmon | pfcp | rmnet | sit | team | team_slave |
          vcan | veth | vlan | vrf | vti | vxcan | vxlan | wwan |
          xfrm | virt_wifi }
error code triggered

This is still problematic.


But so far code leaves "ip help" command/argument in better shape than 
it found it in.


I may try improve things more, but lets submit what we already have 
"better", please.

Kind Regards

Dmitri Seletski


On 6/22/26 17:44, David Laight wrote:
> On Mon, 22 Jun 2026 07:57:00 -0700
> Stephen Hemminger <stephen@networkplumber.org> wrote:
>
>> On Sun, 21 Jun 2026 22:48:59 +0100
>> Dmitri Seletski <drjoms@gmail.com> wrote:
>>
>>>  From 0805e07105cd15c5b94271a4706e50e3c65dbde5 Mon Sep 17 00:00:00 2001
>>> From: Dmitri Seletski <drjoms@gmail.com>
>>> Date: Sun, 21 Jun 2026 22:12:43 +0100
>>> Subject: [PATCH iproute2-next]  "ip help" wrong output, exit code.
>>>
>>> Changed output of "ip help" from standard error to standard output. And
>>> Exit is now 0 instead of -1. "ip help|grep bridge" - now gives bridge
>>> syntax instead of flooding user with everything from "ip help".
>>> ---
>>> ip/ip.c | 4 ++--
>>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/ip/ip.c b/ip/ip.c
>>> index e4b71bde..4627b61c 100644
>>> --- a/ip/ip.c
>>> +++ b/ip/ip.c
>>> @@ -56,7 +56,7 @@ static void usage(void) __attribute__((noreturn));
>>>
>>> static void usage(void)
>>> {
>>> -fprintf(stderr,
>>> +fprintf(stdout,
>>> "Usage: ip [ OPTIONS ] OBJECT { COMMAND | help }\n"
>>> "       ip [ -force ] -batch filename\n"
>>> "where  OBJECT := { address | addrlabel | fou | help | ila | ioam | l2tp
>>> | link |\n"
>>> @@ -72,7 +72,7 @@ static void usage(void)
>>> "                    -o[neline] | -t[imestamp] | -ts[hort] | -b[atch]
>>> [filename] |\n"
>>> "                    -rc[vbuf] [size] | -n[etns] name | -N[umeric] |
>>> -a[ll] |\n"
>>> "                    -c[olor]}\n");
>>> -exit(-1);
>>> +exit(0);
>>> }
>> Your mailer damages white space.
>>
> The output also needs to depend on whether these is a 'usage' error or
> if 'help' is requested.
> Code code is correct for the former - except it should do exit(1).
>
> 	David
>
>

^ permalink raw reply

* Re: [PATCH 05/23] powerpc/powermac: fix OF node refcount
From: Bartosz Golaszewski @ 2026-06-22 17:43 UTC (permalink / raw)
  To: Madhavan Srinivasan
  Cc: brgl, linux-kernel, netdev, linux-arm-msm, linux-sound,
	driver-core, devicetree, linuxppc-dev, linux-i2c, iommu, linux-pm,
	imx, linux-arm-kernel, intel-xe, dri-devel, linux-usb, linux-mips,
	platform-driver-x86, Bartosz Golaszewski, stable, Lee Jones,
	Mark Brown, Thierry Reding, Sebastian Hesselbarth, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Srinivas Kandagatla, Greg Kroah-Hartman, Vinod Koul,
	Rafael J. Wysocki, Danilo Krummrich, Rob Herring, Saravana Kannan,
	Michael Ellerman, Nicholas Piggin, Christophe Leroy (CS GROUP),
	Andi Shyti, Andy Shevchenko, Joerg Roedel, Will Deacon,
	Robin Murphy, Doug Berger, Florian Fainelli,
	Broadcom internal kernel review list, Ulf Hansson, Frank Li,
	Sascha Hauer, Pengutronix Kernel Team, Fabio Estevam,
	Matthew Brost, Thomas Hellström, Rodrigo Vivi, David Airlie,
	Simona Vetter, Peter Chen, Paul Cercueil, Bin Liu, Philipp Zabel,
	Maximilian Luz, Hans de Goede, Ilpo Järvinen,
	Krzysztof Kozlowski, Benjamin Herrenschmidt
In-Reply-To: <20260521-pdev-fwnode-ref-v1-5-88c324a1b8d2@oss.qualcomm.com>

On Thu, 21 May 2026 10:36:28 +0200, Bartosz Golaszewski
<bartosz.golaszewski@oss.qualcomm.com> said:
> Platform devices created with platform_device_alloc() call
> platform_device_release() when the last reference to the device's
> kobject is dropped. This function calls of_node_put() unconditionally.
> This works fine for devices created with platform_device_register_full()
> but users of the split approach (platform_device_alloc() +
> platform_device_add()) must bump the reference of the of_node they
> assign manually. Add the missing call to of_node_get().
>
> Cc: stable@vger.kernel.org
> Fixes: 81e5d8646ff6 ("i2c/powermac: Register i2c devices from device-tree")
> Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
> ---
>  arch/powerpc/platforms/powermac/low_i2c.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/platforms/powermac/low_i2c.c b/arch/powerpc/platforms/powermac/low_i2c.c
> index da72a30ab8657e6dc7e6f3437af612155783d8f9..973f58771d9636605ed5d3e91b45008543b584d3 100644
> --- a/arch/powerpc/platforms/powermac/low_i2c.c
> +++ b/arch/powerpc/platforms/powermac/low_i2c.c
> @@ -1471,7 +1471,7 @@ static int __init pmac_i2c_create_platform_devices(void)
>  		if (bus->platform_dev == NULL)
>  			return -ENOMEM;
>  		bus->platform_dev->dev.platform_data = bus;
> -		bus->platform_dev->dev.of_node = bus->busnode;
> +		bus->platform_dev->dev.of_node = of_node_get(bus->busnode);
>  		platform_device_add(bus->platform_dev);
>  	}
>
>
> --
> 2.47.3
>
>

Madhavan, can you please pick this up and send it upstream as a fix please?
Not having to carry it with the rest of the series will make things easier
for the next release.

Thanks,
Bartosz

^ permalink raw reply

* Re: [PATCH net 01/14] netfilter: flowtable: fix offloaded ct timeout never being extended
From: patchwork-bot+netdevbpf @ 2026-06-22 17:40 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: netfilter-devel, davem, netdev, kuba, pabeni, edumazet, fw, horms
In-Reply-To: <20260620222738.112506-2-pablo@netfilter.org>

Hello:

This series was applied to netdev/net.git (main)
by Pablo Neira Ayuso <pablo@netfilter.org>:

On Sun, 21 Jun 2026 00:27:25 +0200 you wrote:
> From: Adrian Bente <adibente@gmail.com>
> 
> OpenWrt has recently migrated many platforms to kernel 6.18. On the
> MediaTek platform, which supports hardware network offloading, WiFi
> connections accelerated via the WED path were observed to drop after
> roughly 300 seconds.
> 
> [...]

Here is the summary with links:
  - [net,01/14] netfilter: flowtable: fix offloaded ct timeout never being extended
    https://git.kernel.org/netdev/net/c/53b3e60edb67
  - [net,02/14] netfilter: nf_queue: pin bridge device while NFQUEUE holds fake dst
    https://git.kernel.org/netdev/net/c/c9c9b37f8c55
  - [net,03/14] netfilter: xt_cluster: reject template conntracks in hash match
    https://git.kernel.org/netdev/net/c/5feba91006ec
  - [net,04/14] netfilter: flowtable: fix and simplify IP6IP6 tunnel handling
    https://git.kernel.org/netdev/net/c/f4c2d8668d85
  - [net,05/14] netfilter: ipset: Don't use test_bit() in lockless RCU readers in hash types
    https://git.kernel.org/netdev/net/c/e4b4984e28c1
  - [net,06/14] netfilter: ipset: Don't use test_bit() in lockless RCU readers in bitmap types
    https://git.kernel.org/netdev/net/c/1171192ac9af
  - [net,07/14] netfilter: ipset: fix order of kfree_rcu() and rcu_assign_pointer()
    https://git.kernel.org/netdev/net/c/3ca9982a8882
  - [net,08/14] netfilter: ipset: make sure gc is properly stopped
    https://git.kernel.org/netdev/net/c/4a597a87e2e2
  - [net,09/14] netfilter: nft_payload: reject offsets exceeding 65535 bytes
    https://git.kernel.org/netdev/net/c/213be32f46a2
  - [net,10/14] netfilter: nft_meta_bridge: add validate callback for get operations
    https://git.kernel.org/netdev/net/c/bff1c8b49a9c
  - [net,11/14] netfilter: nft_flow_offload: zero device address for non-ether case
    https://git.kernel.org/netdev/net/c/e409c23c2d06
  - [net,12/14] netfilter: nf_reject: skip iphdr options when looking for icmp header
    https://git.kernel.org/netdev/net/c/af8d6ae09c0a
  - [net,13/14] netfilter: nf_conntrack_expect: use conntrack GC to reap expectations
    https://git.kernel.org/netdev/net/c/b8b09dc2bf35
  - [net,14/14] netfilter: nft_meta_bridge: fix NFT_META_BRI_IIFPVID stack leak
    https://git.kernel.org/netdev/net/c/27dd2997746d

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH iproute2] ip: return correct status from help command
From: Rose @ 2026-06-22 17:18 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev
In-Reply-To: <20260622081637.172a6bb8@phoenix.local>

Hi Stephen,

Thanks for the feedback. I'd love to finish this up. I can apply the
same fix to the other commands in iproute2 sometime after work, around
17:30 UTC today.

Thanks again,
Rose

On Mon, Jun 22, 2026 at 10:16 AM Stephen Hemminger
<stephen@networkplumber.org> wrote:
>
> On Sun, 21 Jun 2026 18:03:11 +0000
> Rose Wright <rosesophiewright@gmail.com> wrote:
>
> > Currently, "ip help" or "ip -help" always returns an error code because usage() is used as a fall through on "ip" and defaults to stderr with -1.
> >
> > This is a minor bug that breaks "ip help | grep" and other scripts that rely on standard exit codes. The fix is to pass the status code as a parameter into usage() and change stderr to stdout when needed.
> >
> > Signed-off-by: Rose Wright <rosesophiewright@gmail.com>
> > ---
>
> This is the closest of the three submissions, but there are way more commands in iproute2
> than just ip. Need to address all the commands. Looks like perfect trivial job for AI
> coding tools. I am looking into it now.
>

^ permalink raw reply

* [PATCH net] tipc: fix UAF in cleanup_bearer() due to premature dst_cache_destroy()
From: Eric Dumazet @ 2026-06-22 17:10 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, netdev, eric.dumazet, Eric Dumazet,
	syzbot+e14bc5d4942756023b77, Xin Long, Jon Maloy

TIPC UDP media bearer teardown calls dst_cache_destroy() on its
replicast caches before calling synchronize_net() to wait for
concurrent RCU readers (transmitters) to finish:

static void cleanup_bearer(struct work_struct *work)
{
...
	list_for_each_entry_safe(rcast, tmp, &ub->rcast.list, list) {
		dst_cache_destroy(&rcast->dst_cache);
		list_del_rcu(&rcast->list);
		kfree_rcu(rcast, rcu);
	}
...
	dst_cache_destroy(&ub->rcast.dst_cache);
	udp_tunnel_sock_release(ub->sk);
	synchronize_net();
...
}

This is highly buggy because dst_cache_destroy() immediately frees the
per-CPU cache memory (free_percpu()) and releases the cached dst
entries without any synchronization.

If a concurrent transmitter (e.g., tipc_udp_xmit()) is running on another
CPU under RCU protection, it can call dst_cache_get() concurrently,
leading to:
1. Use-After-Free on the per-CPU cache pointer itself (crash).
2. "rcuref - imbalanced put()" warning if it attempts to release a
   dst that was concurrently released by dst_cache_destroy().

Furthermore, calling kfree(ub) immediately after synchronize_net() without
closing the socket first (or waiting after closing it) leaves a window
where a concurrent receiver (tipc_udp_recv()) could start after
synchronize_net(), access ub, and suffer a UAF when kfree(ub) runs.

To fix this, we must defer dst_cache_destroy() and kfree(ub) until after
we have ensured that no more readers can see the bearer/socket and all
existing readers have finished:

1. Move the rcast entries from the public list to a private list
   and delete them using list_del_rcu() (stops new transmit readers).
2. Release the bearer socket using udp_tunnel_sock_release() (stops
   new receive readers).
3. Call synchronize_net() to wait for all outstanding RCU readers
   (both transmit and receive) to finish.
4. Now that it is safe, call dst_cache_destroy() on all isolated
   rcast entries and the main bearer cache, and free the memory.

Fixes: e9c1a793210f ("tipc: add dst_cache support for udp media")
Reported-by: syzbot+e14bc5d4942756023b77@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/6a396a66.52ae72c2.136ac7.0003.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Xin Long <lucien.xin@gmail.com>
Cc: Jon Maloy <jon.maloy@ericsson.com>
---
 net/tipc/udp_media.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/net/tipc/udp_media.c b/net/tipc/udp_media.c
index 988b8a7f953ad6da860e6190f1f244650f121dce..befaf7137caf642462b7203a2429a60386e64db8 100644
--- a/net/tipc/udp_media.c
+++ b/net/tipc/udp_media.c
@@ -808,21 +808,26 @@ static void cleanup_bearer(struct work_struct *work)
 {
 	struct udp_bearer *ub = container_of(work, struct udp_bearer, work);
 	struct udp_replicast *rcast, *tmp;
+	LIST_HEAD(private_list);
 	struct tipc_net *tn;

 	list_for_each_entry_safe(rcast, tmp, &ub->rcast.list, list) {
-		dst_cache_destroy(&rcast->dst_cache);
 		list_del_rcu(&rcast->list);
-		kfree_rcu(rcast, rcu);
+		list_add(&rcast->list, &private_list);
 	}

 	tn = tipc_net(sock_net(ub->sk));

-	dst_cache_destroy(&ub->rcast.dst_cache);
 	udp_tunnel_sock_release(ub->sk);

-	/* Note: could use a call_rcu() to avoid another synchronize_net() */
 	synchronize_net();
+
+	list_for_each_entry_safe(rcast, tmp, &private_list, list) {
+		dst_cache_destroy(&rcast->dst_cache);
+		kfree(rcast);
+	}
+
+	dst_cache_destroy(&ub->rcast.dst_cache);
 	atomic_dec(&tn->wq_count);
 	kfree(ub);
 }
-- 
2.55.0.rc0.799.gd6f94ed593-goog

^ permalink raw reply related

* Re: [PATCH] net: sungem: fix probe error cleanup
From: Simon Horman @ 2026-06-22 17:02 UTC (permalink / raw)
  To: Ruoyu Wang
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, netdev, linux-kernel
In-Reply-To: <20260620155326.80582-1-ruoyuw560@gmail.com>

On Sat, Jun 20, 2026 at 11:53:26PM +0800, Ruoyu Wang wrote:
> gem_init_one() calls gem_remove_one() when register_netdev() fails.
> That path unregisters and frees resources owned by the net_device,
> then probe continues into its own cleanup labels and touches the same
> state again.
> 
> Clear the driver data and remove the NAPI instance on this error path,
> then let the existing probe cleanup labels release the resources once.

Hi Ruoyu,

I think it would be useful to explain how this problem was found,
naming any publicly available tools that were used.

And to explain what testing has occurred.

> 

A fixes tag should go here (no blank line between it and your
Signed-off-by line).
> Signed-off-by: Ruoyu Wang <ruoyuw560@gmail.com>

...

-- 
pw-bot: changes-requested

^ permalink raw reply

* [syzbot] [net?] WARNING in rcuref_put_slowpath
From: syzbot @ 2026-06-22 17:01 UTC (permalink / raw)
  To: davem, edumazet, horms, kuba, linux-kernel, netdev, pabeni,
	syzkaller-bugs

Hello,

syzbot found the following issue on:

HEAD commit:    4fa3f5fabb30 Add linux-next specific files for 20260616
git tree:       linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=1427abd2580000
kernel config:  https://syzkaller.appspot.com/x/.config?x=6c414e1864e61ef6
dashboard link: https://syzkaller.appspot.com/bug?extid=e14bc5d4942756023b77
compiler:       Debian clang version 22.1.6 (++20260514074242+fc4aad7b5db3-1~exp1~20260514074407.73), Debian LLD 22.1.6

Unfortunately, I don't have any reproducer for this issue yet.

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/bf5b803a695d/disk-4fa3f5fa.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/47871e7c589e/vmlinux-4fa3f5fa.xz
kernel image: https://storage.googleapis.com/syzbot-assets/53cd9ef32a2b/bzImage-4fa3f5fa.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+e14bc5d4942756023b77@syzkaller.appspotmail.com

------------[ cut here ]------------
rcuref - imbalanced put()
WARNING: lib/rcuref.c:266 at rcuref_put_slowpath+0x16e/0x1d0 lib/rcuref.c:266, CPU#0: ktimers/0/16
Modules linked in:
CPU: 0 UID: 0 PID: 16 Comm: ktimers/0 Not tainted syzkaller #0 PREEMPT_{RT,(full)} 
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/09/2026
RIP: 0010:rcuref_put_slowpath+0x16e/0x1d0 lib/rcuref.c:266
Code: c1 e8 03 42 0f b6 04 38 84 c0 75 48 c7 03 00 00 00 a0 31 c0 e9 6d ff ff ff e8 fe 13 80 06 e8 49 d7 14 fd 48 8d 3d 62 f3 e6 0a <67> 48 0f b9 3a 48 89 df be 04 00 00 00 e8 40 b3 80 fd 48 89 d8 48
RSP: 0018:ffffc90000157560 EFLAGS: 00010246
RAX: ffffffff84b04017 RBX: ffff888025328740 RCX: ffff88801d680000
RDX: 0000000000000100 RSI: 0000000000000000 RDI: ffffffff8f973380
RBP: ffffc900001575e8 R08: 0000000000000000 R09: 0000000000000100
R10: dffffc0000000000 R11: ffffed1004a650e9 R12: 1ffff9200002aeac
R13: dffffc0000000000 R14: 00000000dfffffff R15: dffffc0000000000
FS:  0000000000000000(0000) GS:ffff888125ed3000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f67d5f2c558 CR3: 0000000026bb4000 CR4: 00000000003526f0
Call Trace:
 <TASK>
 __rcuref_put include/linux/rcuref.h:117 [inline]
 rcuref_put+0x15b/0x170 include/linux/rcuref.h:173
 dst_release+0x31/0x1b0 net/core/dst.c:251
 dst_cache_per_cpu_get+0x25a/0x2d0 net/core/dst_cache.c:57
 dst_cache_get+0x10d/0x1e0 net/core/dst_cache.c:75
 tipc_udp_xmit+0xcf/0xb30 net/tipc/udp_media.c:177
 tipc_bearer_xmit_skb+0x2b3/0x400 net/tipc/bearer.c:576
 tipc_disc_timeout+0x642/0x790 net/tipc/discover.c:338
 call_timer_fn+0x192/0x5e0 kernel/time/timer.c:1748
 expire_timers kernel/time/timer.c:1799 [inline]
 __run_timers kernel/time/timer.c:2374 [inline]
 __run_timer_base+0x67b/0x9b0 kernel/time/timer.c:2386
 run_timer_base kernel/time/timer.c:2395 [inline]
 run_timer_softirq+0xb7/0x170 kernel/time/timer.c:2405
 handle_softirqs+0x1d9/0x6c0 kernel/softirq.c:626
 __do_softirq kernel/softirq.c:660 [inline]
 run_ktimerd+0x69/0x100 kernel/softirq.c:1155
 smpboot_thread_fn+0x57c/0xa80 kernel/smpboot.c:160
 kthread+0x388/0x470 kernel/kthread.c:436
 ret_from_fork+0x514/0xb70 arch/x86/kernel/process.c:158
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
 </TASK>
----------------
Code disassembly (best guess):
   0:	c1 e8 03             	shr    $0x3,%eax
   3:	42 0f b6 04 38       	movzbl (%rax,%r15,1),%eax
   8:	84 c0                	test   %al,%al
   a:	75 48                	jne    0x54
   c:	c7 03 00 00 00 a0    	movl   $0xa0000000,(%rbx)
  12:	31 c0                	xor    %eax,%eax
  14:	e9 6d ff ff ff       	jmp    0xffffff86
  19:	e8 fe 13 80 06       	call   0x680141c
  1e:	e8 49 d7 14 fd       	call   0xfd14d76c
  23:	48 8d 3d 62 f3 e6 0a 	lea    0xae6f362(%rip),%rdi        # 0xae6f38c
* 2a:	67 48 0f b9 3a       	ud1    (%edx),%rdi <-- trapping instruction
  2f:	48 89 df             	mov    %rbx,%rdi
  32:	be 04 00 00 00       	mov    $0x4,%esi
  37:	e8 40 b3 80 fd       	call   0xfd80b37c
  3c:	48 89 d8             	mov    %rbx,%rax
  3f:	48                   	rex.W


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup

^ permalink raw reply

* Re: [PATCH net] sctp: fix err_chunk memory leaks in INIT handling
From: Simon Horman @ 2026-06-22 16:54 UTC (permalink / raw)
  To: Xin Long
  Cc: network dev, linux-sctp, davem, kuba, Eric Dumazet, Paolo Abeni,
	Marcelo Ricardo Leitner
In-Reply-To: <0656704f1b0158287c98aec09ba36c83e4a537ab.1781970534.git.lucien.xin@gmail.com>

On Sat, Jun 20, 2026 at 11:48:54AM -0400, Xin Long wrote:
> When sctp_verify_init() encounters unrecognized parameters, it allocates an
> err_chunk to report them. However, this chunk is leaked in several code
> paths:
> 
> 1. In sctp_sf_do_5_1B_init(), if security_sctp_assoc_request() fails after
>    sctp_verify_init() has populated err_chunk, the function returns
>    immediately without freeing it.
> 
> 2. In sctp_sf_do_unexpected_init(), the same leak occurs on the
>    security_sctp_assoc_request() failure path.
> 
> 3. In sctp_sf_do_unexpected_init(), on the success path after copying
>    unrecognized parameters to the INIT-ACK, the function returns without
>    freeing err_chunk, unlike sctp_sf_do_5_1B_init() which properly frees
>    it.
> 
> Fix all three leaks by adding sctp_chunk_free(err_chunk) calls before
> returning in the error paths and on the success path in
> sctp_sf_do_unexpected_init().
> 
> Fixes: c081d53f97a1 ("security: pass asoc to sctp_assoc_request and sctp_sk_clone")
> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> Reported-by: Sashiko <sashiko-bot@kernel.org>
> Signed-off-by: Xin Long <lucien.xin@gmail.com>

Reviewed-by: Simon Horman <horms@kernel.org>


^ permalink raw reply

* Re: [PATCH net v2 2/2] sctp: add INIT verification after cookie unpacking
From: Simon Horman @ 2026-06-22 16:51 UTC (permalink / raw)
  To: lucien.xin
  Cc: Simon Horman, netdev, linux-sctp, davem, kuba, edumazet, pabeni,
	marcelo.leitner
In-Reply-To: <c34fee6105f9051a52f8e774fe4c35f0e6ba0afc.1781968162.git.lucien.xin@gmail.com>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
Full review at: https://netdev-ai.bots.linux.dev/sashiko/
---
sctp: add INIT verification after cookie unpacking

This commit adds sctp_verify_init() checks in the COOKIE-ECHO processing
paths sctp_sf_do_5_1D_ce() and sctp_sf_do_5_2_4_dupcook() when
cookie_auth_enable is disabled, and updates sctp_verify_init() to bound
the parameter walk against peer_init length rather than chunk->chunk_end.
On verification failure the new association is freed and an ABORT is
generated via sctp_abort_on_init_err().

> diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c
> index 8c636f045e45..6967e889d1bd 100644
> --- a/net/sctp/sm_statefuns.c
> +++ b/net/sctp/sm_statefuns.c
> @@ -650,11 +650,12 @@ enum sctp_disposition sctp_sf_do_5_1D_ce(struct net *net,
>  					 struct sctp_cmd_seq *commands)
>  {
>  	struct sctp_ulpevent *ev, *ai_ev = NULL, *auth_ev = NULL;
> +	struct sctp_chunk *err_chk_p = NULL;
>  	struct sctp_association *new_asoc;
>  	struct sctp_init_chunk *peer_init;
>  	struct sctp_chunk *chunk = arg;
> -	struct sctp_chunk *err_chk_p;
>  	struct sctp_chunk *repl;
> +	enum sctp_cid cid;
>  	struct sock *sk;
>  	int error = 0;
>  
> @@ -728,6 +729,18 @@ enum sctp_disposition sctp_sf_do_5_1D_ce(struct net *net,
>  		}
>  	}
>  
> +	peer_init = (struct sctp_init_chunk *)(chunk->subh.cookie_hdr + 1);
> +	cid = peer_init->chunk_hdr.type;
> +	if (!sctp_sk(sk)->cookie_auth_enable &&
> +	    !sctp_verify_init(net, ep, asoc, cid, peer_init, chunk,
> +			      &err_chk_p)) {
> +		sctp_association_free(new_asoc);
> +		return sctp_abort_on_init_err(net, ep, asoc, cid, arg, commands,
> +					      err_chk_p);
> +	}

[Critical]
When cookie_auth_enable is disabled the cookie HMAC is not validated by
sctp_unpack_cookie(), so every byte of the embedded INIT (including
peer_init->chunk_hdr.type) is attacker-controlled.  Is it safe to feed
that byte directly into both sctp_verify_init() and
sctp_abort_on_init_err() as cid?

Inside the COOKIE-ECHO body the embedded chunk is by RFC always an INIT.
Could the cid argument be hard-coded to SCTP_CID_INIT (or
peer_init->chunk_hdr.type be validated against SCTP_CID_INIT before use)
instead of trusting the wire byte?

Two attacker-reachable consequences look possible if cid is forged to
SCTP_CID_INIT_ACK (3) and the STATE_COOKIE parameter is omitted:

sctp_verify_init() then returns 0 via this branch with *errp populated:

	if ((SCTP_CID_INIT_ACK == cid) && !has_cookie)
		return sctp_process_missing_param(asoc, SCTP_PARAM_STATE_COOKIE,
						  chunk, errp);

control transfers to sctp_abort_on_init_err(net, ep, asoc, cid, ...) with
cid == SCTP_CID_INIT_ACK and a non-NULL err_chunk.  After sending the
ABORT packet, that helper falls through to its out: label because
cid == SCTP_CID_INIT_ACK:

	if (cid != SCTP_CID_INIT_ACK) {
		if (!packet)
			return SCTP_DISPOSITION_NOMEM;
		return SCTP_DISPOSITION_CONSUME;
	}

out:
	...
	return sctp_stop_t1_and_abort(net, commands, error, ECONNREFUSED,
				      asoc, chunk->transport);

In the sctp_sf_do_5_1D_ce() context, asoc is typically NULL (the function
itself opens with if (asoc && !sctp_vtag_verify(chunk, asoc))).  Can the
SCTP_CMD_NEW_STATE / SCTP_CMD_TIMER_STOP / SCTP_CMD_INIT_FAILED commands
queued by sctp_stop_t1_and_abort() against a NULL asoc be dispatched
without the sideeffect interpreter dereferencing asoc?

> @@ -2205,6 +2219,19 @@ enum sctp_disposition sctp_sf_do_5_2_4_dupcook(
>  	switch (action) {
>  	case 'A': /* Association restart. */
>  	case 'B': /* Collision case B. */
> +		peer_init = (struct sctp_init_chunk *)
> +				(chunk->subh.cookie_hdr + 1);
> +		cid = peer_init->chunk_hdr.type;
> +		if (!sctp_sk(ep->base.sk)->cookie_auth_enable &&
> +		    !sctp_verify_init(net, ep, asoc, cid, peer_init, chunk,
> +				      &err_chk_p)) {
> +			sctp_association_free(new_asoc);
> +			return sctp_abort_on_init_err(net, ep, asoc, cid, arg,
> +						      commands, err_chk_p);
> +		}
> +		if (err_chk_p)
> +			sctp_chunk_free(err_chk_p);
> +		fallthrough;

[Critical]
The same attacker-controlled cid is propagated here as well.  In
sctp_sf_do_5_2_4_dupcook() asoc is the existing established association.
With a forged cid of SCTP_CID_INIT_ACK and missing STATE_COOKIE, can a
single unauthenticated COOKIE-ECHO drive sctp_abort_on_init_err() into
its out: path and have sctp_stop_t1_and_abort() tear down the established
asoc and report ECONNREFUSED to the application?

Would constraining the verification to the well-defined INIT case (for
example, hard-coding SCTP_CID_INIT or rejecting any cid !=
SCTP_CID_INIT before calling sctp_verify_init()/sctp_abort_on_init_err())
avoid both of these paths?

^ permalink raw reply

* Re: [PATCH net v2 0/2] net: ethernet: sunplus: spl2sw: fix of_node refcount leaks
From: 呂芳騰 @ 2026-06-22 16:45 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Shitalkumar Gandhi, Andrew Lunn, David S. Miller, Eric Dumazet,
	Paolo Abeni, Simon Horman, netdev, linux-kernel,
	Shitalkumar Gandhi
In-Reply-To: <20260621132239.79000dae@kernel.org>

Hi Jakub,

You are right. I will submit a patch to mark the SUNPLUS ETHERNET DRIVER
as Orphaned, as I no longer have the hardware or the capacity to maintain it.

I contacted the managers at Sunplus, but they have declined to take over
the maintenance position.

Thank you for your understanding, and sorry for any inconvenience caused.

Best regards,
Wells Lu

Jakub Kicinski <kuba@kernel.org> 於 2026年6月22日週一 上午4:22寫道：
>
> On Sun, 21 Jun 2026 12:38:06 +0800 呂芳騰 wrote:
> > I'm sorry that I can't test the fix.
> > I've left from Suplus and don't have the relevant hardware.
>
> That makes things harder.. but you don't necessarily need HW to review
> most of the patches. If you don't intend to serve as a maintainer of
> the sunplus driver please sense a patch to MAINTAINERS and step down.
> Right now you are listed but don't seem to be fulfilling the duties.
> Or please review the patches to the best of your ability without
> testing.

^ permalink raw reply

* Re: [PATCH iproute2-next] "ip help" wrong output, exit code.
From: David Laight @ 2026-06-22 16:44 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Dmitri Seletski, netdev
In-Reply-To: <20260622075700.27806286@phoenix.local>

On Mon, 22 Jun 2026 07:57:00 -0700
Stephen Hemminger <stephen@networkplumber.org> wrote:

> On Sun, 21 Jun 2026 22:48:59 +0100
> Dmitri Seletski <drjoms@gmail.com> wrote:
> 
> > From 0805e07105cd15c5b94271a4706e50e3c65dbde5 Mon Sep 17 00:00:00 2001
> > From: Dmitri Seletski <drjoms@gmail.com>
> > Date: Sun, 21 Jun 2026 22:12:43 +0100
> > Subject: [PATCH iproute2-next]  "ip help" wrong output, exit code.
> > 
> > Changed output of "ip help" from standard error to standard output. And 
> > Exit is now 0 instead of -1. "ip help|grep bridge" - now gives bridge 
> > syntax instead of flooding user with everything from "ip help".
> > ---
> > ip/ip.c | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/ip/ip.c b/ip/ip.c
> > index e4b71bde..4627b61c 100644
> > --- a/ip/ip.c
> > +++ b/ip/ip.c
> > @@ -56,7 +56,7 @@ static void usage(void) __attribute__((noreturn));
> > 
> > static void usage(void)
> > {
> > -fprintf(stderr,
> > +fprintf(stdout,
> > "Usage: ip [ OPTIONS ] OBJECT { COMMAND | help }\n"
> > "       ip [ -force ] -batch filename\n"
> > "where  OBJECT := { address | addrlabel | fou | help | ila | ioam | l2tp 
> > | link |\n"
> > @@ -72,7 +72,7 @@ static void usage(void)
> > "                    -o[neline] | -t[imestamp] | -ts[hort] | -b[atch] 
> > [filename] |\n"
> > "                    -rc[vbuf] [size] | -n[etns] name | -N[umeric] | 
> > -a[ll] |\n"
> > "                    -c[olor]}\n");
> > -exit(-1);
> > +exit(0);
> > }  
> 
> Your mailer damages white space.
> 

The output also needs to depend on whether these is a 'usage' error or
if 'help' is requested.
Code code is correct for the former - except it should do exit(1).

	David



^ permalink raw reply

* Re: [PATCH net v3] net: airoha: fix BQL underflow in shared QDMA TX ring
From: Simon Horman @ 2026-06-22 16:39 UTC (permalink / raw)
  To: Lorenzo Bianconi
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Wayen Yan, linux-arm-kernel, linux-mediatek, netdev
In-Reply-To: <20260620-airoha-bql-fixes-v3-1-76b95374e63e@kernel.org>

On Sat, Jun 20, 2026 at 05:04:51PM +0200, Lorenzo Bianconi wrote:
> When multiple netdevs share a QDMA TX ring and one device is stopped,
> netdev_tx_reset_subqueue() zeroes that device's BQL counters while its
> pending skbs remain in the shared HW TX ring. When NAPI later completes
> those skbs via netdev_tx_completed_queue(), the already-zeroed
> dql->num_queued counter underflows.
> 
> Fix the issue:
> - Remove netdev_tx_reset_subqueue() from airoha_dev_stop() so pending
>   skbs are completed naturally by NAPI with proper BQL accounting.
> - Rework airoha_qdma_tx_cleanup() to disable TX DMA, flush BQL
>   counters, DMA-unmap and free all pending skbs while skb->dev
>   references are still valid. Use a per-queue flushing flag checked
>   under q->lock in airoha_dev_xmit() to prevent races between teardown
>   and transmit. Call airoha_qdma_stop_napi() before
>   airoha_qdma_tx_cleanup() at the call sites.
> - Move DMA engine start into probe. Split DMA teardown so TX DMA is
>   disabled in airoha_qdma_tx_cleanup() and RX DMA in
>   airoha_qdma_cleanup().
> - Remove qdma->users counter since DMA lifetime is now tied to
>   probe/cleanup rather than per-netdev open/stop.
> 
> Fixes: a9c2ca61fec7 ("net: airoha: Support multiple net_devices for a single FE GDM port")
> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> ---
> Changes in v3:
> - Remove airoha_qdma users counter.
> - Drop DEV_STATE_FLUSH bit and add per-queue flushing bool to avoid any
>   race between airoha_qdma_tx_flush() and airoha_dev_xmit().
> - Refactor airoha_qdma_cleanup_tx_queue().
> - Rename airoha_qdma_cleanup_tx_queue() in airoha_qdma_tx_cleanup().
> - Link to v2: https://lore.kernel.org/r/20260619-airoha-bql-fixes-v2-1-4351d6a24484@kernel.org
> 

Thanks for the updates.

Reviewed-by: Simon Horman <horms@kernel.org>


^ permalink raw reply

* Re: building ynl afaics requires updating the UAPI headers first
From: Thorsten Leemhuis @ 2026-06-22 16:33 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: Donald Hunter, netdev, Riana Tauro
In-Reply-To: <20260622090534.2af8ae9e@kernel.org>

On 6/22/26 18:05, Jakub Kicinski wrote:
> On Fri, 19 Jun 2026 09:28:47 +0200 Thorsten Leemhuis wrote:
>> On 6/19/26 02:06, Jakub Kicinski wrote:
>>> On Thu, 18 Jun 2026 15:39:46 +0200 Thorsten Leemhuis wrote:  
>>>> DRM_RAS_CMD_CLEAR_ERROR_COUNTER was introduced to mainline yesterday as
>>>> ee18d39a087792 ("drm/drm_ras: Add clear-error-counter netlink command to
>>>> drm_ras") [v7.1-post].
>>>>
>>>> I finally looked closer today and noticed how to prevent this: update
>>>> the kernel's UAPI files (e.g. the stuff that lives in /usr/include/) on
>>>> the builder. Thing is: that's basically impossible to do from a srpm, as
>>>> those should not change the build environment and can't even when
>>>> working as non-root.
>>>> [...]  
>>>
>>> Can't repro for some reason, but we probably need something like 
>>> commit 46e9b0224475abc to add the explicit include rule.  
>>
>> Thx for the pointer. So I guess you mean something like the below,
>> which did the trick for me. Will submit this as properly, unless
>> someone points out something stupid in it.
>
> [...]
>
> Looks good, did you send it?

No, because the funny thing is: now I fail to reproduce it myself. And I
don't know why, as 24h earlier when you had written "Can't repro for
some reason" I had once more checked that I could trigger this by
downgrading Fedora's kernel-headers package to a version from some weeks
ago. Not sure what changed since then.

Want me to sent it nevertheless?

Ciao, Thorsten

^ permalink raw reply

* Re: [PATCH net-next RESEND v3 0/2] udp: fix FOU/GUE over multicast
From: Kuniyuki Iwashima @ 2026-06-22 16:20 UTC (permalink / raw)
  To: Anton Danilov
  Cc: netdev, Willem de Bruijn, davem, David Ahern, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Shuah Khan,
	linux-kselftest
In-Reply-To: <cover.1782067871.git.littlesmilingcloud@gmail.com>

On Sun, Jun 21, 2026 at 12:04 PM Anton Danilov
<littlesmilingcloud@gmail.com> wrote:
>
> This is a resend of the v3 series originally posted on 2026-05-05,
> which did not receive review feedback during the previous net-next
> window.

Probably due to RFC, which means it is not official yet, and the series
is also dropped from patchwork.


> No changes since v3; rebased cleanly on current net-next
> (after the net-next-7.2 merge).

net-next is closed, please repost next Monday, following the process doc:

https://docs.kernel.org/7.0/process/maintainer-netdev.html#resending-after-review
---8<---
The new version of patches should be posted as a separate thread,
not as a reply to the previous posting.
---8<---

^ permalink raw reply

* [PATCH] tools: ynl: build archives with $(AR)
From: Greg Thelen @ 2026-06-22 16:16 UTC (permalink / raw)
  To: Donald Hunter, Jakub Kicinski, David S. Miller
  Cc: netdev, linux-kernel, Greg Thelen

Use $(AR) to allow build system to override the archiver tool (e.g.,
when cross-compiling for a different architecture) by setting the AR
environment variable.

GNU Make defaults AR to ar, so this change will not break existing build
environments that do not explicitly set AR.

Fixes: 07c3cc51a085 ("tools: net: package libynl for use in selftests")
Fixes: 86878f14d71a ("tools: ynl: user space helpers")
Signed-off-by: Greg Thelen <gthelen@google.com>
---
 tools/net/ynl/Makefile           | 2 +-
 tools/net/ynl/generated/Makefile | 2 +-
 tools/net/ynl/lib/Makefile       | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/net/ynl/Makefile b/tools/net/ynl/Makefile
index d514a48dae27..3cefe4ed96cb 100644
--- a/tools/net/ynl/Makefile
+++ b/tools/net/ynl/Makefile
@@ -22,7 +22,7 @@ tests: | lib generated libynl.a
 ynltool: | lib generated libynl.a
 libynl.a: | lib generated
 	@echo -e "\tAR $@"
-	@ar rcs $@ lib/ynl.o generated/*-user.o
+	@$(AR) rcs $@ lib/ynl.o generated/*-user.o
 
 $(SUBDIRS):
 	@if [ -f "$@/Makefile" ] ; then \
diff --git a/tools/net/ynl/generated/Makefile b/tools/net/ynl/generated/Makefile
index 86e1e4a959a7..ea4128f612d6 100644
--- a/tools/net/ynl/generated/Makefile
+++ b/tools/net/ynl/generated/Makefile
@@ -37,7 +37,7 @@ all: protos.a $(HDRS) $(SRCS) $(KHDRS) $(KSRCS) $(UAPI) $(RSTS)
 
 protos.a: $(OBJS)
 	@echo -e "\tAR $@"
-	@ar rcs $@ $(OBJS)
+	@$(AR) rcs $@ $(OBJS)
 
 %-user.h: $(SPECS_DIR)/%.yaml $(TOOL)
 	@echo -e "\tGEN $@"
diff --git a/tools/net/ynl/lib/Makefile b/tools/net/ynl/lib/Makefile
index 4b2b98704ff9..9b98c0599600 100644
--- a/tools/net/ynl/lib/Makefile
+++ b/tools/net/ynl/lib/Makefile
@@ -15,7 +15,7 @@ all: ynl.a
 
 ynl.a: $(OBJS)
 	@echo -e "\tAR $@"
-	@ar rcs $@ $(OBJS)
+	@$(AR) rcs $@ $(OBJS)
 
 clean:
 	rm -f *.o *.d *~
-- 
2.55.0.rc0.738.g0c8ab3ebcc-goog


^ permalink raw reply related

* Re: [PATCH net v2 2/2] net: airoha: fix netif_set_real_num_tx_queues for sparse QoS channels
From: Simon Horman @ 2026-06-22 16:16 UTC (permalink / raw)
  To: Lorenzo Bianconi
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Wayen Yan, linux-arm-kernel, linux-mediatek, netdev
In-Reply-To: <ajk0kS76nahtop8g@lore-desk>

On Mon, Jun 22, 2026 at 03:11:45PM +0200, Lorenzo Bianconi wrote:
> > On Fri, Jun 19, 2026 at 01:37:14PM +0200, Lorenzo Bianconi wrote:
> > > airoha_tc_htb_alloc_leaf_queue() assigns queue IDs based on the channel
> > > index (opt->qid = AIROHA_NUM_TX_RING + channel), but updates
> > > real_num_tx_queues with a simple increment (num_tx_queues + 1). When QoS
> > > channels are allocated sparsely (e.g., channels 0 and 3 without 1 and
> > > 2), the returned qid can exceed real_num_tx_queues, causing out-of-bounds
> > > accesses in the networking stack.
> > > For example, allocating channel 0 then channel 3 results in
> > > real_num_tx_queues = 34 but qid = 35, which is out of range [0, 34).
> > > Fix this by computing real_num_tx_queues based on the highest active
> > > channel index rather than using a simple counter, in both the allocation
> > > and deletion paths.
> > > 
> > > Fixes: ef1ca9271313b ("net: airoha: Add sched HTB offload support")
> > > Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> > 
> > Thanks for the update since v1.
> > 
> > Reviewed-by: Simon Horman <horms@kernel.org>
> 
> Hi Simon,
> 
> thx for the review.
> 
> > 
> > FTR, there is an AI-generated review of this patch on sashiko.dev.
> > I do not think that should impede the progress of this patch but
> > you may want to consider it in the context of follow-up.
> 
> Even if it is not introduced by this patch, I do not think what is reported
> by Sashiko is a real issue since airoha_eth driver implements
> ndo_select_queue() callback and the selected queue is always in the range
> [0, AIROHA_NUM_TX_RING[. HTB queues (in the range
> [AIROHA_NUM_TX_RING, AIROHA_NUM_TX_RING + AIROHA_NUM_QOS_CHANNELS[) are just 
> 'offloaded' and never used in the TC sw path. Agree?

Thanks Lorenzo,

I've looked over this more closely with the above in mind and I agree.

^ permalink raw reply

* Re: [PATCH bpf-next] bpf, unix: Guard sk_msg-dependent code behind CONFIG_NET_SOCK_MSG
From: Kuniyuki Iwashima @ 2026-06-22 16:11 UTC (permalink / raw)
  To: jakub
  Cc: ast, bpf, daniel, jiayuan.chen, john.fastabend, kernel-team, kuba,
	netdev
In-Reply-To: <20260622-bpf-sk_msg-split-unix-v1-1-d7e0cb7bb03b@cloudflare.com>

From: Jakub Sitnicki <jakub@cloudflare.com>
Date: Mon, 22 Jun 2026 14:58:34 +0200
> Prepare to decouple BPF_SYSCALL config option from NET_SOCK_MSG.
> 
> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
> ---
>  net/unix/unix_bpf.c | 6 ++++++

AFAIU, everyhing in this file is for BPF_SYSCALL && NET_SOCK_MSG,
or am I missing something ?

I feel that it would be cleaner to add a new Kconfig that depends
on BPF_SYSCALL and NET_SOCK_MSG, change Makefile obj-$(CONFIG_XXX),
and guard .psock_update_sk_prot in af_unix.c


>  1 file changed, 6 insertions(+)
> 
> diff --git a/net/unix/unix_bpf.c b/net/unix/unix_bpf.c
> index f86ff19e9764..5289a04b4993 100644
> --- a/net/unix/unix_bpf.c
> +++ b/net/unix/unix_bpf.c
> @@ -7,6 +7,7 @@
>  
>  #include "af_unix.h"
>  
> +#ifdef CONFIG_NET_SOCK_MSG
>  #define unix_sk_has_data(__sk, __psock)					\
>  		({	!skb_queue_empty(&__sk->sk_receive_queue) ||	\
>  			!skb_queue_empty(&__psock->ingress_skb) ||	\
> @@ -94,6 +95,7 @@ static int unix_bpf_recvmsg(struct sock *sk, struct msghdr *msg,
>  	sk_psock_put(sk, psock);
>  	return copied;
>  }
> +#endif /* CONFIG_NET_SOCK_MSG */
>  
>  static struct proto *unix_dgram_prot_saved __read_mostly;
>  static DEFINE_SPINLOCK(unix_dgram_prot_lock);
> @@ -107,8 +109,10 @@ static void unix_dgram_bpf_rebuild_protos(struct proto *prot, const struct proto
>  {
>  	*prot        = *base;
>  	prot->close  = sock_map_close;
> +#ifdef CONFIG_NET_SOCK_MSG
>  	prot->recvmsg = unix_bpf_recvmsg;
>  	prot->sock_is_readable = sk_msg_is_readable;
> +#endif
>  }
>  
>  static void unix_stream_bpf_rebuild_protos(struct proto *prot,
> @@ -116,8 +120,10 @@ static void unix_stream_bpf_rebuild_protos(struct proto *prot,
>  {
>  	*prot        = *base;
>  	prot->close  = sock_map_close;
> +#ifdef CONFIG_NET_SOCK_MSG
>  	prot->recvmsg = unix_bpf_recvmsg;
>  	prot->sock_is_readable = sk_msg_is_readable;
> +#endif
>  	prot->unhash  = sock_map_unhash;
>  }
>  
> 

^ permalink raw reply

* Re: building ynl afaics requires updating the UAPI headers first
From: Jakub Kicinski @ 2026-06-22 16:05 UTC (permalink / raw)
  To: Thorsten Leemhuis; +Cc: Donald Hunter, netdev, Riana Tauro
In-Reply-To: <cec7685e-df12-4501-b851-9c8f7b8b06cb@leemhuis.info>

On Fri, 19 Jun 2026 09:28:47 +0200 Thorsten Leemhuis wrote:
> On 6/19/26 02:06, Jakub Kicinski wrote:
> > On Thu, 18 Jun 2026 15:39:46 +0200 Thorsten Leemhuis wrote:  
> >> DRM_RAS_CMD_CLEAR_ERROR_COUNTER was introduced to mainline yesterday as
> >> ee18d39a087792 ("drm/drm_ras: Add clear-error-counter netlink command to
> >> drm_ras") [v7.1-post].
> >>
> >> I finally looked closer today and noticed how to prevent this: update
> >> the kernel's UAPI files (e.g. the stuff that lives in /usr/include/) on
> >> the builder. Thing is: that's basically impossible to do from a srpm, as
> >> those should not change the build environment and can't even when
> >> working as non-root.
> >> [...]  
> > 
> > Can't repro for some reason, but we probably need something like 
> > commit 46e9b0224475abc to add the explicit include rule.  
> 
> Thx for the pointer. So I guess you mean something like the below,
> which did the trick for me. Will submit this as properly, unless
> someone points out something stupid in it.
> 
> Ciao, Thorsten
> 
> diff --git a/tools/net/ynl/Makefile.deps b/tools/net/ynl/Makefile.deps
> index cc53b2f21c444..43d06ecbae93d 100644
> --- a/tools/net/ynl/Makefile.deps
> +++ b/tools/net/ynl/Makefile.deps
> @@ -14,10 +14,12 @@ UAPI_PATH:=../../../../include/uapi/
>  
>  get_hdr_inc=-D$(1) -include $(UAPI_PATH)/linux/$(2)
>  get_hdr_inc2=-D$(1) -D$(2) -include $(UAPI_PATH)/linux/$(3)
> +get_hdr_inc_drm=-D$(1) -include $(UAPI_PATH)/drm/$(2)
>  
>  CFLAGS_dev-energymodel:=$(call get_hdr_inc,_LINUX_DEV_ENERGYMODEL_H,dev_energymodel.h)
>  CFLAGS_devlink:=$(call get_hdr_inc,_LINUX_DEVLINK_H_,devlink.h)
>  CFLAGS_dpll:=$(call get_hdr_inc,_LINUX_DPLL_H,dpll.h)
> +CFLAGS_drm_ras:=$(call get_hdr_inc_drm,_LINUX_DRM_RAS_H,drm_ras.h)
>  CFLAGS_ethtool:=$(call get_hdr_inc,_LINUX_TYPELIMITS_H,typelimits.h) \
>         $(call get_hdr_inc,_LINUX_ETHTOOL_H,ethtool.h) \
>         $(call get_hdr_inc,_LINUX_ETHTOOL_NETLINK_H_,ethtool_netlink.h) \

Looks good, did you send it? Please send it to networking if possible.
I don't have much experience with DRM trees but I don't want this patch
to get stuck somewhere and keep causing issues.

^ permalink raw reply

* Re: [RFC net-next 08/15] ipxlat: add translation engine and dispatch core
From: Ralf Lici @ 2026-06-22 15:56 UTC (permalink / raw)
  To: Beniamino Galvani
  Cc: Toke Høiland-Jørgensen, netdev, Daniel Gröber,
	Antonio Quartulli, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, linux-kernel
In-Reply-To: <ajjzF3W-0xDC133f@tp>

On Mon, 22 Jun 2026 10:32:23 +0200, Beniamino Galvani <bgalvani@redhat.com> wrote:
> Hi,
>

Hi Beniamino,

> speaking as a maintainer of NetworkManager, I would also like to see
> this feature in the kernel!
>
> In NetworkManager currently we are using a BPF program [1] to
> implement the CLAT, but that approach comes with limitations: for
> example, we can't fragment v4->v6 packets if needed, and it's not
> possible to recompute checksums in certain cases (e.g. for v4->v6 UDP
> packets with zero checksum, and for fragmented ICMP). systemd-networkd
> is also adding CLAT support via BPF [2], with a fallback to userspace
> for the cases that can't be handled in kernel.
>
> It would be very useful to have a native in-kernel CLAT that solves
> the limitations of BPF-based solutions, and can be used by different
> tools without having to re-implement everything from scratch.
>

Thanks, this is really useful context.

CLAT is exactly the kind of consumer ipxlat aims to serve, and the gaps
you hit in BPF line up directly with paths ipxlat already handles. I'll
cite this as motivation in the next cover letter, if that's alright.

While reading the BPF programs, two things stood out that would help
shape v2. On addressing, both implementations use a single NAT64/PLAT
prefix for destinations plus an explicit local_v4<>local_v6 mapping for
the host itself. ipxlat today maps both source and destination through
one RFC 6052 prefix, so this suggests v2 should probably support
explicit 1:1 address mappings (EAM, RFC 7757) alongside prefix
embedding. Is a single local mapping enough for your case, or do you
foresee needing several?

On the consumer side, is there anything in how NM models a connection
that would make a particular kernel model awkward to drive, e.g. needing
to attach to an already-managed interface, or conversely being able to
create and own a dedicated device? We're still settling the
kernel-facing model for v2, so consumer input here is genuinely
valuable.

Thanks,

-- 
Ralf Lici
Mandelbit Srl

^ permalink raw reply

* Re: AppArmor: TCP Fast Open bypasses connect mediation (last unaddressed LSM)
From: Ryan Lee @ 2026-06-22 16:01 UTC (permalink / raw)
  To: Bryam Vargas
  Cc: John Johansen, linux-security-module, apparmor, Paul Moore,
	James Morris, Serge E . Hallyn, Mickael Salaun, Stephen Smalley,
	Matthieu Buffet, Mikhail Ivanov, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, netdev, linux-kernel
In-Reply-To: <20260619011138.264578-1-hexlabsecurity@proton.me>

On Thu, Jun 18, 2026 at 6:12 PM Bryam Vargas <hexlabsecurity@proton.me> wrote:
>
> Hello John, and LSM folks,
>
> I have been working on the Landlock TCP Fast Open connect bypass [1]. Stephen
> Smalley's SELinux fix for the same issue [3] -- "Similar to Landlock, SELinux was
> not updated when TCP Fast Open support was introduced ..." -- made me go back and
> check the rest of the connect-mediating LSMs, since I had only been looking at
> Landlock. With Landlock [2], SELinux [3], and now TOMOYO [4] all getting fixes,
> AppArmor is the last one with the same gap and no fix yet.
>
> Root cause (shared with the others)
> -----------------------------------
> security_socket_connect() has a single call site, net/socket.c (the connect(2)
> syscall). TCP Fast Open performs an implicit connect inside sendmsg:
>
>   tcp_sendmsg -> tcp_sendmsg_fastopen -> __inet_stream_connect(..., is_sendmsg=1)
>               -> sk->sk_prot->connect()                 net/ipv4/{tcp.c,af_inet.c}
>
> This never calls security_socket_connect(); the only LSM hook on the path is
> security_socket_sendmsg(). mptcp_sendmsg_fastopen reaches the same code and is a
> second producer.
>
> AppArmor
> --------
> apparmor_socket_connect() requests AA_MAY_CONNECT; apparmor_socket_sendmsg() (via
> aa_sock_msg_perm) requests AA_MAY_SEND. These are distinct bits, and apparmor_parser
> compiles them independently: "network send inet stream," yields accept mask 0x02
> while "network connect inet stream," yields 0x40. So an egress-restriction profile
> that grants send but not connect is bypassed by MSG_FASTOPEN.
>
> Reproduced on 6.12.88 with apparmor active. Under a profile granting the inet/inet6
> stream lifecycle except connect:
>
>   aa-exec -p egress_restricted -- ./probe
>   [TCP ] connect(2)=EACCES(blocked)  sendto(MSG_FASTOPEN)=OK(reached)  => connection established
>   [TCP6] connect(2)=EACCES(blocked)  sendto(MSG_FASTOPEN)=OK(reached)  => connection established
>
> (The coarse "network inet stream," idiom grants connect anyway, so this only bites the
> fine-grained "allow send, deny connect" policy that the asymmetry is meant to serve.)
>
> Fix
> ---
> Same shape as the TOMOYO [4] and SELinux [3] fixes: in apparmor_socket_sendmsg (or
> aa_sock_msg_perm), when MSG_FASTOPEN is set and msg_name carries a destination on a
> not-yet-connected stream socket, additionally require aa_sk_perm(OP_CONNECT,
> AA_MAY_CONNECT, sk). I am happy to send that patch and the reproducer.

We would appreciate having the patch and the reproducer to look over.
Ideally, the reproducer could be integrated as a regression test into
the upstream repo at
https://gitlab.com/apparmor/apparmor/-/tree/master/tests/regression/apparmor?ref_type=heads,
but we can also assist with that step.
>
> (A single core check in __inet_stream_connect(), gated on is_sendmsg, would have
> covered all five LSMs and both the TCP and MPTCP producers in one place -- the kernel
> already mediates the analogous implicit-connect-on-send for AF_UNIX via
> security_unix_may_send and for SCTP via security_sctp_bind_connect. But since the
> other four LSMs are taking per-hook fixes, AppArmor matching them is the consistent
> move; mentioning the core option only in case it is preferred.)
>
> [1] Landlock: LANDLOCK_ACCESS_NET_CONNECT_TCP bypass via TCP Fast Open (report)
>     https://lore.kernel.org/r/20260616201615.275032-1-hexlabsecurity@proton.me
> [2] landlock: fix TCP Fast Open connection bypass (Matthieu Buffet)
>     https://lore.kernel.org/r/20260617180526.15627-2-matthieu@buffet.re
> [3] selinux: check connect-related permissions on TCP Fast Open (Stephen Smalley)
>     https://lore.kernel.org/r/20260618175513.112443-2-stephen.smalley.work@gmail.com
> [4] tomoyo: Enforce connect policy in TCP Fast Open (Matthieu Buffet)
>     https://lore.kernel.org/r/20260619002207.61104-1-matthieu@buffet.re
>
> Thanks,
> Bryam Vargas
>
>

^ permalink raw reply

* Re: [RFC net-next 00/17] MPTCP KTLS support
From: Jakub Kicinski @ 2026-06-22 16:00 UTC (permalink / raw)
  To: Geliang Tang
  Cc: Matthieu Baerts, Mat Martineau, David S. Miller, Eric Dumazet,
	Paolo Abeni, Simon Horman, Neal Cardwell, Kuniyuki Iwashima,
	John Fastabend, Sabrina Dubroca, Hannes Reinecke, Geliang Tang,
	netdev, mptcp, Gang Yan, Zqiang
In-Reply-To: <cover.1782123118.git.tanggeliang@kylinos.cn>

On Mon, 22 Jun 2026 18:43:20 +0800 Geliang Tang wrote:
> Subject: [RFC net-next 00/17] MPTCP KTLS support

Please no. We have a ton of unfixed bugs and may have to revert some of
the features we dropped back in. I'd prefer to avoid large new bug
surfaces until we reach an LTS release.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox