Netdev List

* Re: [PATCH net] vsock/virtio: fix potential unbounded skb queue
From: Stefano Garzarella @ 2026-05-07  9:09 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Eric Dumazet, Arseniy Krasnov, Bobby Eshleman, Stefan Hajnoczi,
	David S . Miller, Jakub Kicinski, Paolo Abeni, Simon Horman,
	netdev, eric.dumazet, Arseniy Krasnov, Jason Wang, Xuan Zhuo,
	Eugenio Pérez, kvm, virtualization
In-Reply-To: <20260506113554-mutt-send-email-mst@kernel.org>

On Wed, May 06, 2026 at 11:37:45AM -0400, Michael S. Tsirkin wrote:
>On Tue, May 05, 2026 at 06:11:13PM +0200, Stefano Garzarella wrote:
>> On Tue, May 05, 2026 at 07:14:36AM -0700, Eric Dumazet wrote:
>> > On Tue, May 5, 2026 at 6:52 AM Stefano Garzarella <sgarzare@redhat.com> wrote:
>> > >
>> > > On Thu, Apr 30, 2026 at 12:26:52PM +0000, Eric Dumazet wrote:
>> > > >virtio_transport_inc_rx_pkt() checks vvs->rx_bytes + len > vvs->buf_alloc.
>> > > >
>> > > >virtio_transport_recv_enqueue() skips coalescing for packets
>> > > >with VIRTIO_VSOCK_SEQ_EOM.
>> > > >
>> > > >If fed with packets with len == 0 and VIRTIO_VSOCK_SEQ_EOM,
>> > > >a very large number of packets can be queued
>> > > >because vvs->rx_bytes stays at 0.
>> > > >
>> > > >Fix this by estimating the skb metadata size:
>> > > >
>> > > >       (Number of skbs in the queue) * SKB_TRUESIZE(0)
>> > > >
>> > > >Fixes: 077706165717 ("virtio/vsock: don't use skbuff state to account credit")
>> > > >Signed-off-by: Eric Dumazet <edumazet@google.com>
>> > > >Cc: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
>> > > >Cc: Stefan Hajnoczi <stefanha@redhat.com>
>> > > >Cc: Stefano Garzarella <sgarzare@redhat.com>
>> > > >Cc: "Michael S. Tsirkin" <mst@redhat.com>
>> > > >Cc: Jason Wang <jasowang@redhat.com>
>> > > >Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
>> > > >Cc: "Eugenio Pérez" <eperezma@redhat.com>
>> > > >Cc: kvm@vger.kernel.org
>> > > >Cc: virtualization@lists.linux.dev
>> > > >---
>> > > > net/vmw_vsock/virtio_transport_common.c | 4 +++-
>> > > > 1 file changed, 3 insertions(+), 1 deletion(-)
>> > > >
>> > > >diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
>> > > >index 416d533f493d7b07e9c77c43f741d28cfcd0953e..9b8014516f4fb1130ae184635fbba4dfee58bd64 100644
>> > > >--- a/net/vmw_vsock/virtio_transport_common.c
>> > > >+++ b/net/vmw_vsock/virtio_transport_common.c
>> > > >@@ -447,7 +447,9 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
>> > > > static bool virtio_transport_inc_rx_pkt(struct virtio_vsock_sock *vvs,
>> > > >                                       u32 len)
>> > > > {
>> > > >-      if (vvs->buf_used + len > vvs->buf_alloc)
>> > > >+      u64 skb_overhead = (skb_queue_len(&vvs->rx_queue) + 1) * SKB_TRUESIZE(0);
>> > > >+
>> > > >+      if (skb_overhead + vvs->buf_used + len > vvs->buf_alloc)
>> > > >               return false;
>> > >
>> > > I'm not sure about this fix, I mean that maybe this is incomplete.
>> > > In virtio-vsock, there is a credit mechanism between the two peers:
>> > > https://docs.oasis-open.org/virtio/virtio/v1.3/csd01/virtio-v1.3-csd01.html#x1-4850003
>> > >
>> > > This takes only the payload into account, so it’s true that this problem
>> > > exists; however, perhaps we should also inform the other peer of a lower
>> > > credit balance, otherwise the other peer will believe it has much more
>> > > credit than it actually does, send a large payload, and then the packet
>> > > will be discarded and the data lost (there are no retransmissions,
>> > > etc.).
>> >
>> > I dunno, perhaps revert 077706165717 ("virtio/vsock: don't use skbuff
>> > state to account credit")
>> > and find a better fix then?
>>
>> IIRC the same issue was there before the commit fixed by that one (commit
>> 71dc9ec9ac7d ("virtio/vsock: replace virtio_vsock_pkt with sk_buff")), so
>> not sure about reverting it TBH.
>>
>> CCing Arseniy and Bobby.
>>
>> >
>> > There is always a discrepancy between skb->len and skb->truesize.
>> > You will not be able to announce a 1MB window, and accept one million
>> > skb of 1-byte each.
>> >
>> > This kind of contract is broken.
>> >
>>
>> Yep, I agree, but before we start discarding data (and losing it), IMHO we
>> should at least inform the other peer that we're out of space.
>>
>> @Stefan, @Michael, do you think we can do something in the spec to avoid
>> this issue and in some way take into account also the metadata in the
>> credit. I mean to avoid the 1-byte packets flooding.
>>
>> Thanks,
>> Stefano
>
>Why do we need the metadata? Just don't keep it around if you begin
>running low on memory.

I don't think removing the skbuffs will be easy; we added them for eBPF,
zero-copy, and seqpacket as well. For now, we're already doing
something: merging the skbuffs if they don't have EOM set.

As a quick fix, I'm thinking of reducing the `buf_alloc` value to 
account for the overhead and notifying the other peer, at least until we 
find a better solution.

Stefano
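To make the failure mode discussed above concrete, here is a small user-space C model (not kernel code) of the two admission checks: the payload-only test and the patch's variant that also charges an estimated per-skb cost for every queued packet. ASSUMED_SKB_OVERHEAD, struct rx_model and the helper names are hypothetical; the constant only stands in for SKB_TRUESIZE(0), whose real value depends on the kernel build.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for SKB_TRUESIZE(0); the real value depends on
 * the architecture and kernel configuration. */
#define ASSUMED_SKB_OVERHEAD 768u

/* Hypothetical, simplified model of the receive-side credit state. */
struct rx_model {
	uint32_t buf_alloc; /* credit advertised to the peer */
	uint32_t buf_used;  /* payload bytes currently queued */
	uint32_t qlen;      /* number of queued skbs */
};

/* Payload-only admission, the shape of the check before the patch. */
static bool admit_payload_only(struct rx_model *s, uint32_t len)
{
	if ((uint64_t)s->buf_used + len > s->buf_alloc)
		return false;
	s->buf_used += len;
	s->qlen++;
	return true;
}

/* Admission that also charges an estimated metadata cost for every
 * queued skb, including the one being added (hence qlen + 1). */
static bool admit_with_overhead(struct rx_model *s, uint32_t len)
{
	uint64_t overhead = (uint64_t)(s->qlen + 1) * ASSUMED_SKB_OVERHEAD;

	if (overhead + s->buf_used + len > s->buf_alloc)
		return false;
	s->buf_used += len;
	s->qlen++;
	return true;
}

/* Offer up to max_pkts zero-length packets (EOM set, so never coalesced)
 * and return how many ended up queued. */
static uint32_t flood_zero_len(bool (*admit)(struct rx_model *, uint32_t),
			       uint32_t buf_alloc, uint32_t max_pkts)
{
	struct rx_model s = { .buf_alloc = buf_alloc };
	uint32_t i;

	for (i = 0; i < max_pkts; i++) {
		if (!admit(&s, 0))
			break;
	}
	/* Neither check lets queued payload exceed the advertised credit. */
	assert(s.buf_used <= s.buf_alloc);
	return s.qlen;
}
```

With a 256 KiB credit, flood_zero_len(admit_payload_only, 262144, 1000000) queues all 1000000 packets, while the overhead-aware check stops at 341 (341 * 768 = 261888 fits, 342 * 768 does not). The model also shows the concern raised earlier in the thread: the overhead-aware check can reject payload the peer still believes it has credit for, unless the buf_alloc advertised to the peer is reduced by the same estimate.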



* Re: [PATCH ipsec-next v8 12/14] xfrm: add XFRM_MSG_MIGRATE_STATE for single SA migration
From: Steffen Klassert @ 2026-05-07  9:12 UTC (permalink / raw)
  To: Antony Antony
  Cc: Herbert Xu, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, David Ahern, Masahide NAKAMURA,
	Paul Moore, Stephen Smalley, Ondrej Mosnacek, Jonathan Corbet,
	Shuah Khan, Sabrina Dubroca, netdev, linux-kernel, selinux,
	linux-doc, Chiachang Wang, Yan Yan, devel
In-Reply-To: <migrate-state-v8-12-4578fb016965@secunet.com>

On Tue, May 05, 2026 at 06:34:29AM +0200, Antony Antony wrote:
> Add a new netlink method to migrate a single xfrm_state.
> Unlike the existing migration mechanism (SA + policy), this
> supports migrating only the SA and allows changing the reqid.
> 
> The SA is looked up via xfrm_usersa_id, which uniquely
> identifies it, so old_saddr is not needed. old_daddr is carried in
> xfrm_usersa_id.daddr.
> 
> The reqid is invariant in the old migration.
> 
> Signed-off-by: Antony Antony <antony.antony@secunet.com>
> 
> ---
> v7->v8: - removed the unknown-flags validation block
> v6->v7: - add flags field to xfrm_user_migrate_state (based on Sabrina's feedback)
>   - add XFRM_MIGRATE_STATE_NO_OFFLOAD (bit 0): suppresses offload
>   - omit-to-inherit; mutually exclusive with XFRMA_OFFLOAD_DEV
>   - zero-initialize struct xfrm_migrate m[XFRM_MAX_DEPTH]
>   - add struct xfrm_selector new_sel to xfrm_user_migrate_state
>   - add XFRM_MIGRATE_STATE_UPDATE_SEL: derive new selector
>     from SA addresses when old selector is a single-host match
> v5->v6: - (Feedback from Sabrina's review)
>   - reqid change: use xfrm_state_add, not xfrm_state_insert
>   - encap and xuo: use nla_data() directly, no kmemdup needed
>   - notification failure is non-fatal: set extack warning, return 0
>   - drop state direction, x->dir, check, not required
>   - reverse xmas tree local variable ordering
>   - use NL_SET_ERR_MSG_WEAK for clone failure message
>   - fix implicit padding in xfrm_user_migrate_state uapi struct
>   - support XFRMA_SET_MARK/XFRMA_SET_MARK_MASK in XFRM_MSG_MIGRATE_STATE
> v4->v5: - set portid, seq in XFRM_MSG_MIGRATE_STATE netlink notification
>   - rename error label to out for clarity
>   - add locking and synchronize after cloning
>   - change some if(x) to if(!x) for clarity
>   - call __xfrm_state_delete() inside the lock
>   - return error from xfrm_send_migrate_state() instead of always returning 0
> v3->v4: preserve reqid invariant for each state migrated
> v2->v3: free the skb on the error path
> v1->v2: merged next patch here to fix use uninitialized value
>   - removed unnecessary inline
>   - added const when possible
> ---
>  include/net/xfrm.h          |  16 ++-
>  include/uapi/linux/xfrm.h   |  21 ++++
>  net/xfrm/xfrm_device.c      |   2 +-
>  net/xfrm/xfrm_policy.c      |  19 +++
>  net/xfrm/xfrm_state.c       |  29 +++--
>  net/xfrm/xfrm_user.c        | 281 +++++++++++++++++++++++++++++++++++++++++++-
>  security/selinux/nlmsgtab.c |   3 +-
>  7 files changed, 357 insertions(+), 14 deletions(-)

...

> +static unsigned int xfrm_migrate_state_msgsize(const struct xfrm_migrate *m,
> +					       u8 dir)
> +{
> +	return NLMSG_ALIGN(sizeof(struct xfrm_user_migrate_state)) +
> +		(m->encap ? nla_total_size(sizeof(struct xfrm_encap_tmpl)) : 0) +
> +		(m->xuo ? nla_total_size(sizeof(struct xfrm_user_offload)) : 0) +
> +		(m->new_mark ? nla_total_size(sizeof(struct xfrm_mark)) : 0) +
> +		(m->smark.v ? nla_total_size(sizeof(u32)) * 2 : 0) + /* SET_MARK + SET_MARK_MASK */

xfrm_smark_put() checks (m->v | m->m), maybe you should
do (m->smark.v | m->smark.m) here.

> +		(m->mapping_maxage ? nla_total_size(sizeof(u32)) : 0) +
> +		(m->nat_keepalive_interval ? nla_total_size(sizeof(u32)) : 0) +
> +		(dir ? nla_total_size(sizeof(u8)) : 0); /* XFRMA_SA_DIR */
> +}

Also, the function is not really readable.
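For illustration only, a user-space C sketch of how the size computation could read if it accumulated into a local variable, with the SET_MARK condition changed to (v | m) as suggested above. struct migrate_msg_model and its fields are hypothetical stand-ins (the real function takes struct xfrm_migrate), and the alignment helper re-derives NLA_HDRLEN/NLA_ALIGN instead of using the kernel macros.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* User-space re-derivation of netlink sizing: every attribute carries a
 * 4-byte header and is padded to a 4-byte boundary. */
#define NL_ALIGN4(len) (((len) + 3u) & ~(size_t)3u)

static size_t attr_size(size_t payload)
{
	return NL_ALIGN4(4u + payload);
}

/* Hypothetical, trimmed-down input: which optional attributes are sent,
 * plus payload sizes of the struct-valued ones (0 = not sent). */
struct migrate_msg_model {
	size_t base_len;    /* stands in for sizeof(struct xfrm_user_migrate_state) */
	size_t encap_len;   /* XFRMA_ENCAP */
	size_t offload_len; /* XFRMA_OFFLOAD_DEV */
	size_t mark_len;    /* XFRMA_MARK */
	uint32_t smark_v, smark_m;
	bool has_mapping_maxage;
	bool has_nat_keepalive;
	uint8_t dir;
};

static size_t migrate_state_msgsize(const struct migrate_msg_model *m)
{
	size_t len = NL_ALIGN4(m->base_len);

	if (m->encap_len)
		len += attr_size(m->encap_len);
	if (m->offload_len)
		len += attr_size(m->offload_len);
	if (m->mark_len)
		len += attr_size(m->mark_len);
	/* Same condition as the put side (v | m): a mask-only mark still
	 * emits both XFRMA_SET_MARK and XFRMA_SET_MARK_MASK. */
	if (m->smark_v | m->smark_m)
		len += 2 * attr_size(sizeof(uint32_t));
	if (m->has_mapping_maxage)
		len += attr_size(sizeof(uint32_t));
	if (m->has_nat_keepalive)
		len += attr_size(sizeof(uint32_t));
	if (m->dir)
		len += attr_size(sizeof(uint8_t)); /* XFRMA_SA_DIR */

	return len;
}
```

With the original (m->smark.v ? ...) test, a mark with v == 0 and a non-zero mask would be under-sized by two u32 attributes; the model above counts them.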

> +
> +static int xfrm_send_migrate_state(const struct xfrm_user_migrate_state *um,
> +				   const struct xfrm_migrate *m,
> +				   u8 dir, u32 portid, u32 seq)
> +{
> +	int err;
> +	struct sk_buff *skb;
> +	struct net *net = &init_net;

This is wrong. I know we had this in the tree for ages, but I now have
a fix in ipsec/testing for it. We need to make this namespace aware.



* Re: [PATCH net-next v3 09/13] net: lan966x: add PCIe FDMA support
From: Daniel Machon @ 2026-05-07  9:21 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Horatiu Vultur, Steen Hegelund, UNGLinuxDriver,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, Stanislav Fomichev, Herve Codina, Arnd Bergmann,
	Greg Kroah-Hartman, Mohsin Bashir, netdev, linux-kernel, bpf,
	linux-arm-kernel
In-Reply-To: <b7399930-ea04-4899-a760-ca1410065aac@redhat.com>

> On 5/4/26 4:23 PM, Daniel Machon wrote:
> > +static int lan966x_fdma_pci_rx_check_frame(struct lan966x_rx *rx, u64 *src_port)
> > +{
> > +     struct lan966x *lan966x = rx->lan966x;
> > +     struct fdma *fdma = &rx->fdma;
> > +     struct lan966x_port *port;
> > +     struct fdma_db *db;
> > +     void *virt_addr;
> > +     u32 blockl;
> > +
> > +     /* virt_addr points to the IFH. */
> > +     virt_addr = fdma_dataptr_virt_addr_contiguous(fdma,
> > +                                                   fdma->dcb_index,
> > +                                                   fdma->db_index);
> > +
> > +     lan966x_ifh_get_src_port(virt_addr, src_port);
> > +
> > +     if (WARN_ON(*src_port >= lan966x->num_phys_ports))
> > +             return FDMA_ERROR;
> > +
> > +     port = lan966x->ports[*src_port];
> > +     if (!port)
> > +             return FDMA_ERROR;
> > +
> > +     db = fdma_db_next_get(fdma);
> > +
> > +     /* BLOCKL is a 16-bit HW-populated field; reject obviously-bad
> > +      * values before they feed memcpy/XDP sizes.
> > +      */
> > +     blockl = FDMA_DCB_STATUS_BLOCKL(db->status);
> > +     if (blockl < IFH_LEN_BYTES + ETH_FCS_LEN || blockl > fdma->db_size)
> > +             return FDMA_ERROR;
> 
> Pre-existing issues reported by sashiko (most of them actually) can be
> safely ignored/postponed to follow-ups, but the above OOB (and in patch
> 11/13) access looks real and IMHO should be addressed.
> 
> /P
>

This one looks right. The check ought to be: blockl > fdma->db_size -
XDP_PACKET_HEADROOM.
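A minimal C sketch of the corrected bound, assuming (as the reply implies) that each data buffer of db_size bytes begins with XDP_PACKET_HEADROOM of headroom and the hardware writes BLOCKL bytes (IFH + frame + FCS) after it. XDP_PACKET_HEADROOM and ETH_FCS_LEN match the kernel's values; IFH_LEN_BYTES = 28 is an assumed lan966x value, and blockl_is_valid() is a hypothetical helper, not the driver's code.

```c
#include <stdbool.h>
#include <stdint.h>

#define XDP_PACKET_HEADROOM 256u
#define ETH_FCS_LEN           4u
#define IFH_LEN_BYTES        28u /* assumed lan966x IFH size */

/* A valid BLOCKL must cover at least the IFH and FCS, and must fit in
 * the part of the buffer that remains after the XDP headroom. */
static bool blockl_is_valid(uint32_t blockl, uint32_t db_size)
{
	if (db_size < XDP_PACKET_HEADROOM)
		return false;
	if (blockl < IFH_LEN_BYTES + ETH_FCS_LEN)
		return false;
	return blockl <= db_size - XDP_PACKET_HEADROOM;
}
```

For a 2048-byte buffer, 1792 is the largest accepted BLOCKL; the original "blockl > fdma->db_size" test would also have let 1793..2048 through, which runs past the end of the buffer.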

For patch #11, which issue are you referring to? If it's the sashiko-gemini
critical issue:

  > +xdp_init_buff(&xdp, fdma->db_size, &port->xdp_rxq);
  > +
  > +/* Headroom includes the IFH; BPF may grow into it via adjust_head.
  > + * The IFH is rebuilt on XDP_TX and unread on XDP_PASS.
  > + */
  > +xdp_prepare_buff(&xdp,
  > +                 data - XDP_PACKET_HEADROOM,
  > +                 XDP_PACKET_HEADROOM + IFH_LEN_BYTES,
  > +                 data_len,
  > +                 false);

Then no, this is a false positive. The data pointer is already offset by
XDP_PACKET_HEADROOM, so the hard_start lands correctly at offset 0.
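The pointer arithmetic behind that answer can be checked with a small user-space C model. struct xdp_model and prepare() are hypothetical stand-ins that mirror how xdp_prepare_buff() derives data_hard_start, data and data_end from its arguments; IFH_LEN_BYTES = 28 is an assumed lan966x value.

```c
#include <stddef.h>
#include <stdint.h>

#define XDP_PACKET_HEADROOM 256u
#define IFH_LEN_BYTES        28u /* assumed lan966x IFH size */

/* Hypothetical stand-in for the xdp_buff pointer fields. */
struct xdp_model {
	uint8_t *data_hard_start;
	uint8_t *data;
	uint8_t *data_end;
};

/* Mirrors xdp_prepare_buff(): data = hard_start + headroom,
 * data_end = data + data_len. */
static void prepare(struct xdp_model *x, uint8_t *hard_start,
		    size_t headroom, size_t data_len)
{
	x->data_hard_start = hard_start;
	x->data = hard_start + headroom;
	x->data_end = x->data + data_len;
}
```

If data points at the IFH, which sits XDP_PACKET_HEADROOM bytes into the buffer, then data - XDP_PACKET_HEADROOM is the buffer start (offset 0) and xdp.data lands just past the IFH, at offset XDP_PACKET_HEADROOM + IFH_LEN_BYTES.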

/Daniel



* [PATCH] dt-bindings: net: lan966x: Accept standard ethernet prefixes
From: Linus Walleij @ 2026-05-07  9:26 UTC (permalink / raw)
  To: Herve Codina, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley, Horatiu Vultur
  Cc: netdev, devicetree, Linus Walleij

The dsa.yaml and ethernet-switch.yaml bindings recommend
prefixing Ethernet switch and port nodes with "ethernet-", so
make the LAN966x binding do the same.

Reported-by: Herve Codina <herve.codina@bootlin.com>
Signed-off-by: Linus Walleij <linusw@kernel.org>
---
 .../devicetree/bindings/net/microchip,lan966x-switch.yaml      | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/microchip,lan966x-switch.yaml b/Documentation/devicetree/bindings/net/microchip,lan966x-switch.yaml
index 306ef9ecf2b9..0f0f35865ef4 100644
--- a/Documentation/devicetree/bindings/net/microchip,lan966x-switch.yaml
+++ b/Documentation/devicetree/bindings/net/microchip,lan966x-switch.yaml
@@ -17,7 +17,7 @@ description: |
 
 properties:
   $nodename:
-    pattern: "^switch@[0-9a-f]+$"
+    pattern: "^(ethernet-)?switch@[0-9a-f]+$"
 
   compatible:
     const: microchip,lan966x-switch
@@ -70,7 +70,7 @@ properties:
     additionalProperties: false
 
     patternProperties:
-      "^port@[0-9a-f]+$":
+      "^(ethernet-)?port@[0-9a-f]+$":
         type: object
 
         $ref: /schemas/net/ethernet-controller.yaml#
@@ -138,7 +138,7 @@ additionalProperties: false
 examples:
   - |
     #include <dt-bindings/interrupt-controller/arm-gic.h>
-    switch: switch@e0000000 {
+    switch: ethernet-switch@e0000000 {
       compatible = "microchip,lan966x-switch";
       reg =  <0xe0000000 0x0100000>,
              <0xe2000000 0x0800000>;
@@ -151,14 +151,14 @@ examples:
         #address-cells = <1>;
         #size-cells = <0>;
 
-        port0: port@0 {
+        port0: ethernet-port@0 {
           reg = <0>;
           phy-handle = <&phy0>;
           phys = <&serdes 0 0>;
           phy-mode = "gmii";
         };
 
-        port1: port@1 {
+        port1: ethernet-port@1 {
           reg = <1>;
           sfp = <&sfp_eth1>;
           managed = "in-band-status";

---
base-commit: 254f49634ee16a731174d2ae34bc50bd5f45e731
change-id: 20260507-lan966-binding-0df62a018509

Best regards,
--  
Linus Walleij <linusw@kernel.org>
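The updated node-name patterns can be sanity-checked with POSIX extended regular expressions. dt-schema itself validates with Python regex semantics, but for these simple anchored patterns the two behave the same; node_name_matches() is a hypothetical helper for this check only.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

static const char *const switch_pattern = "^(ethernet-)?switch@[0-9a-f]+$";
static const char *const port_pattern = "^(ethernet-)?port@[0-9a-f]+$";

/* Returns true if name matches the POSIX extended regex pattern. */
static bool node_name_matches(const char *pattern, const char *name)
{
	regex_t re;
	bool match;

	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
		return false;
	match = regexec(&re, name, 0, NULL, 0) == 0;
	regfree(&re);
	return match;
}
```

Both the old names (switch@e0000000, port@0) and the new ones (ethernet-switch@e0000000, ethernet-port@1) match, so existing device trees keep validating.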



* Re: [PATCH ipsec-next v8 03/14] xfrm: allow migration from UDP encapsulated to non-encapsulated ESP
From: Sabrina Dubroca @ 2026-05-07  9:26 UTC (permalink / raw)
  To: Antony Antony
  Cc: Steffen Klassert, Herbert Xu, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, David Ahern,
	Masahide NAKAMURA, Paul Moore, Stephen Smalley, Ondrej Mosnacek,
	Jonathan Corbet, Shuah Khan, netdev, linux-kernel, selinux,
	linux-doc, Chiachang Wang, Yan Yan, devel
In-Reply-To: <migrate-state-v8-3-4578fb016965@secunet.com>

2026-05-05, 06:32:30 +0200, Antony Antony wrote:
> The current code prevents migrating an SA from UDP encapsulation to
> plain ESP. This is needed when moving from a NATed path to a non-NATed
> one, for example when switching from IPv4+NAT to IPv6.
> 
> Only copy the existing encapsulation during migration if the encap
> attribute is explicitly provided.
> 
> Note: PF_KEY's SADB_X_MIGRATE always passes encap=NULL and never
> supported encapsulation in migration. PF_KEY is deprecated and was
> in feature freeze when UDP encapsulation was added to xfrm.
> 
> Signed-off-by: Antony Antony <antony.antony@secunet.com>
> Tested-by: Yan Yan <evitayan@google.com>
> ---
>  net/xfrm/xfrm_state.c | 10 ++--------
>  1 file changed, 2 insertions(+), 8 deletions(-)

Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>

If someone complains about this we can add a sysctl
"preserve_old_encap_on_migrate".

-- 
Sabrina
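A simplified C model of the behaviour change described in the commit message. struct encap_model and both helpers are hypothetical; they only capture which encapsulation the migrated SA ends up with, not the allocation done in xfrm_state_clone_and_setup().

```c
#include <stddef.h>

/* Hypothetical, simplified encapsulation template. */
struct encap_model {
	unsigned short sport, dport;
};

/* Old behaviour: a missing encap attribute meant "keep the old one",
 * so an ESP-in-UDP SA could never be migrated to plain ESP. */
static const struct encap_model *
migrated_encap_old(const struct encap_model *orig,
		   const struct encap_model *requested)
{
	return requested ? requested : orig;
}

/* New behaviour: encapsulation comes only from the request; NULL means
 * the migrated SA uses plain ESP. */
static const struct encap_model *
migrated_encap_new(const struct encap_model *orig,
		   const struct encap_model *requested)
{
	(void)orig; /* intentionally ignored */
	return requested;
}
```

The only observable difference is the (orig != NULL, requested == NULL) case, which is exactly the NATed-to-non-NATed migration the patch enables, and the case a "preserve_old_encap_on_migrate" sysctl would restore.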


* Re: [PATCH net-next 4/5] llc: convert to getsockopt_iter
From: Breno Leitao @ 2026-05-07  9:29 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Oliver Hartkopp, Marc Kleine-Budde, Robin van der Gracht,
	Oleksij Rempel, kernel, Jeremy Kerr, Matt Johnston,
	David S. Miller, Eric Dumazet, Paolo Abeni, Simon Horman,
	Shuah Khan, linux-can, linux-kernel, netdev, linux-kselftest,
	kernel-team
In-Reply-To: <20260506172525.27323c23@kernel.org>

On Wed, May 06, 2026 at 05:25:25PM -0700, Jakub Kicinski wrote:
> On Tue, 05 May 2026 04:12:41 -0700 Breno Leitao wrote:
> > Convert LLC socket's getsockopt implementation to use the new
> > getsockopt_iter callback with sockopt_t.
> > 
> > Key changes:
> > - Replace (char __user *optval, int __user *optlen) with sockopt_t *opt
> > - Use opt->optlen for buffer length (input) and returned size (output)
> > - Use copy_to_iter() instead of put_user()/copy_to_user()
> > - Add linux/uio.h for copy_to_iter()
> 
> kdoc needs to be adjusted here.

Good catch, I will update!

> When you repost could you split the CAN stuff out and send it 
> to Marc and co. ? We don't normally take CAN patches directly.

Ack, I will split this series in two, one for CAN and one for non-CAN
drivers.

Thanks for the review,
--breno


* Re: [PATCH ipsec-next v8 04/14] xfrm: fix NAT-related field inheritance in SA migration
From: Sabrina Dubroca @ 2026-05-07  9:33 UTC (permalink / raw)
  To: Antony Antony
  Cc: Steffen Klassert, Herbert Xu, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, David Ahern,
	Masahide NAKAMURA, Paul Moore, Stephen Smalley, Ondrej Mosnacek,
	Jonathan Corbet, Shuah Khan, netdev, linux-kernel, selinux,
	linux-doc, Chiachang Wang, Yan Yan, devel
In-Reply-To: <migrate-state-v8-4-4578fb016965@secunet.com>

2026-05-05, 06:32:43 +0200, Antony Antony wrote:
> During SA migration via xfrm_state_clone_and_setup(),
> nat_keepalive_interval was silently dropped and never copied to the new
> SA. mapping_maxage was unconditionally copied even when migrating to a
> non-encapsulated SA.

mapping_maxage should be harmless (0/unused on non-encap), but I think
migrating nat_keepalive_interval should be considered a fix:

Fixes: f531d13bdfe3 ("xfrm: support sending NAT keepalives in ESP in UDP states")

(maybe even split out of this series, but that would cause a conflict
with the previous patch)

Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>

-- 
Sabrina
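A sketch of the inheritance rule the commit message describes, not the patch body itself (which is not quoted here): NAT-related timers follow the SA only when it stays UDP-encapsulated. struct sa_model and inherit_nat_fields() are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of the SA state touched during migration. */
struct sa_model {
	bool has_encap;
	uint32_t mapping_maxage;
	uint32_t nat_keepalive_interval;
};

/* Copy the NAT-related timers only when the migrated SA still uses UDP
 * encapsulation; a plain ESP SA has no use for them. */
static void inherit_nat_fields(struct sa_model *x, const struct sa_model *orig)
{
	if (x->has_encap) {
		x->mapping_maxage = orig->mapping_maxage;
		x->nat_keepalive_interval = orig->nat_keepalive_interval;
	} else {
		x->mapping_maxage = 0;
		x->nat_keepalive_interval = 0;
	}
}
```

Before the fix, the encapsulated case lost nat_keepalive_interval, which is why a Fixes: tag against the NAT keepalive commit was suggested.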


* [PATCH net-next v7 0/2] net: sfp: extend SMBus support
From: Jonas Jelonek @ 2026-05-07  9:32 UTC (permalink / raw)
  To: Russell King, Andrew Lunn, Heiner Kallweit, David S . Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Maxime Chevallier
  Cc: netdev, linux-kernel, Bjørn Mork, Jonas Jelonek

Today, the SFP driver only drives I2C adapters that advertise full
I2C_FUNC_I2C, or SMBus-only adapters via single-byte transfers (with
hwmon disabled). Several SoCs ship I2C/SMBus-only controllers that
support more than just byte access -- e.g. word and I2C block -- and
have SFP cages wired to them. Such adapters currently either work
poorly or not at all.

This series teaches the SFP driver to use the larger SMBus access
modes when the adapter advertises them, and along the way starts
honoring i2c_adapter quirks on read/write length so adapters that
cap below the SFP block size are handled correctly. Patch 1 is a
small prep doing only the quirks handling; patch 2 extends the
SMBus path itself.

Capability matrix supported by patch 2:
  - BYTE only:                   single-byte access (unchanged).
  - BYTE + WORD:                 word for >=2-byte chunks, byte tail.
  - I2C_BLOCK present:           block as the universal transport.
  - WORD only (no BYTE/BLOCK):   accepted with WARN_ONCE; works for
                                 even-length transfers, odd-length
                                 transfers will error at xfer time.
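The capability matrix can be modelled by counting SMBus transactions per read or write. This is a hypothetical user-space sketch: the CAP_* bits stand in for the I2C_FUNC_SMBUS_* flags, block_size for the per-adapter chunk size (1, 2, or SFP_EEPROM_BLOCK_SIZE = 16), and it returns -1 where the driver would instead issue a byte transfer the adapter does not implement and get an error back.

```c
#include <stddef.h>

/* Hypothetical capability bits mirroring the matrix above. */
#define CAP_BYTE  0x1u
#define CAP_WORD  0x2u
#define CAP_BLOCK 0x4u

/* Per chunk: I2C block if present, word for chunks of two or more
 * bytes, byte otherwise. Returns the number of transactions, or -1 if
 * a chunk needs a transfer type the adapter does not advertise. */
static int count_xfers(size_t len, unsigned int caps, size_t block_size)
{
	int xfers = 0;

	while (len) {
		size_t this_len = len < block_size ? len : block_size;
		size_t done;

		if (caps & CAP_BLOCK)
			done = this_len;
		else if (this_len >= 2 && (caps & CAP_WORD))
			done = 2;
		else if (caps & CAP_BYTE)
			done = 1;
		else
			return -1; /* e.g. odd tail on a word-only adapter */

		len -= done;
		xfers++;
	}
	return xfers;
}
```

For example, a 5-byte access on a BYTE + WORD adapter takes three transactions (2 + 2 + 1), 40 bytes over I2C block with a 16-byte limit also take three (16 + 16 + 8), and a 3-byte write on a word-only adapter fails on the trailing byte.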

Adapters with asymmetric R/W capabilities (e.g. only READ_I2C_BLOCK
without WRITE_I2C_BLOCK) remain functionally correct but use the
worse-supported direction's max for both directions, since
i2c_max_block_size is a single field. No mainline I2C driver was
seen advertising such asymmetry; per-direction sizes can be added
later if needed.

---
v6 -> v7:
  - use i2c_block_size instead of i2c_max_block_size (Maxime)
  - move WARN_ONCE into 'else if ()' (Maxime)
  - reword comments
  - included Maxime's Reviewed-by for patch 1
v6: https://lore.kernel.org/netdev/20260505200647.1125311-1-jelonek.jonas@gmail.com/

v5 -> v6:
  - Split adapter-quirks handling into a separate prep patch (1/2).
  - Use I2C_SMBUS_I2C_BLOCK_DATA in the block-write branch (was
    I2C_SMBUS_WORD_DATA), so block writes actually transfer this_len
    bytes (also flagged by Jakub's AI bot review).
  - In sfp_smbus_read/write, check i2c_smbus_xfer() return before
    copying smbus_data into the caller's buffer.
  - Use I2C_BLOCK as the universal transport when available (carries
    any length 1..32); drop the this_len > 2 guard on the block
    branches.
  - Broaden the SMBus gate to also accept BLOCK-only adapters
    (Russell).
  - Accept word-only adapters with WARN_ONCE rather than rejecting
    them (Andrew).
  - Add a short comment in sfp_i2c_configure() explaining the access
    hierarchy (Maxime).
  - Use the all-bits-set form via i2c_check_functionality() for the
    composite I2C_FUNC_SMBUS_* checks (Russell).
v5: https://lore.kernel.org/netdev/20260116113105.244592-1-jelonek.jonas@gmail.com/

v4 -> v5:
  - made a more general approach, also covering word access
v4: https://lore.kernel.org/netdev/20260109101321.2804-1-jelonek.jonas@gmail.com/

v3 -> v4:
  - fix formal issues
v3: https://lore.kernel.org/netdev/20260105161242.578487-1-jelonek.jonas@gmail.com/

v2 -> v3:
  - fix previous attempt of v2 to fix return value
v2: https://lore.kernel.org/netdev/20260105154653.575397-1-jelonek.jonas@gmail.com/

v1 -> v2:
  - return number of written bytes instead of zero
v1: https://lore.kernel.org/netdev/20251228213331.472887-1-jelonek.jonas@gmail.com/

---
Jonas Jelonek (2):
  net: sfp: apply I2C adapter quirks to limit block size
  net: sfp: extend SMBus support

 drivers/net/phy/sfp.c | 144 ++++++++++++++++++++++++++++++++++--------
 1 file changed, 116 insertions(+), 28 deletions(-)


base-commit: dacf281771a9aed1a723b196120a0de8637910b9
-- 
2.51.0



* [PATCH net-next v7 1/2] net: sfp: apply I2C adapter quirks to limit block size
From: Jonas Jelonek @ 2026-05-07  9:33 UTC (permalink / raw)
  To: Russell King, Andrew Lunn, Heiner Kallweit, David S . Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Maxime Chevallier
  Cc: netdev, linux-kernel, Bjørn Mork, Jonas Jelonek
In-Reply-To: <20260507093301.1144740-1-jelonek.jonas@gmail.com>

The SFP driver assumes all I2C adapters support reading and writing the
pre-defined block size SFP_EEPROM_BLOCK_SIZE of 16 bytes. This constant
was probably chosen based on good guesses and known limitations of a
range of I2C adapters and SFP modules.

However, some I2C adapters support even shorter transfers and advertise
this limit via I2C quirks. Such an adapter could, for example, provide
full I2C functionality but only support read and write lengths of 8 bytes.
Currently, the SFP driver doesn't account for that.

Handle I2C quirks in the SFP I2C configuration: take the max_read_len
and max_write_len fields of struct i2c_adapter_quirks into account to
further limit the maximum block size if needed.

Signed-off-by: Jonas Jelonek <jelonek.jonas@gmail.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
---
 drivers/net/phy/sfp.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/net/phy/sfp.c b/drivers/net/phy/sfp.c
index bd970f753beb..e58e29a1e8d2 100644
--- a/drivers/net/phy/sfp.c
+++ b/drivers/net/phy/sfp.c
@@ -807,21 +807,29 @@ static int sfp_smbus_byte_write(struct sfp *sfp, bool a2, u8 dev_addr,
 
 static int sfp_i2c_configure(struct sfp *sfp, struct i2c_adapter *i2c)
 {
+	size_t max_block_size;
+
 	sfp->i2c = i2c;
 
 	if (i2c_check_functionality(i2c, I2C_FUNC_I2C)) {
 		sfp->read = sfp_i2c_read;
 		sfp->write = sfp_i2c_write;
-		sfp->i2c_max_block_size = SFP_EEPROM_BLOCK_SIZE;
+		max_block_size = SFP_EEPROM_BLOCK_SIZE;
 	} else if (i2c_check_functionality(i2c, I2C_FUNC_SMBUS_BYTE_DATA)) {
 		sfp->read = sfp_smbus_byte_read;
 		sfp->write = sfp_smbus_byte_write;
-		sfp->i2c_max_block_size = 1;
+		max_block_size = 1;
 	} else {
 		sfp->i2c = NULL;
 		return -EINVAL;
 	}
 
+	if (i2c->quirks && i2c->quirks->max_read_len)
+		max_block_size = min(max_block_size, i2c->quirks->max_read_len);
+	if (i2c->quirks && i2c->quirks->max_write_len)
+		max_block_size = min(max_block_size, i2c->quirks->max_write_len);
+
+	sfp->i2c_max_block_size = max_block_size;
 	return 0;
 }
 
-- 
2.51.0
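The clamping added in sfp_i2c_configure() reduces to taking the minimum of the protocol-derived block size and any non-zero quirk limit. A user-space sketch; struct quirks_model is a hypothetical mirror of the two struct i2c_adapter_quirks fields the patch reads, where 0 means "no limit".

```c
#include <stddef.h>

/* Hypothetical mirror of the two quirk fields; 0 means no limit. */
struct quirks_model {
	unsigned short max_read_len;
	unsigned short max_write_len;
};

static size_t clamp_block_size(size_t max_block_size,
			       const struct quirks_model *q)
{
	if (q && q->max_read_len && q->max_read_len < max_block_size)
		max_block_size = q->max_read_len;
	if (q && q->max_write_len && q->max_write_len < max_block_size)
		max_block_size = q->max_write_len;
	return max_block_size;
}
```

A full-I2C adapter with max_read_len = 8 ends up with 8-byte blocks instead of SFP_EEPROM_BLOCK_SIZE (16); quirk limits larger than the block size leave it unchanged.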



* [PATCH net-next v7 2/2] net: sfp: extend SMBus support
From: Jonas Jelonek @ 2026-05-07  9:33 UTC (permalink / raw)
  To: Russell King, Andrew Lunn, Heiner Kallweit, David S . Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Maxime Chevallier
  Cc: netdev, linux-kernel, Bjørn Mork, Jonas Jelonek
In-Reply-To: <20260507093301.1144740-1-jelonek.jonas@gmail.com>

Commit 7662abf4db94 ("net: phy: sfp: Add support for SMBus module access")
added SMBus access for SFP modules, but limited it to single-byte
transfers. As a side effect, hwmon is disabled (16-bit reads cannot be
guaranteed atomic) and a warning is printed.

Many SMBus-only I2C controllers in the wild support more than just
byte access, and SFP cages are often wired to such controllers
rather than to a full-featured I2C controller -- e.g. the SMBus
controllers in the Realtek longan and mango SoCs, which advertise
word access and I2C block reads. Today, they cannot drive an SFP at
all without falling back to the byte-only path.

Extend sfp_smbus_read()/sfp_smbus_write() so that, in addition to
the existing byte access, they also use SMBus word access and SMBus
I2C block access whenever the adapter advertises them. Both
directions are handled in a single read and a single write helper
that pick the largest supported transfer per chunk and fall back as
needed.

I2C-block is preferred unconditionally when available: the protocol
carries any length 1..32, so it can serve every chunk -- including
the 1- and 2-byte tails -- without help from word or byte access.
Note that this requires I2C_FUNC_SMBUS_I2C_BLOCK, which reads a
caller-specified number of bytes. This deviates from the official
SMBus Block Read (length is supplied by the slave) but is widely
supported by Linux I2C controllers/drivers.

Capability matrix this implementation supports:

  - BYTE only:                  works (unchanged behaviour); 1-byte
                                xfers, hwmon disabled.
  - BYTE + WORD:                word for >=2-byte chunks, byte for
                                trailing odd byte.
  - I2C_BLOCK present (with or
    without BYTE/WORD):         block as the universal transport for
                                every chunk.
  - WORD only (no BYTE/BLOCK):  accepted with WARN_ONCE. Even-length
                                transfers work; odd-length transfers
                                (e.g. the 3-byte cotsworks fixup
                                write) hit the BYTE branch which the
                                adapter does not implement, so the
                                xfer returns an error and the
                                operation is aborted. No mainline
                                I2C driver was found to advertise
                                WORD without BYTE; the warning lets
                                us learn about it if it ever shows
                                up.

Adapters with asymmetric R/W capabilities (e.g. only READ_I2C_BLOCK
but not WRITE_I2C_BLOCK) remain functionally correct -- the
per-iteration fallback uses the direction-specific bits -- but the
shared i2c_max_block_size is sized by the all-bits-set check, so a
transfer in the better-supported direction is not upgraded. None of
the mainline I2C bus drivers surveyed during review advertise such
asymmetry; promoting i2c_max_block_size to per-direction sizes can
be revisited if needed.

Signed-off-by: Jonas Jelonek <jelonek.jonas@gmail.com>
---
 drivers/net/phy/sfp.c | 134 +++++++++++++++++++++++++++++++++---------
 1 file changed, 107 insertions(+), 27 deletions(-)

diff --git a/drivers/net/phy/sfp.c b/drivers/net/phy/sfp.c
index e58e29a1e8d2..16d41d7ee632 100644
--- a/drivers/net/phy/sfp.c
+++ b/drivers/net/phy/sfp.c
@@ -14,6 +14,7 @@
 #include <linux/platform_device.h>
 #include <linux/rtnetlink.h>
 #include <linux/slab.h>
+#include <linux/unaligned.h>
 #include <linux/workqueue.h>
 
 #include "sfp.h"
@@ -756,50 +757,110 @@ static int sfp_i2c_write(struct sfp *sfp, bool a2, u8 dev_addr, void *buf,
 	return ret == ARRAY_SIZE(msgs) ? len : 0;
 }
 
-static int sfp_smbus_byte_read(struct sfp *sfp, bool a2, u8 dev_addr,
-			       void *buf, size_t len)
+static int sfp_smbus_read(struct sfp *sfp, bool a2, u8 dev_addr, void *buf,
+			  size_t len)
 {
 	union i2c_smbus_data smbus_data;
 	u8 bus_addr = a2 ? 0x51 : 0x50;
+	size_t this_len, transferred;
+	u32 functionality;
 	u8 *data = buf;
 	int ret;
 
-	while (len) {
-		ret = i2c_smbus_xfer(sfp->i2c, bus_addr, 0,
-				     I2C_SMBUS_READ, dev_addr,
-				     I2C_SMBUS_BYTE_DATA, &smbus_data);
-		if (ret < 0)
-			return ret;
+	functionality = i2c_get_functionality(sfp->i2c);
 
-		*data = smbus_data.byte;
+	while (len) {
+		this_len = min(len, sfp->i2c_block_size);
+
+		if (functionality & I2C_FUNC_SMBUS_READ_I2C_BLOCK) {
+			smbus_data.block[0] = this_len;
+			ret = i2c_smbus_xfer(sfp->i2c, bus_addr, 0,
+					     I2C_SMBUS_READ, dev_addr,
+					     I2C_SMBUS_I2C_BLOCK_DATA, &smbus_data);
+			if (ret < 0)
+				return ret;
+
+			memcpy(data, &smbus_data.block[1], this_len);
+			transferred = this_len;
+		} else if (this_len >= 2 &&
+			   (functionality & I2C_FUNC_SMBUS_READ_WORD_DATA)) {
+			ret = i2c_smbus_xfer(sfp->i2c, bus_addr, 0,
+					     I2C_SMBUS_READ, dev_addr,
+					     I2C_SMBUS_WORD_DATA, &smbus_data);
+			if (ret < 0)
+				return ret;
+
+			put_unaligned_le16(smbus_data.word, data);
+			transferred = 2;
+		} else {
+			ret = i2c_smbus_xfer(sfp->i2c, bus_addr, 0,
+					     I2C_SMBUS_READ, dev_addr,
+					     I2C_SMBUS_BYTE_DATA, &smbus_data);
+			if (ret < 0)
+				return ret;
+
+			*data = smbus_data.byte;
+			transferred = 1;
+		}
 
-		len--;
-		data++;
-		dev_addr++;
+		data += transferred;
+		len -= transferred;
+		dev_addr += transferred;
 	}
 
 	return data - (u8 *)buf;
 }
 
-static int sfp_smbus_byte_write(struct sfp *sfp, bool a2, u8 dev_addr,
-				void *buf, size_t len)
+static int sfp_smbus_write(struct sfp *sfp, bool a2, u8 dev_addr, void *buf,
+			   size_t len)
 {
 	union i2c_smbus_data smbus_data;
 	u8 bus_addr = a2 ? 0x51 : 0x50;
+	size_t this_len, transferred;
+	u32 functionality;
 	u8 *data = buf;
 	int ret;
 
+	functionality = i2c_get_functionality(sfp->i2c);
+
 	while (len) {
-		smbus_data.byte = *data;
-		ret = i2c_smbus_xfer(sfp->i2c, bus_addr, 0,
-				     I2C_SMBUS_WRITE, dev_addr,
-				     I2C_SMBUS_BYTE_DATA, &smbus_data);
-		if (ret)
-			return ret;
+		this_len = min(len, sfp->i2c_block_size);
+
+		if (functionality & I2C_FUNC_SMBUS_WRITE_I2C_BLOCK) {
+			smbus_data.block[0] = this_len;
+			memcpy(&smbus_data.block[1], data, this_len);
+
+			ret = i2c_smbus_xfer(sfp->i2c, bus_addr, 0,
+					     I2C_SMBUS_WRITE, dev_addr,
+					     I2C_SMBUS_I2C_BLOCK_DATA, &smbus_data);
+			if (ret < 0)
+				return ret;
+
+			transferred = this_len;
+		} else if (this_len >= 2 &&
+			   (functionality & I2C_FUNC_SMBUS_WRITE_WORD_DATA)) {
+			smbus_data.word = get_unaligned_le16(data);
+			ret = i2c_smbus_xfer(sfp->i2c, bus_addr, 0,
+					     I2C_SMBUS_WRITE, dev_addr,
+					     I2C_SMBUS_WORD_DATA, &smbus_data);
+			if (ret < 0)
+				return ret;
+
+			transferred = 2;
+		} else {
+			smbus_data.byte = *data;
+			ret = i2c_smbus_xfer(sfp->i2c, bus_addr, 0,
+					     I2C_SMBUS_WRITE, dev_addr,
+					     I2C_SMBUS_BYTE_DATA, &smbus_data);
+			if (ret < 0)
+				return ret;
+
+			transferred = 1;
+		}
 
-		len--;
-		data++;
-		dev_addr++;
+		data += transferred;
+		len -= transferred;
+		dev_addr += transferred;
 	}
 
 	return data - (u8 *)buf;
@@ -815,10 +876,29 @@ static int sfp_i2c_configure(struct sfp *sfp, struct i2c_adapter *i2c)
 		sfp->read = sfp_i2c_read;
 		sfp->write = sfp_i2c_write;
 		max_block_size = SFP_EEPROM_BLOCK_SIZE;
-	} else if (i2c_check_functionality(i2c, I2C_FUNC_SMBUS_BYTE_DATA)) {
-		sfp->read = sfp_smbus_byte_read;
-		sfp->write = sfp_smbus_byte_write;
-		max_block_size = 1;
+	} else if (i2c_check_functionality(i2c, I2C_FUNC_SMBUS_BYTE_DATA) ||
+		   i2c_check_functionality(i2c, I2C_FUNC_SMBUS_I2C_BLOCK)) {
+		/* Either protocol alone covers any length: I2C-block carries
+		 * 1..32 bytes per xfer, byte iterates one byte at a time.
+		 */
+		sfp->read = sfp_smbus_read;
+		sfp->write = sfp_smbus_write;
+
+		if (i2c_check_functionality(i2c, I2C_FUNC_SMBUS_I2C_BLOCK))
+			max_block_size = SFP_EEPROM_BLOCK_SIZE;
+		else if (i2c_check_functionality(i2c, I2C_FUNC_SMBUS_WORD_DATA))
+			max_block_size = 2;
+		else
+			max_block_size = 1;
+	} else if (WARN_ONCE(i2c_check_functionality(i2c, I2C_FUNC_SMBUS_WORD_DATA),
+			     "SMBus word-only adapter; odd-length transfers will fail\n")) {
+		/* Word-only: even-length xfers work; odd-length xfers fall
+		 * to BYTE, which the adapter does not advertise and will
+		 * likely fail.
+		 */
+		sfp->read = sfp_smbus_read;
+		sfp->write = sfp_smbus_write;
+		max_block_size = 2;
 	} else {
 		sfp->i2c = NULL;
 		return -EINVAL;
-- 
2.51.0
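The sketch below isolates how the new sfp_smbus_read()/sfp_smbus_write() loops choose a transfer size. `next_chunk()`, `count_xfers()` and the `CAP_*` flags are hypothetical stand-ins for the driver code and the `I2C_FUNC_SMBUS_*` bits. The preference order follows the write loop above: I2C block up to the block size, then word, then byte. It also assumes `block_size` is at least 1.

```c
#include <stddef.h>

/* Hypothetical capability flags standing in for I2C_FUNC_SMBUS_* bits. */
enum { CAP_BLOCK = 1, CAP_WORD = 2 };

/*
 * Bytes moved by the next transfer: I2C block (up to block_size) if
 * available, else a word when at least 2 bytes remain, else one byte.
 */
static size_t next_chunk(size_t len, size_t block_size, unsigned int caps)
{
	size_t this_len = len < block_size ? len : block_size;

	if (caps & CAP_BLOCK)
		return this_len;
	if (this_len >= 2 && (caps & CAP_WORD))
		return 2;
	return 1;
}

/* Number of SMBus transfers needed for a len-byte access. */
static size_t count_xfers(size_t len, size_t block_size, unsigned int caps)
{
	size_t n = 0;

	while (len) {
		len -= next_chunk(len, block_size, caps);
		n++;
	}
	return n;
}
```

For example, a 5-byte access on a word-only adapter with a block size of 2 takes three transfers (2 + 2 + 1). The last one falls back to byte data, which such an adapter does not advertise. That is the odd-length case the WARN_ONCE() above flags.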


^ permalink raw reply related

* Re: [Intel-wired-lan] [PATCH net v2] ice: Fix missing 1's complement negation in GCS raw checksum
From: Matt Fleming @ 2026-05-07  9:34 UTC (permalink / raw)
  To: Jacob Keller
  Cc: Tony Nguyen, Aleksandr Loktionov, kernel-team, Matt Fleming,
	stable, Simon Horman, Przemek Kitszel, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Eric Joyner, Paul Greenwalt, Alice Michael, intel-wired-lan,
	netdev, linux-kernel
In-Reply-To: <531aec13-c33f-4e77-ab48-de8861f9b6c6@intel.com>

On Mon, May 04, 2026 at 05:10:23PM -0700, Jacob Keller wrote:
> 
> Hi,
> 
> Based on your patch description, I assume that you've tested this on
> real hardware.
> 
> I dug a little through some of our internal changes history and saw
> that it looks like the hardware has a register setting in its
> GL_RDPU_CNTRL register which determines whether the checksum value
> reported is inverted or not. In E830 hardware, it is supposed to be off
> (i.e. the checksum value reported already matches the expected setting).
> 
> Perhaps your device somehow got the GL_RDPU_CNTRL register set to the
> wrong mode and that results in the swap being necessary. Hmm.
> 
> I'll ask the team to see if they can confirm this behavior.

Hi Jake,

Thanks for digging into this.

I read GL_RDPU_CNTRL on our affected E830 and the value is the same on
both ports of the NIC:

  0000:c1:00.0: GL_RDPU_CNTRL = 0x0020a275
  0000:c1:00.1: GL_RDPU_CNTRL = 0x0020a275

Decoding bit 22 (E830_GL_RDPU_CNTRL_CHECKSUM_COMPLETE_INV) gives 0,
i.e. the hardware is supposedly in "not inverted" mode, which matches
the default you described.

However, looking at the data on the wire I see:

  - netdev_rx_csum_fault fires ~65 000 times/sec on this host.
  - bpftrace at fexit:ice_process_skb_fields shows skb->csum =
    swab16(raw_csum) directly (no negation), e.g. raw_csum=0xfb4f
    -> skb->csum=0x4ffb.
  - At fentry:__skb_checksum_complete the upper 16 bits of skb->csum
    are 0xFFFF on every TCP/UDP packet -- the signature of nf_ip_checksum
    adding the pseudo-header to a value that was the un-negated raw_csum.
  - fold2(skb->csum_at_fentry + skb_checksum(skb,0,len,0)) ≈ 0xFFFF
    for every packet, which means the two values are ones-complement
    complements of each other, i.e. the driver stored S where the
    stack expects ~S.

Negating the checksum makes the failures go away.
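The complement relationship above can be checked with a few lines of ones'-complement arithmetic. This is a minimal illustration only. `fold16()` and `ones_sum()` are hypothetical helpers, not the kernel's `csum_fold()` (which also negates its result) or `csum_partial()`. The input words are arbitrary. The step where the hardware reports the inverted sum is an assumption, used to show why `fold(stored + true_sum)` comes out as 0xFFFF.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit ones'-complement sum.
 * Two end-around-carry steps suffice for any 32-bit input; no negation. */
static uint16_t fold16(uint32_t sum)
{
	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Ones'-complement sum of n 16-bit words (the "true" packet sum T). */
static uint16_t ones_sum(const uint16_t *words, size_t n)
{
	uint32_t acc = 0;
	size_t i;

	for (i = 0; i < n; i++)
		acc = fold16(acc + words[i]);
	return (uint16_t)acc;
}
```

If the hardware hands back `raw = ~T` and the driver stores `raw` unchanged, then `fold16(raw + T)` is 0xFFFF for every packet, because x + ~x is all ones in ones'-complement arithmetic. Storing `~raw` recovers `T`, which is the negation this patch adds.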

Thanks,
Matt

^ permalink raw reply

* Re: [PATCH net-next v6 04/10] enic: add admin CQ service with MSI-X interrupt and NAPI polling
From: Paolo Abeni @ 2026-05-07  9:42 UTC (permalink / raw)
  To: Satish Kharat, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski
  Cc: netdev, linux-kernel, Sesidhar Baddela
In-Reply-To: <20260503-enic-sriov-v2-admin-channel-v2-v6-4-0af4fbc2d86d@cisco.com>

On 5/3/26 1:22 PM, Satish Kharat wrote:
> +static void enic_admin_msg_enqueue(struct enic *enic, void *buf,
> +				   unsigned int len)
> +{
> +	struct enic_admin_msg *msg;
> +
> +	msg = kmalloc(struct_size(msg, data, len), GFP_ATOMIC);
> +	if (!msg) {
> +		enic->admin_msg_drop_cnt++;
> +		if (net_ratelimit())
> +			netdev_warn(enic->netdev,
> +				    "admin msg enqueue drop (len=%u drops=%llu)\n",
> +				    len, enic->admin_msg_drop_cnt);

Failed allocation will splat dmesg; no need to add additional error
messages (here and elsewhere in the series).

/P


^ permalink raw reply

* RE: [PATCH] ixgbe: E610: do not fill EEE lp_advertised from local PHY caps
From: Jagielski, Jedrzej @ 2026-05-07  9:50 UTC (permalink / raw)
  To: Keller, Jacob E, David CARLIER, Andrew Lunn
  Cc: Nguyen, Anthony L, Kitszel, Przemyslaw, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Loktionov, Aleksandr, intel-wired-lan@lists.osuosl.org,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <92b40e66-3f68-4d4a-b0cf-47b8aea5c72b@intel.com>

>From: Keller, Jacob E <jacob.e.keller@intel.com> 
>Sent: Tuesday, May 5, 2026 12:13 AM
>On 5/4/2026 7:05 AM, David CARLIER wrote:
>> Hi Andrew,
>> 
>>   No E610 here, found it by reading the code - the X550 path
>>   (ixgbe_get_eee_fw) uses a separate FW_PHY_ACT_UD_2 activity and
>>   ixgbe_lp_map[] for partner data, the E610 path just feeds
>>   pcaps.eee_cap from REPORT_ACTIVE_CFG into lp_advertised. None of
>>   the IXGBE_ACI_REPORT_* modes return partner info so that field
>>   can't be right.
>> 
>>   The set path goes hw->mac.ops.setup_eee() ->
>> ixgbe_aci_set_phy_cfg(),
>>   so negotiation is in the firmware. eee_active / eee_enabled come
>>   from link.eee_status from the same FW, if those bits are right then
>>   negotiation works. Can't say more without hardware, Jedrzej or
>>   Aleksandr would know.
>> 
>> Cheers
>
>Hi David,
>
>Thanks for the report and possible patch. The EEE support just merged,
>and I believe the series has undergone testing. It is possible E610 is
>significantly different from X550.
>
>@Jedrzej,
>
>Could you please look at this patch and the report from David and
>confirm if we need this (or a different?) fix or if the code is correct
>for E610 and explain why in that case?
>
>Thanks,
>Jake

Sorry for the delay in responding; I just came back to the office and
didn't have access to my mailbox.

After looking into the documentation once again and checking it on my setup,
I see that David is right. What a catch, thanks! And sorry for my oversight:
I was convinced that negotiated speeds are reported via that field, and
the problem somehow was not exposed during my tests.

Moreover, it looks like E610 currently doesn't report such information.

So I believe we would like to have this fix; thank you once again.

Reviewed-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>

Jedrek

^ permalink raw reply

* Re: [PATCH v3 1/2] arm64: dts: renesas: r9a08g046: Add GBETH nodes
From: Geert Uytterhoeven @ 2026-05-07  9:52 UTC (permalink / raw)
  To: Biju
  Cc: Magnus Damm, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Richard Cochran, Biju Das, linux-renesas-soc, devicetree,
	linux-kernel, netdev, Prabhakar Mahadev Lad
In-Reply-To: <20260326111953.31024-2-biju.das.jz@bp.renesas.com>

On Thu, 26 Mar 2026 at 12:19, Biju <biju.das.au@gmail.com> wrote:
> From: Biju Das <biju.das.jz@bp.renesas.com>
>
> Renesas RZ/G3L SoC is equipped with 2x Synopsys DesignWare Ethernet
> (10/100/1000 BASE) with TSN, IP block version 5.30. Add GBETH nodes
> to R9A08G046 RZ/G3L SoC DTSI.
>
> Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
> ---
> v2->v3:
>  * Rebased to boot series.

Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
i.e. will queue in renesas-devel for v7.2.

Gr{oetje,eeting}s,

                        Geert


--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply

* [PATCH net-next V6 0/3] net/mlx5: Avoid payload in skb's linear part for better GRO-processing
From: Tariq Toukan @ 2026-05-07  9:53 UTC (permalink / raw)
  To: Christoph Paasch, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Andrew Lunn, David S. Miller
  Cc: Saeed Mahameed, Tariq Toukan, Mark Bloch, Leon Romanovsky, netdev,
	linux-rdma, linux-kernel, Gal Pressman, Dragos Tatulea,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
	Stanislav Fomichev, Amery Hung, Alexei Starovoitov

Hi,

This is V6 of a series originally submitted by Christoph.

When LRO is enabled on mlx5 devices, mlx5e_skb_from_cqe_mpwrq_nonlinear()
copies parts of the payload to the linear part of the skb.

This triggers suboptimal processing in GRO, resulting in lower throughput.

This patch series addresses this by using eth_get_headlen to compute the
size of the protocol headers and only copy those bits. This results in a
significant throughput improvement (detailed results in the specific
patch).

Regards,
Tariq

---

V6:
- Rebase after Amery's changes.
- Address Amery's concern about header length after XDP pull.
- Add a small optimization to memcpy the header length aligned to cache
  line.

V5: https://lore.kernel.org/all/20250904-cpaasch-pf-927-netmlx5-avoid-copying-the-payload-to-the-malloced-area-v5-0-ea492f7b11ac@openai.com/


Christoph Paasch (2):
  net/mlx5e: DMA-sync earlier in mlx5e_skb_from_cqe_mpwrq_nonlinear
  net/mlx5e: Avoid copying payload to the skb's linear part

Dragos Tatulea (1):
  net/mlx5e: Align header copy to cache line for Striding RQ non-linear

 .../net/ethernet/mellanox/mlx5/core/en_rx.c   | 31 +++++++++++++------
 1 file changed, 22 insertions(+), 9 deletions(-)


base-commit: dacf281771a9aed1a723b196120a0de8637910b9
-- 
2.44.0


^ permalink raw reply

* [PATCH net-next V6 1/3] net/mlx5e: DMA-sync earlier in mlx5e_skb_from_cqe_mpwrq_nonlinear
From: Tariq Toukan @ 2026-05-07  9:53 UTC (permalink / raw)
  To: Christoph Paasch, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Andrew Lunn, David S. Miller
  Cc: Saeed Mahameed, Tariq Toukan, Mark Bloch, Leon Romanovsky, netdev,
	linux-rdma, linux-kernel, Gal Pressman, Dragos Tatulea,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
	Stanislav Fomichev, Amery Hung, Alexei Starovoitov
In-Reply-To: <20260507095330.318892-1-tariqt@nvidia.com>

From: Christoph Paasch <cpaasch@openai.com>

Doing the call to dma_sync_single_for_cpu() earlier will allow us to
adjust headlen based on the actual size of the protocol headers.

Doing this earlier means that we don't need to call
mlx5e_copy_skb_header() anymore and rather can call
skb_copy_to_linear_data() directly.

Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Christoph Paasch <cpaasch@openai.com>
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 .../net/ethernet/mellanox/mlx5/core/en_rx.c   | 22 +++++++++++++------
 1 file changed, 15 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 5b60aa47c75b..75ccf40a7f17 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -1923,11 +1923,11 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
 	unsigned int truesize = 0;
 	u32 pg_consumed_bytes;
 	struct bpf_prog *prog;
+	void *va, *head_addr;
 	struct sk_buff *skb;
 	u32 linear_frame_sz;
 	u16 linear_data_len;
 	u16 linear_hr;
-	void *va;
 
 	if (unlikely(cqe_bcnt > rq->hw_mtu)) {
 		u8 lro_num_seg = get_cqe_lro_num_seg(cqe);
@@ -1940,9 +1940,11 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
 
 	prog = rcu_dereference(rq->xdp_prog);
 
+	head_addr = netmem_address(head_page->netmem) + head_offset;
+
 	if (prog) {
 		/* area for bpf_xdp_[store|load]_bytes */
-		net_prefetchw(netmem_address(frag_page->netmem) + frag_offset);
+		net_prefetchw(head_addr);
 
 		va = mlx5e_mpwqe_get_linear_page_frag(rq);
 		if (!va) {
@@ -1956,6 +1958,8 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
 		linear_frame_sz = MLX5_SKB_FRAG_SZ(linear_hr + MLX5E_RX_MAX_HEAD);
 		linear_page = &rq->mpwqe.linear_info->frag_page;
 	} else {
+		dma_addr_t addr;
+
 		skb = napi_alloc_skb(rq->cq.napi,
 				     ALIGN(MLX5E_RX_MAX_HEAD, sizeof(long)));
 		if (unlikely(!skb)) {
@@ -1967,6 +1971,11 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
 		net_prefetchw(va); /* xdp_frame data area */
 		net_prefetchw(skb->data);
 
+		addr = page_pool_get_dma_addr_netmem(head_page->netmem);
+		dma_sync_single_for_cpu(rq->pdev, addr + head_offset,
+					ALIGN(headlen, sizeof(long)),
+					rq->buff.map_dir);
+
 		frag_offset += headlen;
 		byte_cnt -= headlen;
 		linear_hr = skb_headroom(skb);
@@ -2056,8 +2065,6 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
 			__pskb_pull_tail(skb, headlen);
 		}
 	} else {
-		dma_addr_t addr;
-
 		if (xdp_buff_has_frags(&mxbuf->xdp)) {
 			struct mlx5e_frag_page *pagep;
 
@@ -2071,10 +2078,11 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
 				pagep->frags++;
 			while (++pagep < frag_page);
 		}
+
 		/* copy header */
-		addr = page_pool_get_dma_addr_netmem(head_page->netmem);
-		mlx5e_copy_skb_header(rq, skb, head_page->netmem, addr,
-				      head_offset, head_offset, headlen);
+		skb_copy_to_linear_data(skb, head_addr,
+					ALIGN(headlen, sizeof(long)));
+
 		/* skb linear part was allocated with headlen and aligned to long */
 		skb->tail += headlen;
 		skb->len  += headlen;
-- 
2.44.0


^ permalink raw reply related

* [PATCH net-next V6 2/3] net/mlx5e: Avoid copying payload to the skb's linear part
From: Tariq Toukan @ 2026-05-07  9:53 UTC (permalink / raw)
  To: Christoph Paasch, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Andrew Lunn, David S. Miller
  Cc: Saeed Mahameed, Tariq Toukan, Mark Bloch, Leon Romanovsky, netdev,
	linux-rdma, linux-kernel, Gal Pressman, Dragos Tatulea,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
	Stanislav Fomichev, Amery Hung, Alexei Starovoitov
In-Reply-To: <20260507095330.318892-1-tariqt@nvidia.com>

From: Christoph Paasch <cpaasch@openai.com>

mlx5e_skb_from_cqe_mpwrq_nonlinear() copies MLX5E_RX_MAX_HEAD (256)
bytes from the page-pool to the skb's linear part. Those 256 bytes
include part of the payload.

When attempting to do GRO in skb_gro_receive, if headlen > data_offset
(and skb->head_frag is not set), we end up aggregating packets in the
frag_list.

This is of course not good when we are CPU-limited. It also causes a
worse skb->len/truesize ratio.

So, let's avoid copying parts of the payload to the linear part. We use
eth_get_headlen() to parse the headers and compute the length of the
protocol headers, and copy only those bytes to the skb's linear part.

We still allocate MLX5E_RX_MAX_HEAD for the skb so that if the networking
stack needs to call pskb_may_pull() later on, we don't need to reallocate
memory.
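The sketch below shows what "the length of the protocol headers" means, but only for the simplest case: an untagged Ethernet frame carrying IPv4 and TCP. `sketch_headlen()` is hypothetical. The real eth_get_headlen() relies on the flow dissector and handles VLANs, IPv6, tunnels and many other cases this example ignores.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ETH_HLEN 14	/* untagged Ethernet header */
#define SKETCH_IPV4_MIN 20	/* IPv4 header without options */
#define SKETCH_TCP_MIN  20	/* TCP header without options */

static size_t min_sz(size_t a, size_t b)
{
	return a < b ? a : b;
}

/*
 * Hypothetical, heavily simplified stand-in for eth_get_headlen():
 * returns how many leading bytes of buf are Ethernet + IPv4 + TCP
 * headers, never more than len. For anything it does not understand
 * it stops at the last header boundary it has validated.
 */
static size_t sketch_headlen(const uint8_t *buf, size_t len)
{
	size_t l3 = SKETCH_ETH_HLEN, l4, ihl, doff;

	if (len < l3)
		return len;

	/* EtherType at bytes 12-13; only IPv4 (0x0800) is parsed here. */
	if (buf[12] != 0x08 || buf[13] != 0x00 || len < l3 + SKETCH_IPV4_MIN)
		return l3;

	ihl = (size_t)(buf[l3] & 0x0f) * 4;	/* IHL is in 32-bit words */
	if (ihl < SKETCH_IPV4_MIN)
		return l3;
	l4 = l3 + ihl;

	/* IPv4 protocol field at offset 9; 6 means TCP. */
	if (buf[l3 + 9] != 6 || len < l4 + SKETCH_TCP_MIN)
		return min_sz(l4, len);

	doff = (size_t)(buf[l4 + 12] >> 4) * 4;	/* TCP data offset */
	if (doff < SKETCH_TCP_MIN)
		return l4;
	return min_sz(l4 + doff, len);
}
```

Take a TCP/IPv4 segment with a 32-byte TCP header, for example one carrying the timestamp option. The sketch yields 14 + 20 + 32 = 66 bytes, so only those bytes would go to the linear part instead of a copy of up to MLX5E_RX_MAX_HEAD (256) bytes.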

This gives a nice throughput increase (ARM Neoverse-V2 with CX-7 NIC and
LRO enabled):

BEFORE:
=======
(netserver pinned to core receiving interrupts)
$ netperf -H 10.221.81.118 -T 80,9 -P 0 -l 60 -- -m 256K -M 256K
 87380  16384 262144    60.01    32547.82

(netserver pinned to adjacent core receiving interrupts)
$ netperf -H 10.221.81.118 -T 80,10 -P 0 -l 60 -- -m 256K -M 256K
 87380  16384 262144    60.00    52531.67

AFTER:
======
(netserver pinned to core receiving interrupts)
$ netperf -H 10.221.81.118 -T 80,9 -P 0 -l 60 -- -m 256K -M 256K
 87380  16384 262144    60.00    52896.06

(netserver pinned to adjacent core receiving interrupts)
$ netperf -H 10.221.81.118 -T 80,10 -P 0 -l 60 -- -m 256K -M 256K
 87380  16384 262144    60.00    85094.90

Additional tests across a larger range of parameters w/ and w/o LRO, w/
and w/o IPv6-encapsulation, different MTUs (1500, 4096, 9000), different
TCP read/write-sizes as well as UDP benchmarks, all have shown equal or
better performance with this patch.

Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Christoph Paasch <cpaasch@openai.com>
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 75ccf40a7f17..301b33419207 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -1976,6 +1976,8 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
 					ALIGN(headlen, sizeof(long)),
 					rq->buff.map_dir);
 
+		headlen = eth_get_headlen(rq->netdev, head_addr, headlen);
+
 		frag_offset += headlen;
 		byte_cnt -= headlen;
 		linear_hr = skb_headroom(skb);
@@ -2012,9 +2014,13 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
 
 	if (prog) {
 		u8 nr_frags_free, old_nr_frags = sinfo->nr_frags;
+		skb_frag_t *frag = &sinfo->frags[0];
 		u8 new_nr_frags;
 		u32 len;
 
+		headlen = eth_get_headlen(rq->netdev, skb_frag_address(frag),
+					  skb_frag_size(frag));
+
 		if (mlx5e_xdp_handle(rq, prog, mxbuf)) {
 			if (__test_and_clear_bit(MLX5E_RQ_FLAG_XDP_XMIT, rq->flags)) {
 				struct mlx5e_frag_page *pfp;
@@ -2060,8 +2066,7 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
 				pagep->frags++;
 			while (++pagep < frag_page);
 
-			headlen = min_t(u16, MLX5E_RX_MAX_HEAD - len,
-					skb->data_len);
+			headlen = min_t(u16, headlen - len, skb->data_len);
 			__pskb_pull_tail(skb, headlen);
 		}
 	} else {
-- 
2.44.0


^ permalink raw reply related

* [PATCH net-next V6 3/3] net/mlx5e: Align header copy to cache line for Striding RQ non-linear
From: Tariq Toukan @ 2026-05-07  9:53 UTC (permalink / raw)
  To: Christoph Paasch, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Andrew Lunn, David S. Miller
  Cc: Saeed Mahameed, Tariq Toukan, Mark Bloch, Leon Romanovsky, netdev,
	linux-rdma, linux-kernel, Gal Pressman, Dragos Tatulea,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
	Stanislav Fomichev, Amery Hung, Alexei Starovoitov
In-Reply-To: <20260507095330.318892-1-tariqt@nvidia.com>

From: Dragos Tatulea <dtatulea@nvidia.com>

In Striding RQ non-linear mode, there is a memcpy to pull the
header from the first fragment into the linear part of the skb.
As the header length is not aligned, it can cause cache thrashing
from a Read-Modify-Write cycle for the remaining bytes of the
cache line.

This patch changes the memcpy length to be aligned to the cache line.
The DMA sync is also aligned to cache line size accordingly. Note that
the original DMA sync is done on the initial conservative headlen
which is min(MLX5E_RX_MAX_HEAD, cqe_bcnt).
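The rounding involved can be sketched as follows. `align_up()` reproduces the arithmetic of the kernel's ALIGN() for power-of-two alignments. The 64-byte line size in the note below is an assumption, since cache_line_size() depends on the architecture and platform.

```c
#include <stddef.h>

/* Round x up to a multiple of a; a must be a power of two.
 * Same arithmetic as the kernel's ALIGN() macro. */
static size_t align_up(size_t x, size_t a)
{
	return (x + a - 1) & ~(a - 1);
}
```

Take a 66-byte header on a 64-bit machine. ALIGN(headlen, sizeof(long)) copies 72 bytes, which leaves the second cache line partially written. Rounding to an assumed 64-byte line copies 128 bytes, so both lines are written completely.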

To show the improvement, a test was run with an XDP_DROP program
processing 64B packets at 100% CPU utilization over a single queue at
9000 MTU:

|----------+----------+------|
| Before   | After    | Diff |
|----------+----------+------|
| 3.6 Mpps | 3.8 Mpps | 5%   |
|----------+----------+------|

(CX7 NIC on Intel Xeon Platinum 8580 system)

While small packets profit most from this improvement, large packets
are not negatively affected (no regressions).

Suggested-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 301b33419207..e5963e1b5309 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -1973,7 +1973,7 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
 
 		addr = page_pool_get_dma_addr_netmem(head_page->netmem);
 		dma_sync_single_for_cpu(rq->pdev, addr + head_offset,
-					ALIGN(headlen, sizeof(long)),
+					ALIGN(headlen, cache_line_size()),
 					rq->buff.map_dir);
 
 		headlen = eth_get_headlen(rq->netdev, head_addr, headlen);
@@ -2086,7 +2086,7 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
 
 		/* copy header */
 		skb_copy_to_linear_data(skb, head_addr,
-					ALIGN(headlen, sizeof(long)));
+					ALIGN(headlen, cache_line_size()));
 
 		/* skb linear part was allocated with headlen and aligned to long */
 		skb->tail += headlen;
-- 
2.44.0


^ permalink raw reply related

* Re: [PATCH ipsec-next v8 04/14] xfrm: fix NAT-related field inheritance in SA migration
From: Steffen Klassert @ 2026-05-07  9:56 UTC (permalink / raw)
  To: Sabrina Dubroca
  Cc: Antony Antony, Herbert Xu, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, David Ahern,
	Masahide NAKAMURA, Paul Moore, Stephen Smalley, Ondrej Mosnacek,
	Jonathan Corbet, Shuah Khan, netdev, linux-kernel, selinux,
	linux-doc, Chiachang Wang, Yan Yan, devel
In-Reply-To: <afxcVV83k7CxImwC@krikkit>

On Thu, May 07, 2026 at 11:33:09AM +0200, Sabrina Dubroca wrote:
> 2026-05-05, 06:32:43 +0200, Antony Antony wrote:
> > During SA migration via xfrm_state_clone_and_setup(),
> > nat_keepalive_interval was silently dropped and never copied to the new
> > SA. mapping_maxage was unconditionally copied even when migrating to a
> > non-encapsulated SA.
> 
> mapping_maxage should be harmless (0/unused on non-encap), but I think
> migrating nat_keepalive_interval should be considered a fix:
> 
> Fixes: f531d13bdfe3 ("xfrm: support sending NAT keepalives in ESP in UDP states")
> 
> (maybe even split out of this series, but that would cause a conflict
> with the previous patch)

Can this be backported without the previous patches?
If not, we might need to split it out.

^ permalink raw reply

* [net-next v3 4/5] net: stmmac: starfive: Add jhb100 SGMII interface
From: Minda Chen @ 2026-05-07  9:41 UTC (permalink / raw)
  To: Alexandre Torgue, Andrew Lunn, David S . Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Maxime Coquelin,
	Emil Renner Berthing, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley, netdev
  Cc: linux-kernel, linux-stm32, devicetree, Minda Chen
In-Reply-To: <20260507094115.8355-1-minda.chen@starfivetech.com>

Add the jhb100 compatible and SGMII support. The jhb100 SoC contains
two SGMII interfaces with an integrated SerDes PHY. In SGMII mode the MAC
has split TX/RX clocks, and both must be set to 2.5/25/125 MHz in
10/100/1000 Mbps mode respectively.
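The speed-to-rate mapping described above can be sketched as a small lookup. `sgmii_mac_clk_rate()` is hypothetical. The patch itself gets the rate from the kernel's rgmii_clock() helper and applies it to both `clk_tx_i` and the new `sgmii_rx` clock. When the speed is not recognised, the patch returns 0 without touching either clock.

```c
#include <errno.h>

/*
 * Hypothetical mirror of the mapping in the commit message:
 * 10 Mbps -> 2.5 MHz, 100 Mbps -> 25 MHz, 1000 Mbps -> 125 MHz.
 * Unknown speeds yield a negative errno, as rgmii_clock() does.
 */
static long sgmii_mac_clk_rate(int speed_mbps)
{
	switch (speed_mbps) {
	case 10:
		return 2500000;
	case 100:
		return 25000000;
	case 1000:
		return 125000000;
	default:
		return -EINVAL;
	}
}
```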

Signed-off-by: Minda Chen <minda.chen@starfivetech.com>
Reviewed-by: Sai Krishna <saikrishnag@marvell.com>
---
 .../ethernet/stmicro/stmmac/dwmac-starfive.c  | 59 ++++++++++++++-----
 1 file changed, 45 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-starfive.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-starfive.c
index 16b955a6d77b..bd86a39b79f0 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-starfive.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-starfive.c
@@ -26,6 +26,7 @@ struct starfive_dwmac_data {
 struct starfive_dwmac {
 	struct device *dev;
 	const struct starfive_dwmac_data *data;
+	struct clk *sgmii_rx;
 };
 
 static int starfive_dwmac_set_mode(struct plat_stmmacenet_data *plat_dat)
@@ -68,6 +69,25 @@ static int starfive_dwmac_set_mode(struct plat_stmmacenet_data *plat_dat)
 	return 0;
 }
 
+static int stmmac_starfive_sgmii_set_clk_rate(void *bsp_priv, struct clk *clk_tx_i,
+					      phy_interface_t __maybe_unused interface,
+					      int speed)
+{
+	struct starfive_dwmac *dwmac = (void *)bsp_priv;
+	long rate = rgmii_clock(speed);
+	int ret;
+
+	/* MAC clock rate the same as RGMII */
+	if (rate < 0)
+		return 0;
+
+	ret = clk_set_rate(clk_tx_i, rate);
+	if (ret)
+		return ret;
+
+	return clk_set_rate(dwmac->sgmii_rx, rate);
+}
+
 static int starfive_dwmac_probe(struct platform_device *pdev)
 {
 	struct plat_stmmacenet_data *plat_dat;
@@ -102,23 +122,33 @@ static int starfive_dwmac_probe(struct platform_device *pdev)
 		return dev_err_probe(&pdev->dev, PTR_ERR(clk_gtx),
 				     "error getting gtx clock\n");
 
-	/* Generally, the rgmii_tx clock is provided by the internal clock,
-	 * which needs to match the corresponding clock frequency according
-	 * to different speeds. If the rgmii_tx clock is provided by the
-	 * external rgmii_rxin, there is no need to configure the clock
-	 * internally, because rgmii_rxin will be adaptively adjusted.
-	 */
-	if (!device_property_read_bool(&pdev->dev, "starfive,tx-use-rgmii-clk"))
-		plat_dat->set_clk_tx_rate = stmmac_set_clk_tx_rate;
-
 	dwmac->dev = &pdev->dev;
-	plat_dat->flags |= STMMAC_FLAG_EN_TX_LPI_CLK_PHY_CAP;
 	plat_dat->bsp_priv = dwmac;
-	plat_dat->dma_cfg->dche = true;
+	if (plat_dat->phy_interface == PHY_INTERFACE_MODE_SGMII) {
+		dwmac->sgmii_rx = devm_clk_get_enabled(&pdev->dev, "sgmii_rx");
+		if (IS_ERR(dwmac->sgmii_rx))
+			return dev_err_probe(&pdev->dev,
+					     PTR_ERR(dwmac->sgmii_rx),
+					     "error getting sgmii rx clock\n");
+		plat_dat->set_clk_tx_rate = stmmac_starfive_sgmii_set_clk_rate;
+	} else {
+		/*
+		 * Generally, the rgmii_tx clock is provided by the internal clock,
+		 * which needs to match the corresponding clock frequency according
+		 * to different speeds. If the rgmii_tx clock is provided by the
+		 * external rgmii_rxin, there is no need to configure the clock
+		 * internally, because rgmii_rxin will be adaptively adjusted.
+		 */
+		if (!device_property_read_bool(&pdev->dev, "starfive,tx-use-rgmii-clk"))
+			plat_dat->set_clk_tx_rate = stmmac_set_clk_tx_rate;
+
+		err = starfive_dwmac_set_mode(plat_dat);
+		if (err)
+			return err;
+	}
 
-	err = starfive_dwmac_set_mode(plat_dat);
-	if (err)
-		return err;
+	plat_dat->flags |= STMMAC_FLAG_EN_TX_LPI_CLK_PHY_CAP;
+	plat_dat->dma_cfg->dche = true;
 
 	return stmmac_dvr_probe(&pdev->dev, plat_dat, &stmmac_res);
 }
@@ -130,6 +160,7 @@ static const struct starfive_dwmac_data jh7100_data = {
 static const struct of_device_id starfive_dwmac_match[] = {
 	{ .compatible = "starfive,jh7100-dwmac", .data = &jh7100_data },
 	{ .compatible = "starfive,jh7110-dwmac" },
+	{ .compatible = "starfive,jhb100-dwmac" },
 	{ /* sentinel */ }
 };
 MODULE_DEVICE_TABLE(of, starfive_dwmac_match);
-- 
2.17.1


^ permalink raw reply related

* Re: [PATCH net v1] net/mlx5: Fix HWS L2-to-L3 tunnel reformat release
From: Yevgeny Kliteynik @ 2026-05-07  9:58 UTC (permalink / raw)
  To: Prathamesh Deshpande
  Cc: Alexander Lobakin, Saeed Mahameed, Leon Romanovsky, Moshe Shemesh,
	Mark Bloch, Tariq Toukan, Jakub Kicinski, netdev, linux-rdma,
	linux-kernel
In-Reply-To: <1ecb87f1-fcae-47bb-ad83-7bf2ec807463@intel.com>

On 05-May-26 19:26, Alexander Lobakin wrote:
> From: Prathamesh Deshpande <prathameshdeshpande7@gmail.com>
> Date: Mon,  4 May 2026 23:19:17 +0100
> 
>> mlx5_cmd_hws_packet_reformat_alloc() allocates
>> MLX5_REFORMAT_TYPE_L2_TO_L3_TUNNEL objects from el2tol3tnl_pools with
>> MLX5HWS_ACTION_TYP_REFORMAT_L2_TO_TNL_L3.
>>
>> The deallocation path uses el2tol2tnl_pools with
>> MLX5HWS_ACTION_TYP_REFORMAT_L2_TO_TNL_L2 instead. This releases the
>> packet-reformat entry through the wrong pool, corrupting pool accounting
>> and potentially moving the bulk entry onto the wrong pool list.
>>
>> Use the matching L2-to-L3 tunnel pool and action type when releasing the
>> object.
>>
>> Fixes: aecd9d1020e3 ("net/mlx5: fs, add HWS packet reformat API function")
>> Signed-off-by: Prathamesh Deshpande <prathameshdeshpande7@gmail.com>
> 
> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
> 

Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>

>> ---
>>   drivers/net/ethernet/mellanox/mlx5/core/steering/hws/fs_hws.c | 4 ++--
>>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> Thanks,
> Olek


^ permalink raw reply

* Re: [PATCH v4 2/4] ynl_gen: generate Rust files from yaml files
From: Alice Ryhl @ 2026-05-07 10:00 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Carlos Llamas, Greg Kroah-Hartman, Andrew Lunn, Donald Hunter,
	David S. Miller, Eric Dumazet, Paolo Abeni, Simon Horman,
	Matthew Maurer, Miguel Ojeda, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg,
	Trevor Gross, Danilo Krummrich, Christian Brauner, linux-kernel,
	rust-for-linux, netdev
In-Reply-To: <20260505171637.17e20b98@kernel.org>

On Tue, May 05, 2026 at 05:16:37PM -0700, Jakub Kicinski wrote:
> On Tue, 5 May 2026 09:10:17 +0000 Alice Ryhl wrote:
> > On Mon, May 04, 2026 at 04:58:58PM -0700, Jakub Kicinski wrote:
> > > On Mon, 04 May 2026 09:04:55 +0000 Alice Ryhl wrote:  
> > > >  tools/net/ynl/pyynl/ynl_gen_c.py | 139 ++++++++++++++++++++++++++++++++++++++-  
> > > 
> > > No. Rust. In. This. File.
> > > 
> > > Just commit the artifacts. I truly hope that this is the only Netlink
> > > family we will have in Rust.  
> > 
> > There's no reason to react like this. I have not ignored your concern.
> > Last time we discussed this, the discussion ended on splitting the file
> > into ynl_gen_c.py and ynl_gen_rust.py, which you did not reply to, and I
> > actually spent some time working on that. However, I felt the change was
> > non-trivial and I wanted to discuss whether that was the correct way
> > forward before spending more time on it. Therefore, I kept this patch
> > as-is for now and noted why it was non-trivial (sharing of CodeWriter)
> > in the commit message, until we could discuss further.
> > 
> > I think you are probably right that just committing the artifacts is the
> > simplest way forward for now. Especially since Donald is apparently
> > working on splitting up the file for strace [1]. On the off-chance that
> > a second Netlink family is ever added, hopefully Donald's work has
> > already completed and we can easily add this support in a new file when
> > the time comes.
> > 
> > I guess another way forward is to commit a copy of the python script
> > with the Rust support to drivers/android/binder/ and I can run it
> > manually if the Binder yaml file is ever updated.
> 
> Could you _please_ do what I'm asking you to do instead inventing your
> own solutions. Just commit the generated files and leave the script out.
> We lived without Netlink code gen for 30 years.

I will drop the patch in the next version.

Alice

^ permalink raw reply

* Re: [PATCH ipsec-next v8 06/14] xfrm: split xfrm_state_migrate into create and install functions
From: Sabrina Dubroca @ 2026-05-07 10:11 UTC (permalink / raw)
  To: Antony Antony
  Cc: Steffen Klassert, Herbert Xu, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, David Ahern,
	Masahide NAKAMURA, Paul Moore, Stephen Smalley, Ondrej Mosnacek,
	Jonathan Corbet, Shuah Khan, netdev, linux-kernel, selinux,
	linux-doc, Chiachang Wang, Yan Yan, devel
In-Reply-To: <migrate-state-v8-6-4578fb016965@secunet.com>

2026-05-05, 06:33:07 +0200, Antony Antony wrote:
> To prepare for subsequent patches, split
> xfrm_state_migrate() into two functions:
> - xfrm_state_migrate_create(): creates the migrated state
> - xfrm_state_migrate_install(): installs it into the state table
> 
> Splitting will help avoid SN/IV reuse when migrating AEAD SAs.
> 
> Also add const wherever possible.
> No functional change.
> 
> Signed-off-by: Antony Antony <antony.antony@secunet.com>

Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>

(I was going to mention xuo, but I see it's handled later on)
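A minimal Python sketch of the create/install split, assuming a toy state table keyed by SPI. SA, StateTable, migrate_create(), resync_seq() and migrate_install() are hypothetical stand-ins, not the kernel's xfrm_state_migrate_create()/xfrm_state_migrate_install(). The resync step only illustrates why a gap between building and publishing the new SA can matter for sequence numbers; how the later patches in the series actually avoid SN/IV reuse is not shown here.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SA:
    """Toy stand-in for struct xfrm_state; the fields are hypothetical."""
    spi: int
    daddr: str
    oseq: int  # outbound sequence number


class StateTable:
    """Toy state table keyed by SPI (a simplification of the real lookup)."""

    def __init__(self):
        self.states = {}

    def lookup(self, spi):
        return self.states.get(spi)


def migrate_create(old, new_daddr):
    """Build the migrated SA; the table is not touched."""
    return replace(old, daddr=new_daddr)


def resync_seq(table, new):
    """Hypothetical step: pick up the sequence number the old SA reached
    while it kept running after migrate_create()."""
    cur = table.lookup(new.spi)
    return replace(new, oseq=cur.oseq) if cur is not None else new


def migrate_install(table, new):
    """Publish the migrated SA, replacing the entry with the same SPI."""
    table.states[new.spi] = new
    return new


# Usage: traffic on the old SA between create and install is accounted for.
table = StateTable()
old = SA(spi=0x100, daddr="192.0.2.1", oseq=10)
table.states[old.spi] = old
new = migrate_create(old, "198.51.100.7")
table.states[old.spi] = replace(old, oseq=15)  # old SA sent 5 more packets
migrate_install(table, resync_seq(table, new))
print(table.lookup(0x100))  # SA(spi=256, daddr='198.51.100.7', oseq=15)
```

Keeping the create step free of side effects means a failure there leaves the table untouched; only the install step changes what lookups return.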

-- 
Sabrina

* Re: [PATCH ipsec-next v8 04/14] xfrm: fix NAT-related field inheritance in SA migration
From: Sabrina Dubroca @ 2026-05-07 10:13 UTC (permalink / raw)
  To: Steffen Klassert
  Cc: Antony Antony, Herbert Xu, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, David Ahern,
	Masahide NAKAMURA, Paul Moore, Stephen Smalley, Ondrej Mosnacek,
	Jonathan Corbet, Shuah Khan, netdev, linux-kernel, selinux,
	linux-doc, Chiachang Wang, Yan Yan, devel
In-Reply-To: <afxh6tZDV7RwXQ_a@secunet.com>

2026-05-07, 11:56:58 +0200, Steffen Klassert wrote:
> On Thu, May 07, 2026 at 11:33:09AM +0200, Sabrina Dubroca wrote:
> > 2026-05-05, 06:32:43 +0200, Antony Antony wrote:
> > > During SA migration via xfrm_state_clone_and_setup(),
> > > nat_keepalive_interval was silently dropped and never copied to the new
> > > SA. mapping_maxage was unconditionally copied even when migrating to a
> > > non-encapsulated SA.
> > 
> > mapping_maxage should be harmless (0/unused on non-encap), but I think
> > migrating nat_keepalive_interval should be considered a fix:
> > 
> > Fixes: f531d13bdfe3 ("xfrm: support sending NAT keepalives in ESP in UDP states")
> > 
> > (maybe even split out of this series, but that would cause a conflict
> > with the previous patch)
> 
> Can this be backported without the previous patches?
> If not, we might need to split it out.

git cherry-pick managed to handle the small context change, so it's
probably fine like this.
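A minimal Python sketch of the inheritance rule discussed above, assuming that both NAT-related fields should be carried over only when the migrated SA uses ESP-in-UDP encapsulation (the thread only states this explicitly for mapping_maxage). The field names follow the patch description; SAConfig, Encap and clone_nat_fields() are hypothetical and are not xfrm_state_clone_and_setup().

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Encap:
    """Toy ESP-in-UDP encapsulation parameters."""
    sport: int
    dport: int


@dataclass
class SAConfig:
    """Toy subset of SA fields relevant to NAT traversal."""
    encap: Optional[Encap] = None
    nat_keepalive_interval: int = 0
    mapping_maxage: int = 0


def clone_nat_fields(orig, new_encap):
    """Inherit NAT-related fields only if the migrated SA is UDP-encapsulated.

    As described in the thread, before the fix nat_keepalive_interval was
    never copied and mapping_maxage was copied unconditionally.
    """
    new = SAConfig(encap=new_encap)
    if new_encap is not None:
        new.nat_keepalive_interval = orig.nat_keepalive_interval
        new.mapping_maxage = orig.mapping_maxage
    return new


orig = SAConfig(Encap(4500, 4500), nat_keepalive_interval=20, mapping_maxage=30)
print(clone_nat_fields(orig, Encap(4500, 4500)).nat_keepalive_interval)  # 20
print(clone_nat_fields(orig, None).mapping_maxage)  # 0
```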

-- 
Sabrina

* [net-next v3 3/5] dt-bindings: net: starfive,jh7110-dwmac: Add jhb100 sgmii rx clk
From: Minda Chen @ 2026-05-07  9:41 UTC (permalink / raw)
  To: Alexandre Torgue, Andrew Lunn, David S . Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Maxime Coquelin,
	Emil Renner Berthing, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley, netdev
  Cc: linux-kernel, linux-stm32, devicetree, Minda Chen
In-Reply-To: <20260507094115.8355-1-minda.chen@starfivetech.com>

On jhb100, the SGMII interface has separate TX and RX MAC clocks, and
their rates must be set for 10M/100M/1000M speeds. A new RX clock
therefore needs to be added to the driver code, the DTS and the
dt-binding document.
As a result, the jhb100 SGMII interface uses 6 clocks, while the
RMII/RGMII interfaces still use 5 clocks.

Signed-off-by: Minda Chen <minda.chen@starfivetech.com>
---
 .../bindings/net/starfive,jh7110-dwmac.yaml   | 42 ++++++++++++++++---
 1 file changed, 36 insertions(+), 6 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/starfive,jh7110-dwmac.yaml b/Documentation/devicetree/bindings/net/starfive,jh7110-dwmac.yaml
index 06aeaa0f6f00..af160a8dedb8 100644
--- a/Documentation/devicetree/bindings/net/starfive,jh7110-dwmac.yaml
+++ b/Documentation/devicetree/bindings/net/starfive,jh7110-dwmac.yaml
@@ -39,20 +39,18 @@ properties:
     maxItems: 1
 
   clocks:
+    minItems: 5
     items:
       - description: GMAC main clock
       - description: GMAC AHB clock
       - description: PTP clock
       - description: TX clock
       - description: GTX clock
+      - description: SGMII RX clock
 
   clock-names:
-    items:
-      - const: stmmaceth
-      - const: pclk
-      - const: ptp_ref
-      - const: tx
-      - const: gtx
+    minItems: 5
+    maxItems: 6
 
   starfive,tx-use-rgmii-clk:
     description:
@@ -99,6 +97,18 @@ allOf:
           minItems: 2
           maxItems: 2
 
+        clocks:
+          minItems: 5
+          maxItems: 5
+
+        clock-names:
+          items:
+            - const: stmmaceth
+            - const: pclk
+            - const: ptp_ref
+            - const: tx
+            - const: gtx
+
         resets:
           maxItems: 1
 
@@ -111,6 +121,26 @@ allOf:
           contains:
             const: starfive,jh7110-dwmac
     then:
+      properties:
+        clocks:
+          minItems: 5
+          maxItems: 6
+
+        clock-names:
+          oneOf:
+            - items:
+                - const: stmmaceth
+                - const: pclk
+                - const: ptp_ref
+                - const: tx
+                - const: gtx
+            - items:
+                - const: stmmaceth
+                - const: pclk
+                - const: ptp_ref
+                - const: tx
+                - const: gtx
+                - const: sgmii_rx
       if:
         properties:
           compatible:
-- 
2.17.1
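
A rough Python sketch of what the jh7110 clock-names oneOf accepts after this patch, assuming (as dt-schema's fixups do) that a fixed items list also pins the array length. BASE_CLOCK_NAMES, SGMII_CLOCK_NAMES, clock_names_valid() and expected_clock_names() are hypothetical; the binding itself does not tie sgmii_rx to phy-mode, so expected_clock_names() only reflects the commit message.

```python
# Hypothetical sketch of the jh7110 clock-names rule added by this patch.
BASE_CLOCK_NAMES = ["stmmaceth", "pclk", "ptp_ref", "tx", "gtx"]
SGMII_CLOCK_NAMES = BASE_CLOCK_NAMES + ["sgmii_rx"]


def clock_names_valid(names):
    """Mimic the schema's oneOf: exactly one of the two fixed lists must match.

    Assumes a fixed "items" list also fixes the array length, so a prefix
    or a longer list does not match.
    """
    alternatives = (BASE_CLOCK_NAMES, SGMII_CLOCK_NAMES)
    return sum(list(names) == alt for alt in alternatives) == 1


def expected_clock_names(phy_mode):
    """Hypothetical helper: SGMII needs the extra RX clock, RMII/RGMII do not."""
    return SGMII_CLOCK_NAMES if phy_mode == "sgmii" else BASE_CLOCK_NAMES


print(clock_names_valid(expected_clock_names("sgmii")))     # True
print(clock_names_valid(["stmmaceth", "pclk", "ptp_ref"]))  # False
```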

