* Re: [PATCH 6.12.y] netfilter: conntrack: add missing netlink policy validations
From: Pablo Neira Ayuso @ 2026-04-13 9:47 UTC (permalink / raw)
To: Li hongliang
Cc: gregkh, stable, fw, patches, linux-kernel, kadlec, davem,
edumazet, kuba, pabeni, horms, kaber, netfilter-devel, coreteam,
netdev, imv4bel
In-Reply-To: <20260413073105.2990210-1-1468888505@139.com>
Why only 6.12?
On Mon, Apr 13, 2026 at 03:31:05PM +0800, Li hongliang wrote:
> From: Florian Westphal <fw@strlen.de>
>
> [ Upstream commit f900e1d77ee0ef87bfb5ab3fe60f0b3d8ad5ba05 ]
>
> Hyunwoo Kim reports out-of-bounds access in sctp and ctnetlink.
>
> These attributes are used by the kernel without any validation.
> Extend the netlink policies accordingly.
>
> Quoting the reporter:
> nlattr_to_sctp() assigns the user-supplied CTA_PROTOINFO_SCTP_STATE
> value directly to ct->proto.sctp.state without checking that it is
> within the valid range. [..]
>
> and: ... with exp->dir = 100, the access at
> ct->master->tuplehash[100] reads 5600 bytes past the start of a
> 320-byte nf_conn object, causing a slab-out-of-bounds read confirmed by
> UBSAN.
>
> Fixes: 076a0ca02644 ("netfilter: ctnetlink: add NAT support for expectations")
> Fixes: a258860e01b8 ("netfilter: ctnetlink: add full support for SCTP to ctnetlink")
> Reported-by: Hyunwoo Kim <imv4bel@gmail.com>
> Signed-off-by: Florian Westphal <fw@strlen.de>
> Signed-off-by: Li hongliang <1468888505@139.com>
> ---
> net/netfilter/nf_conntrack_netlink.c | 2 +-
> net/netfilter/nf_conntrack_proto_sctp.c | 3 ++-
> 2 files changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
> index 323e147fe282..f51cdfba68fb 100644
> --- a/net/netfilter/nf_conntrack_netlink.c
> +++ b/net/netfilter/nf_conntrack_netlink.c
> @@ -3460,7 +3460,7 @@ ctnetlink_change_expect(struct nf_conntrack_expect *x,
>
> #if IS_ENABLED(CONFIG_NF_NAT)
> static const struct nla_policy exp_nat_nla_policy[CTA_EXPECT_NAT_MAX+1] = {
> - [CTA_EXPECT_NAT_DIR] = { .type = NLA_U32 },
> + [CTA_EXPECT_NAT_DIR] = NLA_POLICY_MAX(NLA_BE32, IP_CT_DIR_REPLY),
> [CTA_EXPECT_NAT_TUPLE] = { .type = NLA_NESTED },
> };
> #endif
> diff --git a/net/netfilter/nf_conntrack_proto_sctp.c b/net/netfilter/nf_conntrack_proto_sctp.c
> index 4cc97f971264..fabb2c1ca00a 100644
> --- a/net/netfilter/nf_conntrack_proto_sctp.c
> +++ b/net/netfilter/nf_conntrack_proto_sctp.c
> @@ -587,7 +587,8 @@ static int sctp_to_nlattr(struct sk_buff *skb, struct nlattr *nla,
> }
>
> static const struct nla_policy sctp_nla_policy[CTA_PROTOINFO_SCTP_MAX+1] = {
> - [CTA_PROTOINFO_SCTP_STATE] = { .type = NLA_U8 },
> + [CTA_PROTOINFO_SCTP_STATE] = NLA_POLICY_MAX(NLA_U8,
> + SCTP_CONNTRACK_HEARTBEAT_SENT),
> [CTA_PROTOINFO_SCTP_VTAG_ORIGINAL] = { .type = NLA_U32 },
> [CTA_PROTOINFO_SCTP_VTAG_REPLY] = { .type = NLA_U32 },
> };
> --
> 2.34.1
>
>
^ permalink raw reply
* Re: [PATCH] net/sched: sch_cake: fix NAT destination port not being updated in cake_update_flowkeys
From: Toke Høiland-Jørgensen @ 2026-04-13 9:41 UTC (permalink / raw)
To: Dudu Lu, netdev; +Cc: jhs, jiri, Dudu Lu
In-Reply-To: <20260413084715.70169-1-phx0fer@gmail.com>
Dudu Lu <phx0fer@gmail.com> writes:
> cake_update_flowkeys() is supposed to update the flow dissector keys
> with the NAT-translated addresses and ports from conntrack, so that
> CAKE's per-flow fairness correctly identifies post-NAT flows as
> belonging to the same connection.
>
> For the source port, this works correctly:
> keys->ports.src = port; /* writes conntrack port into keys */
>
> But for the destination port, the assignment is reversed:
> port = keys->ports.dst; /* reads FROM keys into local var — no-op */
Huh, what a silly mistake - nice find!
> This means the NAT destination port is never updated in the flow keys.
> As a result, when multiple connections are NATed to the same destination
> (same IP + same port), CAKE treats them as separate flows because the
> original (pre-NAT) destination ports differ. This completely defeats
> CAKE's NAT-aware flow isolation when using the "nat" mode.
>
> The vulnerability was introduced in commit b0c19ed6088a ("sch_cake: Take advantage
> of skb->hash where appropriate")
Calling it a "vulnerability" seems perhaps a tad hyperbolic. Care to
elaborate on what you mean here?
-Toke
^ permalink raw reply
* RE: [PATCH net 1/1] tipc: validate Gap ACK blocks in STATE message
From: Tung Quang Nguyen @ 2026-04-13 10:01 UTC (permalink / raw)
To: Ruide Cao, Ren Wei
Cc: jmaloy@redhat.com, davem@davemloft.net, edumazet@google.com,
kuba@kernel.org, pabeni@redhat.com, horms@kernel.org,
yifanwucs@gmail.com, tomapufckgml@gmail.com, yuantan098@gmail.com,
bird@lzu.edu.cn, enjou1224z@gmail.com, netdev@vger.kernel.org
In-Reply-To: <7369ab71-e3bc-48ac-8165-439ad8595fc0@gmail.com>
>Subject: Re: [PATCH net 1/1] tipc: validate Gap ACK blocks in STATE message
>
>
>On 4/12/2026 8:06 PM, Tung Quang Nguyen wrote:
>>> Subject: [PATCH net 1/1] tipc: validate Gap ACK blocks in STATE
>>> message
>>>
>>> From: Ruide Cao <caoruide123@gmail.com>
>>>
>>> tipc_get_gap_ack_blks() reads len, ugack_cnt and bgack_cnt directly
>>> from
>>> msg_data(hdr) before verifying that a STATE message actually contains
>>> the fixed Gap ACK block header in its logical data area.
>>>
>>> A peer that negotiates TIPC_GAP_ACK_BLOCK can send a short STATE
>>> message with a declared TIPC payload shorter than struct
>>> tipc_gap_ack_blks and still append a few physical bytes after the
>>> header. The helper then trusts those bytes as Gap ACK metadata, and
>>> the forged bgack_cnt/len values can drive the broadcast receive path into
>kmemdup() beyond the skb boundary.
>> Can you explain how that peer can alter the STATE message ? If it can, what
>concrete values are used and on what fields of the STATE messages ?
>
>Thanks for the review.
>
>To clarify, the peer is not "altering" an already received STATE message; it is
>actively sending a malformed LINK_PROTOCOL/STATE_MSG after the link has
>already negotiated the TIPC_GAP_ACK_BLOCK capability.
>
>Concretely, the crafted STATE message is sent with a modified msg_size so that
>msg_data_sz(hdr) is 0, but the actual UDP payload still carries extra physical
>bytes after the 40-byte TIPC header. Those bytes are then interpreted as the
>fixed Gap ACK header. For example:
> len = 0x07fc
> ugack_cnt = 0xff
> bgack_cnt = 0xff
>
It is surprising that you can modify any field you want in the TIPC message. I do not think that current TIPC code can handle this corrupt message .
Can you send me the stack trace at receiving peer when real crash happens after you send the "crafted" state message ?
>These values are specifically chosen so that the existing sanity check remains
>internally consistent:
> struct_size(p, gacks, 0xff + 0xff) == 0x07fc
>
>Therefore, the existing sanity check does not reject this case. It only checks the
>self-consistency of the attacker-controlled Gap ACK fields; it completely fails to
>check if the declared Gap ACK record actually fits inside the enclosing STATE
>message's logical payload length.
>
>>> Fix this by rejecting Gap ACK parsing unless the logical STATE
>>> payload is large enough to cover the fixed header, and by rejecting
>>> declared Gap ACK lengths that are smaller than the fixed header or larger
>than the logical payload.
>>> Return 0 for invalid lengths so malformed Gap ACK data is not treated
>>> as a valid payload offset, and drop unicast STATE messages that
>>> advertise Gap ACK support but still yield an invalid Gap ACK length.
>>> This keeps malformed Gap ACK data ignored without misaligning monitor
>payload parsing.
>>>
>>> Fixes: d7626b5acff9 ("tipc: introduce Gap ACK blocks for broadcast
>>> link")
>>> Cc: stable@kernel.org
>>> Reported-by: Yifan Wu <yifanwucs@gmail.com>
>>> Reported-by: Juefei Pu <tomapufckgml@gmail.com>
>>> Co-developed-by: Yuan Tan <yuantan098@gmail.com>
>>> Signed-off-by: Yuan Tan <yuantan098@gmail.com>
>>> Suggested-by: Xin Liu <bird@lzu.edu.cn>
>>> Tested-by: Ren Wei <enjou1224z@gmail.com>
>>> Signed-off-by: Ruide Cao <caoruide123@gmail.com>
>>> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn>
>>> ---
>>> net/tipc/link.c | 16 ++++++++++++++--
>>> 1 file changed, 14 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/net/tipc/link.c b/net/tipc/link.c index
>>> 49dfc098d89b..44678d98939a
>>> 100644
>>> --- a/net/tipc/link.c
>>> +++ b/net/tipc/link.c
>>> @@ -1415,12 +1415,22 @@ u16 tipc_get_gap_ack_blks(struct
>>> tipc_gap_ack_blks **ga, struct tipc_link *l,
>>> struct tipc_msg *hdr, bool uc)
>>> {
>>> struct tipc_gap_ack_blks *p;
>>> - u16 sz = 0;
>>> + u16 sz = 0, dlen = msg_data_sz(hdr);
>>>
>>> /* Does peer support the Gap ACK blocks feature? */
>>> if (l->peer_caps & TIPC_GAP_ACK_BLOCK) {
>>> + u16 min_sz = struct_size(p, gacks, 0);
>>> +
>>> + if (dlen < min_sz)
>>> + goto ignore;
>> This checking is redundant because with existing sanity checking, the invalid
>gap ACK blocks will not be used to release acked messages in transmit queue.
>
>The `dlen < min_sz` check is required because the existing sanity check already
>dereferences `p->len`, `p->ugack_cnt`, and `p->bgack_cnt`.
In the case of dlen > p->len and p->len = 0x07fc, p->ugack_cnt = 0xff, p->bgack_cnt = 0xff (Sender modified or kept dlen and modified the remaining 3 fields),
how could your above and subsequent sanity checks validate p->len, p->bgack_cnt and p->ugack_cnt ?
>Without this new check, an Out-of-Bounds (OOB) read occurs before the old
>sanity check even has a chance to run.
>
>>> +
>>> p = (struct tipc_gap_ack_blks *)msg_data(hdr);
>>> sz = ntohs(p->len);
>>> + if (sz < min_sz || sz > dlen) {
>>> + sz = 0;
>>> + goto ignore;
>>> + }
>> This checking is redundant. Existing sanity checking is good enough.
>
>The `sz < min_sz || sz > dlen` check is not redundant because the old sanity
>check completely fails to verify if the declared Gap ACK length
>(`sz`) actually fits inside the enclosing STATE message's logical payload length
>(`dlen`).
>
>Without checking against `dlen`, an internally consistent spoofed packet will
>pass the old check and cause OOB reads during the subsequent block parsing.
>
>>> +
>>> /* Sanity check */
>>> if (sz == struct_size(p, gacks, size_add(p->ugack_cnt, p-
>>>> bgack_cnt))) {
>>> /* Good, check if the desired type exists */ @@ -
>>> 1434,6 +1444,8 @@ u16 tipc_get_gap_ack_blks(struct tipc_gap_ack_blks
>>> **ga, struct tipc_link *l,
>>> }
>>> }
>>> }
>>> +
>>> +ignore:
>>> /* Other cases: ignore! */
>>> p = NULL;
>>>
>>> @@ -2270,7 +2282,7 @@ static int tipc_link_proto_rcv(struct tipc_link
>>> *l, struct sk_buff *skb,
>>> case STATE_MSG:
>>> /* Validate Gap ACK blocks, drop if invalid */
>>> glen = tipc_get_gap_ack_blks(&ga, l, hdr, true);
>>> - if (glen > dlen)
>>> + if (glen > dlen || ((l->peer_caps & TIPC_GAP_ACK_BLOCK) &&
>>> !glen))
>> This checking is redundant. Existing sanity checking is good enough.
>
>The unicast caller-side drop `((l->peer_caps & TIPC_GAP_ACK_BLOCK) &&
>!glen)` is also necessary. Once the capability is negotiated, a valid Gap ACK
>record MUST have at least the fixed 4-byte header. If `glen == 0` from such a
>peer, it indicates a malformed payload.
>
>The STATE message must be dropped here so it is not passed on to
>`tipc_mon_rcv()` as if monitor data started at `data + 0`, which would misalign
>the monitor payload parsing.
>
>>> break;
>>>
>>> l->rcv_nxt_state = msg_seqno(hdr) + 1;
>>> --
>>> 2.34.1
>>>
^ permalink raw reply
* Re: [RFC] Proposal: Add sysfs interface for PCIe TPH Steering Tag retrieval and configuration
From: Leon Romanovsky @ 2026-04-13 10:01 UTC (permalink / raw)
To: fengchengwen
Cc: Jason Gunthorpe, Bjorn Helgaas, linux-rdma, linux-pci, netdev,
dri-devel, Keith Busch, Yochai Cohen, Yishai Hadas, Zhiping Zhang
In-Reply-To: <6ea4c4c2-774e-aa76-3665-918e2a24cc84@huawei.com>
On Fri, Apr 10, 2026 at 10:30:52PM +0800, fengchengwen wrote:
> Hi all,
>
> I'm writing to propose adding a sysfs interface to expose and configure the
> PCIe TPH
> Steering Tag for PCIe devices, which is retrieved inside the kernel.
>
>
> Background: The TPH Steering Tag is tightly coupled with both a PCIe device
> (identified
> by its BDF) and a CPU core. It can only be obtained in kernel mode. To allow
> user-space
> applications to fetch and set this value securely and conveniently, we need
> a standard
> kernel-to-user interface.
>
>
> Proposed Solution: Add several sysfs attributes under each PCIe device's
> sysfs directory:
> 1. /sys/bus/pci/devices/<BDF>/tph_mode to query the TPH mode (interrupt or
> device specific)
> 2. /sys/bus/pci/devices/<BDF>/tph_enable to control the TPH feature
> 3. /sys/bus/pci/devices/<BDF>/tph_st to support both read and write
> operations, e.g.:
> Read operation:
> echo "cpu=3" > /sys/bus/pci/devices/0000:01:00.0/tph_st
> cat /sys/bus/pci/devices/0000:01:00.0/tph_st
> Write operation:
> echo "index=10 st=123" > /sys/bus/pci/devices/0000:01:00.0/tph_st
>
>
> The design strictly follows PCI subsystem sysfs standards and has the
> following key properties:
>
> 1. Dynamic Visibility: The sysfs attributes will only be present for PCIe
> devices that
> support TPH Steering Tag. Devices without TPH capability will not show
> these nodes,
> avoiding unnecessary user confusion.
>
> 2. Permission Control: The attributes will use 0600 file permissions,
> ensuring only
> privileged root users can read or write them, which satisfies security
> requirements
> for hardware configuration interfaces.
>
> 3. Standard Implementation Location: The interface will be implemented in
> drivers/pci/pci-sysfs.c, the canonical location for all PCI device sysfs
> attributes,
> ensuring consistency and maintainability within the PCI subsystem.
>
>
> Why sysfs instead of alternatives like VFIO-PCI ioctl:
>
> - Universality: sysfs does not require binding the device to a special
> driver such as
> vfio-pci. It is available to any privileged user-space component,
> including system
> utilities, daemons, and monitoring tools.
>
> - Simplicity: Both user-space usage (cat/echo) and kernel implementation are
> straightforward, reducing code complexity and long-term maintenance cost.
>
> - Design Alignment: TPH Steering Tag is a generic PCIe device feature, not
> specific to
> user-space drivers like DPDK or VFIO. Exposing it via sysfs matches the
> kernel's
> standard pattern for hardware capabilities.
>
>
> I look forward to your comments about this design before submitting the
> final patch.
You need to explain more clearly why this write functionality is useful
and necessary outside the VFIO/RDMA context:
https://lore.kernel.org/all/20260324234615.3731237-1-zhipingz@meta.com/
AFAIK, for non-VFIO TPH callers, kernel has enough knowledge to set
right ST values.
There are several comments regarding the implementation, but those can wait
until the rationale behind the proposal is fully clarified.
Thanks
>
> Best regards,
> Chengwen Feng
>
^ permalink raw reply
* Re: [PATCH] net/sched: sch_cake: fix NAT destination port not being updated in cake_update_flowkeys
From: Toke Høiland-Jørgensen @ 2026-04-13 10:07 UTC (permalink / raw)
To: phx; +Cc: netdev
In-Reply-To: <CAKvCo-yFnu3RBbiGkaVi-X5qX_hN1a-FYrBZfzB9UKz8k-PZtQ@mail.gmail.com>
phx <phx0fer@gmail.com> writes:
> You're right, "vulnerability" is too strong - it's a correctness
> bug, not a security issue. Thanks for picking it up.
Cool. Could you please re-send with an updated commit message? Thanks!
-Toke
pw-bot: cr
^ permalink raw reply
* Re: [PATCH iwl-net v2 2/6] ixgbe: add bounds check for debugfs register access
From: Simon Horman @ 2026-04-13 10:30 UTC (permalink / raw)
To: Aleksandr Loktionov
Cc: intel-wired-lan, anthony.l.nguyen, netdev, Paul Greenwalt
In-Reply-To: <20260408131154.2661818-3-aleksandr.loktionov@intel.com>
On Wed, Apr 08, 2026 at 03:11:50PM +0200, Aleksandr Loktionov wrote:
> From: Paul Greenwalt <paul.greenwalt@intel.com>
>
> Prevent out-of-bounds MMIO accesses triggered through user-controlled
> register offsets. IXGBE_HFDR (0x15FE8) is the highest valid MMIO
> register in the ixgbe register map; any offset beyond it would address
> unmapped memory.
>
> Add a defense-in-depth check at two levels:
>
> 1. ixgbe_read_reg() -- the noinline register read accessor. A
> WARN_ON_ONCE() guard here catches any future code path (including
> ioctl extensions) that might inadvertently pass an out-of-range
> offset without relying on higher layers to catch it first.
> ixgbe_write_reg() is a static inline called from the TX/RX hot path;
> adding WARN_ON_ONCE there would inline the check at every call site,
> so only the read path gets this guard.
>
> 2. ixgbe_dbg_reg_ops_write() -- the debugfs 'reg_ops' interface is the
> only current path where a raw, user-supplied offset enters the driver.
> Gating it before invoking the register accessors provides a clean,
> user-visible failure (silent ignore with no kernel splat) for
> deliberately malformed debugfs writes.
>
> Add a reg <= IXGBE_HFDR guard to both the read and write paths in
> ixgbe_dbg_reg_ops_write(), and a WARN_ON_ONCE + early-return guard to
> ixgbe_read_reg().
>
> Fixes: 91fbd8f081e2 ("ixgbe: added reg_ops file to debugfs")
> Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com>
> Cc: stable@vger.kernel.org
> Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
> ---
> v1 -> v2:
> - Add Fixes: tag; reroute from iwl-next to iwl-net (security-relevant
> hardening for user-controllable out-of-bounds MMIO).
Thanks for the update.
And sorry for not thinking to ask this earlier: this patch
addresses possible overruns of the mapped address space if the
supplied value for reg is too large. But do we also need a
guard against underrun if the value for reg is too small?
...
^ permalink raw reply
* Re: [PATCH] vsock/virtio: fix accept queue count leak on transport mismatch in recv_listen
From: Stefano Garzarella @ 2026-04-13 10:30 UTC (permalink / raw)
To: Dudu Lu; +Cc: netdev, stefanha, mst, jasowang
In-Reply-To: <20260413085243.73200-1-phx0fer@gmail.com>
On Mon, Apr 13, 2026 at 04:52:43PM +0800, Dudu Lu wrote:
>virtio_transport_recv_listen() calls sk_acceptq_added(sk) to increment
>the listener's accept queue counter before calling
>vsock_assign_transport(). When vsock_assign_transport() fails or selects
>a different transport than the one that received the packet, the error
>path returns without calling sk_acceptq_removed(sk), permanently
>incrementing sk_ack_backlog.
>
>A malicious VM peer can exploit this by sending repeated CONNECT
>requests that trigger the transport mismatch condition. Each such
>request permanently increments sk_ack_backlog. After approximately
>backlog+1 such requests (default backlog ~128), sk_acceptq_is_full()
>returns true, causing the listener to reject ALL new connections with
>-ENOMEM. The only recovery is closing and re-creating the listener
>socket.
>
>Compare with vmci_transport.c and hyperv_transport.c which correctly
>place sk_acceptq_added() AFTER the transport check, avoiding this
>issue entirely.
>
>Fix by moving sk_acceptq_added(sk) to after the transport validation
>check, matching the pattern used by the other transports.
The issue seems legitimate, but this patch doesn't do what you're
describing here.
Out of curiosity, how did you generate it?
Stefano
>
>Fixes: c0cfa2d8a788 ("vsock: add multi-transports support")
>Signed-off-by: Dudu Lu <phx0fer@gmail.com>
>---
> net/vmw_vsock/virtio_transport_common.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
>diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
>index 8a9fb23c6e85..29e1d9833be4 100644
>--- a/net/vmw_vsock/virtio_transport_common.c
>+++ b/net/vmw_vsock/virtio_transport_common.c
>@@ -1,3 +1,4 @@
>+ sk_acceptq_added(sk);
> // SPDX-License-Identifier: GPL-2.0-only
> /*
> * common code for virtio vsock
>@@ -1560,8 +1561,9 @@ virtio_transport_recv_listen(struct sock *sk, struct sk_buff *skb,
> return -ENOMEM;
> }
>
>- sk_acceptq_added(sk);
>
>+
>+ sk_acceptq_added(sk);
> lock_sock_nested(child, SINGLE_DEPTH_NESTING);
>
> child->sk_state = TCP_ESTABLISHED;
>--
>2.39.3 (Apple Git-145)
>
^ permalink raw reply
* [PATCH net-next v6 0/2] net: hsr: strict supervision TLV validation
From: luka.gejak @ 2026-04-13 10:34 UTC (permalink / raw)
To: davem, edumazet, kuba, pabeni; +Cc: netdev, fmaurer, horms, Luka Gejak
From: Luka Gejak <luka.gejak@linux.dev>
Changes in v6:
- Dropped capitalization comment changes per request of Jakub Kicinski
Changes in v5:
- Reverted TLV loop in Patch 1 to strict sequential parsing per IEC
62439-3.
- Retained pskb_may_pull() logic to ensure memory safety for TLV
headers.
- Dropped Reviewed-by from Patch 1 due to the logic evolving since
original review.
- Added Assisted-by tag for AI-aided translation and formatting to
both patches.
Changes in v4:
- Split from a 4-patch series into 'net' and 'net-next' as requested.
- Implemented a TLV walker in Patch 1 to correctly handle extension
TLVs and avoid regressions on paged frames/non-linearized skbs.
- Corrected pskb_may_pull() logic to include the TLV header size.
History of pre-separation series (v1-v3):
Changes in v3:
- addressed Felix review feedback in the VLAN add unwind fix
- removed the superfluous empty line
Changes in v2:
- picked up Reviewed-by tags on patches 1, 3 and 4
- changes in patch 2 per advice of Felix Maurer
Luka Gejak (2):
net: hsr: require valid EOT supervision TLV
net: hsr: reject unresolved interlink ifindex
net/hsr/hsr_forward.c | 6 +++---
net/hsr/hsr_netlink.c | 7 ++++++-
2 files changed, 9 insertions(+), 4 deletions(-)
--
2.53.0
^ permalink raw reply
* [PATCH net-next v6 1/2] net: hsr: require valid EOT supervision TLV
From: luka.gejak @ 2026-04-13 10:34 UTC (permalink / raw)
To: davem, edumazet, kuba, pabeni; +Cc: netdev, fmaurer, horms, Luka Gejak
In-Reply-To: <20260413103449.169913-1-luka.gejak@linux.dev>
From: Luka Gejak <luka.gejak@linux.dev>
Supervision frames are only valid if terminated with a zero-length EOT
TLV. The current check fails to reject non-EOT entries as the terminal
TLV, potentially allowing malformed supervision traffic.
Fix this by strictly requiring the terminal TLV to be HSR_TLV_EOT
with a length of zero, and properly linearizing the TLV header before
access.
Assisted-by: Gemini:Gemini-3.1-flash
Signed-off-by: Luka Gejak <luka.gejak@linux.dev>
---
net/hsr/hsr_forward.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/net/hsr/hsr_forward.c b/net/hsr/hsr_forward.c
index 0aca859c88cb..0774981a65c1 100644
--- a/net/hsr/hsr_forward.c
+++ b/net/hsr/hsr_forward.c
@@ -84,7 +84,7 @@ static bool is_supervision_frame(struct hsr_priv *hsr, struct sk_buff *skb)
/* Get next tlv */
total_length += hsr_sup_tag->tlv.HSR_TLV_length;
- if (!pskb_may_pull(skb, total_length))
+ if (!pskb_may_pull(skb, total_length + sizeof(struct hsr_sup_tlv)))
return false;
skb_pull(skb, total_length);
hsr_sup_tlv = (struct hsr_sup_tlv *)skb->data;
@@ -100,7 +100,7 @@ static bool is_supervision_frame(struct hsr_priv *hsr, struct sk_buff *skb)
/* make sure another tlv follows */
total_length += sizeof(struct hsr_sup_tlv) + hsr_sup_tlv->HSR_TLV_length;
- if (!pskb_may_pull(skb, total_length))
+ if (!pskb_may_pull(skb, total_length + sizeof(struct hsr_sup_tlv)))
return false;
/* get next tlv */
@@ -110,7 +110,7 @@ static bool is_supervision_frame(struct hsr_priv *hsr, struct sk_buff *skb)
}
/* end of tlvs must follow at the end */
- if (hsr_sup_tlv->HSR_TLV_type == HSR_TLV_EOT &&
+ if (hsr_sup_tlv->HSR_TLV_type != HSR_TLV_EOT ||
hsr_sup_tlv->HSR_TLV_length != 0)
return false;
--
2.53.0
^ permalink raw reply related
* [PATCH net-next v6 2/2] net: hsr: reject unresolved interlink ifindex
From: luka.gejak @ 2026-04-13 10:34 UTC (permalink / raw)
To: davem, edumazet, kuba, pabeni; +Cc: netdev, fmaurer, horms, Luka Gejak
In-Reply-To: <20260413103449.169913-1-luka.gejak@linux.dev>
From: Luka Gejak <luka.gejak@linux.dev>
In hsr_newlink(), a provided but invalid IFLA_HSR_INTERLINK attribute
was silently ignored if __dev_get_by_index() returned NULL. This leads
to incorrect RedBox topology creation without notifying the user.
Fix this by returning -EINVAL and an extack message when the
interlink attribute is present but cannot be resolved.
Assisted-by: Gemini:Gemini-3.1-flash
Reviewed-by: Felix Maurer <fmaurer@redhat.com>
Signed-off-by: Luka Gejak <luka.gejak@linux.dev>
---
net/hsr/hsr_netlink.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/net/hsr/hsr_netlink.c b/net/hsr/hsr_netlink.c
index db0b0af7a692..f0ca23da3ab9 100644
--- a/net/hsr/hsr_netlink.c
+++ b/net/hsr/hsr_netlink.c
@@ -76,9 +76,14 @@ static int hsr_newlink(struct net_device *dev,
return -EINVAL;
}
- if (data[IFLA_HSR_INTERLINK])
+ if (data[IFLA_HSR_INTERLINK]) {
interlink = __dev_get_by_index(link_net,
nla_get_u32(data[IFLA_HSR_INTERLINK]));
+ if (!interlink) {
+ NL_SET_ERR_MSG_MOD(extack, "Interlink does not exist");
+ return -EINVAL;
+ }
+ }
if (interlink && interlink == link[0]) {
NL_SET_ERR_MSG_MOD(extack, "Interlink and Slave1 are the same");
--
2.53.0
^ permalink raw reply related
* Re: [PATCH v11 net-next 2/5] psp: add new netlink cmd for dev-assoc and dev-disassoc
From: Paolo Abeni @ 2026-04-13 10:36 UTC (permalink / raw)
To: Wei Wang, netdev, Jakub Kicinski, Daniel Zahka, Willem de Bruijn,
David Wei, Andrew Lunn, David S . Miller, Eric Dumazet,
Simon Horman
Cc: Wei Wang
In-Reply-To: <20260408231415.522691-3-weibunny.kernel@gmail.com>
On 4/9/26 1:14 AM, Wei Wang wrote:
> From: Wei Wang <weibunny@fb.com>
>
> The main purpose of this cmd is to be able to associate a
> non-psp-capable device (e.g. veth or netkit) with a psp device.
> One use case is if we create a pair of veth/netkit, and assign 1 end
> inside a netns, while leaving the other end within the default netns,
> with a real PSP device, e.g. netdevsim or a physical PSP-capable NIC.
> With this command, we could associate the veth/netkit inside the netns
> with PSP device, so the virtual device could act as PSP-capable device
> to initiate PSP connections, and performs PSP encryption/decryption on
> the real PSP device.
>
> Signed-off-by: Wei Wang <weibunny@fb.com>
> Reviewed-by: Daniel Zahka <daniel.zahka@gmail.com>
> ---
> Documentation/netlink/specs/psp.yaml | 67 +++++-
> include/net/psp/types.h | 15 ++
> include/uapi/linux/psp.h | 13 ++
> net/psp/psp-nl-gen.c | 32 +++
> net/psp/psp-nl-gen.h | 2 +
> net/psp/psp_main.c | 20 ++
> net/psp/psp_nl.c | 325 ++++++++++++++++++++++++++-
> 7 files changed, 462 insertions(+), 12 deletions(-)
>
> diff --git a/Documentation/netlink/specs/psp.yaml b/Documentation/netlink/specs/psp.yaml
> index c54e1202cbe0..3d1b7223e084 100644
> --- a/Documentation/netlink/specs/psp.yaml
> +++ b/Documentation/netlink/specs/psp.yaml
> @@ -13,6 +13,17 @@ definitions:
> hdr0-aes-gmac-128, hdr0-aes-gmac-256]
>
> attribute-sets:
> + -
> + name: assoc-dev-info
> + attributes:
> + -
> + name: ifindex
> + doc: ifindex of an associated network device.
> + type: u32
> + -
> + name: nsid
> + doc: Network namespace ID of the associated device.
> + type: s32
> -
> name: dev
> attributes:
> @@ -24,7 +35,9 @@ attribute-sets:
> min: 1
> -
> name: ifindex
> - doc: ifindex of the main netdevice linked to the PSP device.
> + doc: |
> + ifindex of the main netdevice linked to the PSP device,
> + or the ifindex to associate with the PSP device.
> type: u32
> -
> name: psp-versions-cap
> @@ -38,6 +51,28 @@ attribute-sets:
> type: u32
> enum: version
> enum-as-flags: true
> + -
> + name: assoc-list
> + doc: List of associated virtual devices.
> + type: nest
> + nested-attributes: assoc-dev-info
> + multi-attr: true
> + -
> + name: nsid
> + doc: |
> + Network namespace ID for the device to associate/disassociate.
> + Optional for dev-assoc and dev-disassoc; if not present, the
> + device is looked up in the caller's network namespace.
> + type: s32
> + -
> + name: by-association
> + doc: |
> + Flag indicating the PSP device is an associated device from a
> + different network namespace.
> + Present when in associated namespace, absent when in primary/host
> + namespace.
> + type: flag
> +
> -
> name: assoc
> attributes:
> @@ -170,6 +205,8 @@ operations:
> - ifindex
> - psp-versions-cap
> - psp-versions-ena
> + - assoc-list
> + - by-association
> pre: psp-device-get-locked
> post: psp-device-unlock
> dump:
> @@ -279,6 +316,34 @@ operations:
> post: psp-device-unlock
> dump:
> reply: *stats-all
> + -
> + name: dev-assoc
> + doc: Associate a network device with a PSP device.
> + attribute-set: dev
> + do:
> + request:
> + attributes:
> + - id
> + - ifindex
> + - nsid
> + reply:
> + attributes: []
> + pre: psp-device-get-locked
> + post: psp-device-unlock
> + -
> + name: dev-disassoc
> + doc: Disassociate a network device from a PSP device.
> + attribute-set: dev
> + do:
> + request:
> + attributes:
> + - id
> + - ifindex
> + - nsid
> + reply:
> + attributes: []
> + pre: psp-device-get-locked
> + post: psp-device-unlock
>
> mcast-groups:
> list:
> diff --git a/include/net/psp/types.h b/include/net/psp/types.h
> index 25a9096d4e7d..4bd432ed107a 100644
> --- a/include/net/psp/types.h
> +++ b/include/net/psp/types.h
> @@ -5,6 +5,7 @@
>
> #include <linux/mutex.h>
> #include <linux/refcount.h>
> +#include <net/net_trackers.h>
>
> struct netlink_ext_ack;
>
> @@ -43,9 +44,22 @@ struct psp_dev_config {
> u32 versions;
> };
>
> +/**
> + * struct psp_assoc_dev - wrapper for associated net_device
> + * @dev_list: list node for psp_dev::assoc_dev_list
> + * @assoc_dev: the associated net_device
> + * @dev_tracker: tracker for the net_device reference
> + */
> +struct psp_assoc_dev {
> + struct list_head dev_list;
> + struct net_device *assoc_dev;
> + netdevice_tracker dev_tracker;
> +};
> +
> /**
> * struct psp_dev - PSP device struct
> * @main_netdev: original netdevice of this PSP device
> + * @assoc_dev_list: list of psp_assoc_dev entries associated with this PSP device
> * @ops: driver callbacks
> * @caps: device capabilities
> * @drv_priv: driver priv pointer
> @@ -67,6 +81,7 @@ struct psp_dev_config {
> */
> struct psp_dev {
> struct net_device *main_netdev;
> + struct list_head assoc_dev_list;
>
> struct psp_dev_ops *ops;
> struct psp_dev_caps *caps;
> diff --git a/include/uapi/linux/psp.h b/include/uapi/linux/psp.h
> index a3a336488dc3..1c8899cd4da5 100644
> --- a/include/uapi/linux/psp.h
> +++ b/include/uapi/linux/psp.h
> @@ -17,11 +17,22 @@ enum psp_version {
> PSP_VERSION_HDR0_AES_GMAC_256,
> };
>
> +enum {
> + PSP_A_ASSOC_DEV_INFO_IFINDEX = 1,
> + PSP_A_ASSOC_DEV_INFO_NSID,
> +
> + __PSP_A_ASSOC_DEV_INFO_MAX,
> + PSP_A_ASSOC_DEV_INFO_MAX = (__PSP_A_ASSOC_DEV_INFO_MAX - 1)
> +};
> +
> enum {
> PSP_A_DEV_ID = 1,
> PSP_A_DEV_IFINDEX,
> PSP_A_DEV_PSP_VERSIONS_CAP,
> PSP_A_DEV_PSP_VERSIONS_ENA,
> + PSP_A_DEV_ASSOC_LIST,
> + PSP_A_DEV_NSID,
> + PSP_A_DEV_BY_ASSOCIATION,
>
> __PSP_A_DEV_MAX,
> PSP_A_DEV_MAX = (__PSP_A_DEV_MAX - 1)
> @@ -74,6 +85,8 @@ enum {
> PSP_CMD_RX_ASSOC,
> PSP_CMD_TX_ASSOC,
> PSP_CMD_GET_STATS,
> + PSP_CMD_DEV_ASSOC,
> + PSP_CMD_DEV_DISASSOC,
>
> __PSP_CMD_MAX,
> PSP_CMD_MAX = (__PSP_CMD_MAX - 1)
> diff --git a/net/psp/psp-nl-gen.c b/net/psp/psp-nl-gen.c
> index 1f5e73e7ccc1..114299c64423 100644
> --- a/net/psp/psp-nl-gen.c
> +++ b/net/psp/psp-nl-gen.c
> @@ -53,6 +53,20 @@ static const struct nla_policy psp_get_stats_nl_policy[PSP_A_STATS_DEV_ID + 1] =
> [PSP_A_STATS_DEV_ID] = NLA_POLICY_MIN(NLA_U32, 1),
> };
>
> +/* PSP_CMD_DEV_ASSOC - do */
> +static const struct nla_policy psp_dev_assoc_nl_policy[PSP_A_DEV_NSID + 1] = {
> + [PSP_A_DEV_ID] = NLA_POLICY_MIN(NLA_U32, 1),
> + [PSP_A_DEV_IFINDEX] = { .type = NLA_U32, },
> + [PSP_A_DEV_NSID] = { .type = NLA_S32, },
> +};
> +
> +/* PSP_CMD_DEV_DISASSOC - do */
> +static const struct nla_policy psp_dev_disassoc_nl_policy[PSP_A_DEV_NSID + 1] = {
> + [PSP_A_DEV_ID] = NLA_POLICY_MIN(NLA_U32, 1),
> + [PSP_A_DEV_IFINDEX] = { .type = NLA_U32, },
> + [PSP_A_DEV_NSID] = { .type = NLA_S32, },
> +};
> +
> /* Ops table for psp */
> static const struct genl_split_ops psp_nl_ops[] = {
> {
> @@ -119,6 +133,24 @@ static const struct genl_split_ops psp_nl_ops[] = {
> .dumpit = psp_nl_get_stats_dumpit,
> .flags = GENL_CMD_CAP_DUMP,
> },
> + {
> + .cmd = PSP_CMD_DEV_ASSOC,
> + .pre_doit = psp_device_get_locked,
> + .doit = psp_nl_dev_assoc_doit,
> + .post_doit = psp_device_unlock,
> + .policy = psp_dev_assoc_nl_policy,
> + .maxattr = PSP_A_DEV_NSID,
> + .flags = GENL_CMD_CAP_DO,
> + },
> + {
> + .cmd = PSP_CMD_DEV_DISASSOC,
> + .pre_doit = psp_device_get_locked,
> + .doit = psp_nl_dev_disassoc_doit,
> + .post_doit = psp_device_unlock,
> + .policy = psp_dev_disassoc_nl_policy,
> + .maxattr = PSP_A_DEV_NSID,
> + .flags = GENL_CMD_CAP_DO,
Sashiko notes that the above allows deleteing an associations bypassing
the netns boundaries. Do you need ADMIN_PERM flag or exlicit checks in
the doit cb?
> @@ -292,6 +455,145 @@ int psp_nl_key_rotate_doit(struct sk_buff *skb, struct genl_info *info)
> return err;
> }
>
> +int psp_nl_dev_assoc_doit(struct sk_buff *skb, struct genl_info *info)
> +{
> + struct psp_dev *psd = info->user_ptr[0];
> + struct psp_assoc_dev *psp_assoc_dev;
> + struct net_device *assoc_dev;
> + struct sk_buff *rsp;
> + u32 assoc_ifindex;
> + struct net *net;
> + int nsid, err;
> +
> + if (GENL_REQ_ATTR_CHECK(info, PSP_A_DEV_IFINDEX))
> + return -EINVAL;
> +
> + if (info->attrs[PSP_A_DEV_NSID]) {
> + nsid = nla_get_s32(info->attrs[PSP_A_DEV_NSID]);
> +
> + net = get_net_ns_by_id(genl_info_net(info), nsid);
> + if (!net) {
> + NL_SET_BAD_ATTR(info->extack,
> + info->attrs[PSP_A_DEV_NSID]);
> + return -EINVAL;
> + }
> + } else {
> + net = get_net(genl_info_net(info));
> + }
psp_nl_dev_disassoc_doit() has the same code; perhaps it would be worthy
move it in a common helper, called via pre_doit()? It should also
simplify the cleanup paths.
> +
> + psp_assoc_dev = kzalloc(sizeof(*psp_assoc_dev), GFP_KERNEL);
> + if (!psp_assoc_dev) {
> + err = -ENOMEM;
> + goto alloc_err;
> + }
> +
> + assoc_ifindex = nla_get_u32(info->attrs[PSP_A_DEV_IFINDEX]);
> + assoc_dev = netdev_get_by_index(net, assoc_ifindex,
> + &psp_assoc_dev->dev_tracker,
> + GFP_KERNEL);
> + if (!assoc_dev) {
> + NL_SET_BAD_ATTR(info->extack, info->attrs[PSP_A_DEV_IFINDEX]);
> + err = -ENODEV;
> + goto assoc_dev_err;
> + }
> +
> + /* Check if device is already associated with a PSP device */
> + if (cmpxchg(&assoc_dev->psp_dev, NULL, RCU_INITIALIZER(psd))) {
> + NL_SET_ERR_MSG(info->extack,
> + "Device already associated with a PSP device");
> + err = -EBUSY;
> + goto cmpxchg_err;
> + }
> +
> + psp_assoc_dev->assoc_dev = assoc_dev;
> + rsp = psp_nl_reply_new(info);
> + if (!rsp) {
> + err = -ENOMEM;
> + goto rsp_err;
> + }
> +
> + list_add_tail(&psp_assoc_dev->dev_list, &psd->assoc_dev_list);
Sashiko says:
---
list_add_tail(&psp_assoc_dev->dev_list, &psd->assoc_dev_list);
There doesn't seem to be a limit on the number of devices that can be
associated with a single PSP device.
If a user repeatedly associates devices, could the generated netlink message
in psp_nl_dev_fill() exceed the maximum allowed size (GENLMSG_DEFAULT_SIZE),
causing it to fail with -EMSGSIZE and permanently break PSP_CMD_DEV_GET
and management notifications for the device?
--
/P
^ permalink raw reply
* Re: [PATCH net] net: usb: cdc_ncm: reject negative chained NDP offsets
From: Greg Kroah-Hartman @ 2026-04-13 10:43 UTC (permalink / raw)
To: Oliver Neukum
Cc: linux-usb, netdev, linux-kernel, Oliver Neukum, Andrew Lunn,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
stable
In-Reply-To: <2a6963c8-4a87-4fed-b875-d46f3ce53e42@suse.com>
On Mon, Apr 13, 2026 at 10:36:19AM +0200, Oliver Neukum wrote:
>
>
> On 11.04.26 12:53, Greg Kroah-Hartman wrote:
> > cdc_ncm_rx_fixup() reads dwNextNdpIndex from each NDP32 to chain to the
> > next one. The 32-bit value from the device is stored into the signed
> > int ndpoffset so that means values with the high bit set become
>
> Well, then isn't the problem rather that you should not store an
> unsigned value in a signed variable?
No. well, yes. but no.
cdc_ncm_rx_verify_nth16() returns an int, and is negative if something
went wrong, so we need it that way, and then we need to check it, like
we properly do at the top of the loop, it's just that at the bottom of
the loop we also need to do the same exact thing.
So I really think this patch is the correct thing to do unless you want
to add another temp variable here just for the sign -> unsigned
transition and but that might be even messier.
thanks,
greg k-h
^ permalink raw reply
* Re: [PATCH v11 net-next 3/5] psp: add a new netdev event for dev unregister
From: Paolo Abeni @ 2026-04-13 10:47 UTC (permalink / raw)
To: Wei Wang, netdev, Jakub Kicinski, Daniel Zahka, Willem de Bruijn,
David Wei, Andrew Lunn, David S . Miller, Eric Dumazet,
Simon Horman
Cc: Wei Wang
In-Reply-To: <20260408231415.522691-4-weibunny.kernel@gmail.com>
On 4/9/26 1:14 AM, Wei Wang wrote:
> +static int psp_netdev_event(struct notifier_block *nb, unsigned long event,
> + void *ptr)
> +{
> + struct net_device *dev = netdev_notifier_info_to_dev(ptr);
> + struct psp_dev *psd;
> +
> + if (event != NETDEV_UNREGISTER)
> + return NOTIFY_DONE;
> +
> + rcu_read_lock();
> + psd = rcu_dereference(dev->psp_dev);
> + if (psd && psp_dev_tryget(psd)) {
> + rcu_read_unlock();
> + mutex_lock(&psd->lock);
> + psp_dev_disassoc_one(psd, dev);
> + mutex_unlock(&psd->lock);
> + psp_dev_put(psd);
Sashiko notes that the above is racy:
---
Can this code race with psp_nl_dev_assoc_doit() and permanently leak a
net_device reference?
If CPU1 is executing psp_nl_dev_assoc_doit() and CPU2 is unregistering the
device, the following interleaving could happen:
CPU1 (psp_nl_dev_assoc_doit)
assoc_dev = dev_get_by_index(...) // acquires a reference
CPU2 (unregister_netdevice)
psp_netdev_event()
psd = rcu_dereference(dev->psp_dev); // sees NULL, returns
NOTIFY_DONE
CPU1 (psp_nl_dev_assoc_doit)
cmpxchg(&assoc_dev->psp_dev, NULL, psd); // succeeds!
list_add(...) // adds to psd->assoc_dev_list
If this occurs, the notifier misses the unregistration event since it runs
before the device is fully associated. The unregistering thread will then
enter netdev_wait_allrefs() and wait indefinitely because the reference
held in assoc_dev_list is never released.
---
/P
^ permalink raw reply
* Re: [PATCH iwl-net v2 3/6] ixgbe: call ixgbe_setup_fc() before fc_enable() after NVM update
From: Simon Horman @ 2026-04-13 10:51 UTC (permalink / raw)
To: aleksandr.loktionov
Cc: 'Simon Horman', intel-wired-lan, anthony.l.nguyen, netdev
In-Reply-To: <20260408131154.2661818-4-aleksandr.loktionov@intel.com>
From: 'Simon Horman' <horms@kernel.org>
This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
Full review at: https://sashiko.dev
---
ixgbe: call ixgbe_setup_fc() before fc_enable() after NVM update
This commit adds a call to hw->mac.ops.setup_fc() before fc_enable() in the
watchdog task when the link comes up, aiming to prevent MDD events after
an NVM update by reprogramming flow-control autonegotiation.
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> @@ -8031,7 +8031,14 @@ static void ixgbe_watchdog_update_link(struct ixgbe_adapter *adapter)
> pfc_en |= !!(adapter->ixgbe_ieee_pfc->pfc_en);
>
> if (link_up && !((adapter->flags & IXGBE_FLAG_DCB_ENABLED) && pfc_en)) {
> - hw->mac.ops.fc_enable(hw);
> + /* Re-program flow-control autoneg before applying the result.
> + * If setup_fc() fails its output is invalid; skip fc_enable()
> + * to avoid committing stale capability bits that trigger MDD.
> + */
> + if (hw->mac.ops.setup_fc && hw->mac.ops.setup_fc(hw))
Will calling setup_fc() unconditionally from the link-up watchdog cause an
infinite link flapping loop on 82599 backplane interfaces?
For backplane interfaces like KX/KX4/KR, setup_fc() resolves to
ixgbe_setup_fc_generic(), which then calls hw->mac.ops.prot_autoc_write().
On 82599 hardware, this calls prot_autoc_write_82599(), which unconditionally
executes ixgbe_reset_pipeline_82599().
Since ixgbe_reset_pipeline_82599() toggles the IXGBE_AUTOC_AN_RESTART bit,
would this drop the link immediately to restart Auto-Negotiation?
If the link drops, it would trigger a link-down LSC interrupt. Once
Auto-Negotiation completes, the link would come back up, triggering another
link-up LSC interrupt. This would cause ixgbe_watchdog_update_link() to run
again, repeating the cycle.
> + e_warn(drv, "setup_fc failed, skipping fc_enable\n");
> + else
> + hw->mac.ops.fc_enable(hw);
> ixgbe_set_rx_drop_en(adapter);
> }
^ permalink raw reply
* Re: [PATCH iwl-net v2 1/6] ixgbe: fix SWFW semaphore timeout for X550 family
From: Simon Horman @ 2026-04-13 10:52 UTC (permalink / raw)
To: Aleksandr Loktionov; +Cc: intel-wired-lan, anthony.l.nguyen, netdev
In-Reply-To: <20260408131154.2661818-2-aleksandr.loktionov@intel.com>
On Wed, Apr 08, 2026 at 03:11:49PM +0200, Aleksandr Loktionov wrote:
> According to FW documentation, the most time-consuming FW operation is
> Shadow RAM (SR) dump which takes up to 3.2 seconds. For X550 family
> devices the module-update FW command can take over 4.5 s. The default
> semaphore loop runs 200 iterations with a 5 ms sleep each, giving a
> maximum wait of 1 s -- not "200 ms" as previously stated in error.
> This is insufficient for X550 family FW update operations and causes
> spurious EBUSY failures.
>
> Extend the SW/FW semaphore timeout from 1 s to 5 s (1000 iterations x
> 5 ms) for all three X550 variants: ixgbe_mac_X550, ixgbe_mac_X550EM_x,
> and ixgbe_mac_x550em_a. All three share the same FW and exhibit the
> same worst-case latency. Use three explicit mac.type comparisons rather
> than a range check so future MAC additions are not inadvertently
> captured.
>
> The timeout variable is set immediately before the loop so the intent
> is clear, with an inline comment stating the resulting maximum delay.
>
> Suggested-by: Soumen Karmakar <soumen.karmakar@intel.com>
> Cc: stable@vger.kernel.org
> Suggested-by: Marta Plantykow <marta.a.plantykow@intel.com>
> Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
> ---
> v1 -> v2:
> - Squash with 0015 (X550EM extension); fix commit message ("200ms" was
> wrong, actual default is 1 s); replace >= / <= range check with three
> explicit mac.type == comparisons per Tony Nguyen.
Reviewed-by: Simon Horman <horms@kernel.org>
^ permalink raw reply
* [PATCH net V2 0/3] net/mlx5: Fixes for Socket-Direct
From: Tariq Toukan @ 2026-04-13 10:53 UTC (permalink / raw)
To: Eric Dumazet, Jakub Kicinski, Paolo Abeni, Andrew Lunn,
David S. Miller
Cc: Saeed Mahameed, Tariq Toukan, Mark Bloch, Leon Romanovsky,
Shay Drory, Simon Horman, Kees Cook, Parav Pandit,
Patrisious Haddad, Gal Pressman, netdev, linux-rdma, linux-kernel,
Dragos Tatulea
Hi,
This series fixes several race conditions and bugs in the mlx5
Socket-Direct (SD) single netdev flow.
Patch 1 serializes mlx5_sd_init()/mlx5_sd_cleanup() with
mlx5_devcom_comp_lock() and tracks the SD group state on the primary
device, preventing concurrent or duplicate bring-up/tear-down.
Patch 2 fixes the debugfs "multi-pf" directory being stored on the
calling device's sd struct instead of the primary's, which caused
memory leaks and recreation errors when cleanup ran from a different PF.
Patch 3 fixes a race where a secondary PF could access the primary's
auxiliary device after it had been unbound, by holding the primary's
device lock while operating on its auxiliary device.
Regards,
Tariq
V2:
- Link to V1:
https://lore.kernel.org/all/20260330193412.53408-1-tariqt@nvidia.com/
- Reorder the patches so that "net/mlx5: SD: Serialize init/cleanup"
is first.
- Add MLX5_SD_STATE_DESTROYING to the patch above to solve a concurrent
edge case.
- Expend commit message of "net/mlx5e: SD, Fix race condition in
secondary device probe/remove"
Shay Drory (3):
net/mlx5: SD: Serialize init/cleanup
net/mlx5: SD, Keep multi-pf debugfs entries on primary
net/mlx5e: SD, Fix race condition in secondary device probe/remove
.../net/ethernet/mellanox/mlx5/core/en_main.c | 18 +++--
.../net/ethernet/mellanox/mlx5/core/lib/sd.c | 70 ++++++++++++++++---
.../net/ethernet/mellanox/mlx5/core/lib/sd.h | 2 +
3 files changed, 77 insertions(+), 13 deletions(-)
base-commit: 2dddb34dd0d07b01fa770eca89480a4da4f13153
--
2.44.0
^ permalink raw reply
* [PATCH net V2 1/3] net/mlx5: SD: Serialize init/cleanup
From: Tariq Toukan @ 2026-04-13 10:53 UTC (permalink / raw)
To: Eric Dumazet, Jakub Kicinski, Paolo Abeni, Andrew Lunn,
David S. Miller
Cc: Saeed Mahameed, Tariq Toukan, Mark Bloch, Leon Romanovsky,
Shay Drory, Simon Horman, Kees Cook, Parav Pandit,
Patrisious Haddad, Gal Pressman, netdev, linux-rdma, linux-kernel,
Dragos Tatulea
In-Reply-To: <20260413105323.186411-1-tariqt@nvidia.com>
From: Shay Drory <shayd@nvidia.com>
mlx5_sd_init() / mlx5_sd_cleanup() may run from multiple PFs in the same
Socket-Direct group. This can cause the SD bring-up/tear-down sequence
to be executed more than once or interleaved across PFs.
Protect SD init/cleanup with mlx5_devcom_comp_lock() and track the SD
group state on the primary device. Skip init if the primary is already
UP, and skip cleanup unless the primary is UP.
Fixes: 381978d28317 ("net/mlx5e: Create single netdev per SD group")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
.../net/ethernet/mellanox/mlx5/core/lib/sd.c | 35 +++++++++++++++++--
1 file changed, 32 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c
index 954942ad93c5..a34860ad231a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c
@@ -18,6 +18,7 @@ struct mlx5_sd {
u8 host_buses;
struct mlx5_devcom_comp_dev *devcom;
struct dentry *dfs;
+ u8 state;
bool primary;
union {
struct { /* primary */
@@ -31,6 +32,12 @@ struct mlx5_sd {
};
};
+enum mlx5_sd_state {
+ MLX5_SD_STATE_DOWN = 0,
+ MLX5_SD_STATE_UP,
+ MLX5_SD_STATE_DESTROYING,
+};
+
static int mlx5_sd_get_host_buses(struct mlx5_core_dev *dev)
{
struct mlx5_sd *sd = mlx5_get_sd(dev);
@@ -426,6 +433,7 @@ int mlx5_sd_init(struct mlx5_core_dev *dev)
struct mlx5_core_dev *primary, *pos, *to;
struct mlx5_sd *sd = mlx5_get_sd(dev);
u8 alias_key[ACCESS_KEY_LEN];
+ struct mlx5_sd *primary_sd;
int err, i;
err = sd_init(dev);
@@ -440,10 +448,15 @@ int mlx5_sd_init(struct mlx5_core_dev *dev)
if (err)
goto err_sd_cleanup;
+ mlx5_devcom_comp_lock(sd->devcom);
if (!mlx5_devcom_comp_is_ready(sd->devcom))
- return 0;
+ goto out;
primary = mlx5_sd_get_primary(dev);
+ primary_sd = mlx5_get_sd(primary);
+
+ if (primary_sd->state != MLX5_SD_STATE_DOWN)
+ goto out;
for (i = 0; i < ACCESS_KEY_LEN; i++)
alias_key[i] = get_random_u8();
@@ -472,6 +485,9 @@ int mlx5_sd_init(struct mlx5_core_dev *dev)
sd->group_id, mlx5_devcom_comp_get_size(sd->devcom));
sd_print_group(primary);
+ primary_sd->state = MLX5_SD_STATE_UP;
+out:
+ mlx5_devcom_comp_unlock(sd->devcom);
return 0;
err_unset_secondaries:
@@ -481,6 +497,7 @@ int mlx5_sd_init(struct mlx5_core_dev *dev)
sd_cmd_unset_primary(primary);
debugfs_remove_recursive(sd->dfs);
err_sd_unregister:
+ mlx5_devcom_comp_unlock(sd->devcom);
sd_unregister(dev);
err_sd_cleanup:
sd_cleanup(dev);
@@ -491,23 +508,35 @@ void mlx5_sd_cleanup(struct mlx5_core_dev *dev)
{
struct mlx5_sd *sd = mlx5_get_sd(dev);
struct mlx5_core_dev *primary, *pos;
+ struct mlx5_sd *primary_sd = NULL;
int i;
if (!sd)
return;
+ mlx5_devcom_comp_lock(sd->devcom);
if (!mlx5_devcom_comp_is_ready(sd->devcom))
- goto out;
+ goto out_unlock;
primary = mlx5_sd_get_primary(dev);
+ primary_sd = mlx5_get_sd(primary);
+
+ if (primary_sd->state != MLX5_SD_STATE_UP)
+ goto out_unlock;
+
mlx5_sd_for_each_secondary(i, primary, pos)
sd_cmd_unset_secondary(pos);
sd_cmd_unset_primary(primary);
debugfs_remove_recursive(sd->dfs);
sd_info(primary, "group id %#x, uncombined\n", sd->group_id);
-out:
+ primary_sd->state = MLX5_SD_STATE_DESTROYING;
+out_unlock:
+ mlx5_devcom_comp_unlock(sd->devcom);
sd_unregister(dev);
+ if (primary_sd)
+ /* devcom isn't ready, reset the state */
+ primary_sd->state = MLX5_SD_STATE_DOWN;
sd_cleanup(dev);
}
--
2.44.0
^ permalink raw reply related
* [PATCH net V2 3/3] net/mlx5e: SD, Fix race condition in secondary device probe/remove
From: Tariq Toukan @ 2026-04-13 10:53 UTC (permalink / raw)
To: Eric Dumazet, Jakub Kicinski, Paolo Abeni, Andrew Lunn,
David S. Miller
Cc: Saeed Mahameed, Tariq Toukan, Mark Bloch, Leon Romanovsky,
Shay Drory, Simon Horman, Kees Cook, Parav Pandit,
Patrisious Haddad, Gal Pressman, netdev, linux-rdma, linux-kernel,
Dragos Tatulea
In-Reply-To: <20260413105323.186411-1-tariqt@nvidia.com>
From: Shay Drory <shayd@nvidia.com>
When utilizing Socket-Direct single netdev functionality the driver
resolves the actual auxiliary device using mlx5_sd_get_adev(). However,
the current implementation returns the primary ETH auxiliary device
without holding the device lock, leading to a potential race condition
where the ETH device could be unbound or removed concurrently during
probe, suspend, resume, or remove operations.[1]
Fix this by introducing mlx5_sd_put_adev() and updating
mlx5_sd_get_adev() so that secondaries devices would acquire the device
lock of the returned auxiliary device. After the lock is acquired, a
second devcom check is needed[2].
In addition, update The callers to pair the get operation with the new
put operation, ensuring the lock is held while the auxiliary device is
being operated on and released afterwards.
The "primary" designation is determined once in sd_register(). It's set
before devcom is marked ready, and it never changes after that.
In Addition, The primary path never locks a secondary: When the primary
device invoke mlx5_sd_get_adev(), it sees dev == primary and returns.
no additional lock is taken.
Therefore lock ordering is always: secondary_lock -> primary_lock. The
reverse never happens, so ABBA deadlock is impossible.
[1]
for example:
BUG: kernel NULL pointer dereference, address: 0000000000000370
PGD 0 P4D 0
Oops: Oops: 0000 [#1] SMP
CPU: 4 UID: 0 PID: 3945 Comm: bash Not tainted 6.19.0-rc3+ #1 NONE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
RIP: 0010:mlx5e_dcbnl_dscp_app+0x23/0x100 [mlx5_core]
Call Trace:
<TASK>
mlx5e_remove+0x82/0x12a [mlx5_core]
device_release_driver_internal+0x194/0x1f0
bus_remove_device+0xc6/0x140
device_del+0x159/0x3c0
? devl_param_driverinit_value_get+0x29/0x80
mlx5_rescan_drivers_locked+0x92/0x160 [mlx5_core]
mlx5_unregister_device+0x34/0x50 [mlx5_core]
mlx5_uninit_one+0x43/0xb0 [mlx5_core]
remove_one+0x4e/0xc0 [mlx5_core]
pci_device_remove+0x39/0xa0
device_release_driver_internal+0x194/0x1f0
unbind_store+0x99/0xa0
kernfs_fop_write_iter+0x12e/0x1e0
vfs_write+0x215/0x3d0
ksys_write+0x5f/0xd0
do_syscall_64+0x55/0xe90
entry_SYSCALL_64_after_hwframe+0x4b/0x53
[2]
CPU0 (primary) CPU1 (secondary)
==========================================================================
mlx5e_remove() (device_lock held)
mlx5e_remove() (2nd device_lock held)
mlx5_sd_get_adev()
mlx5_devcom_comp_is_ready() => true
device_lock(primary)
mlx5_sd_get_adev() ==> ret adev
_mlx5e_remove()
mlx5_sd_cleanup()
// mlx5e_remove finished
// releasing device_lock
//need another check here...
mlx5_devcom_comp_is_ready() => false
Fixes: 381978d28317 ("net/mlx5e: Create single netdev per SD group")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
.../net/ethernet/mellanox/mlx5/core/en_main.c | 18 ++++++++++++++----
.../net/ethernet/mellanox/mlx5/core/lib/sd.c | 17 +++++++++++++++++
.../net/ethernet/mellanox/mlx5/core/lib/sd.h | 2 ++
3 files changed, 33 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 0b8b44bbcb9e..11f80158e107 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -6657,8 +6657,11 @@ static int mlx5e_resume(struct auxiliary_device *adev)
return err;
actual_adev = mlx5_sd_get_adev(mdev, adev, edev->idx);
- if (actual_adev)
- return _mlx5e_resume(actual_adev);
+ if (actual_adev) {
+ err = _mlx5e_resume(actual_adev);
+ mlx5_sd_put_adev(actual_adev, adev);
+ return err;
+ }
return 0;
}
@@ -6698,6 +6701,8 @@ static int mlx5e_suspend(struct auxiliary_device *adev, pm_message_t state)
err = _mlx5e_suspend(actual_adev, false);
mlx5_sd_cleanup(mdev);
+ if (actual_adev)
+ mlx5_sd_put_adev(actual_adev, adev);
return err;
}
@@ -6795,8 +6800,11 @@ static int mlx5e_probe(struct auxiliary_device *adev,
return err;
actual_adev = mlx5_sd_get_adev(mdev, adev, edev->idx);
- if (actual_adev)
- return _mlx5e_probe(actual_adev);
+ if (actual_adev) {
+ err = _mlx5e_probe(actual_adev);
+ mlx5_sd_put_adev(actual_adev, adev);
+ return err;
+ }
return 0;
}
@@ -6849,6 +6857,8 @@ static void mlx5e_remove(struct auxiliary_device *adev)
_mlx5e_remove(actual_adev);
mlx5_sd_cleanup(mdev);
+ if (actual_adev)
+ mlx5_sd_put_adev(actual_adev, adev);
}
static const struct auxiliary_device_id mlx5e_id_table[] = {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c
index a5e2e0a411df..6cece851b102 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c
@@ -546,6 +546,10 @@ void mlx5_sd_cleanup(struct mlx5_core_dev *dev)
sd_cleanup(dev);
}
+/* Cannot take devcom lock as a gate for device lock. ABBA deadlock:
+ * primary: actual_adev_lock -> SD devcom comp lock
+ * secondary: SD devcom comp lock -> actual_adev_lock
+ */
struct auxiliary_device *mlx5_sd_get_adev(struct mlx5_core_dev *dev,
struct auxiliary_device *adev,
int idx)
@@ -563,5 +567,18 @@ struct auxiliary_device *mlx5_sd_get_adev(struct mlx5_core_dev *dev,
if (dev == primary)
return adev;
+ device_lock(&primary->priv.adev[idx]->adev.dev);
+ /* In case primary finish removing its adev */
+ if (!mlx5_devcom_comp_is_ready(sd->devcom)) {
+ device_unlock(&primary->priv.adev[idx]->adev.dev);
+ return NULL;
+ }
return &primary->priv.adev[idx]->adev;
}
+
+void mlx5_sd_put_adev(struct auxiliary_device *actual_adev,
+ struct auxiliary_device *adev)
+{
+ if (actual_adev != adev)
+ device_unlock(&actual_adev->dev);
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.h b/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.h
index 137efaf9aabc..9bfd5b9756b5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.h
@@ -15,6 +15,8 @@ struct mlx5_core_dev *mlx5_sd_ch_ix_get_dev(struct mlx5_core_dev *primary, int c
struct auxiliary_device *mlx5_sd_get_adev(struct mlx5_core_dev *dev,
struct auxiliary_device *adev,
int idx);
+void mlx5_sd_put_adev(struct auxiliary_device *actual_adev,
+ struct auxiliary_device *adev);
int mlx5_sd_init(struct mlx5_core_dev *dev);
void mlx5_sd_cleanup(struct mlx5_core_dev *dev);
--
2.44.0
^ permalink raw reply related
* [PATCH net V2 2/3] net/mlx5: SD, Keep multi-pf debugfs entries on primary
From: Tariq Toukan @ 2026-04-13 10:53 UTC (permalink / raw)
To: Eric Dumazet, Jakub Kicinski, Paolo Abeni, Andrew Lunn,
David S. Miller
Cc: Saeed Mahameed, Tariq Toukan, Mark Bloch, Leon Romanovsky,
Shay Drory, Simon Horman, Kees Cook, Parav Pandit,
Patrisious Haddad, Gal Pressman, netdev, linux-rdma, linux-kernel,
Dragos Tatulea
In-Reply-To: <20260413105323.186411-1-tariqt@nvidia.com>
From: Shay Drory <shayd@nvidia.com>
mlx5_sd_init() creates the "multi-pf" debugfs directory under the
primary device debugfs root, but stored the dentry in the calling
device's sd struct. When sd_cleanup() run on a different PF,
this leads to using the wrong sd->dfs for removing entries, which
results in memory leak and an error in when re-creating the SD.[1]
Fix it by explicitly storing the debugfs dentry in the primary
device sd struct and use it for all per-group files.
[1]
debugfs: 'multi-pf' already exists in '0000:08:00.1'
Fixes: 4375130bf527 ("net/mlx5: SD, Add debugfs")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
.../net/ethernet/mellanox/mlx5/core/lib/sd.c | 20 ++++++++++++-------
1 file changed, 13 insertions(+), 7 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c
index a34860ad231a..a5e2e0a411df 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c
@@ -465,9 +465,13 @@ int mlx5_sd_init(struct mlx5_core_dev *dev)
if (err)
goto err_sd_unregister;
- sd->dfs = debugfs_create_dir("multi-pf", mlx5_debugfs_get_dev_root(primary));
- debugfs_create_x32("group_id", 0400, sd->dfs, &sd->group_id);
- debugfs_create_file("primary", 0400, sd->dfs, primary, &dev_fops);
+ primary_sd->dfs =
+ debugfs_create_dir("multi-pf",
+ mlx5_debugfs_get_dev_root(primary));
+ debugfs_create_x32("group_id", 0400, primary_sd->dfs,
+ &primary_sd->group_id);
+ debugfs_create_file("primary", 0400, primary_sd->dfs, primary,
+ &dev_fops);
mlx5_sd_for_each_secondary(i, primary, pos) {
char name[32];
@@ -477,7 +481,8 @@ int mlx5_sd_init(struct mlx5_core_dev *dev)
goto err_unset_secondaries;
snprintf(name, sizeof(name), "secondary_%d", i - 1);
- debugfs_create_file(name, 0400, sd->dfs, pos, &dev_fops);
+ debugfs_create_file(name, 0400, primary_sd->dfs, pos,
+ &dev_fops);
}
@@ -495,7 +500,8 @@ int mlx5_sd_init(struct mlx5_core_dev *dev)
mlx5_sd_for_each_secondary_to(i, primary, to, pos)
sd_cmd_unset_secondary(pos);
sd_cmd_unset_primary(primary);
- debugfs_remove_recursive(sd->dfs);
+ debugfs_remove_recursive(primary_sd->dfs);
+ primary_sd->dfs = NULL;
err_sd_unregister:
mlx5_devcom_comp_unlock(sd->devcom);
sd_unregister(dev);
@@ -520,14 +526,14 @@ void mlx5_sd_cleanup(struct mlx5_core_dev *dev)
primary = mlx5_sd_get_primary(dev);
primary_sd = mlx5_get_sd(primary);
-
if (primary_sd->state != MLX5_SD_STATE_UP)
goto out_unlock;
mlx5_sd_for_each_secondary(i, primary, pos)
sd_cmd_unset_secondary(pos);
sd_cmd_unset_primary(primary);
- debugfs_remove_recursive(sd->dfs);
+ debugfs_remove_recursive(primary_sd->dfs);
+ primary_sd->dfs = NULL;
sd_info(primary, "group id %#x, uncombined\n", sd->group_id);
primary_sd->state = MLX5_SD_STATE_DESTROYING;
--
2.44.0
^ permalink raw reply related
* Re: [PATCH v11 net-next 4/7] devlink: Implement devlink param multi attribute nested data values
From: Paolo Abeni @ 2026-04-13 10:54 UTC (permalink / raw)
To: Ratheesh Kannoth, netdev, linux-kernel, linux-rdma
Cc: sgoutham, andrew+netdev, davem, edumazet, kuba, donald.hunter,
horms, jiri, chuck.lever, matttbe, cjubran, saeedm, leon, tariqt,
mbloch, dtatulea
In-Reply-To: <20260409025055.1664053-5-rkannoth@marvell.com>
On 4/9/26 4:50 AM, Ratheesh Kannoth wrote:
> @@ -441,6 +448,7 @@ union devlink_param_value {
> u64 vu64;
> char vstr[__DEVLINK_PARAM_MAX_STRING_VALUE];
> bool vbool;
> + struct devlink_param_u64_array u64arr;
You mentioned that you intend to handle the possible CONFIG_FRAME_WARN
with a separate patch. IMHO such patch need to be part of this series,
or things will stay broken for an undefined amount of time until such
patch is merged separatelly.
/P
^ permalink raw reply
* Re: [PATCH iwl-net v2 4/6] ixgbe: fix cls_u32 nexthdr path returning success when no entry installed
From: Simon Horman @ 2026-04-13 10:54 UTC (permalink / raw)
To: Aleksandr Loktionov
Cc: intel-wired-lan, anthony.l.nguyen, netdev, Marcin Szycik
In-Reply-To: <20260408131154.2661818-5-aleksandr.loktionov@intel.com>
On Wed, Apr 08, 2026 at 03:11:52PM +0200, Aleksandr Loktionov wrote:
> ixgbe_configure_clsu32() returns 0 (success) after the nexthdr loop
> even when ixgbe_clsu32_build_input() fails for every candidate entry
> and no jump-table slot is actually programmed. Callers that test the
> return value would then falsely believe the filter was installed.
>
> The variable 'err' already tracks the last ixgbe_clsu32_build_input()
> return value; if the loop completes with a successful break, err is 0.
> If all attempts failed, err holds the last failure code. Change the
> unconditional 'return 0' to 'return err' so errors are propagated
> correctly.
>
> Fixes: 1cdaaf5405ba ("ixgbe: Match on multiple headers for cls_u32 offloads")
> Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
> Cc: stable@vger.kernel.org
> Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com>
> ---
> v1 -> v2:
> - Add Fixes: tag; reroute from iwl-next to iwl-net (false-success
> return is a user-visible correctness bug, not a cleanup).
Reviewed-by: Simon Horman <horms@kernel.org>
^ permalink raw reply
* Re: [Intel-wired-lan] [PATCH iwl-next v2 08/10] ice: program ACL entry
From: Marcin Szycik @ 2026-04-13 10:57 UTC (permalink / raw)
To: Loktionov, Aleksandr, intel-wired-lan@lists.osuosl.org
Cc: netdev@vger.kernel.org, Penigalapati, Sandeep, S, Ananth,
alexander.duyck@gmail.com, Cao, Chinh T, Nguyen, Anthony L
In-Reply-To: <IA3PR11MB898666B7AB0330B3C29DC630E5582@IA3PR11MB8986.namprd11.prod.outlook.com>
On 09.04.2026 15:35, Loktionov, Aleksandr wrote:
>
>
>> -----Original Message-----
>> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf
>> Of Marcin Szycik
>> Sent: Thursday, April 9, 2026 2:00 PM
>> To: intel-wired-lan@lists.osuosl.org
>> Cc: netdev@vger.kernel.org; Penigalapati, Sandeep
>> <sandeep.penigalapati@intel.com>; S, Ananth <ananth.s@intel.com>;
>> alexander.duyck@gmail.com; Marcin Szycik
>> <marcin.szycik@linux.intel.com>; Cao, Chinh T <chinh.t.cao@intel.com>;
>> Nguyen, Anthony L <anthony.l.nguyen@intel.com>
>> Subject: [Intel-wired-lan] [PATCH iwl-next v2 08/10] ice: program ACL
>> entry
>>
>> From: Real Valiquette <real.valiquette@intel.com>
>>
>> Complete the filter programming process; set the flow entry and action
>> into the scenario and write it to hardware. Configure the VSI for ACL
>> filters.
>>
>> Co-developed-by: Chinh Cao <chinh.t.cao@intel.com>
>> Signed-off-by: Chinh Cao <chinh.t.cao@intel.com>
>> Signed-off-by: Real Valiquette <real.valiquette@intel.com>
>> Co-developed-by: Tony Nguyen <anthony.l.nguyen@intel.com>
>> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
>> Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com>
>> ---
>> v2:
>> * Use plain alloc instead of devm_ for ice_flow_entry::acts
>> * Use FIELD_PREP_CONST() for ICE_ACL_RX_*_MISS_CNTR
>> * Fix wrong struct ice_acl_act_entry alloc count in
>> ice_flow_acl_add_scen_entry_sync() - was e->entry_sz, which is an
>> unrelated value
>> * Only set acts_cnt after successful allocation in
>> ice_flow_acl_add_scen_entry_sync()
>> * Return -EINVAL instead of -ENOSPC on wrong index in
>> ice_acl_scen_free_entry_idx()
>> ---
>> drivers/net/ethernet/intel/ice/ice.h | 2 +
>> drivers/net/ethernet/intel/ice/ice_acl.h | 21 +
>> .../net/ethernet/intel/ice/ice_adminq_cmd.h | 2 +
>> drivers/net/ethernet/intel/ice/ice_flow.h | 3 +
>> drivers/net/ethernet/intel/ice/ice_acl.c | 53 ++-
>> drivers/net/ethernet/intel/ice/ice_acl_ctrl.c | 251 +++++++++++
>> drivers/net/ethernet/intel/ice/ice_acl_main.c | 4 +
>> .../ethernet/intel/ice/ice_ethtool_ntuple.c | 48 ++-
>> drivers/net/ethernet/intel/ice/ice_flow.c | 395
>> ++++++++++++++++++
>> drivers/net/ethernet/intel/ice/ice_lib.c | 10 +-
>> 10 files changed, 782 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/intel/ice/ice.h
>> b/drivers/net/ethernet/intel/ice/ice.h
>> index 9e6643931022..f9a43daf04fe 100644
>> --- a/drivers/net/ethernet/intel/ice/ice.h
>> +++ b/drivers/net/ethernet/intel/ice/ice.h
>> @@ -1061,6 +1061,8 @@ void ice_aq_prep_for_event(struct ice_pf *pf,
>> struct ice_aq_task *task,
>> u16 opcode);
>> int ice_aq_wait_for_event(struct ice_pf *pf, struct ice_aq_task
>> *task,
>> unsigned long timeout);
>> +int ice_ntuple_update_list_entry(struct ice_pf *pf,
>> + struct ice_ntuple_fltr *input, int
>> fltr_idx);
>> int ice_open(struct net_device *netdev); int
>> ice_open_internal(struct net_device *netdev); int ice_stop(struct
>> net_device *netdev); diff --git
>> a/drivers/net/ethernet/intel/ice/ice_acl.h
>> b/drivers/net/ethernet/intel/ice/ice_acl.h
>> index 3a4adcf368cf..0b5651401eb7 100644
>> --- a/drivers/net/ethernet/intel/ice/ice_acl.h
>> +++ b/drivers/net/ethernet/intel/ice/ice_acl.h
>> @@ -39,6 +39,7 @@ struct ice_acl_tbl {
>> DECLARE_BITMAP(avail, ICE_AQC_ACL_ALLOC_UNITS); };
>>
>> +#define ICE_MAX_ACL_TCAM_ENTRY (ICE_AQC_ACL_TCAM_DEPTH *
>> +ICE_AQC_ACL_SLICES)
>> enum ice_acl_entry_prio {
>> ICE_ACL_PRIO_LOW = 0,
>> ICE_ACL_PRIO_NORMAL,
>> @@ -65,6 +66,11 @@ struct ice_acl_scen {
>> * participate in this scenario
>> */
>> DECLARE_BITMAP(act_mem_bitmap, ICE_AQC_MAX_ACTION_MEMORIES);
>
> ...
>
>> + /* Determine number of cascaded TCAMs */
>> + num_cscd = DIV_ROUND_UP(scen->width,
>> ICE_AQC_ACL_KEY_WIDTH_BYTES);
>> +
>> + entry_tcam = ICE_ACL_TBL_TCAM_IDX(scen->start);
>> + idx = ICE_ACL_TBL_TCAM_ENTRY_IDX(scen->start + *entry_idx);
>> +
>> + for (u8 i = 0; i < num_cscd; i++) {
>> + /* If the key spans more than one TCAM in the case of
>> cascaded
>> + * TCAMs, the key and key inverts need to be properly
>> split
>> + * among TCAMs.E.g.bytes 0 - 4 go to an index in the
>> first TCAM
> "E.g.bytes" -> "E.g. bytes"
>
>> + * and bytes 5 - 9 go to the same index in the next
>> TCAM, etc.
>> + * If the entry spans more than one TCAM in a cascaded
>> TCAM
>> + * mode, the programming of the entries in the TCAMs
>> must be in
>> + * reversed order - the TCAM entry of the rightmost TCAM
>> should
>> + * be programmed first; the TCAM entry of the leftmost
>> TCAM
>> + * should be programmed last.
>> + */
>> + offset = num_cscd - i - 1;
>> + memcpy(&buf.entry_key.val,
>> + &keys[offset * sizeof(buf.entry_key.val)],
>> + sizeof(buf.entry_key.val));
>> + memcpy(&buf.entry_key_invert.val,
>> + &inverts[offset *
>> sizeof(buf.entry_key_invert.val)],
>> + sizeof(buf.entry_key_invert.val));
>> + err = ice_aq_program_acl_entry(hw, entry_tcam + offset,
>> idx,
>> + &buf, NULL);
>> + if (err) {
>> + ice_debug(hw, ICE_DBG_ACL, "aq program acl entry
>> failed status: %d\n",
>> + err);
>> + goto out;
>> + }
>> + }
>> +
>> + err = ice_acl_prog_act(hw, scen, acts, acts_cnt, *entry_idx);
>> +
>> +out:
>> + if (err) {
>> + ice_acl_rem_entry(hw, scen, *entry_idx);
>> + *entry_idx = 0;
>> + }
>> +
>> + return err;
>> +}
>> +
>> +/**
>> + * ice_acl_prog_act - Program a scenario's action memory
>> + * @hw: pointer to the HW struct
>> + * @scen: scenario to add the entry to
>> + * @acts: pointer to a buffer containing formatted actions
>> + * @acts_cnt: indicates the number of actions stored in "acts"
>> + * @entry_idx: scenario relative index of the added flow entry
>> + *
>> + * Return: 0 on success, negative on error */ int
>> +ice_acl_prog_act(struct ice_hw *hw, struct ice_acl_scen *scen,
>> + struct ice_acl_act_entry *acts, u8 acts_cnt, u16
>> entry_idx) {
>> + u8 entry_tcam, num_cscd, i, actx_idx = 0;
>> + struct ice_aqc_actpair act_buf = {};
>> + int err = 0;
>> + u16 idx;
>> +
>> + if (entry_idx >= scen->num_entry)
>> + return -ENOSPC;
>> +
>> + /* Determine number of cascaded TCAMs */
>> + num_cscd = DIV_ROUND_UP(scen->width,
>> ICE_AQC_ACL_KEY_WIDTH_BYTES);
>> +
>> + entry_tcam = ICE_ACL_TBL_TCAM_IDX(scen->start);
>> + idx = ICE_ACL_TBL_TCAM_ENTRY_IDX(scen->start + entry_idx);
>> +
>> + for_each_set_bit(i, scen->act_mem_bitmap,
>> ICE_AQC_MAX_ACTION_MEMORIES) {
>> + struct ice_acl_act_mem *mem = &hw->acl_tbl->act_mems[i];
>> +
>> + if (actx_idx >= acts_cnt)
>> + break;
>> + if (mem->member_of_tcam >= entry_tcam &&
>> + mem->member_of_tcam < entry_tcam + num_cscd) {
>> + memcpy(&act_buf.act[0], &acts[actx_idx],
>> + sizeof(struct ice_acl_act_entry));
>> +
>> + if (++actx_idx < acts_cnt) {
>> + memcpy(&act_buf.act[1], &acts[actx_idx],
>> + sizeof(struct ice_acl_act_entry));
>> + }
>> +
>> + err = ice_aq_program_actpair(hw, i, idx,
>> &act_buf,
>> + NULL);
>> + if (err) {
>> + ice_debug(hw, ICE_DBG_ACL, "program actpair
>> failed status: %d\n",
>> + err);
>> + break;
>> + }
>> + actx_idx++;
>> + }
>> + }
>> +
>> + if (!err && actx_idx < acts_cnt)
>> + err = -ENOSPC;
>> +
>> + return err;
>> +}
>> +
>> +/**
>> + * ice_acl_rem_entry - Remove a flow entry from an ACL scenario
>> + * @hw: pointer to the HW struct
>> + * @scen: scenario to remove the entry from
>> + * @entry_idx: the scenario-relative index of the flow entry being
>> +removed
>> + *
>> + * Return: 0 on success, negative on error */ int
>> +ice_acl_rem_entry(struct ice_hw *hw, struct ice_acl_scen *scen,
>> + u16 entry_idx)
>> +{
>> + struct ice_aqc_actpair act_buf = {};
>> + struct ice_aqc_acl_data buf;
>> + u8 entry_tcam, num_cscd, i;
>> + int err = 0;
>> + u16 idx;
>> +
>> + if (!scen)
>> + return -ENOENT;
>> +
>> + if (entry_idx >= scen->num_entry)
>> + return -ENOSPC;
>> +
>> + if (!test_bit(entry_idx, scen->entry_bitmap))
>> + return -ENOENT;
>> +
>> + /* Determine number of cascaded TCAMs */
>> + num_cscd = DIV_ROUND_UP(scen->width,
>> ICE_AQC_ACL_KEY_WIDTH_BYTES);
>> +
>> + entry_tcam = ICE_ACL_TBL_TCAM_IDX(scen->start);
>> + idx = ICE_ACL_TBL_TCAM_ENTRY_IDX(scen->start + entry_idx);
>> +
>> + /* invalidate the flow entry */
>> + memset(&buf, 0, sizeof(buf));
>> + for (i = 0; i < num_cscd; i++) {
>> + err = ice_aq_program_acl_entry(hw, entry_tcam + i, idx,
>> &buf,
>> + NULL);
>> + if (err)
>> + ice_debug(hw, ICE_DBG_ACL, "AQ program ACL entry
>> failed status: %d\n",
>> + err);
>> + }
>> +
>> + for_each_set_bit(i, scen->act_mem_bitmap,
>> ICE_AQC_MAX_ACTION_MEMORIES) {
>> + struct ice_acl_act_mem *mem = &hw->acl_tbl->act_mems[i];
>> +
>> + if (mem->member_of_tcam >= entry_tcam &&
>> + mem->member_of_tcam < entry_tcam + num_cscd) {
>> + /* Invalidate allocated action pairs */
>> + err = ice_aq_program_actpair(hw, i, idx,
>> &act_buf,
>> + NULL);
>> + if (err)
>> + ice_debug(hw, ICE_DBG_ACL, "program actpair
>> failed status: %d\n",
>> + err);
>> + }
>> + }
>> +
>> + ice_acl_scen_free_entry_idx(scen, entry_idx);
>> +
>> + return err;
>> +}
>> diff --git a/drivers/net/ethernet/intel/ice/ice_acl_main.c
>> b/drivers/net/ethernet/intel/ice/ice_acl_main.c
>> index 53cca0526756..16228be574ed 100644
>> --- a/drivers/net/ethernet/intel/ice/ice_acl_main.c
>> +++ b/drivers/net/ethernet/intel/ice/ice_acl_main.c
>> @@ -280,6 +280,10 @@ int ice_acl_add_rule_ethtool(struct ice_vsi *vsi,
>> struct ethtool_rxnfc *cmd)
>> hw_prof->entry_h[hw_prof->cnt++][0] = entry_h;
>> }
>>
>> + input->acl_fltr = true;
>> + /* input struct is added to the HW filter list */
>> + ice_ntuple_update_list_entry(pf, input, fsp->location);
>> +
>> return 0;
>>
>> free_input:
>> diff --git a/drivers/net/ethernet/intel/ice/ice_ethtool_ntuple.c
>> b/drivers/net/ethernet/intel/ice/ice_ethtool_ntuple.c
>> index 3e79c0bf40f4..21d4f4e3a1d0 100644
>> --- a/drivers/net/ethernet/intel/ice/ice_ethtool_ntuple.c
>> +++ b/drivers/net/ethernet/intel/ice/ice_ethtool_ntuple.c
>> @@ -1791,6 +1791,21 @@ void ice_vsi_manage_fdir(struct ice_vsi *vsi,
>> bool ena)
>> mutex_unlock(&hw->fdir_fltr_lock);
>> }
>>
>> +/**
>> + * ice_del_acl_ethtool - delete an ACL rule entry
>> + * @hw: pointer to HW instance
>> + * @fltr: filter structure
>> + *
>> + * Return: 0 on success, negative on error */ static int
>> +ice_del_acl_ethtool(struct ice_hw *hw, struct ice_ntuple_fltr *fltr)
>> {
>> + u64 entry;
>> +
>> + entry = ice_flow_find_entry(hw, ICE_BLK_ACL, fltr->fltr_id);
>> + return ice_flow_rem_entry(hw, ICE_BLK_ACL, entry); }
>> +
>> /**
>> * ice_fdir_do_rem_flow - delete flow and possibly add perfect flow
>> * @pf: PF structure
>> @@ -1824,7 +1839,7 @@ ice_fdir_do_rem_flow(struct ice_pf *pf, enum
>> ice_fltr_ptype flow_type)
>> *
>> * Return: 0 on success and negative on errors
>> */
>> -static int
>> +int
>> ice_ntuple_update_list_entry(struct ice_pf *pf, struct
>> ice_ntuple_fltr *input,
>> int fltr_idx)
>> {
>> @@ -1843,13 +1858,36 @@ ice_ntuple_update_list_entry(struct ice_pf
>> *pf, struct ice_ntuple_fltr *input,
>>
>> old_fltr = ice_fdir_find_fltr_by_idx(hw, fltr_idx);
>> if (old_fltr) {
>> - err = ice_fdir_write_all_fltr(pf, old_fltr, false);
>> - if (err)
>> - return err;
>> + if (old_fltr->acl_fltr) {
>> + /* ACL filter - if the input buffer is present
>> + * then this is an update and we don't want to
>> + * delete the filter from the HW. We've already
>> + * written the change to the HW at this point, so
>> + * just update the SW structures to make sure
>> + * everything is hunky-dory. If no input then
>> this
>> + * is a delete so we should delete the filter
>> from
>> + * the HW and clean up our SW structures.
>> + */
>> + if (!input) {
>> + err = ice_del_acl_ethtool(hw, old_fltr);
>> + if (err)
>> + return err;
>> + }
>> + } else {
>> + /* FD filter */
>> + err = ice_fdir_write_all_fltr(pf, old_fltr,
>> false);
>> + if (err)
>> + return err;
>> + }
>> +
>> ice_fdir_update_cntrs(hw, old_fltr->flow_type, false,
>> false);
>> /* update sb-filters count, specific to ring->channel */
>> ice_update_per_q_fltr(vsi, old_fltr->orig_q_index,
>> false);
>> - if (!input && !hw->fdir_fltr_cnt[old_fltr->flow_type])
>> + /* Also delete the HW filter info if we have just
>> deleted the
>> + * last filter of flow_type.
>> + */
>> + if (!old_fltr->acl_fltr && !input &&
>> + !hw->fdir_fltr_cnt[old_fltr->flow_type])
>> /* we just deleted the last filter of flow_type
>> so we
>> * should also delete the HW filter info.
>> */
>> diff --git a/drivers/net/ethernet/intel/ice/ice_flow.c
>> b/drivers/net/ethernet/intel/ice/ice_flow.c
>> index dce6d2ffcb15..144d8326d4f9 100644
>> --- a/drivers/net/ethernet/intel/ice/ice_flow.c
>> +++ b/drivers/net/ethernet/intel/ice/ice_flow.c
>> @@ -1744,6 +1744,16 @@ static int ice_flow_rem_entry_sync(struct
>> ice_hw *hw, enum ice_block blk,
>> return -EINVAL;
>>
>> if (blk == ICE_BLK_ACL) {
>> + int err;
>> +
>> + if (!entry->prof)
>> + return -EINVAL;
>> +
>> + err = ice_acl_rem_entry(hw, entry->prof->cfg.scen,
>> + entry->scen_entry_idx);
>> + if (err)
>> + return err;
>> +
>> if (entry->acts_cnt && entry->acts)
>> ice_flow_acl_free_act_cntr(hw, entry->acts,
>> entry->acts_cnt);
>> @@ -1879,10 +1889,34 @@ ice_flow_rem_prof_sync(struct ice_hw *hw, enum
>> ice_block blk,
>> }
>>
>> if (blk == ICE_BLK_ACL) {
>> + struct ice_aqc_acl_prof_generic_frmt buf;
>> + u8 prof_id = 0;
>> +
>> /* Disassociate the scenario from the profile for the PF
>> */
>> status = ice_flow_acl_disassoc_scen(hw, prof);
>> if (status)
>> return status;
>> +
>> + status = ice_flow_get_hw_prof(hw, blk, prof->id,
>> &prof_id);
>> + if (status)
>> + return status;
>> +
>> + status = ice_query_acl_prof(hw, prof_id, &buf, NULL);
>> + if (status)
>> + return status;
>> +
>> + /* Clear the range-checker if the profile ID is no
>> longer
>> + * used by any PF
>> + */
>> + if (!ice_flow_acl_is_prof_in_use(&buf)) {
>> + /* Clear the range-checker value for profile ID
>> */
>> + struct ice_aqc_acl_profile_ranges query_rng_buf =
>> {};
>> +
>> + status = ice_prog_acl_prof_ranges(hw, prof_id,
>> + &query_rng_buf,
>> NULL);
>> + if (status)
>> + return status;
>> + }
>> }
>>
>> /* Remove all hardware profiles associated with this flow
>> profile */ @@ -2214,6 +2248,44 @@ int ice_flow_rem_prof(struct ice_hw
>> *hw, enum ice_block blk, u64 prof_id)
>> return status;
>> }
>>
>> +/**
>> + * ice_flow_find_entry - look for a flow entry using its unique ID
>> + * @hw: pointer to the HW struct
>> + * @blk: classification stage
>> + * @entry_id: unique ID to identify this flow entry
>> + *
>> + * Look for the flow entry with the specified unique ID in all flow
>> +profiles of
>> + * the specified classification stage.
>> + *
>> + * Return: flow entry handle if entry found, ICE_FLOW_ENTRY_ID_INVAL
>> +otherwise */
>> +u64 ice_flow_find_entry(struct ice_hw *hw, enum ice_block blk, u64
>> +entry_id) {
>> + struct ice_flow_entry *found = NULL;
>> + struct ice_flow_prof *p;
>> +
>> + mutex_lock(&hw->fl_profs_locks[blk]);
>> +
>> + list_for_each_entry(p, &hw->fl_profs[blk], l_entry) {
>> + struct ice_flow_entry *e;
>> +
>> + mutex_lock(&p->entries_lock);
>> + list_for_each_entry(e, &p->entries, l_entry)
>> + if (e->id == entry_id) {
>> + found = e;
>> + break;
>> + }
>> + mutex_unlock(&p->entries_lock);
>> +
>> + if (found)
>> + break;
>> + }
>> +
>> + mutex_unlock(&hw->fl_profs_locks[blk]);
>> +
>> + return found ? ICE_FLOW_ENTRY_HNDL(found) :
>> +ICE_FLOW_ENTRY_HANDLE_INVAL; }
>> +
>> /**
>> * ice_flow_acl_check_actions - Checks the ACL rule's actions
>> * @hw: pointer to the hardware structure @@ -2541,6 +2613,325 @@
>> static int ice_flow_acl_frmt_entry(struct ice_hw *hw,
>>
>> return err;
>> }
>> +
>> +/**
>> + * ice_flow_acl_find_scen_entry_cond - Find an ACL scenario entry
>> that matches
>> + * the compared data
>> + * @prof: pointer to flow profile
>> + * @e: pointer to the comparing flow entry
>> + * @do_chg_action: decide if we want to change the ACL action
>> + * @do_add_entry: decide if we want to add the new ACL entry
>> + * @do_rem_entry: decide if we want to remove the current ACL entry
>> + *
>> + * Find an ACL scenario entry that matches the compared data. Also
>> figure out:
>> + * a) If we want to change the ACL action
>> + * b) If we want to add the new ACL entry
>> + * c) If we want to remove the current ACL entry
>> + *
>> + * Return: ACL scenario entry, or NULL if not found */ static struct
>> +ice_flow_entry * ice_flow_acl_find_scen_entry_cond(struct
>> ice_flow_prof
>> +*prof,
>> + struct ice_flow_entry *e, bool
>> *do_chg_action,
>> + bool *do_add_entry, bool *do_rem_entry) {
>> + struct ice_flow_entry *p, *return_entry = NULL;
>> +
>> + /* Check if:
>> + * a) There exists an entry with same matching data, but
>> different
>> + * priority, then we remove this existing ACL entry. Then,
>> we
>> + * will add the new entry to the ACL scenario.
>> + * b) There exists an entry with same matching data, priority,
>> and
>> + * result action, then we do nothing
>> + * c) There exists an entry with same matching data, priority,
>> but
>> + * different, action, then do only change the action's
>> entry.
> Too much of commas, please reduce the number.
>
>
>> + * d) Else, we add this new entry to the ACL scenario.
>> + */
>> + *do_chg_action = false;
>> + *do_add_entry = true;
>> + *do_rem_entry = false;
>> + list_for_each_entry(p, &prof->entries, l_entry) {
>> + if (memcmp(p->entry, e->entry, p->entry_sz))
>> + continue;
>> +
>> + /* From this point, we have the same matching_data. */
>> + *do_add_entry = false;
>> + return_entry = p;
>> +
>> + if (p->priority != e->priority) {
>> + /* matching data && !priority */
>> + *do_add_entry = true;
>> + *do_rem_entry = true;
>> + break;
>> + }
>> +
>> + /* From this point, we will have matching_data &&
>> priority */
>> + if (p->acts_cnt != e->acts_cnt)
>> + *do_chg_action = true;
>> + for (int i = 0; i < p->acts_cnt; i++) {
>> + bool found_not_match = false;
>> +
>> + for (int j = 0; j < e->acts_cnt; j++)
>> + if (memcmp(&p->acts[i], &e->acts[j],
>> + sizeof(struct ice_flow_action)))
> Due to comment above it should be if (!memcmp(&p->acts[i], &e->acts[j],
> Please fix the comment or code.
I believe the logic is fine, but naming/comments might be a bit confusing.
I'll try to make it more readable.
Thanks,
Marcin
> Otherwise, it looks good.
> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
>
>> {
>> + found_not_match = true;
>> + break;
>
> ...
>
>> }
>>
>> /**
>> --
>> 2.49.0
>
^ permalink raw reply
* Re: [PATCH net-next v6 1/2] net: hsr: require valid EOT supervision TLV
From: Felix Maurer @ 2026-04-13 10:57 UTC (permalink / raw)
To: luka.gejak; +Cc: davem, edumazet, kuba, pabeni, netdev, horms
In-Reply-To: <20260413103449.169913-2-luka.gejak@linux.dev>
On Mon, Apr 13, 2026 at 12:34:48PM +0200, luka.gejak@linux.dev wrote:
> From: Luka Gejak <luka.gejak@linux.dev>
>
> Supervision frames are only valid if terminated with a zero-length EOT
> TLV. The current check fails to reject non-EOT entries as the terminal
> TLV, potentially allowing malformed supervision traffic.
>
> Fix this by strictly requiring the terminal TLV to be HSR_TLV_EOT
> with a length of zero, and properly linearizing the TLV header before
> access.
>
> Assisted-by: Gemini:Gemini-3.1-flash
> Signed-off-by: Luka Gejak <luka.gejak@linux.dev>
Please respect the netdev development process [1], net-next is currently
closed. I'll leave it to the maintainers if a resubmission is required.
Other than that:
Reviewed-by: Felix Maurer <fmaurer@redhat.com>
Thanks,
Felix
[1]: https://docs.kernel.org/process/maintainer-netdev.html#git-trees-and-patch-flow
> ---
> net/hsr/hsr_forward.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/net/hsr/hsr_forward.c b/net/hsr/hsr_forward.c
> index 0aca859c88cb..0774981a65c1 100644
> --- a/net/hsr/hsr_forward.c
> +++ b/net/hsr/hsr_forward.c
> @@ -84,7 +84,7 @@ static bool is_supervision_frame(struct hsr_priv *hsr, struct sk_buff *skb)
>
> /* Get next tlv */
> total_length += hsr_sup_tag->tlv.HSR_TLV_length;
> - if (!pskb_may_pull(skb, total_length))
> + if (!pskb_may_pull(skb, total_length + sizeof(struct hsr_sup_tlv)))
> return false;
> skb_pull(skb, total_length);
> hsr_sup_tlv = (struct hsr_sup_tlv *)skb->data;
> @@ -100,7 +100,7 @@ static bool is_supervision_frame(struct hsr_priv *hsr, struct sk_buff *skb)
>
> /* make sure another tlv follows */
> total_length += sizeof(struct hsr_sup_tlv) + hsr_sup_tlv->HSR_TLV_length;
> - if (!pskb_may_pull(skb, total_length))
> + if (!pskb_may_pull(skb, total_length + sizeof(struct hsr_sup_tlv)))
> return false;
>
> /* get next tlv */
> @@ -110,7 +110,7 @@ static bool is_supervision_frame(struct hsr_priv *hsr, struct sk_buff *skb)
> }
>
> /* end of tlvs must follow at the end */
> - if (hsr_sup_tlv->HSR_TLV_type == HSR_TLV_EOT &&
> + if (hsr_sup_tlv->HSR_TLV_type != HSR_TLV_EOT ||
> hsr_sup_tlv->HSR_TLV_length != 0)
> return false;
>
> --
> 2.53.0
>
^ permalink raw reply
* Re: [PATCH v11 net-next 4/7] devlink: Implement devlink param multi attribute nested data values
From: Ratheesh Kannoth @ 2026-04-13 11:00 UTC (permalink / raw)
To: Paolo Abeni
Cc: netdev, linux-kernel, linux-rdma, sgoutham, andrew+netdev, davem,
edumazet, kuba, donald.hunter, horms, jiri, chuck.lever, matttbe,
cjubran, saeedm, leon, tariqt, mbloch, dtatulea
In-Reply-To: <b52ce943-18f7-4402-8b6a-3d9f69bf7d19@redhat.com>
On 2026-04-13 at 16:24:41, Paolo Abeni (pabeni@redhat.com) wrote:
> On 4/9/26 4:50 AM, Ratheesh Kannoth wrote:
> > @@ -441,6 +448,7 @@ union devlink_param_value {
> > u64 vu64;
> > char vstr[__DEVLINK_PARAM_MAX_STRING_VALUE];
> > bool vbool;
> > + struct devlink_param_u64_array u64arr;
>
> You mentioned that you intend to handle the possible CONFIG_FRAME_WARN
> with a separate patch. IMHO such patch need to be part of this series,
> or things will stay broken for an undefined amount of time until such
> patch is merged separatelly.
Patch no: 3 in the same series.
https://lore.kernel.org/netdev/20260409025055.1664053-4-rkannoth@marvell.com/#t
>
> /P
>
^ permalink raw reply
* [PATCH v2] net/sched: sch_cake: fix NAT destination port not being updated in cake_update_flowkeys
From: Dudu Lu @ 2026-04-13 11:00 UTC (permalink / raw)
To: netdev; +Cc: toke, jhs, jiri, Dudu Lu
cake_update_flowkeys() is supposed to update the flow dissector keys
with the NAT-translated addresses and ports from conntrack, so that
CAKE's per-flow fairness correctly identifies post-NAT flows as
belonging to the same connection.
For the source port, this works correctly:
keys->ports.src = port;
But for the destination port, the assignment is reversed:
port = keys->ports.dst;
This means the NAT destination port is never updated in the flow keys.
As a result, when multiple connections are NATed to the same destination,
CAKE treats them as separate flows because the original (pre-NAT)
destination ports differ. This breaks CAKE's NAT-aware flow isolation
when using the "nat" mode.
The bug was introduced in commit b0c19ed6088a ("sch_cake: Take advantage
of skb->hash where appropriate") which refactored the original direct
assignment into a compare-and-conditionally-update pattern, but wrote
the destination port update backwards.
Fix by reversing the assignment direction to match the source port
pattern.
Fixes: b0c19ed6088a ("sch_cake: Take advantage of skb->hash where appropriate")
Signed-off-by: Dudu Lu <phx0fer@gmail.com>
---
net/sched/sch_cake.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/sched/sch_cake.c b/net/sched/sch_cake.c
index 9efe23f8371b..4ac6c36ca6e4 100644
--- a/net/sched/sch_cake.c
+++ b/net/sched/sch_cake.c
@@ -619,7 +619,7 @@ static bool cake_update_flowkeys(struct flow_keys *keys,
}
port = rev ? tuple.src.u.all : tuple.dst.u.all;
if (port != keys->ports.dst) {
- port = keys->ports.dst;
+ keys->ports.dst = port;
upd = true;
}
}
--
2.39.3 (Apple Git-145)
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox