* Re: [PATCH v8 05/14] remoteproc: qcom_q6v5_mss: Switch to generic PAS TZ APIs
From: Konrad Dybcio @ 2026-06-30 12:35 UTC
To: Sumit Garg, andersson
Cc: linux-arm-msm, dri-devel, freedreno, linux-media, netdev,
linux-wireless, ath12k, linux-remoteproc, konradybcio, robh,
krzk+dt, conor+dt, robin.clark, sean, akhilpo, lumag,
abhinav.kumar, jesszhan0024, marijn.suijten, airlied, simona,
vikash.garodia, bod, mchehab, elder, andrew+netdev, davem,
edumazet, kuba, pabeni, jjohnson, mathieu.poirier,
trilokkumar.soni, mukesh.ojha, pavan.kondeti, jorge.ramirez,
tonyh, vignesh.viswanathan, srinivas.kandagatla, amirreza.zarrabi,
jens.wiklander, op-tee, apurupa, skare, linux-kernel, Sumit Garg
In-Reply-To: <20260626133440.692849-6-sumit.garg@kernel.org>
On 6/26/26 3:34 PM, Sumit Garg wrote:
> From: Sumit Garg <sumit.garg@oss.qualcomm.com>
>
> Switch qcom_q6v5_mss client driver over to generic PAS TZ APIs. Generic PAS
> TZ service allows to support multiple TZ implementation backends like QTEE
> based SCM PAS service, OP-TEE based PAS service and any further future TZ
> backend service.
>
> Reviewed-by: Mukesh Ojha <mukesh.ojha@oss.qualcomm.com>
> Tested-by: Mukesh Ojha <mukesh.ojha@oss.qualcomm.com> # Lemans
> Signed-off-by: Sumit Garg <sumit.garg@oss.qualcomm.com>
> ---
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Konrad
* Re: [PATCH v8 04/14] remoteproc: qcom_q6v5_pas: Switch over to generic PAS TZ APIs
From: Konrad Dybcio @ 2026-06-30 12:34 UTC
To: Sumit Garg, andersson
Cc: linux-arm-msm, dri-devel, freedreno, linux-media, netdev,
linux-wireless, ath12k, linux-remoteproc, konradybcio, robh,
krzk+dt, conor+dt, robin.clark, sean, akhilpo, lumag,
abhinav.kumar, jesszhan0024, marijn.suijten, airlied, simona,
vikash.garodia, bod, mchehab, elder, andrew+netdev, davem,
edumazet, kuba, pabeni, jjohnson, mathieu.poirier,
trilokkumar.soni, mukesh.ojha, pavan.kondeti, jorge.ramirez,
tonyh, vignesh.viswanathan, srinivas.kandagatla, amirreza.zarrabi,
jens.wiklander, op-tee, apurupa, skare, linux-kernel, Sumit Garg
In-Reply-To: <20260626133440.692849-5-sumit.garg@kernel.org>
On 6/26/26 3:34 PM, Sumit Garg wrote:
> From: Sumit Garg <sumit.garg@oss.qualcomm.com>
>
> Switch qcom_q6v5_pas client driver over to generic PAS TZ APIs. Generic PAS
> TZ service allows to support multiple TZ implementation backends like QTEE
> based SCM PAS service, OP-TEE based PAS service and any further future TZ
> backend service.
>
> Since qcom_q6v5_pas depends on MDT loader for PAS firmware loading, it
> has to be switched over to generic PAS APIs in this commit to avoid any
> build issues.
>
> Reviewed-by: Mukesh Ojha <mukesh.ojha@oss.qualcomm.com>
> Tested-by: Mukesh Ojha <mukesh.ojha@oss.qualcomm.com> # Lemans
> Tested-by: Vignesh Viswanathan <vignesh.viswanathan@oss.qualcomm.com> # IPQ9650
> Signed-off-by: Sumit Garg <sumit.garg@oss.qualcomm.com>
> ---
I assume that the leftover qcom_scm_assign_mem() will be handled
in a separate effort, presumably through something like FF-A lend
on the backend.
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Konrad
* [PATCH net 3/3] selftests/bpf: Add test for redirect from qdisc qevent block
From: Daniel Borkmann @ 2026-06-30 12:33 UTC
To: kuba; +Cc: pabeni, jhs, bigeasy, andrii, memxor, bpf, netdev
In-Reply-To: <20260630123331.186840-1-daniel@iogearbox.net>
Add a regression test for the NULL current->bpf_net_context deref hit
when a BPF classifier attached to a qdisc qevent block asks for a
redirect. The classifier runs from tcf_qevent_handle() on the qdisc
enqueue path, outside any bpf_net_context.
# LDLIBS=-static PKG_CONFIG='pkg-config --static' ./vmtest.sh -- ./test_progs -t qevent
[...]
+ /etc/rcS.d/S50-startup
./test_progs -t qevent
#496/1 tc_qevent/redirect_verdict:OK
#496/2 tc_qevent/redirect_helper:OK
#496 tc_qevent:OK
Summary: 1/2 PASSED, 0 SKIPPED, 0 FAILED
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
---
tools/testing/selftests/bpf/config | 1 +
.../selftests/bpf/prog_tests/tc_qevent.c | 113 ++++++++++++++++++
.../selftests/bpf/progs/test_tc_qevent.c | 23 ++++
3 files changed, 137 insertions(+)
create mode 100644 tools/testing/selftests/bpf/prog_tests/tc_qevent.c
create mode 100644 tools/testing/selftests/bpf/progs/test_tc_qevent.c
diff --git a/tools/testing/selftests/bpf/config b/tools/testing/selftests/bpf/config
index adb25146e88c..ea7044f30adc 100644
--- a/tools/testing/selftests/bpf/config
+++ b/tools/testing/selftests/bpf/config
@@ -82,6 +82,7 @@ CONFIG_NET_SCH_BPF=y
CONFIG_NET_SCH_FQ=y
CONFIG_NET_SCH_INGRESS=y
CONFIG_NET_SCH_HTB=y
+CONFIG_NET_SCH_RED=y
CONFIG_NET_SCHED=y
CONFIG_NETDEVSIM=y
CONFIG_NETFILTER=y
diff --git a/tools/testing/selftests/bpf/prog_tests/tc_qevent.c b/tools/testing/selftests/bpf/prog_tests/tc_qevent.c
new file mode 100644
index 000000000000..67e1d17567ab
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/tc_qevent.c
@@ -0,0 +1,113 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <test_progs.h>
+#include <network_helpers.h>
+#include <sys/socket.h>
+#include <netinet/in.h>
+#include <arpa/inet.h>
+#include <unistd.h>
+#include <string.h>
+
+#include "test_tc_qevent.skel.h"
+
+#define NS_TX "tc_qevent_tx"
+#define NS_RX "tc_qevent_rx"
+#define IP_TX "10.255.0.1"
+#define IP_RX "10.255.0.2"
+#define PIN_PATH "/sys/fs/bpf/tc_qevent_redirect"
+
+static void blast_udp(void)
+{
+ struct sockaddr_in dst = {};
+ char buf[1400] = {};
+ int fd, i;
+
+ fd = socket(AF_INET, SOCK_DGRAM, 0);
+ if (!ASSERT_GE(fd, 0, "udp socket"))
+ return;
+
+ dst.sin_family = AF_INET;
+ dst.sin_port = htons(12345);
+ inet_pton(AF_INET, IP_RX, &dst.sin_addr);
+
+ /*
+ * Push far more than the RED queue can hold. Once qavg crosses qth_min
+ * every further packet hits the congestion_drop / early_drop qevent.
+ */
+ for (i = 0; i < 50000; i++)
+ sendto(fd, buf, sizeof(buf), MSG_DONTWAIT,
+ (struct sockaddr *)&dst, sizeof(dst));
+
+ close(fd);
+}
+
+static void run_qevent_redirect(struct bpf_program *prog, __u64 *counter)
+{
+ struct nstoken *tok = NULL;
+ int err;
+
+ SYS_NOFAIL("ip netns del %s", NS_TX);
+ SYS_NOFAIL("ip netns del %s", NS_RX);
+ unlink(PIN_PATH);
+
+ err = bpf_program__pin(prog, PIN_PATH);
+ if (!ASSERT_OK(err, "pin prog"))
+ return;
+
+ SYS(unpin, "ip netns add %s", NS_TX);
+ SYS(del_tx, "ip netns add %s", NS_RX);
+ SYS(del_rx, "ip -n %s link add veth0 type veth peer name veth1 netns %s", NS_TX, NS_RX);
+ SYS(del_rx, "ip -n %s addr add %s/24 dev veth0", NS_TX, IP_TX);
+ SYS(del_rx, "ip -n %s link set veth0 up", NS_TX);
+ SYS(del_rx, "ip -n %s addr add %s/24 dev veth1", NS_RX, IP_RX);
+ SYS(del_rx, "ip -n %s link set veth1 up", NS_RX);
+
+ tok = open_netns(NS_TX);
+ if (!ASSERT_OK_PTR(tok, "open_netns"))
+ goto del_rx;
+
+ SYS(close_ns, "tc qdisc add dev veth0 root handle 1: htb default 1");
+ SYS(close_ns, "tc class add dev veth0 parent 1: classid 1:1 htb rate 1mbit ceil 1mbit");
+
+ if (system("tc qdisc add dev veth0 parent 1:1 handle 11: red "
+ "limit 500000 avpkt 1000 probability 1 min 5000 max 6000 "
+ "burst 6 qevent early_drop block 10 2>/dev/null")) {
+ test__skip();
+ goto close_ns;
+ }
+
+ if (system("tc filter add block 10 bpf da object-pinned "
+ PIN_PATH " 2>/dev/null")) {
+ test__skip();
+ goto close_ns;
+ }
+
+ blast_udp();
+ ASSERT_GT(*counter, 0, "qevent classifier ran");
+close_ns:
+ close_netns(tok);
+del_rx:
+ SYS_NOFAIL("ip netns del %s", NS_RX);
+del_tx:
+ SYS_NOFAIL("ip netns del %s", NS_TX);
+unpin:
+ bpf_program__unpin(prog, PIN_PATH);
+}
+
+void test_tc_qevent(void)
+{
+ struct test_tc_qevent *skel;
+
+ skel = test_tc_qevent__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "open_and_load"))
+ return;
+
+ if (test__start_subtest("redirect_verdict"))
+ run_qevent_redirect(skel->progs.qevent_redirect_verdict,
+ &skel->bss->verdict_calls);
+ if (test__start_subtest("redirect_helper"))
+ run_qevent_redirect(skel->progs.qevent_redirect_helper,
+ &skel->bss->helper_calls);
+
+ test_tc_qevent__destroy(skel);
+}
diff --git a/tools/testing/selftests/bpf/progs/test_tc_qevent.c b/tools/testing/selftests/bpf/progs/test_tc_qevent.c
new file mode 100644
index 000000000000..1529c111f4aa
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_tc_qevent.c
@@ -0,0 +1,23 @@
+// SPDX-License-Identifier: GPL-2.0
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+
+int redirect_ifindex = 1;
+__u64 verdict_calls = 0;
+__u64 helper_calls = 0;
+
+SEC("tc")
+int qevent_redirect_verdict(struct __sk_buff *skb)
+{
+ __sync_fetch_and_add(&verdict_calls, 1);
+ return TCX_REDIRECT;
+}
+
+SEC("tc")
+int qevent_redirect_helper(struct __sk_buff *skb)
+{
+ __sync_fetch_and_add(&helper_calls, 1);
+ return bpf_redirect(redirect_ifindex, 0);
+}
+
+char _license[] SEC("license") = "GPL";
--
2.43.0
* [PATCH net 2/3] net/sched: Handle TC_ACT_REDIRECT from qdisc filter chains
From: Daniel Borkmann @ 2026-06-30 12:33 UTC
To: kuba; +Cc: pabeni, jhs, bigeasy, andrii, memxor, bpf, netdev,
Victor Nogueira
In-Reply-To: <20260630123331.186840-1-daniel@iogearbox.net>
From: Jamal Hadi Salim <jhs@mojatatu.com>
When a TC filter attached to a qdisc filter chain returns
TC_ACT_REDIRECT (e.g. via an eBPF program calling bpf_redirect() or an
act_bpf action), the redirect was silently lost: no qdisc classify
function handled TC_ACT_REDIRECT, so the packet fell through the
switch and was enqueued normally instead of being redirected.
This has been broken since bpf_redirect() was introduced for TC in
commit 27b29f63058d ("bpf: add bpf_redirect() helper"). We got lucky
for a long time because bpf_net_context was a per-CPU variable that
was always available.
commit 401cb7dae813 ("net: Reference bpf_redirect_info via task_struct
on PREEMPT_RT.") turned bpf_net_context into a task_struct member that
is only set up by explicit callers. Without a caller setting it up,
bpf_redirect() itself crashes with a NULL pointer dereference in
bpf_net_ctx_get_ri(). However, even with bpf_net_context available,
TC_ACT_REDIRECT from qdisc filter chains cannot be honored without
adding skb_do_redirect() calls to every qdisc classify function, which
would require changes across net/sched/. Redirect handling belongs in
the eBPF core and should stay isolated there.
Instead, add a tcf_classify_qdisc() inline helper in pkt_cls.h, as a
wrapper around tcf_classify() for use by qdisc classify functions and
tcf_qevent_handle(). When the classify verdict is TC_ACT_REDIRECT,
the wrapper converts it to TC_ACT_SHOT, dropping the packet rather
than letting it continue silently. Dropping is preferred over
letting the packet through because the user immediately sees packet
loss. Silently passing the packet through would hide the problem and
leave the user wondering why their redirect is not working.
The clsact fast path, tc_run(), continues to call tcf_classify() directly
and is unaffected: TC_ACT_REDIRECT is returned as-is and handled by
sch_handle_egress/ingress() calling skb_do_redirect() as before.
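The verdict conversion described above can be modelled in userspace as
follows. This is a minimal sketch, not the kernel code: classify_stub()
and classify_qdisc_model() are hypothetical names standing in for
tcf_classify() and tcf_classify_qdisc(), and the verdict values are
copied from include/uapi/linux/pkt_cls.h.

```c
#include <assert.h>

/* Verdict values as defined in include/uapi/linux/pkt_cls.h. */
#define TC_ACT_OK        0
#define TC_ACT_SHOT      2
#define TC_ACT_REDIRECT  7

/* Hypothetical stand-in for tcf_classify(): hands back whatever verdict
 * the attached filter produced.
 */
int classify_stub(int filter_verdict)
{
	return filter_verdict;
}

/* Model of the wrapper: a redirect verdict coming from a qdisc filter
 * chain becomes a drop; every other verdict passes through unchanged.
 */
int classify_qdisc_model(int filter_verdict)
{
	int ret = classify_stub(filter_verdict);

	if (ret == TC_ACT_REDIRECT)
		ret = TC_ACT_SHOT;

	/* The model never lets a redirect escape to the qdisc. */
	assert(ret != TC_ACT_REDIRECT);
	return ret;
}
```

In this model, classify_qdisc_model(TC_ACT_REDIRECT) yields TC_ACT_SHOT,
while a path that keeps calling the unwrapped classifier (as tc_run()
does with tcf_classify()) still sees TC_ACT_REDIRECT.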
Fixes: 27b29f63058d ("bpf: add bpf_redirect() helper")
Fixes: 401cb7dae813 ("net: Reference bpf_redirect_info via task_struct on PREEMPT_RT.")
Tested-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
---
include/net/pkt_cls.h | 14 +++++++++++++-
net/sched/cls_api.c | 4 +---
net/sched/sch_cake.c | 2 +-
net/sched/sch_drr.c | 2 +-
net/sched/sch_dualpi2.c | 2 +-
net/sched/sch_ets.c | 2 +-
net/sched/sch_fq_codel.c | 2 +-
net/sched/sch_fq_pie.c | 2 +-
net/sched/sch_hfsc.c | 2 +-
net/sched/sch_htb.c | 2 +-
net/sched/sch_multiq.c | 2 +-
net/sched/sch_prio.c | 2 +-
net/sched/sch_qfq.c | 2 +-
net/sched/sch_sfb.c | 2 +-
net/sched/sch_sfq.c | 2 +-
15 files changed, 27 insertions(+), 17 deletions(-)
diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 3bd08d7f39c1..5f5cb36439fe 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -156,8 +156,20 @@ static inline int tcf_classify(struct sk_buff *skb,
{
return TC_ACT_UNSPEC;
}
-
#endif
+static inline int tcf_classify_qdisc(struct sk_buff *skb,
+ const struct tcf_proto *tp,
+ struct tcf_result *res, bool compat_mode)
+{
+ int ret = tcf_classify(skb, NULL, tp, res, compat_mode);
+
+ /* TC_ACT_REDIRECT from qdisc filter chains is not supported.
+ * Use BPF via tcx or mirred redirect instead.
+ */
+ if (unlikely(ret == TC_ACT_REDIRECT))
+ ret = TC_ACT_SHOT;
+ return ret;
+}
static inline unsigned long
__cls_set_class(unsigned long *clp, unsigned long cl)
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index ac49ca6d9a0c..3ca56d060e28 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -4033,9 +4033,7 @@ struct sk_buff *tcf_qevent_handle(struct tcf_qevent *qe, struct Qdisc *sch, stru
fl = rcu_dereference_bh(qe->filter_chain);
- switch (tcf_classify(skb, NULL, fl, &cl_res, false)) {
- case TC_ACT_REDIRECT:
- fallthrough;
+ switch (tcf_classify_qdisc(skb, fl, &cl_res, false)) {
case TC_ACT_SHOT:
qdisc_qstats_drop(sch);
__qdisc_drop(skb, to_free);
diff --git a/net/sched/sch_cake.c b/net/sched/sch_cake.c
index a3c185505afc..94eb47ac54ee 100644
--- a/net/sched/sch_cake.c
+++ b/net/sched/sch_cake.c
@@ -1730,7 +1730,7 @@ static u32 cake_classify(struct Qdisc *sch, struct cake_tin_data **t,
goto hash;
*qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
- result = tcf_classify(skb, NULL, filter, &res, false);
+ result = tcf_classify_qdisc(skb, filter, &res, false);
if (result >= 0) {
#ifdef CONFIG_NET_CLS_ACT
diff --git a/net/sched/sch_drr.c b/net/sched/sch_drr.c
index 020657f959b5..91b1ef824afa 100644
--- a/net/sched/sch_drr.c
+++ b/net/sched/sch_drr.c
@@ -312,7 +312,7 @@ static struct drr_class *drr_classify(struct sk_buff *skb, struct Qdisc *sch,
*qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
fl = rcu_dereference_bh(q->filter_list);
- result = tcf_classify(skb, NULL, fl, &res, false);
+ result = tcf_classify_qdisc(skb, fl, &res, false);
if (result >= 0) {
#ifdef CONFIG_NET_CLS_ACT
switch (result) {
diff --git a/net/sched/sch_dualpi2.c b/net/sched/sch_dualpi2.c
index 5434df6ca8ef..98364f74211e 100644
--- a/net/sched/sch_dualpi2.c
+++ b/net/sched/sch_dualpi2.c
@@ -364,7 +364,7 @@ static int dualpi2_skb_classify(struct dualpi2_sched_data *q,
return NET_XMIT_SUCCESS;
}
- result = tcf_classify(skb, NULL, fl, &res, false);
+ result = tcf_classify_qdisc(skb, fl, &res, false);
if (result >= 0) {
#ifdef CONFIG_NET_CLS_ACT
switch (result) {
diff --git a/net/sched/sch_ets.c b/net/sched/sch_ets.c
index cb8cf437ce87..25fcf4079fec 100644
--- a/net/sched/sch_ets.c
+++ b/net/sched/sch_ets.c
@@ -391,7 +391,7 @@ static struct ets_class *ets_classify(struct sk_buff *skb, struct Qdisc *sch,
*qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
if (TC_H_MAJ(skb->priority) != sch->handle) {
fl = rcu_dereference_bh(q->filter_list);
- err = tcf_classify(skb, NULL, fl, &res, false);
+ err = tcf_classify_qdisc(skb, fl, &res, false);
#ifdef CONFIG_NET_CLS_ACT
switch (err) {
case TC_ACT_STOLEN:
diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c
index cafd1f943d99..6cce86ba383c 100644
--- a/net/sched/sch_fq_codel.c
+++ b/net/sched/sch_fq_codel.c
@@ -91,7 +91,7 @@ static unsigned int fq_codel_classify(struct sk_buff *skb, struct Qdisc *sch,
return fq_codel_hash(q, skb) + 1;
*qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
- result = tcf_classify(skb, NULL, filter, &res, false);
+ result = tcf_classify_qdisc(skb, filter, &res, false);
if (result >= 0) {
#ifdef CONFIG_NET_CLS_ACT
switch (result) {
diff --git a/net/sched/sch_fq_pie.c b/net/sched/sch_fq_pie.c
index 72f48fa4010b..069e1facd413 100644
--- a/net/sched/sch_fq_pie.c
+++ b/net/sched/sch_fq_pie.c
@@ -96,7 +96,7 @@ static unsigned int fq_pie_classify(struct sk_buff *skb, struct Qdisc *sch,
return fq_pie_hash(q, skb) + 1;
*qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
- result = tcf_classify(skb, NULL, filter, &res, false);
+ result = tcf_classify_qdisc(skb, filter, &res, false);
if (result >= 0) {
#ifdef CONFIG_NET_CLS_ACT
switch (result) {
diff --git a/net/sched/sch_hfsc.c b/net/sched/sch_hfsc.c
index 7e537295b8b6..e87f5021a199 100644
--- a/net/sched/sch_hfsc.c
+++ b/net/sched/sch_hfsc.c
@@ -1143,7 +1143,7 @@ hfsc_classify(struct sk_buff *skb, struct Qdisc *sch, int *qerr)
*qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
head = &q->root;
tcf = rcu_dereference_bh(q->root.filter_list);
- while (tcf && (result = tcf_classify(skb, NULL, tcf, &res, false)) >= 0) {
+ while (tcf && (result = tcf_classify_qdisc(skb, tcf, &res, false)) >= 0) {
#ifdef CONFIG_NET_CLS_ACT
switch (result) {
case TC_ACT_QUEUED:
diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index 908b9ba9ba2e..fdac0dc8f35a 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -243,7 +243,7 @@ static struct htb_class *htb_classify(struct sk_buff *skb, struct Qdisc *sch,
}
*qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
- while (tcf && (result = tcf_classify(skb, NULL, tcf, &res, false)) >= 0) {
+ while (tcf && (result = tcf_classify_qdisc(skb, tcf, &res, false)) >= 0) {
#ifdef CONFIG_NET_CLS_ACT
switch (result) {
case TC_ACT_QUEUED:
diff --git a/net/sched/sch_multiq.c b/net/sched/sch_multiq.c
index 4e465d11e3d7..004f0d275caf 100644
--- a/net/sched/sch_multiq.c
+++ b/net/sched/sch_multiq.c
@@ -36,7 +36,7 @@ multiq_classify(struct sk_buff *skb, struct Qdisc *sch, int *qerr)
int err;
*qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
- err = tcf_classify(skb, NULL, fl, &res, false);
+ err = tcf_classify_qdisc(skb, fl, &res, false);
#ifdef CONFIG_NET_CLS_ACT
switch (err) {
case TC_ACT_STOLEN:
diff --git a/net/sched/sch_prio.c b/net/sched/sch_prio.c
index e4dd56a89072..79437c587e7e 100644
--- a/net/sched/sch_prio.c
+++ b/net/sched/sch_prio.c
@@ -39,7 +39,7 @@ prio_classify(struct sk_buff *skb, struct Qdisc *sch, int *qerr)
*qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
if (TC_H_MAJ(skb->priority) != sch->handle) {
fl = rcu_dereference_bh(q->filter_list);
- err = tcf_classify(skb, NULL, fl, &res, false);
+ err = tcf_classify_qdisc(skb, fl, &res, false);
#ifdef CONFIG_NET_CLS_ACT
switch (err) {
case TC_ACT_STOLEN:
diff --git a/net/sched/sch_qfq.c b/net/sched/sch_qfq.c
index cb56787e1d25..6f3b7273cb16 100644
--- a/net/sched/sch_qfq.c
+++ b/net/sched/sch_qfq.c
@@ -709,7 +709,7 @@ static struct qfq_class *qfq_classify(struct sk_buff *skb, struct Qdisc *sch,
*qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
fl = rcu_dereference_bh(q->filter_list);
- result = tcf_classify(skb, NULL, fl, &res, false);
+ result = tcf_classify_qdisc(skb, fl, &res, false);
if (result >= 0) {
#ifdef CONFIG_NET_CLS_ACT
switch (result) {
diff --git a/net/sched/sch_sfb.c b/net/sched/sch_sfb.c
index b1d465094276..ed39869199c0 100644
--- a/net/sched/sch_sfb.c
+++ b/net/sched/sch_sfb.c
@@ -260,7 +260,7 @@ static bool sfb_classify(struct sk_buff *skb, struct tcf_proto *fl,
struct tcf_result res;
int result;
- result = tcf_classify(skb, NULL, fl, &res, false);
+ result = tcf_classify_qdisc(skb, fl, &res, false);
if (result >= 0) {
#ifdef CONFIG_NET_CLS_ACT
switch (result) {
diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
index 758b88f21865..77675f9a4c46 100644
--- a/net/sched/sch_sfq.c
+++ b/net/sched/sch_sfq.c
@@ -171,7 +171,7 @@ static unsigned int sfq_classify(struct sk_buff *skb, struct Qdisc *sch,
return sfq_hash(q, skb) + 1;
*qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
- result = tcf_classify(skb, NULL, fl, &res, false);
+ result = tcf_classify_qdisc(skb, fl, &res, false);
if (result >= 0) {
#ifdef CONFIG_NET_CLS_ACT
switch (result) {
--
2.43.0
* [PATCH net 1/3] bpf: Reject redirect helpers without a bpf_net_context
From: Daniel Borkmann @ 2026-06-30 12:33 UTC
To: kuba; +Cc: pabeni, jhs, bigeasy, andrii, memxor, bpf, netdev
In-Reply-To: <20260630123331.186840-1-daniel@iogearbox.net>
The bpf_redirect*() helpers and skb_do_redirect() obtain the per-task
bpf_redirect_info via bpf_net_ctx_get_ri(), which dereferences the
current->bpf_net_context unconditionally. That context is established
on the paths that run tc BPF such as sch_handle_{ingress,egress}(),
*except* for the case where {cls,act}_bpf was attached to a proper
qdisc. A program running from there reaches the NULL deref in two ways:
* It calls bpf_redirect() directly, which dereferences the context at
the top of the helper:
tc qdisc add dev eth0 root handle 1: red limit 1MB min 10KB max 20KB \
avpkt 1000 burst 100 qevent early_drop block 10
tc filter add block 10 pref 1 bpf obj redirect.o
* It simply returns TC_ACT_REDIRECT without a helper call: tcf_qevent_handle()
then dispatches to skb_do_redirect(), which dereferences the context.
Rather than extending bpf_net_context management into the qdisc path,
make the redirect helpers refuse to operate when no context exists, and
have tcf_qevent_handle() drop a TC_ACT_REDIRECT verdict instead of
calling skb_do_redirect(). Previous behaviour was a crash, so nothing
regresses by not supporting it.
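The "refuse to operate when no context exists" approach can be sketched
in userspace roughly as below. This is an illustrative model under
stated assumptions, not the kernel implementation: current_ctx stands in
for current->bpf_net_context, redirect_model() is a hypothetical
stand-in for bpf_redirect(), and flag validation is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Verdict values as defined in include/uapi/linux/pkt_cls.h. */
#define TC_ACT_SHOT      2
#define TC_ACT_REDIRECT  7

struct redirect_info {
	unsigned int tgt_index;
	unsigned long long flags;
};

struct net_context {
	struct redirect_info ri;
};

/* Models current->bpf_net_context: NULL unless a caller set one up. */
struct net_context *current_ctx;

/* Hypothetical model of a redirect helper: check for a context first,
 * and only then touch the redirect info stored inside it.
 */
int redirect_model(unsigned int ifindex, unsigned long long flags)
{
	struct redirect_info *ri;

	if (!current_ctx)
		return TC_ACT_SHOT;	/* no context: drop, do not dereference */

	ri = &current_ctx->ri;
	assert(ri != NULL);
	ri->flags = flags;
	ri->tgt_index = ifindex;
	return TC_ACT_REDIRECT;
}
```

The point of the ordering is that the NULL check happens before the
first access to the redirect info, so a caller without a context gets a
drop verdict rather than a crash.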
Fixes: 401cb7dae813 ("net: Reference bpf_redirect_info via task_struct on PREEMPT_RT.")
Fixes: 3625750f05ec ("net: sched: Introduce helpers for qevent blocks")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
net/core/filter.c | 17 +++++++++++------
net/sched/cls_api.c | 6 ++----
2 files changed, 13 insertions(+), 10 deletions(-)
diff --git a/net/core/filter.c b/net/core/filter.c
index b446aa8be5c3..11bb0d236822 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2552,11 +2552,13 @@ int skb_do_redirect(struct sk_buff *skb)
BPF_CALL_2(bpf_redirect, u32, ifindex, u64, flags)
{
- struct bpf_redirect_info *ri = bpf_net_ctx_get_ri();
+ struct bpf_redirect_info *ri;
- if (unlikely(flags & (~(BPF_F_INGRESS) | BPF_F_REDIRECT_INTERNAL)))
+ if (unlikely(!bpf_net_ctx_get() ||
+ (flags & (~(BPF_F_INGRESS) | BPF_F_REDIRECT_INTERNAL))))
return TC_ACT_SHOT;
+ ri = bpf_net_ctx_get_ri();
ri->flags = flags;
ri->tgt_index = ifindex;
@@ -2573,11 +2575,12 @@ static const struct bpf_func_proto bpf_redirect_proto = {
BPF_CALL_2(bpf_redirect_peer, u32, ifindex, u64, flags)
{
- struct bpf_redirect_info *ri = bpf_net_ctx_get_ri();
+ struct bpf_redirect_info *ri;
- if (unlikely(flags))
+ if (unlikely(!bpf_net_ctx_get() || flags))
return TC_ACT_SHOT;
+ ri = bpf_net_ctx_get_ri();
ri->flags = BPF_F_PEER;
ri->tgt_index = ifindex;
@@ -2595,11 +2598,13 @@ static const struct bpf_func_proto bpf_redirect_peer_proto = {
BPF_CALL_4(bpf_redirect_neigh, u32, ifindex, struct bpf_redir_neigh *, params,
int, plen, u64, flags)
{
- struct bpf_redirect_info *ri = bpf_net_ctx_get_ri();
+ struct bpf_redirect_info *ri;
- if (unlikely((plen && plen < sizeof(*params)) || flags))
+ if (unlikely((plen && plen < sizeof(*params)) ||
+ !bpf_net_ctx_get() || flags))
return TC_ACT_SHOT;
+ ri = bpf_net_ctx_get_ri();
ri->flags = BPF_F_NEIGH | (plen ? BPF_F_NEXTHOP : 0);
ri->tgt_index = ifindex;
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 3e67600a4a1a..ac49ca6d9a0c 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -4034,6 +4034,8 @@ struct sk_buff *tcf_qevent_handle(struct tcf_qevent *qe, struct Qdisc *sch, stru
fl = rcu_dereference_bh(qe->filter_chain);
switch (tcf_classify(skb, NULL, fl, &cl_res, false)) {
+ case TC_ACT_REDIRECT:
+ fallthrough;
case TC_ACT_SHOT:
qdisc_qstats_drop(sch);
__qdisc_drop(skb, to_free);
@@ -4045,10 +4047,6 @@ struct sk_buff *tcf_qevent_handle(struct tcf_qevent *qe, struct Qdisc *sch, stru
__qdisc_drop(skb, to_free);
*ret = __NET_XMIT_STOLEN;
return NULL;
- case TC_ACT_REDIRECT:
- skb_do_redirect(skb);
- *ret = __NET_XMIT_STOLEN;
- return NULL;
case TC_ACT_CONSUMED:
*ret = __NET_XMIT_STOLEN;
return NULL;
--
2.43.0
* [PATCH net 0/3] Fix broken TC_ACT_REDIRECT from qdiscs
From: Daniel Borkmann @ 2026-06-30 12:33 UTC
To: kuba; +Cc: pabeni, jhs, bigeasy, andrii, memxor, bpf, netdev
This is an alternative fix to [0] in order to not uglify
__dev_queue_xmit() with sprinkled ifdefs given this can be
simplified and isolated through a simple test into the BPF
redirect helper itself.
I've also added a proper BPF selftest, so there is no need
to check in a binary BPF object into selftests given we do
have BPF infra for all of this.
[0] https://lore.kernel.org/netdev/20260629102157.737306-1-jhs@mojatatu.com/
[1] https://lore.kernel.org/netdev/20260629102157.737306-4-jhs@mojatatu.com/
Daniel Borkmann (2):
bpf: Reject redirect helpers without a bpf_net_context
selftests/bpf: Add test for redirect from qdisc qevent block
Jamal Hadi Salim (1):
net/sched: Handle TC_ACT_REDIRECT from qdisc filter chains
include/net/pkt_cls.h | 14 ++-
net/core/filter.c | 17 ++-
net/sched/cls_api.c | 6 +-
net/sched/sch_cake.c | 2 +-
net/sched/sch_drr.c | 2 +-
net/sched/sch_dualpi2.c | 2 +-
net/sched/sch_ets.c | 2 +-
net/sched/sch_fq_codel.c | 2 +-
net/sched/sch_fq_pie.c | 2 +-
net/sched/sch_hfsc.c | 2 +-
net/sched/sch_htb.c | 2 +-
net/sched/sch_multiq.c | 2 +-
net/sched/sch_prio.c | 2 +-
net/sched/sch_qfq.c | 2 +-
net/sched/sch_sfb.c | 2 +-
net/sched/sch_sfq.c | 2 +-
tools/testing/selftests/bpf/config | 1 +
.../selftests/bpf/prog_tests/tc_qevent.c | 113 ++++++++++++++++++
.../selftests/bpf/progs/test_tc_qevent.c | 23 ++++
19 files changed, 175 insertions(+), 25 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/tc_qevent.c
create mode 100644 tools/testing/selftests/bpf/progs/test_tc_qevent.c
--
2.43.0
* Re: [PATCH v8 02/14] firmware: qcom_scm: Migrate to generic PAS service
From: Konrad Dybcio @ 2026-06-30 12:26 UTC
To: Sumit Garg, andersson
Cc: linux-arm-msm, dri-devel, freedreno, linux-media, netdev,
linux-wireless, ath12k, linux-remoteproc, konradybcio, robh,
krzk+dt, conor+dt, robin.clark, sean, akhilpo, lumag,
abhinav.kumar, jesszhan0024, marijn.suijten, airlied, simona,
vikash.garodia, bod, mchehab, elder, andrew+netdev, davem,
edumazet, kuba, pabeni, jjohnson, mathieu.poirier,
trilokkumar.soni, mukesh.ojha, pavan.kondeti, jorge.ramirez,
tonyh, vignesh.viswanathan, srinivas.kandagatla, amirreza.zarrabi,
jens.wiklander, op-tee, apurupa, skare, linux-kernel, Sumit Garg,
Harshal Dev
In-Reply-To: <20260626133440.692849-3-sumit.garg@kernel.org>
On 6/26/26 3:34 PM, Sumit Garg wrote:
> From: Sumit Garg <sumit.garg@oss.qualcomm.com>
>
> With the availability of generic PAS service, let's add SCM calls as
> a backend to keep supporting legacy QTEE interfaces. The exported
> qcom_scm* wrappers will get dropped once all the client drivers get
> migrated as part of future patches.
>
> Tested-by: Mukesh Ojha <mukesh.ojha@oss.qualcomm.com> # Lemans
> Reviewed-by: Harshal Dev <harshal.dev@oss.qualcomm.com>
> Tested-by: Vignesh Viswanathan <vignesh.viswanathan@oss.qualcomm.com> # IPQ9650
> Signed-off-by: Sumit Garg <sumit.garg@oss.qualcomm.com>
> ---
[...]
> struct qcom_scm_pas_context *devm_qcom_scm_pas_context_alloc(struct device *dev,
> u32 pas_id,
> phys_addr_t mem_phys,
> size_t mem_size)
> {
> - struct qcom_scm_pas_context *ctx;
> + struct qcom_pas_context *ctx;
>
> ctx = devm_kzalloc(dev, sizeof(*ctx), GFP_KERNEL);
> if (!ctx)
> @@ -600,11 +569,12 @@ struct qcom_scm_pas_context *devm_qcom_scm_pas_context_alloc(struct device *dev,
> ctx->mem_phys = mem_phys;
> ctx->mem_size = mem_size;
>
> - return ctx;
> + return (struct qcom_scm_pas_context *)ctx;
"please don't explode"
otherwise
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Konrad
* [PATCH] net: ipv4: fix TOCTOU race in __ip_do_redirect
From: Lei Huang @ 2026-06-30 12:23 UTC
To: dsahern, idosch
Cc: davem, edumazet, kuba, pabeni, horms, netdev, linux-kernel,
Lei Huang
From: Lei Huang <huanglei@kylinos.cn>
fib_lookup() internally acquires and releases rcu_read_lock and always uses
FIB_LOOKUP_NOREF (no refcount on fib_info). After it returns, res (a local
struct fib_result on the stack) has its nhc field pointing into the
fib_info internal nexthop array, but RCU protection is already dropped.
A concurrent route deletion can free the fib_info via kfree_rcu, making
res.nhc a stale pointer. Subsequent FIB_RES_NHC(res) reads this stale value
and update_or_create_fnhe() dereferences it, causing UAF.
Fix by wrapping the entire fib_lookup + FIB_RES_NHC + update_or_create_fnhe
region in an explicit rcu_read_lock/unlock to keep the fib_info alive
throughout the critical section.
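The lifetime problem can be illustrated with a single-threaded toy
model. This is only a sketch under loud assumptions: it is not RCU, the
"concurrent" deletion is interleaved by hand, and every name here
(toy_read_lock(), toy_free_deferred(), toy_lookup(), struct nexthop) is
hypothetical. It only shows why a pointer returned by a lookup that
drops its own read-side protection needs an outer read-side section to
stay valid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct nexthop {
	int gw;
	bool freed;		/* set when the toy "reclaims" the object */
};

int read_depth;			/* nesting depth of read-side sections */
struct nexthop *pending;	/* object waiting for readers to finish */

void toy_read_lock(void)
{
	read_depth++;
}

void toy_read_unlock(void)
{
	assert(read_depth > 0);
	if (--read_depth == 0 && pending) {
		pending->freed = true;
		pending = NULL;
	}
}

/* Models deferred freeing: reclaim now if no reader is active, else
 * wait until the last reader leaves.
 */
void toy_free_deferred(struct nexthop *nh)
{
	if (read_depth == 0)
		nh->freed = true;
	else
		pending = nh;
}

/* Models a lookup that takes and drops the read lock internally and
 * returns a borrowed pointer without taking a reference.
 */
struct nexthop *toy_lookup(struct nexthop *entry)
{
	struct nexthop *nh;

	toy_read_lock();
	nh = entry;
	toy_read_unlock();
	return nh;
}
```

With an outer toy_read_lock()/toy_read_unlock() pair around the lookup
and the use of its result, a deletion in between is deferred until the
outer section ends. Without the outer pair, the same deletion reclaims
the object while the caller still holds the pointer.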
Signed-off-by: Lei Huang <huanglei@kylinos.cn>
---
net/ipv4/route.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 3f3de5164d6e..86f4b6325050 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -793,6 +793,7 @@ static void __ip_do_redirect(struct rtable *rt, struct sk_buff *skb, struct flow
if (!(READ_ONCE(n->nud_state) & NUD_VALID)) {
neigh_event_send(n, NULL);
} else {
+ rcu_read_lock();
if (fib_lookup(net, fl4, &res, 0) == 0) {
struct fib_nh_common *nhc;
@@ -802,6 +803,7 @@ static void __ip_do_redirect(struct rtable *rt, struct sk_buff *skb, struct flow
0, false,
jiffies + ip_rt_gc_timeout);
}
+ rcu_read_unlock();
if (kill_route)
WRITE_ONCE(rt->dst.obsolete, DST_OBSOLETE_KILL);
call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, n);
--
2.25.1
* Re: [PATCH v8 01/14] firmware: qcom: Add a generic PAS service
From: Konrad Dybcio @ 2026-06-30 12:21 UTC
To: Sumit Garg, andersson
Cc: linux-arm-msm, dri-devel, freedreno, linux-media, netdev,
linux-wireless, ath12k, linux-remoteproc, konradybcio, robh,
krzk+dt, conor+dt, robin.clark, sean, akhilpo, lumag,
abhinav.kumar, jesszhan0024, marijn.suijten, airlied, simona,
vikash.garodia, bod, mchehab, elder, andrew+netdev, davem,
edumazet, kuba, pabeni, jjohnson, mathieu.poirier,
trilokkumar.soni, mukesh.ojha, pavan.kondeti, jorge.ramirez,
tonyh, vignesh.viswanathan, srinivas.kandagatla, amirreza.zarrabi,
jens.wiklander, op-tee, apurupa, skare, linux-kernel, Sumit Garg,
Harshal Dev
In-Reply-To: <20260626133440.692849-2-sumit.garg@kernel.org>
On 6/26/26 3:34 PM, Sumit Garg wrote:
> From: Sumit Garg <sumit.garg@oss.qualcomm.com>
>
> Qcom platforms has the legacy of using non-standard SCM calls
> splintered over the various kernel drivers. These SCM calls aren't
> compliant with the standard SMC calling conventions which is a
> prerequisite to enable migration to the FF-A specifications from Arm.
>
> OP-TEE as an alternative trusted OS to Qualcomm TEE (QTEE) can't
> support these non-standard SCM calls. And even for newer architectures
> using S-EL2 with Hafnium support, QTEE won't be able to support SCM
> calls either with FF-A requirements coming in. And with both OP-TEE
> and QTEE drivers well integrated in the TEE subsystem, it makes further
> sense to reuse the TEE bus client drivers infrastructure.
>
> The added benefit of TEE bus infrastructure is that there is support
> for discoverable/enumerable services. With that client drivers don't
> have to manually invoke a special SCM call to know the service status.
>
> So enable the generic Peripheral Authentication Service (PAS) provided
> by the firmware. It acts as the common layer with different TZ
> backends plugged in whether it's an SCM implementation or a proper
> TEE bus based PAS service implementation.
>
> Reviewed-by: Mukesh Ojha <mukesh.ojha@oss.qualcomm.com>
> Tested-by: Mukesh Ojha <mukesh.ojha@oss.qualcomm.com> # Lemans
> Reviewed-by: Harshal Dev <harshal.dev@oss.qualcomm.com>
> Tested-by: Vignesh Viswanathan <vignesh.viswanathan@oss.qualcomm.com> # IPQ9650
> Signed-off-by: Sumit Garg <sumit.garg@oss.qualcomm.com>
> ---
[...]
> +struct qcom_pas_context {
> + struct device *dev;
> + u32 pas_id;
> + phys_addr_t mem_phys;
> + size_t mem_size;
> + void *ptr;
> + dma_addr_t phys;
> + ssize_t size;
> + bool use_tzmem;
> +};
Redefining this instead of moving the definition (this is a cross-
subsystem merge anyway) makes things more difficult, as there are
patches from another team touching this struct.. hopefully no kaboom..
Konrad
* [PATCH net] net: microchip: vcap: fix races on the shared Super VCAP block
From: Jens Emil Schulz Østergaard @ 2026-06-30 12:20 UTC (permalink / raw)
To: Horatiu Vultur, UNGLinuxDriver, Andrew Lunn, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Steen Hegelund,
Daniel Machon
Cc: Steen Hegelund, netdev, linux-kernel, linux-arm-kernel,
Jens Emil Schulz Østergaard
The VCAP instances on a chip are not independent, yet they are locked
independently. On sparx5 and lan969x the IS0 and IS2 instances are
backed by the same Super VCAP hardware block and share its cache and
command registers: every access drives the shared VCAP_SUPER_CTRL
register and moves data through the shared cache registers.
Accessing one instance therefore races with accessing another. The
per-instance admin->lock cannot prevent this, as each instance takes a
different lock.
The locking issue is mostly disguised by the fact that the core usage of
the vcap api runs under rtnl. However, the full rule dump in debugfs
decodes rules straight from hardware (a READ command followed by a cache
read) and runs outside rtnl, so it races a concurrent tc-flower rule
write to another Super VCAP instance.
Besides corrupting the dump, the read repopulates the shared cache
between the writer's cache fill and its write command, so the writer
commits the wrong data and corrupts the hardware entry.
Introduce vcap_lock() and vcap_unlock() helpers and route every rule
lock site in the VCAP API and its debugfs code through them. Replace the
per-instance admin->lock with a single mutex in struct vcap_control that
serializes access to all instances. The helpers reach it through a new
admin->vctrl back-pointer, and the clients initialise and destroy the
control lock instead of a per-instance one.
No path holds more than one instance lock, so collapsing them onto a
single mutex cannot self-deadlock.
Fixes: 71c9de995260 ("net: microchip: sparx5: Add VCAP locking to protect rules")
Signed-off-by: Jens Emil Schulz Østergaard <jensemil.schulzostergaard@microchip.com>
---
This was discovered by sashiko on a net-next series adding L3 unicast
routing to sparx5 and lan969x. That work introduces a new vcap instance
which is driven outside of rtnl. However, since the debugfs dump already
runs outside of rtnl the bug is reachable in mainline.
I have added this as a single patch to make it easier to cherry-pick. I
can split it into a patch refactoring all locking sites to use new
vcap_lock/vcap_unlock functions, and one which then swaps the
admin->lock to a global vcap lock, if that is preferred.
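
For readers who want to see the shape of the change outside the kernel, here
is a minimal user-space sketch of the scheme described above: every instance
reaches one shared mutex through a back-pointer to its control. All names
(model_control, model_admin, model_lock, and so on) are hypothetical stand-ins
and not the driver's types, and a pthread mutex stands in for the kernel
mutex. The shared_cache field only models the shared Super VCAP cache
registers.

```c
#include <pthread.h>

/* Hypothetical stand-in for struct vcap_control. */
struct model_control {
	pthread_mutex_t lock;      /* one lock serializing all instances */
	unsigned int shared_cache; /* models the shared cache registers */
};

/* Hypothetical stand-in for struct vcap_admin. */
struct model_admin {
	struct model_control *vctrl; /* back-pointer to the owning control */
	unsigned int hw_entry;       /* models this instance's hardware entry */
};

static void model_lock(struct model_admin *admin)
{
	pthread_mutex_lock(&admin->vctrl->lock);
}

static void model_unlock(struct model_admin *admin)
{
	pthread_mutex_unlock(&admin->vctrl->lock);
}

/* Writer: fill the shared cache, then commit it to this instance. */
static void model_write(struct model_admin *admin, unsigned int value)
{
	model_lock(admin);
	admin->vctrl->shared_cache = value;
	admin->hw_entry = admin->vctrl->shared_cache;
	model_unlock(admin);
}

/* Reader: load this instance's entry into the shared cache, then read it. */
static unsigned int model_read(struct model_admin *admin)
{
	unsigned int value;

	model_lock(admin);
	admin->vctrl->shared_cache = admin->hw_entry;
	value = admin->vctrl->shared_cache;
	model_unlock(admin);
	return value;
}
```

Because both helpers take the same control-wide lock, a read on one instance
can no longer land between the cache fill and the commit of a write on
another instance, which is the window the commit message describes.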
---
.../ethernet/microchip/lan966x/lan966x_vcap_impl.c | 5 +-
.../ethernet/microchip/sparx5/sparx5_vcap_impl.c | 5 +-
drivers/net/ethernet/microchip/vcap/vcap_api.c | 72 ++++++++++++----------
drivers/net/ethernet/microchip/vcap/vcap_api.h | 3 +-
.../net/ethernet/microchip/vcap/vcap_api_debugfs.c | 8 +--
.../microchip/vcap/vcap_api_debugfs_kunit.c | 3 +-
.../net/ethernet/microchip/vcap/vcap_api_kunit.c | 3 +-
.../net/ethernet/microchip/vcap/vcap_api_private.h | 3 +
8 files changed, 60 insertions(+), 42 deletions(-)
diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_vcap_impl.c b/drivers/net/ethernet/microchip/lan966x/lan966x_vcap_impl.c
index 72e3b189bac5..eb28df80b281 100644
--- a/drivers/net/ethernet/microchip/lan966x/lan966x_vcap_impl.c
+++ b/drivers/net/ethernet/microchip/lan966x/lan966x_vcap_impl.c
@@ -601,7 +601,6 @@ static void lan966x_vcap_admin_free(struct vcap_admin *admin)
kfree(admin->cache.keystream);
kfree(admin->cache.maskstream);
kfree(admin->cache.actionstream);
- mutex_destroy(&admin->lock);
kfree(admin);
}
@@ -615,7 +614,7 @@ lan966x_vcap_admin_alloc(struct lan966x *lan966x, struct vcap_control *ctrl,
if (!admin)
return ERR_PTR(-ENOMEM);
- mutex_init(&admin->lock);
+ admin->vctrl = ctrl;
INIT_LIST_HEAD(&admin->list);
INIT_LIST_HEAD(&admin->rules);
INIT_LIST_HEAD(&admin->enabled);
@@ -721,6 +720,7 @@ int lan966x_vcap_init(struct lan966x *lan966x)
ctrl->ops = &lan966x_vcap_ops;
INIT_LIST_HEAD(&ctrl->list);
+ mutex_init(&ctrl->lock);
for (int i = 0; i < ARRAY_SIZE(lan966x_vcap_inst_cfg); ++i) {
cfg = &lan966x_vcap_inst_cfg[i];
@@ -780,5 +780,6 @@ void lan966x_vcap_deinit(struct lan966x *lan966x)
lan966x_vcap_admin_free(admin);
}
+ mutex_destroy(&ctrl->lock);
kfree(ctrl);
}
diff --git a/drivers/net/ethernet/microchip/sparx5/sparx5_vcap_impl.c b/drivers/net/ethernet/microchip/sparx5/sparx5_vcap_impl.c
index 95b93e46a41d..cf332de6bf73 100644
--- a/drivers/net/ethernet/microchip/sparx5/sparx5_vcap_impl.c
+++ b/drivers/net/ethernet/microchip/sparx5/sparx5_vcap_impl.c
@@ -1930,7 +1930,6 @@ static void sparx5_vcap_admin_free(struct vcap_admin *admin)
{
if (!admin)
return;
- mutex_destroy(&admin->lock);
kfree(admin->cache.keystream);
kfree(admin->cache.maskstream);
kfree(admin->cache.actionstream);
@@ -1950,7 +1949,7 @@ sparx5_vcap_admin_alloc(struct sparx5 *sparx5, struct vcap_control *ctrl,
INIT_LIST_HEAD(&admin->list);
INIT_LIST_HEAD(&admin->rules);
INIT_LIST_HEAD(&admin->enabled);
- mutex_init(&admin->lock);
+ admin->vctrl = ctrl;
admin->vtype = cfg->vtype;
admin->vinst = cfg->vinst;
admin->ingress = cfg->ingress;
@@ -2059,6 +2058,7 @@ int sparx5_vcap_init(struct sparx5 *sparx5)
ctrl->ops = &sparx5_vcap_ops;
INIT_LIST_HEAD(&ctrl->list);
+ mutex_init(&ctrl->lock);
for (idx = 0; idx < ARRAY_SIZE(sparx5_vcap_inst_cfg); ++idx) {
cfg = &consts->vcaps_cfg[idx];
admin = sparx5_vcap_admin_alloc(sparx5, ctrl, cfg);
@@ -2097,5 +2097,6 @@ void sparx5_vcap_deinit(struct sparx5 *sparx5)
list_del(&admin->list);
sparx5_vcap_admin_free(admin);
}
+ mutex_destroy(&ctrl->lock);
kfree(ctrl);
}
diff --git a/drivers/net/ethernet/microchip/vcap/vcap_api.c b/drivers/net/ethernet/microchip/vcap/vcap_api.c
index 0fdb5e363bad..ff86cde11a32 100644
--- a/drivers/net/ethernet/microchip/vcap/vcap_api.c
+++ b/drivers/net/ethernet/microchip/vcap/vcap_api.c
@@ -934,6 +934,16 @@ static bool vcap_rule_exists(struct vcap_control *vctrl, u32 id)
return false;
}
+void vcap_lock(struct vcap_admin *admin)
+{
+ mutex_lock(&admin->vctrl->lock);
+}
+
+void vcap_unlock(struct vcap_admin *admin)
+{
+ mutex_unlock(&admin->vctrl->lock);
+}
+
/* Find a rule with a provided rule id return a locked vcap */
static struct vcap_rule_internal *
vcap_get_locked_rule(struct vcap_control *vctrl, u32 id)
@@ -943,11 +953,11 @@ vcap_get_locked_rule(struct vcap_control *vctrl, u32 id)
/* Look for the rule id in all vcaps */
list_for_each_entry(admin, &vctrl->list, list) {
- mutex_lock(&admin->lock);
+ vcap_lock(admin);
list_for_each_entry(ri, &admin->rules, list)
if (ri->data.id == id)
return ri;
- mutex_unlock(&admin->lock);
+ vcap_unlock(admin);
}
return NULL;
}
@@ -961,14 +971,14 @@ int vcap_lookup_rule_by_cookie(struct vcap_control *vctrl, u64 cookie)
/* Look for the rule id in all vcaps */
list_for_each_entry(admin, &vctrl->list, list) {
- mutex_lock(&admin->lock);
+ vcap_lock(admin);
list_for_each_entry(ri, &admin->rules, list) {
if (ri->data.cookie == cookie) {
id = ri->data.id;
break;
}
}
- mutex_unlock(&admin->lock);
+ vcap_unlock(admin);
if (id)
return id;
}
@@ -985,11 +995,11 @@ int vcap_admin_rule_count(struct vcap_admin *admin, int cid)
int count = 0;
list_for_each_entry(elem, &admin->rules, list) {
- mutex_lock(&admin->lock);
+ vcap_lock(admin);
if (elem->data.vcap_chain_id >= min_cid &&
elem->data.vcap_chain_id < max_cid)
++count;
- mutex_unlock(&admin->lock);
+ vcap_unlock(admin);
}
return count;
}
@@ -2266,7 +2276,7 @@ int vcap_add_rule(struct vcap_rule *rule)
if (ret)
return ret;
/* Insert the new rule in the list of vcap rules */
- mutex_lock(&ri->admin->lock);
+ vcap_lock(ri->admin);
vcap_rule_set_state(ri);
ret = vcap_insert_rule(ri, &move);
@@ -2302,7 +2312,7 @@ int vcap_add_rule(struct vcap_rule *rule)
goto out;
}
out:
- mutex_unlock(&ri->admin->lock);
+ vcap_unlock(ri->admin);
return ret;
}
EXPORT_SYMBOL_GPL(vcap_add_rule);
@@ -2330,7 +2340,7 @@ struct vcap_rule *vcap_alloc_rule(struct vcap_control *vctrl,
if (vctrl->vcaps[admin->vtype].rows == 0)
return ERR_PTR(-EINVAL);
- mutex_lock(&admin->lock);
+ vcap_lock(admin);
/* Check if a rule with this id already exists */
if (vcap_rule_exists(vctrl, id)) {
err = -EINVAL;
@@ -2369,13 +2379,13 @@ struct vcap_rule *vcap_alloc_rule(struct vcap_control *vctrl,
goto out_free;
}
- mutex_unlock(&admin->lock);
+ vcap_unlock(admin);
return (struct vcap_rule *)ri;
out_free:
kfree(ri);
out_unlock:
- mutex_unlock(&admin->lock);
+ vcap_unlock(admin);
return ERR_PTR(err);
}
@@ -2446,7 +2456,7 @@ struct vcap_rule *vcap_get_rule(struct vcap_control *vctrl, u32 id)
return ERR_PTR(-ENOENT);
rule = vcap_decode_rule(elem);
- mutex_unlock(&elem->admin->lock);
+ vcap_unlock(elem->admin);
return rule;
}
EXPORT_SYMBOL_GPL(vcap_get_rule);
@@ -2483,7 +2493,7 @@ int vcap_mod_rule(struct vcap_rule *rule)
err = vcap_write_counter(ri, &ctr);
out:
- mutex_unlock(&ri->admin->lock);
+ vcap_unlock(ri->admin);
return err;
}
EXPORT_SYMBOL_GPL(vcap_mod_rule);
@@ -2570,7 +2580,7 @@ int vcap_del_rule(struct vcap_control *vctrl, struct net_device *ndev, u32 id)
admin->last_used_addr = elem->addr;
}
- mutex_unlock(&admin->lock);
+ vcap_unlock(admin);
return err;
}
EXPORT_SYMBOL_GPL(vcap_del_rule);
@@ -2585,7 +2595,7 @@ int vcap_del_rules(struct vcap_control *vctrl, struct vcap_admin *admin)
if (ret)
return ret;
- mutex_lock(&admin->lock);
+ vcap_lock(admin);
list_for_each_entry_safe(ri, next_ri, &admin->rules, list) {
vctrl->ops->init(ri->ndev, admin, ri->addr, ri->size);
list_del(&ri->list);
@@ -2598,7 +2608,7 @@ int vcap_del_rules(struct vcap_control *vctrl, struct vcap_admin *admin)
list_del(&eport->list);
kfree(eport);
}
- mutex_unlock(&admin->lock);
+ vcap_unlock(admin);
return 0;
}
@@ -3016,7 +3026,7 @@ static int vcap_enable_rules(struct vcap_control *vctrl,
continue;
/* Found the admin, now find the offloadable rules */
- mutex_lock(&admin->lock);
+ vcap_lock(admin);
list_for_each_entry(ri, &admin->rules, list) {
/* Is the rule in the lookup defined by the chain */
if (!(ri->data.vcap_chain_id >= chain &&
@@ -3034,7 +3044,7 @@ static int vcap_enable_rules(struct vcap_control *vctrl,
if (err)
break;
}
- mutex_unlock(&admin->lock);
+ vcap_unlock(admin);
if (err)
break;
}
@@ -3074,7 +3084,7 @@ static int vcap_disable_rules(struct vcap_control *vctrl,
continue;
/* Found the admin, now find the rules on the chain */
- mutex_lock(&admin->lock);
+ vcap_lock(admin);
list_for_each_entry(ri, &admin->rules, list) {
if (ri->data.vcap_chain_id != chain)
continue;
@@ -3089,7 +3099,7 @@ static int vcap_disable_rules(struct vcap_control *vctrl,
if (err)
break;
}
- mutex_unlock(&admin->lock);
+ vcap_unlock(admin);
if (err)
break;
}
@@ -3133,9 +3143,9 @@ static int vcap_enable(struct vcap_control *vctrl, struct net_device *ndev,
eport->cookie = cookie;
eport->src_cid = src_cid;
eport->dst_cid = dst_cid;
- mutex_lock(&admin->lock);
+ vcap_lock(admin);
list_add_tail(&eport->list, &admin->enabled);
- mutex_unlock(&admin->lock);
+ vcap_unlock(admin);
if (vcap_path_exist(vctrl, ndev, src_cid)) {
/* Enable chained lookups */
@@ -3185,9 +3195,9 @@ static int vcap_disable(struct vcap_control *vctrl, struct net_device *ndev,
dst_cid = vcap_get_next_chain(vctrl, ndev, dst_cid);
}
- mutex_lock(&found->lock);
+ vcap_lock(found);
list_del(&eport->list);
- mutex_unlock(&found->lock);
+ vcap_unlock(found);
kfree(eport);
return 0;
}
@@ -3270,9 +3280,9 @@ int vcap_rule_set_counter(struct vcap_rule *rule, struct vcap_counter *ctr)
return -EINVAL;
}
- mutex_lock(&ri->admin->lock);
+ vcap_lock(ri->admin);
err = vcap_write_counter(ri, ctr);
- mutex_unlock(&ri->admin->lock);
+ vcap_unlock(ri->admin);
return err;
}
@@ -3291,9 +3301,9 @@ int vcap_rule_get_counter(struct vcap_rule *rule, struct vcap_counter *ctr)
return -EINVAL;
}
- mutex_lock(&ri->admin->lock);
+ vcap_lock(ri->admin);
err = vcap_read_counter(ri, ctr);
- mutex_unlock(&ri->admin->lock);
+ vcap_unlock(ri->admin);
return err;
}
@@ -3395,7 +3405,7 @@ int vcap_get_rule_count_by_cookie(struct vcap_control *vctrl,
/* Iterate all rules in each VCAP instance */
list_for_each_entry(admin, &vctrl->list, list) {
- mutex_lock(&admin->lock);
+ vcap_lock(admin);
list_for_each_entry(ri, &admin->rules, list) {
if (ri->data.cookie != cookie)
continue;
@@ -3412,12 +3422,12 @@ int vcap_get_rule_count_by_cookie(struct vcap_control *vctrl,
if (err)
goto unlock;
}
- mutex_unlock(&admin->lock);
+ vcap_unlock(admin);
}
return err;
unlock:
- mutex_unlock(&admin->lock);
+ vcap_unlock(admin);
return err;
}
EXPORT_SYMBOL_GPL(vcap_get_rule_count_by_cookie);
diff --git a/drivers/net/ethernet/microchip/vcap/vcap_api.h b/drivers/net/ethernet/microchip/vcap/vcap_api.h
index 6069ad95c27e..05b4b02e59ef 100644
--- a/drivers/net/ethernet/microchip/vcap/vcap_api.h
+++ b/drivers/net/ethernet/microchip/vcap/vcap_api.h
@@ -164,7 +164,7 @@ struct vcap_admin {
struct list_head list; /* for insertion in vcap_control */
struct list_head rules; /* list of rules */
struct list_head enabled; /* list of enabled ports */
- struct mutex lock; /* control access to rules */
+ struct vcap_control *vctrl; /* the control instance owning this vcap */
enum vcap_type vtype; /* type of vcap */
int vinst; /* instance number within the same type */
int first_cid; /* first chain id in this vcap */
@@ -275,6 +275,7 @@ struct vcap_control {
const struct vcap_info *vcaps; /* client supplied vcap models */
const struct vcap_statistics *stats; /* client supplied vcap stats */
struct list_head list; /* list of vcap instances */
+ struct mutex lock; /* serialize access to all vcap instances */
};
#endif /* __VCAP_API__ */
diff --git a/drivers/net/ethernet/microchip/vcap/vcap_api_debugfs.c b/drivers/net/ethernet/microchip/vcap/vcap_api_debugfs.c
index 59bfbda29bb3..e0c65c7ab23e 100644
--- a/drivers/net/ethernet/microchip/vcap/vcap_api_debugfs.c
+++ b/drivers/net/ethernet/microchip/vcap/vcap_api_debugfs.c
@@ -410,9 +410,9 @@ static int vcap_debugfs_show(struct seq_file *m, void *unused)
};
int ret;
- mutex_lock(&info->admin->lock);
+ vcap_lock(info->admin);
ret = vcap_show_admin(info->vctrl, info->admin, &out);
- mutex_unlock(&info->admin->lock);
+ vcap_unlock(info->admin);
return ret;
}
DEFINE_SHOW_ATTRIBUTE(vcap_debugfs);
@@ -427,9 +427,9 @@ static int vcap_raw_debugfs_show(struct seq_file *m, void *unused)
};
int ret;
- mutex_lock(&info->admin->lock);
+ vcap_lock(info->admin);
ret = vcap_show_admin_raw(info->vctrl, info->admin, &out);
- mutex_unlock(&info->admin->lock);
+ vcap_unlock(info->admin);
return ret;
}
DEFINE_SHOW_ATTRIBUTE(vcap_raw_debugfs);
diff --git a/drivers/net/ethernet/microchip/vcap/vcap_api_debugfs_kunit.c b/drivers/net/ethernet/microchip/vcap/vcap_api_debugfs_kunit.c
index 9c9d38042125..ac2a3b8c4f32 100644
--- a/drivers/net/ethernet/microchip/vcap/vcap_api_debugfs_kunit.c
+++ b/drivers/net/ethernet/microchip/vcap/vcap_api_debugfs_kunit.c
@@ -243,10 +243,11 @@ static void vcap_test_api_init(struct vcap_admin *admin)
{
/* Initialize the shared objects */
INIT_LIST_HEAD(&test_vctrl.list);
+ mutex_init(&test_vctrl.lock);
INIT_LIST_HEAD(&admin->list);
INIT_LIST_HEAD(&admin->rules);
INIT_LIST_HEAD(&admin->enabled);
- mutex_init(&admin->lock);
+ admin->vctrl = &test_vctrl;
list_add_tail(&admin->list, &test_vctrl.list);
memset(test_updateaddr, 0, sizeof(test_updateaddr));
test_updateaddridx = 0;
diff --git a/drivers/net/ethernet/microchip/vcap/vcap_api_kunit.c b/drivers/net/ethernet/microchip/vcap/vcap_api_kunit.c
index ce26ccbdccdf..83de384d3e3b 100644
--- a/drivers/net/ethernet/microchip/vcap/vcap_api_kunit.c
+++ b/drivers/net/ethernet/microchip/vcap/vcap_api_kunit.c
@@ -233,10 +233,11 @@ static void vcap_test_api_init(struct vcap_admin *admin)
{
/* Initialize the shared objects */
INIT_LIST_HEAD(&test_vctrl.list);
+ mutex_init(&test_vctrl.lock);
INIT_LIST_HEAD(&admin->list);
INIT_LIST_HEAD(&admin->rules);
INIT_LIST_HEAD(&admin->enabled);
- mutex_init(&admin->lock);
+ admin->vctrl = &test_vctrl;
list_add_tail(&admin->list, &test_vctrl.list);
memset(test_updateaddr, 0, sizeof(test_updateaddr));
test_updateaddridx = 0;
diff --git a/drivers/net/ethernet/microchip/vcap/vcap_api_private.h b/drivers/net/ethernet/microchip/vcap/vcap_api_private.h
index 844bdf6b5f45..b4057fbe3d18 100644
--- a/drivers/net/ethernet/microchip/vcap/vcap_api_private.h
+++ b/drivers/net/ethernet/microchip/vcap/vcap_api_private.h
@@ -50,6 +50,9 @@ struct vcap_stream_iter {
/* Check that the control has a valid set of callbacks */
int vcap_api_check(struct vcap_control *ctrl);
+/* Serialize access to the vcap instances of a control */
+void vcap_lock(struct vcap_admin *admin);
+void vcap_unlock(struct vcap_admin *admin);
/* Erase the VCAP cache area used or encoding and decoding */
void vcap_erase_cache(struct vcap_rule_internal *ri);
---
base-commit: d87363b0edfc7504ff2b144fe4cdd8154f90f42e
change-id: 20260624-microchip_fix_vcap_locking-70c057531c16
Best regards,
--
Jens Emil Schulz Østergaard <jensemil.schulzostergaard@microchip.com>
^ permalink raw reply related
* Re: [PATCH net-next v3 3/5] net: af_unix: useful handling of LSM denials on SCM_RIGHTS
From: Jori Koolstra @ 2026-06-30 12:17 UTC (permalink / raw)
To: Christian Brauner
Cc: Aleksa Sarai, Kuniyuki Iwashima, David S . Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman, netdev, linux-fsdevel,
linux-kernel
In-Reply-To: <20260630-getoppt-granit-gekrochen-78244e8979d9@brauner>
> Op 30-06-2026 11:58 CEST schreef Christian Brauner <brauner@kernel.org>:
>
>
> > Right now if some LSM such as Smack denies an AF_UNIX socket peer to
> > receive an SCM_RIGHTS fd, the SCM_RIGHTS fd array will be cut short at
> > that point, and MSG_CTRUNC is set on return of recvmsg(). This is
> > highly problematic behaviour, because it leaves the receiver
> > wondering what happened. As per man page MSG_CTRUNC is supposed to
> > indicate that the control buffer was sized too short, but suddenly
> > a permission error might result in the exact same flag being set.
> > Moreover, the receiver has no chance to determine how many fds got
> > originally sent and how many were suppressed.[1]
> >
> > Add a SO_RIGHTS_NOTRUNC option to UNIX sockets to enable more useful
> > handling of LSM denials when receiving SCM_RIGHTS messages: instead of
> > truncating the message at the first blocked fd, keep every fd slot
> > and store the LSM errno in the blocked slot.
> >
> > [1]: https://github.com/uapi-group/kernel-features#useful-handling-of-lsm-denials-on-scm_rights
> >
> > Signed-off-by: Jori Koolstra <jkoolstra@xs4all.nl>
> >
> > diff --git a/include/net/af_unix.h b/include/net/af_unix.h
> > index 34f53dde65ce..bb1b3dee02e8 100644
> > --- a/include/net/af_unix.h
> > +++ b/include/net/af_unix.h
> > @@ -49,6 +49,7 @@ struct unix_sock {
> > struct scm_stat scm_stat;
> > int inq_len;
> > bool recvmsg_inq;
> > + bool scm_rights_notrunc;
> > #if IS_ENABLED(CONFIG_AF_UNIX_OOB)
> > struct sk_buff *oob_skb;
> > #endif
> > diff --git a/include/net/scm.h b/include/net/scm.h
> > index c52519669349..761cda0803fb 100644
> > --- a/include/net/scm.h
> > +++ b/include/net/scm.h
> > @@ -50,8 +50,8 @@ struct scm_cookie {
> > #endif
> > };
> >
> > -void scm_detach_fds(struct msghdr *msg, struct scm_cookie *scm);
> > -void scm_detach_fds_compat(struct msghdr *msg, struct scm_cookie *scm);
> > +void scm_detach_fds(struct msghdr *msg, struct scm_cookie *scm, bool notrunc);
> > +void scm_detach_fds_compat(struct msghdr *msg, struct scm_cookie *scm, bool notrunc);
> > int __scm_send(struct socket *sock, struct msghdr *msg, struct scm_cookie *scm);
> > void __scm_destroy(struct scm_cookie *scm);
> > struct scm_fp_list *scm_fp_dup(struct scm_fp_list *fpl);
> > @@ -108,11 +108,18 @@ void scm_recv_unix(struct socket *sock, struct msghdr *msg,
> > struct scm_cookie *scm, int flags);
> >
> > static inline int scm_recv_one_fd(struct file *f, int __user *ufd,
> > - unsigned int flags)
> > + unsigned int flags, bool notrunc)
> > {
> > + bool filtered;
> > + int error;
> > +
> > if (!ufd)
> > return -EFAULT;
> > - return receive_fd(f, ufd, flags);
> > +
> > + error = receive_fd_filtered(f, ufd, flags, &filtered);
> > + if (filtered && notrunc)
> > + return put_user(error, ufd);
>
> This helper makes no sense to me. The boolean return argument is just
> really nasty and you need an additional put_user() as well. At this
> point, just drop receive_fd() and open-code it instead of using another
> custom helper. Something like the completely untested:
>
It's not very pretty, no. I thought about several different options, but they
all kinda suck.
You could override the error returned from the LSM to -EACCES. Since nothing
else in receive_fd() produces this, if you get it you can be sure that you had
an LSM fd block. However, this masks the real returned error and is also a bit
fragile if receive_fd() ever does return -EACCES in another path (unlikely, but still).
You could also signal blocking by setting the ufd to the security_file_receive() error
regardless of the socket option. But this changes userspace-visible behaviour.
But open-coding might be a better idea.
I also chose not to use -EPERM as a sentinel, as first suggested, but to use the
actual LSM error. Agreed?
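
To make the receiver side concrete, here is a rough sketch of how userspace
might walk the fd array once blocked slots carry the LSM errno. It assumes
the semantics from the commit message (every slot is kept and a blocked slot
holds a negative errno); fd_scan and scan_fd_slots() are hypothetical names
for illustration only, not part of the series.

```c
#include <stddef.h>

/* Hypothetical summary of one received SCM_RIGHTS fd array. */
struct fd_scan {
	size_t usable;   /* slots holding a real fd (>= 0) */
	size_t blocked;  /* slots holding a negative errno */
	int first_error; /* first negative errno seen, or 0 if none */
};

/* Classify each slot: a non-negative value is an fd, a negative one an errno. */
static struct fd_scan scan_fd_slots(const int *fds, size_t nfds)
{
	struct fd_scan res = { 0, 0, 0 };
	size_t i;

	for (i = 0; i < nfds; i++) {
		if (fds[i] >= 0) {
			res.usable++;
			continue;
		}
		res.blocked++;
		if (!res.first_error)
			res.first_error = fds[i];
	}
	return res;
}
```

With the real LSM error in the slot rather than a fixed sentinel, the receiver
can tell different denials apart, which is the reason for not hardcoding
-EPERM.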
> diff --git a/include/net/scm.h b/include/net/scm.h
> index 761cda0803fb..171b5ccd0b77 100644
> --- a/include/net/scm.h
> +++ b/include/net/scm.h
> @@ -116,10 +116,22 @@ static inline int scm_recv_one_fd(struct file *f, int __user *ufd,
> if (!ufd)
> return -EFAULT;
>
> - error = receive_fd_filtered(f, ufd, flags, &filtered);
> - if (filtered && notrunc)
> - return put_user(error, ufd);
> - return error;
> + error = security_file_receive(f);
> + if (error)
> + return notrunc ? put_user(error, ufd) : error;
> +
> + FD_PREPARE(fdf, flags, f);
> + if (fdf.err)
> + return fdf.err;
> + get_file(f);
> +
> + error = put_user(fd_prepare_fd(fdf), ufd);
> + if (error)
> + return error;
> +
> + __receive_sock(f);
> + return fd_publish(fdf);
> }
>
> --
> Christian Brauner <brauner@kernel.org>
^ permalink raw reply
* Re: [PATCH v8 01/14] firmware: qcom: Add a generic PAS service
From: Konrad Dybcio @ 2026-06-30 12:14 UTC (permalink / raw)
To: Sumit Garg, andersson
Cc: linux-arm-msm, dri-devel, freedreno, linux-media, netdev,
linux-wireless, ath12k, linux-remoteproc, konradybcio, robh,
krzk+dt, conor+dt, robin.clark, sean, akhilpo, lumag,
abhinav.kumar, jesszhan0024, marijn.suijten, airlied, simona,
vikash.garodia, bod, mchehab, elder, andrew+netdev, davem,
edumazet, kuba, pabeni, jjohnson, mathieu.poirier,
trilokkumar.soni, mukesh.ojha, pavan.kondeti, jorge.ramirez,
tonyh, vignesh.viswanathan, srinivas.kandagatla, amirreza.zarrabi,
jens.wiklander, op-tee, apurupa, skare, linux-kernel, Sumit Garg,
Harshal Dev
In-Reply-To: <20260626133440.692849-2-sumit.garg@kernel.org>
On 6/26/26 3:34 PM, Sumit Garg wrote:
> From: Sumit Garg <sumit.garg@oss.qualcomm.com>
>
> Qcom platforms have a legacy of using non-standard SCM calls
> splintered over the various kernel drivers. These SCM calls aren't
> compliant with the standard SMC calling conventions which is a
> prerequisite to enable migration to the FF-A specifications from Arm.
[...]
> +bool qcom_pas_is_available(void)
This is the most important function, for which I would expect
kerneldoc be present. I think it also wouldn't hurt to add a
footnote in every other function's kerneldoc saying that this must
be called first
Konrad
^ permalink raw reply
* Re: [PATCH v2 5/6] arm64: dts: qcom: ipq5018: add nodes required for Bluetooth support
From: Konrad Dybcio @ 2026-06-30 12:12 UTC (permalink / raw)
To: George Moussalem, Jens Axboe, Ulf Hansson, Rob Herring,
Krzysztof Kozlowski, Conor Dooley, Johannes Berg, Jeff Johnson,
Bartosz Golaszewski, Marcel Holtmann, Luiz Augusto von Dentz,
Balakrishna Godavarthi, Rocky Liao, Saravana Kannan, Andrew Lunn,
Heiner Kallweit, Russell King, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman, Bjorn Andersson,
Konrad Dybcio, Mathieu Poirier, Philipp Zabel
Cc: linux-block, linux-kernel, linux-mmc, devicetree, linux-wireless,
ath10k, linux-arm-msm, linux-bluetooth, netdev, linux-remoteproc
In-Reply-To: <SN7PR19MB6736F8FEB36D52E867C000E29DF72@SN7PR19MB6736.namprd19.prod.outlook.com>
On 6/30/26 2:09 PM, George Moussalem wrote:
> On 6/30/26 15:40, Konrad Dybcio wrote:
>> On 6/29/26 3:01 PM, George Moussalem via B4 Relay wrote:
>>> From: George Moussalem <george.moussalem@outlook.com>
>>>
>>> Add nodes for the reserved memory carveout and Bluetooth.
>>>
>>> Signed-off-by: George Moussalem <george.moussalem@outlook.com>
>>> ---
>>> arch/arm64/boot/dts/qcom/ipq5018.dtsi | 25 ++++++++++++++++++++++++-
>>> 1 file changed, 24 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/arm64/boot/dts/qcom/ipq5018.dtsi b/arch/arm64/boot/dts/qcom/ipq5018.dtsi
>>> index 6f8004a22a1f..65a47ba7d3a3 100644
>>> --- a/arch/arm64/boot/dts/qcom/ipq5018.dtsi
>>> +++ b/arch/arm64/boot/dts/qcom/ipq5018.dtsi
>>> @@ -17,6 +17,23 @@ / {
>>> #address-cells = <2>;
>>> #size-cells = <2>;
>>>
>>> + bluetooth: bluetooth {
>>> + compatible = "qcom,ipq5018-bt";
>>> +
>>> + firmware-name = "qca/bt_fw_patch.mbn";
>>
>> Is this fw vendor-signed?
>
> I've just analyzed the mbn file (and the mdt + b0x files): it only
> contains hashes for the mdt and b02 segments, no signature/certs at all.
> I've used your pil squasher to create the mbn file. Here are the FW files:
> https://github.com/georgemoussalem/openwrt/tree/ipq50xx-bluetooth/package/firmware/qca-bt-firmware/files
>
> Perhaps you can double check?
Using the not very sophisticated but very quick method of running
strings on it, there are indeed no certificate identifiers
Konrad
^ permalink raw reply
* Re: [PATCH v2 5/6] arm64: dts: qcom: ipq5018: add nodes required for Bluetooth support
From: George Moussalem @ 2026-06-30 12:09 UTC (permalink / raw)
To: Konrad Dybcio, Jens Axboe, Ulf Hansson, Rob Herring,
Krzysztof Kozlowski, Conor Dooley, Johannes Berg, Jeff Johnson,
Bartosz Golaszewski, Marcel Holtmann, Luiz Augusto von Dentz,
Balakrishna Godavarthi, Rocky Liao, Saravana Kannan, Andrew Lunn,
Heiner Kallweit, Russell King, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman, Bjorn Andersson,
Konrad Dybcio, Mathieu Poirier, Philipp Zabel
Cc: linux-block, linux-kernel, linux-mmc, devicetree, linux-wireless,
ath10k, linux-arm-msm, linux-bluetooth, netdev, linux-remoteproc
In-Reply-To: <f3c79cb4-02eb-4e4b-b5b4-9732876c075c@oss.qualcomm.com>
On 6/30/26 15:40, Konrad Dybcio wrote:
> On 6/29/26 3:01 PM, George Moussalem via B4 Relay wrote:
>> From: George Moussalem <george.moussalem@outlook.com>
>>
>> Add nodes for the reserved memory carveout and Bluetooth.
>>
>> Signed-off-by: George Moussalem <george.moussalem@outlook.com>
>> ---
>> arch/arm64/boot/dts/qcom/ipq5018.dtsi | 25 ++++++++++++++++++++++++-
>> 1 file changed, 24 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/arm64/boot/dts/qcom/ipq5018.dtsi b/arch/arm64/boot/dts/qcom/ipq5018.dtsi
>> index 6f8004a22a1f..65a47ba7d3a3 100644
>> --- a/arch/arm64/boot/dts/qcom/ipq5018.dtsi
>> +++ b/arch/arm64/boot/dts/qcom/ipq5018.dtsi
>> @@ -17,6 +17,23 @@ / {
>> #address-cells = <2>;
>> #size-cells = <2>;
>>
>> + bluetooth: bluetooth {
>> + compatible = "qcom,ipq5018-bt";
>> +
>> + firmware-name = "qca/bt_fw_patch.mbn";
>
> Is this fw vendor-signed?
I've just analyzed the mbn file (and the mdt + b0x files): it only
contains hashes for the mdt and b02 segments, no signature/certs at all.
I've used your pil squasher to create the mbn file. Here are the FW files:
https://github.com/georgemoussalem/openwrt/tree/ipq50xx-bluetooth/package/firmware/qca-bt-firmware/files
Perhaps you can double check?
>
> Konrad
Best regards,
George
^ permalink raw reply
* [PATCH v3 net-next 1/1] tcp: Replace min_tso_segs() with tso_segs() CC callback
From: chia-yu.chang @ 2026-06-30 12:01 UTC (permalink / raw)
To: jolsa, yonghong.song, song, linux-kselftest, memxor, shuah,
martin.lau, ast, daniel, andrii, eddyz87, horms, dsahern, bpf,
netdev, pabeni, jhs, kuba, stephen, davem, edumazet,
andrew+netdev, donald.hunter, kuniyu, ij, ncardwell,
koen.de_schepper, g.white, ingemar.s.johansson, mirja.kuehlewind,
cheshire, rs.ietf, Jason_Livingood, vidhi_goel
Cc: Chia-Yu Chang
From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
This patch replaces the existing min_tso_segs() with a tso_segs() CC callback,
which lets a CC algorithm provide the explicit TSO segment count for each data
burst and override tcp_tso_autosize().
This change affects BPF struct_ops users as follows:
- The callback is renamed from min_tso_segs to tso_segs
- The signature gains an extra u32 mss_now argument
- The return value semantics change from "floor value passed into
tcp_tso_autosize()" to "final tso_segs value", bypassing autosizing
As a result, BPF programs must be updated, because returning a small
constant will now directly limit tso_segs instead of setting its minimum.
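
The semantic change can be illustrated with a small user-space model. This is
only a sketch: model_autosize() is a simplified, hypothetical stand-in for
tcp_tso_autosize() that takes the byte budget as a parameter instead of
deriving it from the pacing rate and RTT, mss_now is assumed to be non-zero,
and any later clamping in tcp_tso_segs() is ignored.

```c
#include <stdint.h>

/* Simplified stand-in for tcp_tso_autosize(): budget is passed in directly. */
static uint32_t model_autosize(uint32_t budget_bytes, uint32_t mss_now,
			       uint32_t min_segs)
{
	uint32_t segs = budget_bytes / mss_now;

	return segs > min_segs ? segs : min_segs;
}

/* Old contract: the callback result is only a floor for autosizing. */
static uint32_t old_contract(uint32_t budget_bytes, uint32_t mss_now,
			     uint32_t cb_result)
{
	return model_autosize(budget_bytes, mss_now, cb_result);
}

/* New contract: the callback result is the final segment count. */
static uint32_t new_contract(uint32_t cb_result)
{
	return cb_result;
}

/* A migrated callback keeps the old behaviour by autosizing itself. */
static uint32_t migrated_callback(uint32_t budget_bytes, uint32_t mss_now,
				  uint32_t floor_segs)
{
	return model_autosize(budget_bytes, mss_now, floor_segs);
}
```

In this model a callback that used to return 2 as a floor would, left
unchanged, cap every burst at 2 segments under the new contract. Calling the
autosize helper with the old floor, as bbr_tso_segs() does in this patch,
keeps the previous result.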
Signed-off-by: Ilpo Järvinen <ij@kernel.org>
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
---
include/net/tcp.h | 13 +++++++++++--
net/ipv4/bpf_tcp_ca.c | 8 +++++---
net/ipv4/tcp_bbr.c | 13 ++++++++++---
net/ipv4/tcp_output.c | 13 +++++++------
tools/testing/selftests/bpf/progs/tcp_ca_kfunc.c | 8 ++++----
5 files changed, 37 insertions(+), 18 deletions(-)
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 6d376ea4d1c0..7fb42a0ce7da 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -824,6 +824,9 @@ unsigned int tcp_sync_mss(struct sock *sk, u32 pmtu);
unsigned int tcp_current_mss(struct sock *sk);
u32 tcp_clamp_probe0_to_user_timeout(const struct sock *sk, u32 when);
+u32 tcp_tso_autosize(const struct sock *sk, unsigned int mss_now,
+ int min_tso_segs);
+
/* Bound MSS / TSO packet size with the half of the window */
static inline int tcp_bound_to_half_wnd(struct tcp_sock *tp, int pktsize)
{
@@ -1361,8 +1364,14 @@ struct tcp_congestion_ops {
/* hook for packet ack accounting (optional) */
void (*pkts_acked)(struct sock *sk, const struct ack_sample *sample);
- /* override sysctl_tcp_min_tso_segs (optional) */
- u32 (*min_tso_segs)(struct sock *sk);
+ /*
+ * Override tcp_tso_autosize (optional)
+ *
+ * If provided, this callback returns the final TSO segment number
+ * and will bypass tcp_tso_autosize() entirely. The implementation
+ * must derive an appropriate value and ensure the result is valid.
+ */
+ u32 (*tso_segs)(struct sock *sk, u32 mss_now);
/* new value of cwnd after loss (required) */
u32 (*undo_cwnd)(struct sock *sk);
diff --git a/net/ipv4/bpf_tcp_ca.c b/net/ipv4/bpf_tcp_ca.c
index 791e15063237..27c4cdfd80a8 100644
--- a/net/ipv4/bpf_tcp_ca.c
+++ b/net/ipv4/bpf_tcp_ca.c
@@ -284,9 +284,11 @@ static void bpf_tcp_ca_pkts_acked(struct sock *sk, const struct ack_sample *samp
{
}
-static u32 bpf_tcp_ca_min_tso_segs(struct sock *sk)
+static u32 bpf_tcp_ca_tso_segs(struct sock *sk, u32 mss_now)
{
- return 0;
+ if (unlikely(!mss_now))
+ return U32_MAX;
+ return tcp_tso_autosize(sk, mss_now, 0);
}
static void bpf_tcp_ca_cong_control(struct sock *sk, u32 ack, int flag,
@@ -320,7 +322,7 @@ static struct tcp_congestion_ops __bpf_ops_tcp_congestion_ops = {
.cwnd_event_tx_start = bpf_tcp_ca_cwnd_event_tx_start,
.in_ack_event = bpf_tcp_ca_in_ack_event,
.pkts_acked = bpf_tcp_ca_pkts_acked,
- .min_tso_segs = bpf_tcp_ca_min_tso_segs,
+ .tso_segs = bpf_tcp_ca_tso_segs,
.cong_control = bpf_tcp_ca_cong_control,
.undo_cwnd = bpf_tcp_ca_undo_cwnd,
.sndbuf_expand = bpf_tcp_ca_sndbuf_expand,
diff --git a/net/ipv4/tcp_bbr.c b/net/ipv4/tcp_bbr.c
index 82378a2bfd1e..b63e77b14c65 100644
--- a/net/ipv4/tcp_bbr.c
+++ b/net/ipv4/tcp_bbr.c
@@ -297,11 +297,18 @@ static void bbr_set_pacing_rate(struct sock *sk, u32 bw, int gain)
}
/* override sysctl_tcp_min_tso_segs */
-__bpf_kfunc static u32 bbr_min_tso_segs(struct sock *sk)
+static u32 bbr_min_tso_segs(struct sock *sk)
{
return READ_ONCE(sk->sk_pacing_rate) < (bbr_min_tso_rate >> 3) ? 1 : 2;
}
+__bpf_kfunc static u32 bbr_tso_segs(struct sock *sk, u32 mss_now)
+{
+ if (unlikely(!mss_now))
+ return U32_MAX;
+ return tcp_tso_autosize(sk, mss_now, bbr_min_tso_segs(sk));
+}
+
static u32 bbr_tso_segs_goal(struct sock *sk)
{
struct tcp_sock *tp = tcp_sk(sk);
@@ -1151,7 +1158,7 @@ static struct tcp_congestion_ops tcp_bbr_cong_ops __read_mostly = {
.undo_cwnd = bbr_undo_cwnd,
.cwnd_event_tx_start = bbr_cwnd_event_tx_start,
.ssthresh = bbr_ssthresh,
- .min_tso_segs = bbr_min_tso_segs,
+ .tso_segs = bbr_tso_segs,
.get_info = bbr_get_info,
.set_state = bbr_set_state,
};
@@ -1163,7 +1170,7 @@ BTF_ID_FLAGS(func, bbr_sndbuf_expand)
BTF_ID_FLAGS(func, bbr_undo_cwnd)
BTF_ID_FLAGS(func, bbr_cwnd_event_tx_start)
BTF_ID_FLAGS(func, bbr_ssthresh)
-BTF_ID_FLAGS(func, bbr_min_tso_segs)
+BTF_ID_FLAGS(func, bbr_tso_segs)
BTF_ID_FLAGS(func, bbr_set_state)
BTF_KFUNCS_END(tcp_bbr_check_kfunc_ids)
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 00ec4b5900f2..f3fc4b64e61d 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2253,8 +2253,8 @@ static bool tcp_nagle_check(bool partial, const struct tcp_sock *tp,
* for every 2^9 usec (aka 512 us) of RTT, so that the RTT-based allowance
* is below 1500 bytes after 6 * ~500 usec = 3ms.
*/
-static u32 tcp_tso_autosize(const struct sock *sk, unsigned int mss_now,
- int min_tso_segs)
+u32 tcp_tso_autosize(const struct sock *sk, unsigned int mss_now,
+ int min_tso_segs)
{
unsigned long bytes;
u32 r;
@@ -2269,6 +2269,7 @@ static u32 tcp_tso_autosize(const struct sock *sk, unsigned int mss_now,
return max_t(u32, bytes / mss_now, min_tso_segs);
}
+EXPORT_SYMBOL(tcp_tso_autosize);
/* Return the number of segments we want in the skb we are transmitting.
* See if congestion control module wants to decide; otherwise, autosize.
@@ -2278,11 +2279,11 @@ static u32 tcp_tso_segs(struct sock *sk, unsigned int mss_now)
const struct tcp_congestion_ops *ca_ops = inet_csk(sk)->icsk_ca_ops;
u32 min_tso, tso_segs;
- min_tso = ca_ops->min_tso_segs ?
- ca_ops->min_tso_segs(sk) :
- READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_min_tso_segs);
+ min_tso = READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_min_tso_segs);
- tso_segs = tcp_tso_autosize(sk, mss_now, min_tso);
+ tso_segs = ca_ops->tso_segs ?
+ ca_ops->tso_segs(sk, mss_now) :
+ tcp_tso_autosize(sk, mss_now, min_tso);
return min_t(u32, tso_segs, sk->sk_gso_max_segs);
}
diff --git a/tools/testing/selftests/bpf/progs/tcp_ca_kfunc.c b/tools/testing/selftests/bpf/progs/tcp_ca_kfunc.c
index 0a3e9d35bf6f..58262e490336 100644
--- a/tools/testing/selftests/bpf/progs/tcp_ca_kfunc.c
+++ b/tools/testing/selftests/bpf/progs/tcp_ca_kfunc.c
@@ -10,7 +10,7 @@ extern u32 bbr_sndbuf_expand(struct sock *sk) __ksym;
extern u32 bbr_undo_cwnd(struct sock *sk) __ksym;
extern void bbr_cwnd_event_tx_start(struct sock *sk) __ksym;
extern u32 bbr_ssthresh(struct sock *sk) __ksym;
-extern u32 bbr_min_tso_segs(struct sock *sk) __ksym;
+extern u32 bbr_tso_segs(struct sock *sk, u32 mss_now) __ksym;
extern void bbr_set_state(struct sock *sk, u8 new_state) __ksym;
extern void dctcp_init(struct sock *sk) __ksym;
@@ -90,9 +90,9 @@ u32 BPF_PROG(ssthresh, struct sock *sk)
}
SEC("struct_ops")
-u32 BPF_PROG(min_tso_segs, struct sock *sk)
+u32 BPF_PROG(tso_segs, struct sock *sk, u32 mss_now)
{
- return bbr_min_tso_segs(sk);
+ return bbr_tso_segs(sk, mss_now);
}
SEC("struct_ops")
@@ -120,7 +120,7 @@ struct tcp_congestion_ops tcp_ca_kfunc = {
.cwnd_event = (void *)cwnd_event,
.cwnd_event_tx_start = (void *)cwnd_event_tx_start,
.ssthresh = (void *)ssthresh,
- .min_tso_segs = (void *)min_tso_segs,
+ .tso_segs = (void *)tso_segs,
.set_state = (void *)set_state,
.pkts_acked = (void *)pkts_acked,
.name = "tcp_ca_kfunc",
--
2.34.1
* Re: [PATCH net-next] net: neigh: avoid calling neigh_forced_gc on every alloc when table is full
From: Vimal Agrawal @ 2026-06-30 12:01 UTC (permalink / raw)
To: Kuniyuki Iwashima; +Cc: kuba, edumazet, netdev, vimal.agrawal
In-Reply-To: <CAAVpQUA4dyukqihiQoGfbaPtBn1OAaRaWyQ197+hPs7gmqW7=Q@mail.gmail.com>
Hi Kuniyuki,
You are correct that in this specific test case GC does not help since
all entries are active/reachable. However, this is not the only
scenario where entries can exceed gc_thresh3.
In a real workload, the table can exceed gc_thresh3 with a mix of
active and stale entries. In that case GC does help, but should not be
called on every allocation attempt — once per 50ms is sufficient for
GC to make progress without causing lock contention.
The rate limiting also protects against the case where GC cannot
reclaim anything. Without it, every allocation attempt above
gc_thresh3 triggers a full table scan holding tbl->lock, even when GC
has no work to do.
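For illustration only, here is a rough userspace sketch of the pre-lock check described above. It is not the patch itself: every sketch_* name is hypothetical, the tick rate of 1000 per second is an assumption, and the 50 ms interval is just the value proposed in this thread. It also assumes last_flush is updated whenever a scan actually runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's HZ: 1000 ticks per second. */
#define SKETCH_HZ 1000u
/* Assumed minimum gap between forced GC runs: 50 ms expressed in ticks. */
#define SKETCH_GC_MIN_INTERVAL (SKETCH_HZ / 20)

/* Wrap-safe "a is later than b", in the spirit of the kernel's time_after(). */
static bool sketch_time_after(uint32_t a, uint32_t b)
{
	return (int32_t)(b - a) < 0;
}

struct sketch_table {
	uint32_t last_flush;	/* tick of the most recent forced GC run */
	unsigned int gc_runs;	/* number of full scans actually performed */
};

/* Returns true if a scan ran, false if the call was rate limited. */
static bool sketch_forced_gc(struct sketch_table *tbl, uint32_t now)
{
	/* Checked before taking any lock, so a skipped call stays cheap. */
	if (!sketch_time_after(now, tbl->last_flush + SKETCH_GC_MIN_INTERVAL))
		return false;

	tbl->gc_runs++;		/* the expensive table walk would happen here */
	tbl->last_flush = now;
	return true;
}
```

With this shape, a burst of allocation attempts inside one 50 ms window costs at most one table walk, whether or not that walk reclaims anything.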
Thanks,
Vimal
On Mon, Jun 29, 2026 at 11:35 PM Kuniyuki Iwashima <kuniyu@google.com> wrote:
>
> On Mon, Jun 29, 2026 at 12:57 AM Vimal Agrawal <avimalin@gmail.com> wrote:
> >
> > Hi Kuniyuki,
> > Thank you for the feedback.
> > However, the rate limiting issue exists independently of the threshold
> > values. If entries genuinely exceed gc_thresh3 — regardless of what it
> > is set to — neigh_forced_gc() is called on every allocation attempt
> > with no rate limiting. In my workload, most entries are
> > active/reachable with refcnt > 1, so the GC walk traverses the entire
> > table without reclaiming anything.
>
> This suggests your gc_thresh2/3 do not fit your use case.
>
> If GC does not help, there is no point in running it or rate-limiting
> in the first place.
>
>
> > Increasing gc_thresh3 would make
> > this worse, not better, as GC now has a larger table to scan on each
> > call.
>
> If you just increase gc_thresh3 slightly, then yes, it won't help.
>
>
> >
> > Regarding neigh_hash_shift: in my workload, neigh_alloc() returns
> > ENOBUFS before reaching do_alloc() since GC cannot reclaim any
> > entries. kzalloc() is never called, so neigh_hash_grow() is not
> > involved in the latency I observed. The pre-lock time check in
> > neigh_forced_gc() is a low-cost safeguard that prevents repeated full
> > table scans regardless of gc_thresh3 value. It does not interfere with
> > correct GC behaviour — if entries are still above the threshold, GC
> > runs normally.
> >
> >
> > Hi Jakub,
> > I tested with different threshold values, filling the table completely
> > with 32k reachable entries and attempting 1000 additional allocations.
> > Exported neigh_forced_gc so that it can be profiled
> >                          no change  10ms   50ms   100ms
> > max cpu usage %          44%        11.8%  2.56%  1.42%
> > calls > 100us (of 1000)  101        31     13     7
> >
> > At 10ms, max CPU usage is still 11.8% and 31 out of 1000 calls take
> > more than 100us. Given that 50ms reduces this to 2.56% and 13 calls
> > respectively, I would prefer 50ms as the threshold. However, I am open
> > to further discussion on the right value.
> >
> > Thanks,
> > Vimal
> >
> >
> > On Fri, Jun 26, 2026 at 3:17 AM Kuniyuki Iwashima <kuniyu@google.com> wrote:
> > >
> > > From: Vimal Agrawal <avimalin@gmail.com>
> > > Date: Thu, 25 Jun 2026 10:20:20 +0000
> > > > Once the neighbour table exceeds gc_thresh3, neigh_forced_gc() is called
> > > > on every allocation attempt with no rate limiting. In workloads with mostly
> > > > active/reachable entries, the GC walk traverses a large portion of the
> > > > neighbour table without reclaiming entries, holding tbl->lock for an
> > > > extended period. This causes severe lock contention and allocation
> > > > latencies exceeding 16ms under sustained neighbour creation.
> > > >
> > > > Add a pre-lock check in neigh_forced_gc() to skip the GC run if one was
> > > > performed within the last second, avoiding repeated full table scans and
> > > > lock acquisitions on the hot allocation path.
> > > >
> > > > Profiling of neigh_create() shows ~3 orders of magnitude latency
> > > > improvement with this change.
> > > >
> > > > Link:https://lore.kernel.org/netdev/CALkUMdSCpx_ywYCx_ePLdm6yioO1nQWx7sSM=AEgsq0kywHxTw@mail.gmail.com/
> > >
> > > From the thread, these look misconfigured.
> > >
> > > ---8<---
> > > net.ipv6.neigh.default.gc_thresh2 = 32768
> > > net.ipv6.neigh.default.gc_thresh3 = 32768
> > > ---8<---
> > >
> > > If gc_thresh3 is large enough, gc_thresh2 will give you 5s
> > > rate limiting.
> > >
> > > If the number of active neigh entries constantly exceeds
> > > gc_thresh3, it will be the correct gc_thresh2 for you.
> > >
> > > Also, I guess you want a new kernel param for the first
> > > neigh_hash_alloc(), which is currently fixed for 3, which
> > > is too small for some hosts.
> > >
> > > 50000 entries require neigh_hash_grow() 13 times.
> > >
> > > Can you test this on your real workload, starting from
> > > neigh_hash_shift=16 and appropriate gc_thresh2/3 ?
> > >
> > > ---8<---
> > > diff --git a/net/core/neighbour.c b/net/core/neighbour.c
> > > index 1349c0eedb64..a75b3750eec9 100644
> > > --- a/net/core/neighbour.c
> > > +++ b/net/core/neighbour.c
> > > @@ -1817,6 +1817,22 @@ EXPORT_SYMBOL(neigh_parms_release);
> > > static struct lock_class_key neigh_table_proxy_queue_class;
> > >
> > > static struct neigh_table __rcu *neigh_tables[NEIGH_NR_TABLES] __read_mostly;
> > > +static __initdata unsigned long neigh_hash_shift = 3;
> > > +
> > > +static int __init neigh_set_hash_shift(char *str)
> > > +{
> > > + ssize_t ret;
> > > +
> > > + if (!str)
> > > + return 0;
> > > +
> > > + ret = kstrtoul(str, 0, &neigh_hash_shift);
> > > + if (ret)
> > > + return 0;
> > > +
> > > + return 1;
> > > +}
> > > +__setup("neigh_hash_shift=", neigh_set_hash_shift);
> > >
> > > void neigh_table_init(int index, struct neigh_table *tbl)
> > > {
> > > @@ -1843,7 +1859,7 @@ void neigh_table_init(int index, struct neigh_table *tbl)
> > > panic("cannot create neighbour proc dir entry");
> > > #endif
> > >
> > > - RCU_INIT_POINTER(tbl->nht, neigh_hash_alloc(3));
> > > + RCU_INIT_POINTER(tbl->nht, neigh_hash_alloc(neigh_hash_shift));
> > >
> > > phsize = (PNEIGH_HASHMASK + 1) * sizeof(struct pneigh_entry *);
> > > tbl->phash_buckets = kzalloc(phsize, GFP_KERNEL);
> > > ---8<---
> > >
> > >
> > >
> > > > Signed-off-by: Vimal Agrawal <vimal.agrawal@sophos.com>
> > > > ---
> > > > net/core/neighbour.c | 3 +++
> > > > 1 file changed, 3 insertions(+)
> > > >
> > > > diff --git a/net/core/neighbour.c b/net/core/neighbour.c
> > > > index 1349c0eedb64..078842db3c5f 100644
> > > > --- a/net/core/neighbour.c
> > > > +++ b/net/core/neighbour.c
> > > > @@ -260,6 +260,9 @@ static int neigh_forced_gc(struct neigh_table *tbl)
> > > > int shrunk = 0;
> > > > int loop = 0;
> > > >
> > > > + if (!time_after(jiffies, READ_ONCE(tbl->last_flush) + HZ))
> > > > + return 0;
> > > > +
> > > > NEIGH_CACHE_STAT_INC(tbl, forced_gc_runs);
> > > >
> > > > spin_lock_bh(&tbl->lock);
> > > > --
> > > > 2.17.1
> > > > v
* Re: [PATCH v2 1/7] ata: don't keep pci_device_id
From: Niklas Cassel @ 2026-06-30 11:59 UTC (permalink / raw)
To: Gary Guo
Cc: Bjorn Helgaas, Zhenzhong Duan, Greg Kroah-Hartman,
Rafael J. Wysocki, Danilo Krummrich, Damien Le Moal,
GOTO Masanori, YOKOTA Hiroshi, James E.J. Bottomley,
Martin K. Petersen, Vaibhav Gupta, Jens Taprogge, Ido Schimmel,
Petr Machata, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, linux-pci, driver-core, linux-kernel,
linux-ide, linux-scsi, industrypack-devel, netdev
In-Reply-To: <20260630-pci_id_fix-v2-1-b834a98c0af2@garyguo.net>
Hello Gary,
On Tue, Jun 30, 2026 at 12:09:01PM +0100, Gary Guo wrote:
> pci_device_id is not guaranteed to live longer than probe due to presence
> of dynamic ID. All information apart from driver_data can be easily
> retrieved from pci_dev, so just store driver_data.
>
> Signed-off-by: Gary Guo <gary@garyguo.net>
Please write a proper commit message.
The commit message should be detailed enough for someone to realize what
is going on without reading your cover-letter (as information in the cover
letter is not part of the accepted commit).
1) Explain how to reproduce.
2) Explain the problem.
3) Explain the consequences of the problem. UAF? Crash?
4) Explain how you fix it.
AFAICT, this is somehow related to pci_add_dynid(), which is called when
user-space is doing something like:
$ echo "vendor device" > /sys/bus/pci/drivers/your_driver/new_id
Kind regards,
Niklas
* [PATCH v3 2/3] net: stmmac: fix l3l4 filter rejecting unsupported offload requests
From: muhammad.nazim.amirul.nazle.asmade @ 2026-06-30 11:56 UTC (permalink / raw)
To: netdev
Cc: andrew+netdev, davem, edumazet, kuba, pabeni, rmk+kernel,
maxime.chevallier, Jose.Abreu, linux-kernel
In-Reply-To: <20260630115622.9426-1-muhammad.nazim.amirul.nazle.asmade@altera.com>
From: Nazim Amirul <muhammad.nazim.amirul.nazle.asmade@altera.com>
The basic flow parser in tc_add_basic_flow() does not validate match
keys before proceeding. Unsupported offload configurations such as
partial protocol masks, non-IPv4 network proto, or non-TCP/UDP transport
proto are silently accepted instead of returning -EOPNOTSUPP.
Add validation to return -EOPNOTSUPP early for:
- No network or transport proto present in the key
- Partial protocol mask (only full mask supported)
- Network proto is not IPv4
- Transport proto is not TCP or UDP
Each rejection includes an extack message so the user knows which part
of the match is unsupported.
Also propagate -EOPNOTSUPP from tc_add_basic_flow() in tc_add_flow()
by returning it directly rather than using break. The break was silently
discarding the error for FLOW_CLS_REPLACE operations where entry->in_use
is already true, causing tc_add_flow() to return 0 (success) for
unsupported replace requests.
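As a simplified userspace illustration of the break-versus-return difference (not the driver code): the sketch_* names are hypothetical, and the final -EINVAL check on in_use is an assumption about what follows the parser loop.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_entry {
	bool in_use;	/* already true when an existing filter is replaced */
};

typedef int (*sketch_parser_fn)(struct sketch_entry *entry);

/* Hypothetical parser that always rejects the request as unsupported. */
static int sketch_reject(struct sketch_entry *entry)
{
	(void)entry;
	return -EOPNOTSUPP;
}

/* Old shape: break out of the loop, then judge success by in_use alone. */
static int sketch_add_flow_break(struct sketch_entry *entry,
				 const sketch_parser_fn *fns, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		int ret = fns[i](entry);

		if (!ret)
			entry->in_use = true;
		else if (ret == -EOPNOTSUPP)
			break;	/* the error code is dropped here */
	}

	if (!entry->in_use)
		return -EINVAL;
	return 0;
}

/* New shape: hand the rejection straight back to the caller. */
static int sketch_add_flow_return(struct sketch_entry *entry,
				  const sketch_parser_fn *fns, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		int ret = fns[i](entry);

		if (!ret)
			entry->in_use = true;
		else if (ret == -EOPNOTSUPP)
			return ret;
	}

	if (!entry->in_use)
		return -EINVAL;
	return 0;
}
```

In this model a fresh entry still fails under the old shape because in_use is false, while a replaced entry (in_use already true) reports success even though the parser rejected it.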
Fixes: 425eabddaf0f ("net: stmmac: Implement L3/L4 Filters using TC Flower")
Signed-off-by: Rohan G Thomas <rohan.g.thomas@altera.com>
Signed-off-by: Nazim Amirul <muhammad.nazim.amirul.nazle.asmade@altera.com>
---
Changes in v3:
- Add extack messages to each -EOPNOTSUPP return so users know which
part of the match is unsupported (Jakub Kicinski)
- Return -EOPNOTSUPP directly instead of break to avoid silently
reporting success on unsupported FLOW_CLS_REPLACE (Sashiko review)
- Patches 1/3 and 3/3 are unchanged from v2
Changes in v2:
- No changes
---
.../net/ethernet/stmicro/stmmac/stmmac_tc.c | 34 +++++++++++++++++++
1 file changed, 34 insertions(+)
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_tc.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_tc.c
index d78652718599..14cabe76e53e 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_tc.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_tc.c
@@ -446,6 +446,7 @@ static int tc_parse_flow_actions(struct stmmac_priv *priv,
}
#define ETHER_TYPE_FULL_MASK cpu_to_be16(~0)
+#define IP_PROTO_FULL_MASK 0xFF
static int tc_add_basic_flow(struct stmmac_priv *priv,
struct flow_cls_offload *cls,
@@ -461,6 +462,33 @@ static int tc_add_basic_flow(struct stmmac_priv *priv,
flow_rule_match_basic(rule, &match);
+ /* Both network proto and transport proto not present in the key */
+ if (!match.mask || !(match.mask->n_proto || match.mask->ip_proto)) {
+ NL_SET_ERR_MSG_MOD(cls->common.extack,
+ "filter must specify network or transport protocol");
+ return -EOPNOTSUPP;
+ }
+
+ /* If the proto is present in the key and is not full mask */
+ if ((match.mask->n_proto && match.mask->n_proto != ETHER_TYPE_FULL_MASK) ||
+ (match.mask->ip_proto && match.mask->ip_proto != IP_PROTO_FULL_MASK)) {
+ NL_SET_ERR_MSG_MOD(cls->common.extack,
+ "only full protocol mask is supported");
+ return -EOPNOTSUPP;
+ }
+
+ /* Network proto is present in the key and is not IPv4 */
+ if (match.mask->n_proto && match.key->n_proto != cpu_to_be16(ETH_P_IP)) {
+ NL_SET_ERR_MSG_MOD(cls->common.extack,
+ "only IPv4 network protocol is supported");
+ return -EOPNOTSUPP;
+ }
+
+ /* Transport proto is present in the key and is not TCP or UDP */
+ if (match.mask->ip_proto &&
+ match.key->ip_proto != IPPROTO_TCP &&
+ match.key->ip_proto != IPPROTO_UDP) {
+ NL_SET_ERR_MSG_MOD(cls->common.extack,
+ "only TCP and UDP transport protocols are supported");
+ return -EOPNOTSUPP;
+ }
+
entry->ip_proto = match.key->ip_proto;
return 0;
}
@@ -598,11 +626,7 @@ static int tc_add_flow(struct stmmac_priv *priv,
ret = tc_flow_parsers[i].fn(priv, cls, entry);
if (!ret)
entry->in_use = true;
- else if (ret == -EOPNOTSUPP)
- /* The basic flow parser will return EOPNOTSUPP, if a
- * requested offload not fully supported by the hw. And
- * in that case fail early.
- */
- break;
+ else if (ret == -EOPNOTSUPP)
+ return ret;
}
if (!entry->in_use)
--
2.43.7
* [PATCH v3 0/3] net: stmmac: L3/L4 filter bug fixes
From: muhammad.nazim.amirul.nazle.asmade @ 2026-06-30 11:56 UTC (permalink / raw)
To: netdev
Cc: andrew+netdev, davem, edumazet, kuba, pabeni, rmk+kernel,
maxime.chevallier, Jose.Abreu, linux-kernel
From: Nazim Amirul <muhammad.nazim.amirul.nazle.asmade@altera.com>
This series fixes three bugs in the stmmac L3/L4 TC flower filter
implementation for the XGMAC2 core. All three patches target net.
The L3/L4 filter match count statistics patch (originally patch 4/4)
has been split out and will be sent separately against net-next per
Andrew Lunn's review of v1.
Patch 1 fixes a register corruption bug in the L4 filter port configuration.
The XGMAC_L4_ADDR register holds both source and destination port match
values in a single register. The original code overwrites the entire register
when setting either field, silently erasing the other. This is fixed by
using a read-modify-write sequence.
Patch 2 fixes the basic flow match parser to properly reject unsupported
offload requests with -EOPNOTSUPP instead of silently accepting them.
Unsupported cases include partial protocol masks, non-IPv4 network proto,
and non-TCP/UDP transport proto. Extack messages are now included so users
know exactly which part of the match is unsupported. The -EOPNOTSUPP is
also now returned directly instead of using break, which was silently
discarding the error on FLOW_CLS_REPLACE operations.
Patch 3 fixes a stale action bug on filter deletion. When a filter entry
with a drop action is deleted, the action field was not reset, causing
it to persist and potentially affect subsequent filter configurations.
All three patches fix the original L3/L4 filter implementation introduced in
425eabddaf0f ("net: stmmac: Implement L3/L4 Filters using TC Flower").
Changes in v3:
- Patch 2: add extack messages to each -EOPNOTSUPP return (Jakub Kicinski)
- Patch 2: return -EOPNOTSUPP directly instead of break to avoid silently
reporting success on unsupported FLOW_CLS_REPLACE (Sashiko review)
Changes in v2:
- Split patch 4/4 (ethtool stats) out to net-next per Andrew Lunn's review
Nazim Amirul (3):
net: stmmac: xgmac: fix l4 filter port overwrite on register update
net: stmmac: fix l3l4 filter rejecting unsupported offload requests
net: stmmac: reset residual action in L3L4 filters on delete
.../ethernet/stmicro/stmmac/dwxgmac2_core.c | 28 +++++++++++--------
.../net/ethernet/stmicro/stmmac/stmmac_tc.c | 35 +++++++++++++++++++
2 files changed, 49 insertions(+), 12 deletions(-)
--
2.43.7
* [PATCH v2 3/3] net: stmmac: reset residual action in L3L4 filters on delete
From: muhammad.nazim.amirul.nazle.asmade @ 2026-06-30 11:56 UTC (permalink / raw)
To: netdev
Cc: andrew+netdev, davem, edumazet, kuba, pabeni, rmk+kernel,
maxime.chevallier, Jose.Abreu, linux-kernel
In-Reply-To: <20260630115622.9426-1-muhammad.nazim.amirul.nazle.asmade@altera.com>
From: Nazim Amirul <muhammad.nazim.amirul.nazle.asmade@altera.com>
When deleting an L3/L4 flower filter entry, the action field is not
reset. If a filter was previously configured with a drop action, that
action may persist and affect subsequent filter configurations
unintentionally.
Clear the action field when the filter entry is deleted.
Fixes: 425eabddaf0f ("net: stmmac: Implement L3/L4 Filters using TC Flower")
Signed-off-by: Rohan G Thomas <rohan.g.thomas@altera.com>
Signed-off-by: Nazim Amirul <muhammad.nazim.amirul.nazle.asmade@altera.com>
---
drivers/net/ethernet/stmicro/stmmac/stmmac_tc.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_tc.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_tc.c
index 869f84756ca5..4f9758eeb86f 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_tc.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_tc.c
@@ -653,6 +653,7 @@ static int tc_del_flow(struct stmmac_priv *priv,
entry->in_use = false;
entry->cookie = 0;
entry->is_l4 = false;
+ entry->action = 0;
return ret;
}
--
2.43.7
* [PATCH v2 1/3] net: stmmac: xgmac: fix l4 filter port overwrite on register update
From: muhammad.nazim.amirul.nazle.asmade @ 2026-06-30 11:56 UTC (permalink / raw)
To: netdev
Cc: andrew+netdev, davem, edumazet, kuba, pabeni, rmk+kernel,
maxime.chevallier, Jose.Abreu, linux-kernel
In-Reply-To: <20260630115622.9426-1-muhammad.nazim.amirul.nazle.asmade@altera.com>
From: Nazim Amirul <muhammad.nazim.amirul.nazle.asmade@altera.com>
The XGMAC_L4_ADDR register holds both source and destination port
match values. The current implementation overwrites the entire register
when configuring either port, so setting one silently erases the other.
Fix this by reading the register first, then masking and updating only
the relevant field before writing back.
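A minimal userspace sketch of the two update styles, for illustration only. The field layout (source port in bits 15:0, destination port in bits 31:16) and all sketch_* names are assumptions made for the example, and the register is modelled as a plain integer instead of going through the filter read/write helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: source port in bits 15:0, destination port in bits 31:16. */
#define SKETCH_L4SP_MASK	0x0000ffffu
#define SKETCH_L4DP_MASK	0xffff0000u
#define SKETCH_L4DP_SHIFT	16

/* Old style: build the value from scratch, discarding the other field. */
static uint32_t sketch_set_port_overwrite(uint32_t reg, bool sa, uint16_t port)
{
	(void)reg;	/* the current register contents are ignored */
	return sa ? (uint32_t)port : (uint32_t)port << SKETCH_L4DP_SHIFT;
}

/* New style: clear only the field being configured, then OR in the port. */
static uint32_t sketch_set_port_rmw(uint32_t reg, bool sa, uint16_t port)
{
	if (sa) {
		reg &= ~SKETCH_L4SP_MASK;
		reg |= (uint32_t)port;
	} else {
		reg &= ~SKETCH_L4DP_MASK;
		reg |= (uint32_t)port << SKETCH_L4DP_SHIFT;
	}
	return reg;
}
```

Configuring a source port and then a destination port with the overwrite variant leaves the source field at zero, whereas the read-modify-write variant keeps both values.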
Fixes: 425eabddaf0f ("net: stmmac: Implement L3/L4 Filters using TC Flower")
Signed-off-by: Rohan G Thomas <rohan.g.thomas@altera.com>
Signed-off-by: Nazim Amirul <muhammad.nazim.amirul.nazle.asmade@altera.com>
---
.../ethernet/stmicro/stmmac/dwxgmac2_core.c | 28 +++++++++++--------
1 file changed, 16 insertions(+), 12 deletions(-)
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_core.c b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_core.c
index f02b434bbd50..52054f31376d 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_core.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_core.c
@@ -1370,36 +1370,40 @@ static int dwxgmac2_config_l4_filter(struct mac_device_info *hw, u32 filter_no,
value &= ~XGMAC_L4PEN0;
}
- value &= ~(XGMAC_L4SPM0 | XGMAC_L4SPIM0);
- value &= ~(XGMAC_L4DPM0 | XGMAC_L4DPIM0);
if (sa) {
value |= XGMAC_L4SPM0;
if (inv)
value |= XGMAC_L4SPIM0;
+ else
+ value &= ~XGMAC_L4SPIM0;
} else {
value |= XGMAC_L4DPM0;
if (inv)
value |= XGMAC_L4DPIM0;
+ else
+ value &= ~XGMAC_L4DPIM0;
}
ret = dwxgmac2_filter_write(hw, filter_no, XGMAC_L3L4_CTRL, value);
if (ret)
return ret;
- if (sa) {
- value = FIELD_PREP(XGMAC_L4SP0, match);
+ ret = dwxgmac2_filter_read(hw, filter_no, XGMAC_L4_ADDR, &value);
+ if (ret)
+ return ret;
- ret = dwxgmac2_filter_write(hw, filter_no, XGMAC_L4_ADDR, value);
- if (ret)
- return ret;
+ if (sa) {
+ value &= ~XGMAC_L4SP0;
+ value |= FIELD_PREP(XGMAC_L4SP0, match);
} else {
- value = FIELD_PREP(XGMAC_L4DP0, match);
-
- ret = dwxgmac2_filter_write(hw, filter_no, XGMAC_L4_ADDR, value);
- if (ret)
- return ret;
+ value &= ~XGMAC_L4DP0;
+ value |= FIELD_PREP(XGMAC_L4DP0, match);
}
+ ret = dwxgmac2_filter_write(hw, filter_no, XGMAC_L4_ADDR, value);
+ if (ret)
+ return ret;
+
if (!en)
return dwxgmac2_filter_write(hw, filter_no, XGMAC_L3L4_CTRL, 0);
--
2.43.7
* [PATCH net V4 3/3] net/mlx5e: Fix publication race for priv->channel_stats[]
From: Tariq Toukan @ 2026-06-30 11:51 UTC (permalink / raw)
To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
netdev, Paolo Abeni
Cc: Cosmin Ratiu, Eran Ben Elisha, Feng Liu, Haiyang Zhang,
Lama Kayal, Leon Romanovsky, linux-kernel, linux-rdma, Mark Bloch,
Nimrod Oren, Saeed Mahameed, Tariq Toukan, Gal Pressman,
Alexei Lazar, Simon Horman, Carolina Jubran, Kees Cook,
Eran Ben Elisha, Saeed Mahameed
In-Reply-To: <20260630115151.729219-1-tariqt@nvidia.com>
From: Feng Liu <feliu@nvidia.com>
mlx5e_channel_stats_alloc() publishes a new entry to
priv->channel_stats[] and then increments priv->stats_nch as a
publication token, but neither store carries any memory barrier:
priv->channel_stats[ix] = kvzalloc_node(...);
if (!priv->channel_stats[ix])
return -ENOMEM;
priv->stats_nch++;
Concurrent readers compute the loop bound from priv->stats_nch and
then dereference priv->channel_stats[i] using plain accesses, e.g.
for (i = 0; i < priv->stats_nch; i++) {
struct mlx5e_channel_stats *cs = priv->channel_stats[i];
... cs->rq.packets ...
}
On weakly-ordered architectures (ARM, PowerPC, RISC-V) the writes to
channel_stats[ix] and stats_nch may become visible to other CPUs out
of program order. A reader can observe stats_nch == N while still
seeing channel_stats[N-1] == NULL, leading to a NULL pointer
dereference in the channel_stats loop.
This has been observed in production on BlueField-3 DPUs (arm64),
where ovs-vswitchd queries netdev statistics over netlink during NIC
bringup, racing mlx5e_open_channel() -> mlx5e_channel_stats_alloc()
on another CPU:
Unable to handle kernel NULL pointer dereference at virtual address 0x840
Hardware name: BlueField-3 DPU
pc : mlx5e_fold_sw_stats64+0x30/0x180 [mlx5_core]
Call trace:
mlx5e_fold_sw_stats64+0x30/0x180 [mlx5_core]
dev_get_stats+0x50/0xc0
ovs_vport_get_stats+0x38/0xac [openvswitch]
ovs_vport_cmd_fill_info+0x194/0x290 [openvswitch]
ovs_vport_cmd_get+0xbc/0x10c [openvswitch]
genl_family_rcv_msg_doit+0xd0/0x160
genl_rcv_msg+0xec/0x1f0
netlink_rcv_skb+0x64/0x130
genl_rcv+0x40/0x60
netlink_unicast+0x2fc/0x370
netlink_sendmsg+0x1dc/0x454
...
__arm64_sys_sendmsg+0x2c/0x40
Add mlx5e_stats_nch_write() and mlx5e_stats_nch_read() helpers in en.h
that wrap the smp_store_release()/smp_load_acquire() pair on stats_nch.
The release/acquire pair establishes the contract:
stats_nch == N => channel_stats[0..N-1] are visible and non-NULL.
Publish the stats_nch increment via mlx5e_stats_nch_write() in the
writer (mlx5e_channel_stats_alloc()), and read stats_nch via
mlx5e_stats_nch_read() in all readers: mlx5e RX/TX queue stats,
mlx5e_get_base_stats(), ethtool channels stats, IPoIB stats, the
sw_stats fold and the HV VHCA stats agent.
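For readers less familiar with the pattern, here is a userspace analogue of the publication contract using C11 atomics in place of smp_store_release()/smp_load_acquire(). It is only a sketch: the sketch_* names and the fixed array size are hypothetical, and it assumes a single writer that always fills slot ix == stats_nch, as the allocation path above does.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

#define SKETCH_MAX_CH 4

struct sketch_stats {
	unsigned long packets;
};

struct sketch_priv {
	struct sketch_stats *channel_stats[SKETCH_MAX_CH];
	_Atomic unsigned short stats_nch;	/* publication token */
};

/* Writer: fill the slot first, then publish the new count with release. */
static int sketch_stats_alloc(struct sketch_priv *priv, int ix)
{
	unsigned short n;

	priv->channel_stats[ix] = calloc(1, sizeof(*priv->channel_stats[ix]));
	if (!priv->channel_stats[ix])
		return -1;

	/* Single writer assumed, so a relaxed read of our own counter is enough. */
	n = atomic_load_explicit(&priv->stats_nch, memory_order_relaxed);
	atomic_store_explicit(&priv->stats_nch, n + 1, memory_order_release);
	return 0;
}

/* Reader: one acquire load bounds the loop; slots below it are visible. */
static unsigned long sketch_fold(struct sketch_priv *priv)
{
	unsigned short nch = atomic_load_explicit(&priv->stats_nch,
						  memory_order_acquire);
	unsigned long total = 0;

	for (unsigned short i = 0; i < nch; i++)
		total += priv->channel_stats[i]->packets;
	return total;
}
```

The reader samples the count once and reuses the local copy, which mirrors the way the patch hoists the read into a local nch variable so that a loop never sees a bound newer than the one it acquired.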
Fixes: fa691d0c9c08 ("net/mlx5e: Allocate per-channel stats dynamically at first usage")
Signed-off-by: Feng Liu <feliu@nvidia.com>
Reviewed-by: Eran Ben Elisha <eranbe@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Nimrod Oren <noren@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
drivers/net/ethernet/mellanox/mlx5/core/en.h | 12 ++++++++++++
.../ethernet/mellanox/mlx5/core/en/hv_vhca_stats.c | 10 ++++++----
drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 14 ++++++++------
drivers/net/ethernet/mellanox/mlx5/core/en_stats.c | 9 +++++----
.../net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c | 3 ++-
5 files changed, 33 insertions(+), 15 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 2270e2e550dd..d507289096c2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -987,6 +987,18 @@ struct mlx5e_priv {
struct ethtool_fec_hist_range *fec_ranges;
};
+static inline u16 mlx5e_stats_nch_read(const struct mlx5e_priv *priv)
+{
+ /* Pairs with smp_store_release in mlx5e_stats_nch_write(). */
+ return smp_load_acquire(&priv->stats_nch);
+}
+
+static inline void mlx5e_stats_nch_write(struct mlx5e_priv *priv, u16 n)
+{
+ /* Pairs with smp_load_acquire in mlx5e_stats_nch_read(). */
+ smp_store_release(&priv->stats_nch, n);
+}
+
struct mlx5e_dev {
struct net_device *netdev;
struct devlink_port dl_port;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/hv_vhca_stats.c b/drivers/net/ethernet/mellanox/mlx5/core/en/hv_vhca_stats.c
index cdaf77650164..631f802105d5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/hv_vhca_stats.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/hv_vhca_stats.c
@@ -33,9 +33,10 @@ mlx5e_hv_vhca_fill_ring_stats(struct mlx5e_priv *priv, int ch,
static void mlx5e_hv_vhca_fill_stats(struct mlx5e_priv *priv, void *data,
int buf_len)
{
+ u16 nch = mlx5e_stats_nch_read(priv);
int ch, i = 0;
- for (ch = 0; ch < priv->stats_nch; ch++) {
+ for (ch = 0; ch < nch; ch++) {
void *buf = data + i;
if (WARN_ON_ONCE(buf +
@@ -50,8 +51,9 @@ static void mlx5e_hv_vhca_fill_stats(struct mlx5e_priv *priv, void *data,
static int mlx5e_hv_vhca_stats_buf_size(struct mlx5e_priv *priv)
{
- return (sizeof(struct mlx5e_hv_vhca_per_ring_stats) *
- priv->stats_nch);
+ u16 nch = mlx5e_stats_nch_read(priv);
+
+ return sizeof(struct mlx5e_hv_vhca_per_ring_stats) * nch;
}
static int mlx5e_hv_vhca_stats_buf_max_size(struct mlx5e_priv *priv)
@@ -106,7 +108,7 @@ static void mlx5e_hv_vhca_stats_control(struct mlx5_hv_vhca_agent *agent,
sagent = &priv->stats_agent;
block->version = MLX5_HV_VHCA_STATS_VERSION;
- block->rings = priv->stats_nch;
+ block->rings = mlx5e_stats_nch_read(priv);
if (!block->command) {
cancel_delayed_work_sync(&priv->stats_agent.work);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 775f0c6e55c9..aa8610cedaa8 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -2773,7 +2773,7 @@ static int mlx5e_channel_stats_alloc(struct mlx5e_priv *priv, int ix, int cpu)
GFP_KERNEL, cpu_to_node(cpu));
if (!priv->channel_stats[ix])
return -ENOMEM;
- priv->stats_nch++;
+ mlx5e_stats_nch_write(priv, priv->stats_nch + 1);
return 0;
}
@@ -4040,9 +4040,10 @@ static int mlx5e_setup_tc(struct net_device *dev, enum tc_setup_type type,
void mlx5e_fold_sw_stats64(struct mlx5e_priv *priv, struct rtnl_link_stats64 *s)
{
+ u16 nch = mlx5e_stats_nch_read(priv);
int i;
- for (i = 0; i < priv->stats_nch; i++) {
+ for (i = 0; i < nch; i++) {
struct mlx5e_channel_stats *channel_stats = priv->channel_stats[i];
struct mlx5e_rq_stats *xskrq_stats = &channel_stats->xskrq;
struct mlx5e_rq_stats *rq_stats = &channel_stats->rq;
@@ -5488,7 +5489,7 @@ static void mlx5e_get_queue_stats_rx(struct net_device *dev, int i,
struct mlx5e_rq_stats *xskrq_stats;
struct mlx5e_rq_stats *rq_stats;
- if (mlx5e_is_uplink_rep(priv) || !priv->stats_nch)
+ if (mlx5e_is_uplink_rep(priv) || !mlx5e_stats_nch_read(priv))
return;
channel_stats = priv->channel_stats[i];
@@ -5512,7 +5513,7 @@ static void mlx5e_get_queue_stats_tx(struct net_device *dev, int i,
struct mlx5e_priv *priv = netdev_priv(dev);
struct mlx5e_sq_stats *sq_stats;
- if (!priv->stats_nch)
+ if (!mlx5e_stats_nch_read(priv))
return;
/* no special case needed for ptp htb etc since txq2sq_stats is kept up
@@ -5538,6 +5539,7 @@ static void mlx5e_get_base_stats(struct net_device *dev,
struct netdev_queue_stats_tx *tx)
{
struct mlx5e_priv *priv = netdev_priv(dev);
+ u16 nch = mlx5e_stats_nch_read(priv);
struct mlx5e_ptp *ptp_channel;
int i, tc;
@@ -5549,7 +5551,7 @@ static void mlx5e_get_base_stats(struct net_device *dev,
rx->hw_gro_wire_packets = 0;
rx->hw_gro_wire_bytes = 0;
- for (i = priv->channels.params.num_channels; i < priv->stats_nch; i++) {
+ for (i = priv->channels.params.num_channels; i < nch; i++) {
struct netdev_queue_stats_rx rx_i = {0};
mlx5e_get_queue_stats_rx(dev, i, &rx_i);
@@ -5585,7 +5587,7 @@ static void mlx5e_get_base_stats(struct net_device *dev,
tx->stop = 0;
tx->wake = 0;
- for (i = 0; i < priv->stats_nch; i++) {
+ for (i = 0; i < nch; i++) {
struct mlx5e_channel_stats *channel_stats = priv->channel_stats[i];
/* handle two cases:
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
index 7f33261ba655..de38b60806c2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
@@ -515,6 +515,7 @@ static void mlx5e_stats_update_stats_rq_page_pool(struct mlx5e_channel *c)
static MLX5E_DECLARE_STATS_GRP_OP_UPDATE_STATS(sw)
{
struct mlx5e_sw_stats *s = &priv->stats.sw;
+ u16 nch = mlx5e_stats_nch_read(priv);
int i;
memset(s, 0, sizeof(*s));
@@ -522,7 +523,7 @@ static MLX5E_DECLARE_STATS_GRP_OP_UPDATE_STATS(sw)
for (i = 0; i < priv->channels.num; i++) /* for active channels only */
mlx5e_stats_update_stats_rq_page_pool(priv->channels.c[i]);
- for (i = 0; i < priv->stats_nch; i++) {
+ for (i = 0; i < nch; i++) {
struct mlx5e_channel_stats *channel_stats =
priv->channel_stats[i];
@@ -2614,7 +2615,7 @@ static MLX5E_DECLARE_STATS_GRP_OP_UPDATE_STATS(ptp) { return; }
static MLX5E_DECLARE_STATS_GRP_OP_NUM_STATS(channels)
{
- int max_nch = priv->stats_nch;
+ int max_nch = mlx5e_stats_nch_read(priv);
return (NUM_RQ_STATS * max_nch) +
(NUM_CH_STATS * max_nch) +
@@ -2627,8 +2628,8 @@ static MLX5E_DECLARE_STATS_GRP_OP_NUM_STATS(channels)
static MLX5E_DECLARE_STATS_GRP_OP_FILL_STRS(channels)
{
+ int max_nch = mlx5e_stats_nch_read(priv);
bool is_xsk = priv->xsk.ever_used;
- int max_nch = priv->stats_nch;
int i, j, tc;
for (i = 0; i < max_nch; i++)
@@ -2660,8 +2661,8 @@ static MLX5E_DECLARE_STATS_GRP_OP_FILL_STRS(channels)
static MLX5E_DECLARE_STATS_GRP_OP_FILL_STATS(channels)
{
+ int max_nch = mlx5e_stats_nch_read(priv);
bool is_xsk = priv->xsk.ever_used;
- int max_nch = priv->stats_nch;
int i, j, tc;
for (i = 0; i < max_nch; i++)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c
index 0a6003fe60e9..674bed721e63 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c
@@ -135,10 +135,11 @@ void mlx5i_cleanup(struct mlx5e_priv *priv)
static void mlx5i_grp_sw_update_stats(struct mlx5e_priv *priv)
{
+ u16 nch = mlx5e_stats_nch_read(priv);
struct rtnl_link_stats64 s = {};
int i, j;
- for (i = 0; i < priv->stats_nch; i++) {
+ for (i = 0; i < nch; i++) {
struct mlx5e_channel_stats *channel_stats;
struct mlx5e_rq_stats *rq_stats;
--
2.44.0
* [PATCH net V4 2/3] net/mlx5e: Fix HV VHCA stats agent registration race
From: Tariq Toukan @ 2026-06-30 11:51 UTC (permalink / raw)
To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
netdev, Paolo Abeni
Cc: Cosmin Ratiu, Eran Ben Elisha, Feng Liu, Haiyang Zhang,
Lama Kayal, Leon Romanovsky, linux-kernel, linux-rdma, Mark Bloch,
Nimrod Oren, Saeed Mahameed, Tariq Toukan, Gal Pressman,
Alexei Lazar, Simon Horman, Carolina Jubran, Kees Cook,
Eran Ben Elisha, Saeed Mahameed
In-Reply-To: <20260630115151.729219-1-tariqt@nvidia.com>
From: Feng Liu <feliu@nvidia.com>
mlx5e_hv_vhca_stats_create() registers the stats agent through
mlx5_hv_vhca_agent_create(). The helper publishes the agent in
hv_vhca->agents[type] under agents_lock and immediately schedules an
asynchronous control invalidation on the HV VHCA workqueue before
returning to mlx5e.
The asynchronous invalidation invokes the control agent's invalidate
callback, which reads the hypervisor control block and forwards the
command to mlx5e_hv_vhca_stats_control(). That callback may either:
- call cancel_delayed_work_sync(&priv->stats_agent.work), or
- call queue_delayed_work(priv->wq, &sagent->work, sagent->delay).
However, the delayed_work and priv->stats_agent.agent are only
initialized after mlx5_hv_vhca_agent_create() returns to mlx5e:
    agent = mlx5_hv_vhca_agent_create(...);           /* publish + invalidate */
    ...
    priv->stats_agent.agent = agent;                  /* too late */
    INIT_DELAYED_WORK(&priv->stats_agent.work, ...);  /* too late */
If the asynchronous control path runs before the two assignments
above, it can:
- Operate on an uninitialized delayed_work whose timer.function is
NULL. queue_delayed_work() calls add_timer() unconditionally, so
when the timer expires the timer softirq invokes a NULL function
pointer.
- Re-initialize the timer later through INIT_DELAYED_WORK() while
the timer is already enqueued in the timer wheel, corrupting the
hlist (entry.pprev cleared while the previous bucket node still
points at this entry).
- Let the worker eventually run mlx5e_hv_vhca_stats_work() while
sagent->agent is still NULL, so the NULL pointer is dereferenced
inside mlx5_hv_vhca_agent_write().
Fix this by:
- Initializing priv->stats_agent.work before invoking
mlx5_hv_vhca_agent_create(), so the work is always in a valid
state when the control callback observes it.
- Adding a struct mlx5_hv_vhca_agent **ctx_update out-parameter
to mlx5_hv_vhca_agent_create(). The helper writes the agent
pointer to *ctx_update before publishing into hv_vhca->agents[]
and triggering the agents_update flow, so any callback
subsequently invoked from that flow already sees a valid
priv->stats_agent.agent. This avoids having the control
callback participate in agent initialization.
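
The ordering change can be illustrated with a small single-threaded
userspace model. This is only a sketch: every model_* name is
hypothetical, the control callback is invoked synchronously inside the
create helper to stand in for the worst-case scheduling of the
asynchronous invalidation, and a boolean replaces INIT_DELAYED_WORK().
It is not the driver code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct model_agent { int unused; };

struct model_stats {
	struct model_agent *agent;  /* stands in for priv->stats_agent.agent */
	bool work_ready;            /* stands in for INIT_DELAYED_WORK() having run */
	bool cb_saw_valid_state;    /* recorded by the control callback */
};

/* Control callback: records whether the state it needs was already set up. */
static void model_control(struct model_stats *s)
{
	s->cb_saw_valid_state = s->work_ready && s->agent != NULL;
}

/*
 * Create helper. The optional out-parameter is written before the
 * "publish" step, and the callback is fired immediately afterwards to
 * model the earliest possible run of the asynchronous control path.
 */
static struct model_agent *model_agent_create(struct model_stats *s,
					      struct model_agent **ctx_update)
{
	struct model_agent *agent;

	assert(s != NULL);
	agent = calloc(1, sizeof(*agent));
	if (!agent)
		return NULL;
	if (ctx_update)
		*ctx_update = agent;
	model_control(s);
	return agent;
}

/* Old order: the caller's state is filled in only after create returns. */
static void model_create_old_order(struct model_stats *s)
{
	struct model_agent *agent = model_agent_create(s, NULL);

	s->agent = agent;
	s->work_ready = true;
}

/* New order: work first, agent pointer handed over through the out-param. */
static void model_create_new_order(struct model_stats *s)
{
	s->work_ready = true;
	if (!model_agent_create(s, &s->agent))
		s->work_ready = false;
}

static void model_destroy(struct model_stats *s)
{
	free(s->agent);
	s->agent = NULL;
	s->work_ready = false;
}
```

In this model the callback observes a half-initialized state with the
old order and a fully initialized one with the new order, which is the
property the patch relies on.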
While at it, access priv->stats_agent.agent with
READ_ONCE()/WRITE_ONCE() for the cross-CPU access with the worker, and
clear priv->stats_agent.buf on the agent_create() failure path.
Fixes: cef35af34d6d ("net/mlx5e: Add mlx5e HV VHCA stats agent")
Signed-off-by: Feng Liu <feliu@nvidia.com>
Reviewed-by: Eran Ben Elisha <eranbe@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
.../mellanox/mlx5/core/en/hv_vhca_stats.c | 21 +++++++++++--------
.../ethernet/mellanox/mlx5/core/lib/hv_vhca.c | 8 +++++--
.../ethernet/mellanox/mlx5/core/lib/hv_vhca.h | 6 ++++--
3 files changed, 22 insertions(+), 13 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/hv_vhca_stats.c b/drivers/net/ethernet/mellanox/mlx5/core/en/hv_vhca_stats.c
index 72f3ca4dd076..cdaf77650164 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/hv_vhca_stats.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/hv_vhca_stats.c
@@ -73,7 +73,7 @@ static void mlx5e_hv_vhca_stats_work(struct work_struct *work)
sagent = container_of(dwork, struct mlx5e_hv_vhca_stats_agent, work);
priv = container_of(sagent, struct mlx5e_priv, stats_agent);
buf_len = mlx5e_hv_vhca_stats_buf_size(priv);
- agent = sagent->agent;
+ agent = READ_ONCE(sagent->agent);
buf = sagent->buf;
memset(buf, 0, buf_len);
@@ -135,11 +135,14 @@ void mlx5e_hv_vhca_stats_create(struct mlx5e_priv *priv)
if (!priv->stats_agent.buf)
return;
+ INIT_DELAYED_WORK(&priv->stats_agent.work, mlx5e_hv_vhca_stats_work);
+
agent = mlx5_hv_vhca_agent_create(priv->mdev->hv_vhca,
MLX5_HV_VHCA_AGENT_STATS,
mlx5e_hv_vhca_stats_control, NULL,
mlx5e_hv_vhca_stats_cleanup,
- priv);
+ priv,
+ &priv->stats_agent.agent);
if (IS_ERR_OR_NULL(agent)) {
if (IS_ERR(agent))
@@ -148,20 +151,20 @@ void mlx5e_hv_vhca_stats_create(struct mlx5e_priv *priv)
agent);
kvfree(priv->stats_agent.buf);
- return;
+ priv->stats_agent.buf = NULL;
}
-
- priv->stats_agent.agent = agent;
- INIT_DELAYED_WORK(&priv->stats_agent.work, mlx5e_hv_vhca_stats_work);
}
void mlx5e_hv_vhca_stats_destroy(struct mlx5e_priv *priv)
{
- if (IS_ERR_OR_NULL(priv->stats_agent.agent))
+ struct mlx5_hv_vhca_agent *agent;
+
+ agent = READ_ONCE(priv->stats_agent.agent);
+ if (IS_ERR_OR_NULL(agent))
return;
- mlx5_hv_vhca_agent_destroy(priv->stats_agent.agent);
- priv->stats_agent.agent = NULL;
+ mlx5_hv_vhca_agent_destroy(agent);
+ WRITE_ONCE(priv->stats_agent.agent, NULL);
kvfree(priv->stats_agent.buf);
priv->stats_agent.buf = NULL;
}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/hv_vhca.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/hv_vhca.c
index d6dc7bce855e..305752dab7bd 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/hv_vhca.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/hv_vhca.c
@@ -190,7 +190,7 @@ mlx5_hv_vhca_control_agent_create(struct mlx5_hv_vhca *hv_vhca)
return mlx5_hv_vhca_agent_create(hv_vhca, MLX5_HV_VHCA_AGENT_CONTROL,
NULL,
mlx5_hv_vhca_control_agent_invalidate,
- NULL, NULL);
+ NULL, NULL, NULL);
}
static void mlx5_hv_vhca_control_agent_destroy(struct mlx5_hv_vhca_agent *agent)
@@ -256,7 +256,8 @@ mlx5_hv_vhca_agent_create(struct mlx5_hv_vhca *hv_vhca,
void (*invalidate)(struct mlx5_hv_vhca_agent*,
u64 block_mask),
void (*cleaup)(struct mlx5_hv_vhca_agent *agent),
- void *priv)
+ void *priv,
+ struct mlx5_hv_vhca_agent **ctx_update)
{
struct mlx5_hv_vhca_agent *agent;
@@ -284,6 +285,9 @@ mlx5_hv_vhca_agent_create(struct mlx5_hv_vhca *hv_vhca,
agent->invalidate = invalidate;
agent->cleanup = cleaup;
+ if (ctx_update)
+ WRITE_ONCE(*ctx_update, agent);
+
mutex_lock(&hv_vhca->agents_lock);
hv_vhca->agents[type] = agent;
mutex_unlock(&hv_vhca->agents_lock);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/hv_vhca.h b/drivers/net/ethernet/mellanox/mlx5/core/lib/hv_vhca.h
index f240ffe5116c..8b3974cf0ee4 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/hv_vhca.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/hv_vhca.h
@@ -43,7 +43,8 @@ mlx5_hv_vhca_agent_create(struct mlx5_hv_vhca *hv_vhca,
void (*invalidate)(struct mlx5_hv_vhca_agent*,
u64 block_mask),
void (*cleanup)(struct mlx5_hv_vhca_agent *agent),
- void *context);
+ void *context,
+ struct mlx5_hv_vhca_agent **ctx_update);
void mlx5_hv_vhca_agent_destroy(struct mlx5_hv_vhca_agent *agent);
int mlx5_hv_vhca_agent_write(struct mlx5_hv_vhca_agent *agent,
@@ -84,7 +85,8 @@ mlx5_hv_vhca_agent_create(struct mlx5_hv_vhca *hv_vhca,
void (*invalidate)(struct mlx5_hv_vhca_agent*,
u64 block_mask),
void (*cleanup)(struct mlx5_hv_vhca_agent *agent),
- void *context)
+ void *context,
+ struct mlx5_hv_vhca_agent **ctx_update)
{
return NULL;
}
--
2.44.0
* [PATCH net V4 1/3] net/mlx5e: Fix HV VHCA stats zero-sized buffer allocation
From: Tariq Toukan @ 2026-06-30 11:51 UTC (permalink / raw)
To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
netdev, Paolo Abeni
Cc: Cosmin Ratiu, Eran Ben Elisha, Feng Liu, Haiyang Zhang,
Lama Kayal, Leon Romanovsky, linux-kernel, linux-rdma, Mark Bloch,
Nimrod Oren, Saeed Mahameed, Tariq Toukan, Gal Pressman,
Alexei Lazar, Simon Horman, Carolina Jubran, Kees Cook,
Eran Ben Elisha, Saeed Mahameed
In-Reply-To: <20260630115151.729219-1-tariqt@nvidia.com>
From: Feng Liu <feliu@nvidia.com>
mlx5e_hv_vhca_stats_create() is called from mlx5e_nic_enable(),
before mlx5e_open(). At that point priv->stats_nch is still zero,
because it is only ever incremented in mlx5e_channel_stats_alloc(),
which is reached only from mlx5e_open_channel().
mlx5e_hv_vhca_stats_buf_size() therefore returns 0, and
kvzalloc(0, GFP_KERNEL) returns ZERO_SIZE_PTR ((void *)16) rather
than NULL. The "if (!buf)" guard does not catch this, and
mlx5e_hv_vhca_stats_create() completes "successfully" with
priv->stats_agent.buf set to ZERO_SIZE_PTR.
Once channels are opened (priv->stats_nch > 0) and the hypervisor
enables stats reporting, mlx5e_hv_vhca_stats_work() recomputes
buf_len using the new non-zero stats_nch and calls
memset(buf, 0, buf_len) on ZERO_SIZE_PTR, faulting at address 0x10.
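
A minimal userspace sketch of why the "if (!buf)" guard misses the
zero-size case. The model_* names and MODEL_ZERO_SIZE_PTR are
hypothetical stand-ins that imitate the kernel's ZERO_SIZE_PTR
sentinel ((void *)16) and its ZERO_OR_NULL_PTR()-style check; calloc()
replaces the real allocator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the kernel sentinel returned by zero-sized allocations. */
#define MODEL_ZERO_SIZE_PTR ((void *)16)

/* Imitates kvzalloc(): a zero-sized request yields the sentinel, not NULL. */
static void *model_kvzalloc(size_t size)
{
	if (size == 0)
		return MODEL_ZERO_SIZE_PTR;
	return calloc(1, size);
}

/* The guard used by the original code: only catches NULL. */
static bool model_naive_guard_rejects(const void *buf)
{
	return !buf;
}

/* Catches both NULL and the zero-size sentinel. */
static bool model_zero_or_null(const void *buf)
{
	return (uintptr_t)buf <= (uintptr_t)MODEL_ZERO_SIZE_PTR;
}

/* Models the worker's memset(): only legal on a real allocation. */
static void model_worker_clear(void *buf, size_t len)
{
	assert(len == 0 || !model_zero_or_null(buf));
	if (len)
		memset(buf, 0, len);
}

static void model_kvfree(void *buf)
{
	if (!model_zero_or_null(buf))
		free(buf);
}
```

With a zero-sized request the naive guard lets the sentinel through,
so a later model_worker_clear() with a non-zero length would trip the
assertion, mirroring the fault at address 0x10 described above.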
Allocate the buffer based on priv->max_nch, which is set in
mlx5e_priv_init() and is the upper bound on stats_nch:
- Add a separate helper mlx5e_hv_vhca_stats_buf_max_size() that
returns sizeof(per_ring_stats) * max(max_nch, stats_nch), and
use it for the kvzalloc() in mlx5e_hv_vhca_stats_create().
- Keep mlx5e_hv_vhca_stats_buf_size() (which is sized by stats_nch)
for the worker's active payload size, so the wire format
(block->rings = stats_nch) and the amount of data filled by
mlx5e_hv_vhca_fill_stats() are unchanged.
The max(max_nch, stats_nch) guard handles the rare case where
mlx5e_attach_netdev() recomputes max_nch downward across a
detach/resume cycle while priv->stats_nch persists (mlx5e_detach_netdev
does not call mlx5e_priv_cleanup, so stats_nch is only reset when
the netdev is destroyed). Without the guard, the worker could compute
buf_len from stats_nch and overrun the smaller buffer allocated based
on the reduced max_nch.
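
The two sizes can be compared in a small userspace sketch. The struct,
the helper names and the per-ring size of 32 bytes are assumptions
made for illustration only; the real per-ring struct size is not
stated here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-ring payload size, chosen only for the example. */
#define MODEL_PER_RING_STATS_SIZE 32u

struct model_priv {
	unsigned short max_nch;    /* upper bound, known at init time */
	unsigned short stats_nch;  /* grows as channels are first opened */
};

/* Active payload size used by the worker. */
static size_t model_buf_size(const struct model_priv *p)
{
	return (size_t)MODEL_PER_RING_STATS_SIZE * p->stats_nch;
}

/* Allocation size: never smaller than what the worker may touch. */
static size_t model_buf_max_size(const struct model_priv *p)
{
	unsigned short nch = p->max_nch > p->stats_nch ?
			     p->max_nch : p->stats_nch;
	size_t size = (size_t)MODEL_PER_RING_STATS_SIZE * nch;

	assert(size >= model_buf_size(p));
	return size;
}
```

Before the first open (stats_nch == 0) the allocation size is already
non-zero, and after a hypothetical shrink of max_nch below stats_nch
the allocation still covers the worker's payload.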
Allocating a non-zero buffer also makes the kvzalloc() failure path in
mlx5e_hv_vhca_stats_create() reachable for the first time: it returns
early without (re)creating the agent. Clear
priv->stats_agent.{agent,buf} in mlx5e_hv_vhca_stats_destroy() after
freeing them, so that if a later create() bails out on this path, a
subsequent teardown does not double-free the stale agent/buffer left
from a previous enable/disable cycle.
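
The double-free hazard and its fix can be sketched in userspace as
follows. All names are hypothetical, the allocation failure is
injected through a flag, and a counter records how many times destroy
actually freed anything.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct model_sagent {
	void *agent;
	void *buf;
	unsigned int frees;  /* number of destroy calls that freed memory */
};

/* On buffer allocation failure, returns early without touching ->agent. */
static void model_create(struct model_sagent *s, bool fail_alloc)
{
	s->buf = fail_alloc ? NULL : calloc(1, 64);
	if (!s->buf)
		return;

	s->agent = calloc(1, 16);
	if (!s->agent) {
		free(s->buf);
		s->buf = NULL;
	}
}

/* Clears both pointers after freeing, so a repeated call is a no-op. */
static void model_destroy(struct model_sagent *s)
{
	if (!s->agent)
		return;

	assert(s->buf != NULL);
	free(s->agent);
	s->agent = NULL;
	free(s->buf);
	s->buf = NULL;
	s->frees++;
}
```

Because destroy resets the pointers, a failed create in the next cycle
leaves nothing stale behind for a later destroy to free again.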
This mirrors the existing mlx5e pattern of preallocating arrays of
size max_nch (e.g. priv->channel_stats) and lazily populating
entries up to stats_nch on demand.
Fixes: fa691d0c9c08 ("net/mlx5e: Allocate per-channel stats dynamically at first usage")
Signed-off-by: Feng Liu <feliu@nvidia.com>
Reviewed-by: Eran Ben Elisha <eranbe@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
.../net/ethernet/mellanox/mlx5/core/en/hv_vhca_stats.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/hv_vhca_stats.c b/drivers/net/ethernet/mellanox/mlx5/core/en/hv_vhca_stats.c
index 195863b2c013..72f3ca4dd076 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/hv_vhca_stats.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/hv_vhca_stats.c
@@ -54,6 +54,12 @@ static int mlx5e_hv_vhca_stats_buf_size(struct mlx5e_priv *priv)
priv->stats_nch);
}
+static int mlx5e_hv_vhca_stats_buf_max_size(struct mlx5e_priv *priv)
+{
+ return (sizeof(struct mlx5e_hv_vhca_per_ring_stats) *
+ max(priv->max_nch, priv->stats_nch));
+}
+
static void mlx5e_hv_vhca_stats_work(struct work_struct *work)
{
struct mlx5e_hv_vhca_stats_agent *sagent;
@@ -122,7 +128,7 @@ static void mlx5e_hv_vhca_stats_cleanup(struct mlx5_hv_vhca_agent *agent)
void mlx5e_hv_vhca_stats_create(struct mlx5e_priv *priv)
{
- int buf_len = mlx5e_hv_vhca_stats_buf_size(priv);
+ int buf_len = mlx5e_hv_vhca_stats_buf_max_size(priv);
struct mlx5_hv_vhca_agent *agent;
priv->stats_agent.buf = kvzalloc(buf_len, GFP_KERNEL);
@@ -155,5 +161,7 @@ void mlx5e_hv_vhca_stats_destroy(struct mlx5e_priv *priv)
return;
mlx5_hv_vhca_agent_destroy(priv->stats_agent.agent);
+ priv->stats_agent.agent = NULL;
kvfree(priv->stats_agent.buf);
+ priv->stats_agent.buf = NULL;
}
--
2.44.0
* [PATCH net V4 0/3] net/mlx5e: Fix crashes in dynamic per-channel stats and HV VHCA agent
From: Tariq Toukan @ 2026-06-30 11:51 UTC (permalink / raw)
To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
netdev, Paolo Abeni
Cc: Cosmin Ratiu, Eran Ben Elisha, Feng Liu, Haiyang Zhang,
Lama Kayal, Leon Romanovsky, linux-kernel, linux-rdma, Mark Bloch,
Nimrod Oren, Saeed Mahameed, Tariq Toukan, Gal Pressman,
Alexei Lazar, Simon Horman, Carolina Jubran, Kees Cook,
Eran Ben Elisha, Saeed Mahameed
Hi,
Since per-channel stats were converted to be allocated and published
lazily at first channel open in commit fa691d0c9c08 ("net/mlx5e:
Allocate per-channel stats dynamically at first usage"),
priv->channel_stats[] and priv->stats_nch are filled in
incrementally during interface bring-up. This opened a window in
which the various stats readers - most of them reachable from
userspace via netlink/netdev stats queries - can race with
mlx5e_open_channel() on another CPU and observe partially
initialized state. The HV VHCA stats agent, which is created
before the channels are opened, hits related problems of its own.
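
For readers unfamiliar with this class of bug, the publication
pattern can be sketched in userspace with C11 atomics. This is an
assumption-laden illustration, not the series' implementation: the
model_* names are hypothetical, a single writer is assumed, and the
actual mlx5e_stats_nch_read() helper is defined in patch 3/3 and may
use different primitives.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

#define MODEL_MAX_NCH 4

struct model_channel_stats { unsigned long packets; };

struct model_priv {
	struct model_channel_stats *channel_stats[MODEL_MAX_NCH];
	atomic_ushort stats_nch;  /* number of fully published entries */
};

/* Writer: fill the slot first, then publish it by bumping the counter. */
static int model_publish_channel(struct model_priv *priv, unsigned long packets)
{
	unsigned short n = atomic_load_explicit(&priv->stats_nch,
						memory_order_relaxed);
	struct model_channel_stats *cs;

	if (n >= MODEL_MAX_NCH)
		return -1;
	cs = calloc(1, sizeof(*cs));
	if (!cs)
		return -1;
	cs->packets = packets;
	priv->channel_stats[n] = cs;
	/* Release: the slot contents become visible before the new count. */
	atomic_store_explicit(&priv->stats_nch, (unsigned short)(n + 1),
			      memory_order_release);
	return 0;
}

/* Reader side: acquire pairs with the writer's release store. */
static unsigned short model_stats_nch_read(struct model_priv *priv)
{
	return atomic_load_explicit(&priv->stats_nch, memory_order_acquire);
}

/* Snapshot the count once and only touch entries below it. */
static unsigned long model_sum_packets(struct model_priv *priv)
{
	unsigned short nch = model_stats_nch_read(priv);
	unsigned long sum = 0;
	unsigned short i;

	for (i = 0; i < nch; i++) {
		assert(priv->channel_stats[i] != NULL);
		sum += priv->channel_stats[i]->packets;
	}
	return sum;
}

static void model_free_channels(struct model_priv *priv)
{
	unsigned short nch = model_stats_nch_read(priv);
	unsigned short i;

	for (i = 0; i < nch; i++) {
		free(priv->channel_stats[i]);
		priv->channel_stats[i] = NULL;
	}
	atomic_store_explicit(&priv->stats_nch, 0, memory_order_release);
}
```

Reading the count once into a local, as the reader does here, matches
the shape of the converted loops in the series, where a local nch
replaces repeated reads of priv->stats_nch.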
This series by Feng fixes the resulting crashes.
Regards,
Tariq
V4:
- Patch 1/3: also clear priv->stats_agent.{agent,buf} to NULL in
mlx5e_hv_vhca_stats_destroy() after freeing them. Making the
allocation non-zero in V3 made the kvzalloc() failure path in
mlx5e_hv_vhca_stats_create() reachable for the first time; without
the NULL assignments a failed create followed by destroy would
double-free stale pointers from a previous cycle.
(Caught by Simon Horman.)
V3:
https://lore.kernel.org/all/20260622083646.593220-1-tariqt@nvidia.com/
V2:
https://lore.kernel.org/all/20260617140127.573117-1-tariqt@nvidia.com/
Feng Liu (3):
net/mlx5e: Fix HV VHCA stats zero-sized buffer allocation
net/mlx5e: Fix HV VHCA stats agent registration race
net/mlx5e: Fix publication race for priv->channel_stats[]
drivers/net/ethernet/mellanox/mlx5/core/en.h | 12 ++++++
.../mellanox/mlx5/core/en/hv_vhca_stats.c | 37 +++++++++++++------
.../net/ethernet/mellanox/mlx5/core/en_main.c | 14 ++++---
.../ethernet/mellanox/mlx5/core/en_stats.c | 9 +++--
.../ethernet/mellanox/mlx5/core/ipoib/ipoib.c | 3 +-
.../ethernet/mellanox/mlx5/core/lib/hv_vhca.c | 8 +++-
.../ethernet/mellanox/mlx5/core/lib/hv_vhca.h | 6 ++-
7 files changed, 62 insertions(+), 27 deletions(-)
base-commit: dbf803bc4a8b0522c9a12560c20905a5952d1cb9
--
2.44.0