* [PATCH v3 5/5] net: qrtr: ns: Fix use-after-free in driver remove()
From: Manivannan Sadhasivam via B4 Relay @ 2026-04-09 17:34 UTC (permalink / raw)
To: Manivannan Sadhasivam, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman
Cc: linux-arm-msm, netdev, linux-kernel, stable,
Manivannan Sadhasivam
In-Reply-To: <20260409-qrtr-fix-v3-0-00a8a5ff2b51@oss.qualcomm.com>
From: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
In the remove callback, if a packet arrives after destroy_workqueue() is
called, but before sock_release(), the qrtr_ns_data_ready() callback will
try to queue the work, causing use-after-free issue.
Fix this issue by saving the default 'sk_data_ready' callback during
qrtr_ns_init() and use it to replace the qrtr_ns_data_ready() callback at
the start of remove(). This ensures that even if a packet arrives after
destroy_workqueue(), the work struct will not be dereferenced.
Note that it is also required to ensure that the RX threads are completed
before destroying the workqueue, because the threads could be using the
qrtr_ns_data_ready() callback.
Cc: stable@vger.kernel.org
Fixes: 0c2204a4ad71 ("net: qrtr: Migrate nameservice to kernel from userspace")
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
---
net/qrtr/ns.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/net/qrtr/ns.c b/net/qrtr/ns.c
index c0418764470b..b3f9bbcf9ab9 100644
--- a/net/qrtr/ns.c
+++ b/net/qrtr/ns.c
@@ -25,6 +25,7 @@ static struct {
u32 lookup_count;
struct workqueue_struct *workqueue;
struct work_struct work;
+ void (*saved_data_ready)(struct sock *sk);
int local_node;
} qrtr_ns;
@@ -757,6 +758,7 @@ int qrtr_ns_init(void)
goto err_sock;
}
+ qrtr_ns.saved_data_ready = qrtr_ns.sock->sk->sk_data_ready;
qrtr_ns.sock->sk->sk_data_ready = qrtr_ns_data_ready;
sq.sq_port = QRTR_PORT_CTRL;
@@ -797,6 +799,10 @@ int qrtr_ns_init(void)
return 0;
err_wq:
+ write_lock_bh(&qrtr_ns.sock->sk->sk_callback_lock);
+ qrtr_ns.sock->sk->sk_data_ready = qrtr_ns.saved_data_ready;
+ write_unlock_bh(&qrtr_ns.sock->sk->sk_callback_lock);
+
destroy_workqueue(qrtr_ns.workqueue);
err_sock:
sock_release(qrtr_ns.sock);
@@ -806,7 +812,12 @@ EXPORT_SYMBOL_GPL(qrtr_ns_init);
void qrtr_ns_remove(void)
{
+ write_lock_bh(&qrtr_ns.sock->sk->sk_callback_lock);
+ qrtr_ns.sock->sk->sk_data_ready = qrtr_ns.saved_data_ready;
+ write_unlock_bh(&qrtr_ns.sock->sk->sk_callback_lock);
+
cancel_work_sync(&qrtr_ns.work);
+ synchronize_net();
destroy_workqueue(qrtr_ns.workqueue);
/* sock_release() expects the two references that were put during
--
2.51.0
^ permalink raw reply related
* [PATCH v3 2/5] net: qrtr: ns: Limit the maximum number of lookups
From: Manivannan Sadhasivam via B4 Relay @ 2026-04-09 17:34 UTC (permalink / raw)
To: Manivannan Sadhasivam, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman
Cc: linux-arm-msm, netdev, linux-kernel, stable,
Manivannan Sadhasivam
In-Reply-To: <20260409-qrtr-fix-v3-0-00a8a5ff2b51@oss.qualcomm.com>
From: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
Current code does no bound checking on the number of lookups a client can
perform. Though the code restricts the lookups to local clients, there is
still a possibility of a malicious local client sending a flood of
NEW_LOOKUP messages over the same socket.
Fix this issue by limiting the maximum number of lookups to 64 globally.
Since the nameserver allows only atmost one local observer, this global
lookup count will ensure that the lookups stay within the limit.
Note that, limit of 64 is chosen based on the current platform
requirements. If requirement changes in the future, this limit can be
increased.
Cc: stable@vger.kernel.org
Fixes: 0c2204a4ad71 ("net: qrtr: Migrate nameservice to kernel from userspace")
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
---
net/qrtr/ns.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/net/qrtr/ns.c b/net/qrtr/ns.c
index 63cb5861d87a..5b08d4d4840a 100644
--- a/net/qrtr/ns.c
+++ b/net/qrtr/ns.c
@@ -22,6 +22,7 @@ static struct {
struct socket *sock;
struct sockaddr_qrtr bcast_sq;
struct list_head lookups;
+ u32 lookup_count;
struct workqueue_struct *workqueue;
struct work_struct work;
int local_node;
@@ -70,10 +71,11 @@ struct qrtr_node {
u32 server_count;
};
-/* Max server limit is chosen based on the current platform requirements. If the
- * requirement changes in the future, this value can be increased.
+/* Max server, lookup limits are chosen based on the current platform requirements.
+ * If the requirement changes in the future, these values can be increased.
*/
#define QRTR_NS_MAX_SERVERS 256
+#define QRTR_NS_MAX_LOOKUPS 64
static struct qrtr_node *node_get(unsigned int node_id)
{
@@ -433,6 +435,7 @@ static int ctrl_cmd_del_client(struct sockaddr_qrtr *from,
list_del(&lookup->li);
kfree(lookup);
+ qrtr_ns.lookup_count--;
}
/* Remove the server belonging to this port but don't broadcast
@@ -550,6 +553,11 @@ static int ctrl_cmd_new_lookup(struct sockaddr_qrtr *from,
if (from->sq_node != qrtr_ns.local_node)
return -EINVAL;
+ if (qrtr_ns.lookup_count >= QRTR_NS_MAX_LOOKUPS) {
+ pr_err_ratelimited("QRTR client node exceeds max lookup limit!\n");
+ return -ENOSPC;
+ }
+
lookup = kzalloc_obj(*lookup);
if (!lookup)
return -ENOMEM;
@@ -558,6 +566,7 @@ static int ctrl_cmd_new_lookup(struct sockaddr_qrtr *from,
lookup->service = service;
lookup->instance = instance;
list_add_tail(&lookup->li, &qrtr_ns.lookups);
+ qrtr_ns.lookup_count++;
memset(&filter, 0, sizeof(filter));
filter.service = service;
@@ -598,6 +607,7 @@ static void ctrl_cmd_del_lookup(struct sockaddr_qrtr *from,
list_del(&lookup->li);
kfree(lookup);
+ qrtr_ns.lookup_count--;
}
}
--
2.51.0
^ permalink raw reply related
* [PATCH v3 3/5] net: qrtr: ns: Free the node during ctrl_cmd_bye()
From: Manivannan Sadhasivam via B4 Relay @ 2026-04-09 17:34 UTC (permalink / raw)
To: Manivannan Sadhasivam, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman
Cc: linux-arm-msm, netdev, linux-kernel, stable,
Manivannan Sadhasivam
In-Reply-To: <20260409-qrtr-fix-v3-0-00a8a5ff2b51@oss.qualcomm.com>
From: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
A node sends the BYE packet when it is about to go down. So the nameserver
should advertise the removal of the node to all remote and local observers
and free the node finally. But currently, the nameserver doesn't free the
node memory even after processing the BYE packet. This causes the node
memory to leak.
Hence, remove the node from Xarray list and free the node memory during
both success and failure case of ctrl_cmd_bye().
Cc: stable@vger.kernel.org
Fixes: 0c2204a4ad71 ("net: qrtr: Migrate nameservice to kernel from userspace")
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
---
net/qrtr/ns.c | 20 +++++++++++++++-----
1 file changed, 15 insertions(+), 5 deletions(-)
diff --git a/net/qrtr/ns.c b/net/qrtr/ns.c
index 5b08d4d4840a..1b9a90240a68 100644
--- a/net/qrtr/ns.c
+++ b/net/qrtr/ns.c
@@ -359,7 +359,7 @@ static int ctrl_cmd_bye(struct sockaddr_qrtr *from)
struct qrtr_node *node;
unsigned long index;
struct kvec iv;
- int ret;
+ int ret = 0;
iv.iov_base = &pkt;
iv.iov_len = sizeof(pkt);
@@ -374,8 +374,10 @@ static int ctrl_cmd_bye(struct sockaddr_qrtr *from)
/* Advertise the removal of this client to all local servers */
local_node = node_get(qrtr_ns.local_node);
- if (!local_node)
- return 0;
+ if (!local_node) {
+ ret = 0;
+ goto delete_node;
+ }
memset(&pkt, 0, sizeof(pkt));
pkt.cmd = cpu_to_le32(QRTR_TYPE_BYE);
@@ -392,10 +394,18 @@ static int ctrl_cmd_bye(struct sockaddr_qrtr *from)
ret = kernel_sendmsg(qrtr_ns.sock, &msg, &iv, 1, sizeof(pkt));
if (ret < 0 && ret != -ENODEV) {
pr_err("failed to send bye cmd\n");
- return ret;
+ goto delete_node;
}
}
- return 0;
+
+ /* Ignore -ENODEV */
+ ret = 0;
+
+delete_node:
+ xa_erase(&nodes, from->sq_node);
+ kfree(node);
+
+ return ret;
}
static int ctrl_cmd_del_client(struct sockaddr_qrtr *from,
--
2.51.0
^ permalink raw reply related
* [PATCH v3 4/5] net: qrtr: ns: Limit the total number of nodes
From: Manivannan Sadhasivam via B4 Relay @ 2026-04-09 17:34 UTC (permalink / raw)
To: Manivannan Sadhasivam, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman
Cc: linux-arm-msm, netdev, linux-kernel, stable,
Manivannan Sadhasivam
In-Reply-To: <20260409-qrtr-fix-v3-0-00a8a5ff2b51@oss.qualcomm.com>
From: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
Currently, the nameserver doesn't limit the number of nodes it handles.
This can be an attack vector if a malicious client starts registering
random nodes, leading to memory exhaustion.
Hence, limit the maximum number of nodes to 64. Note that, limit of 64 is
chosen based on the current platform requirements. If requirement changes
in the future, this limit can be increased.
Cc: stable@vger.kernel.org
Fixes: 0c2204a4ad71 ("net: qrtr: Migrate nameservice to kernel from userspace")
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
---
net/qrtr/ns.c | 16 ++++++++++++++--
1 file changed, 14 insertions(+), 2 deletions(-)
diff --git a/net/qrtr/ns.c b/net/qrtr/ns.c
index 1b9a90240a68..c0418764470b 100644
--- a/net/qrtr/ns.c
+++ b/net/qrtr/ns.c
@@ -71,12 +71,16 @@ struct qrtr_node {
u32 server_count;
};
-/* Max server, lookup limits are chosen based on the current platform requirements.
- * If the requirement changes in the future, these values can be increased.
+/* Max nodes, server, lookup limits are chosen based on the current platform
+ * requirements. If the requirement changes in the future, these values can be
+ * increased.
*/
+#define QRTR_NS_MAX_NODES 64
#define QRTR_NS_MAX_SERVERS 256
#define QRTR_NS_MAX_LOOKUPS 64
+static u8 node_count;
+
static struct qrtr_node *node_get(unsigned int node_id)
{
struct qrtr_node *node;
@@ -85,6 +89,11 @@ static struct qrtr_node *node_get(unsigned int node_id)
if (node)
return node;
+ if (node_count >= QRTR_NS_MAX_NODES) {
+ pr_err_ratelimited("QRTR clients exceed max node limit!\n");
+ return NULL;
+ }
+
/* If node didn't exist, allocate and insert it to the tree */
node = kzalloc_obj(*node);
if (!node)
@@ -98,6 +107,8 @@ static struct qrtr_node *node_get(unsigned int node_id)
return NULL;
}
+ node_count++;
+
return node;
}
@@ -404,6 +415,7 @@ static int ctrl_cmd_bye(struct sockaddr_qrtr *from)
delete_node:
xa_erase(&nodes, from->sq_node);
kfree(node);
+ node_count--;
return ret;
}
--
2.51.0
^ permalink raw reply related
* [PATCH v3 0/5] net: qrtr: ns: A bunch of fixs
From: Manivannan Sadhasivam via B4 Relay @ 2026-04-09 17:34 UTC (permalink / raw)
To: Manivannan Sadhasivam, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman
Cc: linux-arm-msm, netdev, linux-kernel, stable, Yiming Qian,
Manivannan Sadhasivam
Hi,
This series fixes a bunch of possible memory exhaustion issues in the QRTR
nameserver.
---
Changes in v3:
- Fixed the issues in remove() callback and other places reported by Sashiko
- Link to v2: https://patch.msgid.link/20260403-qrtr-fix-v2-0-f88a14859c63@oss.qualcomm.com
---
Manivannan Sadhasivam (5):
net: qrtr: ns: Limit the maximum server registration per node
net: qrtr: ns: Limit the maximum number of lookups
net: qrtr: ns: Free the node during ctrl_cmd_bye()
net: qrtr: ns: Limit the total number of nodes
net: qrtr: ns: Fix use-after-free in driver remove()
net/qrtr/ns.c | 79 +++++++++++++++++++++++++++++++++++++++++++++++++++--------
1 file changed, 69 insertions(+), 10 deletions(-)
---
base-commit: 6de23f81a5e08be8fbf5e8d7e9febc72a5b5f27f
change-id: 20260331-qrtr-fix-b502c26e5f46
Best regards,
--
Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
^ permalink raw reply
* [PATCH v3 1/5] net: qrtr: ns: Limit the maximum server registration per node
From: Manivannan Sadhasivam via B4 Relay @ 2026-04-09 17:34 UTC (permalink / raw)
To: Manivannan Sadhasivam, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman
Cc: linux-arm-msm, netdev, linux-kernel, stable, Yiming Qian,
Manivannan Sadhasivam
In-Reply-To: <20260409-qrtr-fix-v3-0-00a8a5ff2b51@oss.qualcomm.com>
From: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
Current code does no bound checking on the number of servers added per
node. A malicious client can flood NEW_SERVER messages and exhaust memory.
Fix this issue by limiting the maximum number of server registrations to
256 per node. If the NEW_SERVER message is received for an old port, then
don't restrict it as it will get replaced. While at it, also rate limit
the error messages in the failure path of qrtr_ns_worker().
Note that the limit of 256 is chosen based on the current platform
requirements. If requirement changes in the future, this limit can be
increased.
Cc: stable@vger.kernel.org
Fixes: 0c2204a4ad71 ("net: qrtr: Migrate nameservice to kernel from userspace")
Reported-by: Yiming Qian <yimingqian591@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
---
net/qrtr/ns.c | 26 +++++++++++++++++++++-----
1 file changed, 21 insertions(+), 5 deletions(-)
diff --git a/net/qrtr/ns.c b/net/qrtr/ns.c
index 3203b2220860..63cb5861d87a 100644
--- a/net/qrtr/ns.c
+++ b/net/qrtr/ns.c
@@ -67,8 +67,14 @@ struct qrtr_server {
struct qrtr_node {
unsigned int id;
struct xarray servers;
+ u32 server_count;
};
+/* Max server limit is chosen based on the current platform requirements. If the
+ * requirement changes in the future, this value can be increased.
+ */
+#define QRTR_NS_MAX_SERVERS 256
+
static struct qrtr_node *node_get(unsigned int node_id)
{
struct qrtr_node *node;
@@ -229,6 +235,17 @@ static struct qrtr_server *server_add(unsigned int service,
if (!service || !port)
return NULL;
+ node = node_get(node_id);
+ if (!node)
+ return NULL;
+
+ /* Make sure the new servers per port are capped at the maximum value */
+ old = xa_load(&node->servers, port);
+ if (!old && node->server_count >= QRTR_NS_MAX_SERVERS) {
+ pr_err_ratelimited("QRTR client node %u exceeds max server limit!\n", node_id);
+ return NULL;
+ }
+
srv = kzalloc_obj(*srv);
if (!srv)
return NULL;
@@ -238,10 +255,6 @@ static struct qrtr_server *server_add(unsigned int service,
srv->node = node_id;
srv->port = port;
- node = node_get(node_id);
- if (!node)
- goto err;
-
/* Delete the old server on the same port */
old = xa_store(&node->servers, port, srv, GFP_KERNEL);
if (old) {
@@ -252,6 +265,8 @@ static struct qrtr_server *server_add(unsigned int service,
} else {
kfree(old);
}
+ } else {
+ node->server_count++;
}
trace_qrtr_ns_server_add(srv->service, srv->instance,
@@ -292,6 +307,7 @@ static int server_del(struct qrtr_node *node, unsigned int port, bool bcast)
}
kfree(srv);
+ node->server_count--;
return 0;
}
@@ -670,7 +686,7 @@ static void qrtr_ns_worker(struct work_struct *work)
}
if (ret < 0)
- pr_err("failed while handling packet from %d:%d",
+ pr_err_ratelimited("failed while handling packet from %d:%d",
sq.sq_node, sq.sq_port);
}
--
2.51.0
^ permalink raw reply related
* [syzbot] [mptcp?] possible deadlock in mptcp_pm_mp_prio_send_ack
From: syzbot @ 2026-04-09 17:13 UTC (permalink / raw)
To: davem, edumazet, geliang, horms, kuba, linux-kernel, martineau,
matttbe, mptcp, netdev, pabeni, syzkaller-bugs
Hello,
syzbot found the following issue on:
HEAD commit: 1caa871bb061 Merge branch 'net-stmmac-fix-tegra234-mgbe-cl..
git tree: net
console output: https://syzkaller.appspot.com/x/log.txt?x=11d74e06580000
kernel config: https://syzkaller.appspot.com/x/.config?x=6754c86e8d9e4c91
dashboard link: https://syzkaller.appspot.com/bug?extid=2204dbe6a049b3218db9
compiler: Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8
Unfortunately, I don't have any reproducer for this issue yet.
Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/014aae23b990/disk-1caa871b.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/c574a710638c/vmlinux-1caa871b.xz
kernel image: https://storage.googleapis.com/syzbot-assets/b29909f4efc4/bzImage-1caa871b.xz
IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+2204dbe6a049b3218db9@syzkaller.appspotmail.com
netlink: 8 bytes leftover after parsing attributes in process `syz.2.2034'.
netlink: 8 bytes leftover after parsing attributes in process `syz.2.2034'.
======================================================
WARNING: possible circular locking dependency detected
syzkaller #0 Not tainted
------------------------------------------------------
syz.2.2034/13659 is trying to acquire lock:
ffff888031173560 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_pm_mp_prio_send_ack+0xaf8/0xba0 net/mptcp/pm.c:296
but task is already holding lock:
ffff88807e300ea0 (sk_lock-AF_INET){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1709 [inline]
ffff88807e300ea0 (sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_pm_nl_set_flags_all net/mptcp/pm_kernel.c:1482 [inline]
ffff88807e300ea0 (sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_pm_nl_set_flags+0x795/0xc90 net/mptcp/pm_kernel.c:1551
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #7 (sk_lock-AF_INET){+.+.}-{0:0}:
lock_sock_nested+0x48/0x100 net/core/sock.c:3780
lock_sock include/net/sock.h:1709 [inline]
inet_shutdown+0x6a/0x390 net/ipv4/af_inet.c:919
nbd_mark_nsock_dead+0x2e9/0x560 drivers/block/nbd.c:318
recv_work+0x1c7f/0x1d90 drivers/block/nbd.c:1021
process_one_work kernel/workqueue.c:3276 [inline]
process_scheduled_works+0xb6e/0x18c0 kernel/workqueue.c:3359
worker_thread+0xa53/0xfc0 kernel/workqueue.c:3440
kthread+0x388/0x470 kernel/kthread.c:436
ret_from_fork+0x51e/0xb90 arch/x86/kernel/process.c:158
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
-> #6 (&nsock->tx_lock){+.+.}-{4:4}:
__mutex_lock_common kernel/locking/mutex.c:614 [inline]
__mutex_lock+0x19f/0x1300 kernel/locking/mutex.c:776
nbd_handle_cmd drivers/block/nbd.c:1143 [inline]
nbd_queue_rq+0x37b/0x1100 drivers/block/nbd.c:1207
blk_mq_dispatch_rq_list+0xa70/0x1910 block/blk-mq.c:2148
__blk_mq_do_dispatch_sched block/blk-mq-sched.c:168 [inline]
blk_mq_do_dispatch_sched block/blk-mq-sched.c:182 [inline]
__blk_mq_sched_dispatch_requests+0xdcc/0x1600 block/blk-mq-sched.c:307
blk_mq_sched_dispatch_requests+0xd7/0x190 block/blk-mq-sched.c:329
blk_mq_run_hw_queue+0x348/0x4f0 block/blk-mq.c:2386
blk_mq_dispatch_list+0xd16/0xe10 include/linux/spinlock.h:-1
blk_mq_flush_plug_list+0x48d/0x570 block/blk-mq.c:2997
__blk_flush_plug+0x3ed/0x4d0 block/blk-core.c:1230
blk_finish_plug block/blk-core.c:1257 [inline]
__submit_bio+0x28d/0x580 block/blk-core.c:649
__submit_bio_noacct_mq block/blk-core.c:722 [inline]
submit_bio_noacct_nocheck+0x2f4/0xa70 block/blk-core.c:753
submit_bh fs/buffer.c:2826 [inline]
block_read_full_folio+0x599/0x830 fs/buffer.c:2444
filemap_read_folio+0x137/0x3b0 mm/filemap.c:2501
do_read_cache_folio+0x358/0x590 mm/filemap.c:4101
read_mapping_folio include/linux/pagemap.h:1017 [inline]
read_part_sector+0xb6/0x2b0 block/partitions/core.c:723
adfspart_check_ICS+0xa5/0xa40 block/partitions/acorn.c:360
check_partition block/partitions/core.c:142 [inline]
blk_add_partitions block/partitions/core.c:590 [inline]
bdev_disk_changed+0x7ba/0x1550 block/partitions/core.c:694
blkdev_get_whole+0x380/0x510 block/bdev.c:764
bdev_open+0x31e/0xd30 block/bdev.c:973
blkdev_open+0x470/0x610 block/fops.c:697
do_dentry_open+0x785/0x14e0 fs/open.c:949
vfs_open+0x3b/0x340 fs/open.c:1081
do_open fs/namei.c:4677 [inline]
path_openat+0x2e08/0x3860 fs/namei.c:4836
do_file_open+0x23e/0x4a0 fs/namei.c:4865
do_sys_openat2+0x113/0x200 fs/open.c:1366
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x138/0x170 fs/open.c:1383
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0x14d/0xf80 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
-> #5 (&cmd->lock){+.+.}-{4:4}:
__mutex_lock_common kernel/locking/mutex.c:614 [inline]
__mutex_lock+0x19f/0x1300 kernel/locking/mutex.c:776
nbd_queue_rq+0xc6/0x1100 drivers/block/nbd.c:1199
blk_mq_dispatch_rq_list+0xa70/0x1910 block/blk-mq.c:2148
__blk_mq_do_dispatch_sched block/blk-mq-sched.c:168 [inline]
blk_mq_do_dispatch_sched block/blk-mq-sched.c:182 [inline]
__blk_mq_sched_dispatch_requests+0xdcc/0x1600 block/blk-mq-sched.c:307
blk_mq_sched_dispatch_requests+0xd7/0x190 block/blk-mq-sched.c:329
blk_mq_run_hw_queue+0x348/0x4f0 block/blk-mq.c:2386
blk_mq_dispatch_list+0xd16/0xe10 include/linux/spinlock.h:-1
blk_mq_flush_plug_list+0x48d/0x570 block/blk-mq.c:2997
__blk_flush_plug+0x3ed/0x4d0 block/blk-core.c:1230
blk_finish_plug block/blk-core.c:1257 [inline]
__submit_bio+0x28d/0x580 block/blk-core.c:649
__submit_bio_noacct_mq block/blk-core.c:722 [inline]
submit_bio_noacct_nocheck+0x2f4/0xa70 block/blk-core.c:753
submit_bh fs/buffer.c:2826 [inline]
block_read_full_folio+0x599/0x830 fs/buffer.c:2444
filemap_read_folio+0x137/0x3b0 mm/filemap.c:2501
do_read_cache_folio+0x358/0x590 mm/filemap.c:4101
read_mapping_folio include/linux/pagemap.h:1017 [inline]
read_part_sector+0xb6/0x2b0 block/partitions/core.c:723
adfspart_check_ICS+0xa5/0xa40 block/partitions/acorn.c:360
check_partition block/partitions/core.c:142 [inline]
blk_add_partitions block/partitions/core.c:590 [inline]
bdev_disk_changed+0x7ba/0x1550 block/partitions/core.c:694
blkdev_get_whole+0x380/0x510 block/bdev.c:764
bdev_open+0x31e/0xd30 block/bdev.c:973
blkdev_open+0x470/0x610 block/fops.c:697
do_dentry_open+0x785/0x14e0 fs/open.c:949
vfs_open+0x3b/0x340 fs/open.c:1081
do_open fs/namei.c:4677 [inline]
path_openat+0x2e08/0x3860 fs/namei.c:4836
do_file_open+0x23e/0x4a0 fs/namei.c:4865
do_sys_openat2+0x113/0x200 fs/open.c:1366
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x138/0x170 fs/open.c:1383
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0x14d/0xf80 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
-> #4 (set->srcu){.+.+}-{0:0}:
srcu_lock_sync include/linux/srcu.h:199 [inline]
__synchronize_srcu+0xca/0x300 kernel/rcu/srcutree.c:1481
elevator_switch+0x1e8/0x7a0 block/elevator.c:576
elevator_change+0x2cc/0x450 block/elevator.c:681
elevator_set_default+0x36c/0x430 block/elevator.c:754
blk_register_queue+0x366/0x430 block/blk-sysfs.c:946
__add_disk+0x677/0xd50 block/genhd.c:528
add_disk_fwnode+0xfb/0x480 block/genhd.c:597
add_disk include/linux/blkdev.h:785 [inline]
nbd_dev_add+0x72c/0xb50 drivers/block/nbd.c:1984
nbd_init+0x168/0x1f0 drivers/block/nbd.c:2692
do_one_initcall+0x250/0x8d0 init/main.c:1382
do_initcall_level+0x104/0x190 init/main.c:1444
do_initcalls+0x59/0xa0 init/main.c:1460
kernel_init_freeable+0x2a6/0x3e0 init/main.c:1692
kernel_init+0x1d/0x1d0 init/main.c:1582
ret_from_fork+0x51e/0xb90 arch/x86/kernel/process.c:158
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
-> #3 (&q->elevator_lock){+.+.}-{4:4}:
__mutex_lock_common kernel/locking/mutex.c:614 [inline]
__mutex_lock+0x19f/0x1300 kernel/locking/mutex.c:776
elevator_change+0x1b3/0x450 block/elevator.c:679
elevator_set_none+0xb5/0x140 block/elevator.c:769
blk_mq_elv_switch_none block/blk-mq.c:5110 [inline]
__blk_mq_update_nr_hw_queues block/blk-mq.c:5155 [inline]
blk_mq_update_nr_hw_queues+0x5e7/0x1a60 block/blk-mq.c:5220
nbd_start_device+0x17f/0xb10 drivers/block/nbd.c:1489
nbd_genl_connect+0x165b/0x1cf0 drivers/block/nbd.c:2239
genl_family_rcv_msg_doit+0x22a/0x330 net/netlink/genetlink.c:1114
genl_family_rcv_msg net/netlink/genetlink.c:1194 [inline]
genl_rcv_msg+0x61c/0x7a0 net/netlink/genetlink.c:1209
netlink_rcv_skb+0x232/0x4b0 net/netlink/af_netlink.c:2550
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1218
netlink_unicast_kernel net/netlink/af_netlink.c:1318 [inline]
netlink_unicast+0x80f/0x9b0 net/netlink/af_netlink.c:1344
netlink_sendmsg+0x813/0xb40 net/netlink/af_netlink.c:1894
sock_sendmsg_nosec net/socket.c:727 [inline]
__sock_sendmsg net/socket.c:742 [inline]
____sys_sendmsg+0x972/0x9f0 net/socket.c:2592
___sys_sendmsg+0x2a5/0x360 net/socket.c:2646
__sys_sendmsg net/socket.c:2678 [inline]
__do_sys_sendmsg net/socket.c:2683 [inline]
__se_sys_sendmsg net/socket.c:2681 [inline]
__x64_sys_sendmsg+0x1bd/0x2a0 net/socket.c:2681
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0x14d/0xf80 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
-> #2 (&q->q_usage_counter(io)#49){++++}-{0:0}:
blk_alloc_queue+0x546/0x680 block/blk-core.c:461
blk_mq_alloc_queue block/blk-mq.c:4429 [inline]
__blk_mq_alloc_disk+0x197/0x390 block/blk-mq.c:4476
nbd_dev_add+0x499/0xb50 drivers/block/nbd.c:1954
nbd_init+0x168/0x1f0 drivers/block/nbd.c:2692
do_one_initcall+0x250/0x8d0 init/main.c:1382
do_initcall_level+0x104/0x190 init/main.c:1444
do_initcalls+0x59/0xa0 init/main.c:1460
kernel_init_freeable+0x2a6/0x3e0 init/main.c:1692
kernel_init+0x1d/0x1d0 init/main.c:1582
ret_from_fork+0x51e/0xb90 arch/x86/kernel/process.c:158
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
-> #1 (fs_reclaim){+.+.}-{0:0}:
__fs_reclaim_acquire mm/page_alloc.c:4348 [inline]
fs_reclaim_acquire+0x71/0x100 mm/page_alloc.c:4362
might_alloc include/linux/sched/mm.h:317 [inline]
slab_pre_alloc_hook mm/slub.c:4489 [inline]
slab_alloc_node mm/slub.c:4843 [inline]
__do_kmalloc_node mm/slub.c:5259 [inline]
__kmalloc_node_track_caller_noprof+0x9d/0x7b0 mm/slub.c:5368
memdup_sockptr_noprof include/linux/sockptr.h:126 [inline]
xfrm_user_policy+0x1c4/0x960 net/xfrm/xfrm_state.c:2975
do_ip_setsockopt+0x1aab/0x2ea0 net/ipv4/ip_sockglue.c:1347
ip_setsockopt+0x66/0x110 net/ipv4/ip_sockglue.c:1417
smc_setsockopt+0x249/0xac0 net/smc/af_smc.c:3110
do_sock_setsockopt+0x17c/0x1b0 net/socket.c:2322
__sys_setsockopt net/socket.c:2347 [inline]
__do_sys_setsockopt net/socket.c:2353 [inline]
__se_sys_setsockopt net/socket.c:2350 [inline]
__x64_sys_setsockopt+0x13d/0x1b0 net/socket.c:2350
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0x14d/0xf80 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
-> #0 (k-sk_lock-AF_INET){+.+.}-{0:0}:
check_prev_add kernel/locking/lockdep.c:3165 [inline]
check_prevs_add kernel/locking/lockdep.c:3284 [inline]
validate_chain kernel/locking/lockdep.c:3908 [inline]
__lock_acquire+0x15a5/0x2cf0 kernel/locking/lockdep.c:5237
lock_acquire+0xf0/0x2e0 kernel/locking/lockdep.c:5868
lock_sock_fast include/net/sock.h:1741 [inline]
__mptcp_pm_send_ack+0x76/0x200 net/mptcp/pm.c:196
mptcp_pm_mp_prio_send_ack+0xaf8/0xba0 net/mptcp/pm.c:296
mptcp_pm_nl_set_flags_all net/mptcp/pm_kernel.c:1484 [inline]
mptcp_pm_nl_set_flags+0x893/0xc90 net/mptcp/pm_kernel.c:1551
mptcp_pm_set_flags net/mptcp/pm_netlink.c:277 [inline]
mptcp_pm_nl_set_flags_doit+0x364/0x450 net/mptcp/pm_netlink.c:282
genl_family_rcv_msg_doit+0x22a/0x330 net/netlink/genetlink.c:1114
genl_family_rcv_msg net/netlink/genetlink.c:1194 [inline]
genl_rcv_msg+0x61c/0x7a0 net/netlink/genetlink.c:1209
netlink_rcv_skb+0x232/0x4b0 net/netlink/af_netlink.c:2550
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1218
netlink_unicast_kernel net/netlink/af_netlink.c:1318 [inline]
netlink_unicast+0x80f/0x9b0 net/netlink/af_netlink.c:1344
netlink_sendmsg+0x813/0xb40 net/netlink/af_netlink.c:1894
sock_sendmsg_nosec net/socket.c:727 [inline]
__sock_sendmsg net/socket.c:742 [inline]
____sys_sendmsg+0x972/0x9f0 net/socket.c:2592
___sys_sendmsg+0x2a5/0x360 net/socket.c:2646
__sys_sendmsg net/socket.c:2678 [inline]
__do_sys_sendmsg net/socket.c:2683 [inline]
__se_sys_sendmsg net/socket.c:2681 [inline]
__x64_sys_sendmsg+0x1bd/0x2a0 net/socket.c:2681
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0x14d/0xf80 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
other info that might help us debug this:
Chain exists of:
k-sk_lock-AF_INET --> &nsock->tx_lock --> sk_lock-AF_INET
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(sk_lock-AF_INET);
lock(&nsock->tx_lock);
lock(sk_lock-AF_INET);
lock(k-sk_lock-AF_INET);
*** DEADLOCK ***
3 locks held by syz.2.2034/13659:
#0: ffffffff8fc3de70 (cb_lock){++++}-{4:4}, at: genl_rcv+0x19/0x40 net/netlink/genetlink.c:1217
#1: ffffffff8fc3dc88 (genl_mutex){+.+.}-{4:4}, at: genl_lock net/netlink/genetlink.c:35 [inline]
#1: ffffffff8fc3dc88 (genl_mutex){+.+.}-{4:4}, at: genl_op_lock net/netlink/genetlink.c:60 [inline]
#1: ffffffff8fc3dc88 (genl_mutex){+.+.}-{4:4}, at: genl_rcv_msg+0x10b/0x7a0 net/netlink/genetlink.c:1208
#2: ffff88807e300ea0 (sk_lock-AF_INET){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1709 [inline]
#2: ffff88807e300ea0 (sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_pm_nl_set_flags_all net/mptcp/pm_kernel.c:1482 [inline]
#2: ffff88807e300ea0 (sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_pm_nl_set_flags+0x795/0xc90 net/mptcp/pm_kernel.c:1551
stack backtrace:
CPU: 1 UID: 0 PID: 13659 Comm: syz.2.2034 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/18/2026
Call Trace:
<TASK>
dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120
print_circular_bug+0x2e1/0x300 kernel/locking/lockdep.c:2043
check_noncircular+0x12e/0x150 kernel/locking/lockdep.c:2175
check_prev_add kernel/locking/lockdep.c:3165 [inline]
check_prevs_add kernel/locking/lockdep.c:3284 [inline]
validate_chain kernel/locking/lockdep.c:3908 [inline]
__lock_acquire+0x15a5/0x2cf0 kernel/locking/lockdep.c:5237
lock_acquire+0xf0/0x2e0 kernel/locking/lockdep.c:5868
lock_sock_fast include/net/sock.h:1741 [inline]
__mptcp_pm_send_ack+0x76/0x200 net/mptcp/pm.c:196
mptcp_pm_mp_prio_send_ack+0xaf8/0xba0 net/mptcp/pm.c:296
mptcp_pm_nl_set_flags_all net/mptcp/pm_kernel.c:1484 [inline]
mptcp_pm_nl_set_flags+0x893/0xc90 net/mptcp/pm_kernel.c:1551
mptcp_pm_set_flags net/mptcp/pm_netlink.c:277 [inline]
mptcp_pm_nl_set_flags_doit+0x364/0x450 net/mptcp/pm_netlink.c:282
genl_family_rcv_msg_doit+0x22a/0x330 net/netlink/genetlink.c:1114
genl_family_rcv_msg net/netlink/genetlink.c:1194 [inline]
genl_rcv_msg+0x61c/0x7a0 net/netlink/genetlink.c:1209
netlink_rcv_skb+0x232/0x4b0 net/netlink/af_netlink.c:2550
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1218
netlink_unicast_kernel net/netlink/af_netlink.c:1318 [inline]
netlink_unicast+0x80f/0x9b0 net/netlink/af_netlink.c:1344
netlink_sendmsg+0x813/0xb40 net/netlink/af_netlink.c:1894
sock_sendmsg_nosec net/socket.c:727 [inline]
__sock_sendmsg net/socket.c:742 [inline]
____sys_sendmsg+0x972/0x9f0 net/socket.c:2592
___sys_sendmsg+0x2a5/0x360 net/socket.c:2646
__sys_sendmsg net/socket.c:2678 [inline]
__do_sys_sendmsg net/socket.c:2683 [inline]
__se_sys_sendmsg net/socket.c:2681 [inline]
__x64_sys_sendmsg+0x1bd/0x2a0 net/socket.c:2681
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0x14d/0xf80 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fb82ef9c819
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fb82fe60028 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007fb82f216090 RCX: 00007fb82ef9c819
RDX: 0000000000000000 RSI: 0000200000000400 RDI: 0000000000000009
RBP: 00007fb82f032c91 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007fb82f216128 R14: 00007fb82f216090 R15: 00007ffcfa510d88
</TASK>
---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.
syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title
If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)
If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report
If you want to undo deduplication, reply with:
#syz undup
^ permalink raw reply
* [PATCH net-next v4 4/4] net: dsa: yt921x: Add port qdisc tbf support
From: David Yang @ 2026-04-09 17:12 UTC (permalink / raw)
To: netdev
Cc: David Yang, Andrew Lunn, Vladimir Oltean, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, linux-kernel
In-Reply-To: <20260409171209.2575583-1-mmyangfl@gmail.com>
Enable port shaping and support limiting the rate of outgoing traffic.
Signed-off-by: David Yang <mmyangfl@gmail.com>
---
drivers/net/dsa/yt921x.c | 134 +++++++++++++++++++++++++++++++++++++++
drivers/net/dsa/yt921x.h | 65 ++++++++++++++++++-
2 files changed, 198 insertions(+), 1 deletion(-)
diff --git a/drivers/net/dsa/yt921x.c b/drivers/net/dsa/yt921x.c
index f0ebbcd2151c..b50d76b2d164 100644
--- a/drivers/net/dsa/yt921x.c
+++ b/drivers/net/dsa/yt921x.c
@@ -24,6 +24,7 @@
#include <net/dsa.h>
#include <net/dscp.h>
#include <net/ieee8021q.h>
+#include <net/pkt_cls.h>
#include "yt921x.h"
@@ -1274,6 +1275,19 @@ yt921x_marker_tfm_police(struct yt921x_marker *marker,
priv, port, extack);
}
+static int
+yt921x_marker_tfm_shape(struct yt921x_marker *marker, u64 rate, u64 burst,
+ unsigned int flags, bool queue,
+ struct yt921x_priv *priv, int port,
+ struct netlink_ext_ack *extack)
+{
+ return yt921x_marker_tfm(marker, rate, burst, flags,
+ queue ? priv->queue_shape_slot_ns :
+ priv->port_shape_slot_ns, YT921X_SHAPE_CIR_MAX,
+ YT921X_SHAPE_CBS_MAX, YT921X_SHAPE_UNIT_MAX,
+ priv, port, extack);
+}
+
static int
yt921x_police_validate(const struct flow_action_police *police,
const struct flow_action *action,
@@ -1380,6 +1394,111 @@ yt921x_dsa_port_policer_add(struct dsa_switch *ds, int port,
return res;
}
+static int
+yt921x_tbf_validate(struct yt921x_priv *priv,
+ const struct tc_tbf_qopt_offload *qopt, int *queuep)
+{
+ struct device *dev = to_device(priv);
+ int queue = -1;
+
+ /* TODO: queue support */
+ if (qopt->parent != TC_H_ROOT) {
+ dev_err(dev, "Parent should be \"root\"\n");
+ return -EOPNOTSUPP;
+ }
+
+ switch (qopt->command) {
+ case TC_TBF_REPLACE: {
+ const struct tc_tbf_qopt_offload_replace_params *p;
+
+ p = &qopt->replace_params;
+
+ if (!p->rate.mpu) {
+ dev_info(dev, "Assuming you want mpu = 64\n");
+ } else if (p->rate.mpu != 64) {
+ dev_err(dev, "mpu other than 64 not supported\n");
+ return -EINVAL;
+ }
+ break;
+ }
+ default:
+ break;
+ }
+
+ *queuep = queue;
+ return 0;
+}
+
+static int
+yt921x_dsa_port_setup_tc_tbf_port(struct dsa_switch *ds, int port,
+ const struct tc_tbf_qopt_offload *qopt)
+{
+ struct yt921x_priv *priv = to_yt921x_priv(ds);
+ u32 ctrls[2];
+ int res;
+
+ switch (qopt->command) {
+ case TC_TBF_DESTROY:
+ ctrls[0] = 0;
+ ctrls[1] = 0;
+ break;
+ case TC_TBF_REPLACE: {
+ const struct tc_tbf_qopt_offload_replace_params *p;
+ struct yt921x_marker marker;
+ u64 burst;
+
+ p = &qopt->replace_params;
+
+ /* where is burst??? */
+ burst = div_u64(priv->port_shape_slot_ns * p->rate.rate_bytes_ps,
+ 1000000000) + 10000;
+ res = yt921x_marker_tfm_shape(&marker, p->rate.rate_bytes_ps,
+ burst,
+ YT921X_MARKER_SINGLE_BUCKET,
+ false, priv, port, NULL);
+ if (res)
+ return res;
+
+ ctrls[0] = YT921X_PORT_SHAPE_CTRLa_CIR(marker.cir) |
+ YT921X_PORT_SHAPE_CTRLa_CBS(marker.cbs);
+ ctrls[1] = YT921X_PORT_SHAPE_CTRLb_UNIT(marker.unit) |
+ YT921X_PORT_SHAPE_CTRLb_EN;
+ break;
+ }
+ default:
+ return -EOPNOTSUPP;
+ }
+
+ mutex_lock(&priv->reg_lock);
+ res = yt921x_reg64_write(priv, YT921X_PORTn_SHAPE_CTRL(port), ctrls);
+ mutex_unlock(&priv->reg_lock);
+
+ return res;
+}
+
+static int
+yt921x_dsa_port_setup_tc(struct dsa_switch *ds, int port,
+ enum tc_setup_type type, void *type_data)
+{
+ struct yt921x_priv *priv = to_yt921x_priv(ds);
+ int res;
+
+ switch (type) {
+ case TC_SETUP_QDISC_TBF: {
+ const struct tc_tbf_qopt_offload *qopt = type_data;
+ int queue;
+
+ res = yt921x_tbf_validate(priv, qopt, &queue);
+ if (res)
+ return res;
+
+ return yt921x_dsa_port_setup_tc_tbf_port(ds, port, qopt);
+ }
+ default:
+ return -EOPNOTSUPP;
+ }
+}
+
static int
yt921x_mirror_del(struct yt921x_priv *priv, int port, bool ingress)
{
@@ -3526,6 +3645,20 @@ static int yt921x_chip_setup_tc(struct yt921x_priv *priv)
return res;
priv->meter_slot_ns = ctrl * op_ns;
+ ctrl = max(priv->port_shape_slot_ns / op_ns,
+ YT921X_PORT_SHAPE_SLOT_MIN);
+ res = yt921x_reg_write(priv, YT921X_PORT_SHAPE_SLOT, ctrl);
+ if (res)
+ return res;
+ priv->port_shape_slot_ns = ctrl * op_ns;
+
+ ctrl = max(priv->queue_shape_slot_ns / op_ns,
+ YT921X_QUEUE_SHAPE_SLOT_MIN);
+ res = yt921x_reg_write(priv, YT921X_QUEUE_SHAPE_SLOT, ctrl);
+ if (res)
+ return res;
+ priv->queue_shape_slot_ns = ctrl * op_ns;
+
return 0;
}
@@ -3682,6 +3815,7 @@ static const struct dsa_switch_ops yt921x_dsa_switch_ops = {
/* rate */
.port_policer_del = yt921x_dsa_port_policer_del,
.port_policer_add = yt921x_dsa_port_policer_add,
+ .port_setup_tc = yt921x_dsa_port_setup_tc,
/* hsr */
.port_hsr_leave = dsa_port_simple_hsr_leave,
.port_hsr_join = dsa_port_simple_hsr_join,
diff --git a/drivers/net/dsa/yt921x.h b/drivers/net/dsa/yt921x.h
index 01cc2298be94..1c4f4b56d1c8 100644
--- a/drivers/net/dsa/yt921x.h
+++ b/drivers/net/dsa/yt921x.h
@@ -524,6 +524,12 @@ enum yt921x_app_selector {
#define YT921X_PORT_VLAN_CTRL1_CVLAN_DROP_TAGGED BIT(1)
#define YT921X_PORT_VLAN_CTRL1_CVLAN_DROP_UNTAGGED BIT(0)
+#define YT921X_PORTn_PRIO_UCAST_QUEUE(port) (0x300200 + 4 * (port))
+#define YT921X_PORT_PRIOm_UCAST_QUEUE_M(m) (7 << (3 * (m)))
+#define YT921X_PORT_PRIOm_UCAST_QUEUE(m, x) ((x) << (3 * (m)))
+#define YT921X_PORTn_PRIO_MCAST_QUEUE(port) (0x300280 + 4 * (port))
+#define YT921X_PORT_PRIOm_MCAST_QUEUE_M(m) (3 << (2 * (m)))
+#define YT921X_PORT_PRIOm_MCAST_QUEUE(m, x) ((x) << (2 * (m)))
#define YT921X_MIRROR 0x300300
#define YT921X_MIRROR_IGR_PORTS_M GENMASK(26, 16)
#define YT921X_MIRROR_IGR_PORTS(x) FIELD_PREP(YT921X_MIRROR_IGR_PORTS_M, (x))
@@ -534,6 +540,47 @@ enum yt921x_app_selector {
#define YT921X_MIRROR_PORT_M GENMASK(3, 0)
#define YT921X_MIRROR_PORT(x) FIELD_PREP(YT921X_MIRROR_PORT_M, (x))
+#define YT921X_QUEUE_SHAPE_SLOT 0x340008
+#define YT921X_QUEUE_SHAPE_SLOT_SLOT_M GENMASK(11, 0)
+#define YT921X_PORT_SHAPE_SLOT 0x34000c
+#define YT921X_PORT_SHAPE_SLOT_SLOT_M GENMASK(11, 0)
+#define YT921X_QUEUEn_SCH(x) (0x341000 + 4 * (x))
+#define YT921X_QUEUE_SCH_E_DWRR_M GENMASK(27, 18)
+#define YT921X_QUEUE_SCH_E_DWRR(x) FIELD_PREP(YT921X_QUEUE_SCH_E_DWRR_M, (x))
+#define YT921X_QUEUE_SCH_C_DWRR_M GENMASK(17, 8)
+#define YT921X_QUEUE_SCH_C_DWRR(x) FIELD_PREP(YT921X_QUEUE_SCH_C_DWRR_M, (x))
+#define YT921X_QUEUE_SCH_E_PRIO_M GENMASK(7, 4)
+#define YT921X_QUEUE_SCH_E_PRIO(x) FIELD_PREP(YT921X_QUEUE_SCH_E_PRIO_M, (x))
+#define YT921X_QUEUE_SCH_C_PRIO_M GENMASK(3, 0)
+#define YT921X_QUEUE_SCH_C_PRIO(x) FIELD_PREP(YT921X_QUEUE_SCH_C_PRIO_M, (x))
+#define YT921X_C_DWRRn(x) (0x342000 + 4 * (x))
+#define YT921X_E_DWRRn(x) (0x343000 + 4 * (x))
+#define YT921X_DWRR_PKT_MODE BIT(0) /* 0: byte rate mode */
+#define YT921X_QUEUEn_SHAPE_CTRL(x) (0x34c000 + 0x10 * (x))
+#define YT921X_QUEUE_SHAPE_CTRLc_TOKEN_OVERFLOW_EN BIT(6)
+#define YT921X_QUEUE_SHAPE_CTRLc_E_EN BIT(5)
+#define YT921X_QUEUE_SHAPE_CTRLc_C_EN BIT(4)
+#define YT921X_QUEUE_SHAPE_CTRLc_PKT_MODE BIT(3) /* 0: byte rate mode */
+#define YT921X_QUEUE_SHAPE_CTRLc_UNIT_M GENMASK(2, 0)
+#define YT921X_QUEUE_SHAPE_CTRLc_UNIT(x) FIELD_PREP(YT921X_QUEUE_SHAPE_CTRLc_UNIT_M, (x))
+#define YT921X_QUEUE_SHAPE_CTRLb_EBS_M GENMASK(31, 18)
+#define YT921X_QUEUE_SHAPE_CTRLb_EBS(x) FIELD_PREP(YT921X_QUEUE_SHAPE_CTRLb_EBS_M, (x))
+#define YT921X_QUEUE_SHAPE_CTRLb_EIR_M GENMASK(17, 0)
+#define YT921X_QUEUE_SHAPE_CTRLb_EIR(x) FIELD_PREP(YT921X_QUEUE_SHAPE_CTRLb_EIR_M, (x))
+#define YT921X_QUEUE_SHAPE_CTRLa_CBS_M GENMASK(31, 18)
+#define YT921X_QUEUE_SHAPE_CTRLa_CBS(x) FIELD_PREP(YT921X_QUEUE_SHAPE_CTRLa_CBS_M, (x))
+#define YT921X_QUEUE_SHAPE_CTRLa_CIR_M GENMASK(17, 0)
+#define YT921X_QUEUE_SHAPE_CTRLa_CIR(x) FIELD_PREP(YT921X_QUEUE_SHAPE_CTRLa_CIR_M, (x))
+#define YT921X_PORTn_SHAPE_CTRL(port) (0x354000 + 8 * (port))
+#define YT921X_PORT_SHAPE_CTRLb_EN BIT(4)
+#define YT921X_PORT_SHAPE_CTRLb_PKT_MODE BIT(3) /* 0: byte rate mode */
+#define YT921X_PORT_SHAPE_CTRLb_UNIT_M GENMASK(2, 0)
+#define YT921X_PORT_SHAPE_CTRLb_UNIT(x) FIELD_PREP(YT921X_PORT_SHAPE_CTRLb_UNIT_M, (x))
+#define YT921X_PORT_SHAPE_CTRLa_CBS_M GENMASK(31, 18)
+#define YT921X_PORT_SHAPE_CTRLa_CBS(x) FIELD_PREP(YT921X_PORT_SHAPE_CTRLa_CBS_M, (x))
+#define YT921X_PORT_SHAPE_CTRLa_CIR_M GENMASK(17, 0)
+#define YT921X_PORT_SHAPE_CTRLa_CIR(x) FIELD_PREP(YT921X_PORT_SHAPE_CTRLa_CIR_M, (x))
+
#define YT921X_EDATA_EXTMODE 0xfb
#define YT921X_EDATA_LEN 0x100
@@ -559,6 +606,11 @@ enum yt921x_fdb_entry_status {
#define YT921X_METER_UNIT_MAX ((1 << 3) - 1)
#define YT921X_METER_CIR_MAX ((1 << 18) - 1)
#define YT921X_METER_CBS_MAX ((1 << 16) - 1)
+#define YT921X_PORT_SHAPE_SLOT_MIN 80
+#define YT921X_QUEUE_SHAPE_SLOT_MIN 132
+#define YT921X_SHAPE_UNIT_MAX ((1 << 3) - 1)
+#define YT921X_SHAPE_CIR_MAX ((1 << 18) - 1)
+#define YT921X_SHAPE_CBS_MAX ((1 << 14) - 1)
#define YT921X_LAG_NUM 2
#define YT921X_LAG_PORT_NUM 4
@@ -576,7 +628,16 @@ enum yt921x_fdb_entry_status {
#define YT921X_TAG_LEN 8
/* 8 internal + 2 external + 1 mcu */
-#define YT921X_PORT_NUM 11
+#define YT921X_PORT_NUM 11
+#define YT921X_UCAST_QUEUE_NUM 8
+#define YT921X_MCAST_QUEUE_NUM 4
+#define YT921X_PORT_QUEUE_NUM \
+ (YT921X_UCAST_QUEUE_NUM + YT921X_MCAST_QUEUE_NUM)
+#define YT921X_UCAST_QUEUE_ID(port, queue) \
+ (YT921X_UCAST_QUEUE_NUM * (port) + (queue))
+#define YT921X_MCAST_QUEUE_ID(port, queue) \
+ (YT921X_UCAST_QUEUE_NUM * YT921X_PORT_NUM + \
+ YT921X_MCAST_QUEUE_NUM * (port) + (queue))
#define yt921x_port_is_internal(port) ((port) < 8)
#define yt921x_port_is_external(port) (8 <= (port) && (port) < 9)
@@ -655,6 +716,8 @@ struct yt921x_priv {
const struct yt921x_info *info;
unsigned int meter_slot_ns;
+ unsigned int port_shape_slot_ns;
+ unsigned int queue_shape_slot_ns;
/* cache of dsa_cpu_ports(ds) */
u16 cpu_ports_mask;
unsigned char cycle_ns;
--
2.53.0
^ permalink raw reply related
* [PATCH net-next v4 3/4] net: dsa: yt921x: Add port police support
From: David Yang @ 2026-04-09 17:12 UTC (permalink / raw)
To: netdev
Cc: David Yang, Andrew Lunn, Vladimir Oltean, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, linux-kernel
In-Reply-To: <20260409171209.2575583-1-mmyangfl@gmail.com>
Enable rate meter ability and support limiting the rate of incoming
traffic.
Signed-off-by: David Yang <mmyangfl@gmail.com>
---
drivers/net/dsa/yt921x.c | 325 ++++++++++++++++++++++++++++++++++++++-
drivers/net/dsa/yt921x.h | 54 +++++++
2 files changed, 378 insertions(+), 1 deletion(-)
diff --git a/drivers/net/dsa/yt921x.c b/drivers/net/dsa/yt921x.c
index 0c07b903fd68..f0ebbcd2151c 100644
--- a/drivers/net/dsa/yt921x.c
+++ b/drivers/net/dsa/yt921x.c
@@ -263,6 +263,14 @@ yt921x_reg_toggle_bits(struct yt921x_priv *priv, u32 reg, u32 mask, bool set)
* eliminate potential issues, although partial reads/writes are also possible.
*/
+static void update_ctrls_unaligned(u32 *lo, u32 *hi, u64 mask, u64 val)
+{
+ *lo &= ~lower_32_bits(mask);
+ *hi &= ~upper_32_bits(mask);
+ *lo |= lower_32_bits(val);
+ *hi |= upper_32_bits(val);
+}
+
static int
yt921x_regs_read(struct yt921x_priv *priv, u32 reg, u32 *vals,
unsigned int num_regs)
@@ -373,6 +381,12 @@ yt921x_reg64_clear_bits(struct yt921x_priv *priv, u32 reg, const u32 *masks)
return yt921x_regs_clear_bits(priv, reg, masks, 2);
}
+static int
+yt921x_reg96_write(struct yt921x_priv *priv, u32 reg, const u32 *vals)
+{
+ return yt921x_regs_write(priv, reg, vals, 3);
+}
+
static int yt921x_reg_mdio_read(void *context, u32 reg, u32 *valp)
{
struct yt921x_reg_mdio *mdio = context;
@@ -1066,6 +1080,13 @@ yt921x_dsa_set_mac_eee(struct dsa_switch *ds, int port, struct ethtool_keee *e)
return res;
}
+static int yt921x_mtu_fetch(struct yt921x_priv *priv, int port)
+{
+ struct dsa_port *dp = dsa_to_port(&priv->ds, port);
+
+ return dp->user ? READ_ONCE(dp->user->mtu) : ETH_DATA_LEN;
+}
+
static int
yt921x_dsa_port_change_mtu(struct dsa_switch *ds, int port, int new_mtu)
{
@@ -1097,6 +1118,268 @@ static int yt921x_dsa_port_max_mtu(struct dsa_switch *ds, int port)
return YT921X_FRAME_SIZE_MAX - ETH_HLEN - ETH_FCS_LEN - YT921X_TAG_LEN;
}
+/* v * 2^e */
+static u64 ldexpu64(u64 v, int e)
+{
+ return e >= 0 ? v << e : v >> -e;
+}
+
+/* slot (ns) * rate (/s) / 10^9 (ns/s) = 2^C * token * 4^unit */
+static u32 rate2token(u64 rate, unsigned int slot_ns, int unit, int C)
+{
+ int e = 2 * unit + C + YT921X_TOKEN_RATE_C;
+
+ return div_u64(ldexpu64(slot_ns * rate, -e), 1000000000);
+}
+
+static u64 token2rate(u32 token, unsigned int slot_ns, int unit, int C)
+{
+ int e = 2 * unit + C + YT921X_TOKEN_RATE_C;
+
+ return div_u64(ldexpu64(mul_u32_u32(1000000000, token), e), slot_ns);
+}
+
+/* burst = 2^C * token * 4^unit */
+static u32 burst2token(u64 burst, int unit, int C)
+{
+ return ldexpu64(burst, -(2 * unit + C));
+}
+
+static u64 token2burst(u32 token, int unit, int C)
+{
+ return ldexpu64(token, 2 * unit + C);
+}
+
+struct yt921x_marker {
+ u32 cir;
+ u32 cbs;
+ u32 ebs;
+ int unit;
+ bool pkt_mode;
+};
+
+#define YT921X_MARKER_PKT_MODE BIT(0)
+#define YT921X_MARKER_SINGLE_BUCKET BIT(1)
+
+static int
+yt921x_marker_tfm(struct yt921x_marker *marker, u64 rate, u64 burst,
+ unsigned int flags, unsigned int slot_ns, u32 cir_max,
+ u32 cbs_max, int unit_max, struct yt921x_priv *priv, int port,
+ struct netlink_ext_ack *extack)
+{
+ const int C = flags & YT921X_MARKER_PKT_MODE ? YT921X_TOKEN_PKT_C :
+ YT921X_TOKEN_BYTE_C;
+ struct device *dev = to_device(priv);
+ struct yt921x_marker m;
+ u64 burst_est;
+ u64 burst_sug;
+ u64 burst_max;
+ u64 rate_max;
+
+ m.unit = unit_max;
+ rate_max = token2rate(cir_max, slot_ns, m.unit, C);
+ burst_max = token2burst(cbs_max, m.unit, C);
+
+ /* Check for unusual values */
+ if (rate > rate_max || burst > burst_max) {
+ NL_SET_ERR_MSG_MOD(extack, "Unexpected tremendous rate");
+ return -ERANGE;
+ }
+
+ /* Check for matching burst */
+ burst_est = div_u64(slot_ns * rate, 1000000000);
+ burst_sug = burst_est;
+ if (flags & YT921X_MARKER_PKT_MODE)
+ burst_sug++;
+ else
+ burst_sug += ETH_HLEN + yt921x_mtu_fetch(priv, port) +
+ ETH_FCS_LEN;
+ if (burst_sug > burst)
+ dev_warn(dev,
+ "Consider burst at least %llu to match rate %llu\n",
+ burst_sug, rate);
+
+ /* Select unit */
+ for (; m.unit > 0; m.unit--) {
+ if (rate > (rate_max >> 2) || burst > (burst_max >> 2))
+ break;
+ rate_max >>= 2;
+ burst_max >>= 2;
+ }
+
+ /* Calculate information rate and bucket size */
+ m.cir = rate2token(rate, slot_ns, m.unit, C);
+ if (!m.cir)
+ m.cir = 1;
+ else if (WARN_ON(m.cir > cir_max))
+ m.cir = cir_max;
+ m.cbs = burst2token(burst, m.unit, C);
+ if (!m.cbs)
+ m.cbs = 1;
+ else if (WARN_ON(m.cbs > cbs_max))
+ m.cbs = cbs_max;
+
+ /* Cut EBS */
+ m.ebs = 0;
+ if (!(flags & YT921X_MARKER_SINGLE_BUCKET)) {
+ /* We don't have a chance to adjust rate when MTU is changed */
+ if (flags & YT921X_MARKER_PKT_MODE)
+ burst_est++;
+ else
+ burst_est += YT921X_FRAME_SIZE_MAX;
+
+ if (burst_est < burst) {
+ u32 pbs = m.cbs;
+
+ m.cbs = burst2token(burst_est, m.unit, C);
+ if (!m.cbs) {
+ m.cbs = 1;
+ } else if (m.cbs > cbs_max) {
+ WARN_ON(1);
+ m.cbs = cbs_max;
+ }
+
+ if (pbs > m.cbs)
+ m.ebs = pbs - m.cbs;
+ }
+ }
+
+ dev_dbg(dev,
+ "slot %u ns, rate %llu, burst %llu -> unit %d, cir %u, cbs %u, ebs %u\n",
+ slot_ns, rate, burst, m.unit, m.cir, m.cbs, m.ebs);
+
+ m.pkt_mode = flags & YT921X_MARKER_PKT_MODE;
+ *marker = m;
+ return 0;
+}
+
+static int
+yt921x_marker_tfm_police(struct yt921x_marker *marker,
+ const struct flow_action_police *police,
+ unsigned int flags, struct yt921x_priv *priv, int port,
+ struct netlink_ext_ack *extack)
+{
+ bool pkt_mode = !!police->rate_pkt_ps;
+ u64 burst;
+ u64 rate;
+
+ rate = pkt_mode ? police->rate_pkt_ps : police->rate_bytes_ps;
+ burst = pkt_mode ? police->burst_pkt : police->burst;
+ if (pkt_mode)
+ flags |= YT921X_MARKER_PKT_MODE;
+
+ return yt921x_marker_tfm(marker, rate, burst, flags,
+ priv->meter_slot_ns, YT921X_METER_CIR_MAX,
+ YT921X_METER_CBS_MAX, YT921X_METER_UNIT_MAX,
+ priv, port, extack);
+}
+
+static int
+yt921x_police_validate(const struct flow_action_police *police,
+ const struct flow_action *action,
+ const struct flow_action_entry *act,
+ struct netlink_ext_ack *extack)
+{
+ if (police->exceed.act_id != FLOW_ACTION_DROP) {
+ NL_SET_ERR_MSG_MOD(extack,
+ "Offload not supported when exceed action is not drop");
+ return -EOPNOTSUPP;
+ }
+
+ if (police->notexceed.act_id != FLOW_ACTION_PIPE &&
+ police->notexceed.act_id != FLOW_ACTION_ACCEPT) {
+ NL_SET_ERR_MSG_MOD(extack,
+ "Offload not supported when conform action is not pipe or ok");
+ return -EOPNOTSUPP;
+ }
+
+ if (police->notexceed.act_id == FLOW_ACTION_ACCEPT && action && act &&
+ !flow_action_is_last_entry(action, act)) {
+ NL_SET_ERR_MSG_MOD(extack,
+ "Offload not supported when conform action is ok, but action is not last");
+ return -EOPNOTSUPP;
+ }
+
+ /* mtu defaults to unlimited but we got 2040 here, don't know why */
+ if (police->peakrate_bytes_ps || police->avrate || police->overhead) {
+ NL_SET_ERR_MSG_MOD(extack,
+ "Offload not supported when peakrate/avrate/overhead is configured");
+ return -EOPNOTSUPP;
+ }
+
+ return 0;
+}
+
+static int
+yt921x_meter_config(struct yt921x_priv *priv, unsigned int id,
+ const struct yt921x_marker *marker)
+{
+ u32 ctrls[3];
+
+ ctrls[0] = 0;
+ ctrls[1] = YT921X_METER_CTRLb_CIR(marker->cir);
+ ctrls[2] = YT921X_METER_CTRLc_UNIT(marker->unit) |
+ YT921X_METER_CTRLc_DROP_R |
+ YT921X_METER_CTRLc_TOKEN_OVERFLOW_EN |
+ YT921X_METER_CTRLc_METER_EN;
+ if (marker->pkt_mode)
+ ctrls[2] |= YT921X_METER_CTRLc_PKT_MODE;
+ update_ctrls_unaligned(&ctrls[0], &ctrls[1],
+ YT921X_METER_CTRLab_EBS_M,
+ YT921X_METER_CTRLab_EBS(marker->ebs));
+ update_ctrls_unaligned(&ctrls[1], &ctrls[2],
+ YT921X_METER_CTRLbc_CBS_M,
+ YT921X_METER_CTRLbc_CBS(marker->cbs));
+
+ return yt921x_reg96_write(priv, YT921X_METERn_CTRL(id), ctrls);
+}
+
+static void yt921x_dsa_port_policer_del(struct dsa_switch *ds, int port)
+{
+ struct yt921x_priv *priv = to_yt921x_priv(ds);
+ struct device *dev = to_device(priv);
+ int res;
+
+ mutex_lock(&priv->reg_lock);
+ res = yt921x_reg_write(priv, YT921X_PORTn_METER(port), 0);
+ mutex_unlock(&priv->reg_lock);
+
+ if (res)
+ dev_err(dev, "Failed to %s port %d: %i\n", "delete policer on",
+ port, res);
+}
+
+static int
+yt921x_dsa_port_policer_add(struct dsa_switch *ds, int port,
+ const struct flow_action_police *police,
+ struct netlink_ext_ack *extack)
+{
+ struct yt921x_priv *priv = to_yt921x_priv(ds);
+ struct yt921x_marker marker;
+ u32 ctrl;
+ int res;
+
+ res = yt921x_police_validate(police, NULL, NULL, extack);
+ if (res)
+ return res;
+
+ res = yt921x_marker_tfm_police(&marker, police, 0, priv, port, extack);
+ if (res)
+ return res;
+
+ mutex_lock(&priv->reg_lock);
+ res = yt921x_meter_config(priv, port + YT921X_METER_NUM, &marker);
+ if (res)
+ goto end;
+
+ ctrl = YT921X_PORT_METER_ID(port) | YT921X_PORT_METER_EN;
+ res = yt921x_reg_write(priv, YT921X_PORTn_METER(port), ctrl);
+end:
+ mutex_unlock(&priv->reg_lock);
+
+ return res;
+}
+
static int
yt921x_mirror_del(struct yt921x_priv *priv, int port, bool ingress)
{
@@ -3052,6 +3335,7 @@ static int yt921x_chip_detect(struct yt921x_priv *priv)
u32 chipid;
u32 major;
u32 mode;
+ u32 val;
int res;
res = yt921x_reg_read(priv, YT921X_CHIP_ID, &chipid);
@@ -3086,12 +3370,27 @@ static int yt921x_chip_detect(struct yt921x_priv *priv)
return -ENODEV;
}
+ res = yt921x_reg_read(priv, YT921X_SYS_CLK, &val);
+ if (res)
+ return res;
+ switch (FIELD_GET(YT921X_SYS_CLK_SEL_M, val)) {
+ case 0:
+ priv->cycle_ns = info->major == YT9215_MAJOR ? 8 : 6;
+ break;
+ case YT921X_SYS_CLK_143M:
+ priv->cycle_ns = 7;
+ break;
+ default:
+ priv->cycle_ns = 8;
+ }
+
/* Print chipid here since we are interested in lower 16 bits */
dev_info(dev,
"Motorcomm %s ethernet switch, chipid: 0x%x, chipmode: 0x%x 0x%x\n",
info->name, chipid, mode, extmode);
priv->info = info;
+
return 0;
}
@@ -3213,6 +3512,23 @@ static int yt921x_chip_setup_dsa(struct yt921x_priv *priv)
return 0;
}
+static int yt921x_chip_setup_tc(struct yt921x_priv *priv)
+{
+ unsigned int op_ns;
+ u32 ctrl;
+ int res;
+
+ op_ns = 8 * priv->cycle_ns;
+
+ ctrl = max(priv->meter_slot_ns / op_ns, YT921X_METER_SLOT_MIN);
+ res = yt921x_reg_write(priv, YT921X_METER_SLOT, ctrl);
+ if (res)
+ return res;
+ priv->meter_slot_ns = ctrl * op_ns;
+
+ return 0;
+}
+
static int __maybe_unused yt921x_chip_setup_qos(struct yt921x_priv *priv)
{
u32 ctrl;
@@ -3259,7 +3575,7 @@ static int yt921x_chip_setup(struct yt921x_priv *priv)
u32 ctrl;
int res;
- ctrl = YT921X_FUNC_MIB;
+ ctrl = YT921X_FUNC_MIB | YT921X_FUNC_METER;
res = yt921x_reg_set_bits(priv, YT921X_FUNC, ctrl);
if (res)
return res;
@@ -3268,6 +3584,10 @@ static int yt921x_chip_setup(struct yt921x_priv *priv)
if (res)
return res;
+ res = yt921x_chip_setup_tc(priv);
+ if (res)
+ return res;
+
#if IS_ENABLED(CONFIG_DCB)
res = yt921x_chip_setup_qos(priv);
if (res)
@@ -3359,6 +3679,9 @@ static const struct dsa_switch_ops yt921x_dsa_switch_ops = {
/* mtu */
.port_change_mtu = yt921x_dsa_port_change_mtu,
.port_max_mtu = yt921x_dsa_port_max_mtu,
+ /* rate */
+ .port_policer_del = yt921x_dsa_port_policer_del,
+ .port_policer_add = yt921x_dsa_port_policer_add,
/* hsr */
.port_hsr_leave = dsa_port_simple_hsr_leave,
.port_hsr_join = dsa_port_simple_hsr_join,
diff --git a/drivers/net/dsa/yt921x.h b/drivers/net/dsa/yt921x.h
index 4989d87c2492..01cc2298be94 100644
--- a/drivers/net/dsa/yt921x.h
+++ b/drivers/net/dsa/yt921x.h
@@ -23,6 +23,7 @@
#define YT921X_RST_HW BIT(31)
#define YT921X_RST_SW BIT(1)
#define YT921X_FUNC 0x80004
+#define YT921X_FUNC_METER BIT(4)
#define YT921X_FUNC_MIB BIT(1)
#define YT921X_CHIP_ID 0x80008
#define YT921X_CHIP_ID_MAJOR GENMASK(31, 16)
@@ -239,6 +240,11 @@
#define YT921X_EDATA_DATA_STATUS_M GENMASK(3, 0)
#define YT921X_EDATA_DATA_STATUS(x) FIELD_PREP(YT921X_EDATA_DATA_STATUS_M, (x))
#define YT921X_EDATA_DATA_IDLE YT921X_EDATA_DATA_STATUS(3)
+#define YT921X_SYS_CLK 0xe0040
+#define YT921X_SYS_CLK_SEL_M GENMASK(1, 0) /* unknown: 167M */
+#define YT9215_SYS_CLK_125M 0
+#define YT9218_SYS_CLK_167M 0
+#define YT921X_SYS_CLK_143M 1
#define YT921X_EXT_MBUS_OP 0x6a000
#define YT921X_INT_MBUS_OP 0xf0000
@@ -465,6 +471,42 @@ enum yt921x_app_selector {
#define YT921X_LAG_HASH_MAC_DA BIT(1)
#define YT921X_LAG_HASH_SRC_PORT BIT(0)
+#define YT921X_PORTn_RATE(port) (0x220000 + 4 * (port))
+#define YT921X_PORT_RATE_GAP_VALUE GENMASK(4, 0) /* default 20 */
+#define YT921X_METER_SLOT 0x220104
+#define YT921X_METER_SLOT_SLOT_M GENMASK(11, 0)
+#define YT921X_PORTn_METER(port) (0x220108 + 4 * (port))
+#define YT921X_PORT_METER_EN BIT(4)
+#define YT921X_PORT_METER_ID_M GENMASK(3, 0)
+#define YT921X_PORT_METER_ID(x) FIELD_PREP(YT921X_PORT_METER_ID_M, (x))
+#define YT921X_METERn_CTRL(x) (0x220800 + 0x10 * (x))
+#define YT921X_METER_CTRLc_METER_EN BIT(14)
+#define YT921X_METER_CTRLc_TOKEN_OVERFLOW_EN BIT(13) /* RFC4115: yellow use unused green bw */
+#define YT921X_METER_CTRLc_DROP_M GENMASK(12, 11)
+#define YT921X_METER_CTRLc_DROP(x) FIELD_PREP(YT921X_METER_CTRLc_DROP_M, (x))
+#define YT921X_METER_CTRLc_DROP_GYR YT921X_METER_CTRLc_DROP(0)
+#define YT921X_METER_CTRLc_DROP_YR YT921X_METER_CTRLc_DROP(1)
+#define YT921X_METER_CTRLc_DROP_R YT921X_METER_CTRLc_DROP(2)
+#define YT921X_METER_CTRLc_DROP_NONE YT921X_METER_CTRLc_DROP(3)
+#define YT921X_METER_CTRLc_COLOR_BLIND BIT(10)
+#define YT921X_METER_CTRLc_UNIT_M GENMASK(9, 7)
+#define YT921X_METER_CTRLc_UNIT(x) FIELD_PREP(YT921X_METER_CTRLc_UNIT_M, (x))
+#define YT921X_METER_CTRLc_BYTE_MODE_INCLUDE_GAP BIT(6) /* +GAP_VALUE bytes each packet */
+#define YT921X_METER_CTRLc_PKT_MODE BIT(5) /* 0: byte rate mode */
+#define YT921X_METER_CTRLc_RFC2698 BIT(4) /* 0: RFC4115 */
+#define YT921X_METER_CTRLbc_CBS_M GENMASK_ULL(35, 20)
+#define YT921X_METER_CTRLbc_CBS(x) FIELD_PREP(YT921X_METER_CTRLbc_CBS_M, (x))
+#define YT921X_METER_CTRLb_CIR_M GENMASK(19, 2)
+#define YT921X_METER_CTRLb_CIR(x) FIELD_PREP(YT921X_METER_CTRLb_CIR_M, (x))
+#define YT921X_METER_CTRLab_EBS_M GENMASK_ULL(33, 18)
+#define YT921X_METER_CTRLab_EBS(x) FIELD_PREP(YT921X_METER_CTRLab_EBS_M, (x))
+#define YT921X_METER_CTRLa_EIR_M GENMASK(17, 0)
+#define YT921X_METER_CTRLa_EIR(x) FIELD_PREP(YT921X_METER_CTRLa_EIR_M, (x))
+#define YT921X_METERn_STAT_EXCESS(x) (0x221000 + 8 * (x))
+#define YT921X_METERn_STAT_COMMITTED(x) (0x221004 + 8 * (x))
+#define YT921X_METER_STAT_TOKEN_M GENMASK(30, 15)
+#define YT921X_METER_STAT_QUEUE_M GENMASK(14, 0)
+
#define YT921X_PORTn_VLAN_CTRL(port) (0x230010 + 4 * (port))
#define YT921X_PORT_VLAN_CTRL_SVLAN_PRIO_EN BIT(31)
#define YT921X_PORT_VLAN_CTRL_CVLAN_PRIO_EN BIT(30)
@@ -508,6 +550,16 @@ enum yt921x_fdb_entry_status {
#define YT921X_MSTI_NUM 16
+#define YT921X_TOKEN_BYTE_C 1 /* 1 token = 2^1 byte */
+#define YT921X_TOKEN_PKT_C -6 /* 1 token = 2^-6 packets */
+#define YT921X_TOKEN_RATE_C -15
+/* Custom meters only, not including dedicated port meters (11) */
+#define YT921X_METER_NUM 64
+#define YT921X_METER_SLOT_MIN 80
+#define YT921X_METER_UNIT_MAX ((1 << 3) - 1)
+#define YT921X_METER_CIR_MAX ((1 << 18) - 1)
+#define YT921X_METER_CBS_MAX ((1 << 16) - 1)
+
#define YT921X_LAG_NUM 2
#define YT921X_LAG_PORT_NUM 4
@@ -602,8 +654,10 @@ struct yt921x_priv {
struct dsa_switch ds;
const struct yt921x_info *info;
+ unsigned int meter_slot_ns;
/* cache of dsa_cpu_ports(ds) */
u16 cpu_ports_mask;
+ unsigned char cycle_ns;
/* protect the access to the switch registers */
struct mutex reg_lock;
--
2.53.0
^ permalink raw reply related
* [PATCH net-next v4 2/4] net: dsa: yt921x: Refactor long register helpers
From: David Yang @ 2026-04-09 17:12 UTC (permalink / raw)
To: netdev
Cc: David Yang, Andrew Lunn, Vladimir Oltean, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, linux-kernel
In-Reply-To: <20260409171209.2575583-1-mmyangfl@gmail.com>
Dealing long registers with u64 is good, until you realize there are
longer 96-bit registers.
Refactor reg64 helpers to use u32 arrays instead of u64 values, in
preparation for 96-bit registers. We do not keep the separate u64
version for reg64 to avoid duplicated wrappers, although it looks better
when dealing with reg64 *only*.
Helpers for reg96 should be added when they are actually used to avoid
function unused warnings.
Signed-off-by: David Yang <mmyangfl@gmail.com>
---
drivers/net/dsa/yt921x.c | 162 +++++++++++++++++++++++++++------------
drivers/net/dsa/yt921x.h | 36 ++++-----
2 files changed, 129 insertions(+), 69 deletions(-)
diff --git a/drivers/net/dsa/yt921x.c b/drivers/net/dsa/yt921x.c
index 87139448bec3..0c07b903fd68 100644
--- a/drivers/net/dsa/yt921x.c
+++ b/drivers/net/dsa/yt921x.c
@@ -255,63 +255,122 @@ yt921x_reg_toggle_bits(struct yt921x_priv *priv, u32 reg, u32 mask, bool set)
return yt921x_reg_update_bits(priv, reg, mask, !set ? 0 : mask);
}
-/* Some registers, like VLANn_CTRL, should always be written in 64-bit, even if
- * you are to write only the lower / upper 32 bits.
+/* Some multi-word registers, like VLANn_CTRL, should be treated as a single
+ * long register. More specifically, writes to parts of its words won't become
+ * visible, until the last word is written.
*
- * There is no such restriction for reading, but we still provide 64-bit read
- * wrappers so that we always handle u64 values.
+ * Here we require full read and write operations over these registers to
+ * eliminate potential issues, although partial reads/writes are also possible.
*/
-static int yt921x_reg64_read(struct yt921x_priv *priv, u32 reg, u64 *valp)
+static int
+yt921x_regs_read(struct yt921x_priv *priv, u32 reg, u32 *vals,
+ unsigned int num_regs)
{
- u32 lo;
- u32 hi;
int res;
- res = yt921x_reg_read(priv, reg, &lo);
- if (res)
- return res;
- res = yt921x_reg_read(priv, reg + 4, &hi);
- if (res)
- return res;
+ for (unsigned int i = 0; i < num_regs; i++) {
+ res = yt921x_reg_read(priv, reg + 4 * i, &vals[i]);
+ if (res)
+ return res;
+ }
+
+ return 0;
+}
+
+static int
+yt921x_regs_write(struct yt921x_priv *priv, u32 reg, const u32 *vals,
+ unsigned int num_regs)
+{
+ int res;
+
+ for (unsigned int i = 0; i < num_regs; i++) {
+ res = yt921x_reg_write(priv, reg + 4 * i, vals[i]);
+ if (res)
+ return res;
+ }
- *valp = ((u64)hi << 32) | lo;
return 0;
}
-static int yt921x_reg64_write(struct yt921x_priv *priv, u32 reg, u64 val)
+static int
+yt921x_regs_update_bits(struct yt921x_priv *priv, u32 reg, const u32 *masks,
+ const u32 *vals, unsigned int num_regs)
{
+ bool changed = false;
+ u32 vs[4];
int res;
- res = yt921x_reg_write(priv, reg, (u32)val);
+ BUILD_BUG_ON(num_regs > ARRAY_SIZE(vs));
+
+ res = yt921x_regs_read(priv, reg, vs, num_regs);
if (res)
return res;
- return yt921x_reg_write(priv, reg + 4, (u32)(val >> 32));
+
+ for (unsigned int i = 0; i < num_regs; i++) {
+ u32 u = vs[i];
+
+ u &= ~masks[i];
+ u |= vals[i];
+ if (u != vs[i])
+ changed = true;
+
+ vs[i] = u;
+ }
+
+ if (!changed)
+ return 0;
+
+ return yt921x_regs_write(priv, reg, vs, num_regs);
}
static int
-yt921x_reg64_update_bits(struct yt921x_priv *priv, u32 reg, u64 mask, u64 val)
+yt921x_regs_clear_bits(struct yt921x_priv *priv, u32 reg, const u32 *masks,
+ unsigned int num_regs)
{
+ bool changed = false;
+ u32 vs[4];
int res;
- u64 v;
- u64 u;
- res = yt921x_reg64_read(priv, reg, &v);
+ BUILD_BUG_ON(num_regs > ARRAY_SIZE(vs));
+
+ res = yt921x_regs_read(priv, reg, vs, num_regs);
if (res)
return res;
- u = v;
- u &= ~mask;
- u |= val;
- if (u == v)
+ for (unsigned int i = 0; i < num_regs; i++) {
+ u32 u = vs[i];
+
+ u &= ~masks[i];
+ if (u != vs[i])
+ changed = true;
+
+ vs[i] = u;
+ }
+
+ if (!changed)
return 0;
- return yt921x_reg64_write(priv, reg, u);
+ return yt921x_regs_write(priv, reg, vs, num_regs);
+}
+
+static int
+yt921x_reg64_write(struct yt921x_priv *priv, u32 reg, const u32 *vals)
+{
+ return yt921x_regs_write(priv, reg, vals, 2);
}
-static int yt921x_reg64_clear_bits(struct yt921x_priv *priv, u32 reg, u64 mask)
+static int
+yt921x_reg64_update_bits(struct yt921x_priv *priv, u32 reg, const u32 *masks,
+ const u32 *vals)
{
- return yt921x_reg64_update_bits(priv, reg, mask, 0);
+ return yt921x_regs_update_bits(priv, reg, masks, vals, 2);
+}
+
+static int
+yt921x_reg64_clear_bits(struct yt921x_priv *priv, u32 reg, const u32 *masks)
+{
+ return yt921x_regs_clear_bits(priv, reg, masks, 2);
}
static int yt921x_reg_mdio_read(void *context, u32 reg, u32 *valp)
@@ -1844,33 +1903,31 @@ yt921x_vlan_filtering(struct yt921x_priv *priv, int port, bool vlan_filtering)
return 0;
}
-static int
-yt921x_vlan_del(struct yt921x_priv *priv, int port, u16 vid)
+static int yt921x_vlan_del(struct yt921x_priv *priv, int port, u16 vid)
{
- u64 mask64;
+ u32 masks[2];
- mask64 = YT921X_VLAN_CTRL_PORTS(port) |
- YT921X_VLAN_CTRL_UNTAG_PORTn(port);
+ masks[0] = YT921X_VLAN_CTRLa_PORTn(port);
+ masks[1] = YT921X_VLAN_CTRLb_UNTAG_PORTn(port);
- return yt921x_reg64_clear_bits(priv, YT921X_VLANn_CTRL(vid), mask64);
+ return yt921x_reg64_clear_bits(priv, YT921X_VLANn_CTRL(vid), masks);
}
static int
yt921x_vlan_add(struct yt921x_priv *priv, int port, u16 vid, bool untagged)
{
- u64 mask64;
- u64 ctrl64;
+ u32 masks[2];
+ u32 ctrls[2];
- mask64 = YT921X_VLAN_CTRL_PORTn(port) |
- YT921X_VLAN_CTRL_PORTS(priv->cpu_ports_mask);
- ctrl64 = mask64;
+ masks[0] = YT921X_VLAN_CTRLa_PORTn(port) |
+ YT921X_VLAN_CTRLa_PORTS(priv->cpu_ports_mask);
+ ctrls[0] = masks[0];
- mask64 |= YT921X_VLAN_CTRL_UNTAG_PORTn(port);
- if (untagged)
- ctrl64 |= YT921X_VLAN_CTRL_UNTAG_PORTn(port);
+ masks[1] = YT921X_VLAN_CTRLb_UNTAG_PORTn(port);
+ ctrls[1] = untagged ? masks[1] : 0;
return yt921x_reg64_update_bits(priv, YT921X_VLANn_CTRL(vid),
- mask64, ctrl64);
+ masks, ctrls);
}
static int
@@ -2318,8 +2375,8 @@ yt921x_dsa_vlan_msti_set(struct dsa_switch *ds, struct dsa_bridge bridge,
const struct switchdev_vlan_msti *msti)
{
struct yt921x_priv *priv = to_yt921x_priv(ds);
- u64 mask64;
- u64 ctrl64;
+ u32 masks[2];
+ u32 ctrls[2];
int res;
if (!msti->vid)
@@ -2327,12 +2384,14 @@ yt921x_dsa_vlan_msti_set(struct dsa_switch *ds, struct dsa_bridge bridge,
if (!msti->msti || msti->msti >= YT921X_MSTI_NUM)
return -EINVAL;
- mask64 = YT921X_VLAN_CTRL_STP_ID_M;
- ctrl64 = YT921X_VLAN_CTRL_STP_ID(msti->msti);
+ masks[0] = 0;
+ ctrls[0] = 0;
+ masks[1] = YT921X_VLAN_CTRLb_STP_ID_M;
+ ctrls[1] = YT921X_VLAN_CTRLb_STP_ID(msti->msti);
mutex_lock(&priv->reg_lock);
res = yt921x_reg64_update_bits(priv, YT921X_VLANn_CTRL(msti->vid),
- mask64, ctrl64);
+ masks, ctrls);
mutex_unlock(&priv->reg_lock);
return res;
@@ -3084,7 +3143,7 @@ static int yt921x_chip_setup_dsa(struct yt921x_priv *priv)
{
struct dsa_switch *ds = &priv->ds;
unsigned long cpu_ports_mask;
- u64 ctrl64;
+ u32 ctrls[2];
u32 ctrl;
int port;
int res;
@@ -3145,8 +3204,9 @@ static int yt921x_chip_setup_dsa(struct yt921x_priv *priv)
/* Tagged VID 0 should be treated as untagged, which confuses the
* hardware a lot
*/
- ctrl64 = YT921X_VLAN_CTRL_LEARN_DIS | YT921X_VLAN_CTRL_PORTS_M;
- res = yt921x_reg64_write(priv, YT921X_VLANn_CTRL(0), ctrl64);
+ ctrls[0] = YT921X_VLAN_CTRLa_LEARN_DIS | YT921X_VLAN_CTRLa_PORTS_M;
+ ctrls[1] = 0;
+ res = yt921x_reg64_write(priv, YT921X_VLANn_CTRL(0), ctrls);
if (res)
return res;
diff --git a/drivers/net/dsa/yt921x.h b/drivers/net/dsa/yt921x.h
index 3f129b8d403f..4989d87c2492 100644
--- a/drivers/net/dsa/yt921x.h
+++ b/drivers/net/dsa/yt921x.h
@@ -429,24 +429,24 @@ enum yt921x_app_selector {
#define YT921X_FDB_HW_FLUSH_ON_LINKDOWN BIT(0)
#define YT921X_VLANn_CTRL(vlan) (0x188000 + 8 * (vlan))
-#define YT921X_VLAN_CTRL_UNTAG_PORTS_M GENMASK_ULL(50, 40)
-#define YT921X_VLAN_CTRL_UNTAG_PORTS(x) FIELD_PREP(YT921X_VLAN_CTRL_UNTAG_PORTS_M, (x))
-#define YT921X_VLAN_CTRL_UNTAG_PORTn(port) BIT_ULL((port) + 40)
-#define YT921X_VLAN_CTRL_STP_ID_M GENMASK_ULL(39, 36)
-#define YT921X_VLAN_CTRL_STP_ID(x) FIELD_PREP(YT921X_VLAN_CTRL_STP_ID_M, (x))
-#define YT921X_VLAN_CTRL_SVLAN_EN BIT_ULL(35)
-#define YT921X_VLAN_CTRL_FID_M GENMASK_ULL(34, 23)
-#define YT921X_VLAN_CTRL_FID(x) FIELD_PREP(YT921X_VLAN_CTRL_FID_M, (x))
-#define YT921X_VLAN_CTRL_LEARN_DIS BIT_ULL(22)
-#define YT921X_VLAN_CTRL_PRIO_EN BIT_ULL(21)
-#define YT921X_VLAN_CTRL_PRIO_M GENMASK_ULL(20, 18)
-#define YT921X_VLAN_CTRL_PRIO(x) FIELD_PREP(YT921X_VLAN_CTRL_PRIO_M, (x))
-#define YT921X_VLAN_CTRL_PORTS_M GENMASK_ULL(17, 7)
-#define YT921X_VLAN_CTRL_PORTS(x) FIELD_PREP(YT921X_VLAN_CTRL_PORTS_M, (x))
-#define YT921X_VLAN_CTRL_PORTn(port) BIT_ULL((port) + 7)
-#define YT921X_VLAN_CTRL_BYPASS_1X_AC BIT_ULL(6)
-#define YT921X_VLAN_CTRL_METER_EN BIT_ULL(5)
-#define YT921X_VLAN_CTRL_METER_ID_M GENMASK_ULL(4, 0)
+#define YT921X_VLAN_CTRLb_UNTAG_PORTS_M GENMASK(18, 8)
+#define YT921X_VLAN_CTRLb_UNTAG_PORTS(x) FIELD_PREP(YT921X_VLAN_CTRLb_UNTAG_PORTS_M, (x))
+#define YT921X_VLAN_CTRLb_UNTAG_PORTn(port) BIT((port) + 8)
+#define YT921X_VLAN_CTRLb_STP_ID_M GENMASK(7, 4)
+#define YT921X_VLAN_CTRLb_STP_ID(x) FIELD_PREP(YT921X_VLAN_CTRLb_STP_ID_M, (x))
+#define YT921X_VLAN_CTRLb_SVLAN_EN BIT(3)
+#define YT921X_VLAN_CTRLab_FID_M GENMASK_ULL(34, 23)
+#define YT921X_VLAN_CTRLab_FID(x) FIELD_PREP(YT921X_VLAN_CTRLab_FID_M, (x))
+#define YT921X_VLAN_CTRLa_LEARN_DIS BIT(22)
+#define YT921X_VLAN_CTRLa_PRIO_EN BIT(21)
+#define YT921X_VLAN_CTRLa_PRIO_M GENMASK(20, 18)
+#define YT921X_VLAN_CTRLa_PRIO(x) FIELD_PREP(YT921X_VLAN_CTRLa_PRIO_M, (x))
+#define YT921X_VLAN_CTRLa_PORTS_M GENMASK(17, 7)
+#define YT921X_VLAN_CTRLa_PORTS(x) FIELD_PREP(YT921X_VLAN_CTRLa_PORTS_M, (x))
+#define YT921X_VLAN_CTRLa_PORTn(port) BIT((port) + 7)
+#define YT921X_VLAN_CTRLa_BYPASS_1X_AC BIT(6)
+#define YT921X_VLAN_CTRLa_METER_EN BIT(5)
+#define YT921X_VLAN_CTRLa_METER_ID_M GENMASK(4, 0)
#define YT921X_TPID_IGRn(x) (0x210000 + 4 * (x)) /* [0, 3] */
#define YT921X_TPID_IGR_TPID_M GENMASK(15, 0)
--
2.53.0
^ permalink raw reply related
* [PATCH net-next v4 1/4] net: dsa: pass extack to user tc policers
From: David Yang @ 2026-04-09 17:12 UTC (permalink / raw)
To: netdev
Cc: David Yang, Andrew Lunn, Vladimir Oltean, UNGLinuxDriver,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, linux-kernel
In-Reply-To: <20260409171209.2575583-1-mmyangfl@gmail.com>
Users may use extack for a friendly error message instead of dumping
everything into dmesg.
Make the according transformations to the two users (sja1105 and
felix).
Signed-off-by: David Yang <mmyangfl@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
---
drivers/net/dsa/ocelot/felix.c | 3 ++-
drivers/net/dsa/sja1105/sja1105_main.c | 3 ++-
include/net/dsa.h | 3 ++-
net/dsa/user.c | 2 +-
4 files changed, 7 insertions(+), 4 deletions(-)
diff --git a/drivers/net/dsa/ocelot/felix.c b/drivers/net/dsa/ocelot/felix.c
index 84cf8e7fb17a..4272ea6e9ca8 100644
--- a/drivers/net/dsa/ocelot/felix.c
+++ b/drivers/net/dsa/ocelot/felix.c
@@ -2001,7 +2001,8 @@ static int felix_cls_flower_stats(struct dsa_switch *ds, int port,
}
static int felix_port_policer_add(struct dsa_switch *ds, int port,
- const struct flow_action_police *policer)
+ const struct flow_action_police *policer,
+ struct netlink_ext_ack *extack)
{
struct ocelot *ocelot = ds->priv;
struct ocelot_policer pol = {
diff --git a/drivers/net/dsa/sja1105/sja1105_main.c b/drivers/net/dsa/sja1105/sja1105_main.c
index c72c2bfdcffb..dbfa45064747 100644
--- a/drivers/net/dsa/sja1105/sja1105_main.c
+++ b/drivers/net/dsa/sja1105/sja1105_main.c
@@ -2847,7 +2847,8 @@ static void sja1105_mirror_del(struct dsa_switch *ds, int port,
}
static int sja1105_port_policer_add(struct dsa_switch *ds, int port,
- const struct flow_action_police *policer)
+ const struct flow_action_police *policer,
+ struct netlink_ext_ack *extack)
{
struct sja1105_l2_policing_entry *policing;
struct sja1105_private *priv = ds->priv;
diff --git a/include/net/dsa.h b/include/net/dsa.h
index 8b6d34e8a6f0..4cc67469cf2e 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -1122,7 +1122,8 @@ struct dsa_switch_ops {
void (*port_mirror_del)(struct dsa_switch *ds, int port,
struct dsa_mall_mirror_tc_entry *mirror);
int (*port_policer_add)(struct dsa_switch *ds, int port,
- const struct flow_action_police *policer);
+ const struct flow_action_police *policer,
+ struct netlink_ext_ack *extack);
void (*port_policer_del)(struct dsa_switch *ds, int port);
int (*port_setup_tc)(struct dsa_switch *ds, int port,
enum tc_setup_type type, void *type_data);
diff --git a/net/dsa/user.c b/net/dsa/user.c
index c4bd6fe90b45..8704c1a3a5b7 100644
--- a/net/dsa/user.c
+++ b/net/dsa/user.c
@@ -1499,7 +1499,7 @@ dsa_user_add_cls_matchall_police(struct net_device *dev,
policer = &mall_tc_entry->policer;
*policer = act->police;
- err = ds->ops->port_policer_add(ds, dp->index, policer);
+ err = ds->ops->port_policer_add(ds, dp->index, policer, extack);
if (err) {
kfree(mall_tc_entry);
return err;
--
2.53.0
^ permalink raw reply related
* [PATCH net-next v4 0/4] net: dsa: yt921x: Add port police/tbf support
From: David Yang @ 2026-04-09 17:12 UTC (permalink / raw)
To: netdev
Cc: David Yang, Andrew Lunn, Vladimir Oltean, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, linux-kernel,
Vladimir Oltean, UNGLinuxDriver, Simon Horman
v3: https://lore.kernel.org/r/20260407160559.1747616-1-mmyangfl@gmail.com
- explain long registers more accurately
- fix missing packet mode flag
- rearrange function layout, in preparation for further patches
v2: https://lore.kernel.org/r/20260402223437.109097-1-mmyangfl@gmail.com
- refine commit messages and code styles, no functional changes
v1: https://lore.kernel.org/r/20260225090853.2021140-1-mmyangfl@gmail.com
- pass extack to user tc policers
- keep reg64 helpers along with reg96
- avoid macros in favor of functions
- adjust log messages
David Yang (4):
net: dsa: pass extack to user tc policers
net: dsa: yt921x: Refactor long register helpers
net: dsa: yt921x: Add port police support
net: dsa: yt921x: Add port qdisc tbf support
drivers/net/dsa/ocelot/felix.c | 3 +-
drivers/net/dsa/sja1105/sja1105_main.c | 3 +-
drivers/net/dsa/yt921x.c | 621 ++++++++++++++++++++++---
drivers/net/dsa/yt921x.h | 155 +++++-
include/net/dsa.h | 3 +-
net/dsa/user.c | 2 +-
6 files changed, 712 insertions(+), 75 deletions(-)
--
2.53.0
^ permalink raw reply
* Re: [PATCH 1/1] selftests: vsock: avoid mktemp -u for Unix socket paths
From: Simon Horman @ 2026-04-09 17:02 UTC (permalink / raw)
To: CaoRuichuang
Cc: Stefano Garzarella, Shuah Khan, virtualization, netdev,
linux-kselftest, linux-kernel
In-Reply-To: <20260405195709.85957-1-create0818@163.com>
On Mon, Apr 06, 2026 at 03:57:09AM +0800, CaoRuichuang wrote:
> The namespace tests create temporary Unix socket paths with mktemp -u and
> then hand them to socat. That only prints an unused pathname and leaves
> a race before the socket is created.
>
> Create a private temporary directory with mktemp -d and place the Unix
> socket inside it instead. This keeps the path unique without relying on
> mktemp -u for filesystem socket names.
>
> Signed-off-by: CaoRuichuang <create0818@163.com>
This patch was sent twice in quick succession.
Please don't do that.
Instead, please allow 24h between posts of updated patches to netdev.
And please version patches, like this:
Subject: [PATCH v2] ...
The process is intended to help reviewers.
For more information, please see:
https://docs.kernel.org/process/maintainer-netdev.html
^ permalink raw reply
* Re: [PATCH 1/1] selftests: vsock: avoid mktemp -u for Unix socket paths
From: Simon Horman @ 2026-04-09 17:00 UTC (permalink / raw)
To: CaoRuichuang
Cc: Stefano Garzarella, Shuah Khan, virtualization, netdev,
linux-kselftest, linux-kernel
In-Reply-To: <20260405195733.86043-1-create0818@163.com>
On Mon, Apr 06, 2026 at 03:57:33AM +0800, CaoRuichuang wrote:
> The namespace tests create temporary Unix socket paths with mktemp -u and
> then hand them to socat. That only prints an unused pathname and leaves
> a race before the socket is created.
>
> Create a private temporary directory with mktemp -d and place the Unix
> socket inside it instead. This keeps the path unique without relying on
> mktemp -u for filesystem socket names.
>
> Signed-off-by: CaoRuichuang <create0818@163.com>
I think the subject could be improved a bit.
More along the lines of the problem being solved,
less on how it is achieved.
...
> @@ -845,7 +850,7 @@ test_ns_diff_global_vm_connect_to_global_host_ok() {
> if ! vm_start "${pidfile}" "${ns0}"; then
> log_host "failed to start vm (cid=${cid}, ns=${ns0})"
> terminate_pids "${pids[@]}"
> - rm -f "${unixfile}"
> + rm -rf "${unixdir}"
rm -rf makes me feel queasy.
Can rmdir, and rm, ideally without the -f, be used instead?
rm "$unixfile"
rmdir "$unixdir"
> return "${KSFT_FAIL}"
> fi
>
...
^ permalink raw reply
* [PATCH net 2/2] can: raw: fix ro->uniq use-after-free in raw_rcv()
From: Marc Kleine-Budde @ 2026-04-09 16:57 UTC (permalink / raw)
To: netdev
Cc: davem, kuba, linux-can, kernel, Samuel Page, stable,
Oliver Hartkopp, Marc Kleine-Budde
In-Reply-To: <20260409165942.588421-1-mkl@pengutronix.de>
From: Samuel Page <sam@bynar.io>
raw_release() unregisters raw CAN receive filters via can_rx_unregister(),
but receiver deletion is deferred with call_rcu(). This leaves a window
where raw_rcv() may still be running in an RCU read-side critical section
after raw_release() frees ro->uniq, leading to a use-after-free of the
percpu uniq storage.
Move free_percpu(ro->uniq) out of raw_release() and into a raw-specific
socket destructor. can_rx_unregister() takes an extra reference to the
socket and only drops it from the RCU callback, so freeing uniq from
sk_destruct ensures the percpu area is not released until the relevant
callbacks have drained.
Fixes: 514ac99c64b2 ("can: fix multiple delivery of a single CAN frame for overlapping CAN filters")
Cc: stable@vger.kernel.org # v4.1+
Assisted-by: Bynario AI
Signed-off-by: Samuel Page <sam@bynar.io>
Link: https://patch.msgid.link/26ec626d-cae7-4418-9782-7198864d070c@bynar.io
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
[mkl: applied manually]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
---
net/can/raw.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/net/can/raw.c b/net/can/raw.c
index eee244ffc31e..58a96e933deb 100644
--- a/net/can/raw.c
+++ b/net/can/raw.c
@@ -361,6 +361,14 @@ static int raw_notifier(struct notifier_block *nb, unsigned long msg,
return NOTIFY_DONE;
}
+static void raw_sock_destruct(struct sock *sk)
+{
+ struct raw_sock *ro = raw_sk(sk);
+
+ free_percpu(ro->uniq);
+ can_sock_destruct(sk);
+}
+
static int raw_init(struct sock *sk)
{
struct raw_sock *ro = raw_sk(sk);
@@ -387,6 +395,8 @@ static int raw_init(struct sock *sk)
if (unlikely(!ro->uniq))
return -ENOMEM;
+ sk->sk_destruct = raw_sock_destruct;
+
/* set notifier */
spin_lock(&raw_notifier_lock);
list_add_tail(&ro->notifier, &raw_notifier_list);
@@ -436,7 +446,6 @@ static int raw_release(struct socket *sock)
ro->bound = 0;
ro->dev = NULL;
ro->count = 0;
- free_percpu(ro->uniq);
sock_orphan(sk);
sock->sk = NULL;
--
2.53.0
^ permalink raw reply related
* [PATCH net 1/2] can: ucan: fix devres lifetime
From: Marc Kleine-Budde @ 2026-04-09 16:57 UTC (permalink / raw)
To: netdev
Cc: davem, kuba, linux-can, kernel, Johan Hovold, stable,
Jakob Unterwurzacher, Marc Kleine-Budde
In-Reply-To: <20260409165942.588421-1-mkl@pengutronix.de>
From: Johan Hovold <johan@kernel.org>
USB drivers bind to USB interfaces and any device managed resources
should have their lifetime tied to the interface rather than parent USB
device. This avoids issues like memory leaks when drivers are unbound
without their devices being physically disconnected (e.g. on probe
deferral or configuration changes).
Fix the control message buffer lifetime so that it is released on driver
unbind.
Fixes: 9f2d3eae88d2 ("can: ucan: add driver for Theobroma Systems UCAN devices")
Cc: stable@vger.kernel.org # 4.19
Cc: Jakob Unterwurzacher <jakob.unterwurzacher@theobroma-systems.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://patch.msgid.link/20260327104520.1310158-1-johan@kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
---
drivers/net/can/usb/ucan.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/can/usb/ucan.c b/drivers/net/can/usb/ucan.c
index 0ea0ac75e42f..ee3c1abbd063 100644
--- a/drivers/net/can/usb/ucan.c
+++ b/drivers/net/can/usb/ucan.c
@@ -1397,7 +1397,7 @@ static int ucan_probe(struct usb_interface *intf,
*/
/* Prepare Memory for control transfers */
- ctl_msg_buffer = devm_kzalloc(&udev->dev,
+ ctl_msg_buffer = devm_kzalloc(&intf->dev,
sizeof(union ucan_ctl_payload),
GFP_KERNEL);
if (!ctl_msg_buffer) {
base-commit: ebe560ea5f54134279356703e73b7f867c89db13
--
2.53.0
^ permalink raw reply related
* [PATCH net 0/2] pull-request: can 2026-04-09
From: Marc Kleine-Budde @ 2026-04-09 16:57 UTC (permalink / raw)
To: netdev; +Cc: davem, kuba, linux-can, kernel
Hello netdev-team,
this is a pull request of 2 patches for net/main.
Johan Hovold's patch fixes the a devres lifetime in the ucan driver.
The last patch is by Samuel Page and fixes a use-after-free in
raw_rcv() in the CAN_RAW protocol.
regards,
Marc
---
The following changes since commit ebe560ea5f54134279356703e73b7f867c89db13:
l2tp: Drop large packets with UDP encap (2026-04-09 10:19:05 +0200)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can.git tags/linux-can-fixes-for-7.0-20260409
for you to fetch changes up to a535a9217ca3f2fccedaafb2fddb4c48f27d36dc:
can: raw: fix ro->uniq use-after-free in raw_rcv() (2026-04-09 18:51:42 +0200)
----------------------------------------------------------------
linux-can-fixes-for-7.0-20260409
----------------------------------------------------------------
Johan Hovold (1):
can: ucan: fix devres lifetime
Samuel Page (1):
can: raw: fix ro->uniq use-after-free in raw_rcv()
drivers/net/can/usb/ucan.c | 2 +-
net/can/raw.c | 11 ++++++++++-
2 files changed, 11 insertions(+), 2 deletions(-)
^ permalink raw reply
* Re: [RFC net PATCH v1] net: pcs: pcs-mtk-lynxi: fix bpi-r3 serdes configuration
From: Vladimir Oltean @ 2026-04-09 16:49 UTC (permalink / raw)
To: Frank Wunderlich, Chester A. Unal, Felix Fietkau
Cc: Alexander Couzens, Daniel Golle, Andrew Lunn, Heiner Kallweit,
Russell King, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Matthias Brugger, AngeloGioacchino Del Regno,
Frank Wunderlich, netdev, linux-kernel, linux-arm-kernel,
linux-mediatek
In-Reply-To: <20260409133344.129620-1-linux@fw-web.de>
On Thu, Apr 09, 2026 at 03:33:42PM +0200, Frank Wunderlich wrote:
> From: Frank Wunderlich <frank-w@public-files.de>
>
> Commit 8871389da151 introduces common pcs dts properties which writes
> rx=normal,tx=normal polarity to register SGMSYS_QPHY_WRAP_CTRL of switch.
> This is initialized with tx-bit set and so change inverts polarity
> compared to before.
>
> It looks like mt7531 has tx polarity inverted in hardware and set tx-bit
> by default to restore the normal polarity.
>
> Till this patch the register write was only called when mediatek,pnswap
> property was set which cannot be done for switch because the fw-node param
> was always NULL from switch driver in the mtk_pcs_lynxi_create call.
>
> Do not configure switch side like it's done before.
>
> Fixes: 8871389da151 ("net: pcs: pcs-mtk-lynxi: deprecate "mediatek,pnswap"")
> Signed-off-by: Frank Wunderlich <frank-w@public-files.de>
> ---
> drivers/net/pcs/pcs-mtk-lynxi.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/drivers/net/pcs/pcs-mtk-lynxi.c b/drivers/net/pcs/pcs-mtk-lynxi.c
> index c12f8087af9b..a753bd88cbc2 100644
> --- a/drivers/net/pcs/pcs-mtk-lynxi.c
> +++ b/drivers/net/pcs/pcs-mtk-lynxi.c
> @@ -129,6 +129,9 @@ static int mtk_pcs_config_polarity(struct mtk_pcs_lynxi *mpcs,
> unsigned int val = 0;
> int ret;
>
> + if (!fwnode)
> + return 0;
> +
> if (fwnode_property_read_bool(fwnode, "mediatek,pnswap"))
> default_pol = PHY_POL_INVERT;
>
> --
> 2.43.0
>
I notice Arınc, listed by ./scripts/get_maintainer.pl drivers/net/dsa/mt7530.c,
and Felix, listed by ./scripts/get_maintainer.pl drivers/net/ethernet/mediatek/mtk_eth_soc.c,
are not on CC. Maybe they have more info.
Only the switch port has a chance of having a non-zero default polarity
setting? (coming from the efuse, if I understood this discussion properly)
https://lore.kernel.org/netdev/C59EED96-3973-4074-A4D8-C264949D447E@linux.dev/
The GMAC doesn't?
^ permalink raw reply
* RE: [Intel-wired-lan] [PATCH net v2 RESEND] ice: fix race condition in TX timestamp ring cleanup
From: Rinitha, SX @ 2026-04-09 16:49 UTC (permalink / raw)
To: Keita Morisaki, Nguyen, Anthony L, Kitszel, Przemyslaw,
Andrew Lunn, David S . Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni
Cc: Alice Michael, Loktionov, Aleksandr, Fijalkowski, Maciej,
Greenwalt, Paul, intel-wired-lan@lists.osuosl.org,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <20260224054533.3372943-1-kmta1236@gmail.com>
> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf Of Keita Morisaki
> Sent: 24 February 2026 11:16
> To: Nguyen, Anthony L <anthony.l.nguyen@intel.com>; Kitszel, Przemyslaw <przemyslaw.kitszel@intel.com>; Andrew Lunn <andrew+netdev@lunn.ch>; David S . Miller <davem@davemloft.net>; Eric Dumazet <edumazet@google.com>; Jakub Kicinski <kuba@kernel.org>; Paolo Abeni <pabeni@redhat.com>
> Cc: Alice Michael <alice.michael@intel.com>; Loktionov, Aleksandr <aleksandr.loktionov@intel.com>; Fijalkowski, Maciej <maciej.fijalkowski@intel.com>; Greenwalt, Paul <paul.greenwalt@intel.com>; intel-wired-lan@lists.osuosl.org; netdev@vger.kernel.org; linux-kernel@vger.kernel.org; Keita Morisaki <kmta1236@gmail.com>
> Subject: [Intel-wired-lan] [PATCH net v2 RESEND] ice: fix race condition in TX timestamp ring cleanup
>
> Fix a race condition between ice_free_tx_tstamp_ring() and ice_tx_map() that can cause a NULL pointer dereference.
>
> ice_free_tx_tstamp_ring currently clears the ICE_TX_FLAGS_TXTIME flag after NULLing the tstamp_ring. This could allow a concurrent ice_tx_map call on another CPU to dereference the tstamp_ring, which could lead to a NULL pointer dereference.
>
> CPU A:ice_free_tx_tstamp_ring() | CPU B:ice_tx_map()
> --------------------------------|---------------------------------
> tx_ring->tstamp_ring = NULL |
> | ice_is_txtime_cfg() -> true
> | tstamp_ring = tx_ring->tstamp_ring
> | tstamp_ring->count // NULL deref!
> flags &= ~ICE_TX_FLAGS_TXTIME |
>
> Fix by:
> 1. Reordering ice_free_tx_tstamp_ring() to clear the flag before
> NULLing the pointer, with smp_wmb() to ensure proper ordering.
> 2. Adding smp_rmb() in ice_tx_map() after the flag check to order the
> flag read before the pointer read, using READ_ONCE() for the
> pointer, and adding a NULL check as a safety net.
> 3. Converting tx_ring->flags from u8 to DECLARE_BITMAP() and using
> atomic bitops (set_bit(), clear_bit(), test_bit()) for all flag
> operations throughout the driver:
> - ICE_TX_RING_FLAGS_XDP
> - ICE_TX_RING_FLAGS_VLAN_L2TAG1
> - ICE_TX_RING_FLAGS_VLAN_L2TAG2
> - ICE_TX_RING_FLAGS_TXTIME
>
> Fixes: ccde82e909467 ("ice: add E830 Earliest TxTime First Offload support")
> Signed-off-by: Keita Morisaki <kmta1236@gmail.com>
> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
> ---
> Changes in v2:
> - Convert tx_ring->flags from u8 to DECLARE_BITMAP() and use atomic
> bitops (set_bit(), clear_bit(), test_bit()) for all flag operations
> instead of WRITE_ONCE() for flag updates
> - Rename flags from ICE_TX_FLAGS_RING_* to ICE_TX_RING_FLAGS_* to
> distinguish from per-packet flags (ICE_TX_FLAGS_*)
>
> drivers/net/ethernet/intel/ice/ice.h | 4 ++--
> drivers/net/ethernet/intel/ice/ice_dcb_lib.c | 2 +-
> drivers/net/ethernet/intel/ice/ice_lib.c | 4 ++--
> drivers/net/ethernet/intel/ice/ice_txrx.c | 23 ++++++++++++++------
> drivers/net/ethernet/intel/ice/ice_txrx.h | 16 +++++++++-----
> 5 files changed, 31 insertions(+), 18 deletions(-)
>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
^ permalink raw reply
* Re: [PATCH] nfc: llcp: fix missing return after LLCP_CLOSED check in recv_hdlc and recv_disc
From: Simon Horman @ 2026-04-09 16:45 UTC (permalink / raw)
To: Lekë Hapçiu
Cc: netdev, linux-nfc, davem, kuba, krzysztof.kozlowski, stable
In-Reply-To: <20260405164158.1344049-1-snowwlake@icloud.com>
On Sun, Apr 05, 2026 at 06:41:58PM +0200, Lekë Hapçiu wrote:
> From: Lekë Hapçiu <framemain@outlook.com>
>
> nfc_llcp_recv_hdlc() and nfc_llcp_recv_disc() both call
> nfc_llcp_sock_get() (which increments the socket reference count) and
> lock_sock() before processing incoming PDUs. When the socket is found
> to be in state LLCP_CLOSED both functions correctly call release_sock()
> and nfc_llcp_sock_put() to undo those operations, but are missing a
> return statement:
>
> lock_sock(sk);
> if (sk->sk_state == LLCP_CLOSED) {
> release_sock(sk);
> nfc_llcp_sock_put(llcp_sock);
> /* ← return missing */
> }
> /* Falls through with lock released and reference dropped */
> ...
> release_sock(sk); /* double unlock */
> nfc_llcp_sock_put(llcp_sock); /* double put → refcount underflow */
>
> The fall-through causes three independent bugs:
>
> 1. Use-after-free: all llcp_sock field accesses after the LLCP_CLOSED
> block occur with the socket lock released and the reference dropped;
> another CPU may free the socket concurrently.
>
> 2. Double release_sock: sk_lock.owned is already 0 — LOCKDEP reports
> "WARNING: suspicious unlock balance detected".
>
> 3. Double nfc_llcp_sock_put: the refcount is decremented a second time
> at the end of the function, potentially driving it below zero
> (refcount_t underflow), corrupting the SLUB freelist and causing a
> subsequent use-after-free or double-free.
>
> Both functions are reachable from any NFC P2P peer within physical
> proximity (~4 cm) without hostile NFCC firmware:
> - nfc_llcp_recv_hdlc: triggered by sending an LLCP I, RR, or RNR PDU
> to a SAP pair whose connection has been torn down.
> - nfc_llcp_recv_disc: triggered by sending an LLCP DISC PDU to a SAP
> pair that is already in LLCP_CLOSED state.
>
> Fix: add the missing return statement in both functions so that the
> LLCP_CLOSED branch exits after cleanup.
>
> Fixes: Introduced with nfc_llcp_recv_hdlc / nfc_llcp_recv_disc
> Signed-off-by: Lekë Hapçiu <framemain@outlook.com>
Curiously this seems to duplicate this patch:
- [PATCH net] nfc: llcp: add missing return after LLCP_CLOSED checks
https://lore.kernel.org/all/20260408081006.3723-1-qjx1298677004@gmail.com/
--
pw-bot: changes-requested
^ permalink raw reply
* [PATCH net-next v4] selftests/net: convert so_txtime to drv-net
From: Willem de Bruijn @ 2026-04-09 16:40 UTC (permalink / raw)
To: netdev
Cc: davem, kuba, edumazet, pabeni, horms, linux-kselftest, shuah,
Willem de Bruijn
From: Willem de Bruijn <willemb@google.com>
In preparation for extending to pacing hardware offload, convert the
so_txtime.sh test to a drv-net test that can be run against netdevsim
and real hardware.
Also update so_txtime.c to not exit on first failure, but run to
completion and report exit code there. This helps with debugging
unexpected results, especially when processing multiple packets,
as in the "reverse_order" testcase.
Signed-off-by: Willem de Bruijn <willemb@google.com>
----
v3 -> v4
- restore original qdisc after test
- drop unnecessary underscore in tap test names
v3: https://lore.kernel.org/netdev/20260407191543.0593b5aa@kernel.org/
v2 -> v3
- Makefile: so_txtime from YNL_GEN_FILES to TEST_GEN_FILES (Sashiko, NIPA)
v2: https://lore.kernel.org/netdev/20260405014458.1038165-1-willemdebruijn.kernel@gmail.com/
v1 -> v2
- move so_txtime.c for net/lib to drivers/net (Jakub)
- fix drivers/net/config order (Jakub)
- detect passing when failure is expected (Jakub, Sashiko)
- pass pylint --disable=R (Jakub)
- only call ksft_run once (Jakub)
- do not sleep if waiting time is negative (Sashiko)
- add \n when converting error() to fprintf() (Sashiko)
- 4 space indentation, instead of 2 space
- increase sync delay from 100 to 200ms, to fix rare vng flakes
v1: https://lore.kernel.org/netdev/20260403175047.152646-1-willemdebruijn.kernel@gmail.com/
---
.../testing/selftests/drivers/net/.gitignore | 1 +
tools/testing/selftests/drivers/net/Makefile | 2 +
tools/testing/selftests/drivers/net/config | 2 +
.../selftests/{ => drivers}/net/so_txtime.c | 24 +++-
.../selftests/drivers/net/so_txtime.py | 97 +++++++++++++++
tools/testing/selftests/net/.gitignore | 1 -
tools/testing/selftests/net/Makefile | 2 -
tools/testing/selftests/net/so_txtime.sh | 110 ------------------
8 files changed, 121 insertions(+), 118 deletions(-)
rename tools/testing/selftests/{ => drivers}/net/so_txtime.c (96%)
create mode 100755 tools/testing/selftests/drivers/net/so_txtime.py
delete mode 100755 tools/testing/selftests/net/so_txtime.sh
diff --git a/tools/testing/selftests/drivers/net/.gitignore b/tools/testing/selftests/drivers/net/.gitignore
index 585ecb4d5dc4..e5314ce4bb2d 100644
--- a/tools/testing/selftests/drivers/net/.gitignore
+++ b/tools/testing/selftests/drivers/net/.gitignore
@@ -1,3 +1,4 @@
# SPDX-License-Identifier: GPL-2.0-only
napi_id_helper
psp_responder
+so_txtime
diff --git a/tools/testing/selftests/drivers/net/Makefile b/tools/testing/selftests/drivers/net/Makefile
index 7c7fa75b80c2..49a61b506f53 100644
--- a/tools/testing/selftests/drivers/net/Makefile
+++ b/tools/testing/selftests/drivers/net/Makefile
@@ -7,6 +7,7 @@ TEST_INCLUDES := $(wildcard lib/py/*.py) \
TEST_GEN_FILES := \
napi_id_helper \
+ so_txtime \
# end of TEST_GEN_FILES
TEST_PROGS := \
@@ -20,6 +21,7 @@ TEST_PROGS := \
queues.py \
ring_reconfig.py \
shaper.py \
+ so_txtime.py \
stats.py \
xdp.py \
# end of TEST_PROGS
diff --git a/tools/testing/selftests/drivers/net/config b/tools/testing/selftests/drivers/net/config
index 77ccf83d87e0..2d39f263e03b 100644
--- a/tools/testing/selftests/drivers/net/config
+++ b/tools/testing/selftests/drivers/net/config
@@ -7,4 +7,6 @@ CONFIG_NETCONSOLE=m
CONFIG_NETCONSOLE_DYNAMIC=y
CONFIG_NETCONSOLE_EXTENDED_LOG=y
CONFIG_NETDEVSIM=m
+CONFIG_NET_SCH_ETF=m
+CONFIG_NET_SCH_FQ=m
CONFIG_XDP_SOCKETS=y
diff --git a/tools/testing/selftests/net/so_txtime.c b/tools/testing/selftests/drivers/net/so_txtime.c
similarity index 96%
rename from tools/testing/selftests/net/so_txtime.c
rename to tools/testing/selftests/drivers/net/so_txtime.c
index b76df1efc2ef..7d144001ecf2 100644
--- a/tools/testing/selftests/net/so_txtime.c
+++ b/tools/testing/selftests/drivers/net/so_txtime.c
@@ -33,6 +33,8 @@
#include <unistd.h>
#include <poll.h>
+#include "kselftest.h"
+
static int cfg_clockid = CLOCK_TAI;
static uint16_t cfg_port = 8000;
static int cfg_variance_us = 4000;
@@ -43,6 +45,8 @@ static bool cfg_rx;
static uint64_t glob_tstart;
static uint64_t tdeliver_max;
+static int errors;
+
/* encode one timed transmission (of a 1B payload) */
struct timed_send {
char data;
@@ -131,13 +135,15 @@ static void do_recv_one(int fdr, struct timed_send *ts)
fprintf(stderr, "payload:%c delay:%lld expected:%lld (us)\n",
rbuf[0], (long long)tstop, (long long)texpect);
- if (rbuf[0] != ts->data)
- error(1, 0, "payload mismatch. expected %c", ts->data);
+ if (rbuf[0] != ts->data) {
+ fprintf(stderr, "payload mismatch. expected %c\n", ts->data);
+ errors++;
+ }
if (llabs(tstop - texpect) > cfg_variance_us) {
fprintf(stderr, "exceeds variance (%d us)\n", cfg_variance_us);
if (!getenv("KSFT_MACHINE_SLOW"))
- exit(1);
+ errors++;
}
}
@@ -255,8 +261,11 @@ static void start_time_wait(void)
return;
now = gettime_ns(CLOCK_REALTIME);
- if (cfg_start_time_ns < now)
+ if (cfg_start_time_ns < now) {
+ fprintf(stderr, "FAIL: start time already passed\n");
+ errors++;
return;
+ }
err = usleep((cfg_start_time_ns - now) / 1000);
if (err)
@@ -513,5 +522,10 @@ int main(int argc, char **argv)
else
do_test_tx((void *)&cfg_src_addr, cfg_alen);
- return 0;
+ if (errors) {
+ fprintf(stderr, "FAIL: %d errors\n", errors);
+ return KSFT_FAIL;
+ }
+
+ return KSFT_PASS;
}
diff --git a/tools/testing/selftests/drivers/net/so_txtime.py b/tools/testing/selftests/drivers/net/so_txtime.py
new file mode 100755
index 000000000000..8738dd02ad0f
--- /dev/null
+++ b/tools/testing/selftests/drivers/net/so_txtime.py
@@ -0,0 +1,97 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0
+
+"""Regression tests for the SO_TXTIME interface.
+
+Test delivery time in FQ and ETF qdiscs.
+"""
+
+import json
+import time
+
+from lib.py import ksft_exit, ksft_run, ksft_variants
+from lib.py import KsftNamedVariant, KsftSkipEx
+from lib.py import NetDrvEpEnv, bkg, cmd
+
+
+def test_so_txtime(cfg, clockid, ipver, args_tx, args_rx, expect_fail):
+ """Main function. Run so_txtime as sender and receiver."""
+ bin_path = cfg.test_dir / "so_txtime"
+
+ tstart = time.time_ns() + 200_000_000
+
+ cmd_addr = f"-S {cfg.addr_v[ipver]} -D {cfg.remote_addr_v[ipver]}"
+ cmd_base = f"{bin_path} -{ipver} -c {clockid} -t {tstart} {cmd_addr}"
+ cmd_rx = f"{cmd_base} {args_rx} -r"
+ cmd_tx = f"{cmd_base} {args_tx}"
+
+ with bkg(cmd_rx, host=cfg.remote, fail=(not expect_fail), exit_wait=True):
+ cmd(cmd_tx)
+
+
+def _test_variants_mono():
+ for ipver in ["4", "6"]:
+ for testcase in [
+ ["no_delay", "a,-1", "a,-1"],
+ ["zero_delay", "a,0", "a,0"],
+ ["one_pkt", "a,10", "a,10"],
+ ["in_order", "a,10,b,20", "a,10,b,20"],
+ ["reverse_order", "a,20,b,10", "b,20,a,20"],
+ ]:
+ name = f"v{ipver}_{testcase[0]}"
+ yield KsftNamedVariant(name, ipver, testcase[1], testcase[2])
+
+
+@ksft_variants(_test_variants_mono())
+def test_so_txtime_mono(cfg, ipver, args_tx, args_rx):
+ """Run all variants of monotonic (fq) tests."""
+ cmd(f"tc qdisc replace dev {cfg.ifname} root fq")
+ test_so_txtime(cfg, "mono", ipver, args_tx, args_rx, False)
+
+
+def _test_variants_etf():
+ for ipver in ["4", "6"]:
+ for testcase in [
+ ["no_delay", "a,-1", "a,-1", True],
+ ["zero_delay", "a,0", "a,0", True],
+ ["one_pkt", "a,10", "a,10", False],
+ ["in_order", "a,10,b,20", "a,10,b,20", False],
+ ["reverse_order", "a,20,b,10", "b,10,a,20", False],
+ ]:
+ name = f"v{ipver}_{testcase[0]}"
+ yield KsftNamedVariant(
+ name, ipver, testcase[1], testcase[2], testcase[3]
+ )
+
+
+@ksft_variants(_test_variants_etf())
+def test_so_txtime_etf(cfg, ipver, args_tx, args_rx, expect_fail):
+ """Run all variants of etf tests."""
+ try:
+ # ETF does not support change, so remove and re-add it instead.
+ cmd_prefix = f"tc qdisc replace dev {cfg.ifname} root"
+ cmd(f"{cmd_prefix} pfifo_fast")
+ cmd(f"{cmd_prefix} etf clockid CLOCK_TAI delta 400000")
+ except Exception as e:
+ raise KsftSkipEx("tc does not support qdisc etf. skipping") from e
+
+ test_so_txtime(cfg, "tai", ipver, args_tx, args_rx, expect_fail)
+
+
+def main() -> None:
+ """Boilerplate ksft main."""
+ with NetDrvEpEnv(__file__) as cfg:
+ # Record original root qdisc
+ cmd_obj = cmd((f"tc -j qdisc show dev {cfg.ifname} root"))
+ qdisc_root = json.loads(cmd_obj.stdout)[0].get("kind", None)
+
+ ksft_run([test_so_txtime_mono, test_so_txtime_etf], args=(cfg,))
+
+ # Restore original root qdisc. If mq, populate with default_qdisc nodes
+ if (qdisc_root):
+ cmd(f"tc qdisc replace dev {cfg.ifname} root {qdisc_root}")
+ ksft_exit()
+
+
+if __name__ == "__main__":
+ main()
diff --git a/tools/testing/selftests/net/.gitignore b/tools/testing/selftests/net/.gitignore
index 97ad4d551d44..02ad4c99a2b4 100644
--- a/tools/testing/selftests/net/.gitignore
+++ b/tools/testing/selftests/net/.gitignore
@@ -40,7 +40,6 @@ skf_net_off
socket
so_incoming_cpu
so_netns_cookie
-so_txtime
so_rcv_listener
stress_reuseport_listen
tap
diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index 6bced3ed798b..b7f51e8f190f 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -81,7 +81,6 @@ TEST_PROGS := \
rxtimestamp.sh \
sctp_vrf.sh \
skf_net_off.sh \
- so_txtime.sh \
srv6_end_dt46_l3vpn_test.sh \
srv6_end_dt4_l3vpn_test.sh \
srv6_end_dt6_l3vpn_test.sh \
@@ -154,7 +153,6 @@ TEST_GEN_FILES := \
skf_net_off \
so_netns_cookie \
so_rcv_listener \
- so_txtime \
socket \
stress_reuseport_listen \
tcp_fastopen_backup_key \
diff --git a/tools/testing/selftests/net/so_txtime.sh b/tools/testing/selftests/net/so_txtime.sh
deleted file mode 100755
index 5e861ad32a42..000000000000
--- a/tools/testing/selftests/net/so_txtime.sh
+++ /dev/null
@@ -1,110 +0,0 @@
-#!/bin/bash
-# SPDX-License-Identifier: GPL-2.0
-#
-# Regression tests for the SO_TXTIME interface
-
-set -e
-
-readonly ksft_skip=4
-readonly DEV="veth0"
-readonly BIN="./so_txtime"
-
-readonly RAND="$(mktemp -u XXXXXX)"
-readonly NSPREFIX="ns-${RAND}"
-readonly NS1="${NSPREFIX}1"
-readonly NS2="${NSPREFIX}2"
-
-readonly SADDR4='192.168.1.1'
-readonly DADDR4='192.168.1.2'
-readonly SADDR6='fd::1'
-readonly DADDR6='fd::2'
-
-cleanup() {
- ip netns del "${NS2}"
- ip netns del "${NS1}"
-}
-
-trap cleanup EXIT
-
-# Create virtual ethernet pair between network namespaces
-ip netns add "${NS1}"
-ip netns add "${NS2}"
-
-ip link add "${DEV}" netns "${NS1}" type veth \
- peer name "${DEV}" netns "${NS2}"
-
-# Bring the devices up
-ip -netns "${NS1}" link set "${DEV}" up
-ip -netns "${NS2}" link set "${DEV}" up
-
-# Set fixed MAC addresses on the devices
-ip -netns "${NS1}" link set dev "${DEV}" address 02:02:02:02:02:02
-ip -netns "${NS2}" link set dev "${DEV}" address 06:06:06:06:06:06
-
-# Add fixed IP addresses to the devices
-ip -netns "${NS1}" addr add 192.168.1.1/24 dev "${DEV}"
-ip -netns "${NS2}" addr add 192.168.1.2/24 dev "${DEV}"
-ip -netns "${NS1}" addr add fd::1/64 dev "${DEV}" nodad
-ip -netns "${NS2}" addr add fd::2/64 dev "${DEV}" nodad
-
-run_test() {
- local readonly IP="$1"
- local readonly CLOCK="$2"
- local readonly TXARGS="$3"
- local readonly RXARGS="$4"
-
- if [[ "${IP}" == "4" ]]; then
- local readonly SADDR="${SADDR4}"
- local readonly DADDR="${DADDR4}"
- elif [[ "${IP}" == "6" ]]; then
- local readonly SADDR="${SADDR6}"
- local readonly DADDR="${DADDR6}"
- else
- echo "Invalid IP version ${IP}"
- exit 1
- fi
-
- local readonly START="$(date +%s%N --date="+ 0.1 seconds")"
-
- ip netns exec "${NS2}" "${BIN}" -"${IP}" -c "${CLOCK}" -t "${START}" -S "${SADDR}" -D "${DADDR}" "${RXARGS}" -r &
- ip netns exec "${NS1}" "${BIN}" -"${IP}" -c "${CLOCK}" -t "${START}" -S "${SADDR}" -D "${DADDR}" "${TXARGS}"
- wait "$!"
-}
-
-do_test() {
- run_test $@
- [ $? -ne 0 ] && ret=1
-}
-
-do_fail_test() {
- run_test $@
- [ $? -eq 0 ] && ret=1
-}
-
-ip netns exec "${NS1}" tc qdisc add dev "${DEV}" root fq
-set +e
-ret=0
-do_test 4 mono a,-1 a,-1
-do_test 6 mono a,0 a,0
-do_test 6 mono a,10 a,10
-do_test 4 mono a,10,b,20 a,10,b,20
-do_test 6 mono a,20,b,10 b,20,a,20
-
-if ip netns exec "${NS1}" tc qdisc replace dev "${DEV}" root etf clockid CLOCK_TAI delta 400000; then
- do_fail_test 4 tai a,-1 a,-1
- do_fail_test 6 tai a,0 a,0
- do_test 6 tai a,10 a,10
- do_test 4 tai a,10,b,20 a,10,b,20
- do_test 6 tai a,20,b,10 b,10,a,20
-else
- echo "tc ($(tc -V)) does not support qdisc etf. skipping"
- [ $ret -eq 0 ] && ret=$ksft_skip
-fi
-
-if [ $ret -eq 0 ]; then
- echo OK. All tests passed
-elif [[ $ret -ne $ksft_skip && -n "$KSFT_MACHINE_SLOW" ]]; then
- echo "Ignoring errors due to slow environment" 1>&2
- ret=0
-fi
-exit $ret
--
2.53.0.1213.gd9a14994de-goog
^ permalink raw reply related
* Re: [PATCH] nfc: llcp: fix u8 offset truncation in LLCP TLV parsers
From: Simon Horman @ 2026-04-09 16:41 UTC (permalink / raw)
To: Lekë Hapçiu
Cc: netdev, davem, edumazet, kuba, pabeni, stable, linux-kernel,
Lekë Hapçiu
In-Reply-To: <20260405105938.1334488-1-snowwlake@icloud.com>
On Sun, Apr 05, 2026 at 12:59:38PM +0200, Lekë Hapçiu wrote:
> From: Lekë Hapçiu <framemain@outlook.com>
>
> nfc_llcp_parse_gb_tlv() and nfc_llcp_parse_connection_tlv() declare
> 'offset' as u8, but compare it against a u16 tlv_array_len:
>
> u8 type, length, offset = 0;
> while (offset < tlv_array_len) { /* tlv_array_len is u16 */
> ...
> offset += length + 2; /* wraps at 256 */
> tlv += length + 2;
> }
>
> When tlv_array_len > 255 -- possible in nfc_llcp_parse_connection_tlv()
> when the peer has negotiated MIUX = 0x7FF (MIU = 2175 bytes), so that
> a CONNECT PDU can carry a TLV array of up to 2173 bytes -- the u8
> offset wraps back below tlv_array_len after every 128 zero-length TLV
> entries and the loop never terminates. The 'tlv' pointer meanwhile
> advances without bound into adjacent kernel heap, causing:
>
> * an OOB read of kernel heap content past the skb end;
> * a kernel page fault / oops once 'tlv' leaves mapped memory.
Is the more general explanation of this problem that the length of packet
data is used as the tlv_array_len parameter, and that length is not
verified to be within the expected bound?
I am also concerned that the packet data needs to be pulled
before it can be safely accessed.
>
> This is reachable from any NFC P2P peer device within ~4 cm without
> requiring compromised NFCC firmware.
I think some further explanation is warranted here if that is the case.
> Fix: promote 'offset' from u8 to u16 in both parsers, matching the
> type of their tlv_array_len parameter.
>
> nfc_llcp_parse_gb_tlv() takes GB bytes from the ATR_RES (max 44 bytes),
> so the wrap cannot occur in practice there. Change it anyway for
> correctness and to prevent copy-paste reintroduction.
In the case of both nfc_llcp_parse_gb_tlv() it seems to me that wrap-around
can occur because the value of length, which is used to increment the value
of offset, is read from unverified packet data. Is the data validate
somewhere, or known to be correct?
If so, I expect this can also result in an overrun of tlv.
And I think the same problem exists in nfc_llcp_parse_gb_tlv().
>
> Cc: stable@vger.kernel.org
A fixes tag is needed here.
> Signed-off-by: Lekë Hapçiu <framemain@outlook.com>
The patch should be targeted at net-next like this:
Subject: [PATCH net] ...
And please group together related patches - I see several for nfc -
in a patchset.
More on the Netdev development process can be found here
https://docs.kernel.org/process/maintainer-netdev.html
--
pw-bot: changes-requested
^ permalink raw reply
* [PATCH v3 net] vsock: fix buffer size clamping order
From: Norbert Szetei @ 2026-04-09 16:34 UTC (permalink / raw)
To: Stefano Garzarella
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, virtualization, netdev, linux-kernel
In vsock_update_buffer_size(), the buffer size was being clamped to the
maximum first, and then to the minimum. If a user sets a minimum buffer
size larger than the maximum, the minimum check overrides the maximum
check, inverting the constraint.
This breaks the intended socket memory boundaries by allowing the
vsk->buffer_size to grow beyond the configured vsk->buffer_max_size.
Fix this by checking the minimum first, and then the maximum. This
ensures the buffer size never exceeds the buffer_max_size.
Fixes: b9f2b0ffde0c ("vsock: handle buffer_size sockopts in the core")
Suggested-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Norbert Szetei <norbert@doyensec.com>
---
v3:
- Added Fixes and Suggested-by tags.
net/vmw_vsock/af_vsock.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index d912ed2f012a..08f4dfb9782c 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -1951,12 +1951,12 @@ static void vsock_update_buffer_size(struct vsock_sock *vsk,
const struct vsock_transport *transport,
u64 val)
{
- if (val > vsk->buffer_max_size)
- val = vsk->buffer_max_size;
-
if (val < vsk->buffer_min_size)
val = vsk->buffer_min_size;
+ if (val > vsk->buffer_max_size)
+ val = vsk->buffer_max_size;
+
if (val != vsk->buffer_size &&
transport && transport->notify_buffer_size)
transport->notify_buffer_size(vsk, &val);
--
2.53.0
^ permalink raw reply related
* Re: BUG: net-next (7.0-rc6 based and later) fails to boot on Jetson Xavier NX
From: Russell King (Oracle) @ 2026-04-09 16:16 UTC (permalink / raw)
To: Linus Torvalds
Cc: Will Deacon, Robin Murphy, netdev, linux-arm-kernel, linux-kernel,
iommu, linux-ext4, dmaengine, Marek Szyprowski, Theodore Ts'o,
Andreas Dilger, Vinod Koul, Frank Li
In-Reply-To: <CAHk-=whO3F1u+nme4cnYMy5baYmb7CH=wE63dcNaPLWD0vKaew@mail.gmail.com>
On Thu, Apr 09, 2026 at 08:37:53AM -0700, Linus Torvalds wrote:
> On Thu, 9 Apr 2026 at 05:24, Will Deacon <will@kernel.org> wrote:
> >
> > On Wed, Apr 08, 2026 at 08:52:32PM +0100, Russell King (Oracle) wrote:
> > > What's the status on the iommu fix? Is it merged into mainline yet?
> > > If it isn't already, that means net-next remains unbootable going
> > > into the merge window without manually carrying the fix locally.
> >
> > I'll pick it up for 7.0 in the iommu tree.
>
> ... and now it's in my tree.
Thanks, I see you merged it prior to the net tree, which should mean
the fix finds its way into net-next! Yay! Double thanks for that!
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
^ permalink raw reply
* Re: [PATCH net-next v40 0/8] eea: Add basic driver framework for Alibaba Elastic Ethernet Adaptor
From: Jakub Kicinski @ 2026-04-09 16:16 UTC (permalink / raw)
To: Xuan Zhuo
Cc: netdev, Andrew Lunn, David S. Miller, Eric Dumazet, Paolo Abeni,
Wen Gu, Philo Lu, Vadim Fedorenko, Dong Yibo, Jes Sorensen,
Heiner Kallweit, Dust Li
In-Reply-To: <20260409122130.129416-1-xuanzhuo@linux.alibaba.com>
On Thu, 9 Apr 2026 20:21:22 +0800 Xuan Zhuo wrote:
> Subject: [PATCH net-next v40 0/8] eea: Add basic driver framework for Alibaba Elastic Ethernet Adaptor
I asked you to post not more than 2 versions a week:
https://lore.kernel.org/all/20260316195252.7fa24729@kernel.org/
This is the 5th posting since last Saturday.
I don't want to see any more posting of this driver until after the
merge window. If you violate a direct ask like that 1 more time you
will get temporarily banned.
And of course this posting is getting discarded.
--
pw-bot: defer
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox