* [PATCH net] net: sched: fix warn on htb offloaded class creation
@ 2023-10-26 15:36 Paolo Abeni
2023-10-27 13:57 ` Maxim Mikityanskiy
0 siblings, 1 reply; 11+ messages in thread
From: Paolo Abeni @ 2023-10-26 15:36 UTC (permalink / raw)
To: netdev
Cc: Jamal Hadi Salim, Cong Wang, Jiri Pirko, David S. Miller,
Eric Dumazet, Jakub Kicinski, Tariq Toukan, Maxim Mikityanskiy
The following commands:
tc qdisc add dev eth1 handle 2: root htb offload
tc class add dev eth1 parent 2: classid 2:1 htb rate 5mbit burst 15k
yeld to a WARN in the HTB qdisc:
WARNING: CPU: 2 PID: 1583 at net/sched/sch_htb.c:1959
CPU: 2 PID: 1583 Comm: tc Kdump: loaded 6.6.0-rc2.mptcp_7895773e5235+ #59
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc37 04/01/2014
RIP: 0010:htb_change_class+0x25c4/0x2e30 [sch_htb]
Code: 24 58 48 b8 00 00 00 00 00 fc ff df 48 89 ca 48 c1 ea 03 80 3c 02 00 0f 85 92 01 00 00 49 89 8c 24 b0 01 00 00 e9 77 fc ff ff <0f> 0b e9 15 ec ff ff 80 3d f8 35 00 00 00 0f 85 d4 f9 ff ff ba 32
RSP: 0018:ffffc900015df240 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff88811b4ca000 RCX: ffff88811db42800
RDX: 1ffff11023b68502 RSI: ffffffffaf2e6a00 RDI: ffff88811db42810
RBP: ffff88811db45000 R08: 0000000000000001 R09: fffffbfff664bbc9
R10: ffffffffb325de4f R11: ffffffffb2d33748 R12: 0000000000000000
R13: ffff88811db43000 R14: ffff88811b4caaac R15: ffff8881252c0030
FS: 00007f6c1f126740(0000) GS:ffff88815aa00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055dca8e5b4a8 CR3: 000000011bc7a006 CR4: 0000000000370ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
tc_ctl_tclass+0x394/0xeb0
rtnetlink_rcv_msg+0x2f5/0xaa0
netlink_rcv_skb+0x12e/0x3a0
netlink_unicast+0x421/0x730
netlink_sendmsg+0x79e/0xc60
____sys_sendmsg+0x95a/0xc20
___sys_sendmsg+0xee/0x170
__sys_sendmsg+0xc6/0x170
do_syscall_64+0x58/0x80
entry_SYSCALL_64_after_hwframe+0x6e/0xd8
The first command creates per TX queue pfifo qdiscs in
tc_modify_qdisc() -> htb_init() and grafts the pfifo to each dev_queue
via tc_modify_qdisc() -> qdisc_graft() -> htb_attach().
When the command completes, the qdisc_sleeping for each dev_queue is a
pfifo one. The next class creation will trigger the reported splat.
Address the issue taking care of old non-builtin qdisc in
htb_change_class().
Fixes: d03b195b5aa0 ("sch_htb: Hierarchical QoS hardware offload")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
net/sched/sch_htb.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index 0d947414e616..dc682bd542b4 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -1955,8 +1955,7 @@ static int htb_change_class(struct Qdisc *sch, u32 classid,
qdisc_refcount_inc(new_q);
}
old_q = htb_graft_helper(dev_queue, new_q);
- /* No qdisc_put needed. */
- WARN_ON(!(old_q->flags & TCQ_F_BUILTIN));
+ qdisc_put(old_q);
}
sch_tree_lock(sch);
if (parent && !parent->level) {
--
2.41.0
^ permalink raw reply related [flat|nested] 11+ messages in thread* Re: [PATCH net] net: sched: fix warn on htb offloaded class creation 2023-10-26 15:36 [PATCH net] net: sched: fix warn on htb offloaded class creation Paolo Abeni @ 2023-10-27 13:57 ` Maxim Mikityanskiy 2023-10-29 12:01 ` Gal Pressman 2023-10-31 9:11 ` Paolo Abeni 0 siblings, 2 replies; 11+ messages in thread From: Maxim Mikityanskiy @ 2023-10-27 13:57 UTC (permalink / raw) To: Paolo Abeni Cc: netdev, Jamal Hadi Salim, Cong Wang, Jiri Pirko, David S. Miller, Eric Dumazet, Jakub Kicinski, Tariq Toukan, Gal Pressman, Saeed Mahameed I believe this is not the right fix. On Thu, 26 Oct 2023 at 17:36:48 +0200, Paolo Abeni wrote: > The following commands: > > tc qdisc add dev eth1 handle 2: root htb offload > tc class add dev eth1 parent 2: classid 2:1 htb rate 5mbit burst 15k > > yeld to a WARN in the HTB qdisc: Something is off here. These are literally the most basic commands one could invoke with HTB offload, I'm sure they worked. Is it something that broke recently? Tariq/Gal/Saeed, could you check them on a Mellanox NIC? > > WARNING: CPU: 2 PID: 1583 at net/sched/sch_htb.c:1959 > CPU: 2 PID: 1583 Comm: tc Kdump: loaded 6.6.0-rc2.mptcp_7895773e5235+ #59 > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc37 04/01/2014 > RIP: 0010:htb_change_class+0x25c4/0x2e30 [sch_htb] > Code: 24 58 48 b8 00 00 00 00 00 fc ff df 48 89 ca 48 c1 ea 03 80 3c 02 00 0f 85 92 01 00 00 49 89 8c 24 b0 01 00 00 e9 77 fc ff ff <0f> 0b e9 15 ec ff ff 80 3d f8 35 00 00 00 0f 85 d4 f9 ff ff ba 32 > RSP: 0018:ffffc900015df240 EFLAGS: 00010246 > RAX: 0000000000000000 RBX: ffff88811b4ca000 RCX: ffff88811db42800 > RDX: 1ffff11023b68502 RSI: ffffffffaf2e6a00 RDI: ffff88811db42810 > RBP: ffff88811db45000 R08: 0000000000000001 R09: fffffbfff664bbc9 > R10: ffffffffb325de4f R11: ffffffffb2d33748 R12: 0000000000000000 > R13: ffff88811db43000 R14: ffff88811b4caaac R15: ffff8881252c0030 > FS: 00007f6c1f126740(0000) GS:ffff88815aa00000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 000055dca8e5b4a8 CR3: 000000011bc7a006 CR4: 0000000000370ee0 > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > Call Trace: > <TASK> > tc_ctl_tclass+0x394/0xeb0 > rtnetlink_rcv_msg+0x2f5/0xaa0 > netlink_rcv_skb+0x12e/0x3a0 > netlink_unicast+0x421/0x730 > netlink_sendmsg+0x79e/0xc60 > ____sys_sendmsg+0x95a/0xc20 > ___sys_sendmsg+0xee/0x170 > __sys_sendmsg+0xc6/0x170 > do_syscall_64+0x58/0x80 > entry_SYSCALL_64_after_hwframe+0x6e/0xd8 > > The first command creates per TX queue pfifo qdiscs in > tc_modify_qdisc() -> htb_init() and grafts the pfifo to each dev_queue > via tc_modify_qdisc() -> qdisc_graft() -> htb_attach(). Not exactly; it grafts pfifo to direct queues only. htb_attach_offload explicitly grafts noop to all the remaining queues. > When the command completes, the qdisc_sleeping for each dev_queue is a > pfifo one. The next class creation will trigger the reported splat. > > Address the issue taking care of old non-builtin qdisc in > htb_change_class(). > > Fixes: d03b195b5aa0 ("sch_htb: Hierarchical QoS hardware offload") > Signed-off-by: Paolo Abeni <pabeni@redhat.com> > --- > net/sched/sch_htb.c | 3 +-- > 1 file changed, 1 insertion(+), 2 deletions(-) > > diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c > index 0d947414e616..dc682bd542b4 100644 > --- a/net/sched/sch_htb.c > +++ b/net/sched/sch_htb.c > @@ -1955,8 +1955,7 @@ static int htb_change_class(struct Qdisc *sch, u32 classid, > qdisc_refcount_inc(new_q); > } > old_q = htb_graft_helper(dev_queue, new_q); > - /* No qdisc_put needed. */ > - WARN_ON(!(old_q->flags & TCQ_F_BUILTIN)); > + qdisc_put(old_q); We can get here after one of two cases above: 1. A new queue is allocated with TC_HTB_LEAF_ALLOC_QUEUE. It's supposed to have a noop qdisc by default (after htb_attach_offload). 2. An existing leaf is converted to an inner node with TC_HTB_LEAF_TO_INNER, its queue is reused. htb_graft_helper(dev_queue, NULL) makes sure that the qdisc is noop, not pfifo anymore. This WARN_ON is here for a reason. If it triggered, it indicates a bug somewhere else, because we don't expect anything but noop_qdisc after the `if` above. Silencing the warning doesn't fix the bug and may lead to damaging the consistency of the data structures. > } > sch_tree_lock(sch); > if (parent && !parent->level) { > -- > 2.41.0 > ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH net] net: sched: fix warn on htb offloaded class creation 2023-10-27 13:57 ` Maxim Mikityanskiy @ 2023-10-29 12:01 ` Gal Pressman 2023-10-31 9:11 ` Paolo Abeni 1 sibling, 0 replies; 11+ messages in thread From: Gal Pressman @ 2023-10-29 12:01 UTC (permalink / raw) To: Maxim Mikityanskiy, Paolo Abeni Cc: netdev, Jamal Hadi Salim, Cong Wang, Jiri Pirko, David S. Miller, Eric Dumazet, Jakub Kicinski, Tariq Toukan, Saeed Mahameed Hey Maxim! On 27/10/2023 16:57, Maxim Mikityanskiy wrote: > I believe this is not the right fix. > > On Thu, 26 Oct 2023 at 17:36:48 +0200, Paolo Abeni wrote: >> The following commands: >> >> tc qdisc add dev eth1 handle 2: root htb offload >> tc class add dev eth1 parent 2: classid 2:1 htb rate 5mbit burst 15k >> >> yeld to a WARN in the HTB qdisc: > > Something is off here. These are literally the most basic commands one > could invoke with HTB offload, I'm sure they worked. Is it something > that broke recently? Tariq/Gal/Saeed, could you check them on a Mellanox > NIC? This is working on our NICs, we do not hit that warning. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH net] net: sched: fix warn on htb offloaded class creation 2023-10-27 13:57 ` Maxim Mikityanskiy 2023-10-29 12:01 ` Gal Pressman @ 2023-10-31 9:11 ` Paolo Abeni 2023-10-31 14:40 ` Maxim Mikityanskiy 1 sibling, 1 reply; 11+ messages in thread From: Paolo Abeni @ 2023-10-31 9:11 UTC (permalink / raw) To: Maxim Mikityanskiy Cc: netdev, Jamal Hadi Salim, Cong Wang, Jiri Pirko, David S. Miller, Eric Dumazet, Jakub Kicinski, Tariq Toukan, Gal Pressman, Saeed Mahameed Hi, I'm sorry for the late reply. On Fri, 2023-10-27 at 16:57 +0300, Maxim Mikityanskiy wrote: > I believe this is not the right fix. > > On Thu, 26 Oct 2023 at 17:36:48 +0200, Paolo Abeni wrote: > > The following commands: > > > > tc qdisc add dev eth1 handle 2: root htb offload > > tc class add dev eth1 parent 2: classid 2:1 htb rate 5mbit burst 15k > > > > yeld to a WARN in the HTB qdisc: > > Something is off here. These are literally the most basic commands one > could invoke with HTB offload, I'm sure they worked. Is it something > that broke recently? Tariq/Gal/Saeed, could you check them on a Mellanox > NIC? > > > > > WARNING: CPU: 2 PID: 1583 at net/sched/sch_htb.c:1959 > > CPU: 2 PID: 1583 Comm: tc Kdump: loaded 6.6.0-rc2.mptcp_7895773e5235+ #59 > > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc37 04/01/2014 > > RIP: 0010:htb_change_class+0x25c4/0x2e30 [sch_htb] > > Code: 24 58 48 b8 00 00 00 00 00 fc ff df 48 89 ca 48 c1 ea 03 80 3c 02 00 0f 85 92 01 00 00 49 89 8c 24 b0 01 00 00 e9 77 fc ff ff <0f> 0b e9 15 ec ff ff 80 3d f8 35 00 00 00 0f 85 d4 f9 ff ff ba 32 > > RSP: 0018:ffffc900015df240 EFLAGS: 00010246 > > RAX: 0000000000000000 RBX: ffff88811b4ca000 RCX: ffff88811db42800 > > RDX: 1ffff11023b68502 RSI: ffffffffaf2e6a00 RDI: ffff88811db42810 > > RBP: ffff88811db45000 R08: 0000000000000001 R09: fffffbfff664bbc9 > > R10: ffffffffb325de4f R11: ffffffffb2d33748 R12: 0000000000000000 > > R13: ffff88811db43000 R14: ffff88811b4caaac R15: ffff8881252c0030 > > FS: 00007f6c1f126740(0000) GS:ffff88815aa00000(0000) knlGS:0000000000000000 > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > CR2: 000055dca8e5b4a8 CR3: 000000011bc7a006 CR4: 0000000000370ee0 > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > > Call Trace: > > <TASK> > > tc_ctl_tclass+0x394/0xeb0 > > rtnetlink_rcv_msg+0x2f5/0xaa0 > > netlink_rcv_skb+0x12e/0x3a0 > > netlink_unicast+0x421/0x730 > > netlink_sendmsg+0x79e/0xc60 > > ____sys_sendmsg+0x95a/0xc20 > > ___sys_sendmsg+0xee/0x170 > > __sys_sendmsg+0xc6/0x170 > > do_syscall_64+0x58/0x80 > > entry_SYSCALL_64_after_hwframe+0x6e/0xd8 > > > > The first command creates per TX queue pfifo qdiscs in > > tc_modify_qdisc() -> htb_init() and grafts the pfifo to each dev_queue > > via tc_modify_qdisc() -> qdisc_graft() -> htb_attach(). > > Not exactly; it grafts pfifo to direct queues only. htb_attach_offload > explicitly grafts noop to all the remaining queues. num_direct_qdiscs == real_num_tx_queues: https://elixir.bootlin.com/linux/latest/source/net/sched/sch_htb.c#L1101 pfifo will be configured on all the TX queues available at TC creation time, right? Lacking a mlx card with offload support I hack basic htb support in netdevsim and I observe the splat on top of such device. I can as well share the netdevsim patch - it will need some clean-up. > > > When the command completes, the qdisc_sleeping for each dev_queue is a > > pfifo one. The next class creation will trigger the reported splat. > > > > Address the issue taking care of old non-builtin qdisc in > > htb_change_class(). > > > > Fixes: d03b195b5aa0 ("sch_htb: Hierarchical QoS hardware offload") > > Signed-off-by: Paolo Abeni <pabeni@redhat.com> > > --- > > net/sched/sch_htb.c | 3 +-- > > 1 file changed, 1 insertion(+), 2 deletions(-) > > > > diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c > > index 0d947414e616..dc682bd542b4 100644 > > --- a/net/sched/sch_htb.c > > +++ b/net/sched/sch_htb.c > > @@ -1955,8 +1955,7 @@ static int htb_change_class(struct Qdisc *sch, u32 classid, > > qdisc_refcount_inc(new_q); > > } > > old_q = htb_graft_helper(dev_queue, new_q); > > - /* No qdisc_put needed. */ > > - WARN_ON(!(old_q->flags & TCQ_F_BUILTIN)); > > + qdisc_put(old_q); > > We can get here after one of two cases above: > > 1. A new queue is allocated with TC_HTB_LEAF_ALLOC_QUEUE. It's supposed > to have a noop qdisc by default (after htb_attach_offload). So most likely the trivial netdevsim implementation I used was not good enough. Which constrains should respect TC_HTB_LEAF_ALLOC_QUEUE WRT the returned qid value? should it in the (real_num_tx_queues, num_tx_queues] range? Can HTB actually configure H/W shaping on real_num_tx_queues? I find no clear documentation WRT the above. Thanks! Paolo ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH net] net: sched: fix warn on htb offloaded class creation 2023-10-31 9:11 ` Paolo Abeni @ 2023-10-31 14:40 ` Maxim Mikityanskiy 2023-11-09 21:54 ` Chittim, Madhu 0 siblings, 1 reply; 11+ messages in thread From: Maxim Mikityanskiy @ 2023-10-31 14:40 UTC (permalink / raw) To: Paolo Abeni Cc: netdev, Jamal Hadi Salim, Cong Wang, Jiri Pirko, David S. Miller, Eric Dumazet, Jakub Kicinski, Tariq Toukan, Gal Pressman, Saeed Mahameed On Tue, 31 Oct 2023 at 10:11:14 +0100, Paolo Abeni wrote: > Hi, > > I'm sorry for the late reply. > > On Fri, 2023-10-27 at 16:57 +0300, Maxim Mikityanskiy wrote: > > I believe this is not the right fix. > > > > On Thu, 26 Oct 2023 at 17:36:48 +0200, Paolo Abeni wrote: > > > The following commands: > > > > > > tc qdisc add dev eth1 handle 2: root htb offload > > > tc class add dev eth1 parent 2: classid 2:1 htb rate 5mbit burst 15k > > > > > > yeld to a WARN in the HTB qdisc: > > > > Something is off here. These are literally the most basic commands one > > could invoke with HTB offload, I'm sure they worked. Is it something > > that broke recently? Tariq/Gal/Saeed, could you check them on a Mellanox > > NIC? > > > > > > > > WARNING: CPU: 2 PID: 1583 at net/sched/sch_htb.c:1959 > > > CPU: 2 PID: 1583 Comm: tc Kdump: loaded 6.6.0-rc2.mptcp_7895773e5235+ #59 > > > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc37 04/01/2014 > > > RIP: 0010:htb_change_class+0x25c4/0x2e30 [sch_htb] > > > Code: 24 58 48 b8 00 00 00 00 00 fc ff df 48 89 ca 48 c1 ea 03 80 3c 02 00 0f 85 92 01 00 00 49 89 8c 24 b0 01 00 00 e9 77 fc ff ff <0f> 0b e9 15 ec ff ff 80 3d f8 35 00 00 00 0f 85 d4 f9 ff ff ba 32 > > > RSP: 0018:ffffc900015df240 EFLAGS: 00010246 > > > RAX: 0000000000000000 RBX: ffff88811b4ca000 RCX: ffff88811db42800 > > > RDX: 1ffff11023b68502 RSI: ffffffffaf2e6a00 RDI: ffff88811db42810 > > > RBP: ffff88811db45000 R08: 0000000000000001 R09: fffffbfff664bbc9 > > > R10: ffffffffb325de4f R11: ffffffffb2d33748 R12: 0000000000000000 > > > R13: ffff88811db43000 R14: ffff88811b4caaac R15: ffff8881252c0030 > > > FS: 00007f6c1f126740(0000) GS:ffff88815aa00000(0000) knlGS:0000000000000000 > > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > > CR2: 000055dca8e5b4a8 CR3: 000000011bc7a006 CR4: 0000000000370ee0 > > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > > > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > > > Call Trace: > > > <TASK> > > > tc_ctl_tclass+0x394/0xeb0 > > > rtnetlink_rcv_msg+0x2f5/0xaa0 > > > netlink_rcv_skb+0x12e/0x3a0 > > > netlink_unicast+0x421/0x730 > > > netlink_sendmsg+0x79e/0xc60 > > > ____sys_sendmsg+0x95a/0xc20 > > > ___sys_sendmsg+0xee/0x170 > > > __sys_sendmsg+0xc6/0x170 > > > do_syscall_64+0x58/0x80 > > > entry_SYSCALL_64_after_hwframe+0x6e/0xd8 > > > > > > The first command creates per TX queue pfifo qdiscs in > > > tc_modify_qdisc() -> htb_init() and grafts the pfifo to each dev_queue > > > via tc_modify_qdisc() -> qdisc_graft() -> htb_attach(). > > > > Not exactly; it grafts pfifo to direct queues only. htb_attach_offload > > explicitly grafts noop to all the remaining queues. > > num_direct_qdiscs == real_num_tx_queues: > > https://elixir.bootlin.com/linux/latest/source/net/sched/sch_htb.c#L1101 > > pfifo will be configured on all the TX queues available at TC creation > time, right? Yes, all real TX queues will be used as direct queues (for unclassified traffic). num_tx_queues should be somewhat bigger than real_num_tx_queues - it should reserve a queue per potential leaf class. pfifo is configured on direct queues, and the reserved queues have noop. Then, when a new leaf class is added (TC_HTB_LEAF_ALLOC_QUEUE), the driver allocates a new queue and increases real_num_tx_queues. HTB assigns a pfifo qdisc to the newly allocated queue. Changing the hierarchy (deleting a node or converting an inner node to a leaf) may reorder the classful queues (with indexes >= the initial real_num_tx_queues), so that there are no gaps. > Lacking a mlx card with offload support I hack basic htb support in > netdevsim and I observe the splat on top of such device. I can as well > share the netdevsim patch - it will need some clean-up. I will be happy to review the netdevsim patch, but I don't promise prompt responsiveness. > > > > > When the command completes, the qdisc_sleeping for each dev_queue is a > > > pfifo one. The next class creation will trigger the reported splat. > > > > > > Address the issue taking care of old non-builtin qdisc in > > > htb_change_class(). > > > > > > Fixes: d03b195b5aa0 ("sch_htb: Hierarchical QoS hardware offload") > > > Signed-off-by: Paolo Abeni <pabeni@redhat.com> > > > --- > > > net/sched/sch_htb.c | 3 +-- > > > 1 file changed, 1 insertion(+), 2 deletions(-) > > > > > > diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c > > > index 0d947414e616..dc682bd542b4 100644 > > > --- a/net/sched/sch_htb.c > > > +++ b/net/sched/sch_htb.c > > > @@ -1955,8 +1955,7 @@ static int htb_change_class(struct Qdisc *sch, u32 classid, > > > qdisc_refcount_inc(new_q); > > > } > > > old_q = htb_graft_helper(dev_queue, new_q); > > > - /* No qdisc_put needed. */ > > > - WARN_ON(!(old_q->flags & TCQ_F_BUILTIN)); > > > + qdisc_put(old_q); > > > > We can get here after one of two cases above: > > > > 1. A new queue is allocated with TC_HTB_LEAF_ALLOC_QUEUE. It's supposed > > to have a noop qdisc by default (after htb_attach_offload). > > So most likely the trivial netdevsim implementation I used was not good > enough. > > Which constrains should respect TC_HTB_LEAF_ALLOC_QUEUE WRT the > returned qid value? should it in the (real_num_tx_queues, > num_tx_queues] range? Let's say N is real_num_tx_queues as it was at the moment of attaching. HTB queues should be allocated from [N, num_tx_queues), and real_num_tx_queues should be increased accordingly. It should not return queues number [0, N). Deletions should fill the gaps: if queue X is being deleted, N <= X < real_num_tx_queues - 1, then the gap should be filled with queue number real_num_tx_queues - 1 by swapping the queues (real_num_tx_queues will be decreased by 1 accordingly). Some care also needs to be taken when converting inner-to-leaf (TC_HTB_LEAF_DEL_LAST) and leaf-to-inner (it's better to get insights from [1], there are also some comments). > Can HTB actually configure H/W shaping on > real_num_tx_queues? It will be on real_num_tx_queues, but after it's increased to add new HTB queues. The original queues [0, N) are used for direct traffic, same as the non-offloaded HTB's direct_queue (it's not shaped). > I find no clear documentation WRT the above. I'm sorry for the lack of documentation. All I have is the commit message [2] and a netdev talk [3]. Maybe the slides could be of some use... I hope the above explanation clarifies something, and feel free to ask further questions, I'll be glad to explain what hasn't been documented properly. [1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/mellanox/mlx5/core/en/htb.c?id=5a6a09e97199d6600d31383055f9d43fbbcbe86f [2]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d03b195b5aa015f6c11988b86a3625f8d5dbac52 [3]: https://netdevconf.info/0x14/session.html?talk-hierarchical-QoS-hardware-offload > Thanks! > > Paolo > ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH net] net: sched: fix warn on htb offloaded class creation 2023-10-31 14:40 ` Maxim Mikityanskiy @ 2023-11-09 21:54 ` Chittim, Madhu 2023-11-10 11:07 ` Maxim Mikityanskiy 0 siblings, 1 reply; 11+ messages in thread From: Chittim, Madhu @ 2023-11-09 21:54 UTC (permalink / raw) To: Maxim Mikityanskiy, Paolo Abeni Cc: netdev, Jamal Hadi Salim, Cong Wang, Jiri Pirko, David S. Miller, Eric Dumazet, Jakub Kicinski, Tariq Toukan, Gal Pressman, Saeed Mahameed, xuejun.zhang, sridhar.samudrala On 10/31/2023 7:40 AM, Maxim Mikityanskiy wrote: > On Tue, 31 Oct 2023 at 10:11:14 +0100, Paolo Abeni wrote: >> Hi, >> >> I'm sorry for the late reply. >> >> On Fri, 2023-10-27 at 16:57 +0300, Maxim Mikityanskiy wrote: >>> I believe this is not the right fix. >>> >>> On Thu, 26 Oct 2023 at 17:36:48 +0200, Paolo Abeni wrote: >>>> The following commands: >>>> >>>> tc qdisc add dev eth1 handle 2: root htb offload >>>> tc class add dev eth1 parent 2: classid 2:1 htb rate 5mbit burst 15k >>>> >>>> yeld to a WARN in the HTB qdisc: >>> >>> Something is off here. These are literally the most basic commands one >>> could invoke with HTB offload, I'm sure they worked. Is it something >>> that broke recently? Tariq/Gal/Saeed, could you check them on a Mellanox >>> NIC? >>> >>>> >>>> WARNING: CPU: 2 PID: 1583 at net/sched/sch_htb.c:1959 >>>> CPU: 2 PID: 1583 Comm: tc Kdump: loaded 6.6.0-rc2.mptcp_7895773e5235+ #59 >>>> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc37 04/01/2014 >>>> RIP: 0010:htb_change_class+0x25c4/0x2e30 [sch_htb] >>>> Code: 24 58 48 b8 00 00 00 00 00 fc ff df 48 89 ca 48 c1 ea 03 80 3c 02 00 0f 85 92 01 00 00 49 89 8c 24 b0 01 00 00 e9 77 fc ff ff <0f> 0b e9 15 ec ff ff 80 3d f8 35 00 00 00 0f 85 d4 f9 ff ff ba 32 >>>> RSP: 0018:ffffc900015df240 EFLAGS: 00010246 >>>> RAX: 0000000000000000 RBX: ffff88811b4ca000 RCX: ffff88811db42800 >>>> RDX: 1ffff11023b68502 RSI: ffffffffaf2e6a00 RDI: ffff88811db42810 >>>> RBP: ffff88811db45000 R08: 0000000000000001 R09: fffffbfff664bbc9 >>>> R10: ffffffffb325de4f R11: ffffffffb2d33748 R12: 0000000000000000 >>>> R13: ffff88811db43000 R14: ffff88811b4caaac R15: ffff8881252c0030 >>>> FS: 00007f6c1f126740(0000) GS:ffff88815aa00000(0000) knlGS:0000000000000000 >>>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>>> CR2: 000055dca8e5b4a8 CR3: 000000011bc7a006 CR4: 0000000000370ee0 >>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >>>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 >>>> Call Trace: >>>> <TASK> >>>> tc_ctl_tclass+0x394/0xeb0 >>>> rtnetlink_rcv_msg+0x2f5/0xaa0 >>>> netlink_rcv_skb+0x12e/0x3a0 >>>> netlink_unicast+0x421/0x730 >>>> netlink_sendmsg+0x79e/0xc60 >>>> ____sys_sendmsg+0x95a/0xc20 >>>> ___sys_sendmsg+0xee/0x170 >>>> __sys_sendmsg+0xc6/0x170 >>>> do_syscall_64+0x58/0x80 >>>> entry_SYSCALL_64_after_hwframe+0x6e/0xd8 >>>> >>>> The first command creates per TX queue pfifo qdiscs in >>>> tc_modify_qdisc() -> htb_init() and grafts the pfifo to each dev_queue >>>> via tc_modify_qdisc() -> qdisc_graft() -> htb_attach(). >>> >>> Not exactly; it grafts pfifo to direct queues only. htb_attach_offload >>> explicitly grafts noop to all the remaining queues. >> >> num_direct_qdiscs == real_num_tx_queues: >> >> https://elixir.bootlin.com/linux/latest/source/net/sched/sch_htb.c#L1101 >> >> pfifo will be configured on all the TX queues available at TC creation >> time, right? > > Yes, all real TX queues will be used as direct queues (for unclassified > traffic). num_tx_queues should be somewhat bigger than > real_num_tx_queues - it should reserve a queue per potential leaf class. > > pfifo is configured on direct queues, and the reserved queues have noop. > Then, when a new leaf class is added (TC_HTB_LEAF_ALLOC_QUEUE), the > driver allocates a new queue and increases real_num_tx_queues. HTB > assigns a pfifo qdisc to the newly allocated queue. > > Changing the hierarchy (deleting a node or converting an inner node to a > leaf) may reorder the classful queues (with indexes >= the initial > real_num_tx_queues), so that there are no gaps. > >> Lacking a mlx card with offload support I hack basic htb support in >> netdevsim and I observe the splat on top of such device. I can as well >> share the netdevsim patch - it will need some clean-up. > > I will be happy to review the netdevsim patch, but I don't promise > prompt responsiveness. > >>> >>>> When the command completes, the qdisc_sleeping for each dev_queue is a >>>> pfifo one. The next class creation will trigger the reported splat. >>>> >>>> Address the issue taking care of old non-builtin qdisc in >>>> htb_change_class(). >>>> >>>> Fixes: d03b195b5aa0 ("sch_htb: Hierarchical QoS hardware offload") >>>> Signed-off-by: Paolo Abeni <pabeni@redhat.com> >>>> --- >>>> net/sched/sch_htb.c | 3 +-- >>>> 1 file changed, 1 insertion(+), 2 deletions(-) >>>> >>>> diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c >>>> index 0d947414e616..dc682bd542b4 100644 >>>> --- a/net/sched/sch_htb.c >>>> +++ b/net/sched/sch_htb.c >>>> @@ -1955,8 +1955,7 @@ static int htb_change_class(struct Qdisc *sch, u32 classid, >>>> qdisc_refcount_inc(new_q); >>>> } >>>> old_q = htb_graft_helper(dev_queue, new_q); >>>> - /* No qdisc_put needed. */ >>>> - WARN_ON(!(old_q->flags & TCQ_F_BUILTIN)); >>>> + qdisc_put(old_q); >>> >>> We can get here after one of two cases above: >>> >>> 1. A new queue is allocated with TC_HTB_LEAF_ALLOC_QUEUE. It's supposed >>> to have a noop qdisc by default (after htb_attach_offload). >> >> So most likely the trivial netdevsim implementation I used was not good >> enough. >> >> Which constrains should respect TC_HTB_LEAF_ALLOC_QUEUE WRT the >> returned qid value? should it in the (real_num_tx_queues, >> num_tx_queues] range? > > Let's say N is real_num_tx_queues as it was at the moment of attaching. > HTB queues should be allocated from [N, num_tx_queues), and > real_num_tx_queues should be increased accordingly. It should not return > queues number [0, N). > > Deletions should fill the gaps: if queue X is being deleted, N <= X < > real_num_tx_queues - 1, then the gap should be filled with queue number > real_num_tx_queues - 1 by swapping the queues (real_num_tx_queues will > be decreased by 1 accordingly). Some care also needs to be taken when > converting inner-to-leaf (TC_HTB_LEAF_DEL_LAST) and leaf-to-inner (it's > better to get insights from [1], there are also some comments). > >> Can HTB actually configure H/W shaping on >> real_num_tx_queues? > > It will be on real_num_tx_queues, but after it's increased to add new > HTB queues. The original queues [0, N) are used for direct traffic, same > as the non-offloaded HTB's direct_queue (it's not shaped). > >> I find no clear documentation WRT the above. > > I'm sorry for the lack of documentation. All I have is the commit > message [2] and a netdev talk [3]. Maybe the slides could be of some > use... > > I hope the above explanation clarifies something, and feel free to ask > further questions, I'll be glad to explain what hasn't been documented > properly. We would like to enable Tx rate limiting using htb offload on all the existing queues. We are able to do with the following set of commands with Paolo's patch tc qdisc add dev enp175s0v0 handle 1: root htb offload tc class add dev enp175s0v0 parent 1: classid 1:1 htb rate 1000mbit ceil 2000mbit burst 100k where classid 1:1 is tx queue0 tx_minrate is rate 1000mbps tx_maxrate is ceil 2000mbps In order to not break your implementation could bring in if condition instead WARN_ON, something like below if (!(old_q->flags & TCQ_F_BUILTIN)) qdisc_put(old_q); Would this work for you, please advise. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH net] net: sched: fix warn on htb offloaded class creation 2023-11-09 21:54 ` Chittim, Madhu @ 2023-11-10 11:07 ` Maxim Mikityanskiy 2023-11-12 8:48 ` Paolo Abeni 0 siblings, 1 reply; 11+ messages in thread From: Maxim Mikityanskiy @ 2023-11-10 11:07 UTC (permalink / raw) To: Chittim, Madhu Cc: Paolo Abeni, netdev, Jamal Hadi Salim, Cong Wang, Jiri Pirko, David S. Miller, Eric Dumazet, Jakub Kicinski, Tariq Toukan, Gal Pressman, Saeed Mahameed, xuejun.zhang, sridhar.samudrala On Thu, 09 Nov 2023 at 13:54:17 -0800, Chittim, Madhu wrote: > > > On 10/31/2023 7:40 AM, Maxim Mikityanskiy wrote: > > On Tue, 31 Oct 2023 at 10:11:14 +0100, Paolo Abeni wrote: > > > Hi, > > > > > > I'm sorry for the late reply. > > > > > > On Fri, 2023-10-27 at 16:57 +0300, Maxim Mikityanskiy wrote: > > > > I believe this is not the right fix. > > > > > > > > On Thu, 26 Oct 2023 at 17:36:48 +0200, Paolo Abeni wrote: > > > > > The following commands: > > > > > > > > > > tc qdisc add dev eth1 handle 2: root htb offload > > > > > tc class add dev eth1 parent 2: classid 2:1 htb rate 5mbit burst 15k > > > > > > > > > > yeld to a WARN in the HTB qdisc: > > > > > > > > Something is off here. These are literally the most basic commands one > > > > could invoke with HTB offload, I'm sure they worked. Is it something > > > > that broke recently? Tariq/Gal/Saeed, could you check them on a Mellanox > > > > NIC? > > > > > > > > > > > > > > WARNING: CPU: 2 PID: 1583 at net/sched/sch_htb.c:1959 > > > > > CPU: 2 PID: 1583 Comm: tc Kdump: loaded 6.6.0-rc2.mptcp_7895773e5235+ #59 > > > > > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc37 04/01/2014 > > > > > RIP: 0010:htb_change_class+0x25c4/0x2e30 [sch_htb] > > > > > Code: 24 58 48 b8 00 00 00 00 00 fc ff df 48 89 ca 48 c1 ea 03 80 3c 02 00 0f 85 92 01 00 00 49 89 8c 24 b0 01 00 00 e9 77 fc ff ff <0f> 0b e9 15 ec ff ff 80 3d f8 35 00 00 00 0f 85 d4 f9 ff ff ba 32 > > > > > RSP: 0018:ffffc900015df240 EFLAGS: 00010246 > > > > > RAX: 0000000000000000 RBX: ffff88811b4ca000 RCX: ffff88811db42800 > > > > > RDX: 1ffff11023b68502 RSI: ffffffffaf2e6a00 RDI: ffff88811db42810 > > > > > RBP: ffff88811db45000 R08: 0000000000000001 R09: fffffbfff664bbc9 > > > > > R10: ffffffffb325de4f R11: ffffffffb2d33748 R12: 0000000000000000 > > > > > R13: ffff88811db43000 R14: ffff88811b4caaac R15: ffff8881252c0030 > > > > > FS: 00007f6c1f126740(0000) GS:ffff88815aa00000(0000) knlGS:0000000000000000 > > > > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > > > > CR2: 000055dca8e5b4a8 CR3: 000000011bc7a006 CR4: 0000000000370ee0 > > > > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > > > > > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > > > > > Call Trace: > > > > > <TASK> > > > > > tc_ctl_tclass+0x394/0xeb0 > > > > > rtnetlink_rcv_msg+0x2f5/0xaa0 > > > > > netlink_rcv_skb+0x12e/0x3a0 > > > > > netlink_unicast+0x421/0x730 > > > > > netlink_sendmsg+0x79e/0xc60 > > > > > ____sys_sendmsg+0x95a/0xc20 > > > > > ___sys_sendmsg+0xee/0x170 > > > > > __sys_sendmsg+0xc6/0x170 > > > > > do_syscall_64+0x58/0x80 > > > > > entry_SYSCALL_64_after_hwframe+0x6e/0xd8 > > > > > > > > > > The first command creates per TX queue pfifo qdiscs in > > > > > tc_modify_qdisc() -> htb_init() and grafts the pfifo to each dev_queue > > > > > via tc_modify_qdisc() -> qdisc_graft() -> htb_attach(). > > > > > > > > Not exactly; it grafts pfifo to direct queues only. htb_attach_offload > > > > explicitly grafts noop to all the remaining queues. > > > > > > num_direct_qdiscs == real_num_tx_queues: > > > > > > https://elixir.bootlin.com/linux/latest/source/net/sched/sch_htb.c#L1101 > > > > > > pfifo will be configured on all the TX queues available at TC creation > > > time, right? > > > > Yes, all real TX queues will be used as direct queues (for unclassified > > traffic). num_tx_queues should be somewhat bigger than > > real_num_tx_queues - it should reserve a queue per potential leaf class. > > > > pfifo is configured on direct queues, and the reserved queues have noop. > > Then, when a new leaf class is added (TC_HTB_LEAF_ALLOC_QUEUE), the > > driver allocates a new queue and increases real_num_tx_queues. HTB > > assigns a pfifo qdisc to the newly allocated queue. > > > > Changing the hierarchy (deleting a node or converting an inner node to a > > leaf) may reorder the classful queues (with indexes >= the initial > > real_num_tx_queues), so that there are no gaps. > > > > > Lacking a mlx card with offload support I hack basic htb support in > > > netdevsim and I observe the splat on top of such device. I can as well > > > share the netdevsim patch - it will need some clean-up. > > > > I will be happy to review the netdevsim patch, but I don't promise > > prompt responsiveness. > > > > > > > > > > > When the command completes, the qdisc_sleeping for each dev_queue is a > > > > > pfifo one. The next class creation will trigger the reported splat. > > > > > > > > > > Address the issue taking care of old non-builtin qdisc in > > > > > htb_change_class(). > > > > > > > > > > Fixes: d03b195b5aa0 ("sch_htb: Hierarchical QoS hardware offload") > > > > > Signed-off-by: Paolo Abeni <pabeni@redhat.com> > > > > > --- > > > > > net/sched/sch_htb.c | 3 +-- > > > > > 1 file changed, 1 insertion(+), 2 deletions(-) > > > > > > > > > > diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c > > > > > index 0d947414e616..dc682bd542b4 100644 > > > > > --- a/net/sched/sch_htb.c > > > > > +++ b/net/sched/sch_htb.c > > > > > @@ -1955,8 +1955,7 @@ static int htb_change_class(struct Qdisc *sch, u32 classid, > > > > > qdisc_refcount_inc(new_q); > > > > > } > > > > > old_q = htb_graft_helper(dev_queue, new_q); > > > > > - /* No qdisc_put needed. */ > > > > > - WARN_ON(!(old_q->flags & TCQ_F_BUILTIN)); > > > > > + qdisc_put(old_q); > > > > > > > > We can get here after one of two cases above: > > > > > > > > 1. A new queue is allocated with TC_HTB_LEAF_ALLOC_QUEUE. It's supposed > > > > to have a noop qdisc by default (after htb_attach_offload). > > > > > > So most likely the trivial netdevsim implementation I used was not good > > > enough. > > > > > > Which constrains should respect TC_HTB_LEAF_ALLOC_QUEUE WRT the > > > returned qid value? should it in the (real_num_tx_queues, > > > num_tx_queues] range? > > > > Let's say N is real_num_tx_queues as it was at the moment of attaching. > > HTB queues should be allocated from [N, num_tx_queues), and > > real_num_tx_queues should be increased accordingly. It should not return > > queues number [0, N). > > > > Deletions should fill the gaps: if queue X is being deleted, N <= X < > > real_num_tx_queues - 1, then the gap should be filled with queue number > > real_num_tx_queues - 1 by swapping the queues (real_num_tx_queues will > > be decreased by 1 accordingly). Some care also needs to be taken when > > converting inner-to-leaf (TC_HTB_LEAF_DEL_LAST) and leaf-to-inner (it's > > better to get insights from [1], there are also some comments). > > > > > Can HTB actually configure H/W shaping on > > > real_num_tx_queues? > > > > It will be on real_num_tx_queues, but after it's increased to add new > > HTB queues. The original queues [0, N) are used for direct traffic, same > > as the non-offloaded HTB's direct_queue (it's not shaped). > > > > > I find no clear documentation WRT the above. > > > > I'm sorry for the lack of documentation. All I have is the commit > > message [2] and a netdev talk [3]. Maybe the slides could be of some > > use... > > > > I hope the above explanation clarifies something, and feel free to ask > > further questions, I'll be glad to explain what hasn't been documented > > properly. > > We would like to enable Tx rate limiting using htb offload on all the > existing queues. I don't seem to understand how you see it possible with HTB. 1. Where would the unclassified traffic go? HTB uses a set of filters that steer certain traffic to certain classes. Everything that doesn't match the filter goes to the direct queue, which doesn't have shaping. In the non-offloaded HTB it's struct htb_sched::direct_queue. With HTB offload it's mapped to the standard set of queues. 2. How do you see the mapping of HTB classes to netdev queues if you use a fixed set of queues? In the current implementation of HTB offload, each new HTB class creates a new queue. If you are bound to using only N standard queues, how would the allocation work when you add the second, third, etc. HTB classes? > We are able to do with the following set of commands with > Paolo's patch > > tc qdisc add dev enp175s0v0 handle 1: root htb offload > tc class add dev enp175s0v0 parent 1: classid 1:1 htb rate 1000mbit ceil > 2000mbit burst 100k As long as you don't attach any filters, this set of commands is not supposed to shape anything. It should TX full-speed, you should be able to see this effect if you configure non-offloaded HTB. As long as there are no filters, and the default classid doesn't match your only class, all traffic will go to the direct queue. What is your goal? If you need shaping for the whole netdev, maybe HTB is not needed, and it's enough to attach a catchall filter with the police action? Or use /sys/class/net/eth0/queues/tx-*/tx_maxrate per-queue? Or tc mqprio mode channel, that allows to group existing queues and assign rate limits? > where > classid 1:1 is tx queue0 > tx_minrate is rate 1000mbps > tx_maxrate is ceil 2000mbps > > > In order to not break your implementation could bring in if condition > instead WARN_ON, something like below > if (!(old_q->flags & TCQ_F_BUILTIN)) > qdisc_put(old_q); > > Would this work for you, please advise. One of the reasons this WARN is there to prevent misuse of the API (the other is to catch bugs that lead to inconsistencies in the internal state). We can't just remove the WARN and start misusing the API. Your ideas require more major design changes, but before that I need to understand your approach to solving (1) and (2). ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH net] net: sched: fix warn on htb offloaded class creation 2023-11-10 11:07 ` Maxim Mikityanskiy @ 2023-11-12 8:48 ` Paolo Abeni 2023-11-13 14:51 ` Maxim Mikityanskiy 0 siblings, 1 reply; 11+ messages in thread From: Paolo Abeni @ 2023-11-12 8:48 UTC (permalink / raw) To: Maxim Mikityanskiy, Chittim, Madhu, Jiri Pirko Cc: netdev, Jamal Hadi Salim, Cong Wang, David S. Miller, Eric Dumazet, Jakub Kicinski, Tariq Toukan, Gal Pressman, Saeed Mahameed, xuejun.zhang, sridhar.samudrala On Fri, 2023-11-10 at 13:07 +0200, Maxim Mikityanskiy wrote: > On Thu, 09 Nov 2023 at 13:54:17 -0800, Chittim, Madhu wrote: > > We would like to enable Tx rate limiting using htb offload on all the > > existing queues. > > I don't seem to understand how you see it possible with HTB. I must admit I feel sorry for not being able to join any of the upcoming conferences, but to me it looks like there is some communication gap that could be filled by in-person discussion. Specifically the above to me sounds contradictory to what you stated here: https://lore.kernel.org/netdev/ZUEQzsKiIlgtbN-S@mail.gmail.com/ """ > Can HTB actually configure H/W shaping on > real_num_tx_queues? It will be on real_num_tx_queues, but after it's increased to add new HTB queues. The original queues [0, N) are used for direct traffic, same as the non-offloaded HTB's direct_queue (it's not shaped). """ > What is your goal? We are looking for clean interface to configure individually min/max shaping on each TX queue for a given netdev (actually virtual function). Jiri's suggested to use TC: https://lore.kernel.org/netdev/ZOTVkXWCLY88YfjV@nanopsycho/ which IMHO makes sense as far as there is an existing interface (qdisc) providing the required features (almost) out-of-the-box. It looks like it's not the case. If a non trivial configuration and/or significant implementation are required, IMHO other options could be better... > If you need shaping for the whole netdev, maybe HTB > is not needed, and it's enough to attach a catchall filter with the > police action? Or use /sys/class/net/eth0/queues/tx-*/tx_maxrate > per-queue? ... specifically this latter one (it will require implementing in- kernel support for 'max_rate', but should quite straight-forward). It would be great to converge on some agreement on the way forward, thanks! Paolo ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH net] net: sched: fix warn on htb offloaded class creation 2023-11-12 8:48 ` Paolo Abeni @ 2023-11-13 14:51 ` Maxim Mikityanskiy 2023-11-13 15:21 ` Paolo Abeni 0 siblings, 1 reply; 11+ messages in thread From: Maxim Mikityanskiy @ 2023-11-13 14:51 UTC (permalink / raw) To: Paolo Abeni Cc: Chittim, Madhu, Jiri Pirko, netdev, Jamal Hadi Salim, Cong Wang, David S. Miller, Eric Dumazet, Jakub Kicinski, Tariq Toukan, Gal Pressman, Saeed Mahameed, xuejun.zhang, sridhar.samudrala On Sun, 12 Nov 2023 at 09:48:19 +0100, Paolo Abeni wrote: > On Fri, 2023-11-10 at 13:07 +0200, Maxim Mikityanskiy wrote: > > On Thu, 09 Nov 2023 at 13:54:17 -0800, Chittim, Madhu wrote: > > > We would like to enable Tx rate limiting using htb offload on all the > > > existing queues. > > > > I don't seem to understand how you see it possible with HTB. > > I must admit I feel sorry for not being able to join any of the > upcoming conferences, but to me it looks like there is some > communication gap that could be filled by in-person discussion. > > Specifically the above to me sounds contradictory to what you stated > here: > > https://lore.kernel.org/netdev/ZUEQzsKiIlgtbN-S@mail.gmail.com/ > > """ > > Can HTB actually configure H/W shaping on > > real_num_tx_queues? > > It will be on real_num_tx_queues, but after it's increased to add new > HTB queues. The original queues [0, N) are used for direct traffic, > same as the non-offloaded HTB's direct_queue (it's not shaped). > """ Sorry if that was confusing, there is actually no contradition, let me rephrase. Queues number [0, orig_real_num_tx_queues) are direct, they are not shaped, they correspond to HTB's unclassified traffic. Queues number [orig_real_num_tx_queues, real_num_tx_queues) correspond to HTB classes and are shaped. Here orig_real_num_tx_queues is how many queues the netdev had before HTB offload was attached. It's basically the standard set of queues, and HTB creates a new queue per class. Let me know if that helps. > > What is your goal? > > We are looking for clean interface to configure individually min/max > shaping on each TX queue for a given netdev (actually virtual > function). Have you tried tc mqprio? If you set `mode channel` and create queue groups with only one queue each, you can set min_rate and max_rate for each group (==queue), and it works with the existing set of queues. > Jiri's suggested to use TC: > > https://lore.kernel.org/netdev/ZOTVkXWCLY88YfjV@nanopsycho/ > > which IMHO makes sense as far as there is an existing interface (qdisc) > providing the required features (almost) out-of-the-box. It looks like > it's not the case. > > If a non trivial configuration and/or significant implementation are > required, IMHO other options could be better... > > > If you need shaping for the whole netdev, maybe HTB > > is not needed, and it's enough to attach a catchall filter with the > > police action? Or use /sys/class/net/eth0/queues/tx-*/tx_maxrate > > per-queue? > > ... specifically this latter one (it will require implementing in- > kernel support for 'max_rate', but should quite straight-forward). > > It would be great to converge on some agreement on the way forward, > thanks! > > Paolo > ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH net] net: sched: fix warn on htb offloaded class creation 2023-11-13 14:51 ` Maxim Mikityanskiy @ 2023-11-13 15:21 ` Paolo Abeni 2023-11-15 18:51 ` Chittim, Madhu 0 siblings, 1 reply; 11+ messages in thread From: Paolo Abeni @ 2023-11-13 15:21 UTC (permalink / raw) To: Maxim Mikityanskiy, Jiri Pirko Cc: Chittim, Madhu, netdev, Jamal Hadi Salim, Cong Wang, David S. Miller, Eric Dumazet, Jakub Kicinski, Tariq Toukan, Gal Pressman, Saeed Mahameed, xuejun.zhang, sridhar.samudrala On Mon, 2023-11-13 at 16:51 +0200, Maxim Mikityanskiy wrote: > On Sun, 12 Nov 2023 at 09:48:19 +0100, Paolo Abeni wrote: > > On Fri, 2023-11-10 at 13:07 +0200, Maxim Mikityanskiy wrote: > > > On Thu, 09 Nov 2023 at 13:54:17 -0800, Chittim, Madhu wrote: > > > > We would like to enable Tx rate limiting using htb offload on all the > > > > existing queues. > > > > > > I don't seem to understand how you see it possible with HTB. > > > > I must admit I feel sorry for not being able to join any of the > > upcoming conferences, but to me it looks like there is some > > communication gap that could be filled by in-person discussion. > > > > Specifically the above to me sounds contradictory to what you stated > > here: > > > > https://lore.kernel.org/netdev/ZUEQzsKiIlgtbN-S@mail.gmail.com/ > > > > """ > > > Can HTB actually configure H/W shaping on > > > real_num_tx_queues? > > > > It will be on real_num_tx_queues, but after it's increased to add new > > HTB queues. The original queues [0, N) are used for direct traffic, > > same as the non-offloaded HTB's direct_queue (it's not shaped). > > """ > > Sorry if that was confusing, there is actually no contradition, let me > rephrase. Queues number [0, orig_real_num_tx_queues) are direct, they > are not shaped, they correspond to HTB's unclassified traffic. Queues > number [orig_real_num_tx_queues, real_num_tx_queues) correspond to HTB > classes and are shaped. Here orig_real_num_tx_queues is how many queues > the netdev had before HTB offload was attached. It's basically the > standard set of queues, and HTB creates a new queue per class. Let me > know if that helps. > > > > What is your goal? > > > > We are looking for clean interface to configure individually min/max > > shaping on each TX queue for a given netdev (actually virtual > > function). > > Have you tried tc mqprio? If you set `mode channel` and create queue > groups with only one queue each, you can set min_rate and max_rate for > each group (==queue), and it works with the existing set of queues. mqprio does not fit well here as: * enforce an hard limit of 16 queues imposed by uAPI * traffic class/queue selection depends on skb priority. That does not fit well if e.g. the user would rely on a 1to1 CPU to queue mapping. It looks like any existing scheduler would need some extension to fit this use case. @Jiri: when you suggested TC do you have anything specific in your mind? Otherwise it looks like extending /sys/class/net/eth0/queues/tx- */tx_maxrate would be more straight-forward to me. Thanks! Paolo ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH net] net: sched: fix warn on htb offloaded class creation 2023-11-13 15:21 ` Paolo Abeni @ 2023-11-15 18:51 ` Chittim, Madhu 0 siblings, 0 replies; 11+ messages in thread From: Chittim, Madhu @ 2023-11-15 18:51 UTC (permalink / raw) To: Paolo Abeni, Maxim Mikityanskiy, Jiri Pirko Cc: netdev, Jamal Hadi Salim, Cong Wang, David S. Miller, Eric Dumazet, Jakub Kicinski, Tariq Toukan, Gal Pressman, Saeed Mahameed, xuejun.zhang, sridhar.samudrala On 11/13/2023 7:21 AM, Paolo Abeni wrote: > On Mon, 2023-11-13 at 16:51 +0200, Maxim Mikityanskiy wrote: >> On Sun, 12 Nov 2023 at 09:48:19 +0100, Paolo Abeni wrote: >>> On Fri, 2023-11-10 at 13:07 +0200, Maxim Mikityanskiy wrote: >>>> On Thu, 09 Nov 2023 at 13:54:17 -0800, Chittim, Madhu wrote: >>>>> We would like to enable Tx rate limiting using htb offload on all the >>>>> existing queues. >>>> >>>> I don't seem to understand how you see it possible with HTB. >>> >>> I must admit I feel sorry for not being able to join any of the >>> upcoming conferences, but to me it looks like there is some >>> communication gap that could be filled by in-person discussion. >>> >>> Specifically the above to me sounds contradictory to what you stated >>> here: >>> >>> https://lore.kernel.org/netdev/ZUEQzsKiIlgtbN-S@mail.gmail.com/ >>> >>> """ >>>> Can HTB actually configure H/W shaping on >>>> real_num_tx_queues? >>> >>> It will be on real_num_tx_queues, but after it's increased to add new >>> HTB queues. The original queues [0, N) are used for direct traffic, >>> same as the non-offloaded HTB's direct_queue (it's not shaped). >>> """ >> >> Sorry if that was confusing, there is actually no contradition, let me >> rephrase. Queues number [0, orig_real_num_tx_queues) are direct, they >> are not shaped, they correspond to HTB's unclassified traffic. Queues >> number [orig_real_num_tx_queues, real_num_tx_queues) correspond to HTB >> classes and are shaped. Here orig_real_num_tx_queues is how many queues >> the netdev had before HTB offload was attached. It's basically the >> standard set of queues, and HTB creates a new queue per class. Let me >> know if that helps. >> >>>> What is your goal? >>> >>> We are looking for clean interface to configure individually min/max >>> shaping on each TX queue for a given netdev (actually virtual >>> function). >> >> Have you tried tc mqprio? If you set `mode channel` and create queue >> groups with only one queue each, you can set min_rate and max_rate for >> each group (==queue), and it works with the existing set of queues. > > mqprio does not fit well here as: > > * enforce an hard limit of 16 queues imposed by uAPI > * traffic class/queue selection depends on skb priority. That does not > fit well if e.g. the user would rely on a 1to1 CPU to queue mapping. > > It looks like any existing scheduler would need some extension to fit > this use case. > > @Jiri: when you suggested TC do you have anything specific in your > mind? > > Otherwise it looks like extending /sys/class/net/eth0/queues/tx- > */tx_maxrate would be more straight-forward to me. We are planning to go with this approach of extending sysfs entry to have tx-maxrate and use this in our iAVF driver to support per queue Tx rate limiting Below is the link to our original implementation using devlink interface which was rejected by Jiri https://lore.kernel.org/netdev/20230822034003.31628-1-wenjun1.wu@intel.com/ @Jiri, @Jakub, Please advise > > Thanks! > > Paolo > ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2023-11-15 18:51 UTC | newest] Thread overview: 11+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2023-10-26 15:36 [PATCH net] net: sched: fix warn on htb offloaded class creation Paolo Abeni 2023-10-27 13:57 ` Maxim Mikityanskiy 2023-10-29 12:01 ` Gal Pressman 2023-10-31 9:11 ` Paolo Abeni 2023-10-31 14:40 ` Maxim Mikityanskiy 2023-11-09 21:54 ` Chittim, Madhu 2023-11-10 11:07 ` Maxim Mikityanskiy 2023-11-12 8:48 ` Paolo Abeni 2023-11-13 14:51 ` Maxim Mikityanskiy 2023-11-13 15:21 ` Paolo Abeni 2023-11-15 18:51 ` Chittim, Madhu
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).