netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net] net: sched: fix warn on htb offloaded class creation
@ 2023-10-26 15:36 Paolo Abeni
  2023-10-27 13:57 ` Maxim Mikityanskiy
  0 siblings, 1 reply; 11+ messages in thread
From: Paolo Abeni @ 2023-10-26 15:36 UTC (permalink / raw)
  To: netdev
  Cc: Jamal Hadi Salim, Cong Wang, Jiri Pirko, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Tariq Toukan, Maxim Mikityanskiy

The following commands:

tc qdisc add dev eth1 handle 2: root htb offload
tc class add dev eth1 parent 2: classid 2:1 htb rate 5mbit burst 15k

yeld to a WARN in the HTB qdisc:

 WARNING: CPU: 2 PID: 1583 at net/sched/sch_htb.c:1959
 CPU: 2 PID: 1583 Comm: tc Kdump: loaded 6.6.0-rc2.mptcp_7895773e5235+ #59
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc37 04/01/2014
 RIP: 0010:htb_change_class+0x25c4/0x2e30 [sch_htb]
 Code: 24 58 48 b8 00 00 00 00 00 fc ff df 48 89 ca 48 c1 ea 03 80 3c 02 00 0f 85 92 01 00 00 49 89 8c 24 b0 01 00 00 e9 77 fc ff ff <0f> 0b e9 15 ec ff ff 80 3d f8 35 00 00 00 0f 85 d4 f9 ff ff ba 32
 RSP: 0018:ffffc900015df240 EFLAGS: 00010246
 RAX: 0000000000000000 RBX: ffff88811b4ca000 RCX: ffff88811db42800
 RDX: 1ffff11023b68502 RSI: ffffffffaf2e6a00 RDI: ffff88811db42810
 RBP: ffff88811db45000 R08: 0000000000000001 R09: fffffbfff664bbc9
 R10: ffffffffb325de4f R11: ffffffffb2d33748 R12: 0000000000000000
 R13: ffff88811db43000 R14: ffff88811b4caaac R15: ffff8881252c0030
 FS:  00007f6c1f126740(0000) GS:ffff88815aa00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 000055dca8e5b4a8 CR3: 000000011bc7a006 CR4: 0000000000370ee0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 Call Trace:
 <TASK>
  tc_ctl_tclass+0x394/0xeb0
  rtnetlink_rcv_msg+0x2f5/0xaa0
  netlink_rcv_skb+0x12e/0x3a0
  netlink_unicast+0x421/0x730
  netlink_sendmsg+0x79e/0xc60
  ____sys_sendmsg+0x95a/0xc20
  ___sys_sendmsg+0xee/0x170
  __sys_sendmsg+0xc6/0x170
 do_syscall_64+0x58/0x80
 entry_SYSCALL_64_after_hwframe+0x6e/0xd8

The first command creates per TX queue pfifo qdiscs in
tc_modify_qdisc() -> htb_init() and grafts the pfifo to each dev_queue
via tc_modify_qdisc() ->  qdisc_graft() -> htb_attach().

When the command completes, the qdisc_sleeping for each dev_queue is a
pfifo one. The next class creation will trigger the reported splat.

Address the issue taking care of old non-builtin qdisc in
htb_change_class().

Fixes: d03b195b5aa0 ("sch_htb: Hierarchical QoS hardware offload")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/sched/sch_htb.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index 0d947414e616..dc682bd542b4 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -1955,8 +1955,7 @@ static int htb_change_class(struct Qdisc *sch, u32 classid,
 				qdisc_refcount_inc(new_q);
 			}
 			old_q = htb_graft_helper(dev_queue, new_q);
-			/* No qdisc_put needed. */
-			WARN_ON(!(old_q->flags & TCQ_F_BUILTIN));
+			qdisc_put(old_q);
 		}
 		sch_tree_lock(sch);
 		if (parent && !parent->level) {
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH net] net: sched: fix warn on htb offloaded class creation
  2023-10-26 15:36 [PATCH net] net: sched: fix warn on htb offloaded class creation Paolo Abeni
@ 2023-10-27 13:57 ` Maxim Mikityanskiy
  2023-10-29 12:01   ` Gal Pressman
  2023-10-31  9:11   ` Paolo Abeni
  0 siblings, 2 replies; 11+ messages in thread
From: Maxim Mikityanskiy @ 2023-10-27 13:57 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: netdev, Jamal Hadi Salim, Cong Wang, Jiri Pirko, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Tariq Toukan, Gal Pressman,
	Saeed Mahameed

I believe this is not the right fix.

On Thu, 26 Oct 2023 at 17:36:48 +0200, Paolo Abeni wrote:
> The following commands:
> 
> tc qdisc add dev eth1 handle 2: root htb offload
> tc class add dev eth1 parent 2: classid 2:1 htb rate 5mbit burst 15k
> 
> yeld to a WARN in the HTB qdisc:

Something is off here. These are literally the most basic commands one
could invoke with HTB offload, I'm sure they worked. Is it something
that broke recently? Tariq/Gal/Saeed, could you check them on a Mellanox
NIC?

> 
>  WARNING: CPU: 2 PID: 1583 at net/sched/sch_htb.c:1959
>  CPU: 2 PID: 1583 Comm: tc Kdump: loaded 6.6.0-rc2.mptcp_7895773e5235+ #59
>  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc37 04/01/2014
>  RIP: 0010:htb_change_class+0x25c4/0x2e30 [sch_htb]
>  Code: 24 58 48 b8 00 00 00 00 00 fc ff df 48 89 ca 48 c1 ea 03 80 3c 02 00 0f 85 92 01 00 00 49 89 8c 24 b0 01 00 00 e9 77 fc ff ff <0f> 0b e9 15 ec ff ff 80 3d f8 35 00 00 00 0f 85 d4 f9 ff ff ba 32
>  RSP: 0018:ffffc900015df240 EFLAGS: 00010246
>  RAX: 0000000000000000 RBX: ffff88811b4ca000 RCX: ffff88811db42800
>  RDX: 1ffff11023b68502 RSI: ffffffffaf2e6a00 RDI: ffff88811db42810
>  RBP: ffff88811db45000 R08: 0000000000000001 R09: fffffbfff664bbc9
>  R10: ffffffffb325de4f R11: ffffffffb2d33748 R12: 0000000000000000
>  R13: ffff88811db43000 R14: ffff88811b4caaac R15: ffff8881252c0030
>  FS:  00007f6c1f126740(0000) GS:ffff88815aa00000(0000) knlGS:0000000000000000
>  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>  CR2: 000055dca8e5b4a8 CR3: 000000011bc7a006 CR4: 0000000000370ee0
>  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>  Call Trace:
>  <TASK>
>   tc_ctl_tclass+0x394/0xeb0
>   rtnetlink_rcv_msg+0x2f5/0xaa0
>   netlink_rcv_skb+0x12e/0x3a0
>   netlink_unicast+0x421/0x730
>   netlink_sendmsg+0x79e/0xc60
>   ____sys_sendmsg+0x95a/0xc20
>   ___sys_sendmsg+0xee/0x170
>   __sys_sendmsg+0xc6/0x170
>  do_syscall_64+0x58/0x80
>  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
> 
> The first command creates per TX queue pfifo qdiscs in
> tc_modify_qdisc() -> htb_init() and grafts the pfifo to each dev_queue
> via tc_modify_qdisc() ->  qdisc_graft() -> htb_attach().

Not exactly; it grafts pfifo to direct queues only. htb_attach_offload
explicitly grafts noop to all the remaining queues.

> When the command completes, the qdisc_sleeping for each dev_queue is a
> pfifo one. The next class creation will trigger the reported splat.
> 
> Address the issue taking care of old non-builtin qdisc in
> htb_change_class().
> 
> Fixes: d03b195b5aa0 ("sch_htb: Hierarchical QoS hardware offload")
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
>  net/sched/sch_htb.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
> index 0d947414e616..dc682bd542b4 100644
> --- a/net/sched/sch_htb.c
> +++ b/net/sched/sch_htb.c
> @@ -1955,8 +1955,7 @@ static int htb_change_class(struct Qdisc *sch, u32 classid,
>  				qdisc_refcount_inc(new_q);
>  			}
>  			old_q = htb_graft_helper(dev_queue, new_q);
> -			/* No qdisc_put needed. */
> -			WARN_ON(!(old_q->flags & TCQ_F_BUILTIN));
> +			qdisc_put(old_q);

We can get here after one of two cases above:

1. A new queue is allocated with TC_HTB_LEAF_ALLOC_QUEUE. It's supposed
to have a noop qdisc by default (after htb_attach_offload).

2. An existing leaf is converted to an inner node with
TC_HTB_LEAF_TO_INNER, its queue is reused. htb_graft_helper(dev_queue,
NULL) makes sure that the qdisc is noop, not pfifo anymore.

This WARN_ON is here for a reason. If it triggered, it indicates a bug
somewhere else, because we don't expect anything but noop_qdisc after
the `if` above. Silencing the warning doesn't fix the bug and may lead
to damaging the consistency of the data structures.

>  		}
>  		sch_tree_lock(sch);
>  		if (parent && !parent->level) {
> -- 
> 2.41.0
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net] net: sched: fix warn on htb offloaded class creation
  2023-10-27 13:57 ` Maxim Mikityanskiy
@ 2023-10-29 12:01   ` Gal Pressman
  2023-10-31  9:11   ` Paolo Abeni
  1 sibling, 0 replies; 11+ messages in thread
From: Gal Pressman @ 2023-10-29 12:01 UTC (permalink / raw)
  To: Maxim Mikityanskiy, Paolo Abeni
  Cc: netdev, Jamal Hadi Salim, Cong Wang, Jiri Pirko, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Tariq Toukan, Saeed Mahameed

Hey Maxim!

On 27/10/2023 16:57, Maxim Mikityanskiy wrote:
> I believe this is not the right fix.
> 
> On Thu, 26 Oct 2023 at 17:36:48 +0200, Paolo Abeni wrote:
>> The following commands:
>>
>> tc qdisc add dev eth1 handle 2: root htb offload
>> tc class add dev eth1 parent 2: classid 2:1 htb rate 5mbit burst 15k
>>
>> yeld to a WARN in the HTB qdisc:
> 
> Something is off here. These are literally the most basic commands one
> could invoke with HTB offload, I'm sure they worked. Is it something
> that broke recently? Tariq/Gal/Saeed, could you check them on a Mellanox
> NIC?

This is working on our NICs, we do not hit that warning.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net] net: sched: fix warn on htb offloaded class creation
  2023-10-27 13:57 ` Maxim Mikityanskiy
  2023-10-29 12:01   ` Gal Pressman
@ 2023-10-31  9:11   ` Paolo Abeni
  2023-10-31 14:40     ` Maxim Mikityanskiy
  1 sibling, 1 reply; 11+ messages in thread
From: Paolo Abeni @ 2023-10-31  9:11 UTC (permalink / raw)
  To: Maxim Mikityanskiy
  Cc: netdev, Jamal Hadi Salim, Cong Wang, Jiri Pirko, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Tariq Toukan, Gal Pressman,
	Saeed Mahameed

Hi,

I'm sorry for the late reply.

On Fri, 2023-10-27 at 16:57 +0300, Maxim Mikityanskiy wrote:
> I believe this is not the right fix.
> 
> On Thu, 26 Oct 2023 at 17:36:48 +0200, Paolo Abeni wrote:
> > The following commands:
> > 
> > tc qdisc add dev eth1 handle 2: root htb offload
> > tc class add dev eth1 parent 2: classid 2:1 htb rate 5mbit burst 15k
> > 
> > yeld to a WARN in the HTB qdisc:
> 
> Something is off here. These are literally the most basic commands one
> could invoke with HTB offload, I'm sure they worked. Is it something
> that broke recently? Tariq/Gal/Saeed, could you check them on a Mellanox
> NIC?
> 
> > 
> >  WARNING: CPU: 2 PID: 1583 at net/sched/sch_htb.c:1959
> >  CPU: 2 PID: 1583 Comm: tc Kdump: loaded 6.6.0-rc2.mptcp_7895773e5235+ #59
> >  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc37 04/01/2014
> >  RIP: 0010:htb_change_class+0x25c4/0x2e30 [sch_htb]
> >  Code: 24 58 48 b8 00 00 00 00 00 fc ff df 48 89 ca 48 c1 ea 03 80 3c 02 00 0f 85 92 01 00 00 49 89 8c 24 b0 01 00 00 e9 77 fc ff ff <0f> 0b e9 15 ec ff ff 80 3d f8 35 00 00 00 0f 85 d4 f9 ff ff ba 32
> >  RSP: 0018:ffffc900015df240 EFLAGS: 00010246
> >  RAX: 0000000000000000 RBX: ffff88811b4ca000 RCX: ffff88811db42800
> >  RDX: 1ffff11023b68502 RSI: ffffffffaf2e6a00 RDI: ffff88811db42810
> >  RBP: ffff88811db45000 R08: 0000000000000001 R09: fffffbfff664bbc9
> >  R10: ffffffffb325de4f R11: ffffffffb2d33748 R12: 0000000000000000
> >  R13: ffff88811db43000 R14: ffff88811b4caaac R15: ffff8881252c0030
> >  FS:  00007f6c1f126740(0000) GS:ffff88815aa00000(0000) knlGS:0000000000000000
> >  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >  CR2: 000055dca8e5b4a8 CR3: 000000011bc7a006 CR4: 0000000000370ee0
> >  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> >  Call Trace:
> >  <TASK>
> >   tc_ctl_tclass+0x394/0xeb0
> >   rtnetlink_rcv_msg+0x2f5/0xaa0
> >   netlink_rcv_skb+0x12e/0x3a0
> >   netlink_unicast+0x421/0x730
> >   netlink_sendmsg+0x79e/0xc60
> >   ____sys_sendmsg+0x95a/0xc20
> >   ___sys_sendmsg+0xee/0x170
> >   __sys_sendmsg+0xc6/0x170
> >  do_syscall_64+0x58/0x80
> >  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
> > 
> > The first command creates per TX queue pfifo qdiscs in
> > tc_modify_qdisc() -> htb_init() and grafts the pfifo to each dev_queue
> > via tc_modify_qdisc() ->  qdisc_graft() -> htb_attach().
> 
> Not exactly; it grafts pfifo to direct queues only. htb_attach_offload
> explicitly grafts noop to all the remaining queues.

num_direct_qdiscs == real_num_tx_queues:

https://elixir.bootlin.com/linux/latest/source/net/sched/sch_htb.c#L1101

pfifo will be configured on all the TX queues available at TC creation
time, right?

Lacking a mlx card with offload support I hack basic htb support in
netdevsim and I observe the splat on top of such device. I can as well
share the netdevsim patch - it will need some clean-up.
> 
> > When the command completes, the qdisc_sleeping for each dev_queue is a
> > pfifo one. The next class creation will trigger the reported splat.
> > 
> > Address the issue taking care of old non-builtin qdisc in
> > htb_change_class().
> > 
> > Fixes: d03b195b5aa0 ("sch_htb: Hierarchical QoS hardware offload")
> > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > ---
> >  net/sched/sch_htb.c | 3 +--
> >  1 file changed, 1 insertion(+), 2 deletions(-)
> > 
> > diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
> > index 0d947414e616..dc682bd542b4 100644
> > --- a/net/sched/sch_htb.c
> > +++ b/net/sched/sch_htb.c
> > @@ -1955,8 +1955,7 @@ static int htb_change_class(struct Qdisc *sch, u32 classid,
> >  				qdisc_refcount_inc(new_q);
> >  			}
> >  			old_q = htb_graft_helper(dev_queue, new_q);
> > -			/* No qdisc_put needed. */
> > -			WARN_ON(!(old_q->flags & TCQ_F_BUILTIN));
> > +			qdisc_put(old_q);
> 
> We can get here after one of two cases above:
> 
> 1. A new queue is allocated with TC_HTB_LEAF_ALLOC_QUEUE. It's supposed
> to have a noop qdisc by default (after htb_attach_offload).

So most likely the trivial netdevsim implementation I used was not good
enough.

Which constrains should respect TC_HTB_LEAF_ALLOC_QUEUE WRT the
returned qid value? should it in the (real_num_tx_queues,
num_tx_queues] range? Can HTB actually configure H/W shaping on
real_num_tx_queues?

I find no clear documentation WRT the above.

Thanks!

Paolo


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net] net: sched: fix warn on htb offloaded class creation
  2023-10-31  9:11   ` Paolo Abeni
@ 2023-10-31 14:40     ` Maxim Mikityanskiy
  2023-11-09 21:54       ` Chittim, Madhu
  0 siblings, 1 reply; 11+ messages in thread
From: Maxim Mikityanskiy @ 2023-10-31 14:40 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: netdev, Jamal Hadi Salim, Cong Wang, Jiri Pirko, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Tariq Toukan, Gal Pressman,
	Saeed Mahameed

On Tue, 31 Oct 2023 at 10:11:14 +0100, Paolo Abeni wrote:
> Hi,
> 
> I'm sorry for the late reply.
> 
> On Fri, 2023-10-27 at 16:57 +0300, Maxim Mikityanskiy wrote:
> > I believe this is not the right fix.
> > 
> > On Thu, 26 Oct 2023 at 17:36:48 +0200, Paolo Abeni wrote:
> > > The following commands:
> > > 
> > > tc qdisc add dev eth1 handle 2: root htb offload
> > > tc class add dev eth1 parent 2: classid 2:1 htb rate 5mbit burst 15k
> > > 
> > > yeld to a WARN in the HTB qdisc:
> > 
> > Something is off here. These are literally the most basic commands one
> > could invoke with HTB offload, I'm sure they worked. Is it something
> > that broke recently? Tariq/Gal/Saeed, could you check them on a Mellanox
> > NIC?
> > 
> > > 
> > >  WARNING: CPU: 2 PID: 1583 at net/sched/sch_htb.c:1959
> > >  CPU: 2 PID: 1583 Comm: tc Kdump: loaded 6.6.0-rc2.mptcp_7895773e5235+ #59
> > >  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc37 04/01/2014
> > >  RIP: 0010:htb_change_class+0x25c4/0x2e30 [sch_htb]
> > >  Code: 24 58 48 b8 00 00 00 00 00 fc ff df 48 89 ca 48 c1 ea 03 80 3c 02 00 0f 85 92 01 00 00 49 89 8c 24 b0 01 00 00 e9 77 fc ff ff <0f> 0b e9 15 ec ff ff 80 3d f8 35 00 00 00 0f 85 d4 f9 ff ff ba 32
> > >  RSP: 0018:ffffc900015df240 EFLAGS: 00010246
> > >  RAX: 0000000000000000 RBX: ffff88811b4ca000 RCX: ffff88811db42800
> > >  RDX: 1ffff11023b68502 RSI: ffffffffaf2e6a00 RDI: ffff88811db42810
> > >  RBP: ffff88811db45000 R08: 0000000000000001 R09: fffffbfff664bbc9
> > >  R10: ffffffffb325de4f R11: ffffffffb2d33748 R12: 0000000000000000
> > >  R13: ffff88811db43000 R14: ffff88811b4caaac R15: ffff8881252c0030
> > >  FS:  00007f6c1f126740(0000) GS:ffff88815aa00000(0000) knlGS:0000000000000000
> > >  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > >  CR2: 000055dca8e5b4a8 CR3: 000000011bc7a006 CR4: 0000000000370ee0
> > >  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > >  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > >  Call Trace:
> > >  <TASK>
> > >   tc_ctl_tclass+0x394/0xeb0
> > >   rtnetlink_rcv_msg+0x2f5/0xaa0
> > >   netlink_rcv_skb+0x12e/0x3a0
> > >   netlink_unicast+0x421/0x730
> > >   netlink_sendmsg+0x79e/0xc60
> > >   ____sys_sendmsg+0x95a/0xc20
> > >   ___sys_sendmsg+0xee/0x170
> > >   __sys_sendmsg+0xc6/0x170
> > >  do_syscall_64+0x58/0x80
> > >  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
> > > 
> > > The first command creates per TX queue pfifo qdiscs in
> > > tc_modify_qdisc() -> htb_init() and grafts the pfifo to each dev_queue
> > > via tc_modify_qdisc() ->  qdisc_graft() -> htb_attach().
> > 
> > Not exactly; it grafts pfifo to direct queues only. htb_attach_offload
> > explicitly grafts noop to all the remaining queues.
> 
> num_direct_qdiscs == real_num_tx_queues:
> 
> https://elixir.bootlin.com/linux/latest/source/net/sched/sch_htb.c#L1101
> 
> pfifo will be configured on all the TX queues available at TC creation
> time, right?

Yes, all real TX queues will be used as direct queues (for unclassified
traffic). num_tx_queues should be somewhat bigger than
real_num_tx_queues - it should reserve a queue per potential leaf class.

pfifo is configured on direct queues, and the reserved queues have noop.
Then, when a new leaf class is added (TC_HTB_LEAF_ALLOC_QUEUE), the
driver allocates a new queue and increases real_num_tx_queues. HTB
assigns a pfifo qdisc to the newly allocated queue.

Changing the hierarchy (deleting a node or converting an inner node to a
leaf) may reorder the classful queues (with indexes >= the initial
real_num_tx_queues), so that there are no gaps.

> Lacking a mlx card with offload support I hack basic htb support in
> netdevsim and I observe the splat on top of such device. I can as well
> share the netdevsim patch - it will need some clean-up.

I will be happy to review the netdevsim patch, but I don't promise
prompt responsiveness.

> > 
> > > When the command completes, the qdisc_sleeping for each dev_queue is a
> > > pfifo one. The next class creation will trigger the reported splat.
> > > 
> > > Address the issue taking care of old non-builtin qdisc in
> > > htb_change_class().
> > > 
> > > Fixes: d03b195b5aa0 ("sch_htb: Hierarchical QoS hardware offload")
> > > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > > ---
> > >  net/sched/sch_htb.c | 3 +--
> > >  1 file changed, 1 insertion(+), 2 deletions(-)
> > > 
> > > diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
> > > index 0d947414e616..dc682bd542b4 100644
> > > --- a/net/sched/sch_htb.c
> > > +++ b/net/sched/sch_htb.c
> > > @@ -1955,8 +1955,7 @@ static int htb_change_class(struct Qdisc *sch, u32 classid,
> > >  				qdisc_refcount_inc(new_q);
> > >  			}
> > >  			old_q = htb_graft_helper(dev_queue, new_q);
> > > -			/* No qdisc_put needed. */
> > > -			WARN_ON(!(old_q->flags & TCQ_F_BUILTIN));
> > > +			qdisc_put(old_q);
> > 
> > We can get here after one of two cases above:
> > 
> > 1. A new queue is allocated with TC_HTB_LEAF_ALLOC_QUEUE. It's supposed
> > to have a noop qdisc by default (after htb_attach_offload).
> 
> So most likely the trivial netdevsim implementation I used was not good
> enough.
> 
> Which constrains should respect TC_HTB_LEAF_ALLOC_QUEUE WRT the
> returned qid value? should it in the (real_num_tx_queues,
> num_tx_queues] range?

Let's say N is real_num_tx_queues as it was at the moment of attaching.
HTB queues should be allocated from [N, num_tx_queues), and
real_num_tx_queues should be increased accordingly. It should not return
queues number [0, N).

Deletions should fill the gaps: if queue X is being deleted, N <= X <
real_num_tx_queues - 1, then the gap should be filled with queue number
real_num_tx_queues - 1 by swapping the queues (real_num_tx_queues will
be decreased by 1 accordingly). Some care also needs to be taken when
converting inner-to-leaf (TC_HTB_LEAF_DEL_LAST) and leaf-to-inner (it's
better to get insights from [1], there are also some comments).

> Can HTB actually configure H/W shaping on
> real_num_tx_queues?

It will be on real_num_tx_queues, but after it's increased to add new
HTB queues. The original queues [0, N) are used for direct traffic, same
as the non-offloaded HTB's direct_queue (it's not shaped).

> I find no clear documentation WRT the above.

I'm sorry for the lack of documentation. All I have is the commit
message [2] and a netdev talk [3]. Maybe the slides could be of some
use...

I hope the above explanation clarifies something, and feel free to ask
further questions, I'll be glad to explain what hasn't been documented
properly.

[1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/mellanox/mlx5/core/en/htb.c?id=5a6a09e97199d6600d31383055f9d43fbbcbe86f
[2]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d03b195b5aa015f6c11988b86a3625f8d5dbac52
[3]: https://netdevconf.info/0x14/session.html?talk-hierarchical-QoS-hardware-offload

> Thanks!
> 
> Paolo
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net] net: sched: fix warn on htb offloaded class creation
  2023-10-31 14:40     ` Maxim Mikityanskiy
@ 2023-11-09 21:54       ` Chittim, Madhu
  2023-11-10 11:07         ` Maxim Mikityanskiy
  0 siblings, 1 reply; 11+ messages in thread
From: Chittim, Madhu @ 2023-11-09 21:54 UTC (permalink / raw)
  To: Maxim Mikityanskiy, Paolo Abeni
  Cc: netdev, Jamal Hadi Salim, Cong Wang, Jiri Pirko, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Tariq Toukan, Gal Pressman,
	Saeed Mahameed, xuejun.zhang, sridhar.samudrala



On 10/31/2023 7:40 AM, Maxim Mikityanskiy wrote:
> On Tue, 31 Oct 2023 at 10:11:14 +0100, Paolo Abeni wrote:
>> Hi,
>>
>> I'm sorry for the late reply.
>>
>> On Fri, 2023-10-27 at 16:57 +0300, Maxim Mikityanskiy wrote:
>>> I believe this is not the right fix.
>>>
>>> On Thu, 26 Oct 2023 at 17:36:48 +0200, Paolo Abeni wrote:
>>>> The following commands:
>>>>
>>>> tc qdisc add dev eth1 handle 2: root htb offload
>>>> tc class add dev eth1 parent 2: classid 2:1 htb rate 5mbit burst 15k
>>>>
>>>> yeld to a WARN in the HTB qdisc:
>>>
>>> Something is off here. These are literally the most basic commands one
>>> could invoke with HTB offload, I'm sure they worked. Is it something
>>> that broke recently? Tariq/Gal/Saeed, could you check them on a Mellanox
>>> NIC?
>>>
>>>>
>>>>   WARNING: CPU: 2 PID: 1583 at net/sched/sch_htb.c:1959
>>>>   CPU: 2 PID: 1583 Comm: tc Kdump: loaded 6.6.0-rc2.mptcp_7895773e5235+ #59
>>>>   Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc37 04/01/2014
>>>>   RIP: 0010:htb_change_class+0x25c4/0x2e30 [sch_htb]
>>>>   Code: 24 58 48 b8 00 00 00 00 00 fc ff df 48 89 ca 48 c1 ea 03 80 3c 02 00 0f 85 92 01 00 00 49 89 8c 24 b0 01 00 00 e9 77 fc ff ff <0f> 0b e9 15 ec ff ff 80 3d f8 35 00 00 00 0f 85 d4 f9 ff ff ba 32
>>>>   RSP: 0018:ffffc900015df240 EFLAGS: 00010246
>>>>   RAX: 0000000000000000 RBX: ffff88811b4ca000 RCX: ffff88811db42800
>>>>   RDX: 1ffff11023b68502 RSI: ffffffffaf2e6a00 RDI: ffff88811db42810
>>>>   RBP: ffff88811db45000 R08: 0000000000000001 R09: fffffbfff664bbc9
>>>>   R10: ffffffffb325de4f R11: ffffffffb2d33748 R12: 0000000000000000
>>>>   R13: ffff88811db43000 R14: ffff88811b4caaac R15: ffff8881252c0030
>>>>   FS:  00007f6c1f126740(0000) GS:ffff88815aa00000(0000) knlGS:0000000000000000
>>>>   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>   CR2: 000055dca8e5b4a8 CR3: 000000011bc7a006 CR4: 0000000000370ee0
>>>>   DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>   DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>>   Call Trace:
>>>>   <TASK>
>>>>    tc_ctl_tclass+0x394/0xeb0
>>>>    rtnetlink_rcv_msg+0x2f5/0xaa0
>>>>    netlink_rcv_skb+0x12e/0x3a0
>>>>    netlink_unicast+0x421/0x730
>>>>    netlink_sendmsg+0x79e/0xc60
>>>>    ____sys_sendmsg+0x95a/0xc20
>>>>    ___sys_sendmsg+0xee/0x170
>>>>    __sys_sendmsg+0xc6/0x170
>>>>   do_syscall_64+0x58/0x80
>>>>   entry_SYSCALL_64_after_hwframe+0x6e/0xd8
>>>>
>>>> The first command creates per TX queue pfifo qdiscs in
>>>> tc_modify_qdisc() -> htb_init() and grafts the pfifo to each dev_queue
>>>> via tc_modify_qdisc() ->  qdisc_graft() -> htb_attach().
>>>
>>> Not exactly; it grafts pfifo to direct queues only. htb_attach_offload
>>> explicitly grafts noop to all the remaining queues.
>>
>> num_direct_qdiscs == real_num_tx_queues:
>>
>> https://elixir.bootlin.com/linux/latest/source/net/sched/sch_htb.c#L1101
>>
>> pfifo will be configured on all the TX queues available at TC creation
>> time, right?
> 
> Yes, all real TX queues will be used as direct queues (for unclassified
> traffic). num_tx_queues should be somewhat bigger than
> real_num_tx_queues - it should reserve a queue per potential leaf class.
> 
> pfifo is configured on direct queues, and the reserved queues have noop.
> Then, when a new leaf class is added (TC_HTB_LEAF_ALLOC_QUEUE), the
> driver allocates a new queue and increases real_num_tx_queues. HTB
> assigns a pfifo qdisc to the newly allocated queue.
> 
> Changing the hierarchy (deleting a node or converting an inner node to a
> leaf) may reorder the classful queues (with indexes >= the initial
> real_num_tx_queues), so that there are no gaps.
> 
>> Lacking a mlx card with offload support I hack basic htb support in
>> netdevsim and I observe the splat on top of such device. I can as well
>> share the netdevsim patch - it will need some clean-up.
> 
> I will be happy to review the netdevsim patch, but I don't promise
> prompt responsiveness.
> 
>>>
>>>> When the command completes, the qdisc_sleeping for each dev_queue is a
>>>> pfifo one. The next class creation will trigger the reported splat.
>>>>
>>>> Address the issue taking care of old non-builtin qdisc in
>>>> htb_change_class().
>>>>
>>>> Fixes: d03b195b5aa0 ("sch_htb: Hierarchical QoS hardware offload")
>>>> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
>>>> ---
>>>>   net/sched/sch_htb.c | 3 +--
>>>>   1 file changed, 1 insertion(+), 2 deletions(-)
>>>>
>>>> diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
>>>> index 0d947414e616..dc682bd542b4 100644
>>>> --- a/net/sched/sch_htb.c
>>>> +++ b/net/sched/sch_htb.c
>>>> @@ -1955,8 +1955,7 @@ static int htb_change_class(struct Qdisc *sch, u32 classid,
>>>>   				qdisc_refcount_inc(new_q);
>>>>   			}
>>>>   			old_q = htb_graft_helper(dev_queue, new_q);
>>>> -			/* No qdisc_put needed. */
>>>> -			WARN_ON(!(old_q->flags & TCQ_F_BUILTIN));
>>>> +			qdisc_put(old_q);
>>>
>>> We can get here after one of two cases above:
>>>
>>> 1. A new queue is allocated with TC_HTB_LEAF_ALLOC_QUEUE. It's supposed
>>> to have a noop qdisc by default (after htb_attach_offload).
>>
>> So most likely the trivial netdevsim implementation I used was not good
>> enough.
>>
>> Which constrains should respect TC_HTB_LEAF_ALLOC_QUEUE WRT the
>> returned qid value? should it in the (real_num_tx_queues,
>> num_tx_queues] range?
> 
> Let's say N is real_num_tx_queues as it was at the moment of attaching.
> HTB queues should be allocated from [N, num_tx_queues), and
> real_num_tx_queues should be increased accordingly. It should not return
> queues number [0, N).
> 
> Deletions should fill the gaps: if queue X is being deleted, N <= X <
> real_num_tx_queues - 1, then the gap should be filled with queue number
> real_num_tx_queues - 1 by swapping the queues (real_num_tx_queues will
> be decreased by 1 accordingly). Some care also needs to be taken when
> converting inner-to-leaf (TC_HTB_LEAF_DEL_LAST) and leaf-to-inner (it's
> better to get insights from [1], there are also some comments).
> 
>> Can HTB actually configure H/W shaping on
>> real_num_tx_queues?
> 
> It will be on real_num_tx_queues, but after it's increased to add new
> HTB queues. The original queues [0, N) are used for direct traffic, same
> as the non-offloaded HTB's direct_queue (it's not shaped).
> 
>> I find no clear documentation WRT the above.
> 
> I'm sorry for the lack of documentation. All I have is the commit
> message [2] and a netdev talk [3]. Maybe the slides could be of some
> use...
> 
> I hope the above explanation clarifies something, and feel free to ask
> further questions, I'll be glad to explain what hasn't been documented
> properly.

We would like to enable Tx rate limiting using htb offload on all the 
existing queues. We are able to do with the following set of commands 
with Paolo's patch

tc qdisc add dev enp175s0v0 handle 1: root htb offload
tc class add dev enp175s0v0 parent 1: classid 1:1 htb rate 1000mbit ceil 
2000mbit burst 100k

where
   classid 1:1 is tx queue0
   tx_minrate is rate 1000mbps
   tx_maxrate is ceil 2000mbps


In order to not break your implementation could bring in if condition 
instead WARN_ON, something like below
	if (!(old_q->flags & TCQ_F_BUILTIN))
		qdisc_put(old_q);

Would this work for you, please advise.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net] net: sched: fix warn on htb offloaded class creation
  2023-11-09 21:54       ` Chittim, Madhu
@ 2023-11-10 11:07         ` Maxim Mikityanskiy
  2023-11-12  8:48           ` Paolo Abeni
  0 siblings, 1 reply; 11+ messages in thread
From: Maxim Mikityanskiy @ 2023-11-10 11:07 UTC (permalink / raw)
  To: Chittim, Madhu
  Cc: Paolo Abeni, netdev, Jamal Hadi Salim, Cong Wang, Jiri Pirko,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Tariq Toukan,
	Gal Pressman, Saeed Mahameed, xuejun.zhang, sridhar.samudrala

On Thu, 09 Nov 2023 at 13:54:17 -0800, Chittim, Madhu wrote:
> 
> 
> On 10/31/2023 7:40 AM, Maxim Mikityanskiy wrote:
> > On Tue, 31 Oct 2023 at 10:11:14 +0100, Paolo Abeni wrote:
> > > Hi,
> > > 
> > > I'm sorry for the late reply.
> > > 
> > > On Fri, 2023-10-27 at 16:57 +0300, Maxim Mikityanskiy wrote:
> > > > I believe this is not the right fix.
> > > > 
> > > > On Thu, 26 Oct 2023 at 17:36:48 +0200, Paolo Abeni wrote:
> > > > > The following commands:
> > > > > 
> > > > > tc qdisc add dev eth1 handle 2: root htb offload
> > > > > tc class add dev eth1 parent 2: classid 2:1 htb rate 5mbit burst 15k
> > > > > 
> > > > > yeld to a WARN in the HTB qdisc:
> > > > 
> > > > Something is off here. These are literally the most basic commands one
> > > > could invoke with HTB offload, I'm sure they worked. Is it something
> > > > that broke recently? Tariq/Gal/Saeed, could you check them on a Mellanox
> > > > NIC?
> > > > 
> > > > > 
> > > > >   WARNING: CPU: 2 PID: 1583 at net/sched/sch_htb.c:1959
> > > > >   CPU: 2 PID: 1583 Comm: tc Kdump: loaded 6.6.0-rc2.mptcp_7895773e5235+ #59
> > > > >   Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc37 04/01/2014
> > > > >   RIP: 0010:htb_change_class+0x25c4/0x2e30 [sch_htb]
> > > > >   Code: 24 58 48 b8 00 00 00 00 00 fc ff df 48 89 ca 48 c1 ea 03 80 3c 02 00 0f 85 92 01 00 00 49 89 8c 24 b0 01 00 00 e9 77 fc ff ff <0f> 0b e9 15 ec ff ff 80 3d f8 35 00 00 00 0f 85 d4 f9 ff ff ba 32
> > > > >   RSP: 0018:ffffc900015df240 EFLAGS: 00010246
> > > > >   RAX: 0000000000000000 RBX: ffff88811b4ca000 RCX: ffff88811db42800
> > > > >   RDX: 1ffff11023b68502 RSI: ffffffffaf2e6a00 RDI: ffff88811db42810
> > > > >   RBP: ffff88811db45000 R08: 0000000000000001 R09: fffffbfff664bbc9
> > > > >   R10: ffffffffb325de4f R11: ffffffffb2d33748 R12: 0000000000000000
> > > > >   R13: ffff88811db43000 R14: ffff88811b4caaac R15: ffff8881252c0030
> > > > >   FS:  00007f6c1f126740(0000) GS:ffff88815aa00000(0000) knlGS:0000000000000000
> > > > >   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > >   CR2: 000055dca8e5b4a8 CR3: 000000011bc7a006 CR4: 0000000000370ee0
> > > > >   DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > > >   DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > > >   Call Trace:
> > > > >   <TASK>
> > > > >    tc_ctl_tclass+0x394/0xeb0
> > > > >    rtnetlink_rcv_msg+0x2f5/0xaa0
> > > > >    netlink_rcv_skb+0x12e/0x3a0
> > > > >    netlink_unicast+0x421/0x730
> > > > >    netlink_sendmsg+0x79e/0xc60
> > > > >    ____sys_sendmsg+0x95a/0xc20
> > > > >    ___sys_sendmsg+0xee/0x170
> > > > >    __sys_sendmsg+0xc6/0x170
> > > > >   do_syscall_64+0x58/0x80
> > > > >   entry_SYSCALL_64_after_hwframe+0x6e/0xd8
> > > > > 
> > > > > The first command creates per TX queue pfifo qdiscs in
> > > > > tc_modify_qdisc() -> htb_init() and grafts the pfifo to each dev_queue
> > > > > via tc_modify_qdisc() ->  qdisc_graft() -> htb_attach().
> > > > 
> > > > Not exactly; it grafts pfifo to direct queues only. htb_attach_offload
> > > > explicitly grafts noop to all the remaining queues.
> > > 
> > > num_direct_qdiscs == real_num_tx_queues:
> > > 
> > > https://elixir.bootlin.com/linux/latest/source/net/sched/sch_htb.c#L1101
> > > 
> > > pfifo will be configured on all the TX queues available at TC creation
> > > time, right?
> > 
> > Yes, all real TX queues will be used as direct queues (for unclassified
> > traffic). num_tx_queues should be somewhat bigger than
> > real_num_tx_queues - it should reserve a queue per potential leaf class.
> > 
> > pfifo is configured on direct queues, and the reserved queues have noop.
> > Then, when a new leaf class is added (TC_HTB_LEAF_ALLOC_QUEUE), the
> > driver allocates a new queue and increases real_num_tx_queues. HTB
> > assigns a pfifo qdisc to the newly allocated queue.
> > 
> > Changing the hierarchy (deleting a node or converting an inner node to a
> > leaf) may reorder the classful queues (with indexes >= the initial
> > real_num_tx_queues), so that there are no gaps.
> > 
> > > Lacking a mlx card with offload support I hack basic htb support in
> > > netdevsim and I observe the splat on top of such device. I can as well
> > > share the netdevsim patch - it will need some clean-up.
> > 
> > I will be happy to review the netdevsim patch, but I don't promise
> > prompt responsiveness.
> > 
> > > > 
> > > > > When the command completes, the qdisc_sleeping for each dev_queue is a
> > > > > pfifo one. The next class creation will trigger the reported splat.
> > > > > 
> > > > > Address the issue taking care of old non-builtin qdisc in
> > > > > htb_change_class().
> > > > > 
> > > > > Fixes: d03b195b5aa0 ("sch_htb: Hierarchical QoS hardware offload")
> > > > > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > > > > ---
> > > > >   net/sched/sch_htb.c | 3 +--
> > > > >   1 file changed, 1 insertion(+), 2 deletions(-)
> > > > > 
> > > > > diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
> > > > > index 0d947414e616..dc682bd542b4 100644
> > > > > --- a/net/sched/sch_htb.c
> > > > > +++ b/net/sched/sch_htb.c
> > > > > @@ -1955,8 +1955,7 @@ static int htb_change_class(struct Qdisc *sch, u32 classid,
> > > > >   				qdisc_refcount_inc(new_q);
> > > > >   			}
> > > > >   			old_q = htb_graft_helper(dev_queue, new_q);
> > > > > -			/* No qdisc_put needed. */
> > > > > -			WARN_ON(!(old_q->flags & TCQ_F_BUILTIN));
> > > > > +			qdisc_put(old_q);
> > > > 
> > > > We can get here after one of two cases above:
> > > > 
> > > > 1. A new queue is allocated with TC_HTB_LEAF_ALLOC_QUEUE. It's supposed
> > > > to have a noop qdisc by default (after htb_attach_offload).
> > > 
> > > So most likely the trivial netdevsim implementation I used was not good
> > > enough.
> > > 
> > > Which constrains should respect TC_HTB_LEAF_ALLOC_QUEUE WRT the
> > > returned qid value? should it in the (real_num_tx_queues,
> > > num_tx_queues] range?
> > 
> > Let's say N is real_num_tx_queues as it was at the moment of attaching.
> > HTB queues should be allocated from [N, num_tx_queues), and
> > real_num_tx_queues should be increased accordingly. It should not return
> > queues number [0, N).
> > 
> > Deletions should fill the gaps: if queue X is being deleted, N <= X <
> > real_num_tx_queues - 1, then the gap should be filled with queue number
> > real_num_tx_queues - 1 by swapping the queues (real_num_tx_queues will
> > be decreased by 1 accordingly). Some care also needs to be taken when
> > converting inner-to-leaf (TC_HTB_LEAF_DEL_LAST) and leaf-to-inner (it's
> > better to get insights from [1], there are also some comments).
> > 
> > > Can HTB actually configure H/W shaping on
> > > real_num_tx_queues?
> > 
> > It will be on real_num_tx_queues, but after it's increased to add new
> > HTB queues. The original queues [0, N) are used for direct traffic, same
> > as the non-offloaded HTB's direct_queue (it's not shaped).
> > 
> > > I find no clear documentation WRT the above.
> > 
> > I'm sorry for the lack of documentation. All I have is the commit
> > message [2] and a netdev talk [3]. Maybe the slides could be of some
> > use...
> > 
> > I hope the above explanation clarifies something, and feel free to ask
> > further questions, I'll be glad to explain what hasn't been documented
> > properly.
> 
> We would like to enable Tx rate limiting using htb offload on all the
> existing queues.

I don't seem to understand how you see it possible with HTB.

1. Where would the unclassified traffic go? HTB uses a set of filters
that steer certain traffic to certain classes. Everything that doesn't
match the filter goes to the direct queue, which doesn't have shaping.
In the non-offloaded HTB it's struct htb_sched::direct_queue. With HTB
offload it's mapped to the standard set of queues.

2. How do you see the mapping of HTB classes to netdev queues if you use
a fixed set of queues? In the current implementation of HTB offload,
each new HTB class creates a new queue. If you are bound to using only N
standard queues, how would the allocation work when you add the second,
third, etc. HTB classes?

> We are able to do with the following set of commands with
> Paolo's patch
> 
> tc qdisc add dev enp175s0v0 handle 1: root htb offload
> tc class add dev enp175s0v0 parent 1: classid 1:1 htb rate 1000mbit ceil
> 2000mbit burst 100k

As long as you don't attach any filters, this set of commands is not
supposed to shape anything. It should TX full-speed, you should be able
to see this effect if you configure non-offloaded HTB. As long as there
are no filters, and the default classid doesn't match your only class,
all traffic will go to the direct queue.

What is your goal? If you need shaping for the whole netdev, maybe HTB
is not needed, and it's enough to attach a catchall filter with the
police action? Or use /sys/class/net/eth0/queues/tx-*/tx_maxrate
per-queue? Or tc mqprio mode channel, that allows to group existing
queues and assign rate limits?

> where
>   classid 1:1 is tx queue0
>   tx_minrate is rate 1000mbps
>   tx_maxrate is ceil 2000mbps
> 
> 
> In order to not break your implementation could bring in if condition
> instead WARN_ON, something like below
> 	if (!(old_q->flags & TCQ_F_BUILTIN))
> 		qdisc_put(old_q);
> 
> Would this work for you, please advise.

One of the reasons this WARN is there to prevent misuse of the API (the
other is to catch bugs that lead to inconsistencies in the internal
state). We can't just remove the WARN and start misusing the API. Your
ideas require more major design changes, but before that I need to
understand your approach to solving (1) and (2).

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net] net: sched: fix warn on htb offloaded class creation
  2023-11-10 11:07         ` Maxim Mikityanskiy
@ 2023-11-12  8:48           ` Paolo Abeni
  2023-11-13 14:51             ` Maxim Mikityanskiy
  0 siblings, 1 reply; 11+ messages in thread
From: Paolo Abeni @ 2023-11-12  8:48 UTC (permalink / raw)
  To: Maxim Mikityanskiy, Chittim, Madhu, Jiri Pirko
  Cc: netdev, Jamal Hadi Salim, Cong Wang, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Tariq Toukan, Gal Pressman,
	Saeed Mahameed, xuejun.zhang, sridhar.samudrala

On Fri, 2023-11-10 at 13:07 +0200, Maxim Mikityanskiy wrote:
> On Thu, 09 Nov 2023 at 13:54:17 -0800, Chittim, Madhu wrote:
> > We would like to enable Tx rate limiting using htb offload on all the
> > existing queues.
> 
> I don't seem to understand how you see it possible with HTB.

I must admit I feel sorry for not being able to join any of the
upcoming conferences, but to me it looks like there is some
communication gap that could be filled by in-person discussion.

Specifically the above to me sounds contradictory to what you stated
here:

https://lore.kernel.org/netdev/ZUEQzsKiIlgtbN-S@mail.gmail.com/

"""
> Can HTB actually configure H/W shaping on
> real_num_tx_queues?

It will be on real_num_tx_queues, but after it's increased to add new
HTB queues. The original queues [0, N) are used for direct traffic,
same as the non-offloaded HTB's direct_queue (it's not shaped).
"""

> What is your goal? 

We are looking for clean interface to configure individually min/max
shaping on each TX queue for a given netdev (actually virtual
function).

Jiri's suggested to use TC:

https://lore.kernel.org/netdev/ZOTVkXWCLY88YfjV@nanopsycho/

which IMHO makes sense as far as there is an existing interface (qdisc)
providing the required features (almost) out-of-the-box. It looks like
it's not the case.

If a non trivial configuration and/or significant implementation are
required, IMHO other options could be better...

> If you need shaping for the whole netdev, maybe HTB
> is not needed, and it's enough to attach a catchall filter with the
> police action? Or use /sys/class/net/eth0/queues/tx-*/tx_maxrate
> per-queue?

... specifically this latter one (it will require implementing in-
kernel support for 'max_rate', but should quite straight-forward).

It would be great to converge on some agreement on the way forward,
thanks!

Paolo


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net] net: sched: fix warn on htb offloaded class creation
  2023-11-12  8:48           ` Paolo Abeni
@ 2023-11-13 14:51             ` Maxim Mikityanskiy
  2023-11-13 15:21               ` Paolo Abeni
  0 siblings, 1 reply; 11+ messages in thread
From: Maxim Mikityanskiy @ 2023-11-13 14:51 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Chittim, Madhu, Jiri Pirko, netdev, Jamal Hadi Salim, Cong Wang,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Tariq Toukan,
	Gal Pressman, Saeed Mahameed, xuejun.zhang, sridhar.samudrala

On Sun, 12 Nov 2023 at 09:48:19 +0100, Paolo Abeni wrote:
> On Fri, 2023-11-10 at 13:07 +0200, Maxim Mikityanskiy wrote:
> > On Thu, 09 Nov 2023 at 13:54:17 -0800, Chittim, Madhu wrote:
> > > We would like to enable Tx rate limiting using htb offload on all the
> > > existing queues.
> > 
> > I don't seem to understand how you see it possible with HTB.
> 
> I must admit I feel sorry for not being able to join any of the
> upcoming conferences, but to me it looks like there is some
> communication gap that could be filled by in-person discussion.
> 
> Specifically the above to me sounds contradictory to what you stated
> here:
> 
> https://lore.kernel.org/netdev/ZUEQzsKiIlgtbN-S@mail.gmail.com/
> 
> """
> > Can HTB actually configure H/W shaping on
> > real_num_tx_queues?
> 
> It will be on real_num_tx_queues, but after it's increased to add new
> HTB queues. The original queues [0, N) are used for direct traffic,
> same as the non-offloaded HTB's direct_queue (it's not shaped).
> """

Sorry if that was confusing, there is actually no contradition, let me
rephrase. Queues number [0, orig_real_num_tx_queues) are direct, they
are not shaped, they correspond to HTB's unclassified traffic. Queues
number [orig_real_num_tx_queues, real_num_tx_queues) correspond to HTB
classes and are shaped. Here orig_real_num_tx_queues is how many queues
the netdev had before HTB offload was attached. It's basically the
standard set of queues, and HTB creates a new queue per class. Let me
know if that helps.

> > What is your goal? 
> 
> We are looking for clean interface to configure individually min/max
> shaping on each TX queue for a given netdev (actually virtual
> function).

Have you tried tc mqprio? If you set `mode channel` and create queue
groups with only one queue each, you can set min_rate and max_rate for
each group (==queue), and it works with the existing set of queues.

> Jiri's suggested to use TC:
> 
> https://lore.kernel.org/netdev/ZOTVkXWCLY88YfjV@nanopsycho/
> 
> which IMHO makes sense as far as there is an existing interface (qdisc)
> providing the required features (almost) out-of-the-box. It looks like
> it's not the case.
> 
> If a non trivial configuration and/or significant implementation are
> required, IMHO other options could be better...
> 
> > If you need shaping for the whole netdev, maybe HTB
> > is not needed, and it's enough to attach a catchall filter with the
> > police action? Or use /sys/class/net/eth0/queues/tx-*/tx_maxrate
> > per-queue?
> 
> ... specifically this latter one (it will require implementing in-
> kernel support for 'max_rate', but should quite straight-forward).
> 
> It would be great to converge on some agreement on the way forward,
> thanks!
> 
> Paolo
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net] net: sched: fix warn on htb offloaded class creation
  2023-11-13 14:51             ` Maxim Mikityanskiy
@ 2023-11-13 15:21               ` Paolo Abeni
  2023-11-15 18:51                 ` Chittim, Madhu
  0 siblings, 1 reply; 11+ messages in thread
From: Paolo Abeni @ 2023-11-13 15:21 UTC (permalink / raw)
  To: Maxim Mikityanskiy, Jiri Pirko
  Cc: Chittim, Madhu, netdev, Jamal Hadi Salim, Cong Wang,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Tariq Toukan,
	Gal Pressman, Saeed Mahameed, xuejun.zhang, sridhar.samudrala

On Mon, 2023-11-13 at 16:51 +0200, Maxim Mikityanskiy wrote:
> On Sun, 12 Nov 2023 at 09:48:19 +0100, Paolo Abeni wrote:
> > On Fri, 2023-11-10 at 13:07 +0200, Maxim Mikityanskiy wrote:
> > > On Thu, 09 Nov 2023 at 13:54:17 -0800, Chittim, Madhu wrote:
> > > > We would like to enable Tx rate limiting using htb offload on all the
> > > > existing queues.
> > > 
> > > I don't seem to understand how you see it possible with HTB.
> > 
> > I must admit I feel sorry for not being able to join any of the
> > upcoming conferences, but to me it looks like there is some
> > communication gap that could be filled by in-person discussion.
> > 
> > Specifically the above to me sounds contradictory to what you stated
> > here:
> > 
> > https://lore.kernel.org/netdev/ZUEQzsKiIlgtbN-S@mail.gmail.com/
> > 
> > """
> > > Can HTB actually configure H/W shaping on
> > > real_num_tx_queues?
> > 
> > It will be on real_num_tx_queues, but after it's increased to add new
> > HTB queues. The original queues [0, N) are used for direct traffic,
> > same as the non-offloaded HTB's direct_queue (it's not shaped).
> > """
> 
> Sorry if that was confusing, there is actually no contradition, let me
> rephrase. Queues number [0, orig_real_num_tx_queues) are direct, they
> are not shaped, they correspond to HTB's unclassified traffic. Queues
> number [orig_real_num_tx_queues, real_num_tx_queues) correspond to HTB
> classes and are shaped. Here orig_real_num_tx_queues is how many queues
> the netdev had before HTB offload was attached. It's basically the
> standard set of queues, and HTB creates a new queue per class. Let me
> know if that helps.
> 
> > > What is your goal? 
> > 
> > We are looking for clean interface to configure individually min/max
> > shaping on each TX queue for a given netdev (actually virtual
> > function).
> 
> Have you tried tc mqprio? If you set `mode channel` and create queue
> groups with only one queue each, you can set min_rate and max_rate for
> each group (==queue), and it works with the existing set of queues.

mqprio does not fit well here as:

* enforce an hard limit of 16 queues imposed by uAPI
* traffic class/queue selection depends on skb priority. That does not
fit well if e.g. the user would rely on a 1to1 CPU to queue mapping.

It looks like any existing scheduler would need some extension to fit
this use case.

@Jiri: when you suggested TC do you have anything specific in your
mind?

Otherwise it looks like extending /sys/class/net/eth0/queues/tx-
*/tx_maxrate would be more straight-forward to me.

Thanks!

Paolo


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net] net: sched: fix warn on htb offloaded class creation
  2023-11-13 15:21               ` Paolo Abeni
@ 2023-11-15 18:51                 ` Chittim, Madhu
  0 siblings, 0 replies; 11+ messages in thread
From: Chittim, Madhu @ 2023-11-15 18:51 UTC (permalink / raw)
  To: Paolo Abeni, Maxim Mikityanskiy, Jiri Pirko
  Cc: netdev, Jamal Hadi Salim, Cong Wang, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Tariq Toukan, Gal Pressman,
	Saeed Mahameed, xuejun.zhang, sridhar.samudrala



On 11/13/2023 7:21 AM, Paolo Abeni wrote:
> On Mon, 2023-11-13 at 16:51 +0200, Maxim Mikityanskiy wrote:
>> On Sun, 12 Nov 2023 at 09:48:19 +0100, Paolo Abeni wrote:
>>> On Fri, 2023-11-10 at 13:07 +0200, Maxim Mikityanskiy wrote:
>>>> On Thu, 09 Nov 2023 at 13:54:17 -0800, Chittim, Madhu wrote:
>>>>> We would like to enable Tx rate limiting using htb offload on all the
>>>>> existing queues.
>>>>
>>>> I don't seem to understand how you see it possible with HTB.
>>>
>>> I must admit I feel sorry for not being able to join any of the
>>> upcoming conferences, but to me it looks like there is some
>>> communication gap that could be filled by in-person discussion.
>>>
>>> Specifically the above to me sounds contradictory to what you stated
>>> here:
>>>
>>> https://lore.kernel.org/netdev/ZUEQzsKiIlgtbN-S@mail.gmail.com/
>>>
>>> """
>>>> Can HTB actually configure H/W shaping on
>>>> real_num_tx_queues?
>>>
>>> It will be on real_num_tx_queues, but after it's increased to add new
>>> HTB queues. The original queues [0, N) are used for direct traffic,
>>> same as the non-offloaded HTB's direct_queue (it's not shaped).
>>> """
>>
>> Sorry if that was confusing, there is actually no contradition, let me
>> rephrase. Queues number [0, orig_real_num_tx_queues) are direct, they
>> are not shaped, they correspond to HTB's unclassified traffic. Queues
>> number [orig_real_num_tx_queues, real_num_tx_queues) correspond to HTB
>> classes and are shaped. Here orig_real_num_tx_queues is how many queues
>> the netdev had before HTB offload was attached. It's basically the
>> standard set of queues, and HTB creates a new queue per class. Let me
>> know if that helps.
>>
>>>> What is your goal?
>>>
>>> We are looking for clean interface to configure individually min/max
>>> shaping on each TX queue for a given netdev (actually virtual
>>> function).
>>
>> Have you tried tc mqprio? If you set `mode channel` and create queue
>> groups with only one queue each, you can set min_rate and max_rate for
>> each group (==queue), and it works with the existing set of queues.
> 
> mqprio does not fit well here as:
> 
> * enforce an hard limit of 16 queues imposed by uAPI
> * traffic class/queue selection depends on skb priority. That does not
> fit well if e.g. the user would rely on a 1to1 CPU to queue mapping.
> 
> It looks like any existing scheduler would need some extension to fit
> this use case.
> 
> @Jiri: when you suggested TC do you have anything specific in your
> mind?
> 
> Otherwise it looks like extending /sys/class/net/eth0/queues/tx-
> */tx_maxrate would be more straight-forward to me.

We are planning to go with this approach of extending sysfs entry to 
have tx-maxrate and use this in our iAVF driver to support per queue Tx 
rate limiting

Below is the link to our original implementation using devlink interface 
which was rejected by Jiri
https://lore.kernel.org/netdev/20230822034003.31628-1-wenjun1.wu@intel.com/

@Jiri, @Jakub, Please advise

> 
> Thanks!
> 
> Paolo
> 



^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2023-11-15 18:51 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-10-26 15:36 [PATCH net] net: sched: fix warn on htb offloaded class creation Paolo Abeni
2023-10-27 13:57 ` Maxim Mikityanskiy
2023-10-29 12:01   ` Gal Pressman
2023-10-31  9:11   ` Paolo Abeni
2023-10-31 14:40     ` Maxim Mikityanskiy
2023-11-09 21:54       ` Chittim, Madhu
2023-11-10 11:07         ` Maxim Mikityanskiy
2023-11-12  8:48           ` Paolo Abeni
2023-11-13 14:51             ` Maxim Mikityanskiy
2023-11-13 15:21               ` Paolo Abeni
2023-11-15 18:51                 ` Chittim, Madhu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).