From: "Toke Høiland-Jørgensen" <toke@redhat.com>
To: Eric Dumazet <eric.dumazet@gmail.com>,
Cong Wang <xiyou.wangcong@gmail.com>,
Akshat Kakkar <akshat.1984@gmail.com>
Cc: Anton Danilov <littlesmilingcloud@gmail.com>,
NetFilter <netfilter-devel@vger.kernel.org>,
lartc <lartc@vger.kernel.org>, netdev <netdev@vger.kernel.org>
Subject: Re: Unable to create htb tc classes more than 64K
Date: Mon, 26 Aug 2019 07:28:50 +0000
Message-ID: <87k1b0l70t.fsf@toke.dk>
In-Reply-To: <9cbefe10-b172-ae2a-0ac7-d972468eb7a2@gmail.com>
Eric Dumazet <eric.dumazet@gmail.com> writes:
> On 8/25/19 7:52 PM, Cong Wang wrote:
>> On Wed, Aug 21, 2019 at 11:00 PM Akshat Kakkar <akshat.1984@gmail.com> wrote:
>>>
>>> On Thu, Aug 22, 2019 at 3:37 AM Cong Wang <xiyou.wangcong@gmail.com> wrote:
>>>>> I am using ipset + iptables to classify and not filters. Besides, if
>>>>> tc is allowing me to define qdisc -> classes -> qdisc -> classes
>>>>> (1,2,3 ...) sort of structure (ie like the one shown in ascii tree)
>>>>> then how can those lowest child classes be actually used or consumed?
>>>>
>>>> Just install tc filters on the lower level too.
>>>
>>> If I understand correctly, you are saying,
>>> instead of :
>>> tc filter add dev eno2 parent 100: protocol ip prio 1 handle
>>> 0x00000001 fw flowid 1:10
>>> tc filter add dev eno2 parent 100: protocol ip prio 1 handle
>>> 0x00000002 fw flowid 1:20
>>> tc filter add dev eno2 parent 100: protocol ip prio 1 handle
>>> 0x00000003 fw flowid 2:10
>>> tc filter add dev eno2 parent 100: protocol ip prio 1 handle
>>> 0x00000004 fw flowid 2:20
>>>
>>>
>>> I should do this: (i.e. changing parent to just immediate qdisc)
>>> tc filter add dev eno2 parent 1: protocol ip prio 1 handle 0x00000001
>>> fw flowid 1:10
>>> tc filter add dev eno2 parent 1: protocol ip prio 1 handle 0x00000002
>>> fw flowid 1:20
>>> tc filter add dev eno2 parent 2: protocol ip prio 1 handle 0x00000003
>>> fw flowid 2:10
>>> tc filter add dev eno2 parent 2: protocol ip prio 1 handle 0x00000004
>>> fw flowid 2:20
>>
>>
>> Yes, this is what I meant.
>>
>>
>>>
>>> I tried this previously. But there is no change in the result.
>>> Behaviour is exactly same, i.e. I am still getting 100Mbps and not
>>> 100kbps or 300kbps
>>>
>>> Besides, as I mentioned previously I am using ipset + skbprio and not
>>> filters stuff. Filters I used just to test.
>>>
>>> ipset -N foo hash:ip,mark skbinfo
>>>
>>> ipset -A foo 10.10.10.10,0x00000001 skbprio 1:10
>>> ipset -A foo 10.10.10.20,0x00000002 skbprio 1:20
>>> ipset -A foo 10.10.10.30,0x00000003 skbprio 2:10
>>> ipset -A foo 10.10.10.40,0x00000004 skbprio 2:20
>>>
>>> iptables -A POSTROUTING -j SET --map-set foo dst,dst --map-prio
>>
>> Hmm..
>>
>> I am not familiar with ipset, but it seems to save the skbprio into
>> skb->priority, so it doesn't need TC filter to classify it again.
>>
>> I guess your packets might go to the direct queue of HTB, which
>> bypasses the token bucket. Can you dump the stats and check?
>
> With more than 64K 'classes' I suggest to use a single FQ qdisc [1], and
> an eBPF program using EDT model (Earliest Departure Time)
>
> The BPF program would perform the classification, then find a data structure
> based on the 'class', and then update/maintain class virtual times and skb->tstamp
>
>     TBF = bpf_map_lookup_elem(&map, &classid);
>
>     uint64_t now = bpf_ktime_get_ns();
>     uint64_t time_to_send = max(TBF->time_to_send, now);
>
>     time_to_send += (u64)qdisc_pkt_len(skb) * NSEC_PER_SEC / TBF->rate;
>     if (time_to_send > TBF->max_horizon) {
>         return TC_ACT_SHOT;
>     }
>     TBF->time_to_send = time_to_send;
>     skb->tstamp = max(time_to_send, skb->tstamp);
>     if (time_to_send - now > TBF->ecn_horizon)
>         bpf_skb_ecn_set_ce(skb);
>     return TC_ACT_OK;
>
> tools/testing/selftests/bpf/progs/test_tc_edt.c shows something similar.
>
>
> [1] MQ + FQ if the device is multi-queues.
>
> Note that this setup scales very well on SMP, since we no longer are forced
> to use a single HTB hierarchy (protected by a single spinlock)
Wow, this is very cool! Thanks for that walk-through, Eric :)
-Toke
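
For anyone who wants to play with the arithmetic in Eric's pseudocode
outside the kernel, here is a minimal userspace sketch of the per-class
EDT update. It is not BPF code and not the selftest. The struct, enum and
function names are made up for illustration. The field names follow the
quoted snippet. One assumption to flag: max_horizon is treated here as a
delay relative to "now", the same way ecn_horizon is used, whereas the
quoted snippet compares the absolute time_to_send against it. The
zero-rate check is also my addition.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical per-class state, mirroring the fields in the quoted pseudocode. */
struct edt_class {
	uint64_t rate;          /* class rate in bytes per second */
	uint64_t time_to_send;  /* departure time of the last accepted packet, ns */
	uint64_t max_horizon;   /* assumed relative: drop if delay exceeds this, ns */
	uint64_t ecn_horizon;   /* mark CE if delay exceeds this, ns */
};

/* Stand-ins for TC_ACT_OK, TC_ACT_OK plus CE mark, and TC_ACT_SHOT. */
enum edt_verdict { EDT_SEND, EDT_SEND_CE, EDT_DROP };

static uint64_t max_u64(uint64_t a, uint64_t b)
{
	return a > b ? a : b;
}

/*
 * Compute the departure time for one packet of pkt_len bytes.
 * *tstamp plays the role of skb->tstamp: it is only ever moved later.
 * Class state is left untouched when the packet is dropped.
 */
enum edt_verdict edt_schedule(struct edt_class *c, uint64_t now,
			      uint32_t pkt_len, uint64_t *tstamp)
{
	uint64_t t;

	if (c->rate == 0)	/* avoid dividing by zero on an unconfigured class */
		return EDT_DROP;

	/* An idle class starts from now; a backlogged one continues from its last slot. */
	t = max_u64(c->time_to_send, now);
	t += (uint64_t)pkt_len * NSEC_PER_SEC / c->rate;

	if (t - now > c->max_horizon)
		return EDT_DROP;

	c->time_to_send = t;
	*tstamp = max_u64(t, *tstamp);

	return (t - now > c->ecn_horizon) ? EDT_SEND_CE : EDT_SEND;
}
```

With rate = 1000 bytes/s and 100-byte packets each packet costs 100 ms, so
back-to-back packets arriving at the same instant get departure times
100 ms, 200 ms, 300 ms and so on, until the delay crosses ecn_horizon
(mark) and then max_horizon (drop). In a real setup the state would sit in
a BPF map keyed by class id, and FQ would do the actual pacing based on
skb->tstamp.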