* Using HTB over MultiQ
@ 2013-11-07 13:12 Anton 'EvilMan' Danilov
  2013-11-07 13:49 ` Sergey Popovich
                   ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Anton 'EvilMan' Danilov @ 2013-11-07 13:12 UTC (permalink / raw)
  To: netdev

Hello.

I'm experimenting with a high-performance Linux router with 10G NICs.
At high traffic rates the performance is limited by the lock on the root
queue discipline. To avoid the impact of that locking I've decided to build
the QoS scheme on top of the multiq qdisc.

But I'm having issues using the multiq discipline.

My setup:
1. The multiq qdisc is at the top of the interface.
2. To every multiq class I've attached an htb discipline with its own
hierarchy of child classes.
3. The filters (u32 with hashing) are attached to the root multiq discipline.

Graphical scheme of the hierarchy:
http://pixpin.ru/images/2013/11/07/multiq-hierarchy1.png

Fragments of script:

#add top qdisc and classes
 qdisc add dev eth0 root handle 10: multiq
 qdisc add dev eth0 parent 10:1 handle 11: htb
 class add dev eth0 parent 11: classid 11:1 htb rate 1250Mbit
 qdisc add dev eth0 parent 10:2 handle 12: htb
 class add dev eth0 parent 12: classid 12:1 htb rate 1250Mbit
 qdisc add dev eth0 parent 10:3 handle 13: htb
 class add dev eth0 parent 13: classid 13:1 htb rate 1250Mbit
 qdisc add dev eth0 parent 10:4 handle 14: htb
 class add dev eth0 parent 14: classid 14:1 htb rate 1250Mbit
 qdisc add dev eth0 parent 10:5 handle 15: htb
 class add dev eth0 parent 15: classid 15:1 htb rate 1250Mbit
 qdisc add dev eth0 parent 10:6 handle 16: htb
 class add dev eth0 parent 16: classid 16:1 htb rate 1250Mbit
 qdisc add dev eth0 parent 10:7 handle 17: htb
 class add dev eth0 parent 17: classid 17:1 htb rate 1250Mbit
 qdisc add dev eth0 parent 10:8 handle 18: htb
 class add dev eth0 parent 18: classid 18:1 htb rate 1250Mbit

#add leaf classes and qdiscs (several hundreds)
 ...
 class add dev eth0 parent 11:1 classid 11:1736 htb rate 1024kbit
 qdisc add dev eth0 parent 11:1736 handle 1736 pfifo limit 50
 ...

But I see zero statistics on the leaf htb classes and nonzero
statistics on the classifier filters:

~$ tc -s -p filter list dev eth1
 ...
 filter parent 10: protocol ip pref 5 u32 fh 2:f2:800 order 2048 key
ht 2 bkt f2 flowid 11:1736  (rule hit 306 success 306)
   match IP src xx.xx.xx.xx/30 (success 306 )
 ...

~$ tc -s -s -d c ls dev eth1 classid 11:1736
 class htb 11:1736 parent 11:1 leaf 1736: prio 0 quantum 12800 rate
1024Kbit ceil 1024Kbit burst 1599b/1 mpu 0b overhead 0b cburst 1599b/1
mpu 0b overhead 0b level 0
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  rate 0bit 0pps backlog 0b 0p requeues 0
  lended: 0 borrowed: 0 giants: 0
  tokens: 195312 ctokens: 195312

I think I've overlooked some aspects of the settings.
Has anyone set up a similarly complex scheme over the multiq discipline?



-- 
Anton.


* Re: Using HTB over MultiQ
  2013-11-07 13:12 Using HTB over MultiQ Anton 'EvilMan' Danilov
@ 2013-11-07 13:49 ` Sergey Popovich
  2013-11-07 16:13   ` Eric Dumazet
  2013-11-07 14:11 ` Eric Dumazet
  2013-11-07 14:29 ` Eric Dumazet
  2 siblings, 1 reply; 20+ messages in thread
From: Sergey Popovich @ 2013-11-07 13:49 UTC (permalink / raw)
  To: netdev


> But I see zero statistics on the leaf htb classes and nonzero
> statistics on the classifier filters:
> 
> ~$ tc -s -p filter list dev eth1
>  ...
>  filter parent 10: protocol ip pref 5 u32 fh 2:f2:800 order 2048 key
> ht 2 bkt f2 flowid 11:1736  (rule hit 306 success 306)
>    match IP src xx.xx.xx.xx/30 (success 306 )
>  ...
> 
> ~$ tc -s -s -d c ls dev eth1 classid 11:1736
>  class htb 11:1736 parent 11:1 leaf 1736: prio 0 quantum 12800 rate
> 1024Kbit ceil 1024Kbit burst 1599b/1 mpu 0b overhead 0b cburst 1599b/1
> mpu 0b overhead 0b level 0
>   Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>   rate 0bit 0pps backlog 0b 0p requeues 0
>   lended: 0 borrowed: 0 giants: 0
>   tokens: 195312 ctokens: 195312
> 
> I think I've lost from view the some aspects of settings.
> Has anyone setuped the like complex scheme over the multiq discipline?

Since

commit 64153ce0a7b61b2a5cacb01805cbf670142339e9
Author: Eric Dumazet
Date:   Thu Jun 6 14:53:16 2013 -0700

    net_sched: htb: do not setup default rate estimators

rate estimators are not set up by default. To enable them
you can load the sch_htb module with htb_rate_est=1,
or echo 1 > /sys/module/sch_htb/parameters/htb_rate_est, and recreate the
hierarchy.
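
For example (a small sketch; the sysfs path assumes sch_htb is built as a
module and already loaded):

 # at module load time
 modprobe sch_htb htb_rate_est=1

 # or for an already loaded module, then recreate the hierarchy
 echo 1 > /sys/module/sch_htb/parameters/htb_rate_est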

-- 
SP5474-RIPE
Sergey Popovich


* Re: Using HTB over MultiQ
  2013-11-07 13:12 Using HTB over MultiQ Anton 'EvilMan' Danilov
  2013-11-07 13:49 ` Sergey Popovich
@ 2013-11-07 14:11 ` Eric Dumazet
  2013-11-07 14:20   ` Eric Dumazet
  2013-11-07 14:29 ` Eric Dumazet
  2 siblings, 1 reply; 20+ messages in thread
From: Eric Dumazet @ 2013-11-07 14:11 UTC (permalink / raw)
  To: Anton 'EvilMan' Danilov; +Cc: netdev

On Thu, 2013-11-07 at 17:12 +0400, Anton 'EvilMan' Danilov wrote:
> Hello.
> 
> I'm experimenting with high performance linux router with 10G NICs.
> On high traffic rates the performance are limited by the lock of root
> queue discipline. For avoid impact of locking i've decided to build
> QoS scheme over the multiq qdisc.
> 
> And I have the issues with use to multiq discipline.
> 
> My setup:
> 1. Multiq qdisc is on top of interface.
> 2. To every multiq class i've attached htb discipline with own
> hierachy of child classes.
> 3. The filters (u32 with hashing) are attached to the root multiq discipline.
> 
> Graphical scheme of hierarchy -
> http://pixpin.ru/images/2013/11/07/multiq-hierarchy1.png
> 
> Fragments of script:
> 
> #add top qdisc and classes
>  qdisc add dev eth0 root handle 10: multiq
>  qdisc add dev eth0 parent 10:1 handle 11: htb
>  class add dev eth0 parent 11: classid 11:1 htb rate 1250Mbit
>  qdisc add dev eth0 parent 10:2 handle 12: htb
>  class add dev eth0 parent 12: classid 12:1 htb rate 1250Mbit
>  qdisc add dev eth0 parent 10:3 handle 13: htb
>  class add dev eth0 parent 13: classid 13:1 htb rate 1250Mbit
>  qdisc add dev eth0 parent 10:4 handle 14: htb
>  class add dev eth0 parent 14: classid 14:1 htb rate 1250Mbit
>  qdisc add dev eth0 parent 10:5 handle 15: htb
>  class add dev eth0 parent 15: classid 15:1 htb rate 1250Mbit
>  qdisc add dev eth0 parent 10:6 handle 16: htb
>  class add dev eth0 parent 16: classid 16:1 htb rate 1250Mbit
>  qdisc add dev eth0 parent 10:7 handle 17: htb
>  class add dev eth0 parent 17: classid 17:1 htb rate 1250Mbit
>  qdisc add dev eth0 parent 10:8 handle 18: htb
>  class add dev eth0 parent 18: classid 18:1 htb rate 1250Mbit
> 
> #add leaf classes and qdiscs (several hundreds)
>  ...
>  class add dev eth0 parent 11:1 classid 11:1736 htb rate 1024kbit
>  qdisc add dev eth0 parent 11:1736 handle 1736 pfifo limit 50
>  ...
> 
> But I see zero statistics on the leaf htb classes and nonzero
> statistics on the classifier filters:
> 
> ~$ tc -s -p filter list dev eth1
>  ...
>  filter parent 10: protocol ip pref 5 u32 fh 2:f2:800 order 2048 key
> ht 2 bkt f2 flowid 11:1736  (rule hit 306 success 306)
>    match IP src xx.xx.xx.xx/30 (success 306 )
>  ...
> 
> ~$ tc -s -s -d c ls dev eth1 classid 11:1736
>  class htb 11:1736 parent 11:1 leaf 1736: prio 0 quantum 12800 rate
> 1024Kbit ceil 1024Kbit burst 1599b/1 mpu 0b overhead 0b cburst 1599b/1
> mpu 0b overhead 0b level 0
>   Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>   rate 0bit 0pps backlog 0b 0p requeues 0
>   lended: 0 borrowed: 0 giants: 0
>   tokens: 195312 ctokens: 195312
> 
> I think I've lost from view the some aspects of settings.
> Has anyone setuped the like complex scheme over the multiq discipline?
> 

I think this is not going to work, because multiqueue selection happens
before applying the filters to find a flowid.

And queue selection picks the queue based on factors that are not
coupled to your filters, like the cpu number.

It looks like you want to rate-limit a large number of flows; you could
try the following setup:

for ETH in eth0
do
 tc qd del dev $ETH root 2>/dev/null

 tc qd add dev $ETH root handle 100: mq 
 for i in `seq 1 8`
 do
  tc qd add dev $ETH parent 100:$i handle $i fq maxrate 1Mbit
 done
done


* Re: Using HTB over MultiQ
  2013-11-07 14:11 ` Eric Dumazet
@ 2013-11-07 14:20   ` Eric Dumazet
  2013-11-07 14:39     ` John Fastabend
  0 siblings, 1 reply; 20+ messages in thread
From: Eric Dumazet @ 2013-11-07 14:20 UTC (permalink / raw)
  To: Anton 'EvilMan' Danilov; +Cc: netdev

On Thu, 2013-11-07 at 06:11 -0800, Eric Dumazet wrote:
> On Thu, 2013-11-07 at 17:12 +0400, Anton 'EvilMan' Danilov wrote:
> > Hello.
> > 
> > I'm experimenting with high performance linux router with 10G NICs.
> > On high traffic rates the performance are limited by the lock of root
> > queue discipline. For avoid impact of locking i've decided to build
> > QoS scheme over the multiq qdisc.
> > 

Oh well, this is a router.

So I think you need to change the device queue selection so that it only
depends on your filters / htb classes.

Otherwise flows for a particular 'customer' might be spread on the 8
queues, so the rate could be 8 * 1Mbit, instead of 1Mbit.


* Re: Using HTB over MultiQ
  2013-11-07 13:12 Using HTB over MultiQ Anton 'EvilMan' Danilov
  2013-11-07 13:49 ` Sergey Popovich
  2013-11-07 14:11 ` Eric Dumazet
@ 2013-11-07 14:29 ` Eric Dumazet
  2 siblings, 0 replies; 20+ messages in thread
From: Eric Dumazet @ 2013-11-07 14:29 UTC (permalink / raw)
  To: Anton 'EvilMan' Danilov; +Cc: netdev

On Thu, 2013-11-07 at 17:12 +0400, Anton 'EvilMan' Danilov wrote:

> But I see zero statistics on the leaf htb classes and nonzero
> statistics on the classifier filters:
> 
> ~$ tc -s -p filter list dev eth1
>  ...
>  filter parent 10: protocol ip pref 5 u32 fh 2:f2:800 order 2048 key
> ht 2 bkt f2 flowid 11:1736  (rule hit 306 success 306)
>    match IP src xx.xx.xx.xx/30 (success 306 )
>  ...

The filter matches and flowid 11:1736 is chosen.

But then the packet enters HTB on one of the 8 HTB trees, and only one of
them contains 11:1736 and could queue the packet into its local pfifo 50.

For the 7 other HTB trees, flowid 11:1736 doesn't match any htb class, so
the packet is queued into the default queue attached to that HTB tree.
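
One way to see this from userspace (a sketch; direct_packets_stat is the
counter htb keeps for packets that did not resolve to any class of its
tree):

 # each of the eight htb qdiscs reports its own unclassified-packet counter
 tc -s -d qdisc show dev eth0 | grep direct_packets_stat

In this setup seven of the eight counters should keep climbing for that flow.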

> 
> ~$ tc -s -s -d c ls dev eth1 classid 11:1736
>  class htb 11:1736 parent 11:1 leaf 1736: prio 0 quantum 12800 rate
> 1024Kbit ceil 1024Kbit burst 1599b/1 mpu 0b overhead 0b cburst 1599b/1
> mpu 0b overhead 0b level 0
>   Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>   rate 0bit 0pps backlog 0b 0p requeues 0
>   lended: 0 borrowed: 0 giants: 0
>   tokens: 195312 ctokens: 195312


* Re: Using HTB over MultiQ
  2013-11-07 14:20   ` Eric Dumazet
@ 2013-11-07 14:39     ` John Fastabend
  2013-11-07 14:54       ` Eric Dumazet
  2013-11-08 14:53       ` Anton 'EvilMan' Danilov
  0 siblings, 2 replies; 20+ messages in thread
From: John Fastabend @ 2013-11-07 14:39 UTC (permalink / raw)
  To: Eric Dumazet, Anton 'EvilMan' Danilov; +Cc: netdev

On 11/7/2013 6:20 AM, Eric Dumazet wrote:
> On Thu, 2013-11-07 at 06:11 -0800, Eric Dumazet wrote:
>> On Thu, 2013-11-07 at 17:12 +0400, Anton 'EvilMan' Danilov wrote:
>>> Hello.
>>>
>>> I'm experimenting with high performance linux router with 10G NICs.
>>> On high traffic rates the performance are limited by the lock of root
>>> queue discipline. For avoid impact of locking i've decided to build
>>> QoS scheme over the multiq qdisc.
>>>
>
> Oh well, this is a router.
>
> So I think you need to change the device queue selection so that it only
> depends on your filters / htb classes.
>
> Otherwise flows for a particular 'customer' might be spread on the 8
> queues, so the rate could be 8 * 1Mbit, instead of 1Mbit.
>


With the multiq qdisc you could attach a filter to the root qdisc and use
skbedit to set the queue_mapping field,

#tc filter add dev eth0 parent 1: protocol ip prio 1 u32 \
	match ip dst 192.168.0.3 \
	action skbedit queue_mapping 3

If you configure the filters to map to the correct classes this would
work.
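
For instance, something along these lines for the setup above (a sketch
with a placeholder address; flowid 11:1736 and queue_mapping 0 assume the
leaf class lives in the HTB tree grafted at 10:1, and skb->priority is set
as well so that HTB tree resolves the class):

 tc filter add dev eth0 parent 10: protocol ip prio 5 u32 \
     match ip dst 192.168.0.3/32 \
     flowid 11:1736 \
     action skbedit queue_mapping 0 priority 11:1736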

Or another way would be to use mqprio and steer packets to HTB classes
using skb->priority. The priority can be set by iptables/nftables
or an ingress filter.
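
A very rough sketch of that variant (illustrative only; the priority map,
the queue layout and any per-queue HTB trees would all have to be kept
consistent with each other):

 # skb->priority 0-7 picks the traffic class, one tx queue per class
 tc qdisc add dev eth0 root handle 1: mqprio num_tc 8 \
     map 0 1 2 3 4 5 6 7 queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 hw 0

 # skb->priority set from netfilter, e.g. with the CLASSIFY target
 iptables -t mangle -A POSTROUTING -o eth0 -d 192.168.0.3 -j CLASSIFY --set-class 0:3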

.John


* Re: Using HTB over MultiQ
  2013-11-07 14:39     ` John Fastabend
@ 2013-11-07 14:54       ` Eric Dumazet
  2013-11-07 15:06         ` John Fastabend
  2013-11-08 14:53       ` Anton 'EvilMan' Danilov
  1 sibling, 1 reply; 20+ messages in thread
From: Eric Dumazet @ 2013-11-07 14:54 UTC (permalink / raw)
  To: John Fastabend; +Cc: Anton 'EvilMan' Danilov, netdev

On Thu, 2013-11-07 at 06:39 -0800, John Fastabend wrote:

> 
> With the multiq qdisc you could attach filter to the root qdisc and use
> skbedit to set the queue_mapping field,
> 
> #tc filter add dev eth0 parent 1: protocol ip prio 1 u32 \
> 	match ip dst 192.168.0.3 \
> 	action skbedit queue_mapping 3
> 

Oh right, this is the way ;)

I wonder if we can have 'action skbedit rxhash 34' ?


> if you configure the filters to map to the correct classes this would
> work.
> 
> Or another way would be use mqprio and steer packets to HTB classes
> using the skb->priority. The priority can be set by iptables/nftables
> or an ingress filter.

Yes, but this might duplicate the 'customer' tree Anton has to put on
the filters anyway ?


* Re: Using HTB over MultiQ
  2013-11-07 14:54       ` Eric Dumazet
@ 2013-11-07 15:06         ` John Fastabend
  2013-11-07 15:17           ` John Fastabend
  0 siblings, 1 reply; 20+ messages in thread
From: John Fastabend @ 2013-11-07 15:06 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Anton 'EvilMan' Danilov, netdev

On 11/7/2013 6:54 AM, Eric Dumazet wrote:
> On Thu, 2013-11-07 at 06:39 -0800, John Fastabend wrote:
>
>>
>> With the multiq qdisc you could attach filter to the root qdisc and use
>> skbedit to set the queue_mapping field,
>>
>> #tc filter add dev eth0 parent 1: protocol ip prio 1 u32 \
>> 	match ip dst 192.168.0.3 \
>> 	action skbedit queue_mapping 3
>>
>
> Oh right, this is the way ;)
>
> I wonder if we can have 'action skbedit rxhash 34' ?
>

Sure, it should be easy enough.

>
>> if you configure the filters to map to the correct classes this would
>> work.
>>
>> Or another way would be use mqprio and steer packets to HTB classes
>> using the skb->priority. The priority can be set by iptables/nftables
>> or an ingress filter.
>
> Yes, but this might duplicate the 'customer' tree Anton has to put on
> the filters anyway ?
>

Hmm, not sure I understand. As long as customer flows are mapped to the
correct HTB qdisc via skb priority, with the correct htb child classes,
what would be duplicated?


* Re: Using HTB over MultiQ
  2013-11-07 15:06         ` John Fastabend
@ 2013-11-07 15:17           ` John Fastabend
  2013-11-07 16:10             ` Eric Dumazet
  0 siblings, 1 reply; 20+ messages in thread
From: John Fastabend @ 2013-11-07 15:17 UTC (permalink / raw)
  To: John Fastabend; +Cc: Eric Dumazet, Anton 'EvilMan' Danilov, netdev

On Thu, Nov 07, 2013 at 07:06:54AM -0800, John Fastabend wrote:
> On 11/7/2013 6:54 AM, Eric Dumazet wrote:
> >On Thu, 2013-11-07 at 06:39 -0800, John Fastabend wrote:
> >
> >>
> >>With the multiq qdisc you could attach filter to the root qdisc and use
> >>skbedit to set the queue_mapping field,
> >>
> >>#tc filter add dev eth0 parent 1: protocol ip prio 1 u32 \
> >>	match ip dst 192.168.0.3 \
> >>	action skbedit queue_mapping 3
> >>
> >
> >Oh right, this is the way ;)
> >
> >I wonder if we can have 'action skbedit rxhash 34' ?
> >
> 
> Sure, it should easy enough.

I think this is all it would take; I've only compile-tested it for now.
If it's useful I could send out a real (tested) patch later today.

diff --git a/include/net/tc_act/tc_skbedit.h b/include/net/tc_act/tc_skbedit.h
index e103fe0..3951f7d 100644
--- a/include/net/tc_act/tc_skbedit.h
+++ b/include/net/tc_act/tc_skbedit.h
@@ -27,6 +27,7 @@ struct tcf_skbedit {
 	u32			flags;
 	u32     		priority;
 	u32     		mark;
+	u32			rxhash;
 	u16			queue_mapping;
 	/* XXX: 16-bit pad here? */
 };
diff --git a/include/uapi/linux/tc_act/tc_skbedit.h b/include/uapi/linux/tc_act/tc_skbedit.h
index 7a2e910..d5a1d55 100644
--- a/include/uapi/linux/tc_act/tc_skbedit.h
+++ b/include/uapi/linux/tc_act/tc_skbedit.h
@@ -27,6 +27,7 @@
 #define SKBEDIT_F_PRIORITY		0x1
 #define SKBEDIT_F_QUEUE_MAPPING		0x2
 #define SKBEDIT_F_MARK			0x4
+#define SKBEDIT_F_RXHASH		0x8
 
 struct tc_skbedit {
 	tc_gen;
@@ -39,6 +40,7 @@ enum {
 	TCA_SKBEDIT_PRIORITY,
 	TCA_SKBEDIT_QUEUE_MAPPING,
 	TCA_SKBEDIT_MARK,
+	TCA_SKBEDIT_RXHASH,
 	__TCA_SKBEDIT_MAX
 };
 #define TCA_SKBEDIT_MAX (__TCA_SKBEDIT_MAX - 1)
diff --git a/net/sched/act_skbedit.c b/net/sched/act_skbedit.c
index cb42211..f6b6820 100644
--- a/net/sched/act_skbedit.c
+++ b/net/sched/act_skbedit.c
@@ -55,6 +55,8 @@ static int tcf_skbedit(struct sk_buff *skb, const struct tc_action *a,
 		skb_set_queue_mapping(skb, d->queue_mapping);
 	if (d->flags & SKBEDIT_F_MARK)
 		skb->mark = d->mark;
+	if (d->flags & SKBEDIT_F_RXHASH)
+		skb->rxhash = d->rxhash;
 
 	spin_unlock(&d->tcf_lock);
 	return d->tcf_action;
@@ -65,6 +67,7 @@ static const struct nla_policy skbedit_policy[TCA_SKBEDIT_MAX + 1] = {
 	[TCA_SKBEDIT_PRIORITY]		= { .len = sizeof(u32) },
 	[TCA_SKBEDIT_QUEUE_MAPPING]	= { .len = sizeof(u16) },
 	[TCA_SKBEDIT_MARK]		= { .len = sizeof(u32) },
+	[TCA_SKBEDIT_RXHASH]		= { .len = sizeof(u32) },
 };
 
 static int tcf_skbedit_init(struct net *net, struct nlattr *nla,
@@ -75,7 +78,7 @@ static int tcf_skbedit_init(struct net *net, struct nlattr *nla,
 	struct tc_skbedit *parm;
 	struct tcf_skbedit *d;
 	struct tcf_common *pc;
-	u32 flags = 0, *priority = NULL, *mark = NULL;
+	u32 flags = 0, *priority = NULL, *mark = NULL, *rxhash = NULL;
 	u16 *queue_mapping = NULL;
 	int ret = 0, err;
 
@@ -104,6 +107,11 @@ static int tcf_skbedit_init(struct net *net, struct nlattr *nla,
 		mark = nla_data(tb[TCA_SKBEDIT_MARK]);
 	}
 
+	if (tb[TCA_SKBEDIT_RXHASH] != NULL) {
+		flags |= SKBEDIT_F_RXHASH;
+		rxhash = nla_data(tb[TCA_SKBEDIT_RXHASH]);
+	}
+
 	if (!flags)
 		return -EINVAL;
 
@@ -135,6 +143,8 @@ static int tcf_skbedit_init(struct net *net, struct nlattr *nla,
 		d->queue_mapping = *queue_mapping;
 	if (flags & SKBEDIT_F_MARK)
 		d->mark = *mark;
+	if (flags & SKBEDIT_F_RXHASH)
+		d->rxhash = *rxhash;
 
 	d->tcf_action = parm->action;
 
@@ -181,6 +191,10 @@ static int tcf_skbedit_dump(struct sk_buff *skb, struct tc_action *a,
 	    nla_put(skb, TCA_SKBEDIT_MARK, sizeof(d->mark),
 		    &d->mark))
 		goto nla_put_failure;
+	if ((d->flags & SKBEDIT_F_RXHASH) &&
+	    nla_put(skb, TCA_SKBEDIT_RXHASH, sizeof(d->rxhash),
+		    &d->rxhash))
+		goto nla_put_failure;
 	t.install = jiffies_to_clock_t(jiffies - d->tcf_tm.install);
 	t.lastuse = jiffies_to_clock_t(jiffies - d->tcf_tm.lastuse);
 	t.expires = jiffies_to_clock_t(d->tcf_tm.expires);


* Re: Using HTB over MultiQ
  2013-11-07 15:17           ` John Fastabend
@ 2013-11-07 16:10             ` Eric Dumazet
  0 siblings, 0 replies; 20+ messages in thread
From: Eric Dumazet @ 2013-11-07 16:10 UTC (permalink / raw)
  To: John Fastabend; +Cc: John Fastabend, Anton 'EvilMan' Danilov, netdev

On Thu, 2013-11-07 at 07:17 -0800, John Fastabend wrote:

> I think this is all it would take I only compile tested this for now.
> If its useful I could send out a real (tested) patch later today.

Well, this might be useful; my question was more 'is it currently
supported?'. Thanks!


* Re: Using HTB over MultiQ
  2013-11-07 13:49 ` Sergey Popovich
@ 2013-11-07 16:13   ` Eric Dumazet
  0 siblings, 0 replies; 20+ messages in thread
From: Eric Dumazet @ 2013-11-07 16:13 UTC (permalink / raw)
  To: Sergey Popovich; +Cc: netdev

On Thu, 2013-11-07 at 15:49 +0200, Sergey Popovich wrote:


> commit 64153ce0a7b61b2a5cacb01805cbf670142339e9
> Author: Eric Dumazet
> Date:   Thu Jun 6 14:53:16 2013 -0700
> 
>     net_sched: htb: do not setup default rate estimators
> 
> rate estimators do not setup by default. To enable rate estimators
> you could try to load module sch_htb with htb_rate_est=1 
> or echo 1 >/sys/module/sch_htb/parameters/htb_rate_est and recreate
> hierarchy.
> 

To check if packets went through a class, you do not need
a rate estimator. It's 0 anyway after an idle period.

You have to look at the counters instead, as in:

Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
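
For instance, a quick way to list the classes that never saw a packet
(a sketch built on that counter line):

 tc -s class show dev eth1 | grep -B1 'Sent 0 bytes'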


* Re: Using HTB over MultiQ
  2013-11-07 14:39     ` John Fastabend
  2013-11-07 14:54       ` Eric Dumazet
@ 2013-11-08 14:53       ` Anton 'EvilMan' Danilov
  2013-11-08 15:07         ` Eric Dumazet
  1 sibling, 1 reply; 20+ messages in thread
From: Anton 'EvilMan' Danilov @ 2013-11-08 14:53 UTC (permalink / raw)
  To: John Fastabend; +Cc: Eric Dumazet, netdev

Hello.

On 11/7/2013 6:20 AM, Eric Dumazet wrote:
>
> So I think you need to change the device queue selection so that it only
> depends on your filters / htb classes.
>
> Otherwise flows for a particular 'customer' might be spread on the 8
> queues, so the rate could be 8 * 1Mbit, instead of 1Mbit.

2013/11/7 John Fastabend <john.r.fastabend@intel.com>:
>
> With the multiq qdisc you could attach filter to the root qdisc and use
> skbedit to set the queue_mapping field,
>
> #tc filter add dev eth0 parent 1: protocol ip prio 1 u32 \
>         match ip dst 192.168.0.3 \
>         action skbedit queue_mapping 3
>
> if you configure the filters to map to the correct classes this would
> work.
>

I did as you advised. I'm using the skbedit action with the classifier
filters to map each packet to its assigned queue.
But I've hit another unexpected issue: the u32 classifier on the root
multiq qdisc isn't setting the skb->priority value, so the result of the
classification is lost. To avoid this behaviour I'm setting the priority
with the skbedit action as well.
After this, everything works as expected.
Unfortunately, performance hasn't improved.

Over 60 percent of 'perf top' is the _raw_spin_lock function.
The traffic speed is near 6Gbit/s, packet rate ~1.3 Mpps. But this
speed is limited by the shaper.

Thanks for the help.

-- 
Anton.


* Re: Using HTB over MultiQ
  2013-11-08 14:53       ` Anton 'EvilMan' Danilov
@ 2013-11-08 15:07         ` Eric Dumazet
  2013-11-08 15:11           ` John Fastabend
       [not found]           ` <CAEzD07LmzCtVWM4wnq57N+NfqDUK3bLWDisSceyPfg4MiWz5=Q@mail.gmail.com>
  0 siblings, 2 replies; 20+ messages in thread
From: Eric Dumazet @ 2013-11-08 15:07 UTC (permalink / raw)
  To: Anton 'EvilMan' Danilov; +Cc: John Fastabend, netdev

On Fri, 2013-11-08 at 18:53 +0400, Anton 'EvilMan' Danilov wrote:

> Over 60 percent of 'perf top' is _raw_spin_lock function.
> Speed of traffic is near 6Gbit/s, packet rate ~ 1.3 Mpps. But this
> speed are limited by shaper.

Please post :

ethtool -S eth0  # or other nics

perf record -a -g sleep 10

perf report | tail -n 200

And possibly it would be nice if you send your tc script so that we can
check ;)


* Re: Using HTB over MultiQ
  2013-11-08 15:07         ` Eric Dumazet
@ 2013-11-08 15:11           ` John Fastabend
  2013-11-08 15:53             ` Anton 'EvilMan' Danilov
  2013-11-08 17:55             ` Eric Dumazet
       [not found]           ` <CAEzD07LmzCtVWM4wnq57N+NfqDUK3bLWDisSceyPfg4MiWz5=Q@mail.gmail.com>
  1 sibling, 2 replies; 20+ messages in thread
From: John Fastabend @ 2013-11-08 15:11 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Anton 'EvilMan' Danilov, netdev

On 11/8/2013 7:07 AM, Eric Dumazet wrote:
> On Fri, 2013-11-08 at 18:53 +0400, Anton 'EvilMan' Danilov wrote:
>
>> Over 60 percent of 'perf top' is _raw_spin_lock function.
>> Speed of traffic is near 6Gbit/s, packet rate ~ 1.3 Mpps. But this
>> speed are limited by shaper.
>
> Please post :
>
> ethtool -S eth0  # or other nics
>
> perf record -a -g sleep 10
>
> perf report | tail -n 200
>
> And possibly it would be nice if you send your tc script so that we can
> check ;)
>

perf would be interesting, but note that multiq still uses the root qdisc
lock, which you originally stated you were trying to avoid.

mq and mqprio are really the only two existing qdiscs that perform well
with small packet sizes and multiqueue NICs, at least in my
experience/setup with these kinds of micro-benchmarks.

.John


* Re: Using HTB over MultiQ
  2013-11-08 15:11           ` John Fastabend
@ 2013-11-08 15:53             ` Anton 'EvilMan' Danilov
  2013-11-08 21:56               ` Eric Dumazet
  2013-11-08 17:55             ` Eric Dumazet
  1 sibling, 1 reply; 20+ messages in thread
From: Anton 'EvilMan' Danilov @ 2013-11-08 15:53 UTC (permalink / raw)
  To: John Fastabend; +Cc: Eric Dumazet, netdev

2013/11/8 John Fastabend <john.r.fastabend@intel.com>:

> perf would be interesting but note that multiq still uses the root qdisc
> lock which you original stated you were trying to avoid.
>
> mq and mqprio are really the only two existing qdiscs that work well for
> performance with small packet sizes and multiqueue nics at least in my
> experience/setup with these kinds of micro-benchmarks.
>
> .John
>

Hm... I think I need another way to classify packets in order to use the mq
qdisc, because it doesn't support classification at the root (I get a
'not supported' error when trying to attach a filter).
Next week I'll try to write an ipset extension for setting the priority
and queue_mapping values through the IPSET netfilter target.


-- 
Anton.


* Fwd: Using HTB over MultiQ
       [not found]           ` <CAEzD07LmzCtVWM4wnq57N+NfqDUK3bLWDisSceyPfg4MiWz5=Q@mail.gmail.com>
@ 2013-11-08 16:11             ` Anton 'EvilMan' Danilov
  0 siblings, 0 replies; 20+ messages in thread
From: Anton 'EvilMan' Danilov @ 2013-11-08 16:11 UTC (permalink / raw)
  To: netdev, John Fastabend, Eric Dumazet

2013/11/8 Eric Dumazet <eric.dumazet@gmail.com>:
> Please post :
>
> ethtool -S eth0  # or other nics
>
dau@diamond-b-new:~$ sudo ethtool -i eth0
driver: ixgbe
version: 3.18.7
firmware-version: 0x61c10001
bus-info: 0000:06:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no

dau@diamond-b-new:~$ sudo ethtool -S eth0
NIC statistics:
     rx_packets: 27615733650
     tx_packets: 22631364386
     rx_bytes: 21970067159056
     tx_bytes: 10777613703708
     rx_errors: 970
     tx_errors: 0
     rx_dropped: 0
     tx_dropped: 0
     multicast: 0
     collisions: 0
     rx_over_errors: 0
     rx_crc_errors: 969
     rx_frame_errors: 0
     rx_fifo_errors: 0
     rx_missed_errors: 0
     tx_aborted_errors: 0
     tx_carrier_errors: 0
     tx_fifo_errors: 0
     tx_heartbeat_errors: 0
     rx_pkts_nic: 27615733771
     tx_pkts_nic: 22631364450
     rx_bytes_nic: 22080530126479
     tx_bytes_nic: 10903921708333
     lsc_int: 7
     tx_busy: 0
     non_eop_descs: 0
     broadcast: 1
     rx_no_buffer_count: 0
     tx_timeout_count: 0
     tx_restart_queue: 0
     rx_long_length_errors: 0
     rx_short_length_errors: 0
     tx_flow_control_xon: 6158021
     rx_flow_control_xon: 0
     tx_flow_control_xoff: 6492967
     rx_flow_control_xoff: 0
     rx_csum_offload_errors: 10963
     alloc_rx_page_failed: 0
     alloc_rx_buff_failed: 0
     rx_no_dma_resources: 0
     hw_rsc_aggregated: 0
     hw_rsc_flushed: 0
     fdir_match: 12699568207
     fdir_miss: 17278480118
     fdir_overflow: 105313
     os2bmc_rx_by_bmc: 0
     os2bmc_tx_by_bmc: 0
     os2bmc_tx_by_host: 0
     os2bmc_rx_by_host: 0
     tx_queue_0_packets: 3513849609
     tx_queue_0_bytes: 1713928985198
     tx_queue_1_packets: 2975756171
     tx_queue_1_bytes: 1482722458160
     tx_queue_2_packets: 2767637888
     tx_queue_2_bytes: 1193622863115
     tx_queue_3_packets: 2544906780
     tx_queue_3_bytes: 1135152930636
     tx_queue_4_packets: 2372537806
     tx_queue_4_bytes: 999288935581
     tx_queue_5_packets: 2440784133
     tx_queue_5_bytes: 1061348848250
     tx_queue_6_packets: 3649915131
     tx_queue_6_bytes: 2220031265880
     tx_queue_7_packets: 2365976873
     tx_queue_7_bytes: 971517421682
    ...skip empty queues..
     rx_queue_0_packets: 3833356046
     rx_queue_0_bytes: 2979383872046
     rx_queue_1_packets: 3468460501
     rx_queue_1_bytes: 2944894700402
     rx_queue_2_packets: 4490817931
     rx_queue_2_bytes: 3331801734194
     rx_queue_3_packets: 3040960354
     rx_queue_3_bytes: 2311877907901
     rx_queue_4_packets: 2825992742
     rx_queue_4_bytes: 2145413330911
     rx_queue_5_packets: 3032906907
     rx_queue_5_bytes: 2455554004223
     rx_queue_6_packets: 3675117297
     rx_queue_6_bytes: 3266611260920
     rx_queue_7_packets: 3248121993
     rx_queue_7_bytes: 2534530380798

> perf record -a -g sleep 10
>
> perf report | tail -n 200
>
# Samples: 299K of event 'cycles'
# Event count (approx.): 274090453333
#
# Overhead          Command             Shared Object
                     Symbol
# ........  ...............  ........................
...........................................
#
    11.36%          swapper  [kernel.kallsyms]         [k]
_raw_spin_lock
                    |
                    --- _raw_spin_lock
                       |
                       |--92.83%-- dev_queue_xmit
                       |          ip_finish_output
                       |          ip_output
                       |          ip_forward_finish
                       |          ip_forward
                       |          ip_rcv_finish
                       |          ip_rcv
                       |          __netif_receive_skb_core
                       |          __netif_receive_skb
                       |          netif_receive_skb
                       |          napi_gro_receive
                       |          ixgbe_poll
                       |          net_rx_action
                       |          __do_softirq
                       |          call_softirq
                       |          do_softirq
                       |          irq_exit
                       |          |
                       |          |--96.12%-- do_IRQ
                       |          |          common_interrupt
                       |          |          |
                       |          |          |--86.81%-- cpuidle_idle_call
                       |          |          |          arch_cpu_idle
                       |          |          |          cpu_startup_entry
                       |          |          |          |
                       |          |          |          |--78.31%--
start_secondary
                       |          |          |          |
                       |          |          |           --21.69%-- rest_init
                       |          |          |                     start_kernel
                       |          |          |
x86_64_start_reservations
                       |          |          |
x86_64_start_kernel
                       |          |          |
                       |          |          |--12.90%-- cpuidle_enter_state
                       |          |          |          cpuidle_idle_call
                       |          |          |          arch_cpu_idle
                       |          |          |          cpu_startup_entry
                       |          |          |          |
                       |          |          |          |--80.50%--
start_secondary
                       |          |          |          |
                       |          |          |           --19.50%-- rest_init
                       |          |          |                     start_kernel
                       |          |          |
x86_64_start_reservations
                       |          |          |
x86_64_start_kernel
                       |          |           --0.30%-- [...]
                       |          |
                       |          |--3.87%-- smp_apic_timer_interrupt
                       |          |          apic_timer_interrupt
                       |          |          |
                       |          |          |--44.35%-- cpu_startup_entry
                       |          |          |          |
                       |          |          |          |--72.25%--
start_secondary
                       |          |          |          |
                       |          |          |           --27.75%-- rest_init
                       |          |          |                     start_kernel
                       |          |          |
x86_64_start_reservations
                       |          |          |
x86_64_start_kernel
                       |          |          |
                       |          |          |--37.26%-- cpuidle_idle_call
                       |          |          |          arch_cpu_idle
                       |          |          |          cpu_startup_entry
                       |          |          |          |
                       |          |          |          |--80.34%--
start_secondary
                       |          |          |          |
                       |          |          |           --19.66%-- rest_init
                       |          |          |                     start_kernel
                       |          |          |
x86_64_start_reservations
                       |          |          |
x86_64_start_kernel
                       |          |          |
                       |          |          |--8.83%-- cpuidle_enter_state
                       |          |          |          cpuidle_idle_call
                       |          |          |          arch_cpu_idle
                       |          |          |          cpu_startup_entry
                       |          |          |          |
                       |          |          |          |--76.66%--
start_secondary
                       |          |          |          |
                       |          |          |           --23.34%-- rest_init
                       |          |          |                     start_kernel
                       |          |          |
x86_64_start_reservations
                       |          |          |
x86_64_start_kernel
                       |          |          |
                       |          |          |--3.54%-- arch_cpu_idle
                       |          |          |          cpu_startup_entry
                       |          |          |          |
                       |          |          |          |--55.28%--
start_secondary
                       |          |          |          |
                       |          |          |           --44.72%-- rest_init
                       |          |          |                     start_kernel
                       |          |          |
x86_64_start_reservations
                       |          |          |
x86_64_start_kernel
                       |          |          |
                       |          |          |--2.10%-- start_secondary
                       |          |          |
                       |          |          |--2.02%-- __schedule
                       |          |          |          schedule
                       |          |          |
schedule_preempt_disabled
                       |          |          |          cpu_startup_entry
                       |          |          |          |
                       |          |          |          |--75.10%--
start_secondary
                       |          |          |          |
                       |          |          |           --24.90%-- rest_init
                       |          |          |                     start_kernel
                       |          |          |
x86_64_start_reservations
                       |          |          |
x86_64_start_kernel
                       |          |          |
                       |          |          |--0.92%-- rest_init
                       |          |          |          start_kernel
                       |          |          |
x86_64_start_reservations
                       |          |          |          x86_64_start_kernel
                       |          |          |
                       |          |          |--0.55%-- ns_to_timeval
                       |          |          |          cpuidle_enter_state
                       |          |          |          cpuidle_idle_call
                       |          |          |          arch_cpu_idle
                       |          |          |          cpu_startup_entry
                       |          |          |          rest_init
                       |          |          |          start_kernel
                       |          |          |
x86_64_start_reservations
                       |          |          |          x86_64_start_kernel
                       |          |           --0.43%-- [...]
                       |           --0.01%-- [...]
                       |
                       |--5.76%-- sch_direct_xmit
                       |          __qdisc_run
                       |          |
                       |          |--99.95%-- dev_queue_xmit
                       |          |          ip_finish_output
                       |          |          ip_output
                       |          |          ip_forward_finish
                       |          |          ip_forward
                       |          |          ip_rcv_finish
                       |          |          ip_rcv
                       |          |          __netif_receive_skb_core
                       |          |          __netif_receive_skb
                       |          |          netif_receive_skb
                       |          |          napi_gro_receive
                       |          |          ixgbe_poll
                       |          |          net_rx_action
                       |          |          __do_softirq
                       |          |          call_softirq
                       |          |          do_softirq
                       |          |          irq_exit
                       |          |          |
                       |          |          |--99.10%-- do_IRQ
                       |          |          |          common_interrupt
                       |          |          |          |
                       |          |          |          |--86.84%--
cpuidle_idle_call
                       |          |          |          |          arch_cpu_idle
                       |          |          |          |
cpu_startup_entry
                       |          |          |          |          |
                       |          |          |          |
|--79.93%-- start_secondary
                       |          |          |          |          |
                       |          |          |          |
--20.07%-- rest_init
                       |          |          |          |
       start_kernel
                       |          |          |          |
       x86_64_start_reservations
                       |          |          |          |
       x86_64_start_kernel
                       |          |          |          |
                       |          |          |          |--12.81%--
cpuidle_enter_state
                       |          |          |          |
cpuidle_idle_call
                       |          |          |          |          arch_cpu_idle
                       |          |          |          |
cpu_startup_entry
                       |          |          |          |          |
                       |          |          |          |
|--78.16%-- start_secondary
                       |          |          |          |          |
                       |          |          |          |
--21.84%-- rest_init
                       |          |          |          |
       start_kernel
                       |          |          |          |
       x86_64_start_reservations
                       |          |          |          |
       x86_64_start_kernel
                       |          |          |           --0.34%-- [...]
                       |          |          |
                       |          |           --0.90%-- smp_apic_timer_interrupt
                       |          |                     apic_timer_interrupt
                       |          |                     |
                       |          |                     |--49.13%--
cpuidle_idle_call
                       |          |                     |          arch_cpu_idle
                       |          |                     |
cpu_startup_entry
                       |          |                     |          |
                       |          |                     |
|--67.34%-- start_secondary
                       |          |                     |          |
                       |          |                     |
--32.66%-- rest_init
                       |          |                     |
       start_kernel
                       |          |                     |
       x86_64_start_reservations
                       |          |                     |
       x86_64_start_kernel
                       |          |                     |
                       |          |                     |--28.97%--
cpu_startup_entry
                       |          |                     |          |
                       |          |                     |
|--55.35%-- rest_init
                       |          |                     |          |
       start_kernel
                       |          |                     |          |
       x86_64_start_reservations
                       |          |                     |          |
       x86_64_start_kernel
                       |          |                     |          |
                       |          |                     |
--44.65%-- start_secondary
                       |          |                     |
                       |          |                     |--5.56%--
cpuidle_enter_state
                       |          |                     |
cpuidle_idle_call
                       |          |                     |          arch_cpu_idle
                       |          |                     |
cpu_startup_entry
                       |          |                     |
start_secondary
                       |          |                     |
                       |          |                     |--5.54%-- __schedule
                       |          |                     |          schedule
                       |          |                     |
schedule_preempt_disabled
                       |          |                     |
cpu_startup_entry
                       |          |                     |
start_secondary
                       |          |                     |
                       |          |                     |--5.45%--
start_secondary
                       |          |                     |
                       |          |                      --5.35%-- arch_cpu_idle
                       |          |
cpu_startup_entry
                       |          |
start_secondary
                       |           --0.05%-- [...]
                       |
                       |--0.70%-- __hrtimer_start_range_ns
                       |          hrtimer_start
                       |          htb_dequeue
                       |          0xffffffffa02e5089
                       |          __qdisc_run
                       |          dev_queue_xmit
                       |          ip_finish_output
                       |          ip_output
                       |          ip_forward_finish
                       |          ip_forward
                       |          ip_rcv_finish
                       |          ip_rcv
                       |          __netif_receive_skb_core
                       |          __netif_receive_skb
                       |          netif_receive_skb
                       |          napi_gro_receive
                       |          ixgbe_poll
                       |          net_rx_action
                       |          __do_softirq
                       |          call_softirq
                       |          do_softirq
                       |          irq_exit
                       |          |
                       |          |--98.36%-- do_IRQ
                       |          |          common_interrupt
                       |          |          |
                       |          |          |--83.48%-- cpuidle_idle_call
                       |          |          |          arch_cpu_idle
                       |          |          |          cpu_startup_entry
                       |          |          |          |
                       |          |          |          |--79.48%--
start_secondary
                       |          |          |          |
                       |          |          |           --20.52%-- rest_init
                       |          |          |                     start_kernel
                       |          |          |
x86_64_start_reservations
                       |          |          |
x86_64_start_kernel
                       |          |          |
                       |          |          |--15.24%-- cpuidle_enter_state
                       |          |          |          cpuidle_idle_call
                       |          |          |          arch_cpu_idle
                       |          |          |          cpu_startup_entry
                       |          |          |          |
                       |          |          |          |--91.68%--
start_secondary
                       |          |          |          |
                       |          |          |           --8.32%-- rest_init
                       |          |          |                     start_kernel
                       |          |          |
x86_64_start_reservations
                       |          |          |
x86_64_start_kernel
                       |          |           --1.28%-- [...]
                       |          |
                       |           --1.64%-- smp_apic_timer_interrupt
                       |                     apic_timer_interrupt
                       |                     |
                       |                     |--75.50%-- cpuidle_idle_call
                       |                     |          arch_cpu_idle
                       |                     |          cpu_startup_entry
                       |                     |          start_secondary
                       |                     |
                       |                      --24.50%-- cpu_startup_entry
                       |                                rest_init
                       |                                start_kernel



> And possibly it would be nice if you send your tc script so that we can
> check ;)
>
>

#!/sbin/tc -b
 #generated by script

 #internal networks iface (customers) - eth0
 #external iface - eth1

 qdisc add dev eth0 root handle 10: multiq

 #htb qdisc with root and default classes per hw-queue

 qdisc add dev eth0 parent 10:1 handle 11: htb default 2
 class add dev eth0 parent 11: classid 11:1 htb rate 1250Mbit
 class add dev eth0 parent 11:1 classid 11:2 htb rate 125Mbit ceil 1250Mbit
 qdisc add dev eth0 parent 10:2 handle 12: htb default 2
 class add dev eth0 parent 12: classid 12:1 htb rate 1250Mbit
 class add dev eth0 parent 12:1 classid 12:2 htb rate 125Mbit ceil 1250Mbit
 qdisc add dev eth0 parent 10:3 handle 13: htb default 2
 class add dev eth0 parent 13: classid 13:1 htb rate 1250Mbit
 class add dev eth0 parent 13:1 classid 13:2 htb rate 125Mbit ceil 1250Mbit
 qdisc add dev eth0 parent 10:4 handle 14: htb default 2
 class add dev eth0 parent 14: classid 14:1 htb rate 1250Mbit
 class add dev eth0 parent 14:1 classid 14:2 htb rate 125Mbit ceil 1250Mbit
 qdisc add dev eth0 parent 10:5 handle 15: htb default 2
 class add dev eth0 parent 15: classid 15:1 htb rate 1250Mbit
 class add dev eth0 parent 15:1 classid 15:2 htb rate 125Mbit ceil 1250Mbit
 qdisc add dev eth0 parent 10:6 handle 16: htb default 2
 class add dev eth0 parent 16: classid 16:1 htb rate 1250Mbit
 class add dev eth0 parent 16:1 classid 16:2 htb rate 125Mbit ceil 1250Mbit
 qdisc add dev eth0 parent 10:7 handle 17: htb default 2
 class add dev eth0 parent 17: classid 17:1 htb rate 1250Mbit
 class add dev eth0 parent 17:1 classid 17:2 htb rate 125Mbit ceil 1250Mbit
 qdisc add dev eth0 parent 10:8 handle 18: htb default 2
 class add dev eth0 parent 18: classid 18:1 htb rate 1250Mbit
 class add dev eth0 parent 18:1 classid 18:2 htb rate 125Mbit ceil 1250Mbit

 qdisc add dev eth1 root handle 10: multiq
 qdisc add dev eth1 parent 10:1 handle 11: htb default 2
 class add dev eth1 parent 11: classid 11:1 htb rate 1250Mbit
 class add dev eth1 parent 11:1 classid 11:2 htb rate 125Mbit ceil 1250Mbit
 qdisc add dev eth1 parent 10:2 handle 12: htb default 2
 class add dev eth1 parent 12: classid 12:1 htb rate 1250Mbit
 class add dev eth1 parent 12:1 classid 12:2 htb rate 125Mbit ceil 1250Mbit
 qdisc add dev eth1 parent 10:3 handle 13: htb default 2
 class add dev eth1 parent 13: classid 13:1 htb rate 1250Mbit
 class add dev eth1 parent 13:1 classid 13:2 htb rate 125Mbit ceil 1250Mbit
 qdisc add dev eth1 parent 10:4 handle 14: htb default 2
 class add dev eth1 parent 14: classid 14:1 htb rate 1250Mbit
 class add dev eth1 parent 14:1 classid 14:2 htb rate 125Mbit ceil 1250Mbit
 qdisc add dev eth1 parent 10:5 handle 15: htb default 2
 class add dev eth1 parent 15: classid 15:1 htb rate 1250Mbit
 class add dev eth1 parent 15:1 classid 15:2 htb rate 125Mbit ceil 1250Mbit
 qdisc add dev eth1 parent 10:6 handle 16: htb default 2
 class add dev eth1 parent 16: classid 16:1 htb rate 1250Mbit
 class add dev eth1 parent 16:1 classid 16:2 htb rate 125Mbit ceil 1250Mbit
 qdisc add dev eth1 parent 10:7 handle 17: htb default 2
 class add dev eth1 parent 17: classid 17:1 htb rate 1250Mbit
 class add dev eth1 parent 17:1 classid 17:2 htb rate 125Mbit ceil 1250Mbit
 qdisc add dev eth1 parent 10:8 handle 18: htb default 2
 class add dev eth1 parent 18: classid 18:1 htb rate 1250Mbit
 class add dev eth1 parent 18:1 classid 18:2 htb rate 125Mbit ceil 1250Mbit

 #one leaf class with pfifo qdisc per customer
 class add dev eth0 parent 16:1 classid 16:1237 htb rate 1024kbit
 qdisc add dev eth0 parent 16:1237 handle 1237 pfifo limit 50
 class add dev eth1 parent 16:1 classid 16:1237 htb rate 1024kbit
 qdisc add dev eth1 parent 16:1237 handle 1237 pfifo limit 50
 class add dev eth0 parent 15:1 classid 15:1244 htb rate 512kbit
 qdisc add dev eth0 parent 15:1244 handle 1244 pfifo limit 50
 class add dev eth1 parent 15:1 classid 15:1244 htb rate 512kbit
 qdisc add dev eth1 parent 15:1244 handle 1244 pfifo limit 50
 class add dev eth0 parent 18:1 classid 18:1191 htb rate 4096kbit
 qdisc add dev eth0 parent 18:1191 handle 1191 pfifo limit 50
 class add dev eth1 parent 18:1 classid 18:1191 htb rate 4096kbit
 qdisc add dev eth1 parent 18:1191 handle 1191 pfifo limit 50
 class add dev eth0 parent 12:1 classid 12:1193 htb rate 40960kbit
 qdisc add dev eth0 parent 12:1193 handle 1193 pfifo limit 50
 class add dev eth1 parent 12:1 classid 12:1193 htb rate 40960kbit
 qdisc add dev eth1 parent 12:1193 handle 1193 pfifo limit 50
 class add dev eth0 parent 13:1 classid 13:1194 htb rate 2048kbit
 qdisc add dev eth0 parent 13:1194 handle 1194 pfifo limit 50
 ...skip several hundred lines...

#classifier on u32 filter with hashing.

#my own network
 filter add dev eth0 protocol ip prio 5 parent 10:0 handle ::1 u32 match ip src 87.244.0.0/24 classid 11:3 action skbedit queue_mapping 1 priority 11:2
 filter add dev eth1 protocol ip prio 5 parent 10:0 handle ::1 u32 match ip dst 87.244.0.0/24 classid 11:3 action skbedit queue_mapping 1 priority 11:2

#hash table per subnet
# 217
 filter add dev eth0 protocol ip prio 5 parent 10:0 handle 100: u32 divisor 256
 filter add dev eth1 protocol ip prio 5 parent 10:0 handle 100: u32 divisor 256
# 10.
 filter add dev eth0 protocol ip prio 5 parent 10:0 handle 200: u32 divisor 256
 filter add dev eth1 protocol ip prio 5 parent 10:0 handle 200: u32 divisor 256
# 87.244
 filter add dev eth0 protocol ip prio 5 parent 10:0 handle 400: u32 divisor 256
 filter add dev eth1 protocol ip prio 5 parent 10:0 handle 400: u32 divisor 256
# 195.208.174
 filter add dev eth0 protocol ip prio 5 parent 10:0 handle 500: u32 divisor 256
 filter add dev eth1 protocol ip prio 5 parent 10:0 handle 500: u32 divisor 256

 filter add dev eth0 protocol ip prio 5 parent 10:0 u32 ht 800:: match ip dst 217.170.112.0/20 hashkey mask 0x0000ff00 at 16 link 100:
 filter add dev eth1 protocol ip prio 5 parent 10:0 u32 ht 800:: match ip src 217.170.112.0/20 hashkey mask 0x0000ff00 at 12 link 100:
 filter add dev eth0 protocol ip prio 5 parent 10:0 u32 ht 800:: match ip dst 87.244.0.0/18 hashkey mask 0x0000ff00 at 16 link 400:
 filter add dev eth1 protocol ip prio 5 parent 10:0 u32 ht 800:: match ip src 87.244.0.0/18 hashkey mask 0x0000ff00 at 12 link 400:
 filter add dev eth0 protocol ip prio 5 parent 10:0 u32 ht 800:: match ip dst 10.245.0.0/22 hashkey mask 0x0000ff00 at 16 link 200:
 filter add dev eth1 protocol ip prio 5 parent 10:0 u32 ht 800:: match ip src 10.245.0.0/22 hashkey mask 0x0000ff00 at 12 link 200:
 filter add dev eth0 protocol ip prio 5 parent 10:0 u32 ht 800:: match ip dst 195.208.174.0/24 hashkey mask 0x0000ff00 at 16 link 500:
 filter add dev eth1 protocol ip prio 5 parent 10:0 u32 ht 800:: match ip src 195.208.174.0/24 hashkey mask 0x0000ff00 at 12 link 500:

 filter add dev eth0 protocol ip prio 5 parent 10:0 handle 1: u32 divisor 256
 filter add dev eth1 protocol ip prio 5 parent 10:0 handle 1: u32 divisor 256
 filter add dev eth0 protocol ip prio 5 parent 10:0 u32 ht 400:0: match ip dst 87.244.0.0/24 hashkey mask 0x000000ff at 16 link 1:
 filter add dev eth1 protocol ip prio 5 parent 10:0 u32 ht 400:0: match ip src 87.244.0.0/24 hashkey mask 0x000000ff at 12 link 1:
 filter add dev eth0 protocol ip prio 5 parent 10:0 handle 2: u32 divisor 256
 filter add dev eth1 protocol ip prio 5 parent 10:0 handle 2: u32 divisor 256
 filter add dev eth0 protocol ip prio 5 parent 10:0 u32 ht 400:1: match ip dst 87.244.1.0/24 hashkey mask 0x000000ff at 16 link 2:
 filter add dev eth1 protocol ip prio 5 parent 10:0 u32 ht 400:1: match ip src 87.244.1.0/24 hashkey mask 0x000000ff at 12 link 2:

#fill the list of filters. one filter per ip (needs optimizing! should be one filter per customer subnet!)
#set the priority! otherwise the classification result is lost on entering the HTB qdisc!
 filter add dev eth0 protocol ip prio 5 parent 10:0 u32 ht 2:4: match ip dst 87.244.1.4 classid 18:2911 action skbedit queue_mapping 7 priority 18:2911
 filter add dev eth1 protocol ip prio 5 parent 10:0 u32 ht 2:4: match ip src 87.244.1.4 classid 18:2911 action skbedit queue_mapping 7 priority 18:2911
 filter add dev eth0 protocol ip prio 5 parent 10:0 u32 ht 2:5: match ip dst 87.244.1.5 classid 18:2911 action skbedit queue_mapping 7 priority 18:2911
 filter add dev eth1 protocol ip prio 5 parent 10:0 u32 ht 2:5: match ip src 87.244.1.5 classid 18:2911 action skbedit queue_mapping 7 priority 18:2911
 filter add dev eth0 protocol ip prio 5 parent 10:0 u32 ht 2:6: match ip dst 87.244.1.6 classid 18:2911 action skbedit queue_mapping 7 priority 18:2911
 filter add dev eth1 protocol ip prio 5 parent 10:0 u32 ht 2:6: match ip src 87.244.1.6 classid 18:2911 action skbedit queue_mapping 7 priority 18:2911
 filter add dev eth0 protocol ip prio 5 parent 10:0 u32 ht 2:7: match ip dst 87.244.1.7 classid 18:2911 action skbedit queue_mapping 7 priority 18:2911
 filter add dev eth1 protocol ip prio 5 parent 10:0 u32 ht 2:7: match ip src 87.244.1.7 classid 18:2911 action skbedit queue_mapping 7 priority 18:2911
 filter add dev eth0 protocol ip prio 5 parent 10:0 u32 ht 2:c: match ip dst 87.244.1.12 classid 13:3306 action skbedit queue_mapping 2 priority 13:3306
 filter add dev eth1 protocol ip prio 5 parent 10:0 u32 ht 2:c: match ip src 87.244.1.12 classid 13:3306 action skbedit queue_mapping 2 priority 13:3306
 filter add dev eth0 protocol ip prio 5 parent 10:0 u32 ht 2:d: match ip dst 87.244.1.13 classid 13:3306 action skbedit queue_mapping 2 priority 13:3306
 filter add dev eth1 protocol ip prio 5 parent 10:0 u32 ht 2:d: match ip src 87.244.1.13 classid 13:3306 action skbedit queue_mapping 2 priority 13:3306
 filter add dev eth0 protocol ip prio 5 parent 10:0 u32 ht 2:e: match ip dst 87.244.1.14 classid 13:3306 action skbedit queue_mapping 2 priority 13:3306
...skip...


--
Anton.


-- 
Anton.


* Re: Using HTB over MultiQ
  2013-11-08 15:11           ` John Fastabend
  2013-11-08 15:53             ` Anton 'EvilMan' Danilov
@ 2013-11-08 17:55             ` Eric Dumazet
  2013-11-08 20:01               ` Eric Dumazet
  1 sibling, 1 reply; 20+ messages in thread
From: Eric Dumazet @ 2013-11-08 17:55 UTC (permalink / raw)
  To: John Fastabend; +Cc: Anton 'EvilMan' Danilov, netdev

On Fri, 2013-11-08 at 07:11 -0800, John Fastabend wrote:

> 
> perf would be interesting but note that multiq still uses the root qdisc
> lock which you original stated you were trying to avoid.
> 
> mq and mqprio are really the only two existing qdiscs that work well for
> performance with small packet sizes and multiqueue nics at least in my
> experience/setup with these kinds of micro-benchmarks.

Right, but can we actually use HTB on MQ?

I do not think so: the following does not work.

for ETH in eth0
do
 tc qd del dev $ETH root 2>/dev/null

 tc qd add dev $ETH root handle 100: mq 
 for i in `seq 1 4`
 do
  tc qd add dev $ETH parent 100:$i handle $i htb
 done
done


* Re: Using HTB over MultiQ
  2013-11-08 17:55             ` Eric Dumazet
@ 2013-11-08 20:01               ` Eric Dumazet
  0 siblings, 0 replies; 20+ messages in thread
From: Eric Dumazet @ 2013-11-08 20:01 UTC (permalink / raw)
  To: John Fastabend; +Cc: Anton 'EvilMan' Danilov, netdev

On Fri, 2013-11-08 at 09:55 -0800, Eric Dumazet wrote:

> I do not think so : following does not work.
> 
> for ETH in eth0
> do
>  tc qd del dev $ETH root 2>/dev/null
> 
>  tc qd add dev $ETH root handle 100: mq 
>  for i in `seq 1 4`
>  do
>   tc qd add dev $ETH parent 100:$i handle $i htb
>  done
> done

Humpf, I forgot a "default 1" on htb qdisc creation, sorry for the
noise ;)
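
For reference, a corrected sketch of that loop, with the default class
actually created (the rate is just a placeholder):

for ETH in eth0
do
 tc qd del dev $ETH root 2>/dev/null

 tc qd add dev $ETH root handle 100: mq
 for i in `seq 1 4`
 do
  tc qd add dev $ETH parent 100:$i handle $i htb default 1
  tc class add dev $ETH parent $i: classid $i:1 htb rate 2500Mbit
 done
done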


* Re: Using HTB over MultiQ
  2013-11-08 15:53             ` Anton 'EvilMan' Danilov
@ 2013-11-08 21:56               ` Eric Dumazet
  2013-11-08 23:11                 ` John Fastabend
  0 siblings, 1 reply; 20+ messages in thread
From: Eric Dumazet @ 2013-11-08 21:56 UTC (permalink / raw)
  To: Anton 'EvilMan' Danilov; +Cc: John Fastabend, netdev

On Fri, 2013-11-08 at 19:53 +0400, Anton 'EvilMan' Danilov wrote:

> Hm.. I think I need another way to classify packets for using of mq
> qdisc, because it don't support classify at root (not supported error
> on try attach filter).

I think it might be time to add filters on the device, and run them
without any lock (rcu protection only).

John, I think you had some previous work on this?

> On next week i'll try to write ipset extension for setting of priority
> and queue_mapping value through the IPSET netfilter target.

Sounds a nice idea as well.


* Re: Using HTB over MultiQ
  2013-11-08 21:56               ` Eric Dumazet
@ 2013-11-08 23:11                 ` John Fastabend
  0 siblings, 0 replies; 20+ messages in thread
From: John Fastabend @ 2013-11-08 23:11 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Anton 'EvilMan' Danilov, netdev

On 11/8/2013 1:56 PM, Eric Dumazet wrote:
> On Fri, 2013-11-08 at 19:53 +0400, Anton 'EvilMan' Danilov wrote:
>
>> Hm.. I think I need another way to classify packets for using of mq
>> qdisc, because it don't support classify at root (not supported error
>> on try attach filter).
>
> I think it might be time to add filters on the device, and run them
> without any lock (rcu protection only)
>
> John, I think you had some previous work ?

Yes, I have a net-next fork with this, but I've currently got a bug
in the u32 classifier. I'll resurrect this next week and send something
out. We can take a look and see if it's useful.

>
>> On next week i'll try to write ipset extension for setting of priority
>> and queue_mapping value through the IPSET netfilter target.
>
> Sounds a nice idea as well.
>

Another thought would be to put a netfilter hook above the qdisc and on
ingress (not in the bridge module). I've been meaning to take a look
at nftables; I'm not sure how much duplication exists between nftables
and tc classifiers at this point.


