* netem and hierarchical ingress traffic shaping
@ 2011-12-18 5:12 John A. Sullivan III
2011-12-18 19:55 ` Stephen Hemminger
0 siblings, 1 reply; 22+ messages in thread
From: John A. Sullivan III @ 2011-12-18 5:12 UTC (permalink / raw)
To: netdev
Hello, all. I am having some delightful success building a test WAN
environment using hfsc and netem. We have placed netem on both ingress
and egress for various test environment reasons. We are also using hfsc
on both ingress and egress for traffic shaping.
Since netem appears to be classless, we realized we would need to
replace the SFQ on each leaf with netem, which we really didn't want to
do - both because we would lose SFQ and because we didn't want to
maintain the netem parameters on each leaf. So, we activated our ifb1
interface, placed netem on it, and redirected all the egress traffic to
ifb1. That worked fine.
However, how do we do this on the ingress? To use hfsc on ingress, we
are already redirecting to ifb0. We can't redirect ifb0 to ifb1. If we
apply multiple filters, one to redirect into ifb0 and the other into
ifb1, only the first match takes effect. So we are living with
replacing SFQ on the ifb0 leaves with netem.
Is there any other way? Thanks - John
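Conceptually, what we would like is a single netem stage in front of the
whole ingress hierarchy, something like the following sketch (handles are
illustrative, and it assumes netem could take a child qdisc, which it
apparently cannot on our kernel):

```shell
# Sketch only: one netem at the root of ifb0 with the hfsc tree beneath it.
# Handles are illustrative; the second command fails on a classless netem.
tc qdisc add dev ifb0 root handle 1: netem delay 25ms 5ms distribution normal loss 0.1% 30%
tc qdisc add dev ifb0 parent 1:1 handle 2: hfsc default 20
```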
In case it is of interest, here is the set of rules we are using:
tc qdisc add dev eth1 root handle 1: hfsc default 20
tc class add dev eth1 parent 1: classid 1:1 hfsc sc rate 1490kbit ul rate 1490kbit
tc class add dev eth1 parent 1:1 classid 1:20 hfsc rt rate 400kbit ls rate 200kbit
tc qdisc add dev eth1 parent 1:20 handle 1201 sfq perturb 10
tc class add dev eth1 parent 1:1 classid 1:10 hfsc rt umax 16kbit dmax 50ms rate 200kbit ls rate 1000kbit
tc qdisc add dev eth1 parent 1:10 handle 1101 sfq perturb 60
tc class add dev eth1 parent 1:1 classid 1:30 hfsc rt umax 1514b dmax 20ms rate 20kbit
tc qdisc add dev eth1 parent 1:30 handle 1301 sfq perturb 60
iptables -t mangle -A POSTROUTING -p 6 --syn --dport 443 -j CONNMARK --set-mark 0x10
iptables -t mangle -A PREROUTING -p 6 --syn --dport 822 -j CONNMARK --set-mark 0x11
iptables -t mangle -A POSTROUTING -o eth1 -p 6 -j CONNMARK --restore-mark
tc filter add dev eth1 parent 1:0 protocol ip prio 10 handle 0x10 fw flowid 1:10
tc filter add dev eth1 parent 1:0 protocol ip prio 10 handle 0x11 fw flowid 1:30
tc qdisc add dev eth1 ingress
modprobe ifb
ifconfig ifb0 up
tc filter add dev eth1 parent ffff: protocol ip prio 50 u32 match u32 0 0 action mirred egress redirect dev ifb0
tc qdisc add dev ifb0 root handle 1: hfsc default 20
tc class add dev ifb0 parent 1: classid 1:1 hfsc sc rate 1490kbit ul rate 1490kbit
tc class add dev ifb0 parent 1:1 classid 1:20 hfsc rt rate 400kbit ls rate 200kbit
tc qdisc add dev ifb0 parent 1:20 handle 1201 netem delay 25ms 5ms distribution normal loss 0.1% 30%
tc class add dev ifb0 parent 1:1 classid 1:10 hfsc rt umax 16kbit dmax 50ms rate 200kbit ls rate 1000kbit
tc qdisc add dev ifb0 parent 1:10 handle 1101 netem delay 25ms 5ms distribution normal loss 0.1% 30%
tc class add dev ifb0 parent 1:1 classid 1:30 hfsc rt umax 1514b dmax 20ms rate 20kbit
tc qdisc add dev ifb0 parent 1:30 handle 1301 netem delay 25ms 5ms distribution normal loss 0.1% 30%
tc filter add dev ifb0 parent 1:0 protocol ip prio 1 handle 6: u32 divisor 1
tc filter add dev ifb0 parent 1:0 protocol ip prio 1 u32 match ip protocol 6 0xff link 6: offset at 0 mask 0x0f00 shift 6 plus 0 eat
tc filter add dev ifb0 parent 1:0 protocol ip prio 1 u32 ht 6:0 match tcp src 443 0x00ff flowid 1:10
tc filter add dev ifb0 parent 1:0 protocol ip prio 1 u32 ht 6:0 match tcp dst 822 0xff00 flowid 1:30
ifconfig ifb1 up
tc qdisc add dev ifb1 root handle 2 netem delay 25ms 5ms distribution normal loss 0.1% 30%
tc filter add dev eth1 parent 1:0 protocol ip prio 1 u32 match u32 0 0 action mirred egress redirect dev ifb1
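To confirm that the redirects and the netem parameters are actually
taking effect, the following read-only commands are useful (standard tc
introspection; no assumptions beyond the rules above):

```shell
# Inspect the resulting qdisc/class trees and per-class counters (read-only).
tc -s qdisc show dev ifb0
tc -s class show dev ifb0
tc -s filter show dev eth1 parent ffff:
```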
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: netem and hierarchical ingress traffic shaping
2011-12-18 5:12 netem and hierarchical ingress traffic shaping John A. Sullivan III
@ 2011-12-18 19:55 ` Stephen Hemminger
2011-12-19 16:53 ` John A. Sullivan III
2011-12-23 17:33 ` Eric Dumazet
0 siblings, 2 replies; 22+ messages in thread
From: Stephen Hemminger @ 2011-12-18 19:55 UTC (permalink / raw)
To: John A. Sullivan III; +Cc: netdev
On Sun, 18 Dec 2011 00:12:12 -0500
"John A. Sullivan III" <jsullivan@opensourcedevel.com> wrote:
> Since netem appears to be classless, we realized we would need to
> replace the SFQ on each leaf with netem which we really didn't want to
> do - not only to not lose SFQ but because we didn't want to maintain the
> netem parameters on each leaf. So, we activated our ifb1 interface,
> placed netem on it and redirected all the egress traffic to ifb1. That
> worked fine.
Current versions of netem can take one class.
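If so, grafting a child under netem would presumably look something like
this sketch (illustrative handles; whether the graft is accepted depends
on the kernel's netem version, as the follow-ups in this thread show):

```shell
# Sketch: netem at the root with a single child qdisc grafted beneath it.
# Requires a netem with classful support; older kernels reject the graft.
tc qdisc add dev eth1 root handle 30: netem delay 25ms 5ms
tc qdisc add dev eth1 parent 30:0 handle 40: sfq perturb 10
```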
* Re: netem and hierarchical ingress traffic shaping
2011-12-18 19:55 ` Stephen Hemminger
@ 2011-12-19 16:53 ` John A. Sullivan III
2011-12-23 17:33 ` Eric Dumazet
1 sibling, 0 replies; 22+ messages in thread
From: John A. Sullivan III @ 2011-12-19 16:53 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: netdev
On Sun, 2011-12-18 at 11:55 -0800, Stephen Hemminger wrote:
> On Sun, 18 Dec 2011 00:12:12 -0500
> "John A. Sullivan III" <jsullivan@opensourcedevel.com> wrote:
>
> > Since netem appears to be classless, we realized we would need to
> > replace the SFQ on each leaf with netem which we really didn't want to
> > do - not only to not lose SFQ but because we didn't want to maintain the
> > netem parameters on each leaf. So, we activated our ifb1 interface,
> > placed netem on it and redirected all the egress traffic to ifb1. That
> > worked fine.
>
> Current versions of netem can take one class.
<snip>
Thanks, but how would that work? For example, I assume we could not
send netem-processed packets to SFQ, because SFQ is classless. We tried
anyway and got:
root@testswitch01:~# tc class add dev ifb0 parent 1201: classid 1202 sfq
perturb 60
Error: Qdisc "sfq" is classless.
root@testswitch01:~# tc qdisc add dev ifb0 parent 1201: handle 1202 sfq
perturb 60
RTNETLINK answers: Operation not supported
1201 is a netem qdisc at the end of hfsc.
Not being quite sure what you meant, we thought we would experiment
before asking, so we tried to attach an hfsc class to the netem qdisc.
We didn't think it would work, and it didn't:
root@testswitch01:~# tc qdisc add dev eth3 root handle 7 netem delay
25ms 5ms distribution normal loss 0.1% 30%
root@testswitch01:~# tc class add dev eth3 parent 7: classid 7:1 hfsc sc
rate 1490kbit ul rate 1490kbit
RTNETLINK answers: Invalid argument
So we tried adding the hfsc qdisc first:
root@testswitch01:~# tc qdisc add dev eth3 parent 7: handle 7:1 hfsc
default 70
RTNETLINK answers: File exists
Then, just for kicks, we tried adding an SFQ qdisc in case the problem
with the previous attempts was that the netem qdisc was not at the top
level:
root@testswitch01:~# tc qdisc add dev eth3 parent 7: handle 7:1 sfq
perturb 60
RTNETLINK answers: File exists
So I'm still not sure how we can do anything other than stick netem on
the end of one of our branches or at root. Thanks - John
* Re: netem and hierarchical ingress traffic shaping
2011-12-18 19:55 ` Stephen Hemminger
2011-12-19 16:53 ` John A. Sullivan III
@ 2011-12-23 17:33 ` Eric Dumazet
2011-12-23 17:39 ` Eric Dumazet
1 sibling, 1 reply; 22+ messages in thread
From: Eric Dumazet @ 2011-12-23 17:33 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: John A. Sullivan III, netdev
On Sunday 18 December 2011 at 11:55 -0800, Stephen Hemminger wrote:
> On Sun, 18 Dec 2011 00:12:12 -0500
> "John A. Sullivan III" <jsullivan@opensourcedevel.com> wrote:
>
> > Since netem appears to be classless, we realized we would need to
> > replace the SFQ on each leaf with netem which we really didn't want to
> > do - not only to not lose SFQ but because we didn't want to maintain the
> > netem parameters on each leaf. So, we activated our ifb1 interface,
> > placed netem on it and redirected all the egress traffic to ifb1. That
> > worked fine.
>
> Current versions of netem can take one class.
> --
Hmm, I can see that (commit 10f6dfcfde884441)
But it won't work very well: it assumes the child qdisc uses a single
queue when netem reordering is requested:
(__skb_queue_head(&q->qdisc->q, skb))
When the netem queue is changed from tfifo, we should allow reordering
only if the new qdisc is compatible with __skb_queue_head(&q->qdisc->q, skb)
(maybe by providing a new ->queue_at_head() op).
* Re: netem and hierarchical ingress traffic shaping
2011-12-23 17:33 ` Eric Dumazet
@ 2011-12-23 17:39 ` Eric Dumazet
2011-12-23 17:54 ` Dave Taht
0 siblings, 1 reply; 22+ messages in thread
From: Eric Dumazet @ 2011-12-23 17:39 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: John A. Sullivan III, netdev
On Friday 23 December 2011 at 18:33 +0100, Eric Dumazet wrote:
> On Sunday 18 December 2011 at 11:55 -0800, Stephen Hemminger wrote:
> > On Sun, 18 Dec 2011 00:12:12 -0500
> > "John A. Sullivan III" <jsullivan@opensourcedevel.com> wrote:
> >
> > > Since netem appears to be classless, we realized we would need to
> > > replace the SFQ on each leaf with netem which we really didn't want to
> > > do - not only to not lose SFQ but because we didn't want to maintain the
> > > netem parameters on each leaf. So, we activated our ifb1 interface,
> > > placed netem on it and redirected all the egress traffic to ifb1. That
> > > worked fine.
> >
> > Current versions of netem can take one class.
> > --
>
> Hmm, I can see that (commit 10f6dfcfde884441)
>
> But it wont work very well, it assumes qdisc uses a single queue if
> netem reordering is requested :
>
> (__skb_queue_head(&q->qdisc->q, skb))
>
> We should allow reordering if netem queue is changed from tfifo, only if
> new qdisc is compatible with __skb_queue_head(&q->qdisc->q, skb)
>
> (maybe providing a new ->queue_at_head() new ops)
Also, child qdisc must not scratch skb->cb[], since netem stores
time_to_send in it.
I guess nobody actually tried this netem mis-feature :(
I'll send a fix.
* Re: netem and hierarchical ingress traffic shaping
2011-12-23 17:39 ` Eric Dumazet
@ 2011-12-23 17:54 ` Dave Taht
2011-12-23 18:28 ` Eric Dumazet
0 siblings, 1 reply; 22+ messages in thread
From: Dave Taht @ 2011-12-23 17:54 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Stephen Hemminger, John A. Sullivan III, netdev
On Fri, Dec 23, 2011 at 6:39 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Friday 23 December 2011 at 18:33 +0100, Eric Dumazet wrote:
>> On Sunday 18 December 2011 at 11:55 -0800, Stephen Hemminger wrote:
>> > On Sun, 18 Dec 2011 00:12:12 -0500
>> > "John A. Sullivan III" <jsullivan@opensourcedevel.com> wrote:
>> >
>> > > Since netem appears to be classless, we realized we would need to
>> > > replace the SFQ on each leaf with netem which we really didn't want to
>> > > do - not only to not lose SFQ but because we didn't want to maintain the
>> > > netem parameters on each leaf. So, we activated our ifb1 interface,
>> > > placed netem on it and redirected all the egress traffic to ifb1. That
>> > > worked fine.
>> >
>> > Current versions of netem can take one class.
>> > --
>>
>> Hmm, I can see that (commit 10f6dfcfde884441)
>>
>> But it wont work very well, it assumes qdisc uses a single queue if
>> netem reordering is requested :
>>
>> (__skb_queue_head(&q->qdisc->q, skb))
>>
>> We should allow reordering if netem queue is changed from tfifo, only if
>> new qdisc is compatible with __skb_queue_head(&q->qdisc->q, skb)
>>
>> (maybe providing a new ->queue_at_head() new ops)
>
> Also, child qdisc must not scratch skb->cb[], since netem stores
> time_to_send in it.
Is there any place where all 48 bytes of cb are used?
I wouldn't mind if 'time_to_send' became a separate skb field
for a more generic 'time_in_queue'...
> I guess nobody actually tried this netem mis-feature :(
>
> I'll send a fix.
--
Dave Täht
SKYPE: davetaht
US Tel: 1-239-829-5608
FR Tel: 0638645374
http://www.bufferbloat.net
* Re: netem and hierarchical ingress traffic shaping
2011-12-23 17:54 ` Dave Taht
@ 2011-12-23 18:28 ` Eric Dumazet
2011-12-23 18:54 ` Dave Taht
2011-12-23 19:07 ` Stephen Hemminger
0 siblings, 2 replies; 22+ messages in thread
From: Eric Dumazet @ 2011-12-23 18:28 UTC (permalink / raw)
To: Dave Taht; +Cc: Stephen Hemminger, John A. Sullivan III, netdev
On Friday 23 December 2011 at 18:54 +0100, Dave Taht wrote:
> Are there any place where all 48 bytes of cb are used?
>
Yes, but not in the qdisc layer.
struct tcp_skb_cb is known to be 44 bytes (when IPv6 is enabled).
In the qdisc layer, we use only a small part of it, for the moment.
> I wouldn't mind if 'time_to_send' became a separate skb field
> for a more generic 'time_in_queue'...
>
This won't happen.
As I posted in an earlier patch, this can be added to "struct
qdisc_skb_cb".
* Re: netem and hierarchical ingress traffic shaping
2011-12-23 18:28 ` Eric Dumazet
@ 2011-12-23 18:54 ` Dave Taht
2011-12-23 19:07 ` Stephen Hemminger
1 sibling, 0 replies; 22+ messages in thread
From: Dave Taht @ 2011-12-23 18:54 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Stephen Hemminger, John A. Sullivan III, netdev
On Fri, Dec 23, 2011 at 7:28 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Friday 23 December 2011 at 18:54 +0100, Dave Taht wrote:
>
>> Are there any place where all 48 bytes of cb are used?
>>
>
> Yes, but on in qdisc layer.
>
> struct tcp_skb_cb is known to be 44 bytes (when IPv6 is enabled)
>
> In qdisc layer, we use a small part of it, for the moment.
>
>> I wouldn't mind if 'time_to_send' became a separate skb field
>> for a more generic 'time_in_queue'...
>>
>
> This wont happen.
>
> As I posted in an earlier patch, this can be added in "struct
> qdisc_skb_cb"
Ah... this patch?
http://patchwork.ozlabs.org/patch/125329/
I liked what I saw then, but nobody chirped up to review it...
I guess I know what I'm doing this weekend.
--
Dave Täht
SKYPE: davetaht
US Tel: 1-239-829-5608
FR Tel: 0638645374
http://www.bufferbloat.net
* Re: netem and hierarchical ingress traffic shaping
2011-12-23 18:28 ` Eric Dumazet
2011-12-23 18:54 ` Dave Taht
@ 2011-12-23 19:07 ` Stephen Hemminger
2011-12-23 19:21 ` Eric Dumazet
2011-12-23 19:36 ` netem and hierarchical ingress traffic shaping David Miller
1 sibling, 2 replies; 22+ messages in thread
From: Stephen Hemminger @ 2011-12-23 19:07 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Dave Taht, John A. Sullivan III, netdev
On Fri, 23 Dec 2011 19:28:27 +0100
Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Friday 23 December 2011 at 18:54 +0100, Dave Taht wrote:
>
> > Are there any place where all 48 bytes of cb are used?
> >
>
> Yes, but on in qdisc layer.
>
> struct tcp_skb_cb is known to be 44 bytes (when IPv6 is enabled)
>
> In qdisc layer, we use a small part of it, for the moment.
>
> > I wouldn't mind if 'time_to_send' became a separate skb field
> > for a more generic 'time_in_queue'...
> >
>
> This wont happen.
>
> As I posted in an earlier patch, this can be added in "struct
> qdisc_skb_cb"
skb_cb is the dumping ground of the networking layer.
The assumption was that the qdisc could use the skb_cb
for its own scratchpad. Netem is using it for tagging
packets in the queue.
So basically, netem, choke, and sfb are incompatible with
each other. This is not that bad; why not add a flag to the qdisc
ops to indicate which qdiscs use cb, and block the user from
trying to do something bogus?
* Re: netem and hierarchical ingress traffic shaping
2011-12-23 19:07 ` Stephen Hemminger
@ 2011-12-23 19:21 ` Eric Dumazet
2011-12-29 4:26 ` [PATCH net-next] netem: fix classful handling Eric Dumazet
2011-12-23 19:36 ` netem and hierarchical ingress traffic shaping David Miller
1 sibling, 1 reply; 22+ messages in thread
From: Eric Dumazet @ 2011-12-23 19:21 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: Dave Taht, John A. Sullivan III, netdev
On Friday 23 December 2011 at 11:07 -0800, Stephen Hemminger wrote:
> skb_cb is the dumping ground of the networking layer.
> The assumption was that the qdisc could use the skb_cb
> for it's own scratchpad. Netem is using it for tagging
> packets in the queue.
>
> So basically, netem, choke, and sfb are incompatible with
> each other. This is not that bad, why not add a flag to qdisc
> ops to indicate which qdisc are using cb and block user from
> trying to do something bogus.
This is not how I planned to solve the problem.
I think we need an internal tfifo for netem use.
Then be able to add another qdisc on top of netem.
tfifo as a first stage, hardcoded (only limit is tunable)
[ optional 2nd stage, any qdisc ]
netem_queue()
{
	queue packet to tfifo
	(possibly at head, for reordering)
}
netem_dequeue()
{
	if (other_qdisc) {
		for_each_packet_from_tfifo_time_ready() {
			other_qdisc->enqueue(skb);
		}
		try_to_dequeue_one_packet_from(other_qdisc);
	} else {
		dequeue_one_packet_from_tfifo_time_ready();
	}
}
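In tc terms, the resulting two-stage setup would be configured like this
sketch (illustrative device and handles; the tfifo stage is internal to
netem, so only the optional second stage appears in the commands):

```shell
# Sketch: netem keeps its internal tfifo; any qdisc stacks on top as stage 2.
tc qdisc add dev eth3 root handle 30: netem delay 20ms 10ms
tc qdisc add dev eth3 parent 30:0 handle 40: sfq
```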
* Re: netem and hierarchical ingress traffic shaping
2011-12-23 19:07 ` Stephen Hemminger
2011-12-23 19:21 ` Eric Dumazet
@ 2011-12-23 19:36 ` David Miller
1 sibling, 0 replies; 22+ messages in thread
From: David Miller @ 2011-12-23 19:36 UTC (permalink / raw)
To: shemminger; +Cc: eric.dumazet, dave.taht, jsullivan, netdev
From: Stephen Hemminger <shemminger@vyatta.com>
Date: Fri, 23 Dec 2011 11:07:49 -0800
> So basically, netem, choke, and sfb are incompatible with
> each other. This is not that bad, why not add a flag to qdisc
> ops to indicate which qdisc are using cb and block user from
> trying to do something bogus.
Or we could do what we do in the inet stack, define a layout
that allows the different layers to use different parts of the
CB.
* [PATCH net-next] netem: fix classful handling
2011-12-23 19:21 ` Eric Dumazet
@ 2011-12-29 4:26 ` Eric Dumazet
2011-12-29 6:17 ` Stephen Hemminger
0 siblings, 1 reply; 22+ messages in thread
From: Eric Dumazet @ 2011-12-29 4:26 UTC (permalink / raw)
To: David Miller; +Cc: Dave Taht, John A. Sullivan III, netdev, Stephen Hemminger
Commit 10f6dfcfde (Revert "sch_netem: Remove classful functionality")
reintroduced classful functionality to netem, but broke basic netem
behavior:
netem uses a t(ime)fifo queue and stores timestamps in skb->cb[].
If the qdisc is changed, time constraints are not respected, and the
other qdisc can destroy skb->cb[] and block netem at dequeue time.
Fix this by always using the internal tfifo, and optionally attaching a
child qdisc to netem.
Example of use :
DEV=eth3
tc qdisc del dev $DEV root
tc qdisc add dev $DEV root handle 30: est 1sec 8sec netem delay 20ms 10ms
tc qdisc add dev $DEV parent 30:0 sfq
# tc -s -d qdisc show dev eth3
qdisc netem 30: root refcnt 18 limit 1000 delay 20.0ms 10.0ms
Sent 893810 bytes 891 pkt (dropped 0, overlimits 0 requeues 0)
rate 690192bit 61pps backlog 5972b 2p requeues 0
qdisc sfq 800b: parent 30: limit 127p quantum 1514b flows 127/1024 divisor 1024
Sent 893810 bytes 891 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Stephen Hemminger <shemminger@vyatta.com>
---
net/sched/sch_netem.c | 204 +++++++++++++++-------------------------
1 file changed, 81 insertions(+), 123 deletions(-)
diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index ffcaa59..641bee5 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -67,7 +67,11 @@
*/
struct netem_sched_data {
+ /* internal t(ime)fifo qdisc uses sch->q and sch->limit */
+
+ /* optional qdisc for classful handling (NULL at netem init) */
struct Qdisc *qdisc;
+
struct qdisc_watchdog watchdog;
psched_tdiff_t latency;
@@ -117,7 +121,9 @@ struct netem_sched_data {
};
-/* Time stamp put into socket buffer control block */
+/* Time stamp put into socket buffer control block
+ * Only valid when skbs are in our internal t(ime)fifo queue.
+ */
struct netem_skb_cb {
psched_time_t time_to_send;
};
@@ -324,6 +330,31 @@ static psched_time_t packet_len_2_sched_time(unsigned int len, struct netem_sche
return PSCHED_NS2TICKS(ticks);
}
+static int tfifo_enqueue(struct sk_buff *nskb, struct Qdisc *sch)
+{
+ struct sk_buff_head *list = &sch->q;
+ psched_time_t tnext = netem_skb_cb(nskb)->time_to_send;
+ struct sk_buff *skb;
+
+ if (likely(skb_queue_len(list) < sch->limit)) {
+ skb = skb_peek_tail(list);
+ /* Optimize for add at tail */
+ if (likely(!skb || tnext >= netem_skb_cb(skb)->time_to_send))
+ return qdisc_enqueue_tail(nskb, sch);
+
+ skb_queue_reverse_walk(list, skb) {
+ if (tnext >= netem_skb_cb(skb)->time_to_send)
+ break;
+ }
+
+ __skb_queue_after(list, skb, nskb);
+ sch->qstats.backlog += qdisc_pkt_len(nskb);
+ return NET_XMIT_SUCCESS;
+ }
+
+ return qdisc_reshape_fail(nskb, sch);
+}
+
/*
* Insert one skb into qdisc.
* Note: parent depends on return value to account for queue length.
@@ -399,7 +430,7 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch)
now = psched_get_time();
if (q->rate) {
- struct sk_buff_head *list = &q->qdisc->q;
+ struct sk_buff_head *list = &sch->q;
delay += packet_len_2_sched_time(skb->len, q);
@@ -417,7 +448,7 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch)
cb->time_to_send = now + delay;
++q->counter;
- ret = qdisc_enqueue(skb, q->qdisc);
+ ret = tfifo_enqueue(skb, sch);
} else {
/*
* Do re-ordering by putting one out of N packets at the front
@@ -426,9 +457,7 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch)
cb->time_to_send = psched_get_time();
q->counter = 0;
- __skb_queue_head(&q->qdisc->q, skb);
- q->qdisc->qstats.backlog += qdisc_pkt_len(skb);
- q->qdisc->qstats.requeues++;
+ __skb_queue_head(&sch->q, skb);
ret = NET_XMIT_SUCCESS;
}
@@ -439,19 +468,20 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch)
}
}
- sch->q.qlen++;
return NET_XMIT_SUCCESS;
}
static unsigned int netem_drop(struct Qdisc *sch)
{
struct netem_sched_data *q = qdisc_priv(sch);
- unsigned int len = 0;
+ unsigned int len;
- if (q->qdisc->ops->drop && (len = q->qdisc->ops->drop(q->qdisc)) != 0) {
- sch->q.qlen--;
+ len = qdisc_queue_drop(sch);
+ if (!len && q->qdisc && q->qdisc->ops->drop)
+ len = q->qdisc->ops->drop(q->qdisc);
+ if (len)
sch->qstats.drops++;
- }
+
return len;
}
@@ -463,16 +493,16 @@ static struct sk_buff *netem_dequeue(struct Qdisc *sch)
if (qdisc_is_throttled(sch))
return NULL;
- skb = q->qdisc->ops->peek(q->qdisc);
+tfifo_dequeue:
+ skb = qdisc_peek_head(sch);
if (skb) {
const struct netem_skb_cb *cb = netem_skb_cb(skb);
- psched_time_t now = psched_get_time();
/* if more time remaining? */
- if (cb->time_to_send <= now) {
- skb = qdisc_dequeue_peeked(q->qdisc);
+ if (cb->time_to_send <= psched_get_time()) {
+ skb = qdisc_dequeue_tail(sch);
if (unlikely(!skb))
- return NULL;
+ goto qdisc_dequeue;
#ifdef CONFIG_NET_CLS_ACT
/*
@@ -483,15 +513,37 @@ static struct sk_buff *netem_dequeue(struct Qdisc *sch)
skb->tstamp.tv64 = 0;
#endif
- sch->q.qlen--;
+ if (q->qdisc) {
+ int err = qdisc_enqueue(skb, q->qdisc);
+
+ if (unlikely(err != NET_XMIT_SUCCESS)) {
+ if (net_xmit_drop_count(err)) {
+ sch->qstats.drops++;
+ qdisc_tree_decrease_qlen(sch, 1);
+ }
+ }
+ goto tfifo_dequeue;
+ }
+deliver:
qdisc_unthrottled(sch);
qdisc_bstats_update(sch, skb);
return skb;
}
+ if (q->qdisc) {
+ skb = q->qdisc->ops->dequeue(q->qdisc);
+ if (skb)
+ goto deliver;
+ }
qdisc_watchdog_schedule(&q->watchdog, cb->time_to_send);
}
+qdisc_dequeue:
+ if (q->qdisc) {
+ skb = q->qdisc->ops->dequeue(q->qdisc);
+ if (skb)
+ goto deliver;
+ }
return NULL;
}
@@ -499,8 +551,9 @@ static void netem_reset(struct Qdisc *sch)
{
struct netem_sched_data *q = qdisc_priv(sch);
- qdisc_reset(q->qdisc);
- sch->q.qlen = 0;
+ qdisc_reset_queue(sch);
+ if (q->qdisc)
+ qdisc_reset(q->qdisc);
qdisc_watchdog_cancel(&q->watchdog);
}
@@ -689,11 +742,7 @@ static int netem_change(struct Qdisc *sch, struct nlattr *opt)
if (ret < 0)
return ret;
- ret = fifo_set_limit(q->qdisc, qopt->limit);
- if (ret) {
- pr_info("netem: can't set fifo limit\n");
- return ret;
- }
+ sch->limit = qopt->limit;
q->latency = qopt->latency;
q->jitter = qopt->jitter;
@@ -734,88 +783,6 @@ static int netem_change(struct Qdisc *sch, struct nlattr *opt)
return ret;
}
-/*
- * Special case version of FIFO queue for use by netem.
- * It queues in order based on timestamps in skb's
- */
-struct fifo_sched_data {
- u32 limit;
- psched_time_t oldest;
-};
-
-static int tfifo_enqueue(struct sk_buff *nskb, struct Qdisc *sch)
-{
- struct fifo_sched_data *q = qdisc_priv(sch);
- struct sk_buff_head *list = &sch->q;
- psched_time_t tnext = netem_skb_cb(nskb)->time_to_send;
- struct sk_buff *skb;
-
- if (likely(skb_queue_len(list) < q->limit)) {
- /* Optimize for add at tail */
- if (likely(skb_queue_empty(list) || tnext >= q->oldest)) {
- q->oldest = tnext;
- return qdisc_enqueue_tail(nskb, sch);
- }
-
- skb_queue_reverse_walk(list, skb) {
- const struct netem_skb_cb *cb = netem_skb_cb(skb);
-
- if (tnext >= cb->time_to_send)
- break;
- }
-
- __skb_queue_after(list, skb, nskb);
-
- sch->qstats.backlog += qdisc_pkt_len(nskb);
-
- return NET_XMIT_SUCCESS;
- }
-
- return qdisc_reshape_fail(nskb, sch);
-}
-
-static int tfifo_init(struct Qdisc *sch, struct nlattr *opt)
-{
- struct fifo_sched_data *q = qdisc_priv(sch);
-
- if (opt) {
- struct tc_fifo_qopt *ctl = nla_data(opt);
- if (nla_len(opt) < sizeof(*ctl))
- return -EINVAL;
-
- q->limit = ctl->limit;
- } else
- q->limit = max_t(u32, qdisc_dev(sch)->tx_queue_len, 1);
-
- q->oldest = PSCHED_PASTPERFECT;
- return 0;
-}
-
-static int tfifo_dump(struct Qdisc *sch, struct sk_buff *skb)
-{
- struct fifo_sched_data *q = qdisc_priv(sch);
- struct tc_fifo_qopt opt = { .limit = q->limit };
-
- NLA_PUT(skb, TCA_OPTIONS, sizeof(opt), &opt);
- return skb->len;
-
-nla_put_failure:
- return -1;
-}
-
-static struct Qdisc_ops tfifo_qdisc_ops __read_mostly = {
- .id = "tfifo",
- .priv_size = sizeof(struct fifo_sched_data),
- .enqueue = tfifo_enqueue,
- .dequeue = qdisc_dequeue_head,
- .peek = qdisc_peek_head,
- .drop = qdisc_queue_drop,
- .init = tfifo_init,
- .reset = qdisc_reset_queue,
- .change = tfifo_init,
- .dump = tfifo_dump,
-};
-
static int netem_init(struct Qdisc *sch, struct nlattr *opt)
{
struct netem_sched_data *q = qdisc_priv(sch);
@@ -827,18 +794,9 @@ static int netem_init(struct Qdisc *sch, struct nlattr *opt)
qdisc_watchdog_init(&q->watchdog, sch);
q->loss_model = CLG_RANDOM;
- q->qdisc = qdisc_create_dflt(sch->dev_queue, &tfifo_qdisc_ops,
- TC_H_MAKE(sch->handle, 1));
- if (!q->qdisc) {
- pr_notice("netem: qdisc create tfifo qdisc failed\n");
- return -ENOMEM;
- }
-
ret = netem_change(sch, opt);
- if (ret) {
+ if (ret)
pr_info("netem: change failed\n");
- qdisc_destroy(q->qdisc);
- }
return ret;
}
@@ -847,7 +805,8 @@ static void netem_destroy(struct Qdisc *sch)
struct netem_sched_data *q = qdisc_priv(sch);
qdisc_watchdog_cancel(&q->watchdog);
- qdisc_destroy(q->qdisc);
+ if (q->qdisc)
+ qdisc_destroy(q->qdisc);
dist_free(q->delay_dist);
}
@@ -951,7 +910,7 @@ static int netem_dump_class(struct Qdisc *sch, unsigned long cl,
{
struct netem_sched_data *q = qdisc_priv(sch);
- if (cl != 1) /* only one class */
+ if (cl != 1 || !q->qdisc) /* only one class */
return -ENOENT;
tcm->tcm_handle |= TC_H_MIN(1);
@@ -965,14 +924,13 @@ static int netem_graft(struct Qdisc *sch, unsigned long arg, struct Qdisc *new,
{
struct netem_sched_data *q = qdisc_priv(sch);
- if (new == NULL)
- new = &noop_qdisc;
-
sch_tree_lock(sch);
*old = q->qdisc;
q->qdisc = new;
- qdisc_tree_decrease_qlen(*old, (*old)->q.qlen);
- qdisc_reset(*old);
+ if (*old) {
+ qdisc_tree_decrease_qlen(*old, (*old)->q.qlen);
+ qdisc_reset(*old);
+ }
sch_tree_unlock(sch);
return 0;
* Re: [PATCH net-next] netem: fix classful handling
2011-12-29 4:26 ` [PATCH net-next] netem: fix classful handling Eric Dumazet
@ 2011-12-29 6:17 ` Stephen Hemminger
2011-12-29 9:12 ` Eric Dumazet
0 siblings, 1 reply; 22+ messages in thread
From: Stephen Hemminger @ 2011-12-29 6:17 UTC (permalink / raw)
To: Eric Dumazet; +Cc: David Miller, Dave Taht, John A. Sullivan III, netdev
On Thu, 29 Dec 2011 05:26:00 +0100
Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Commit 10f6dfcfde (Revert "sch_netem: Remove classful functionality")
> reintroduced classful functionality to netem, but broke basic netem
> behavior :
>
> netem uses an t(ime)fifo queue, and store timestamps in skb->cb[]
>
> If qdisc is changed, time constraints are not respected and other qdisc
> can destroy skb->cb[] and block netem at dequeue time.
>
> Fix this by always using internal tfifo, and optionally attach a child
> qdisc to netem.
>
> Example of use :
>
> DEV=eth3
> tc qdisc del dev $DEV root
> tc qdisc add dev $DEV root handle 30: est 1sec 8sec netem delay 20ms 10ms
> tc qdisc add dev $DEV parent 30:0 sfq
Does it work with TBF, which is a more useful option?
Also, the whole tfifo idea exists only to support the weird idea that,
when doing random delay, packets should get reordered based on the
result of the random value; it was a behavior some users wanted
because that is what NISTnet did.
* Re: [PATCH net-next] netem: fix classful handling
2011-12-29 6:17 ` Stephen Hemminger
@ 2011-12-29 9:12 ` Eric Dumazet
2011-12-29 16:52 ` Hagen Paul Pfeifer
2011-12-30 22:12 ` David Miller
0 siblings, 2 replies; 22+ messages in thread
From: Eric Dumazet @ 2011-12-29 9:12 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: David Miller, Dave Taht, John A. Sullivan III, netdev
On Wednesday 28 December 2011 at 22:17 -0800, Stephen Hemminger wrote:
> On Thu, 29 Dec 2011 05:26:00 +0100
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> > Commit 10f6dfcfde (Revert "sch_netem: Remove classful functionality")
> > reintroduced classful functionality to netem, but broke basic netem
> > behavior :
> >
> > netem uses an t(ime)fifo queue, and store timestamps in skb->cb[]
> >
> > If qdisc is changed, time constraints are not respected and other qdisc
> > can destroy skb->cb[] and block netem at dequeue time.
> >
> > Fix this by always using internal tfifo, and optionally attach a child
> > qdisc to netem.
> >
> > Example of use :
> >
> > DEV=eth3
> > tc qdisc del dev $DEV root
> > tc qdisc add dev $DEV root handle 30: est 1sec 8sec netem delay 20ms 10ms
> > tc qdisc add dev $DEV parent 30:0 sfq
>
> Does it work with TBF which is a more useful option?
>
Yes :
tc qdisc add dev $DEV root handle 30:0 est 1sec 8sec netem \
delay 20ms 10ms
tc qdisc add dev $DEV handle 40:0 parent 30:0 tbf \
burst 20480 limit 20480 mtu 1514 rate 32000bps
qdisc netem 30: root refcnt 18 limit 1000 delay 20.0ms 10.0ms
Sent 190792 bytes 413 pkt (dropped 0, overlimits 0 requeues 0)
rate 18416bit 3pps backlog 0b 0p requeues 0
qdisc tbf 40: parent 30: rate 256000bit burst 20Kb/8 mpu 0b lat 0us
Sent 190792 bytes 413 pkt (dropped 6, overlimits 10 requeues 0)
backlog 0b 5p requeues 0
> Also, the whole tfifo idea is only to support the wierd idea that
> if doing random delay that packets should get reordered based on the
> results of the random value; it was an behavior some users wanted
> because that is what NISTnet did.
tfifo supports time-ordered queueing, which mimics jitter in the
network. This seems quite useful.
I see what you suggest: adding 'time_to_send' to the generic qdisc cb.
But it makes no sense if we attach a reordering qdisc like SFQ:
a 'high prio' packet would block the whole netem, because we would have
to throttle (this packet's time_to_send being in the future) while
many other eligible packets are in the queue.
When no jitter is requested of netem, we directly enqueue at the
queue tail with no extra cost, since we need to access the last skb in
the queue to perform qdisc_enqueue_tail() anyway.
I am sending a v2 because of two lines I inadvertently removed in the
case where we queue the new packet at the head of the tfifo (allowing
the sch->limit check to be bypassed; I'll send a separate patch to add
this check).
Thanks
Thanks
[PATCH v2 net-next] netem: fix classful handling
Commit 10f6dfcfde (Revert "sch_netem: Remove classful functionality")
reintroduced classful functionality to netem, but broke basic netem
behavior:
netem uses a t(ime)fifo queue and stores timestamps in skb->cb[].
If the qdisc is changed, time constraints are not respected and the other
qdisc can destroy skb->cb[] and block netem at dequeue time.
Fix this by always using the internal tfifo, and optionally attach a child
qdisc (or a tree of qdiscs) to netem.
Example of use:
DEV=eth3
tc qdisc del dev $DEV root
tc qdisc add dev $DEV root handle 30: est 1sec 8sec netem delay 20ms 10ms
tc qdisc add dev $DEV handle 40:0 parent 30:0 tbf \
burst 20480 limit 20480 mtu 1514 rate 32000bps
qdisc netem 30: root refcnt 18 limit 1000 delay 20.0ms 10.0ms
Sent 190792 bytes 413 pkt (dropped 0, overlimits 0 requeues 0)
rate 18416bit 3pps backlog 0b 0p requeues 0
qdisc tbf 40: parent 30: rate 256000bit burst 20Kb/8 mpu 0b lat 0us
Sent 190792 bytes 413 pkt (dropped 6, overlimits 10 requeues 0)
backlog 0b 5p requeues 0
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Stephen Hemminger <shemminger@vyatta.com>
---
net/sched/sch_netem.c | 202 ++++++++++++++++------------------------
1 file changed, 81 insertions(+), 121 deletions(-)
diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index ffcaa59..1e611cb 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -67,7 +67,11 @@
*/
struct netem_sched_data {
+ /* internal t(ime)fifo qdisc uses sch->q and sch->limit */
+
+ /* optional qdisc for classful handling (NULL at netem init) */
struct Qdisc *qdisc;
+
struct qdisc_watchdog watchdog;
psched_tdiff_t latency;
@@ -117,7 +121,9 @@ struct netem_sched_data {
};
-/* Time stamp put into socket buffer control block */
+/* Time stamp put into socket buffer control block
+ * Only valid when skbs are in our internal t(ime)fifo queue.
+ */
struct netem_skb_cb {
psched_time_t time_to_send;
};
@@ -324,6 +330,31 @@ static psched_time_t packet_len_2_sched_time(unsigned int len, struct netem_sche
return PSCHED_NS2TICKS(ticks);
}
+static int tfifo_enqueue(struct sk_buff *nskb, struct Qdisc *sch)
+{
+ struct sk_buff_head *list = &sch->q;
+ psched_time_t tnext = netem_skb_cb(nskb)->time_to_send;
+ struct sk_buff *skb;
+
+ if (likely(skb_queue_len(list) < sch->limit)) {
+ skb = skb_peek_tail(list);
+ /* Optimize for add at tail */
+ if (likely(!skb || tnext >= netem_skb_cb(skb)->time_to_send))
+ return qdisc_enqueue_tail(nskb, sch);
+
+ skb_queue_reverse_walk(list, skb) {
+ if (tnext >= netem_skb_cb(skb)->time_to_send)
+ break;
+ }
+
+ __skb_queue_after(list, skb, nskb);
+ sch->qstats.backlog += qdisc_pkt_len(nskb);
+ return NET_XMIT_SUCCESS;
+ }
+
+ return qdisc_reshape_fail(nskb, sch);
+}
+
/*
* Insert one skb into qdisc.
* Note: parent depends on return value to account for queue length.
@@ -399,7 +430,7 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch)
now = psched_get_time();
if (q->rate) {
- struct sk_buff_head *list = &q->qdisc->q;
+ struct sk_buff_head *list = &sch->q;
delay += packet_len_2_sched_time(skb->len, q);
@@ -417,7 +448,7 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch)
cb->time_to_send = now + delay;
++q->counter;
- ret = qdisc_enqueue(skb, q->qdisc);
+ ret = tfifo_enqueue(skb, sch);
} else {
/*
* Do re-ordering by putting one out of N packets at the front
@@ -426,7 +457,7 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch)
cb->time_to_send = psched_get_time();
q->counter = 0;
- __skb_queue_head(&q->qdisc->q, skb);
+ __skb_queue_head(&sch->q, skb);
q->qdisc->qstats.backlog += qdisc_pkt_len(skb);
q->qdisc->qstats.requeues++;
ret = NET_XMIT_SUCCESS;
@@ -439,19 +470,20 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch)
}
}
- sch->q.qlen++;
return NET_XMIT_SUCCESS;
}
static unsigned int netem_drop(struct Qdisc *sch)
{
struct netem_sched_data *q = qdisc_priv(sch);
- unsigned int len = 0;
+ unsigned int len;
- if (q->qdisc->ops->drop && (len = q->qdisc->ops->drop(q->qdisc)) != 0) {
- sch->q.qlen--;
+ len = qdisc_queue_drop(sch);
+ if (!len && q->qdisc && q->qdisc->ops->drop)
+ len = q->qdisc->ops->drop(q->qdisc);
+ if (len)
sch->qstats.drops++;
- }
+
return len;
}
@@ -463,16 +495,16 @@ static struct sk_buff *netem_dequeue(struct Qdisc *sch)
if (qdisc_is_throttled(sch))
return NULL;
- skb = q->qdisc->ops->peek(q->qdisc);
+tfifo_dequeue:
+ skb = qdisc_peek_head(sch);
if (skb) {
const struct netem_skb_cb *cb = netem_skb_cb(skb);
- psched_time_t now = psched_get_time();
/* if more time remaining? */
- if (cb->time_to_send <= now) {
- skb = qdisc_dequeue_peeked(q->qdisc);
+ if (cb->time_to_send <= psched_get_time()) {
+ skb = qdisc_dequeue_tail(sch);
if (unlikely(!skb))
- return NULL;
+ goto qdisc_dequeue;
#ifdef CONFIG_NET_CLS_ACT
/*
@@ -483,15 +515,37 @@ static struct sk_buff *netem_dequeue(struct Qdisc *sch)
skb->tstamp.tv64 = 0;
#endif
- sch->q.qlen--;
+ if (q->qdisc) {
+ int err = qdisc_enqueue(skb, q->qdisc);
+
+ if (unlikely(err != NET_XMIT_SUCCESS)) {
+ if (net_xmit_drop_count(err)) {
+ sch->qstats.drops++;
+ qdisc_tree_decrease_qlen(sch, 1);
+ }
+ }
+ goto tfifo_dequeue;
+ }
+deliver:
qdisc_unthrottled(sch);
qdisc_bstats_update(sch, skb);
return skb;
}
+ if (q->qdisc) {
+ skb = q->qdisc->ops->dequeue(q->qdisc);
+ if (skb)
+ goto deliver;
+ }
qdisc_watchdog_schedule(&q->watchdog, cb->time_to_send);
}
+qdisc_dequeue:
+ if (q->qdisc) {
+ skb = q->qdisc->ops->dequeue(q->qdisc);
+ if (skb)
+ goto deliver;
+ }
return NULL;
}
@@ -499,8 +553,9 @@ static void netem_reset(struct Qdisc *sch)
{
struct netem_sched_data *q = qdisc_priv(sch);
- qdisc_reset(q->qdisc);
- sch->q.qlen = 0;
+ qdisc_reset_queue(sch);
+ if (q->qdisc)
+ qdisc_reset(q->qdisc);
qdisc_watchdog_cancel(&q->watchdog);
}
@@ -689,11 +744,7 @@ static int netem_change(struct Qdisc *sch, struct nlattr *opt)
if (ret < 0)
return ret;
- ret = fifo_set_limit(q->qdisc, qopt->limit);
- if (ret) {
- pr_info("netem: can't set fifo limit\n");
- return ret;
- }
+ sch->limit = qopt->limit;
q->latency = qopt->latency;
q->jitter = qopt->jitter;
@@ -734,88 +785,6 @@ static int netem_change(struct Qdisc *sch, struct nlattr *opt)
return ret;
}
-/*
- * Special case version of FIFO queue for use by netem.
- * It queues in order based on timestamps in skb's
- */
-struct fifo_sched_data {
- u32 limit;
- psched_time_t oldest;
-};
-
-static int tfifo_enqueue(struct sk_buff *nskb, struct Qdisc *sch)
-{
- struct fifo_sched_data *q = qdisc_priv(sch);
- struct sk_buff_head *list = &sch->q;
- psched_time_t tnext = netem_skb_cb(nskb)->time_to_send;
- struct sk_buff *skb;
-
- if (likely(skb_queue_len(list) < q->limit)) {
- /* Optimize for add at tail */
- if (likely(skb_queue_empty(list) || tnext >= q->oldest)) {
- q->oldest = tnext;
- return qdisc_enqueue_tail(nskb, sch);
- }
-
- skb_queue_reverse_walk(list, skb) {
- const struct netem_skb_cb *cb = netem_skb_cb(skb);
-
- if (tnext >= cb->time_to_send)
- break;
- }
-
- __skb_queue_after(list, skb, nskb);
-
- sch->qstats.backlog += qdisc_pkt_len(nskb);
-
- return NET_XMIT_SUCCESS;
- }
-
- return qdisc_reshape_fail(nskb, sch);
-}
-
-static int tfifo_init(struct Qdisc *sch, struct nlattr *opt)
-{
- struct fifo_sched_data *q = qdisc_priv(sch);
-
- if (opt) {
- struct tc_fifo_qopt *ctl = nla_data(opt);
- if (nla_len(opt) < sizeof(*ctl))
- return -EINVAL;
-
- q->limit = ctl->limit;
- } else
- q->limit = max_t(u32, qdisc_dev(sch)->tx_queue_len, 1);
-
- q->oldest = PSCHED_PASTPERFECT;
- return 0;
-}
-
-static int tfifo_dump(struct Qdisc *sch, struct sk_buff *skb)
-{
- struct fifo_sched_data *q = qdisc_priv(sch);
- struct tc_fifo_qopt opt = { .limit = q->limit };
-
- NLA_PUT(skb, TCA_OPTIONS, sizeof(opt), &opt);
- return skb->len;
-
-nla_put_failure:
- return -1;
-}
-
-static struct Qdisc_ops tfifo_qdisc_ops __read_mostly = {
- .id = "tfifo",
- .priv_size = sizeof(struct fifo_sched_data),
- .enqueue = tfifo_enqueue,
- .dequeue = qdisc_dequeue_head,
- .peek = qdisc_peek_head,
- .drop = qdisc_queue_drop,
- .init = tfifo_init,
- .reset = qdisc_reset_queue,
- .change = tfifo_init,
- .dump = tfifo_dump,
-};
-
static int netem_init(struct Qdisc *sch, struct nlattr *opt)
{
struct netem_sched_data *q = qdisc_priv(sch);
@@ -827,18 +796,9 @@ static int netem_init(struct Qdisc *sch, struct nlattr *opt)
qdisc_watchdog_init(&q->watchdog, sch);
q->loss_model = CLG_RANDOM;
- q->qdisc = qdisc_create_dflt(sch->dev_queue, &tfifo_qdisc_ops,
- TC_H_MAKE(sch->handle, 1));
- if (!q->qdisc) {
- pr_notice("netem: qdisc create tfifo qdisc failed\n");
- return -ENOMEM;
- }
-
ret = netem_change(sch, opt);
- if (ret) {
+ if (ret)
pr_info("netem: change failed\n");
- qdisc_destroy(q->qdisc);
- }
return ret;
}
@@ -847,7 +807,8 @@ static void netem_destroy(struct Qdisc *sch)
struct netem_sched_data *q = qdisc_priv(sch);
qdisc_watchdog_cancel(&q->watchdog);
- qdisc_destroy(q->qdisc);
+ if (q->qdisc)
+ qdisc_destroy(q->qdisc);
dist_free(q->delay_dist);
}
@@ -951,7 +912,7 @@ static int netem_dump_class(struct Qdisc *sch, unsigned long cl,
{
struct netem_sched_data *q = qdisc_priv(sch);
- if (cl != 1) /* only one class */
+ if (cl != 1 || !q->qdisc) /* only one class */
return -ENOENT;
tcm->tcm_handle |= TC_H_MIN(1);
@@ -965,14 +926,13 @@ static int netem_graft(struct Qdisc *sch, unsigned long arg, struct Qdisc *new,
{
struct netem_sched_data *q = qdisc_priv(sch);
- if (new == NULL)
- new = &noop_qdisc;
-
sch_tree_lock(sch);
*old = q->qdisc;
q->qdisc = new;
- qdisc_tree_decrease_qlen(*old, (*old)->q.qlen);
- qdisc_reset(*old);
+ if (*old) {
+ qdisc_tree_decrease_qlen(*old, (*old)->q.qlen);
+ qdisc_reset(*old);
+ }
sch_tree_unlock(sch);
return 0;
^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH net-next] netem: fix classful handling
2011-12-29 9:12 ` Eric Dumazet
@ 2011-12-29 16:52 ` Hagen Paul Pfeifer
2011-12-29 17:15 ` Eric Dumazet
2011-12-30 22:12 ` David Miller
1 sibling, 1 reply; 22+ messages in thread
From: Hagen Paul Pfeifer @ 2011-12-29 16:52 UTC (permalink / raw)
To: Eric Dumazet
Cc: Stephen Hemminger, David Miller, Dave Taht, John A. Sullivan III,
netdev
* Eric Dumazet | 2011-12-29 10:12:02 [+0100]:
>> Also, the whole tfifo idea is only to support the weird idea that
>> if doing random delay, packets should get reordered based on the
>> results of the random value; it was a behavior some users wanted
>> because that is what NISTnet did.
>
>tfifo provides time-ordered queuing, which mimics jitter in the
>network. This seems quite useful.
>
>I see what you suggest: adding 'time_to_send' in the generic qdisc cb.
>
>But it makes no sense if we attach a reordering qdisc, like SFQ:
>a 'high prio' packet will block the whole netem, because we'll have to
>throttle since this packet's time_to_send will be in the future, while
>many other eligible packets are in the queue.
In other words, netem jitter and a !tfifo qdisc will not work. Correct? The
rate extension also peeks at the last packet to get the reference time
(assuming strict ordering):
[...]
now = netem_skb_cb(skb_peek_tail(list))->time_to_send;
[...]
We should avoid different (unforeseeable) behavior depending on the queue
(tfifo, SFQ). Another point: operating netem and a qdisc on the same computer
can lead to timing abnormalities. In our test setups we run the
qdisc/tcp/whatever setup and netem on more than one computer.
Hagen
* Re: [PATCH net-next] netem: fix classful handling
2011-12-29 16:52 ` Hagen Paul Pfeifer
@ 2011-12-29 17:15 ` Eric Dumazet
2011-12-29 17:43 ` Hagen Paul Pfeifer
0 siblings, 1 reply; 22+ messages in thread
From: Eric Dumazet @ 2011-12-29 17:15 UTC (permalink / raw)
To: Hagen Paul Pfeifer
Cc: Stephen Hemminger, David Miller, Dave Taht, John A. Sullivan III,
netdev
On Thursday, 29 December 2011 at 17:52 +0100, Hagen Paul Pfeifer wrote:
> * Eric Dumazet | 2011-12-29 10:12:02 [+0100]:
>
> >> Also, the whole tfifo idea is only to support the weird idea that
> >> if doing random delay, packets should get reordered based on the
> >> results of the random value; it was a behavior some users wanted
> >> because that is what NISTnet did.
> >
> >tfifo provides time-ordered queuing, which mimics jitter in the
> >network. This seems quite useful.
> >
> >I see what you suggest: adding 'time_to_send' in the generic qdisc cb.
> >
> >But it makes no sense if we attach a reordering qdisc, like SFQ:
> >a 'high prio' packet will block the whole netem, because we'll have to
> >throttle since this packet's time_to_send will be in the future, while
> >many other eligible packets are in the queue.
>
> In other words, netem jitter and a !tfifo qdisc will not work. Correct? The
> rate extension also peeks at the last packet to get the reference time
> (assuming strict ordering):
>
Yep, the current situation is borked. It assumes we _use_ tfifo, for delay
jitter but also for the rate extension.
> [...]
> now = netem_skb_cb(skb_peek_tail(list))->time_to_send;
> [...]
>
> We should avoid different (unforeseeable) behavior depending on the queue
> (tfifo, SFQ). Another point: operating netem and a qdisc on the same computer
> can lead to timing abnormalities. In our test setups we run the
> qdisc/tcp/whatever setup and netem on more than one computer.
>
After my patch you could use netem as a delay module in front of a complex
qdisc setup, for example.
[ Simulating a 10ms delay on a 10-Gigabit link is expensive, since you
need to allow up to ~150,000 packets in the tfifo. Maybe we should switch to
a tbfifo [giving a limit in bytes, not packets] ]
* Re: [PATCH net-next] netem: fix classful handling
2011-12-29 17:15 ` Eric Dumazet
@ 2011-12-29 17:43 ` Hagen Paul Pfeifer
2011-12-29 18:10 ` Eric Dumazet
0 siblings, 1 reply; 22+ messages in thread
From: Hagen Paul Pfeifer @ 2011-12-29 17:43 UTC (permalink / raw)
To: Eric Dumazet
Cc: Stephen Hemminger, David Miller, Dave Taht, John A. Sullivan III,
netdev
* Eric Dumazet | 2011-12-29 18:15:50 [+0100]:
>> In other words, netem jitter and a !tfifo qdisc will not work. Correct? The
>> rate extension also peeks at the last packet to get the reference time
>> (assuming strict ordering):
>>
>
>Yep, the current situation is borked. It assumes we _use_ tfifo, for delay
>jitter but also for the rate extension.
Mhh, should we signal this to the user via 'tc -s qdisc show'? Or should we
assume that a user who sets netem rate|jitter AND a !tfifo qdisc knows what he
is doing, because we assume he is an experienced user? At the very least, a
comment somewhere in the man page should point out this characteristic.
* Re: [PATCH net-next] netem: fix classful handling
2011-12-29 17:43 ` Hagen Paul Pfeifer
@ 2011-12-29 18:10 ` Eric Dumazet
2011-12-29 18:25 ` Hagen Paul Pfeifer
0 siblings, 1 reply; 22+ messages in thread
From: Eric Dumazet @ 2011-12-29 18:10 UTC (permalink / raw)
To: Hagen Paul Pfeifer
Cc: Stephen Hemminger, David Miller, Dave Taht, John A. Sullivan III,
netdev
On Thursday, 29 December 2011 at 18:43 +0100, Hagen Paul Pfeifer wrote:
> * Eric Dumazet | 2011-12-29 18:15:50 [+0100]:
>
> >> In other words, netem jitter and a !tfifo qdisc will not work. Correct? The
> >> rate extension also peeks at the last packet to get the reference time
> >> (assuming strict ordering):
> >>
> >
> >Yep, the current situation is borked. It assumes we _use_ tfifo, for delay
> >jitter but also for the rate extension.
>
> Mhh, should we signal this to the user via 'tc -s qdisc show'? Or should we
> assume that a user who sets netem rate|jitter AND a !tfifo qdisc knows what he
> is doing, because we assume he is an experienced user? At the very least, a
> comment somewhere in the man page should point out this characteristic.
I don't quite understand the question. The patch I posted is supposed to
fix the problem. What do you want to tell the user?
The netem module is probably only used by experienced users, and only on
very recent kernels anyway.
* Re: [PATCH net-next] netem: fix classful handling
2011-12-29 18:10 ` Eric Dumazet
@ 2011-12-29 18:25 ` Hagen Paul Pfeifer
2011-12-29 18:31 ` Stephen Hemminger
2011-12-29 18:36 ` Eric Dumazet
0 siblings, 2 replies; 22+ messages in thread
From: Hagen Paul Pfeifer @ 2011-12-29 18:25 UTC (permalink / raw)
To: Eric Dumazet
Cc: Stephen Hemminger, David Miller, Dave Taht, John A. Sullivan III,
netdev
* Eric Dumazet | 2011-12-29 19:10:36 [+0100]:
>I don't quite understand the question. The patch I posted is supposed to
>fix the problem. What do you want to tell the user?
I assumed that the patch makes it possible to replace the standard tfifo?! Tfifo
provides strict ordering; other qdiscs do not. So I thought that if someone uses
netem rate|jitter with e.g. SFQ, this should be mentioned somewhere, e.g.
"rate|jitter can only be used with the tfifo qdisc". Correct me if I am wrong.
Cheers
* Re: [PATCH net-next] netem: fix classful handling
2011-12-29 18:25 ` Hagen Paul Pfeifer
@ 2011-12-29 18:31 ` Stephen Hemminger
2011-12-29 18:36 ` Eric Dumazet
1 sibling, 0 replies; 22+ messages in thread
From: Stephen Hemminger @ 2011-12-29 18:31 UTC (permalink / raw)
To: Hagen Paul Pfeifer
Cc: Eric Dumazet, David Miller, Dave Taht, John A. Sullivan III,
netdev
On Thu, 29 Dec 2011 19:25:03 +0100
Hagen Paul Pfeifer <hagen@jauu.net> wrote:
> * Eric Dumazet | 2011-12-29 19:10:36 [+0100]:
>
> >I don't quite understand the question. The patch I posted is supposed to
> >fix the problem. What do you want to tell the user?
>
> I assumed that the patch makes it possible to replace the standard tfifo?! Tfifo
> provides strict ordering; other qdiscs do not. So I thought that if someone uses
> netem rate|jitter with e.g. SFQ, this should be mentioned somewhere, e.g.
> "rate|jitter can only be used with the tfifo qdisc". Correct me if I am wrong.
>
> Cheers
This is actually a feature. It is documented that tfifo can be replaced
with pfifo. In other words, if the default (tfifo) is used, packets
can be reordered by the delay/jitter values; but if pfifo (or another
qdisc) is used, the packets will not be reordered.
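The distinction can be illustrated with a toy model (hypothetical, userspace): with a time-ordered queue, a later packet that drew a smaller random delay overtakes an earlier one, while a plain FIFO would deliver in arrival order no matter what delays were drawn.

```c
#include <assert.h>
#include <stdlib.h>

/* One queued packet: its arrival index and its randomly delayed send time. */
struct ent {
    int arrival;
    long due;
};

static int by_due(const void *a, const void *b)
{
    const struct ent *x = a, *y = b;

    return (x->due > y->due) - (x->due < y->due);
}

/* Fill out[] with arrival indices in tfifo (due-time) dequeue order.
 * A pfifo would instead emit arrival indices 0, 1, 2, ... unchanged. */
static void tfifo_order(struct ent *e, int n, int *out)
{
    qsort(e, n, sizeof(*e), by_due);
    for (int i = 0; i < n; i++)
        out[i] = e[i].arrival;
}
```

So with per-packet jitter, the tfifo output order is a permutation of arrival order, which is exactly the NISTnet-style reordering described above.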
* Re: [PATCH net-next] netem: fix classful handling
2011-12-29 18:25 ` Hagen Paul Pfeifer
2011-12-29 18:31 ` Stephen Hemminger
@ 2011-12-29 18:36 ` Eric Dumazet
1 sibling, 0 replies; 22+ messages in thread
From: Eric Dumazet @ 2011-12-29 18:36 UTC (permalink / raw)
To: Hagen Paul Pfeifer
Cc: Stephen Hemminger, David Miller, Dave Taht, John A. Sullivan III,
netdev
On Thursday, 29 December 2011 at 19:25 +0100, Hagen Paul Pfeifer wrote:
> * Eric Dumazet | 2011-12-29 19:10:36 [+0100]:
>
> >I don't quite understand the question. The patch I posted is supposed to
> >fix the problem. What do you want to tell the user?
>
> I assumed that the patch makes it possible to replace the standard tfifo?! Tfifo
> provides strict ordering; other qdiscs do not. So I thought that if someone uses
> netem rate|jitter with e.g. SFQ, this should be mentioned somewhere, e.g.
> "rate|jitter can only be used with the tfifo qdisc". Correct me if I am wrong.
>
Current netem uses a single queue, by default a tfifo.
If you then replace this tfifo with SFQ, you lose the tfifo, and netem
doesn't work at all.
I claim we cannot remove the tfifo.
After my patch, you now have:
1) An internal, mandatory tfifo queue, to fulfill the time_to_send
requirements.
2) An optional qdisc (SFQ in your example), where packets are queued
once dequeued from the tfifo at the right time (after the netem delay/rate
has been respected). A packet may ultimately be delivered to the device with
a _bigger_ delay than the one predicted by the tfifo, because of traffic
shaping happening in this optional qdisc.
Full netem block :
+---------+ +--------+
-->>-enqueue--| tfifo |---->>>>--enqueue--| SFQ |-dequeue--->>>>>
+---------+ +--------+
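The two-stage pipeline in the diagram can be modeled in a few lines of userspace C (hypothetical names; locking, timers, and byte accounting omitted): packets leave the time-ordered stage only once their send time has passed, and then traverse an ordinary FIFO standing in for the optional child qdisc.

```c
#include <assert.h>
#include <string.h>

#define QCAP 16

/* Stage 1: time-ordered queue; stage 2: plain FIFO (child qdisc stand-in).
 * Each "packet" is represented just by its due time. */
struct tstage { long due[QCAP]; int len; };
struct fstage { long buf[QCAP]; int head, tail; };

/* Insert keeping ascending due time (simple insertion into a small array). */
static void tstage_enqueue(struct tstage *t, long due)
{
    int i = t->len++;

    while (i > 0 && t->due[i - 1] > due) {
        t->due[i] = t->due[i - 1];
        i--;
    }
    t->due[i] = due;
}

/* Move every packet whose due time has passed into the child FIFO, then
 * deliver one packet from that FIFO. Returns the packet, or -1 when
 * throttled (nothing due yet and the FIFO is empty). */
static long pipeline_dequeue(struct tstage *t, struct fstage *f, long now)
{
    while (t->len && t->due[0] <= now) {
        f->buf[f->tail++ % QCAP] = t->due[0];
        memmove(t->due, t->due + 1, --t->len * sizeof(long));
    }
    if (f->head != f->tail)
        return f->buf[f->head++ % QCAP];
    return -1;
}
```

The key property mirrors the patch: the time constraint is enforced entirely in stage 1, so stage 2 can be any qdisc without corrupting or consulting the timestamps.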
* Re: [PATCH net-next] netem: fix classful handling
2011-12-29 9:12 ` Eric Dumazet
2011-12-29 16:52 ` Hagen Paul Pfeifer
@ 2011-12-30 22:12 ` David Miller
1 sibling, 0 replies; 22+ messages in thread
From: David Miller @ 2011-12-30 22:12 UTC (permalink / raw)
To: eric.dumazet; +Cc: shemminger, dave.taht, jsullivan, netdev
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 29 Dec 2011 10:12:02 +0100
> [PATCH v2 net-next] netem: fix classful handling
Applied, thanks Eric.
Thread overview: 22+ messages
2011-12-18 5:12 netem and hierarchical ingress traffic shaping John A. Sullivan III
2011-12-18 19:55 ` Stephen Hemminger
2011-12-19 16:53 ` John A. Sullivan III
2011-12-23 17:33 ` Eric Dumazet
2011-12-23 17:39 ` Eric Dumazet
2011-12-23 17:54 ` Dave Taht
2011-12-23 18:28 ` Eric Dumazet
2011-12-23 18:54 ` Dave Taht
2011-12-23 19:07 ` Stephen Hemminger
2011-12-23 19:21 ` Eric Dumazet
2011-12-29 4:26 ` [PATCH net-next] netem: fix classful handling Eric Dumazet
2011-12-29 6:17 ` Stephen Hemminger
2011-12-29 9:12 ` Eric Dumazet
2011-12-29 16:52 ` Hagen Paul Pfeifer
2011-12-29 17:15 ` Eric Dumazet
2011-12-29 17:43 ` Hagen Paul Pfeifer
2011-12-29 18:10 ` Eric Dumazet
2011-12-29 18:25 ` Hagen Paul Pfeifer
2011-12-29 18:31 ` Stephen Hemminger
2011-12-29 18:36 ` Eric Dumazet
2011-12-30 22:12 ` David Miller
2011-12-23 19:36 ` netem and hierarchical ingress traffic shaping David Miller