* netem and hierarchical ingress traffic shaping
From: John A. Sullivan III @ 2011-12-18 5:12 UTC
To: netdev

Hello, all. I am having some delightful success building a test WAN environment using hfsc and netem. We have placed netem on both ingress and egress for various test-environment reasons. We are also using hfsc on both ingress and egress for traffic shaping.

Since netem appears to be classless, we realized we would need to replace the SFQ on each leaf with netem, which we really didn't want to do - not only to avoid losing SFQ, but because we didn't want to maintain the netem parameters on each leaf. So we activated our ifb1 interface, placed netem on it, and redirected all the egress traffic to ifb1. That worked fine.

However, how do we do this on the ingress side? To use hfsc on ingress, we are already redirecting to ifb0. We can't redirect ifb0 to ifb1, and if we apply multiple filters, one to redirect into ifb0 and the other to ifb1, only the first match takes effect. So we are living with replacing SFQ on the ifb0 leaves with netem. Is there any other way?
Thanks - John

In case it is of interest, here is the set of rules we are using:

tc qdisc add dev eth1 root handle 1: hfsc default 20
tc class add dev eth1 parent 1: classid 1:1 hfsc sc rate 1490kbit ul rate 1490kbit
tc class add dev eth1 parent 1:1 classid 1:20 hfsc rt rate 400kbit ls rate 200kbit
tc qdisc add dev eth1 parent 1:20 handle 1201 sfq perturb 10
tc class add dev eth1 parent 1:1 classid 1:10 hfsc rt umax 16kbit dmax 50ms rate 200kbit ls rate 1000kbit
tc qdisc add dev eth1 parent 1:10 handle 1101 sfq perturb 60
tc class add dev eth1 parent 1:1 classid 1:30 hfsc rt umax 1514b dmax 20ms rate 20kbit
tc qdisc add dev eth1 parent 1:30 handle 1301 sfq perturb 60

iptables -t mangle -A POSTROUTING -p 6 --syn --dport 443 -j CONNMARK --set-mark 0x10
iptables -t mangle -A PREROUTING -p 6 --syn --dport 822 -j CONNMARK --set-mark 0x11
iptables -t mangle -A POSTROUTING -o eth1 -p 6 -j CONNMARK --restore-mark

tc filter add dev eth1 parent 1:0 protocol ip prio 10 handle 0x10 fw flowid 1:10
tc filter add dev eth1 parent 1:0 protocol ip prio 10 handle 0x11 fw flowid 1:30

tc qdisc add dev eth1 ingress
modprobe ifb
ifconfig ifb0 up
tc filter add dev eth1 parent ffff: protocol ip prio 50 u32 match u32 0 0 action mirred egress redirect dev ifb0

tc qdisc add dev ifb0 root handle 1: hfsc default 20
tc class add dev ifb0 parent 1: classid 1:1 hfsc sc rate 1490kbit ul rate 1490kbit
tc class add dev ifb0 parent 1:1 classid 1:20 hfsc rt rate 400kbit ls rate 200kbit
tc qdisc add dev ifb0 parent 1:20 handle 1201 netem delay 25ms 5ms distribution normal loss 0.1% 30%
tc class add dev ifb0 parent 1:1 classid 1:10 hfsc rt umax 16kbit dmax 50ms rate 200kbit ls rate 1000kbit
tc qdisc add dev ifb0 parent 1:10 handle 1101 netem delay 25ms 5ms distribution normal loss 0.1% 30%
tc class add dev ifb0 parent 1:1 classid 1:30 hfsc rt umax 1514b dmax 20ms rate 20kbit
tc qdisc add dev ifb0 parent 1:30 handle 1301 netem delay 25ms 5ms distribution normal loss 0.1% 30%

tc filter add dev ifb0 parent 1:0 protocol ip prio 1 handle 6: u32 divisor 1
tc filter add dev ifb0 parent 1:0 protocol ip prio 1 u32 match ip protocol 6 0xff link 6: offset at 0 mask 0x0f00 shift 6 plus 0 eat
tc filter add dev ifb0 parent 1:0 protocol ip prio 1 u32 ht 6:0 match tcp src 443 0x00ff flowid 1:10
tc filter add dev ifb0 parent 1:0 protocol ip prio 1 u32 ht 6:0 match tcp dst 822 0xff00 flowid 1:30

ifconfig ifb1 up
tc qdisc add dev ifb1 root handle 2 netem delay 25ms 5ms distribution normal loss 0.1% 30%
tc filter add dev eth1 parent 1:0 protocol ip prio 1 u32 match u32 0 0 action mirred egress redirect dev ifb1

^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: netem and hierarchical ingress traffic shaping
From: Stephen Hemminger @ 2011-12-18 19:55 UTC
To: John A. Sullivan III; Cc: netdev

On Sun, 18 Dec 2011 00:12:12 -0500 "John A. Sullivan III" <jsullivan@opensourcedevel.com> wrote:

> Since netem appears to be classless, we realized we would need to
> replace the SFQ on each leaf with netem which we really didn't want to
> do - not only to not lose SFQ but because we didn't want to maintain the
> netem parameters on each leaf. So, we activated our ifb1 interface,
> placed netem on it and redirected all the egress traffic to ifb1. That
> worked fine.

Current versions of netem can take one class.
* Re: netem and hierarchical ingress traffic shaping
From: John A. Sullivan III @ 2011-12-19 16:53 UTC
To: Stephen Hemminger; Cc: netdev

On Sun, 2011-12-18 at 11:55 -0800, Stephen Hemminger wrote:
> <snip>
> Current versions of netem can take one class.

Thanks, but how would that work? For example, I assume we could not send netem-processed packets to SFQ because SFQ is classless. We tried anyway and got:

root@testswitch01:~# tc class add dev ifb0 parent 1201: classid 1202 sfq perturb 60
Error: Qdisc "sfq" is classless.
root@testswitch01:~# tc qdisc add dev ifb0 parent 1201: handle 1202 sfq perturb 60
RTNETLINK answers: Operation not supported

1201 is a netem qdisc at the end of hfsc. Not being quite sure what you meant, we thought we'd experiment before asking, so we tried to attach an hfsc class to the netem qdisc. We didn't think it would work, and it didn't:

root@testswitch01:~# tc qdisc add dev eth3 root handle 7 netem delay 25ms 5ms distribution normal loss 0.1% 30%
root@testswitch01:~# tc class add dev eth3 parent 7: classid 7:1 hfsc sc rate 1490kbit ul rate 1490kbit
RTNETLINK answers: Invalid argument

So we tried adding the hfsc qdisc first:

root@testswitch01:~# tc qdisc add dev eth3 parent 7: handle 7:1 hfsc default 70
RTNETLINK answers: File exists

Then, just for kicks, we tried adding an SFQ qdisc in case the problem with the previous attempts was that the netem qdisc was not at the top level:

root@testswitch01:~# tc qdisc add dev eth3 parent 7: handle 7:1 sfq perturb 60
RTNETLINK answers: File exists

So I'm still not sure how we can do anything other than stick netem on the end of one of our branches or at root. Thanks - John
* Re: netem and hierarchical ingress traffic shaping
From: Eric Dumazet @ 2011-12-23 17:33 UTC
To: Stephen Hemminger; Cc: John A. Sullivan III, netdev

On Sunday, 18 December 2011 at 11:55 -0800, Stephen Hemminger wrote:
> <snip>
> Current versions of netem can take one class.

Hmm, I can see that (commit 10f6dfcfde884441).

But it won't work very well: it assumes the qdisc uses a single queue if netem reordering is requested:

(__skb_queue_head(&q->qdisc->q, skb))

We should allow reordering when the netem queue is changed from tfifo only if the new qdisc is compatible with __skb_queue_head(&q->qdisc->q, skb) (maybe providing a new ->queue_at_head() op).
* Re: netem and hierarchical ingress traffic shaping
From: Eric Dumazet @ 2011-12-23 17:39 UTC
To: Stephen Hemminger; Cc: John A. Sullivan III, netdev

On Friday, 23 December 2011 at 18:33 +0100, Eric Dumazet wrote:
> Hmm, I can see that (commit 10f6dfcfde884441).
>
> But it won't work very well: it assumes the qdisc uses a single queue if
> netem reordering is requested:
>
> (__skb_queue_head(&q->qdisc->q, skb))
>
> We should allow reordering when the netem queue is changed from tfifo
> only if the new qdisc is compatible with __skb_queue_head(&q->qdisc->q, skb)
>
> (maybe providing a new ->queue_at_head() op)

Also, the child qdisc must not scratch skb->cb[], since netem stores time_to_send in it.

I guess nobody actually tried this netem mis-feature :(

I'll send a fix.
* Re: netem and hierarchical ingress traffic shaping
From: Dave Taht @ 2011-12-23 17:54 UTC
To: Eric Dumazet; Cc: Stephen Hemminger, John A. Sullivan III, netdev

On Fri, Dec 23, 2011 at 6:39 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> <snip>
> Also, the child qdisc must not scratch skb->cb[], since netem stores
> time_to_send in it.

Are there any places where all 48 bytes of cb are used?

I wouldn't mind if 'time_to_send' became a separate skb field for a more generic 'time_in_queue'...

> I guess nobody actually tried this netem mis-feature :(
>
> I'll send a fix.

--
Dave Täht
SKYPE: davetaht
US Tel: 1-239-829-5608
FR Tel: 0638645374
http://www.bufferbloat.net
* Re: netem and hierarchical ingress traffic shaping
From: Eric Dumazet @ 2011-12-23 18:28 UTC
To: Dave Taht; Cc: Stephen Hemminger, John A. Sullivan III, netdev

On Friday, 23 December 2011 at 18:54 +0100, Dave Taht wrote:
> Are there any places where all 48 bytes of cb are used?

Yes, but not in the qdisc layer.

struct tcp_skb_cb is known to be 44 bytes (when IPv6 is enabled).

In the qdisc layer, we use a small part of it, for the moment.

> I wouldn't mind if 'time_to_send' became a separate skb field
> for a more generic 'time_in_queue'...

This won't happen.

As I posted in an earlier patch, this can be added in "struct qdisc_skb_cb".
* Re: netem and hierarchical ingress traffic shaping
From: Dave Taht @ 2011-12-23 18:54 UTC
To: Eric Dumazet; Cc: Stephen Hemminger, John A. Sullivan III, netdev

On Fri, Dec 23, 2011 at 7:28 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> <snip>
> As I posted in an earlier patch, this can be added in "struct
> qdisc_skb_cb"

Ah... this patch? http://patchwork.ozlabs.org/patch/125329/

I liked what I saw then, but nobody chirped up to review it... I guess I know what I'm doing this weekend.

--
Dave Täht
* Re: netem and hierarchical ingress traffic shaping
From: Stephen Hemminger @ 2011-12-23 19:07 UTC
To: Eric Dumazet; Cc: Dave Taht, John A. Sullivan III, netdev

On Fri, 23 Dec 2011 19:28:27 +0100 Eric Dumazet <eric.dumazet@gmail.com> wrote:
> <snip>
> As I posted in an earlier patch, this can be added in "struct
> qdisc_skb_cb"

skb_cb is the dumping ground of the networking layer. The assumption was that the qdisc could use the skb_cb for its own scratchpad. Netem is using it for tagging packets in the queue.

So basically, netem, choke, and sfb are incompatible with each other. This is not that bad; why not add a flag to qdisc ops to indicate which qdiscs are using cb, and block the user from trying to do something bogus?
* Re: netem and hierarchical ingress traffic shaping
From: Eric Dumazet @ 2011-12-23 19:21 UTC
To: Stephen Hemminger; Cc: Dave Taht, John A. Sullivan III, netdev

On Friday, 23 December 2011 at 11:07 -0800, Stephen Hemminger wrote:
> skb_cb is the dumping ground of the networking layer.
> The assumption was that the qdisc could use the skb_cb
> for its own scratchpad. Netem is using it for tagging
> packets in the queue.
>
> So basically, netem, choke, and sfb are incompatible with
> each other. This is not that bad; why not add a flag to qdisc
> ops to indicate which qdiscs are using cb, and block the user
> from trying to do something bogus?

This is not how I planned to solve the problem.

I think we need an internal tfifo for netem use, and then the ability to add another qdisc on top of netem:

tfifo as a first stage, hardcoded (only limit is tunable)
[ optional 2nd stage, any qdisc ]

netem_queue() {
	queue packet to tfifo (possibly at head, for reordering)
}

netem_dequeue() {
	if (other_qdisc) {
		for_each_packet_from_tfifo_time_ready() {
			other_qdisc->enqueue(skb);
		}
		try_to_dequeue_one_packet_from(other_qdisc);
	} else {
		dequeue_one_packet_from_tfifo_time_ready();
	}
}
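The two-stage design sketched in that pseudocode can be modelled in a few lines of userspace Python. This is a toy illustration, not kernel code: the class name and helper shapes are invented for the sketch, and the optional child qdisc is reduced to a plain FIFO deque standing in for any real qdisc.

```python
import heapq
from collections import deque

class NetemTwoStage:
    """Toy model of the proposed design: an internal time-ordered
    fifo (tfifo) feeding an optional second-stage child qdisc."""

    def __init__(self, child=None):
        self.tfifo = []      # heap of (time_to_send, seq, pkt)
        self.seq = 0         # tie-breaker to keep FIFO order for equal stamps
        self.child = child   # optional 2nd stage (a plain deque here)

    def enqueue(self, pkt, time_to_send):
        heapq.heappush(self.tfifo, (time_to_send, self.seq, pkt))
        self.seq += 1

    def dequeue(self, now):
        if self.child is not None:
            # move every packet whose release time has passed into the child,
            # then let the child decide which packet leaves next
            while self.tfifo and self.tfifo[0][0] <= now:
                _, _, pkt = heapq.heappop(self.tfifo)
                self.child.append(pkt)
            return self.child.popleft() if self.child else None
        # classless case: release one ready packet straight from the tfifo
        if self.tfifo and self.tfifo[0][0] <= now:
            return heapq.heappop(self.tfifo)[2]
        return None
```

With a child attached, every packet whose time_to_send has passed is pushed into the second stage before one packet is pulled out; without one, packets are released straight from the tfifo, which corresponds to the classless behavior.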
* [PATCH net-next] netem: fix classful handling
From: Eric Dumazet @ 2011-12-29 4:26 UTC
To: David Miller; Cc: Dave Taht, John A. Sullivan III, netdev, Stephen Hemminger

Commit 10f6dfcfde (Revert "sch_netem: Remove classful functionality")
reintroduced classful functionality to netem, but broke basic netem
behavior:

netem uses a t(ime)fifo queue, and stores timestamps in skb->cb[].

If the qdisc is changed, time constraints are not respected and another
qdisc can destroy skb->cb[] and block netem at dequeue time.

Fix this by always using the internal tfifo, and optionally attach a
child qdisc to netem.

Example of use:

DEV=eth3
tc qdisc del dev $DEV root
tc qdisc add dev $DEV root handle 30: est 1sec 8sec netem delay 20ms 10ms
tc qdisc add dev $DEV parent 30:0 sfq

# tc -s -d qdisc show dev eth3
qdisc netem 30: root refcnt 18 limit 1000 delay 20.0ms  10.0ms
 Sent 893810 bytes 891 pkt (dropped 0, overlimits 0 requeues 0)
 rate 690192bit 61pps backlog 5972b 2p requeues 0
qdisc sfq 800b: parent 30: limit 127p quantum 1514b flows 127/1024 divisor 1024
 Sent 893810 bytes 891 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Stephen Hemminger <shemminger@vyatta.com>
---
 net/sched/sch_netem.c | 204 +++++++++++++++------------------------
 1 file changed, 81 insertions(+), 123 deletions(-)

diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index ffcaa59..641bee5 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -67,7 +67,11 @@
  */
 struct netem_sched_data {
+	/* internal t(ime)fifo qdisc uses sch->q and sch->limit */
+
+	/* optional qdisc for classful handling (NULL at netem init) */
 	struct Qdisc	*qdisc;
+
 	struct qdisc_watchdog watchdog;
 
 	psched_tdiff_t latency;
@@ -117,7 +121,9 @@ struct netem_sched_data {
 };
 
-/* Time stamp put into socket buffer control block */
+/* Time stamp put into socket buffer control block
+ * Only valid when skbs are in our internal t(ime)fifo queue.
+ */
 struct netem_skb_cb {
 	psched_time_t	time_to_send;
 };
@@ -324,6 +330,31 @@ static psched_time_t packet_len_2_sched_time(unsigned int len, struct netem_sche
 	return PSCHED_NS2TICKS(ticks);
 }
 
+static int tfifo_enqueue(struct sk_buff *nskb, struct Qdisc *sch)
+{
+	struct sk_buff_head *list = &sch->q;
+	psched_time_t tnext = netem_skb_cb(nskb)->time_to_send;
+	struct sk_buff *skb;
+
+	if (likely(skb_queue_len(list) < sch->limit)) {
+		skb = skb_peek_tail(list);
+		/* Optimize for add at tail */
+		if (likely(!skb || tnext >= netem_skb_cb(skb)->time_to_send))
+			return qdisc_enqueue_tail(nskb, sch);
+
+		skb_queue_reverse_walk(list, skb) {
+			if (tnext >= netem_skb_cb(skb)->time_to_send)
+				break;
+		}
+
+		__skb_queue_after(list, skb, nskb);
+		sch->qstats.backlog += qdisc_pkt_len(nskb);
+		return NET_XMIT_SUCCESS;
+	}
+
+	return qdisc_reshape_fail(nskb, sch);
+}
+
 /*
  * Insert one skb into qdisc.
  * Note: parent depends on return value to account for queue length.
@@ -399,7 +430,7 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 		now = psched_get_time();
 
 		if (q->rate) {
-			struct sk_buff_head *list = &q->qdisc->q;
+			struct sk_buff_head *list = &sch->q;
 
 			delay += packet_len_2_sched_time(skb->len, q);
 
@@ -417,7 +448,7 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 		cb->time_to_send = now + delay;
 		++q->counter;
-		ret = qdisc_enqueue(skb, q->qdisc);
+		ret = tfifo_enqueue(skb, sch);
 	} else {
 		/*
 		 * Do re-ordering by putting one out of N packets at the front
@@ -426,9 +457,7 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 		cb->time_to_send = psched_get_time();
 		q->counter = 0;
 
-		__skb_queue_head(&q->qdisc->q, skb);
-		q->qdisc->qstats.backlog += qdisc_pkt_len(skb);
-		q->qdisc->qstats.requeues++;
+		__skb_queue_head(&sch->q, skb);
 		ret = NET_XMIT_SUCCESS;
 	}
 
@@ -439,19 +468,20 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 		}
 	}
 
-	sch->q.qlen++;
 	return NET_XMIT_SUCCESS;
 }
 
 static unsigned int netem_drop(struct Qdisc *sch)
 {
 	struct netem_sched_data *q = qdisc_priv(sch);
-	unsigned int len = 0;
+	unsigned int len;
 
-	if (q->qdisc->ops->drop && (len = q->qdisc->ops->drop(q->qdisc)) != 0) {
-		sch->q.qlen--;
+	len = qdisc_queue_drop(sch);
+	if (!len && q->qdisc && q->qdisc->ops->drop)
+		len = q->qdisc->ops->drop(q->qdisc);
+	if (len)
 		sch->qstats.drops++;
-	}
+
 	return len;
 }
 
@@ -463,16 +493,16 @@ static struct sk_buff *netem_dequeue(struct Qdisc *sch)
 	if (qdisc_is_throttled(sch))
 		return NULL;
 
-	skb = q->qdisc->ops->peek(q->qdisc);
+tfifo_dequeue:
+	skb = qdisc_peek_head(sch);
 	if (skb) {
 		const struct netem_skb_cb *cb = netem_skb_cb(skb);
-		psched_time_t now = psched_get_time();
 
 		/* if more time remaining? */
-		if (cb->time_to_send <= now) {
-			skb = qdisc_dequeue_peeked(q->qdisc);
+		if (cb->time_to_send <= psched_get_time()) {
+			skb = qdisc_dequeue_tail(sch);
 			if (unlikely(!skb))
-				return NULL;
+				goto qdisc_dequeue;
 
 #ifdef CONFIG_NET_CLS_ACT
 			/*
@@ -483,15 +513,37 @@ static struct sk_buff *netem_dequeue(struct Qdisc *sch)
 				skb->tstamp.tv64 = 0;
 #endif
 
-			sch->q.qlen--;
+			if (q->qdisc) {
+				int err = qdisc_enqueue(skb, q->qdisc);
+
+				if (unlikely(err != NET_XMIT_SUCCESS)) {
+					if (net_xmit_drop_count(err)) {
+						sch->qstats.drops++;
+						qdisc_tree_decrease_qlen(sch, 1);
+					}
+				}
+				goto tfifo_dequeue;
+			}
+deliver:
 			qdisc_unthrottled(sch);
 			qdisc_bstats_update(sch, skb);
 			return skb;
 		}
 
+		if (q->qdisc) {
+			skb = q->qdisc->ops->dequeue(q->qdisc);
+			if (skb)
+				goto deliver;
+		}
 		qdisc_watchdog_schedule(&q->watchdog, cb->time_to_send);
 	}
 
+qdisc_dequeue:
+	if (q->qdisc) {
+		skb = q->qdisc->ops->dequeue(q->qdisc);
+		if (skb)
+			goto deliver;
+	}
 	return NULL;
 }
 
@@ -499,8 +551,9 @@ static void netem_reset(struct Qdisc *sch)
 {
 	struct netem_sched_data *q = qdisc_priv(sch);
 
-	qdisc_reset(q->qdisc);
-	sch->q.qlen = 0;
+	qdisc_reset_queue(sch);
+	if (q->qdisc)
+		qdisc_reset(q->qdisc);
 	qdisc_watchdog_cancel(&q->watchdog);
 }
 
@@ -689,11 +742,7 @@ static int netem_change(struct Qdisc *sch, struct nlattr *opt)
 	if (ret < 0)
 		return ret;
 
-	ret = fifo_set_limit(q->qdisc, qopt->limit);
-	if (ret) {
-		pr_info("netem: can't set fifo limit\n");
-		return ret;
-	}
+	sch->limit = qopt->limit;
 
 	q->latency = qopt->latency;
 	q->jitter = qopt->jitter;
@@ -734,88 +783,6 @@ static int netem_change(struct Qdisc *sch, struct nlattr *opt)
 	return ret;
 }
 
-/*
- * Special case version of FIFO queue for use by netem.
- * It queues in order based on timestamps in skb's
- */
-struct fifo_sched_data {
-	u32 limit;
-	psched_time_t oldest;
-};
-
-static int tfifo_enqueue(struct sk_buff *nskb, struct Qdisc *sch)
-{
-	struct fifo_sched_data *q = qdisc_priv(sch);
-	struct sk_buff_head *list = &sch->q;
-	psched_time_t tnext = netem_skb_cb(nskb)->time_to_send;
-	struct sk_buff *skb;
-
-	if (likely(skb_queue_len(list) < q->limit)) {
-		/* Optimize for add at tail */
-		if (likely(skb_queue_empty(list) || tnext >= q->oldest)) {
-			q->oldest = tnext;
-			return qdisc_enqueue_tail(nskb, sch);
-		}
-
-		skb_queue_reverse_walk(list, skb) {
-			const struct netem_skb_cb *cb = netem_skb_cb(skb);
-
-			if (tnext >= cb->time_to_send)
-				break;
-		}
-
-		__skb_queue_after(list, skb, nskb);
-
-		sch->qstats.backlog += qdisc_pkt_len(nskb);
-
-		return NET_XMIT_SUCCESS;
-	}
-
-	return qdisc_reshape_fail(nskb, sch);
-}
-
-static int tfifo_init(struct Qdisc *sch, struct nlattr *opt)
-{
-	struct fifo_sched_data *q = qdisc_priv(sch);
-
-	if (opt) {
-		struct tc_fifo_qopt *ctl = nla_data(opt);
-		if (nla_len(opt) < sizeof(*ctl))
-			return -EINVAL;
-
-		q->limit = ctl->limit;
-	} else
-		q->limit = max_t(u32, qdisc_dev(sch)->tx_queue_len, 1);
-
-	q->oldest = PSCHED_PASTPERFECT;
-	return 0;
-}
-
-static int tfifo_dump(struct Qdisc *sch, struct sk_buff *skb)
-{
-	struct fifo_sched_data *q = qdisc_priv(sch);
-	struct tc_fifo_qopt opt = { .limit = q->limit };
-
-	NLA_PUT(skb, TCA_OPTIONS, sizeof(opt), &opt);
-	return skb->len;
-
-nla_put_failure:
-	return -1;
-}
-
-static struct Qdisc_ops tfifo_qdisc_ops __read_mostly = {
-	.id		= "tfifo",
-	.priv_size	= sizeof(struct fifo_sched_data),
-	.enqueue	= tfifo_enqueue,
-	.dequeue	= qdisc_dequeue_head,
-	.peek		= qdisc_peek_head,
-	.drop		= qdisc_queue_drop,
-	.init		= tfifo_init,
-	.reset		= qdisc_reset_queue,
-	.change		= tfifo_init,
-	.dump		= tfifo_dump,
-};
-
 static int netem_init(struct Qdisc *sch, struct nlattr *opt)
 {
 	struct netem_sched_data *q = qdisc_priv(sch);
@@ -827,18 +794,9 @@ static int netem_init(struct Qdisc *sch, struct nlattr *opt)
 	qdisc_watchdog_init(&q->watchdog, sch);
 
 	q->loss_model = CLG_RANDOM;
-	q->qdisc = qdisc_create_dflt(sch->dev_queue, &tfifo_qdisc_ops,
-				     TC_H_MAKE(sch->handle, 1));
-	if (!q->qdisc) {
-		pr_notice("netem: qdisc create tfifo qdisc failed\n");
-		return -ENOMEM;
-	}
-
 	ret = netem_change(sch, opt);
-	if (ret) {
+	if (ret)
 		pr_info("netem: change failed\n");
-		qdisc_destroy(q->qdisc);
-	}
 	return ret;
 }
 
@@ -847,7 +805,8 @@ static void netem_destroy(struct Qdisc *sch)
 {
 	struct netem_sched_data *q = qdisc_priv(sch);
 
 	qdisc_watchdog_cancel(&q->watchdog);
-	qdisc_destroy(q->qdisc);
+	if (q->qdisc)
+		qdisc_destroy(q->qdisc);
 	dist_free(q->delay_dist);
 }
 
@@ -951,7 +910,7 @@ static int netem_dump_class(struct Qdisc *sch, unsigned long cl,
 {
 	struct netem_sched_data *q = qdisc_priv(sch);
 
-	if (cl != 1)	/* only one class */
+	if (cl != 1 || !q->qdisc)	/* only one class */
 		return -ENOENT;
 
 	tcm->tcm_handle |= TC_H_MIN(1);
@@ -965,14 +924,13 @@ static int netem_graft(struct Qdisc *sch, unsigned long arg, struct Qdisc *new,
 {
 	struct netem_sched_data *q = qdisc_priv(sch);
 
-	if (new == NULL)
-		new = &noop_qdisc;
-
 	sch_tree_lock(sch);
 	*old = q->qdisc;
 	q->qdisc = new;
-	qdisc_tree_decrease_qlen(*old, (*old)->q.qlen);
-	qdisc_reset(*old);
+	if (*old) {
+		qdisc_tree_decrease_qlen(*old, (*old)->q.qlen);
+		qdisc_reset(*old);
+	}
 	sch_tree_unlock(sch);
 
 	return 0;
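The heart of the patch is the time-ordered insert in tfifo_enqueue(). Its behavior can be sketched in userspace Python (a toy model with an invented function shape; the kernel version walks an skb list in reverse and calls qdisc_reshape_fail() when over limit, here reduced to returning False):

```python
def tfifo_enqueue(queue, pkt, time_to_send, limit):
    """Toy model of the patch's tfifo_enqueue(): keep `queue` (a list of
    (time_to_send, pkt) tuples) ordered by release time.

    Mirrors the fast path (append at the tail when the new stamp is not
    earlier than the last one) and the reverse walk otherwise.  Returns
    False on overlimit, standing in for qdisc_reshape_fail()."""
    if len(queue) >= limit:
        return False
    # Optimize for add at tail: most packets arrive in time order
    if not queue or time_to_send >= queue[-1][0]:
        queue.append((time_to_send, pkt))
        return True
    # Walk backwards to find the last packet due no later than ours
    i = len(queue) - 1
    while i >= 0 and queue[i][0] > time_to_send:
        i -= 1
    queue.insert(i + 1, (time_to_send, pkt))
    return True
```

Jittered delays produce out-of-order time_to_send stamps, and this insert is what re-sorts them so that netem can always release the head of its internal queue.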
* Re: [PATCH net-next] netem: fix classful handling
From: Stephen Hemminger @ 2011-12-29 6:17 UTC
To: Eric Dumazet; Cc: David Miller, Dave Taht, John A. Sullivan III, netdev

On Thu, 29 Dec 2011 05:26:00 +0100 Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Commit 10f6dfcfde (Revert "sch_netem: Remove classful functionality")
> reintroduced classful functionality to netem, but broke basic netem
> behavior:
>
> netem uses a t(ime)fifo queue, and stores timestamps in skb->cb[].
>
> If the qdisc is changed, time constraints are not respected and another
> qdisc can destroy skb->cb[] and block netem at dequeue time.
>
> Fix this by always using the internal tfifo, and optionally attach a
> child qdisc to netem.
>
> Example of use:
>
> DEV=eth3
> tc qdisc del dev $DEV root
> tc qdisc add dev $DEV root handle 30: est 1sec 8sec netem delay 20ms 10ms
> tc qdisc add dev $DEV parent 30:0 sfq

Does it work with TBF, which is a more useful option?

Also, the whole tfifo idea is only there to support the weird idea that, when doing random delay, packets should get reordered based on the results of the random value; it was a behavior some users wanted because that is what NISTnet did.
* Re: [PATCH net-next] netem: fix classful handling
From: Eric Dumazet @ 2011-12-29 9:12 UTC
To: Stephen Hemminger; Cc: David Miller, Dave Taht, John A. Sullivan III, netdev

On Wednesday, 28 December 2011 at 22:17 -0800, Stephen Hemminger wrote:
> <snip>
> Does it work with TBF, which is a more useful option?

Yes:

tc qdisc add dev $DEV root handle 30:0 est 1sec 8sec netem \
    delay 20ms 10ms
tc qdisc add dev $DEV handle 40:0 parent 30:0 tbf \
    burst 20480 limit 20480 mtu 1514 rate 32000bps

qdisc netem 30: root refcnt 18 limit 1000 delay 20.0ms  10.0ms
 Sent 190792 bytes 413 pkt (dropped 0, overlimits 0 requeues 0)
 rate 18416bit 3pps backlog 0b 0p requeues 0
qdisc tbf 40: parent 30: rate 256000bit burst 20Kb/8 mpu 0b lat 0us
 Sent 190792 bytes 413 pkt (dropped 6, overlimits 10 requeues 0)
 backlog 0b 5p requeues 0

> Also, the whole tfifo idea is only there to support the weird idea that,
> when doing random delay, packets should get reordered based on the
> results of the random value; it was a behavior some users wanted
> because that is what NISTnet did.

tfifo supports time-ordered queueing, which mimics jitter in the network. This seems quite useful.

I see what you suggest: adding 'time_to_send' in the generic qdisc cb. But it makes no sense if we attach a reordering qdisc, like SFQ: a 'high prio' packet would block the whole netem, because we would have to throttle since this packet's time_to_send is in the future, while many other eligible packets are in the queue.

In the case where no jitter is asked of netem, we directly enqueue at the queue tail with no extra cost, since we only need to access the last skb in the queue to perform the qdisc_enqueue_tail().

I am sending a v2 because of two lines I inadvertently removed in the case where we queue the new packet at the head of the tfifo (allowing the sch->limit check to be bypassed; I'll send a separate patch to add this check).

Thanks

[PATCH v2 net-next] netem: fix classful handling

Commit 10f6dfcfde (Revert "sch_netem: Remove classful functionality")
reintroduced classful functionality to netem, but broke basic netem
behavior:

netem uses a t(ime)fifo queue, and stores timestamps in skb->cb[].

If the qdisc is changed, time constraints are not respected and another
qdisc can destroy skb->cb[] and block netem at dequeue time.

Fix this by always using the internal tfifo, and optionally attach a
child qdisc to netem (or a tree of qdiscs).

Example of use:

DEV=eth3
tc qdisc del dev $DEV root
tc qdisc add dev $DEV root handle 30: est 1sec 8sec netem delay 20ms 10ms
tc qdisc add dev $DEV handle 40:0 parent 30:0 tbf \
    burst 20480 limit 20480 mtu 1514 rate 32000bps

qdisc netem 30: root refcnt 18 limit 1000 delay 20.0ms  10.0ms
 Sent 190792 bytes 413 pkt (dropped 0, overlimits 0 requeues 0)
 rate 18416bit 3pps backlog 0b 0p requeues 0
qdisc tbf 40: parent 30: rate 256000bit burst 20Kb/8 mpu 0b lat 0us
 Sent 190792 bytes 413 pkt (dropped 6, overlimits 10 requeues 0)
 backlog 0b 5p requeues 0

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Stephen Hemminger <shemminger@vyatta.com>
---
 net/sched/sch_netem.c | 202 ++++++++++++++++------------------------
 1 file changed, 81 insertions(+), 121 deletions(-)

diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index ffcaa59..1e611cb 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -67,7 +67,11 @@
  */
 struct netem_sched_data {
+	/* internal t(ime)fifo qdisc uses sch->q and sch->limit */
+
+	/* optional qdisc for classful handling (NULL at netem init) */
 	struct Qdisc	*qdisc;
+
 	struct qdisc_watchdog watchdog;
 
 	psched_tdiff_t latency;
@@ -117,7 +121,9 @@ struct netem_sched_data {
 };
 
-/* Time stamp put into socket buffer control block */
+/* Time stamp put into socket buffer control block
+ * Only valid when skbs are in our internal t(ime)fifo queue.
+ */
 struct netem_skb_cb {
 	psched_time_t	time_to_send;
 };
@@ -324,6 +330,31 @@ static psched_time_t packet_len_2_sched_time(unsigned int len, struct netem_sche
 	return PSCHED_NS2TICKS(ticks);
 }
 
+static int tfifo_enqueue(struct sk_buff *nskb, struct Qdisc *sch)
+{
+	struct sk_buff_head *list = &sch->q;
+	psched_time_t tnext = netem_skb_cb(nskb)->time_to_send;
+	struct sk_buff *skb;
+
+	if (likely(skb_queue_len(list) < sch->limit)) {
+		skb = skb_peek_tail(list);
+		/* Optimize for add at tail */
+		if (likely(!skb || tnext >= netem_skb_cb(skb)->time_to_send))
+			return qdisc_enqueue_tail(nskb, sch);
+
+		skb_queue_reverse_walk(list, skb) {
+			if (tnext >= netem_skb_cb(skb)->time_to_send)
+				break;
+		}
+
+		__skb_queue_after(list, skb, nskb);
+		sch->qstats.backlog += qdisc_pkt_len(nskb);
+		return NET_XMIT_SUCCESS;
+	}
+
+	return qdisc_reshape_fail(nskb, sch);
+}
+
 /*
  * Insert one skb into qdisc.
  * Note: parent depends on return value to account for queue length.
@@ -399,7 +430,7 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 		now = psched_get_time();
 
 		if (q->rate) {
-			struct sk_buff_head *list = &q->qdisc->q;
+			struct sk_buff_head *list = &sch->q;
 
 			delay += packet_len_2_sched_time(skb->len, q);
 
@@ -417,7 +448,7 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 		cb->time_to_send = now + delay;
 		++q->counter;
-		ret = qdisc_enqueue(skb, q->qdisc);
+		ret = tfifo_enqueue(skb, sch);
 	} else {
 		/*
 		 * Do re-ordering by putting one out of N packets at the front
@@ -426,7 +457,7 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 		cb->time_to_send = psched_get_time();
 		q->counter = 0;
 
-		__skb_queue_head(&q->qdisc->q, skb);
+		__skb_queue_head(&sch->q, skb);
 		q->qdisc->qstats.backlog += qdisc_pkt_len(skb);
 		q->qdisc->qstats.requeues++;
 		ret = NET_XMIT_SUCCESS;
@@ -439,19 +470,20 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 		}
 	}
 
-	sch->q.qlen++;
 	return NET_XMIT_SUCCESS;
 }
 
 static unsigned int netem_drop(struct Qdisc *sch)
 {
struct netem_sched_data *q = qdisc_priv(sch); - unsigned int len = 0; + unsigned int len; - if (q->qdisc->ops->drop && (len = q->qdisc->ops->drop(q->qdisc)) != 0) { - sch->q.qlen--; + len = qdisc_queue_drop(sch); + if (!len && q->qdisc && q->qdisc->ops->drop) + len = q->qdisc->ops->drop(q->qdisc); + if (len) sch->qstats.drops++; - } + return len; } @@ -463,16 +495,16 @@ static struct sk_buff *netem_dequeue(struct Qdisc *sch) if (qdisc_is_throttled(sch)) return NULL; - skb = q->qdisc->ops->peek(q->qdisc); +tfifo_dequeue: + skb = qdisc_peek_head(sch); if (skb) { const struct netem_skb_cb *cb = netem_skb_cb(skb); - psched_time_t now = psched_get_time(); /* if more time remaining? */ - if (cb->time_to_send <= now) { - skb = qdisc_dequeue_peeked(q->qdisc); + if (cb->time_to_send <= psched_get_time()) { + skb = qdisc_dequeue_tail(sch); if (unlikely(!skb)) - return NULL; + goto qdisc_dequeue; #ifdef CONFIG_NET_CLS_ACT /* @@ -483,15 +515,37 @@ static struct sk_buff *netem_dequeue(struct Qdisc *sch) skb->tstamp.tv64 = 0; #endif - sch->q.qlen--; + if (q->qdisc) { + int err = qdisc_enqueue(skb, q->qdisc); + + if (unlikely(err != NET_XMIT_SUCCESS)) { + if (net_xmit_drop_count(err)) { + sch->qstats.drops++; + qdisc_tree_decrease_qlen(sch, 1); + } + } + goto tfifo_dequeue; + } +deliver: qdisc_unthrottled(sch); qdisc_bstats_update(sch, skb); return skb; } + if (q->qdisc) { + skb = q->qdisc->ops->dequeue(q->qdisc); + if (skb) + goto deliver; + } qdisc_watchdog_schedule(&q->watchdog, cb->time_to_send); } +qdisc_dequeue: + if (q->qdisc) { + skb = q->qdisc->ops->dequeue(q->qdisc); + if (skb) + goto deliver; + } return NULL; } @@ -499,8 +553,9 @@ static void netem_reset(struct Qdisc *sch) { struct netem_sched_data *q = qdisc_priv(sch); - qdisc_reset(q->qdisc); - sch->q.qlen = 0; + qdisc_reset_queue(sch); + if (q->qdisc) + qdisc_reset(q->qdisc); qdisc_watchdog_cancel(&q->watchdog); } @@ -689,11 +744,7 @@ static int netem_change(struct Qdisc *sch, struct nlattr *opt) if (ret < 0) return 
ret; - ret = fifo_set_limit(q->qdisc, qopt->limit); - if (ret) { - pr_info("netem: can't set fifo limit\n"); - return ret; - } + sch->limit = qopt->limit; q->latency = qopt->latency; q->jitter = qopt->jitter; @@ -734,88 +785,6 @@ static int netem_change(struct Qdisc *sch, struct nlattr *opt) return ret; } -/* - * Special case version of FIFO queue for use by netem. - * It queues in order based on timestamps in skb's - */ -struct fifo_sched_data { - u32 limit; - psched_time_t oldest; -}; - -static int tfifo_enqueue(struct sk_buff *nskb, struct Qdisc *sch) -{ - struct fifo_sched_data *q = qdisc_priv(sch); - struct sk_buff_head *list = &sch->q; - psched_time_t tnext = netem_skb_cb(nskb)->time_to_send; - struct sk_buff *skb; - - if (likely(skb_queue_len(list) < q->limit)) { - /* Optimize for add at tail */ - if (likely(skb_queue_empty(list) || tnext >= q->oldest)) { - q->oldest = tnext; - return qdisc_enqueue_tail(nskb, sch); - } - - skb_queue_reverse_walk(list, skb) { - const struct netem_skb_cb *cb = netem_skb_cb(skb); - - if (tnext >= cb->time_to_send) - break; - } - - __skb_queue_after(list, skb, nskb); - - sch->qstats.backlog += qdisc_pkt_len(nskb); - - return NET_XMIT_SUCCESS; - } - - return qdisc_reshape_fail(nskb, sch); -} - -static int tfifo_init(struct Qdisc *sch, struct nlattr *opt) -{ - struct fifo_sched_data *q = qdisc_priv(sch); - - if (opt) { - struct tc_fifo_qopt *ctl = nla_data(opt); - if (nla_len(opt) < sizeof(*ctl)) - return -EINVAL; - - q->limit = ctl->limit; - } else - q->limit = max_t(u32, qdisc_dev(sch)->tx_queue_len, 1); - - q->oldest = PSCHED_PASTPERFECT; - return 0; -} - -static int tfifo_dump(struct Qdisc *sch, struct sk_buff *skb) -{ - struct fifo_sched_data *q = qdisc_priv(sch); - struct tc_fifo_qopt opt = { .limit = q->limit }; - - NLA_PUT(skb, TCA_OPTIONS, sizeof(opt), &opt); - return skb->len; - -nla_put_failure: - return -1; -} - -static struct Qdisc_ops tfifo_qdisc_ops __read_mostly = { - .id = "tfifo", - .priv_size = sizeof(struct 
fifo_sched_data), - .enqueue = tfifo_enqueue, - .dequeue = qdisc_dequeue_head, - .peek = qdisc_peek_head, - .drop = qdisc_queue_drop, - .init = tfifo_init, - .reset = qdisc_reset_queue, - .change = tfifo_init, - .dump = tfifo_dump, -}; - static int netem_init(struct Qdisc *sch, struct nlattr *opt) { struct netem_sched_data *q = qdisc_priv(sch); @@ -827,18 +796,9 @@ static int netem_init(struct Qdisc *sch, struct nlattr *opt) qdisc_watchdog_init(&q->watchdog, sch); q->loss_model = CLG_RANDOM; - q->qdisc = qdisc_create_dflt(sch->dev_queue, &tfifo_qdisc_ops, - TC_H_MAKE(sch->handle, 1)); - if (!q->qdisc) { - pr_notice("netem: qdisc create tfifo qdisc failed\n"); - return -ENOMEM; - } - ret = netem_change(sch, opt); - if (ret) { + if (ret) pr_info("netem: change failed\n"); - qdisc_destroy(q->qdisc); - } return ret; } @@ -847,7 +807,8 @@ static void netem_destroy(struct Qdisc *sch) struct netem_sched_data *q = qdisc_priv(sch); qdisc_watchdog_cancel(&q->watchdog); - qdisc_destroy(q->qdisc); + if (q->qdisc) + qdisc_destroy(q->qdisc); dist_free(q->delay_dist); } @@ -951,7 +912,7 @@ static int netem_dump_class(struct Qdisc *sch, unsigned long cl, { struct netem_sched_data *q = qdisc_priv(sch); - if (cl != 1) /* only one class */ + if (cl != 1 || !q->qdisc) /* only one class */ return -ENOENT; tcm->tcm_handle |= TC_H_MIN(1); @@ -965,14 +926,13 @@ static int netem_graft(struct Qdisc *sch, unsigned long arg, struct Qdisc *new, { struct netem_sched_data *q = qdisc_priv(sch); - if (new == NULL) - new = &noop_qdisc; - sch_tree_lock(sch); *old = q->qdisc; q->qdisc = new; - qdisc_tree_decrease_qlen(*old, (*old)->q.qlen); - qdisc_reset(*old); + if (*old) { + qdisc_tree_decrease_qlen(*old, (*old)->q.qlen); + qdisc_reset(*old); + } sch_tree_unlock(sch); return 0; ^ permalink raw reply related [flat|nested] 22+ messages in thread
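The heart of the patch is the time-ordered insertion that tfifo_enqueue() performs: try the tail first (send times are usually monotonic), otherwise walk the queue backwards for the insertion point, and fail when the limit is hit. A minimal userspace sketch of that logic (Python, with dicts standing in for skbs; the function name mirrors the kernel's but everything else is illustrative):

```python
def tfifo_enqueue(queue, limit, pkt):
    """Insert pkt (a dict with a 'time_to_send' key) into queue, keeping
    it sorted by time_to_send, mirroring the kernel's tfifo_enqueue().
    Returns True on success, False when the queue is full (where the
    kernel calls qdisc_reshape_fail())."""
    if len(queue) >= limit:
        return False
    # Fast path: most packets belong at the tail
    # (the skb_peek_tail() check in the patch).
    if not queue or pkt["time_to_send"] >= queue[-1]["time_to_send"]:
        queue.append(pkt)
        return True
    # Slow path: walk backwards for the insertion point, like
    # skb_queue_reverse_walk() + __skb_queue_after().
    for i in range(len(queue) - 1, -1, -1):
        if pkt["time_to_send"] >= queue[i]["time_to_send"]:
            queue.insert(i + 1, pkt)
            return True
    queue.insert(0, pkt)
    return True
```

This also shows why jitter causes reordering under the default tfifo: a packet that draws a small random delay is inserted ahead of earlier packets that drew larger delays.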
* Re: [PATCH net-next] netem: fix classful handling
  2011-12-29  9:12 ` Eric Dumazet
@ 2011-12-29 16:52 ` Hagen Paul Pfeifer
  2011-12-29 17:15 ` Eric Dumazet
  2011-12-30 22:12 ` David Miller
  1 sibling, 1 reply; 22+ messages in thread
From: Hagen Paul Pfeifer @ 2011-12-29 16:52 UTC (permalink / raw)
To: Eric Dumazet
Cc: Stephen Hemminger, David Miller, Dave Taht, John A. Sullivan III, netdev

* Eric Dumazet | 2011-12-29 10:12:02 [+0100]:

>> Also, the whole tfifo idea is only to support the wierd idea that
>> if doing random delay that packets should get reordered based on the
>> results of the random value; it was an behavior some users wanted
>> because that is what NISTnet did.
>
>tfifo supports a time ordered queuing, wich mimics some jitter in the
>network. This seems quite useful.
>
>I see what you suggest : adding 'time_to_send' in the generic qdisc cb.
>
>But it makes no sense if we attach a reordering qdisc, like SFQ :
>A 'high prio' packet will block the whole netem because we'll have to
>throttle since this packet time_to_send will be in the future, while
>many other elligible packets are in queue.

In other words, netem jitter and a non-tfifo qdisc will not work together. Correct? The rate extension also peeks at the last packet to get the reference time (assuming a strict ordering):

[...]
now = netem_skb_cb(skb_peek_tail(list))->time_to_send;
[...]

We should avoid different (unforeseeable) behavior depending on the queue (tfifo, SFQ). Another point: operating netem and a qdisc on the same computer can lead to timing abnormalities. In our test setups we run the qdisc/tcp/whatever setups and netem on more than one computer.

Hagen
* Re: [PATCH net-next] netem: fix classful handling 2011-12-29 16:52 ` Hagen Paul Pfeifer @ 2011-12-29 17:15 ` Eric Dumazet 2011-12-29 17:43 ` Hagen Paul Pfeifer 0 siblings, 1 reply; 22+ messages in thread From: Eric Dumazet @ 2011-12-29 17:15 UTC (permalink / raw) To: Hagen Paul Pfeifer Cc: Stephen Hemminger, David Miller, Dave Taht, John A. Sullivan III, netdev Le jeudi 29 décembre 2011 à 17:52 +0100, Hagen Paul Pfeifer a écrit : > * Eric Dumazet | 2011-12-29 10:12:02 [+0100]: > > >> Also, the whole tfifo idea is only to support the wierd idea that > >> if doing random delay that packets should get reordered based on the > >> results of the random value; it was an behavior some users wanted > >> because that is what NISTnet did. > > > >tfifo supports a time ordered queuing, wich mimics some jitter in the > >network. This seems quite useful. > > > >I see what you suggest : adding 'time_to_send' in the generic qdisc cb. > > > >But it makes no sense if we attach a reordering qdisc, like SFQ : > >A 'high prio' packet will block the whole netem because we'll have to > >throttle since this packet time_to_send will be in the future, while > >many other elligible packets are in queue. > > In other words netem jitter and a qdisc !tfifo will not work. Correct? The > rate extension also peak the last packet to get the reference time (assuming a > strict ordering): > Yep, current situation is borked. It assumes we _use_ tfifo, for delay jitters but also for rate extension. > [...] > now = netem_skb_cb(skb_peek_tail(list))->time_to_send; > [...] > > > We should avoid a different (unseeable) behavior depending on the queue > (tfifo, SFQ). Another point: operate netem and qdisc on the same computer can > lead to timing abnormalities. In our test setups we operate qdisc/tcp/whatever > setups and netem on more then on computer. > After my patch you could use netem as a delay module before a complex qdisc setup for example. 
[ Simulating a 10ms delay on a 10 Gigabit link is expensive, since you need to allow up to ~150,000 packets in the tfifo. Maybe we should switch to a "tbfifo" (giving a limit in bytes, not packets). ]
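Eric only floats "tbfifo" as an idea here; nothing like it was posted in this thread. As a hedged sketch of what a byte-based admission check could look like (all names hypothetical), the only change from the tfifo logic is that the limit test counts queued bytes instead of packets:

```python
def tbfifo_enqueue(queue, byte_limit, pkt):
    """Sketch of the floated 'tbfifo' idea: the same time-ordered
    insertion as tfifo, but admission is limited by total queued bytes.
    queue holds dicts with 'time_to_send' and 'len' keys."""
    backlog = sum(p["len"] for p in queue)
    if backlog + pkt["len"] > byte_limit:
        return False  # the kernel would drop / reshape_fail here
    # Time-ordered insert, as in tfifo (tail fast path omitted).
    i = len(queue)
    while i > 0 and queue[i - 1]["time_to_send"] > pkt["time_to_send"]:
        i -= 1
    queue.insert(i, pkt)
    return True
```

A byte limit bounds memory use directly, which matters at 10 Gbit/s where packet counts alone say little about buffer footprint.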
* Re: [PATCH net-next] netem: fix classful handling
  2011-12-29 17:15 ` Eric Dumazet
@ 2011-12-29 17:43 ` Hagen Paul Pfeifer
  2011-12-29 18:10 ` Eric Dumazet
  0 siblings, 1 reply; 22+ messages in thread
From: Hagen Paul Pfeifer @ 2011-12-29 17:43 UTC (permalink / raw)
To: Eric Dumazet
Cc: Stephen Hemminger, David Miller, Dave Taht, John A. Sullivan III, netdev

* Eric Dumazet | 2011-12-29 18:15:50 [+0100]:

>> In other words netem jitter and a qdisc !tfifo will not work. Correct? The
>> rate extension also peak the last packet to get the reference time (assuming a
>> strict ordering):
>
>Yep, current situation is borked. It assumes we _use_ tfifo, for delay
>jitters but also for rate extension.

Mhh, should we signal this to the user via 'tc -s qdisc show'? Or should we assume that a user who sets netem rate|jitter AND a non-tfifo qdisc knows what he is doing, because we assume he is an experienced user? At least a comment somewhere in the manpage should point out this characteristic.
* Re: [PATCH net-next] netem: fix classful handling
  2011-12-29 17:43 ` Hagen Paul Pfeifer
@ 2011-12-29 18:10 ` Eric Dumazet
  2011-12-29 18:25 ` Hagen Paul Pfeifer
  0 siblings, 1 reply; 22+ messages in thread
From: Eric Dumazet @ 2011-12-29 18:10 UTC (permalink / raw)
To: Hagen Paul Pfeifer
Cc: Stephen Hemminger, David Miller, Dave Taht, John A. Sullivan III, netdev

Le jeudi 29 décembre 2011 à 18:43 +0100, Hagen Paul Pfeifer a écrit :
> * Eric Dumazet | 2011-12-29 18:15:50 [+0100]:
>
> >Yep, current situation is borked. It assumes we _use_ tfifo, for delay
> >jitters but also for rate extension.
>
> Mhh, should we signal this to the user via 'tc -s qdisc show'? Or should we
> assume that a user who set netem rate|jitter AND a qdisc !tfifo knows what he
> does - because we assume he is a experienced user? At least somewhere in the
> manpage a comment should point to this characteristic.

I don't quite understand the question. The patch I posted is supposed to fix the problem. What do you want to tell the user?

The netem module is probably only used by experienced users, and on very recent kernels anyway.
* Re: [PATCH net-next] netem: fix classful handling
  2011-12-29 18:10 ` Eric Dumazet
@ 2011-12-29 18:25 ` Hagen Paul Pfeifer
  2011-12-29 18:31 ` Stephen Hemminger
  2011-12-29 18:36 ` Eric Dumazet
  0 siblings, 2 replies; 22+ messages in thread
From: Hagen Paul Pfeifer @ 2011-12-29 18:25 UTC (permalink / raw)
To: Eric Dumazet
Cc: Stephen Hemminger, David Miller, Dave Taht, John A. Sullivan III, netdev

* Eric Dumazet | 2011-12-29 19:10:36 [+0100]:

>I dont quite understand the question. The patch I posted is supposed to
>fix the problem. What do you want to tell to the user ?

I assumed that the patch makes it possible to replace the standard tfifo?! Tfifo provides strict ordering; other qdiscs do not. So I thought that if someone uses netem rate|jitter with e.g. SFQ, then this should be mentioned somewhere, e.g. "rate|jitter can only be used with the tfifo qdisc". Correct me if I am wrong.

Cheers
* Re: [PATCH net-next] netem: fix classful handling 2011-12-29 18:25 ` Hagen Paul Pfeifer @ 2011-12-29 18:31 ` Stephen Hemminger 2011-12-29 18:36 ` Eric Dumazet 1 sibling, 0 replies; 22+ messages in thread From: Stephen Hemminger @ 2011-12-29 18:31 UTC (permalink / raw) To: Hagen Paul Pfeifer Cc: Eric Dumazet, David Miller, Dave Taht, John A. Sullivan III, netdev On Thu, 29 Dec 2011 19:25:03 +0100 Hagen Paul Pfeifer <hagen@jauu.net> wrote: > * Eric Dumazet | 2011-12-29 19:10:36 [+0100]: > > >I dont quite understand the question. The patch I posted is supposed to > >fix the problem. What do you want to tell to the user ? > > I assumed that the patch makes it possible to replace standard tfifo?! Tfifo > provides strict ordering, other qdisc's do not. So I thought if someone use > netem rate|jitter with e.g. SFQ then this should be mentioned somewhere. E.g. > "rate|jitter can only be used with tfifo qdisc". Correct me if I am wrong. > > Cheers This is actually a feature. It is documented that tfifo can be replaced with pfifo. In other words, if default (tfifo) is used then packets can be reordered by the delay/jitter values. But if pfifo (or other qdisc) is used, the packets will not be reordered. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH net-next] netem: fix classful handling
  2011-12-29 18:25 ` Hagen Paul Pfeifer
  2011-12-29 18:31 ` Stephen Hemminger
@ 2011-12-29 18:36 ` Eric Dumazet
  1 sibling, 0 replies; 22+ messages in thread
From: Eric Dumazet @ 2011-12-29 18:36 UTC (permalink / raw)
To: Hagen Paul Pfeifer
Cc: Stephen Hemminger, David Miller, Dave Taht, John A. Sullivan III, netdev

Le jeudi 29 décembre 2011 à 19:25 +0100, Hagen Paul Pfeifer a écrit :
> * Eric Dumazet | 2011-12-29 19:10:36 [+0100]:
>
> I assumed that the patch makes it possible to replace standard tfifo?! Tfifo
> provides strict ordering, other qdisc's do not. So I thought if someone use
> netem rate|jitter with e.g. SFQ then this should be mentioned somewhere. E.g.
> "rate|jitter can only be used with tfifo qdisc". Correct me if I am wrong.

Current netem uses a single queue, by default tfifo. If you then replace this tfifo with SFQ, you lose tfifo and netem doesn't work at all. I claim we cannot remove tfifo.

After my patch, you now have:

1) An internal, mandatory tfifo queue, to fulfill time_to_send requirements.

2) An optional qdisc (SFQ in your example), where packets are queued once dequeued from tfifo at the right time (after the netem delay/rate has been respected). A packet can eventually be delivered to the device with a _bigger_ delay than the one predicted in tfifo, because of traffic shaping happening in this optional qdisc.

Full netem block:

               +---------+                   +--------+
 -->>-enqueue--|  tfifo  |---->>>>--enqueue--|  SFQ   |--dequeue--->>>>>
               +---------+                   +--------+
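Eric's two-stage pipeline can be sketched in userspace terms (Python, with lists standing in for the internal tfifo and the child qdisc; names illustrative, not the kernel API):

```python
def netem_dequeue(tfifo, child, now):
    """Sketch of the patched dequeue path: every packet whose
    time_to_send has passed is moved from the internal tfifo into the
    optional child qdisc (modeled here as a plain FIFO list), then
    delivery happens from the child. Returns a packet, or None when
    netem would arm its watchdog and throttle."""
    # Stage 1: drain eligible packets into the child (qdisc_enqueue()).
    while tfifo and tfifo[0]["time_to_send"] <= now:
        child.append(tfifo.pop(0))
    # Stage 2: the child qdisc decides the final ordering / shaping.
    if child:
        return child.pop(0)
    return None
```

The key property this models: the child qdisc only ever sees packets whose netem delay has already elapsed, so a reordering child can no longer block the tfifo behind a single packet scheduled far in the future.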
* Re: [PATCH net-next] netem: fix classful handling 2011-12-29 9:12 ` Eric Dumazet 2011-12-29 16:52 ` Hagen Paul Pfeifer @ 2011-12-30 22:12 ` David Miller 1 sibling, 0 replies; 22+ messages in thread From: David Miller @ 2011-12-30 22:12 UTC (permalink / raw) To: eric.dumazet; +Cc: shemminger, dave.taht, jsullivan, netdev From: Eric Dumazet <eric.dumazet@gmail.com> Date: Thu, 29 Dec 2011 10:12:02 +0100 > [PATCH v2 net-next] netem: fix classful handling Applied, thanks Eric. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: netem and hierarchical ingress traffic shaping 2011-12-23 19:07 ` Stephen Hemminger 2011-12-23 19:21 ` Eric Dumazet @ 2011-12-23 19:36 ` David Miller 1 sibling, 0 replies; 22+ messages in thread From: David Miller @ 2011-12-23 19:36 UTC (permalink / raw) To: shemminger; +Cc: eric.dumazet, dave.taht, jsullivan, netdev From: Stephen Hemminger <shemminger@vyatta.com> Date: Fri, 23 Dec 2011 11:07:49 -0800 > So basically, netem, choke, and sfb are incompatible with > each other. This is not that bad, why not add a flag to qdisc > ops to indicate which qdisc are using cb and block user from > trying to do something bogus. Or we could do what we do in the inet stack, define a layout that allows the different layers to use different parts of the CB. ^ permalink raw reply [flat|nested] 22+ messages in thread
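David's suggestion mirrors what the inet stack does, where tcp_skb_cb embeds inet_skb_parm so each layer uses its own slice of skb->cb[]. A hedged sketch of that layering idea (the struct names, field names, and the 16-byte private area are purely illustrative; only the 48-byte cb[] size matches the kernels of that era):

```python
import ctypes

SKB_CB_SIZE = 48  # sizeof(skb->cb[]) in Linux kernels of that era

class QdiscCb(ctypes.Structure):
    """Generic layer every qdisc could rely on (the 'time_to_send'
    idea discussed earlier in the thread)."""
    _fields_ = [("time_to_send", ctypes.c_uint64)]

class NetemCb(ctypes.Structure):
    """Per-qdisc layer embedding the generic one first, the way
    tcp_skb_cb embeds inet_skb_parm in the inet stack."""
    _fields_ = [
        ("common", QdiscCb),                    # shared slice
        ("private_data", ctypes.c_ubyte * 16),  # qdisc-specific scratch
    ]

# The layered layout only works if every combination fits in cb[];
# that constraint is exactly why netem, choke and sfb currently clash.
assert ctypes.sizeof(NetemCb) <= SKB_CB_SIZE
```

With such a layout, qdiscs like choke or sfb would confine themselves to their private slice and leave the shared fields intact, instead of each treating all of cb[] as its own.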
end of thread, other threads:[~2011-12-30 22:12 UTC | newest]

Thread overview: 22+ messages:
2011-12-18  5:12 netem and hierarchical ingress traffic shaping John A. Sullivan III
2011-12-18 19:55 ` Stephen Hemminger
2011-12-19 16:53 ` John A. Sullivan III
2011-12-23 17:33 ` Eric Dumazet
2011-12-23 17:39 ` Eric Dumazet
2011-12-23 17:54 ` Dave Taht
2011-12-23 18:28 ` Eric Dumazet
2011-12-23 18:54 ` Dave Taht
2011-12-23 19:07 ` Stephen Hemminger
2011-12-23 19:21 ` Eric Dumazet
2011-12-29  4:26 ` [PATCH net-next] netem: fix classful handling Eric Dumazet
2011-12-29  6:17 ` Stephen Hemminger
2011-12-29  9:12 ` Eric Dumazet
2011-12-29 16:52 ` Hagen Paul Pfeifer
2011-12-29 17:15 ` Eric Dumazet
2011-12-29 17:43 ` Hagen Paul Pfeifer
2011-12-29 18:10 ` Eric Dumazet
2011-12-29 18:25 ` Hagen Paul Pfeifer
2011-12-29 18:31 ` Stephen Hemminger
2011-12-29 18:36 ` Eric Dumazet
2011-12-30 22:12 ` David Miller
2011-12-23 19:36 ` netem and hierarchical ingress traffic shaping David Miller