* tc filter mask for ACK packets off?
@ 2012-01-01  2:30 John A. Sullivan III
  2012-01-03  7:31 ` Michal Kubeček
  0 siblings, 1 reply; 18+ messages in thread
From: John A. Sullivan III @ 2012-01-01 2:30 UTC (permalink / raw)
To: netdev

Hello, all. I've been noticing that virtually all the documentation says
we should prioritize ACK-only packets and that they can be identified
with match u8 0x10 0xff. However, isn't the actual flags field only
6 bits long, with the first two bits of that byte belonging to the
preceding 6-bit reserved field? If that is true, our filters will break
unnecessarily whenever those bits are set. Shouldn't it be
match u8 0x10 0x3f? Then again, I'm very new at this. Thanks - John

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: tc filter mask for ACK packets off?
  2012-01-01  2:30 tc filter mask for ACK packets off? John A. Sullivan III
@ 2012-01-03  7:31 ` Michal Kubeček
  2012-01-03  9:36   ` Dave Taht
  2012-01-04  0:01   ` Michal Soltys
  0 siblings, 2 replies; 18+ messages in thread
From: Michal Kubeček @ 2012-01-03 7:31 UTC (permalink / raw)
To: netdev; +Cc: John A. Sullivan III

On Saturday 31 of December 2011 21:30, John A. Sullivan III wrote:
> Hello, all. I've been noticing that virtually all the documentation
> says we should prioritize ACK-only packets and that they can be
> identified with match u8 0x10 0xff. However, isn't the actual flags
> field only 6 bits long, with the first two bits of that byte belonging
> to the preceding 6-bit reserved field?

It's even worse: those two bits are in fact used for ECN (RFC 3168).

> If that is true, our filters will break unnecessarily whenever those
> bits are set. Shouldn't it be match u8 0x10 0x3f?

I think so.

However, by an "ACK only" packet (worth prioritizing), I would rather
understand a packet with the ACK flag set but no payload, not a packet
with ACK as the only flag. For many TCP connections, all packets except
the initial SYN and SYN-ACK and the two FIN packets have ACK as the only
flag. So my guess is you should rather prioritize all TCP packets with no
application-layer data.

Michal Kubecek

^ permalink raw reply	[flat|nested] 18+ messages in thread
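For reference, a sketch of a corrected wondershaper-style filter
reflecting both points above: the 0x3f mask so the two ECN bits
(CWR/ECE) are ignored, and a length test so only payload-free acks
match. $DEV, the flowid, and the assumption of an IPv4 header with no
options (ihl == 5, so the TCP flags byte sits at offset 33) are
illustrative, not from the original thread:

    # TCP; no IP options (ihl == 5); total length < 64 (no payload);
    # ACK bit set, with the two ECN bits (CWR/ECE) masked off:
    tc filter add dev $DEV parent 1: protocol ip prio 10 u32 \
        match ip protocol 6 0xff \
        match u8 0x05 0x0f at 0 \
        match u16 0x0000 0xffc0 at 2 \
        match u8 0x10 0x3f at 33 \
        flowid 1:10

The length test (total length & 0xffc0 == 0) is what approximates
Michal's "ACK with no payload"; it still matches pure acks carrying TCP
timestamps, which are 52 bytes.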
* Re: tc filter mask for ACK packets off?
  2012-01-03  7:31 ` Michal Kubeček
@ 2012-01-03  9:36   ` Dave Taht
  2012-01-03 10:40     ` [RFC] SFQ planned changes Eric Dumazet
  2012-01-03 12:18     ` tc filter mask for ACK packets off? John A. Sullivan III
  1 sibling, 2 replies; 18+ messages in thread
From: Dave Taht @ 2012-01-03 9:36 UTC (permalink / raw)
To: Michal Kubeček; +Cc: netdev, John A. Sullivan III

On Tue, Jan 3, 2012 at 8:31 AM, Michal Kubeček <mkubecek@suse.cz> wrote:
> On Saturday 31 of December 2011 21:30, John A. Sullivan III wrote:
>> Hello, all. I've been noticing that virtually all the documentation
>> says we should prioritize ACK-only packets and that they can be
>> identified with match u8 0x10 0xff. However, isn't the actual flags

Most of that invalid documentation was derived from the original
'wondershaper' effort, and became 'canon' elsewhere. I'm hoping that we
get a chance to correct the documentation on the new wiki and remove the
old, incorrect info from the web...

wshaper's (2001) assumptions were gradually invalidated over the years.
It was a suitable shaper for a 200k-800k download link at the time, when
web sites were 70k in size, and people still used things like ssh
heavily, men were men, and javascript was scarce....

Good follow-on work was the esfq and adsl-shaper efforts, but these
publications have also been obsoleted by events. ESFQ's core features got
incorporated into sfq, and adsl-shaper sort of made it in, generically.
The stories of those two shaping efforts are useful bits of history worth
reading about, to gain context on the problems they were trying to solve.

>> field only 6 bits long, with the first two bits of that byte belonging
>> to the preceding 6-bit reserved field?
>
> It's even worse: those two bits are in fact used for ECN (RFC 3168).

I had submitted a patch to openwrt to fix this issue with wondershaper a
while back. I don't know if it got taken up or not... and either way,
wshaper's approach doesn't work well on modern bandwidths. The core idea
(prioritizing small acks somewhat) retains some value, but the
implementation is unworkable.

>> If that is true, our filters will break unnecessarily whenever those
>> bits are set. Shouldn't it be match u8 0x10 0x3f?
>
> I think so.

Yes, the old-style, 'canonical' filters break ECN, and have been breaking
it everywhere for a decade. Also: most TCPs use timestamps these days, so
the ack size is larger. And alas... none of the shapers mentioned above
do ipv6 properly.

There are innumerable other limitations... notably, prioritizing dns,
syn, and synack can also help. They are unable to detect or prioritize
voip packets (sip or skype), either.

> However, by an "ACK only" packet (worth prioritizing), I would rather
> understand a packet with the ACK flag set but no payload, not a packet
> with ACK as the only flag. For many TCP connections, all packets except
> the initial SYN and SYN-ACK and the two FIN packets have ACK as the
> only flag. So my guess is you should rather prioritize all TCP packets
> with no application-layer data.

No. :)

I'd go into more detail, but after what I hope are the final two fixes to
sfq and qfq land in the net-next kernel (after some more testing), I like
to think I have a more valid approach than this in the works, but that
too will require some more development and testing.
http://www.teklibre.com/~d/bloat/pfifo_fast_vs_sfq_qfq_linear.png

If you are interested in seeing that work in progress:

git clone git://github.com/dtaht/deBloat.git

See the src/staqfq.lua script for a start at a general-purpose new-age
shaper... and src/qmodels/*4mbit* for some prototypes of a 'soft
bandwidth' one. (regrettably, net-next + some patches is required at
present)

I note that (as of yesterday) sfq is performing as well as qfq did under
most workloads, and is considerably simpler than qfq, but what I have in
mind for shaping in an asymmetric scenario *may* involve 'weighting' -
rather than strictly prioritizing - small acks... and it may not - I'd
like to be able to benchmark the various AQM approaches against a variety
of workloads before declaring victory.

Could use some help with all that....

> Michal Kubecek

--
Dave Täht
SKYPE: davetaht
US Tel: 1-239-829-5608
FR Tel: 0638645374
http://www.bufferbloat.net

^ permalink raw reply	[flat|nested] 18+ messages in thread
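A hedged sketch of the extra matches Dave mentions (prioritizing syn,
synack, and dns). $DEV, priorities, and flowids are placeholders, and the
TCP rule again assumes an option-free IPv4 header so the flags byte sits
at offset 33:

    # any packet with the SYN bit set (covers SYN and SYN-ACK):
    tc filter add dev $DEV parent 1: protocol ip prio 9 u32 \
        match ip protocol 6 0xff \
        match u8 0x05 0x0f at 0 \
        match u8 0x02 0x02 at 33 \
        flowid 1:10

    # DNS queries (UDP destination port 53):
    tc filter add dev $DEV parent 1: protocol ip prio 9 u32 \
        match ip protocol 17 0xff \
        match ip dport 53 0xffff \
        flowid 1:10

As the thread notes, none of this covers ipv6, which needs separate
protocol ipv6 rules.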
* [RFC] SFQ planned changes
  2012-01-03  9:36 ` Dave Taht
@ 2012-01-03 10:40   ` Eric Dumazet
  2012-01-03 12:07     ` Dave Taht
  2012-01-03 12:18   ` tc filter mask for ACK packets off? John A. Sullivan III
  1 sibling, 1 reply; 18+ messages in thread
From: Eric Dumazet @ 2012-01-03 10:40 UTC (permalink / raw)
To: Dave Taht; +Cc: Michal Kubeček, netdev, John A. Sullivan III

On Tuesday, January 3, 2012 at 10:36 +0100, Dave Taht wrote:

> I note that (as of yesterday) sfq is performing as well as qfq did
> under most workloads, and is considerably simpler than qfq, but
> what I have in mind for shaping in an asymmetric scenario
> *may* involve 'weighting' - rather than strictly prioritizing -
> small acks... and it may not - I'd like to be able to benchmark
> the various AQM approaches against a variety of workloads
> before declaring victory.

A QFQ setup with more than 1024 classes/qdiscs is way too slow at init
time, and consumes ~384 bytes per class: ~12582912 bytes for 32768
classes.

We are also limited to 65536 qdiscs per device, so a QFQ setup using a
hash is limited to a 32768 divisor.

Now, SFQ as implemented in Linux is very limited, with at most 127 flows
and a limit of 127 packets. [ So if 127 flows are active, we have one
packet per flow ]

I plan to add the following features to SFQ:

- Ability to specify a per-flow limit.
  It's what is called the 'depth', currently hardcoded to min(127, limit)

- Ability to have up to 65535 flows (instead of 127)

- Ability to have a head drop (to drop old packets from a flow)

Example of use: no more than 20 packets per flow, max 8000 flows, max
20000 packets in the SFQ qdisc, hash table of 65536 slots.

tc qdisc add ... sfq \
	flows 8000 \
	depth 20 \
	headdrop \
	limit 20000 divisor 65536

RAM usage: 32 bytes per flow, instead of 384 for QFQ, so a much better
cache hit ratio. 2 bytes per hash table slot, instead of 8 for QFQ.

(a perturb timer for a huge SFQ setup would not be recommended)

^ permalink raw reply	[flat|nested] 18+ messages in thread
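Back-of-the-envelope, using the figures above for the example setup
(8000 flows, divisor 65536):

    # SFQ: 8000 flows   * 32 B  =  256 KB  (+ 65536 slots * 2 B = 128 KB hash)
    # QFQ: 8000 classes * 384 B = 3072 KB  (+ 65536 slots * 8 B = 512 KB hash)

Roughly an order of magnitude less state to walk, hence the better cache
hit ratio.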
* Re: [RFC] SFQ planned changes
  2012-01-03 10:40 ` [RFC] SFQ planned changes Eric Dumazet
@ 2012-01-03 12:07   ` Dave Taht
  2012-01-03 12:50     ` Eric Dumazet
  0 siblings, 1 reply; 18+ messages in thread
From: Dave Taht @ 2012-01-03 12:07 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Michal Kubeček, netdev, John A. Sullivan III

It will take me a while to fully comment on this... there are all sorts
of subtleties to deal with (one biggie - LEDBAT vs multi-queue
behavior)... but I am encouraged by the events of the past months and my
testing today....

On Tue, Jan 3, 2012 at 11:40 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Tuesday, January 3, 2012 at 10:36 +0100, Dave Taht wrote:
>
>> I note that (as of yesterday) sfq is performing as well as qfq did
>> under most workloads, and is considerably simpler than qfq, but
>> what I have in mind for shaping in an asymmetric scenario
>> *may* involve 'weighting' - rather than strictly prioritizing -
>> small acks... and it may not - I'd like to be able to benchmark

I need to be clear that the above is a subtle problem that I'd have to
talk to in a separate mail - AND both SFQ and QFQ do a much better job
than wshaper did in the first place, so weighting small acks only wins in
a limited number of scenarios. We have a larger problem in dealing with
TSO/GSO-sized superpackets that's hard to solve.

I'd prefer to think, design tests, benchmark, and think again for a
while...

>> the various AQM approaches against a variety of workloads
>> before declaring victory.
>
> A QFQ setup with more than 1024 classes/qdiscs is way too slow at init
> time, and consumes ~384 bytes per class: ~12582912 bytes for 32768
> classes.

QFQ could be improved with some of the same techniques you describe
below.

> We are also limited to 65536 qdiscs per device, so a QFQ setup using a
> hash is limited to a 32768 divisor.
>
> Now, SFQ as implemented in Linux is very limited, with at most 127 flows
> and a limit of 127 packets. [ So if 127 flows are active, we have one
> packet per flow ]

I agree SFQ can be improved upwards in scale, greatly.

My own personal goal is to evolve towards something that can replace
pfifo_fast as the default in linux. I don't know if that goal is shared
by all as yet. :)

> I plan to add the following features to SFQ:

From a 'doing science' perspective, I'd like it if it remained possible
to continue using and benchmarking SFQ as it was, and create this set of
ideas as a new qdisc ('efq'?)

As these changes seem to require changes to userspace tc, anyway, and
(selfishly) my patching burden is great enough...

Perhaps some additional benefit could be had by losing full backward API
compatibility with sfq, as well?

> - Ability to specify a per-flow limit.
>   It's what is called the 'depth', currently hardcoded to min(127, limit)

Introducing per-flow buffering (as QFQ does) *re-introduces* the overall
AQM problem of managing the size of the individual flows.

This CDF graph shows how badly wireless is currently behaving (courtesy
Albert Rafetseder of the University of Vienna):

http://www.teklibre.com/~d/bloat/qfq_vs_pfifo_fast_wireless_iwl_card_vs_cerowrt.pdf

(I have to convince gnuplot to give me these!!)

If I were to add a larger sub-qdisc depth on QFQ than what's in there
(presently 24), the same graph would also show the median latency
increase proportionately.

The Time in Queue idea for managing that queue depth is quite strong;
there may be others.
(in fact, I'm carrying your preliminary TiQ patch in my bql trees, not
that I've done anything with it yet)

> - Ability to have up to 65535 flows (instead of 127)
>
> - Ability to have a head drop (to drop old packets from a flow)

The head drop idea is strong when combined with time in queue. However,
it would be useful to be able to pull forward the next packet in that
sub-queue and deliver it, so as to provide proper signalling upstream.
Packets nowadays arrive in bursts, which means that once one timestamp
has expired, many more will have, too. What I just suggested would (worst
case) deliver every other packet in a backlog and obviously needs
refinement.....

> Example of use: no more than 20 packets per flow, max 8000 flows, max
> 20000 packets in the SFQ qdisc, hash table of 65536 slots.
>
> tc qdisc add ... sfq \
> 	flows 8000 \
> 	depth 20 \
> 	headdrop \
> 	limit 20000 divisor 65536
>
> RAM usage: 32 bytes per flow, instead of 384 for QFQ, so a much better
> cache hit ratio. 2 bytes per hash table slot, instead of 8 for QFQ.

I do like it!

I still like QFQ because other qdiscs (RED, for example) can be attached
to it, but having a simple yet good default with SFQ scaled up to modern
requirements would also be awesome!

> (a perturb timer for a huge SFQ setup would not be recommended)

no kidding!

--
Dave Täht
SKYPE: davetaht
US Tel: 1-239-829-5608
FR Tel: 0638645374
http://www.bufferbloat.net

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: [RFC] SFQ planned changes
  2012-01-03 12:07 ` Dave Taht
@ 2012-01-03 12:50   ` Eric Dumazet
  2012-01-03 16:08     ` Eric Dumazet
  0 siblings, 1 reply; 18+ messages in thread
From: Eric Dumazet @ 2012-01-03 12:50 UTC (permalink / raw)
To: Dave Taht; +Cc: Michal Kubeček, netdev, John A. Sullivan III

On Tuesday, January 3, 2012 at 13:07 +0100, Dave Taht wrote:
> From a 'doing science' perspective, I'd like it if it remained possible
> to continue using and benchmarking SFQ as it was, and create
> this set of ideas as a new qdisc ('efq'?)
>
> As these changes seem to require changes to userspace tc, anyway,
> and (selfishly) my patching burden is great enough...
>
> Perhaps some additional benefit could be had by losing
> full backward API compatibility with sfq, as well?

No, it's completely compatible with the prior version. An old tc
command, or the lack of new arguments, will set up the SFQ qdisc exactly
as before.

I coded the thing and am doing stress tests before submission.

^ permalink raw reply	[flat|nested] 18+ messages in thread
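To make the compatibility point concrete, a sketch (interface name
illustrative): the first command is an old-style setup and keeps today's
defaults (127 flows, 127-packet limit, tail drop); the second only
changes behaviour because it passes the new parameters from the RFC:

    tc qdisc add dev eth0 root sfq perturb 10
    tc qdisc add dev eth0 root sfq limit 2000 depth 10 headdrop \
        flows 1000 divisor 16384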
* Re: [RFC] SFQ planned changes 2012-01-03 12:50 ` Eric Dumazet @ 2012-01-03 16:08 ` Eric Dumazet 2012-01-03 23:57 ` Dave Taht 0 siblings, 1 reply; 18+ messages in thread From: Eric Dumazet @ 2012-01-03 16:08 UTC (permalink / raw) To: Dave Taht; +Cc: Michal Kubeček, netdev, John A. Sullivan III Here is the code I ran on my test server with 200 netperf TCP_STREAM flows with pretty good results (each flow gets 0.5 % of bandwidth) $TC qdisc add dev $DEV root handle 1: est 1sec 8sec htb default 1 $TC class add dev $DEV parent 1: classid 1:1 est 1sec 8sec htb \ rate 200Mbit mtu 40000 quantum 80000 $TC qdisc add dev $DEV parent 1:1 handle 10: est 1sec 8sec sfq \ limit 2000 depth 10 headdrop flows 1000 divisor 16384 # tcnew -s -d qdisc show dev eth3 qdisc htb 1: root refcnt 18 r2q 10 default 1 direct_packets_stat 0 ver 3.17 Sent 4512949730 bytes 3030391 pkt (dropped 44409, overlimits 6105100 requeues 1) rate 198288Kbit 16629pps backlog 0b 1732p requeues 1 qdisc sfq 10: parent 1:1 limit 2000p quantum 1514b depth 10 headdrop flows 1000/16384 divisor 16384 Sent 4512949730 bytes 3030391 pkt (dropped 44409, overlimits 0 requeues 0) rate 198288Kbit 16629pps backlog 2622248b 1732p requeues 0 patch on top of current net-next include/linux/pkt_sched.h | 7 + net/sched/sch_sfq.c | 144 ++++++++++++++++++++++++------------ 2 files changed, 104 insertions(+), 47 deletions(-) diff --git a/include/linux/pkt_sched.h b/include/linux/pkt_sched.h index 8daced3..c2c6cfd 100644 --- a/include/linux/pkt_sched.h +++ b/include/linux/pkt_sched.h @@ -162,6 +162,13 @@ struct tc_sfq_qopt { unsigned flows; /* Maximal number of flows */ }; +struct tc_sfq_ext_qopt { + struct tc_sfq_qopt qopt; + unsigned int depth; /* max number of packets per flow */ + unsigned int headdrop; +}; + + struct tc_sfq_xstats { __s32 allot; }; diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c index d329a8a..66682fd 100644 --- a/net/sched/sch_sfq.c +++ b/net/sched/sch_sfq.c @@ -67,15 +67,16 @@ IMPLEMENTATION: This implementation limits maximal queue length to 128; - max mtu to 2^18-1; max 128 flows, number of hash buckets to 1024. - The only goal of this restrictions was that all data - fit into one 4K page on 32bit arches. + max mtu to 2^18-1; + max 65280 flows, + number of hash buckets to 65536. It is easy to increase these values, but not in flight. */ #define SFQ_DEPTH 128 /* max number of packets per flow */ -#define SFQ_SLOTS 128 /* max number of flows */ -#define SFQ_EMPTY_SLOT 255 +#define SFQ_DEFAULT_FLOWS 128 +#define SFQ_MAX_FLOWS (0x10000 - 256) /* max number of flows */ +#define SFQ_EMPTY_SLOT 0xffff #define SFQ_DEFAULT_HASH_DIVISOR 1024 /* We use 16 bits to store allot, and want to handle packets up to 64K @@ -84,13 +85,13 @@ #define SFQ_ALLOT_SHIFT 3 #define SFQ_ALLOT_SIZE(X) DIV_ROUND_UP(X, 1 << SFQ_ALLOT_SHIFT) -/* This type should contain at least SFQ_DEPTH + SFQ_SLOTS values */ -typedef unsigned char sfq_index; +/* This type should contain at least SFQ_DEPTH + SFQ_MAX_FLOWS values */ +typedef u16 sfq_index; /* * We dont use pointers to save space. - * Small indexes [0 ... SFQ_SLOTS - 1] are 'pointers' to slots[] array - * while following values [SFQ_SLOTS ... SFQ_SLOTS + SFQ_DEPTH - 1] + * Small indexes [0 ... SFQ_MAX_FLOWS - 1] are 'pointers' to slots[] array + * while following values [SFQ_MAX_FLOWS ... 
SFQ_MAX_FLOWS + SFQ_DEPTH - 1] * are 'pointers' to dep[] array */ struct sfq_head { @@ -112,8 +113,11 @@ struct sfq_sched_data { /* Parameters */ int perturb_period; unsigned int quantum; /* Allotment per round: MUST BE >= MTU */ - int limit; + int limit; /* limit of total number of packets in this qdisc */ unsigned int divisor; /* number of slots in hash table */ + unsigned int maxflows; /* number of flows in flows array */ + int headdrop; + int depth; /* limit depth of each flow */ /* Variables */ struct tcf_proto *filter_list; struct timer_list perturb_timer; @@ -122,7 +126,7 @@ struct sfq_sched_data { unsigned short scaled_quantum; /* SFQ_ALLOT_SIZE(quantum) */ struct sfq_slot *tail; /* current slot in round */ sfq_index *ht; /* Hash table (divisor slots) */ - struct sfq_slot slots[SFQ_SLOTS]; + struct sfq_slot *slots; struct sfq_head dep[SFQ_DEPTH]; /* Linked list of slots, indexed by depth */ }; @@ -131,9 +135,9 @@ struct sfq_sched_data { */ static inline struct sfq_head *sfq_dep_head(struct sfq_sched_data *q, sfq_index val) { - if (val < SFQ_SLOTS) + if (val < SFQ_MAX_FLOWS) return &q->slots[val].dep; - return &q->dep[val - SFQ_SLOTS]; + return &q->dep[val - SFQ_MAX_FLOWS]; } /* @@ -199,18 +203,19 @@ static unsigned int sfq_classify(struct sk_buff *skb, struct Qdisc *sch, } /* - * x : slot number [0 .. SFQ_SLOTS - 1] + * x : slot number [0 .. SFQ_MAX_FLOWS - 1] */ static inline void sfq_link(struct sfq_sched_data *q, sfq_index x) { sfq_index p, n; - int qlen = q->slots[x].qlen; + struct sfq_slot *slot = &q->slots[x]; + int qlen = slot->qlen; - p = qlen + SFQ_SLOTS; + p = qlen + SFQ_MAX_FLOWS; n = q->dep[qlen].next; - q->slots[x].dep.next = n; - q->slots[x].dep.prev = p; + slot->dep.next = n; + slot->dep.prev = p; q->dep[qlen].next = x; /* sfq_dep_head(q, p)->next = x */ sfq_dep_head(q, n)->prev = x; @@ -305,7 +310,7 @@ static unsigned int sfq_drop(struct Qdisc *sch) x = q->dep[d].next; slot = &q->slots[x]; drop: - skb = slot_dequeue_tail(slot); + skb = q->headdrop ? slot_dequeue_head(slot) : slot_dequeue_tail(slot); len = qdisc_pkt_len(skb); sfq_dec(q, x); kfree_skb(skb); @@ -349,16 +354,26 @@ sfq_enqueue(struct sk_buff *skb, struct Qdisc *sch) slot = &q->slots[x]; if (x == SFQ_EMPTY_SLOT) { x = q->dep[0].next; /* get a free slot */ + if (x >= SFQ_MAX_FLOWS) + return qdisc_drop(skb, sch); q->ht[hash] = x; slot = &q->slots[x]; slot->hash = hash; } - /* If selected queue has length q->limit, do simple tail drop, - * i.e. drop _this_ packet. - */ - if (slot->qlen >= q->limit) - return qdisc_drop(skb, sch); + if (slot->qlen >= q->depth) { + struct sk_buff *head; + + if (!q->headdrop) + return qdisc_drop(skb, sch); + head = slot_dequeue_head(slot); + sch->qstats.backlog -= qdisc_pkt_len(head); + kfree_skb(head); + sch->qstats.drops++; + sch->qstats.backlog += qdisc_pkt_len(skb); + slot_queue_add(slot, skb); + return NET_XMIT_CN; + } sch->qstats.backlog += qdisc_pkt_len(skb); slot_queue_add(slot, skb); @@ -366,11 +381,11 @@ sfq_enqueue(struct sk_buff *skb, struct Qdisc *sch) if (slot->qlen == 1) { /* The flow is new */ if (q->tail == NULL) { /* It is the first flow */ slot->next = x; + q->tail = slot; } else { slot->next = q->tail->next; q->tail->next = x; } - q->tail = slot; slot->allot = q->scaled_quantum; } if (++sch->q.qlen <= q->limit) @@ -445,16 +460,17 @@ sfq_reset(struct Qdisc *sch) * We dont use sfq_dequeue()/sfq_enqueue() because we dont want to change * counters. 
*/ -static void sfq_rehash(struct sfq_sched_data *q) +static int sfq_rehash(struct sfq_sched_data *q) { struct sk_buff *skb; int i; struct sfq_slot *slot; struct sk_buff_head list; + int dropped = 0; __skb_queue_head_init(&list); - for (i = 0; i < SFQ_SLOTS; i++) { + for (i = 0; i < q->maxflows; i++) { slot = &q->slots[i]; if (!slot->qlen) continue; @@ -474,6 +490,11 @@ static void sfq_rehash(struct sfq_sched_data *q) slot = &q->slots[x]; if (x == SFQ_EMPTY_SLOT) { x = q->dep[0].next; /* get a free slot */ + if (x >= SFQ_MAX_FLOWS) { + kfree_skb(skb); + dropped++; + continue; + } q->ht[hash] = x; slot = &q->slots[x]; slot->hash = hash; @@ -491,6 +512,7 @@ static void sfq_rehash(struct sfq_sched_data *q) slot->allot = q->scaled_quantum; } } + return dropped; } static void sfq_perturbation(unsigned long arg) @@ -502,7 +524,7 @@ static void sfq_perturbation(unsigned long arg) spin_lock(root_lock); q->perturbation = net_random(); if (!q->filter_list && q->tail) - sfq_rehash(q); + qdisc_tree_decrease_qlen(sch, sfq_rehash(q)); spin_unlock(root_lock); if (q->perturb_period) @@ -513,11 +535,13 @@ static int sfq_change(struct Qdisc *sch, struct nlattr *opt) { struct sfq_sched_data *q = qdisc_priv(sch); struct tc_sfq_qopt *ctl = nla_data(opt); + struct tc_sfq_ext_qopt *ctl_ext = NULL; unsigned int qlen; if (opt->nla_len < nla_attr_size(sizeof(*ctl))) return -EINVAL; - + if (opt->nla_len >= nla_attr_size(sizeof(*ctl_ext))) + ctl_ext = nla_data(opt); if (ctl->divisor && (!is_power_of_2(ctl->divisor) || ctl->divisor > 65536)) return -EINVAL; @@ -526,10 +550,18 @@ static int sfq_change(struct Qdisc *sch, struct nlattr *opt) q->quantum = ctl->quantum ? : psched_mtu(qdisc_dev(sch)); q->scaled_quantum = SFQ_ALLOT_SIZE(q->quantum); q->perturb_period = ctl->perturb_period * HZ; - if (ctl->limit) - q->limit = min_t(u32, ctl->limit, SFQ_DEPTH - 1); + if (ctl->flows) + q->maxflows = min_t(u32, ctl->flows, SFQ_MAX_FLOWS); if (ctl->divisor) q->divisor = ctl->divisor; + if (ctl_ext) { + if (ctl_ext->depth) + q->depth = min_t(u32, ctl_ext->depth, SFQ_DEPTH - 1); + q->headdrop = ctl_ext->headdrop; + } + if (ctl->limit) + q->limit = min_t(u32, ctl->limit, q->depth * q->maxflows); + qlen = sch->q.qlen; while (sch->q.qlen > q->limit) sfq_drop(sch); @@ -544,6 +576,16 @@ static int sfq_change(struct Qdisc *sch, struct nlattr *opt) return 0; } +static void sfq_free(void *addr) +{ + if (addr) { + if (is_vmalloc_addr(addr)) + vfree(addr); + else + kfree(addr); + } +} + static int sfq_init(struct Qdisc *sch, struct nlattr *opt) { struct sfq_sched_data *q = qdisc_priv(sch); @@ -555,14 +597,16 @@ static int sfq_init(struct Qdisc *sch, struct nlattr *opt) init_timer_deferrable(&q->perturb_timer); for (i = 0; i < SFQ_DEPTH; i++) { - q->dep[i].next = i + SFQ_SLOTS; - q->dep[i].prev = i + SFQ_SLOTS; + q->dep[i].next = i + SFQ_MAX_FLOWS; + q->dep[i].prev = i + SFQ_MAX_FLOWS; } q->limit = SFQ_DEPTH - 1; + q->depth = SFQ_DEPTH - 1; q->cur_depth = 0; q->tail = NULL; q->divisor = SFQ_DEFAULT_HASH_DIVISOR; + q->maxflows = SFQ_DEFAULT_FLOWS; if (opt == NULL) { q->quantum = psched_mtu(qdisc_dev(sch)); q->scaled_quantum = SFQ_ALLOT_SIZE(q->quantum); @@ -575,15 +619,22 @@ static int sfq_init(struct Qdisc *sch, struct nlattr *opt) } sz = sizeof(q->ht[0]) * q->divisor; - q->ht = kmalloc(sz, GFP_KERNEL); + q->ht = kmalloc(sz, GFP_KERNEL | __GFP_NOWARN); if (!q->ht && sz > PAGE_SIZE) q->ht = vmalloc(sz); if (!q->ht) return -ENOMEM; + + q->slots = kzalloc(sizeof(q->slots[0]) * q->maxflows, GFP_KERNEL | __GFP_NOWARN); + if (!q->slots) + q->slots 
= vzalloc(sizeof(q->slots[0]) * q->maxflows); + if (!q->slots) { + sfq_free(q->ht); + return -ENOMEM; + } for (i = 0; i < q->divisor; i++) q->ht[i] = SFQ_EMPTY_SLOT; - - for (i = 0; i < SFQ_SLOTS; i++) { + for (i = 0; i < q->maxflows; i++) { slot_queue_init(&q->slots[i]); sfq_link(q, i); } @@ -601,25 +652,24 @@ static void sfq_destroy(struct Qdisc *sch) tcf_destroy_chain(&q->filter_list); q->perturb_period = 0; del_timer_sync(&q->perturb_timer); - if (is_vmalloc_addr(q->ht)) - vfree(q->ht); - else - kfree(q->ht); + sfq_free(q->ht); + sfq_free(q->slots); } static int sfq_dump(struct Qdisc *sch, struct sk_buff *skb) { struct sfq_sched_data *q = qdisc_priv(sch); unsigned char *b = skb_tail_pointer(skb); - struct tc_sfq_qopt opt; - - opt.quantum = q->quantum; - opt.perturb_period = q->perturb_period / HZ; + struct tc_sfq_ext_qopt opt; - opt.limit = q->limit; - opt.divisor = q->divisor; - opt.flows = q->limit; + opt.qopt.quantum = q->quantum; + opt.qopt.perturb_period = q->perturb_period / HZ; + opt.qopt.limit = q->limit; + opt.qopt.divisor = q->divisor; + opt.qopt.flows = q->maxflows; + opt.depth = q->depth; + opt.headdrop = q->headdrop; NLA_PUT(skb, TCA_OPTIONS, sizeof(opt), &opt); return skb->len; ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [RFC] SFQ planned changes
  2012-01-03 16:08 ` Eric Dumazet
@ 2012-01-03 23:57   ` Dave Taht
  2012-01-04  0:14     ` Eric Dumazet
  0 siblings, 1 reply; 18+ messages in thread
From: Dave Taht @ 2012-01-03 23:57 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Michal Kubeček, netdev, John A. Sullivan III

On Tue, Jan 3, 2012 at 5:08 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Here is the code I ran on my test server with 200 netperf TCP_STREAM
> flows with pretty good results (each flow gets 0.5 % of bandwidth)

Can I encourage you to always simultaneously run a fping and/or a
netperf -t TCP_RR latency under load test when doing stuff like this?

The amount of backlogged bytes is rather impressive...

> $TC qdisc add dev $DEV root handle 1: est 1sec 8sec htb default 1
> $TC class add dev $DEV parent 1: classid 1:1 est 1sec 8sec htb \
>    rate 200Mbit mtu 40000 quantum 80000
>
> $TC qdisc add dev $DEV parent 1:1 handle 10: est 1sec 8sec sfq \
>    limit 2000 depth 10 headdrop flows 1000 divisor 16384
>
> # tcnew -s -d qdisc show dev eth3
> qdisc htb 1: root refcnt 18 r2q 10 default 1 direct_packets_stat 0 ver 3.17
>  Sent 4512949730 bytes 3030391 pkt (dropped 44409, overlimits 6105100 requeues 1)
>  rate 198288Kbit 16629pps backlog 0b 1732p requeues 1
> qdisc sfq 10: parent 1:1 limit 2000p quantum 1514b depth 10 headdrop flows 1000/16384 divisor 16384
>  Sent 4512949730 bytes 3030391 pkt (dropped 44409, overlimits 0 requeues 0)
>  rate 198288Kbit 16629pps backlog 2622248b 1732p requeues 0
>
> patch on top of current net-next

I'm not going to have time to get to this for a while...

> include/linux/pkt_sched.h |    7 +
> net/sched/sch_sfq.c       |  144 ++++++++++++++++++++++++------------
> 2 files changed, 104 insertions(+), 47 deletions(-)

<snip - patch quoted in full>

--
Dave Täht
SKYPE: davetaht
US Tel: 1-239-829-5608
FR Tel: 0638645374
http://www.bufferbloat.net

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: [RFC] SFQ planned changes
  2012-01-03 23:57 ` Dave Taht
@ 2012-01-04  0:14   ` Eric Dumazet
  2012-01-04  7:56     ` Dave Taht
  0 siblings, 1 reply; 18+ messages in thread
From: Eric Dumazet @ 2012-01-04 0:14 UTC (permalink / raw)
To: Dave Taht; +Cc: Michal Kubeček, netdev, John A. Sullivan III

On Wednesday, January 4, 2012 at 00:57 +0100, Dave Taht wrote:
> On Tue, Jan 3, 2012 at 5:08 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > Here is the code I ran on my test server with 200 netperf TCP_STREAM
> > flows with pretty good results (each flow gets 0.5 % of bandwidth)
>
> Can I encourage you to always simultaneously run a fping and/or a
> netperf -t TCP_RR

ping is pretty nice ;)

# ping -c 20 192.168.20.112
PING 192.168.20.112 (192.168.20.112) 56(84) bytes of data.
64 bytes from 192.168.20.112: icmp_req=1 ttl=64 time=0.251 ms
64 bytes from 192.168.20.112: icmp_req=2 ttl=64 time=0.123 ms
64 bytes from 192.168.20.112: icmp_req=3 ttl=64 time=0.124 ms
64 bytes from 192.168.20.112: icmp_req=4 ttl=64 time=0.108 ms
64 bytes from 192.168.20.112: icmp_req=5 ttl=64 time=0.131 ms
64 bytes from 192.168.20.112: icmp_req=6 ttl=64 time=0.126 ms
64 bytes from 192.168.20.112: icmp_req=7 ttl=64 time=0.156 ms
64 bytes from 192.168.20.112: icmp_req=8 ttl=64 time=0.123 ms
64 bytes from 192.168.20.112: icmp_req=9 ttl=64 time=0.111 ms
64 bytes from 192.168.20.112: icmp_req=10 ttl=64 time=0.129 ms
64 bytes from 192.168.20.112: icmp_req=11 ttl=64 time=0.112 ms
64 bytes from 192.168.20.112: icmp_req=12 ttl=64 time=0.138 ms
64 bytes from 192.168.20.112: icmp_req=13 ttl=64 time=0.118 ms
64 bytes from 192.168.20.112: icmp_req=14 ttl=64 time=0.119 ms
64 bytes from 192.168.20.112: icmp_req=15 ttl=64 time=0.121 ms
64 bytes from 192.168.20.112: icmp_req=16 ttl=64 time=0.125 ms
64 bytes from 192.168.20.112: icmp_req=17 ttl=64 time=0.128 ms
64 bytes from 192.168.20.112: icmp_req=18 ttl=64 time=0.108 ms
64 bytes from 192.168.20.112: icmp_req=19 ttl=64 time=0.149 ms
64 bytes from 192.168.20.112: icmp_req=20 ttl=64 time=0.126 ms

--- 192.168.20.112 ping statistics ---
20 packets transmitted, 20 received, 0% packet loss, time 18999ms
rtt min/avg/max/mdev = 0.108/0.131/0.251/0.031 ms

> latency under load test when doing stuff like this?
>
> The amount of backlogged bytes is rather impressive...

200 TCP flooding flows... that's pretty normal.

If I add a TCP_RR flow to this load, I get:

# netperf -H 192.168.20.110 -v 0 -l 10 -t TCP_RR
TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.20.110 (192.168.20.110) port 0 AF_INET : demo
7606.18

If I stop the flood and start the TCP_RR alone:

# netperf -H 192.168.20.110 -v 0 -l 10 -t TCP_RR
TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.20.110 (192.168.20.110) port 0 AF_INET : demo
12031.39

And a ping on an idle link:

# ping -c 20 192.168.20.112
PING 192.168.20.112 (192.168.20.112) 56(84) bytes of data.
64 bytes from 192.168.20.112: icmp_req=1 ttl=64 time=0.119 ms
64 bytes from 192.168.20.112: icmp_req=2 ttl=64 time=0.090 ms
64 bytes from 192.168.20.112: icmp_req=3 ttl=64 time=0.085 ms
64 bytes from 192.168.20.112: icmp_req=4 ttl=64 time=0.087 ms
64 bytes from 192.168.20.112: icmp_req=5 ttl=64 time=0.084 ms
64 bytes from 192.168.20.112: icmp_req=6 ttl=64 time=0.084 ms
64 bytes from 192.168.20.112: icmp_req=7 ttl=64 time=0.088 ms
64 bytes from 192.168.20.112: icmp_req=8 ttl=64 time=0.085 ms
64 bytes from 192.168.20.112: icmp_req=9 ttl=64 time=0.083 ms
64 bytes from 192.168.20.112: icmp_req=10 ttl=64 time=0.082 ms
64 bytes from 192.168.20.112: icmp_req=11 ttl=64 time=0.082 ms
64 bytes from 192.168.20.112: icmp_req=12 ttl=64 time=0.085 ms
64 bytes from 192.168.20.112: icmp_req=13 ttl=64 time=0.086 ms
64 bytes from 192.168.20.112: icmp_req=14 ttl=64 time=0.084 ms
64 bytes from 192.168.20.112: icmp_req=15 ttl=64 time=0.089 ms
64 bytes from 192.168.20.112: icmp_req=16 ttl=64 time=0.081 ms
64 bytes from 192.168.20.112: icmp_req=17 ttl=64 time=0.084 ms
64 bytes from 192.168.20.112: icmp_req=18 ttl=64 time=0.086 ms
64 bytes from 192.168.20.112: icmp_req=19 ttl=64 time=0.084 ms
64 bytes from 192.168.20.112: icmp_req=20 ttl=64 time=0.084 ms

--- 192.168.20.112 ping statistics ---
20 packets transmitted, 20 received, 0% packet loss, time 19000ms
rtt min/avg/max/mdev = 0.081/0.086/0.119/0.012 ms

I can do a test at full Gigabit speed (removing the HTB) and 1000 flows
and post results.

^ permalink raw reply	[flat|nested] 18+ messages in thread
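A sketch of the combined load-plus-latency test Dave is asking for, using
the same hosts and netperf flags as above (the stream count and durations
are placeholders):

    for i in $(seq 1 200); do
        netperf -H 192.168.20.110 -t TCP_STREAM -l 60 >/dev/null 2>&1 &
    done
    netperf -H 192.168.20.110 -v 0 -l 10 -t TCP_RR   # transactions/s under load
    ping -c 20 192.168.20.112                        # latency under load
    wait

fping against several hosts at once works as well for the latency half.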
* Re: [RFC] SFQ planned changes
  2012-01-04  0:14 ` Eric Dumazet
@ 2012-01-04  7:56   ` Dave Taht
  2012-01-04  8:17     ` Eric Dumazet
  0 siblings, 1 reply; 18+ messages in thread
From: Dave Taht @ 2012-01-04 7:56 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Michal Kubeček, netdev, John A. Sullivan III

On Wed, Jan 4, 2012 at 1:14 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Wednesday, January 4, 2012 at 00:57 +0100, Dave Taht wrote:
>> On Tue, Jan 3, 2012 at 5:08 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> > Here is the code I ran on my test server with 200 netperf TCP_STREAM
>> > flows with pretty good results (each flow gets 0.5 % of bandwidth)
>>
>> Can I encourage you to always simultaneously run a fping and/or a
>> netperf -t TCP_RR

So I sat down and set up something that could do gigE and exercise
everything I had lying around worth playing with, to see what crashed...

> And a ping on an idle link:
> # ping -c 20 192.168.20.112
> PING 192.168.20.112 (192.168.20.112) 56(84) bytes of data.
> 64 bytes from 192.168.20.112: icmp_req=1 ttl=64 time=0.119 ms
> 64 bytes from 192.168.20.112: icmp_req=2 ttl=64 time=0.090 ms
> 64 bytes from 192.168.20.112: icmp_req=3 ttl=64 time=0.085 ms
> 64 bytes from 192.168.20.112: icmp_req=4 ttl=64 time=0.087 ms

I find it puzzling that my baseline ping time is nearly 3x yours.

I guess this is the price I pay for a 680MHz box on the other end.

My baseline ping (1 hop, e1000e to router):

64 bytes from 172.30.50.1: icmp_req=18 ttl=64 time=0.239 ms
64 bytes from 172.30.50.1: icmp_req=19 ttl=64 time=0.247 ms
64 bytes from 172.30.50.1: icmp_req=20 ttl=64 time=0.301 ms

(or, in my data format)

|T|172.30.50.1 |172.30.47.1 |172.30.47.27 |
|-+-+-+-+|
|1|0.34|0.63|0.59|
|2|0.28|0.42|0.45|
|3|0.39|0.41|0.48|
|4|0.37|0.42|0.51|
|5|0.33|0.43|0.49|

your load test:

> # ping -c 20 192.168.20.112
> PING 192.168.20.112 (192.168.20.112) 56(84) bytes of data.
> 64 bytes from 192.168.20.112: icmp_req=1 ttl=64 time=0.251 ms
> 64 bytes from 192.168.20.112: icmp_req=2 ttl=64 time=0.123 ms
> 64 bytes from 192.168.20.112: icmp_req=3 ttl=64 time=0.124 ms

This was my complex qfq/sfq test that ran all night (somehow), at gigE.
STAQFQ is enabled on the source laptop, 100 iperfs in play, 600 seconds
at a time, net transfer rate about 250Mbit.... - and I rate-limited BQL
to 9000 limit_max. GSO/TSO are off throughout. (STAQFQ is 514 QFQ bins,
24 pfifo_fast qdiscs per)

The first router has staqfq on the external interface connected to
laptop #1 and sfq on the internal interface connected to router #2.
Router #2 had sfq on its external interface and internal interface.
Laptop #2 had pfifo_fast on it.

|count|e1000e to router|to next router|through next router's switch to laptop #2|
|100|0.40|0.42|0.57|
|101|0.49|0.48|0.54|
|102|0.59|0.65|0.73|
|103|0.48|0.59|0.83|
|104|0.36|0.56|0.75|
|105|0.51|0.63|0.66|
|106|0.41|0.60|0.40|
|107|0.62|0.44|0.81|
|108|0.33|0.36|0.79|
|109|0.49|0.49|0.49|
|110|0.48|0.42|0.54|

A few notes of interest while I sort through this:

1) I saw spikes of up to 12ms with BQL's limiter enabled at one point or
another. I'll try to duplicate that.

2) I did manage to crash QFQ multiple times earlier in the night (on
every interface that has sfq on it now)

3) And when the ping ends up in the wrong bin, the results can be
interesting.

|125|0.56|98.91|0.55|
|126|0.41|96.54|0.52|
|127|0.35|96.11|0.91|
|128|0.23|106.52|0.57|
|129|0.42|104.01|0.83|
|130|0.44|105.92|0.59|

4) there was packet loss (yea!)
and many other anomalies. I ran each test for 600 seconds, need to look
at the actual data transferred, and will try a plot later.

But I can say the two-day-old SFQ stuff stands up to a load test... And
QFQ can do pretty well, too, when not crashing...

I will get to your new patch set over the weekend.

--
Dave Täht
SKYPE: davetaht
US Tel: 1-239-829-5608
FR Tel: 0638645374
http://www.bufferbloat.net

^ permalink raw reply	[flat|nested] 18+ messages in thread
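For reference, the BQL limiter Dave mentions is a per-tx-queue sysfs knob
(queue 0 of eth0 shown; the 9000-byte cap is just the value from his
test, and a BQL-enabled driver - net-next at the time - is required):

    echo 9000 > /sys/class/net/eth0/queues/tx-0/byte_queue_limits/limit_max
    cat /sys/class/net/eth0/queues/tx-0/byte_queue_limits/inflight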
* Re: [RFC] SFQ planned changes
  2012-01-04  7:56 ` Dave Taht
@ 2012-01-04  8:17   ` Eric Dumazet
  0 siblings, 0 replies; 18+ messages in thread
From: Eric Dumazet @ 2012-01-04 8:17 UTC (permalink / raw)
To: Dave Taht; +Cc: Michal Kubeček, netdev, John A. Sullivan III

On Wednesday, January 4, 2012 at 08:56 +0100, Dave Taht wrote:
> I find it puzzling that my baseline ping time is nearly 3x yours.
>
> I guess this is the price I pay for a 680MHz box on the other end.

Hmm... maybe... but this seems strange. A ping handler should be a
matter of 1 to 10 us at most, not 100 us.

Check the rx coalescing params on the receiver machine:

ethtool -c eth0

Here, the sender is on a normal link (not trunk mode):

# ethtool -c eth3
Coalesce parameters for eth3:
Adaptive RX: off  TX: off
stats-block-usecs: 0
sample-interval: 0
pkt-rate-low: 0
pkt-rate-high: 0
rx-usecs: 24
rx-frames: 0
rx-usecs-irq: 0
rx-frames-irq: 0
tx-usecs: 48
tx-frames: 0
tx-usecs-irq: 0
tx-frames-irq: 0

And on my 2nd server, the receiver of the ping request (two switches are
crossed between these machines). This eth2 is part of a bond0 device,
with trunk (vlan) activated on this link.

$ ethtool -c eth2
Coalesce parameters for eth2:
Adaptive RX: off  TX: off
stats-block-usecs: 0
sample-interval: 0
pkt-rate-low: 0
pkt-rate-high: 0
rx-usecs: 20
rx-frames: 5
rx-usecs-irq: 0
rx-frames-irq: 5
tx-usecs: 72
tx-frames: 53
tx-usecs-irq: 0
tx-frames-irq: 5

So I have a 20us delay at rx time before the NIC sends an interrupt to
the host to 'deliver' the incoming packet. If I change it to 1 us:

ethtool -C eth2 rx-usecs 1

then pings are even better, but a given load should generate more
interrupts.

# ping 192.168.20.110
PING 192.168.20.110 (192.168.20.110) 56(84) bytes of data.
64 bytes from 192.168.20.110: icmp_req=1 ttl=64 time=0.067 ms
64 bytes from 192.168.20.110: icmp_req=2 ttl=64 time=0.061 ms
64 bytes from 192.168.20.110: icmp_req=3 ttl=64 time=0.064 ms
64 bytes from 192.168.20.110: icmp_req=4 ttl=64 time=0.064 ms
64 bytes from 192.168.20.110: icmp_req=5 ttl=64 time=0.061 ms
64 bytes from 192.168.20.110: icmp_req=6 ttl=64 time=0.061 ms
64 bytes from 192.168.20.110: icmp_req=7 ttl=64 time=0.062 ms
64 bytes from 192.168.20.110: icmp_req=8 ttl=64 time=0.060 ms
64 bytes from 192.168.20.110: icmp_req=9 ttl=64 time=0.062 ms
64 bytes from 192.168.20.110: icmp_req=10 ttl=64 time=0.062 ms
64 bytes from 192.168.20.110: icmp_req=11 ttl=64 time=0.062 ms
64 bytes from 192.168.20.110: icmp_req=12 ttl=64 time=0.062 ms
64 bytes from 192.168.20.110: icmp_req=13 ttl=64 time=0.058 ms
64 bytes from 192.168.20.110: icmp_req=14 ttl=64 time=0.062 ms
64 bytes from 192.168.20.110: icmp_req=15 ttl=64 time=0.063 ms
64 bytes from 192.168.20.110: icmp_req=16 ttl=64 time=0.063 ms
64 bytes from 192.168.20.110: icmp_req=17 ttl=64 time=0.059 ms
64 bytes from 192.168.20.110: icmp_req=18 ttl=64 time=0.062 ms
^C
--- 192.168.20.110 ping statistics ---
18 packets transmitted, 18 received, 0% packet loss, time 16999ms
rtt min/avg/max/mdev = 0.058/0.061/0.067/0.010 ms

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: tc filter mask for ACK packets off?
  2012-01-03  9:36 ` Dave Taht
  2012-01-03 10:40   ` [RFC] SFQ planned changes Eric Dumazet
@ 2012-01-03 12:18   ` tc filter mask for ACK packets off? John A. Sullivan III
  2012-01-03 12:32     ` Eric Dumazet
  1 sibling, 1 reply; 18+ messages in thread
From: John A. Sullivan III @ 2012-01-03 12:18 UTC (permalink / raw)
To: Dave Taht; +Cc: Michal Kubeček, netdev

On Tue, 2012-01-03 at 10:36 +0100, Dave Taht wrote:
<snip>
> I'd go into more detail, but after what I hope are the final two
> fixes to sfq and qfq land in the net-next kernel (after some more
> testing), I like to think I have a more valid approach than this
> in the works, but that too will require some more development
> and testing.
>
> http://www.teklibre.com/~d/bloat/pfifo_fast_vs_sfq_qfq_linear.png
>
<snip>

Hmmm . . . certainly shattered my concerns about replacing pfifo_fast
with SFQ! Thanks - John

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: tc filter mask for ACK packets off?
  2012-01-03 12:18 ` tc filter mask for ACK packets off? John A. Sullivan III
@ 2012-01-03 12:32   ` Eric Dumazet
  2012-01-03 12:45     ` John A. Sullivan III
  0 siblings, 1 reply; 18+ messages in thread
From: Eric Dumazet @ 2012-01-03 12:32 UTC (permalink / raw)
To: John A. Sullivan III; +Cc: Dave Taht, Michal Kubeček, netdev

On Tuesday, January 3, 2012 at 07:18 -0500, John A. Sullivan III wrote:
> On Tue, 2012-01-03 at 10:36 +0100, Dave Taht wrote:
> <snip>
> > I'd go into more detail, but after what I hope are the final two
> > fixes to sfq and qfq land in the net-next kernel (after some more
> > testing), I like to think I have a more valid approach than this
> > in the works, but that too will require some more development
> > and testing.
> >
> > http://www.teklibre.com/~d/bloat/pfifo_fast_vs_sfq_qfq_linear.png
> >
> <snip>
> Hmmm . . . certainly shattered my concerns about replacing pfifo_fast
> with SFQ! Thanks - John

Before you do, take the time to read the warning in the sfq source:

ADVANTAGE:

- It is very cheap. Both CPU and memory requirements are minimal.

DRAWBACKS:

- "Stochastic" -> It is not 100% fair.
  When hash collisions occur, several flows are considered as one.

- "Round-robin" -> It introduces larger delays than virtual clock
  based schemes, and should not be used for isolating interactive
  traffic from non-interactive. It means, that this scheduler
  should be used as leaf of CBQ or P3, which put interactive traffic
  to higher priority band.

SFQ (as a direct replacement of the dev root qdisc) is fine if most of
your traffic is of the same kind/priority.

^ permalink raw reply	[flat|nested] 18+ messages in thread
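A sketch of the leaf arrangement the sfq comment recommends (interface
name illustrative): a three-band prio qdisc classifies on TOS/priority
marks, and an sfq leaf keeps each band flow-fair:

    tc qdisc add dev eth0 root handle 1: prio
    tc qdisc add dev eth0 parent 1:1 handle 10: sfq perturb 10
    tc qdisc add dev eth0 parent 1:2 handle 20: sfq perturb 10
    tc qdisc add dev eth0 parent 1:3 handle 30: sfq perturb 10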
* Re: tc filter mask for ACK packets off?
  2012-01-03 12:32 ` Eric Dumazet
@ 2012-01-03 12:45   ` John A. Sullivan III
  2012-01-03 13:00     ` Dave Taht
  0 siblings, 1 reply; 18+ messages in thread
From: John A. Sullivan III @ 2012-01-03 12:45 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Dave Taht, Michal Kubeček, netdev

On Tue, 2012-01-03 at 13:32 +0100, Eric Dumazet wrote:
> Before you do, take the time to read the warning in the sfq source:
<snip>
> SFQ (as a direct replacement of the dev root qdisc) is fine if most of
> your traffic is of the same kind/priority.

Yes, I suppose I should have been more specific: replacing pfifo_fast
when I am using something else, like HFSC, to prioritize and shape my
traffic. Hmm . . . although I still wonder about iSCSI SANs . . .
Thanks - John

^ permalink raw reply	[flat|nested] 18+ messages in thread
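A hedged sketch of the HFSC-plus-SFQ combination John describes (the
rates, interface, and classids are placeholders, not a tuned
configuration): HFSC splits the link between an interactive and a bulk
class, and sfq leaves keep flows within each class fair:

    tc qdisc add dev eth0 root handle 1: hfsc default 20
    tc class add dev eth0 parent 1: classid 1:1 hfsc sc rate 95mbit ul rate 95mbit
    tc class add dev eth0 parent 1:1 classid 1:10 hfsc sc rate 30mbit ul rate 95mbit
    tc class add dev eth0 parent 1:1 classid 1:20 hfsc sc rate 65mbit ul rate 95mbit
    tc qdisc add dev eth0 parent 1:10 handle 10: sfq perturb 10
    tc qdisc add dev eth0 parent 1:20 handle 20: sfq perturb 10

plus filters (u32 matches or fw marks) to steer interactive traffic to
1:10.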
* Re: tc filter mask for ACK packets off?
  2012-01-03 12:45 ` John A. Sullivan III
@ 2012-01-03 13:00   ` Dave Taht
  2012-01-03 17:57     ` John A. Sullivan III
  0 siblings, 1 reply; 18+ messages in thread
From: Dave Taht @ 2012-01-03 13:00 UTC (permalink / raw)
To: John A. Sullivan III; +Cc: Eric Dumazet, Michal Kubeček, netdev

On Tue, Jan 3, 2012 at 1:45 PM, John A. Sullivan III
<jsullivan@opensourcedevel.com> wrote:
> On Tue, 2012-01-03 at 13:32 +0100, Eric Dumazet wrote:
<snip>
>> > Hmmm . . . certainly shattered my concerns about replacing pfifo_fast
>> > with SFQ! Thanks - John

SFQ as presently implemented (and by presently, I mean as of yesterday;
by tomorrow it could be different at the rate Eric is going!) is VERY
suitable for sub-100Mbit desktops, wireless stations/laptops and other
devices, home gateways with sub-100Mbit uplinks, and the like. That's a
few hundred million devices that aren't using it today, defaulting to
pfifo_fast and suffering for it.

QFQ is its big brother, and I have hopes it can scale up to 10GigE, once
suitable techniques are found for managing the sub-queue depth.

The enhancements to SFQ Eric proposed in the other thread might get it to
where it outperforms (by a lot) pfifo_fast in its default configuration
(eg txqueuelen 1000) with few side effects. Scaling further up than
that...

... I don't have a good picture of gigE performance at the moment with
any of these advanced qdiscs and have no recommendation. I do highly
recommend that you fiddle with this stuff!

I do have to note that the graph above had GSO/TSO turned off.

>> - "Stochastic" -> It is not 100% fair.
>>   When hash collisions occur, several flows are considered as one.

This is in part the benefit of SFQ vs QFQ, in that the maximum queue
depth is well managed.

>> - "Round-robin" -> It introduces larger delays than virtual clock
>>   based schemes, and should not be used for isolating interactive
>>   traffic from non-interactive.

These delays are NOTHING compared to what pfifo_fast can induce.

Very little traffic nowadays is marked as interactive to any
statistically significant extent, so any FQ method effectively makes more
traffic interactive than prioritization can.

>> SFQ (as a direct replacement of the dev root qdisc) is fine if most of
>> your traffic is of the same kind/priority.

Which is the case for most desktops, laptops, gateways, wireless, etc.

> Yes, I suppose I should have been more specific: replacing pfifo_fast
> when I am using something else, like HFSC, to prioritize and shape my
> traffic.

I enjoyed getting your HFSC experience secondhand. It would be very
interesting to get your feedback on trying this stuff. More data is
needed to beat the bloat.

> Hmm . . . although I still wonder about iSCSI SANs . . . Thanks
> although I still wonder about iSCSI SANs . . . Thanks

I wonder too. Most of the people running iSCSI seem to have an aversion
to packet loss, yet are running over TCP. I *think* FQ methods will
improve latency dramatically for iSCSI when iSCSI has multiple
initiators....

> - John

--
Dave Täht
SKYPE: davetaht
US Tel: 1-239-829-5608
FR Tel: 0638645374
http://www.bufferbloat.net

^ permalink raw reply [flat|nested] 18+ messages in thread
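For anyone taking up Dave's suggestion to fiddle, a drop-in trial of SFQ
as the root qdisc can look like the sketch below; "eth0" is a
placeholder, and GSO/TSO are switched off to match the conditions of
the linked graph:

  # replace the default pfifo_fast root qdisc with SFQ
  tc qdisc replace dev eth0 root sfq perturb 10
  # disable segmentation offloads, as in the test above
  ethtool -K eth0 gso off tso off
  # inspect per-qdisc counters (drops, backlog) while testing
  tc -s qdisc show dev eth0
  # revert to the kernel default if needed
  tc qdisc del dev eth0 root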
* Re: tc filter mask for ACK packets off?
  2012-01-03 13:00               ` Dave Taht
@ 2012-01-03 17:57                 ` John A. Sullivan III
  0 siblings, 0 replies; 18+ messages in thread
From: John A. Sullivan III @ 2012-01-03 17:57 UTC (permalink / raw)
  To: Dave Taht; +Cc: Eric Dumazet, Michal Kubeček, netdev

On Tue, 2012-01-03 at 14:00 +0100, Dave Taht wrote:
<snip>
> SFQ as presently implemented (and by presently, I mean as of yesterday;
> by tomorrow it could be different at the rate Eric is going!) is VERY
> suitable for sub-100Mbit desktops, wireless stations/laptops and other
> devices, home gateways with sub-100Mbit uplinks, and the like. That's a
> few hundred million devices that aren't using it today, defaulting to
> pfifo_fast instead and suffering for it.
>
> QFQ is its big brother, and I have hopes it can scale up to 10GigE once
> suitable techniques are found for managing the sub-queue depth.
>
> The enhancements to SFQ Eric proposed in the other thread might get it
> to where it outperforms pfifo_fast (by a lot) in its default
> configuration (e.g. txqueuelen 1000) with few side effects. Scaling
> further up than that...
>
> ... I don't have a good picture of GigE performance at the moment with
> any of these advanced qdiscs and have no recommendation.

Hmm . . . that's interesting in light of our thoughts about using SFQ
for iSCSI. In that case, the links are GbE or 10GbE. Is there a problem
using SFQ on links of that size rather than pfifo_fast?

<snip>
> >> - "Round-robin" -> It introduces larger delays than virtual clock
> >>   based schemes, and should not be used for isolating interactive
> >>   traffic from non-interactive. It means, that this scheduler
> >>   should be used as leaf of CBQ or P3, which put interactive traffic
> >>   to higher priority band.
>
> These delays are NOTHING compared to what pfifo_fast can induce.
>
> Very little traffic nowadays is marked as interactive to any
> statistically significant extent, so any FQ method effectively makes
> more traffic interactive than prioritization can.

That may be changing quickly. I am doing a lot of work with Desktop
Virtualization. This is all interactive traffic and, unlike terminal
screens over telnet or ssh in the past, these can be fairly large
chunks of data using full-sized packets. They are also bursty rather
than periodic. I would think we very much need prioritization here
combined with FQ (hence our interest in HFSC + SFQ).

<snip>
> > Hmm . . . although I still wonder about iSCSI SANs . . . Thanks
>
> I wonder too. Most of the people running iSCSI seem to have an
> aversion to packet loss, yet are running over TCP. I *think* FQ
> methods will improve latency dramatically for iSCSI when iSCSI has
> multiple initiators....
<snip>

I haven't had a chance to play with this yet, but I'll do a little
thinking out loud. Since these can be very large data transmissions, I
would think it quite possible that a new connection's SYN packet gets
stuck behind a pile of full-sized iSCSI packets. On the other hand, I'm
not sure where the bottleneck is in iSCSI and whether these queues ever
backlog. I just ran a quick, simple test on a non-optimized SAN doing a
cat /dev/zero > filename, hit 3.6Gbps throughput with four e1000 NICs
doing multipath multibus, and saw no backlog in the pfifo_fast qdiscs.
If we do ever backlog, I would think SFQ would provide a more immediate
response to new streams, whereas users of the bulk downloads already in
progress would not even notice the blip when the new stream is
inserted.
I would be a little concerned about iSCSI packets being delivered out
of order when multipath multibus is used, i.e., the iSCSI commands are
round-robined around several NICs and thus several queues. If those
queues are in varying states of backlog, a later packet in one queue
might be delivered before an earlier packet in another queue. Then
again, I would think pfifo_fast could produce a greater delay than SFQ
- John

^ permalink raw reply [flat|nested] 18+ messages in thread
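A rough way to repeat John's backlog check during a load test is to
poll the qdisc statistics; the four interface names below are
assumptions standing in for his four-NIC multipath setup:

  # poll qdisc stats on each multipath NIC once per second;
  # a persistently non-zero "backlog" line indicates queue build-up
  watch -n 1 'for dev in eth0 eth1 eth2 eth3; do tc -s qdisc show dev $dev; done'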
* Re: tc filter mask for ACK packets off?
  2012-01-03  7:31 ` Michal Kubeček
  2012-01-03  9:36   ` Dave Taht
@ 2012-01-04  0:01   ` Michal Soltys
  1 sibling, 0 replies; 18+ messages in thread
From: Michal Soltys @ 2012-01-04  0:01 UTC (permalink / raw)
  To: Michal Kubeček; +Cc: netdev, John A. Sullivan III

On 12-01-03 08:31, Michal Kubeček wrote:
> On Saturday 31 December 2011 at 21:30, John A. Sullivan III wrote:
>> <cut>
>
> However, by an "ACK only" packet (worth prioritizing), I would rather
> understand a packet with the ACK flag and no payload, not a packet
> with ACK as the only flag. For many TCP connections, all packets
> except the initial SYN and SYN-ACK and the two FIN packets have ACK as
> the only flag. So my guess is you should rather prioritize all TCP
> packets with no application layer data.
>
> Michal Kubecek

In the context of the above, xtables-addons provides a length2 match
which (possibly paired with other iptables matches) gives excellent
control for such tasks.

^ permalink raw reply [flat|nested] 18+ messages in thread
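To make the combined advice concrete -- prioritize small TCP packets
with ACK set, while ignoring the two ECN bits -- something along the
following lines should work. The device, class ids, and the 64-byte
cutoff are illustrative assumptions, and since length2 lives in
xtables-addons, only the stock length match is shown here:

  # ECN-safe variant of the classic "ACK only" u32 filter:
  # mask 0x3f on the TCP flags byte ignores CWR/ECE, and the
  # 0xffc0 mask on the IP total length limits it to packets < 64 bytes
  tc filter add dev eth0 parent 1: protocol ip prio 10 u32 \
      match ip protocol 6 0xff \
      match u8 0x05 0x0f at 0 \
      match u16 0x0000 0xffc0 at 2 \
      match u8 0x10 0x3f at 33 \
      flowid 1:10

  # alternatively, mark small TCP packets with iptables and match
  # the fwmark in tc
  iptables -t mangle -A POSTROUTING -o eth0 -p tcp \
      -m length --length 0:64 -j MARK --set-mark 1
  tc filter add dev eth0 parent 1: protocol ip prio 20 handle 1 fw flowid 1:10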
end of thread, other threads:[~2012-01-04  8:17 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)

2012-01-01  2:30 tc filter mask for ACK packets off? John A. Sullivan III
2012-01-03  7:31 ` Michal Kubeček
2012-01-03  9:36   ` Dave Taht
2012-01-03 10:40     ` [RFC] SFQ planned changes Eric Dumazet
2012-01-03 12:07       ` Dave Taht
2012-01-03 12:50         ` Eric Dumazet
2012-01-03 16:08           ` Eric Dumazet
2012-01-03 23:57             ` Dave Taht
2012-01-04  0:14               ` Eric Dumazet
2012-01-04  7:56                 ` Dave Taht
2012-01-04  8:17                   ` Eric Dumazet
2012-01-03 12:18       ` tc filter mask for ACK packets off? John A. Sullivan III
2012-01-03 12:32         ` Eric Dumazet
2012-01-03 12:45           ` John A. Sullivan III
2012-01-03 13:00             ` Dave Taht
2012-01-03 17:57               ` John A. Sullivan III
2012-01-04  0:01   ` Michal Soltys