* OpenWRT wrong adjustment of fq_codel defaults (Was: fq_codel_drop vs a udp flood)
[not found] ` <1462476207.13075.20.camel@edumazet-glaptop3.roam.corp.google.com>
@ 2016-05-06 9:42 ` Jesper Dangaard Brouer
2016-05-06 12:47 ` Jesper Dangaard Brouer
2016-05-07 9:57 ` OpenWRT wrong adjustment of fq_codel defaults (Was: [Codel] " Kevin Darbyshire-Bryant
[not found] ` <CAA93jw4XZ1+LLX1z6wQ6DHsp4vFCS5zmMz-uku_8SBNG_KuUxA@mail.gmail.com>
1 sibling, 2 replies; 30+ messages in thread
From: Jesper Dangaard Brouer @ 2016-05-06 9:42 UTC (permalink / raw)
To: Eric Dumazet, Felix Fietkau, Dave Taht
Cc: make-wifi-fast, zajec5, ath10k, netdev@vger.kernel.org,
codel@lists.bufferbloat.net, Jonathan Morton, Roman Yeryomin
Hi Felix,
This is an important fix for OpenWRT, please read!
OpenWRT changed the default fq_codel sch->limit from 10240 to 1024,
without also adjusting q->flows_cnt. Eric explains below that you must
also adjust the buckets (q->flows_cnt) for this not to break. (Just
adjust it to 128)
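For reference, fq_codel_init() upstream sets sch->limit = 10*1024 and
q->flows_cnt = 1024. A downsized default consistent with the advice above
would look roughly like this (a sketch only, not the actual OpenWRT patch):

  sch->limit   = 1024;
  q->flows_cnt = 128;   /* keeps ~8 packets per bucket when the limit is hit */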
Problematic OpenWRT commit in question:
http://git.openwrt.org/?p=openwrt.git;a=patch;h=12cd6578084e
12cd6578084e ("kernel: revert fq_codel quantum override to prevent it from causing too much cpu load with higher speed (#21326)")
I also highly recommend you cherry-pick this very recent commit:
net-next: 9d18562a2278 ("fq_codel: add batch ability to fq_codel_drop()")
https://git.kernel.org/davem/net-next/c/9d18562a227
This should fix the very high CPU usage in case fq_codel goes into drop mode.
The problem is that drop mode was considered rare, and implementation-wise
it was chosen to be the more expensive path (to save cycles in normal mode).
Unfortunately it is easy to trigger with a UDP flood. Drop mode is
especially expensive for smaller devices, as it scans a 4 KB array,
thus 64 cache misses per drop scan on small devices!
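(Back-of-the-envelope, assuming the per-bucket u32 backlog array and
64-byte cache lines:
   1024 buckets * 4 bytes = 4096 bytes -> 4096 / 64 = 64 cache lines per scan
    128 buckets * 4 bytes =  512 bytes ->  512 / 64 =  8 cache lines per scan)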
The fix is to allow drop-mode to bulk-drop more packets when entering
drop-mode (default 64 bulk drop). That way we don't suddenly
experience a significantly higher processing cost per packet, but
instead can amortize this.
To Eric: should we recommend that OpenWRT also adjust the default (max) bulk
drop of 64, given we also recommend a bucket count of 128? (The amount
of memory to scan is then smaller, but their CPUs are also much slower.)
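For illustration, the knobs in question expressed as one runtime command
(the device name is just an example, and the drop_batch keyword needs an
iproute2 recent enough to know the new attribute):

  tc qdisc replace dev eth0 root fq_codel limit 1024 flows 128 drop_batch 64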
--Jesper
On Thu, 05 May 2016 12:23:27 -0700 Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thu, 2016-05-05 at 19:25 +0300, Roman Yeryomin wrote:
> > On 5 May 2016 at 19:12, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > > On Thu, 2016-05-05 at 17:53 +0300, Roman Yeryomin wrote:
> > >
> > >>
> > >> qdisc fq_codel 0: dev eth0 root refcnt 2 limit 1024p flows 1024
> > >> quantum 1514 target 5.0ms interval 100.0ms ecn
> > >> Sent 12306 bytes 128 pkt (dropped 0, overlimits 0 requeues 0)
> > >> backlog 0b 0p requeues 0
> > >> maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
> > >> new_flows_len 0 old_flows_len 0
> > >
> > >
> > > Limit of 1024 packets and 1024 flows is not wise I think.
> > >
> > > (If all buckets are in use, each bucket has a virtual queue of 1 packet,
> > > which is almost the same than having no queue at all)
> > >
> > > I suggest to have at least 8 packets per bucket, to let Codel have a
> > > chance to trigger.
> > >
> > > So you could either reduce number of buckets to 128 (if memory is
> > > tight), or increase limit to 8192.
> >
> > Will try, but what I've posted is default, I didn't change/configure that.
>
> fq_codel has a default of 10240 packets and 1024 buckets.
>
> http://lxr.free-electrons.com/source/net/sched/sch_fq_codel.c#L413
>
> If someone changed that in the linux variant you use, he probably should
> explain the rationale.
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
Author of http://www.iptv-analyzer.org
LinkedIn: http://www.linkedin.com/in/brouer
* Re: OpenWRT wrong adjustment of fq_codel defaults (Was: fq_codel_drop vs a udp flood)
2016-05-06 9:42 ` OpenWRT wrong adjustment of fq_codel defaults (Was: fq_codel_drop vs a udp flood) Jesper Dangaard Brouer
@ 2016-05-06 12:47 ` Jesper Dangaard Brouer
2016-05-06 18:43 ` Roman Yeryomin
2016-05-07 9:57 ` OpenWRT wrong adjustment of fq_codel defaults (Was: [Codel] " Kevin Darbyshire-Bryant
1 sibling, 1 reply; 30+ messages in thread
From: Jesper Dangaard Brouer @ 2016-05-06 12:47 UTC (permalink / raw)
To: Felix Fietkau, Dave Taht
Cc: make-wifi-fast, zajec5, ath10k, codel@lists.bufferbloat.net,
netdev@vger.kernel.org, Jonathan Morton, Roman Yeryomin,
openwrt-devel
I've created an OpenWRT ticket[1] on this issue, as it seems that someone[2]
closed Felix's OpenWRT email account (bad choice! emails are bouncing).
Sounds like the OpenWRT and LEDE (https://www.lede-project.org/) projects
are in some kind of conflict.
OpenWRT ticket [1] https://dev.openwrt.org/ticket/22349
[2] http://thread.gmane.org/gmane.comp.embedded.openwrt.devel/40298/focus=40335
On Fri, 6 May 2016 11:42:43 +0200
Jesper Dangaard Brouer <brouer@redhat.com> wrote:
> Hi Felix,
>
> This is an important fix for OpenWRT, please read!
>
> OpenWRT changed the default fq_codel sch->limit from 10240 to 1024,
> without also adjusting q->flows_cnt. Eric explains below that you must
> also adjust the buckets (q->flows_cnt) for this not to break. (Just
> adjust it to 128)
>
> Problematic OpenWRT commit in question:
> http://git.openwrt.org/?p=openwrt.git;a=patch;h=12cd6578084e
> 12cd6578084e ("kernel: revert fq_codel quantum override to prevent it from causing too much cpu load with higher speed (#21326)")
>
>
> I also highly recommend you cherry-pick this very recent commit:
> net-next: 9d18562a2278 ("fq_codel: add batch ability to fq_codel_drop()")
> https://git.kernel.org/davem/net-next/c/9d18562a227
>
> This should fix very high CPU usage in-case fq_codel goes into drop mode.
> The problem is that drop mode was considered rare, and implementation
> wise it was chosen to be more expensive (to save cycles on normal mode).
> Unfortunately is it easy to trigger with an UDP flood. Drop mode is
> especially expensive for smaller devices, as it scans a 4K big array,
> thus 64 cache misses for small devices!
>
> The fix is to allow drop-mode to bulk-drop more packets when entering
> drop-mode (default 64 bulk drop). That way we don't suddenly
> experience a significantly higher processing cost per packet, but
> instead can amortize this.
>
> To Eric, should we recommend OpenWRT to adjust default (max) 64 bulk
> drop, given we also recommend bucket size to be 128 ? (thus the amount
> of memory to scan is less, but their CPU is also much smaller).
>
> --Jesper
>
>
> On Thu, 05 May 2016 12:23:27 -0700 Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> > On Thu, 2016-05-05 at 19:25 +0300, Roman Yeryomin wrote:
> > > On 5 May 2016 at 19:12, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > > > On Thu, 2016-05-05 at 17:53 +0300, Roman Yeryomin wrote:
> > > >
> > > >>
> > > >> qdisc fq_codel 0: dev eth0 root refcnt 2 limit 1024p flows 1024
> > > >> quantum 1514 target 5.0ms interval 100.0ms ecn
> > > >> Sent 12306 bytes 128 pkt (dropped 0, overlimits 0 requeues 0)
> > > >> backlog 0b 0p requeues 0
> > > >> maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
> > > >> new_flows_len 0 old_flows_len 0
> > > >
> > > >
> > > > Limit of 1024 packets and 1024 flows is not wise I think.
> > > >
> > > > (If all buckets are in use, each bucket has a virtual queue of 1 packet,
> > > > which is almost the same than having no queue at all)
> > > >
> > > > I suggest to have at least 8 packets per bucket, to let Codel have a
> > > > chance to trigger.
> > > >
> > > > So you could either reduce number of buckets to 128 (if memory is
> > > > tight), or increase limit to 8192.
> > >
> > > Will try, but what I've posted is default, I didn't change/configure that.
> >
> > fq_codel has a default of 10240 packets and 1024 buckets.
> >
> > http://lxr.free-electrons.com/source/net/sched/sch_fq_codel.c#L413
> >
> > If someone changed that in the linux variant you use, he probably should
> > explain the rationale.
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
Author of http://www.iptv-analyzer.org
LinkedIn: http://www.linkedin.com/in/brouer
* [PATCH net-next] fq_codel: add memory limitation per queue
[not found] ` <1462541156.13075.34.camel@edumazet-glaptop3.roam.corp.google.com>
@ 2016-05-06 15:55 ` Eric Dumazet
2016-05-09 3:49 ` David Miller
` (2 more replies)
0 siblings, 3 replies; 30+ messages in thread
From: Eric Dumazet @ 2016-05-06 15:55 UTC (permalink / raw)
To: David Miller; +Cc: Jesper Dangaard Brouer, Dave Täht, netdev, moeller0
From: Eric Dumazet <edumazet@google.com>
On small embedded routers, one wants to control the maximal amount of
memory used by fq_codel, instead of controlling the number of packets or
bytes, since GRO/TSO make those limits impractical.
Assuming skb->truesize is accurate, we have to keep track of
skb->truesize sum for skbs in queue.
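As a rough illustration (assumed numbers, not from any measurement):

  one ~64 KB GSO/GRO skb -> counts as 1 packet, but pins > 64 KB of truesize
  10240-packet limit     -> worst case several hundred MBytes of memory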
This patch adds a new TCA_FQ_CODEL_MEMORY_LIMIT attribute.
I chose a default value of 32 MBytes, which looks reasonable even
for heavy duty usages. (Prior fq_codel users should not be hurt
when they upgrade their kernels)
Two fields are added to tc_fq_codel_qd_stats to report :
- Current memory usage
- Number of drops caused by memory limits
# tc qd replace dev eth1 root est 1sec 4sec fq_codel memory_limit 4M
..
# tc -s -d qd sh dev eth1
qdisc fq_codel 8008: root refcnt 257 limit 10240p flows 1024
quantum 1514 target 5.0ms interval 100.0ms memory_limit 4Mb ecn
Sent 2083566791363 bytes 1376214889 pkt (dropped 4994406, overlimits 0
requeues 21705223)
rate 9841Mbit 812549pps backlog 3906120b 376p requeues 21705223
maxpacket 68130 drop_overlimit 4994406 new_flow_count 28855414
ecn_mark 0 memory_used 4190048 drop_overmemory 4994406
new_flows_len 1 old_flows_len 177
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Dave Täht <dave.taht@gmail.com>
Cc: Sebastian Möller <moeller0@gmx.de>
---
include/uapi/linux/pkt_sched.h | 3 +++
net/sched/sch_fq_codel.c | 27 ++++++++++++++++++++++++---
2 files changed, 27 insertions(+), 3 deletions(-)
diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
index a11afecd4482..2382eed50278 100644
--- a/include/uapi/linux/pkt_sched.h
+++ b/include/uapi/linux/pkt_sched.h
@@ -719,6 +719,7 @@ enum {
TCA_FQ_CODEL_QUANTUM,
TCA_FQ_CODEL_CE_THRESHOLD,
TCA_FQ_CODEL_DROP_BATCH_SIZE,
+ TCA_FQ_CODEL_MEMORY_LIMIT,
__TCA_FQ_CODEL_MAX
};
@@ -743,6 +744,8 @@ struct tc_fq_codel_qd_stats {
__u32 new_flows_len; /* count of flows in new list */
__u32 old_flows_len; /* count of flows in old list */
__u32 ce_mark; /* packets above ce_threshold */
+ __u32 memory_usage; /* in bytes */
+ __u32 drop_overmemory;
};
struct tc_fq_codel_cl_stats {
diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c
index e7b42b0d5145..bb8bd9314629 100644
--- a/net/sched/sch_fq_codel.c
+++ b/net/sched/sch_fq_codel.c
@@ -60,8 +60,11 @@ struct fq_codel_sched_data {
u32 perturbation; /* hash perturbation */
u32 quantum; /* psched_mtu(qdisc_dev(sch)); */
u32 drop_batch_size;
+ u32 memory_limit;
struct codel_params cparams;
struct codel_stats cstats;
+ u32 memory_usage;
+ u32 drop_overmemory;
u32 drop_overlimit;
u32 new_flow_count;
@@ -143,6 +146,7 @@ static unsigned int fq_codel_drop(struct Qdisc *sch, unsigned int max_packets)
unsigned int maxbacklog = 0, idx = 0, i, len;
struct fq_codel_flow *flow;
unsigned int threshold;
+ unsigned int mem = 0;
/* Queue is full! Find the fat flow and drop packet(s) from it.
* This might sound expensive, but with 1024 flows, we scan
@@ -167,11 +171,13 @@ static unsigned int fq_codel_drop(struct Qdisc *sch, unsigned int max_packets)
do {
skb = dequeue_head(flow);
len += qdisc_pkt_len(skb);
+ mem += skb->truesize;
kfree_skb(skb);
} while (++i < max_packets && len < threshold);
flow->dropped += i;
q->backlogs[idx] -= len;
+ q->memory_usage -= mem;
sch->qstats.drops += i;
sch->qstats.backlog -= len;
sch->q.qlen -= i;
@@ -193,6 +199,7 @@ static int fq_codel_enqueue(struct sk_buff *skb, struct Qdisc *sch)
unsigned int idx, prev_backlog, prev_qlen;
struct fq_codel_flow *flow;
int uninitialized_var(ret);
+ bool memory_limited;
idx = fq_codel_classify(skb, sch, &ret);
if (idx == 0) {
@@ -215,7 +222,9 @@ static int fq_codel_enqueue(struct sk_buff *skb, struct Qdisc *sch)
flow->deficit = q->quantum;
flow->dropped = 0;
}
- if (++sch->q.qlen <= sch->limit)
+ q->memory_usage += skb->truesize;
+ memory_limited = q->memory_usage > q->memory_limit;
+ if (++sch->q.qlen <= sch->limit && !memory_limited)
return NET_XMIT_SUCCESS;
prev_backlog = sch->qstats.backlog;
@@ -229,7 +238,8 @@ static int fq_codel_enqueue(struct sk_buff *skb, struct Qdisc *sch)
ret = fq_codel_drop(sch, q->drop_batch_size);
q->drop_overlimit += prev_qlen - sch->q.qlen;
-
+ if (memory_limited)
+ q->drop_overmemory += prev_qlen - sch->q.qlen;
/* As we dropped packet(s), better let upper stack know this */
qdisc_tree_reduce_backlog(sch, prev_qlen - sch->q.qlen,
prev_backlog - sch->qstats.backlog);
@@ -308,6 +318,7 @@ begin:
list_del_init(&flow->flowchain);
goto begin;
}
+ q->memory_usage -= skb->truesize;
qdisc_bstats_update(sch, skb);
flow->deficit -= qdisc_pkt_len(skb);
/* We cant call qdisc_tree_reduce_backlog() if our qlen is 0,
@@ -355,6 +366,7 @@ static const struct nla_policy fq_codel_policy[TCA_FQ_CODEL_MAX + 1] = {
[TCA_FQ_CODEL_QUANTUM] = { .type = NLA_U32 },
[TCA_FQ_CODEL_CE_THRESHOLD] = { .type = NLA_U32 },
[TCA_FQ_CODEL_DROP_BATCH_SIZE] = { .type = NLA_U32 },
+ [TCA_FQ_CODEL_MEMORY_LIMIT] = { .type = NLA_U32 },
};
static int fq_codel_change(struct Qdisc *sch, struct nlattr *opt)
@@ -409,7 +421,11 @@ static int fq_codel_change(struct Qdisc *sch, struct nlattr *opt)
if (tb[TCA_FQ_CODEL_DROP_BATCH_SIZE])
q->drop_batch_size = min(1U, nla_get_u32(tb[TCA_FQ_CODEL_DROP_BATCH_SIZE]));
- while (sch->q.qlen > sch->limit) {
+ if (tb[TCA_FQ_CODEL_MEMORY_LIMIT])
+ q->memory_limit = min(1U << 31, nla_get_u32(tb[TCA_FQ_CODEL_MEMORY_LIMIT]));
+
+ while (sch->q.qlen > sch->limit ||
+ q->memory_usage > q->memory_limit) {
struct sk_buff *skb = fq_codel_dequeue(sch);
q->cstats.drop_len += qdisc_pkt_len(skb);
@@ -454,6 +470,7 @@ static int fq_codel_init(struct Qdisc *sch, struct nlattr *opt)
sch->limit = 10*1024;
q->flows_cnt = 1024;
+ q->memory_limit = 32 << 20; /* 32 MBytes */
q->drop_batch_size = 64;
q->quantum = psched_mtu(qdisc_dev(sch));
q->perturbation = prandom_u32();
@@ -515,6 +532,8 @@ static int fq_codel_dump(struct Qdisc *sch, struct sk_buff *skb)
q->quantum) ||
nla_put_u32(skb, TCA_FQ_CODEL_DROP_BATCH_SIZE,
q->drop_batch_size) ||
+ nla_put_u32(skb, TCA_FQ_CODEL_MEMORY_LIMIT,
+ q->memory_limit) ||
nla_put_u32(skb, TCA_FQ_CODEL_FLOWS,
q->flows_cnt))
goto nla_put_failure;
@@ -543,6 +562,8 @@ static int fq_codel_dump_stats(struct Qdisc *sch, struct gnet_dump *d)
st.qdisc_stats.ecn_mark = q->cstats.ecn_mark;
st.qdisc_stats.new_flow_count = q->new_flow_count;
st.qdisc_stats.ce_mark = q->cstats.ce_mark;
+ st.qdisc_stats.memory_usage = q->memory_usage;
+ st.qdisc_stats.drop_overmemory = q->drop_overmemory;
list_for_each(pos, &q->new_flows)
st.qdisc_stats.new_flows_len++;
* Re: OpenWRT wrong adjustment of fq_codel defaults (Was: fq_codel_drop vs a udp flood)
2016-05-06 12:47 ` Jesper Dangaard Brouer
@ 2016-05-06 18:43 ` Roman Yeryomin
2016-05-06 18:56 ` Roman Yeryomin
0 siblings, 1 reply; 30+ messages in thread
From: Roman Yeryomin @ 2016-05-06 18:43 UTC (permalink / raw)
To: Jesper Dangaard Brouer
Cc: make-wifi-fast, Rafał Miłecki, ath10k,
codel@lists.bufferbloat.net, netdev@vger.kernel.org,
Jonathan Morton, OpenWrt Development List, Felix Fietkau
On 6 May 2016 at 15:47, Jesper Dangaard Brouer <brouer@redhat.com> wrote:
>
> I've created a OpenWRT ticket[1] on this issue, as it seems that someone[2]
> closed Felix'es OpenWRT email account (bad choice! emails bouncing).
> Sounds like OpenWRT and the LEDE https://www.lede-project.org/ project
> is in some kind of conflict.
>
> OpenWRT ticket [1] https://dev.openwrt.org/ticket/22349
>
> [2] http://thread.gmane.org/gmane.comp.embedded.openwrt.devel/40298/focus=40335
OK, so, after porting the patch to the 4.1 OpenWRT kernel and playing a
bit with the fq_codel limits, I was able to get 420Mbps UDP like this:
tc qdisc replace dev wlan0 parent :1 fq_codel flows 16 limit 256
This is certainly better than 30Mbps but still less than half of
what I got before (900).
TCP also improved a little (550 to ~590).
Felix, others, do you want to see the ported patch? Maybe I did something wrong.
It doesn't look like it will save ath10k from the performance regression.
>
> On Fri, 6 May 2016 11:42:43 +0200
> Jesper Dangaard Brouer <brouer@redhat.com> wrote:
>
>> Hi Felix,
>>
>> This is an important fix for OpenWRT, please read!
>>
>> OpenWRT changed the default fq_codel sch->limit from 10240 to 1024,
>> without also adjusting q->flows_cnt. Eric explains below that you must
>> also adjust the buckets (q->flows_cnt) for this not to break. (Just
>> adjust it to 128)
>>
>> Problematic OpenWRT commit in question:
>> http://git.openwrt.org/?p=openwrt.git;a=patch;h=12cd6578084e
>> 12cd6578084e ("kernel: revert fq_codel quantum override to prevent it from causing too much cpu load with higher speed (#21326)")
>>
>>
>> I also highly recommend you cherry-pick this very recent commit:
>> net-next: 9d18562a2278 ("fq_codel: add batch ability to fq_codel_drop()")
>> https://git.kernel.org/davem/net-next/c/9d18562a227
>>
>> This should fix very high CPU usage in-case fq_codel goes into drop mode.
>> The problem is that drop mode was considered rare, and implementation
>> wise it was chosen to be more expensive (to save cycles on normal mode).
>> Unfortunately is it easy to trigger with an UDP flood. Drop mode is
>> especially expensive for smaller devices, as it scans a 4K big array,
>> thus 64 cache misses for small devices!
>>
>> The fix is to allow drop-mode to bulk-drop more packets when entering
>> drop-mode (default 64 bulk drop). That way we don't suddenly
>> experience a significantly higher processing cost per packet, but
>> instead can amortize this.
>>
>> To Eric, should we recommend OpenWRT to adjust default (max) 64 bulk
>> drop, given we also recommend bucket size to be 128 ? (thus the amount
>> of memory to scan is less, but their CPU is also much smaller).
>>
>> --Jesper
>>
>>
>> On Thu, 05 May 2016 12:23:27 -0700 Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>
>> > On Thu, 2016-05-05 at 19:25 +0300, Roman Yeryomin wrote:
>> > > On 5 May 2016 at 19:12, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> > > > On Thu, 2016-05-05 at 17:53 +0300, Roman Yeryomin wrote:
>> > > >
>> > > >>
>> > > >> qdisc fq_codel 0: dev eth0 root refcnt 2 limit 1024p flows 1024
>> > > >> quantum 1514 target 5.0ms interval 100.0ms ecn
>> > > >> Sent 12306 bytes 128 pkt (dropped 0, overlimits 0 requeues 0)
>> > > >> backlog 0b 0p requeues 0
>> > > >> maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>> > > >> new_flows_len 0 old_flows_len 0
>> > > >
>> > > >
>> > > > Limit of 1024 packets and 1024 flows is not wise I think.
>> > > >
>> > > > (If all buckets are in use, each bucket has a virtual queue of 1 packet,
>> > > > which is almost the same than having no queue at all)
>> > > >
>> > > > I suggest to have at least 8 packets per bucket, to let Codel have a
>> > > > chance to trigger.
>> > > >
>> > > > So you could either reduce number of buckets to 128 (if memory is
>> > > > tight), or increase limit to 8192.
>> > >
>> > > Will try, but what I've posted is default, I didn't change/configure that.
>> >
>> > fq_codel has a default of 10240 packets and 1024 buckets.
>> >
>> > http://lxr.free-electrons.com/source/net/sched/sch_fq_codel.c#L413
>> >
>> > If someone changed that in the linux variant you use, he probably should
>> > explain the rationale.
>
> --
> Best regards,
> Jesper Dangaard Brouer
> MSc.CS, Principal Kernel Engineer at Red Hat
> Author of http://www.iptv-analyzer.org
> LinkedIn: http://www.linkedin.com/in/brouer
* Re: OpenWRT wrong adjustment of fq_codel defaults (Was: fq_codel_drop vs a udp flood)
2016-05-06 18:43 ` Roman Yeryomin
@ 2016-05-06 18:56 ` Roman Yeryomin
2016-05-06 19:43 ` Dave Taht
0 siblings, 1 reply; 30+ messages in thread
From: Roman Yeryomin @ 2016-05-06 18:56 UTC (permalink / raw)
To: Jesper Dangaard Brouer
Cc: make-wifi-fast, Rafał Miłecki, ath10k,
codel@lists.bufferbloat.net, netdev@vger.kernel.org,
Jonathan Morton, OpenWrt Development List, Felix Fietkau
On 6 May 2016 at 21:43, Roman Yeryomin <leroi.lists@gmail.com> wrote:
> On 6 May 2016 at 15:47, Jesper Dangaard Brouer <brouer@redhat.com> wrote:
>>
>> I've created a OpenWRT ticket[1] on this issue, as it seems that someone[2]
>> closed Felix'es OpenWRT email account (bad choice! emails bouncing).
>> Sounds like OpenWRT and the LEDE https://www.lede-project.org/ project
>> is in some kind of conflict.
>>
>> OpenWRT ticket [1] https://dev.openwrt.org/ticket/22349
>>
>> [2] http://thread.gmane.org/gmane.comp.embedded.openwrt.devel/40298/focus=40335
>
> OK, so, after porting the patch to 4.1 openwrt kernel and playing a
> bit with fq_codel limits I was able to get 420Mbps UDP like this:
> tc qdisc replace dev wlan0 parent :1 fq_codel flows 16 limit 256
Forgot to mention, I've reduced drop_batch_size down to 32
> This is certainly better than 30Mbps but still more than two times
> less than before (900).
> TCP also improved a little (550 to ~590).
>
> Felix, others, do you want to see the ported patch, maybe I did something wrong?
> Doesn't look like it will save ath10k from performance regression.
>
>>
>> On Fri, 6 May 2016 11:42:43 +0200
>> Jesper Dangaard Brouer <brouer@redhat.com> wrote:
>>
>>> Hi Felix,
>>>
>>> This is an important fix for OpenWRT, please read!
>>>
>>> OpenWRT changed the default fq_codel sch->limit from 10240 to 1024,
>>> without also adjusting q->flows_cnt. Eric explains below that you must
>>> also adjust the buckets (q->flows_cnt) for this not to break. (Just
>>> adjust it to 128)
>>>
>>> Problematic OpenWRT commit in question:
>>> http://git.openwrt.org/?p=openwrt.git;a=patch;h=12cd6578084e
>>> 12cd6578084e ("kernel: revert fq_codel quantum override to prevent it from causing too much cpu load with higher speed (#21326)")
>>>
>>>
>>> I also highly recommend you cherry-pick this very recent commit:
>>> net-next: 9d18562a2278 ("fq_codel: add batch ability to fq_codel_drop()")
>>> https://git.kernel.org/davem/net-next/c/9d18562a227
>>>
>>> This should fix very high CPU usage in-case fq_codel goes into drop mode.
>>> The problem is that drop mode was considered rare, and implementation
>>> wise it was chosen to be more expensive (to save cycles on normal mode).
>>> Unfortunately is it easy to trigger with an UDP flood. Drop mode is
>>> especially expensive for smaller devices, as it scans a 4K big array,
>>> thus 64 cache misses for small devices!
>>>
>>> The fix is to allow drop-mode to bulk-drop more packets when entering
>>> drop-mode (default 64 bulk drop). That way we don't suddenly
>>> experience a significantly higher processing cost per packet, but
>>> instead can amortize this.
>>>
>>> To Eric, should we recommend OpenWRT to adjust default (max) 64 bulk
>>> drop, given we also recommend bucket size to be 128 ? (thus the amount
>>> of memory to scan is less, but their CPU is also much smaller).
>>>
>>> --Jesper
>>>
>>>
>>> On Thu, 05 May 2016 12:23:27 -0700 Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>>
>>> > On Thu, 2016-05-05 at 19:25 +0300, Roman Yeryomin wrote:
>>> > > On 5 May 2016 at 19:12, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>> > > > On Thu, 2016-05-05 at 17:53 +0300, Roman Yeryomin wrote:
>>> > > >
>>> > > >>
>>> > > >> qdisc fq_codel 0: dev eth0 root refcnt 2 limit 1024p flows 1024
>>> > > >> quantum 1514 target 5.0ms interval 100.0ms ecn
>>> > > >> Sent 12306 bytes 128 pkt (dropped 0, overlimits 0 requeues 0)
>>> > > >> backlog 0b 0p requeues 0
>>> > > >> maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>>> > > >> new_flows_len 0 old_flows_len 0
>>> > > >
>>> > > >
>>> > > > Limit of 1024 packets and 1024 flows is not wise I think.
>>> > > >
>>> > > > (If all buckets are in use, each bucket has a virtual queue of 1 packet,
>>> > > > which is almost the same than having no queue at all)
>>> > > >
>>> > > > I suggest to have at least 8 packets per bucket, to let Codel have a
>>> > > > chance to trigger.
>>> > > >
>>> > > > So you could either reduce number of buckets to 128 (if memory is
>>> > > > tight), or increase limit to 8192.
>>> > >
>>> > > Will try, but what I've posted is default, I didn't change/configure that.
>>> >
>>> > fq_codel has a default of 10240 packets and 1024 buckets.
>>> >
>>> > http://lxr.free-electrons.com/source/net/sched/sch_fq_codel.c#L413
>>> >
>>> > If someone changed that in the linux variant you use, he probably should
>>> > explain the rationale.
>>
>> --
>> Best regards,
>> Jesper Dangaard Brouer
>> MSc.CS, Principal Kernel Engineer at Red Hat
>> Author of http://www.iptv-analyzer.org
>> LinkedIn: http://www.linkedin.com/in/brouer
* Re: OpenWRT wrong adjustment of fq_codel defaults (Was: fq_codel_drop vs a udp flood)
2016-05-06 18:56 ` Roman Yeryomin
@ 2016-05-06 19:43 ` Dave Taht
2016-05-15 22:34 ` Roman Yeryomin
0 siblings, 1 reply; 30+ messages in thread
From: Dave Taht @ 2016-05-06 19:43 UTC (permalink / raw)
To: Roman Yeryomin
Cc: make-wifi-fast, Rafał Miłecki, ath10k,
netdev@vger.kernel.org, codel@lists.bufferbloat.net,
Jonathan Morton, OpenWrt Development List, Felix Fietkau
On Fri, May 6, 2016 at 11:56 AM, Roman Yeryomin <leroi.lists@gmail.com> wrote:
> On 6 May 2016 at 21:43, Roman Yeryomin <leroi.lists@gmail.com> wrote:
>> On 6 May 2016 at 15:47, Jesper Dangaard Brouer <brouer@redhat.com> wrote:
>>>
>>> I've created a OpenWRT ticket[1] on this issue, as it seems that someone[2]
>>> closed Felix'es OpenWRT email account (bad choice! emails bouncing).
>>> Sounds like OpenWRT and the LEDE https://www.lede-project.org/ project
>>> is in some kind of conflict.
>>>
>>> OpenWRT ticket [1] https://dev.openwrt.org/ticket/22349
>>>
>>> [2] http://thread.gmane.org/gmane.comp.embedded.openwrt.devel/40298/focus=40335
>>
>> OK, so, after porting the patch to 4.1 openwrt kernel and playing a
>> bit with fq_codel limits I was able to get 420Mbps UDP like this:
>> tc qdisc replace dev wlan0 parent :1 fq_codel flows 16 limit 256
>
> Forgot to mention, I've reduced drop_batch_size down to 32
0) Not clear to me if that's the right line; there are 4 wifi queues,
and the third one is the BE queue. That limit is also too low for
normal use. And: for the purpose of this particular UDP test, flows 16
is ok, but not ideal.
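For illustration (the interface name and values are just examples, picking
up the earlier advice in this thread), covering all four mq children would
look something like:

  for i in 1 2 3 4; do
      tc qdisc replace dev wlan0 parent :$i fq_codel limit 1024 flows 128
  done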
1) What's the tcp number (with a simultaneous ping) with this latest patchset?
(I care about tcp performance a lot more than udp floods - surviving a
udp flood yes, performance, no)
before/after?
tc -s qdisc show dev wlan0 during/after results?
IF you are doing builds for the archer c7v2, I can join in on this... (?)
I did do a test of the ath10k "before", fq_codel *never engaged*, and
tcp-induced latencies under load, e.g. at 100mbit, cracked 600ms, while
staying flat (20ms) at 100mbit. (not the same patches you are testing)
on x86. I have got tcp 300Mbit out of an osx box, similar latency,
have yet to get anything more on anything I currently have
before/after patchsets.
I'll go add flooding to the tests, I just finished a series comparing
two different speed stations and life was good on that.
"before" - fq_codel never engages, we see seconds of latency under load.
root@apu2:~# tc -s qdisc show dev wlp4s0
qdisc mq 0: root
Sent 8570563893 bytes 6326983 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
qdisc fq_codel 0: parent :1 limit 10240p flows 1024 quantum 1514
target 5.0ms interval 100.0ms ecn
Sent 2262 bytes 17 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: parent :2 limit 10240p flows 1024 quantum 1514
target 5.0ms interval 100.0ms ecn
Sent 220486569 bytes 152058 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
maxpacket 18168 drop_overlimit 0 new_flow_count 1 ecn_mark 0
new_flows_len 0 old_flows_len 1
qdisc fq_codel 0: parent :3 limit 10240p flows 1024 quantum 1514
target 5.0ms interval 100.0ms ecn
Sent 8340546509 bytes 6163431 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
maxpacket 68130 drop_overlimit 0 new_flow_count 120050 ecn_mark 0
new_flows_len 1 old_flows_len 3
qdisc fq_codel 0: parent :4 limit 10240p flows 1024 quantum 1514
target 5.0ms interval 100.0ms ecn
Sent 9528553 bytes 11477 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
maxpacket 66 drop_overlimit 0 new_flow_count 1 ecn_mark 0
new_flows_len 1 old_flows_len 0
```
>> This is certainly better than 30Mbps but still more than two times
>> less than before (900).
The number that I am still not sure we got: were you sending
900mbit udp and receiving 900mbit on the prior tests?
>> TCP also improved a little (550 to ~590).
The limit is probably a bit low, also. You might want to try target
20ms as well.
>>
>> Felix, others, do you want to see the ported patch, maybe I did something wrong?
>> Doesn't look like it will save ath10k from performance regression.
what was tcp "before"? (I'm sorry, such a long thread)
>>
>>>
>>> On Fri, 6 May 2016 11:42:43 +0200
>>> Jesper Dangaard Brouer <brouer@redhat.com> wrote:
>>>
>>>> Hi Felix,
>>>>
>>>> This is an important fix for OpenWRT, please read!
>>>>
>>>> OpenWRT changed the default fq_codel sch->limit from 10240 to 1024,
>>>> without also adjusting q->flows_cnt. Eric explains below that you must
>>>> also adjust the buckets (q->flows_cnt) for this not to break. (Just
>>>> adjust it to 128)
>>>>
>>>> Problematic OpenWRT commit in question:
>>>> http://git.openwrt.org/?p=openwrt.git;a=patch;h=12cd6578084e
>>>> 12cd6578084e ("kernel: revert fq_codel quantum override to prevent it from causing too much cpu load with higher speed (#21326)")
>>>>
>>>>
>>>> I also highly recommend you cherry-pick this very recent commit:
>>>> net-next: 9d18562a2278 ("fq_codel: add batch ability to fq_codel_drop()")
>>>> https://git.kernel.org/davem/net-next/c/9d18562a227
>>>>
>>>> This should fix very high CPU usage in-case fq_codel goes into drop mode.
>>>> The problem is that drop mode was considered rare, and implementation
>>>> wise it was chosen to be more expensive (to save cycles on normal mode).
>>>> Unfortunately is it easy to trigger with an UDP flood. Drop mode is
>>>> especially expensive for smaller devices, as it scans a 4K big array,
>>>> thus 64 cache misses for small devices!
>>>>
>>>> The fix is to allow drop-mode to bulk-drop more packets when entering
>>>> drop-mode (default 64 bulk drop). That way we don't suddenly
>>>> experience a significantly higher processing cost per packet, but
>>>> instead can amortize this.
>>>>
>>>> To Eric, should we recommend OpenWRT to adjust default (max) 64 bulk
>>>> drop, given we also recommend bucket size to be 128 ? (thus the amount
>>>> of memory to scan is less, but their CPU is also much smaller).
>>>>
>>>> --Jesper
>>>>
>>>>
>>>> On Thu, 05 May 2016 12:23:27 -0700 Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>>>
>>>> > On Thu, 2016-05-05 at 19:25 +0300, Roman Yeryomin wrote:
>>>> > > On 5 May 2016 at 19:12, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>>> > > > On Thu, 2016-05-05 at 17:53 +0300, Roman Yeryomin wrote:
>>>> > > >
>>>> > > >>
>>>> > > >> qdisc fq_codel 0: dev eth0 root refcnt 2 limit 1024p flows 1024
>>>> > > >> quantum 1514 target 5.0ms interval 100.0ms ecn
>>>> > > >> Sent 12306 bytes 128 pkt (dropped 0, overlimits 0 requeues 0)
>>>> > > >> backlog 0b 0p requeues 0
>>>> > > >> maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>>>> > > >> new_flows_len 0 old_flows_len 0
>>>> > > >
>>>> > > >
>>>> > > > Limit of 1024 packets and 1024 flows is not wise I think.
>>>> > > >
>>>> > > > (If all buckets are in use, each bucket has a virtual queue of 1 packet,
>>>> > > > which is almost the same than having no queue at all)
>>>> > > >
>>>> > > > I suggest to have at least 8 packets per bucket, to let Codel have a
>>>> > > > chance to trigger.
>>>> > > >
>>>> > > > So you could either reduce number of buckets to 128 (if memory is
>>>> > > > tight), or increase limit to 8192.
>>>> > >
>>>> > > Will try, but what I've posted is default, I didn't change/configure that.
>>>> >
>>>> > fq_codel has a default of 10240 packets and 1024 buckets.
>>>> >
>>>> > http://lxr.free-electrons.com/source/net/sched/sch_fq_codel.c#L413
>>>> >
>>>> > If someone changed that in the linux variant you use, he probably should
>>>> > explain the rationale.
>>>
>>> --
>>> Best regards,
>>> Jesper Dangaard Brouer
>>> MSc.CS, Principal Kernel Engineer at Red Hat
>>> Author of http://www.iptv-analyzer.org
>>> LinkedIn: http://www.linkedin.com/in/brouer
--
Dave Täht
Let's go make home routers and wifi faster! With better software!
http://blog.cerowrt.org
* Re: OpenWRT wrong adjustment of fq_codel defaults (Was: [Codel] fq_codel_drop vs a udp flood)
2016-05-06 9:42 ` OpenWRT wrong adjustment of fq_codel defaults (Was: fq_codel_drop vs a udp flood) Jesper Dangaard Brouer
2016-05-06 12:47 ` Jesper Dangaard Brouer
@ 2016-05-07 9:57 ` Kevin Darbyshire-Bryant
2016-05-15 22:47 ` Roman Yeryomin
1 sibling, 1 reply; 30+ messages in thread
From: Kevin Darbyshire-Bryant @ 2016-05-07 9:57 UTC (permalink / raw)
To: Jesper Dangaard Brouer, Eric Dumazet, Felix Fietkau, Dave Taht
Cc: make-wifi-fast, zajec5, ath10k, codel@lists.bufferbloat.net,
netdev@vger.kernel.org, Jonathan Morton, Roman Yeryomin
On 06/05/16 10:42, Jesper Dangaard Brouer wrote:
> Hi Felix,
>
> This is an important fix for OpenWRT, please read!
>
> OpenWRT changed the default fq_codel sch->limit from 10240 to 1024,
> without also adjusting q->flows_cnt. Eric explains below that you must
> also adjust the buckets (q->flows_cnt) for this not to break. (Just
> adjust it to 128)
>
> Problematic OpenWRT commit in question:
> http://git.openwrt.org/?p=openwrt.git;a=patch;h=12cd6578084e
> 12cd6578084e ("kernel: revert fq_codel quantum override to prevent it from causing too much cpu load with higher speed (#21326)")
I 'pull requested' this to the lede-staging tree on github.
https://github.com/lede-project/staging/pull/11
One way or another Felix & co should see the change :-)
>
>
> I also highly recommend you cherry-pick this very recent commit:
> net-next: 9d18562a2278 ("fq_codel: add batch ability to fq_codel_drop()")
> https://git.kernel.org/davem/net-next/c/9d18562a227
>
> This should fix very high CPU usage in-case fq_codel goes into drop mode.
> The problem is that drop mode was considered rare, and implementation
> wise it was chosen to be more expensive (to save cycles on normal mode).
> Unfortunately is it easy to trigger with an UDP flood. Drop mode is
> especially expensive for smaller devices, as it scans a 4K big array,
> thus 64 cache misses for small devices!
>
> The fix is to allow drop-mode to bulk-drop more packets when entering
> drop-mode (default 64 bulk drop). That way we don't suddenly
> experience a significantly higher processing cost per packet, but
> instead can amortize this.
I haven't done the above cherry-pick patch & backport patch creation for
4.4/4.1/3.18 yet - maybe if $dayjob permits time and no one else beats
me to it :-)
Kevin
* Re: [PATCH net-next] fq_codel: add memory limitation per queue
2016-05-06 15:55 ` [PATCH net-next] fq_codel: add memory limitation per queue Eric Dumazet
@ 2016-05-09 3:49 ` David Miller
2016-05-09 4:14 ` Cong Wang
2016-05-16 1:16 ` [PATCH net-next] fq_codel: fix memory limitation drift Eric Dumazet
2 siblings, 0 replies; 30+ messages in thread
From: David Miller @ 2016-05-09 3:49 UTC (permalink / raw)
To: eric.dumazet; +Cc: brouer, dave.taht, netdev, moeller0
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 06 May 2016 08:55:12 -0700
> From: Eric Dumazet <edumazet@google.com>
>
> On small embedded routers, one wants to control maximal amount of
> memory used by fq_codel, instead of controlling number of packets or
> bytes, since GRO/TSO make these not practical.
>
> Assuming skb->truesize is accurate, we have to keep track of
> skb->truesize sum for skbs in queue.
>
> This patch adds a new TCA_FQ_CODEL_MEMORY_LIMIT attribute.
>
> I chose a default value of 32 MBytes, which looks reasonable even
> for heavy duty usages. (Prior fq_codel users should not be hurt
> when they upgrade their kernels)
>
> Two fields are added to tc_fq_codel_qd_stats to report :
> - Current memory usage
> - Number of drops caused by memory limits
>
> # tc qd replace dev eth1 root est 1sec 4sec fq_codel memory_limit 4M
> ..
> # tc -s -d qd sh dev eth1
> qdisc fq_codel 8008: root refcnt 257 limit 10240p flows 1024
> quantum 1514 target 5.0ms interval 100.0ms memory_limit 4Mb ecn
> Sent 2083566791363 bytes 1376214889 pkt (dropped 4994406, overlimits 0
> requeues 21705223)
> rate 9841Mbit 812549pps backlog 3906120b 376p requeues 21705223
> maxpacket 68130 drop_overlimit 4994406 new_flow_count 28855414
> ecn_mark 0 memory_used 4190048 drop_overmemory 4994406
> new_flows_len 1 old_flows_len 177
>
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
Applied, thanks Eric.
* Re: [PATCH net-next] fq_codel: add memory limitation per queue
2016-05-06 15:55 ` [PATCH net-next] fq_codel: add memory limitation per queue Eric Dumazet
2016-05-09 3:49 ` David Miller
@ 2016-05-09 4:14 ` Cong Wang
2016-05-09 4:31 ` Eric Dumazet
2016-05-16 1:16 ` [PATCH net-next] fq_codel: fix memory limitation drift Eric Dumazet
2 siblings, 1 reply; 30+ messages in thread
From: Cong Wang @ 2016-05-09 4:14 UTC (permalink / raw)
To: Eric Dumazet
Cc: David Miller, Jesper Dangaard Brouer, Dave Täht, netdev,
moeller0
On Fri, May 6, 2016 at 8:55 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> @@ -193,6 +199,7 @@ static int fq_codel_enqueue(struct sk_buff *skb, struct Qdisc *sch)
> unsigned int idx, prev_backlog, prev_qlen;
> struct fq_codel_flow *flow;
> int uninitialized_var(ret);
> + bool memory_limited;
>
> idx = fq_codel_classify(skb, sch, &ret);
> if (idx == 0) {
> @@ -215,7 +222,9 @@ static int fq_codel_enqueue(struct sk_buff *skb, struct Qdisc *sch)
> flow->deficit = q->quantum;
> flow->dropped = 0;
> }
> - if (++sch->q.qlen <= sch->limit)
> + q->memory_usage += skb->truesize;
> + memory_limited = q->memory_usage > q->memory_limit;
> + if (++sch->q.qlen <= sch->limit && !memory_limited)
> return NET_XMIT_SUCCESS;
>
> prev_backlog = sch->qstats.backlog;
> @@ -229,7 +238,8 @@ static int fq_codel_enqueue(struct sk_buff *skb, struct Qdisc *sch)
> ret = fq_codel_drop(sch, q->drop_batch_size);
>
> q->drop_overlimit += prev_qlen - sch->q.qlen;
> -
> + if (memory_limited)
> + q->drop_overmemory += prev_qlen - sch->q.qlen;
So when the packet is dropped due to the memory limit being exceeded, should
we return failure in this case? Or did I miss anything?
* Re: [PATCH net-next] fq_codel: add memory limitation per queue
2016-05-09 4:14 ` Cong Wang
@ 2016-05-09 4:31 ` Eric Dumazet
2016-05-09 5:07 ` Cong Wang
0 siblings, 1 reply; 30+ messages in thread
From: Eric Dumazet @ 2016-05-09 4:31 UTC (permalink / raw)
To: Cong Wang
Cc: David Miller, Jesper Dangaard Brouer, Dave Täht, netdev,
moeller0
On Sun, 2016-05-08 at 21:14 -0700, Cong Wang wrote:
> So when the packet is dropped due to memory over limit, should
> we return failure for this case? Or I miss anything?
Same behavior as before.
If we dropped some packets of this flow, we return NET_XMIT_CN.
* Re: [PATCH net-next] fq_codel: add memory limitation per queue
2016-05-09 4:31 ` Eric Dumazet
@ 2016-05-09 5:07 ` Cong Wang
2016-05-09 14:26 ` Eric Dumazet
0 siblings, 1 reply; 30+ messages in thread
From: Cong Wang @ 2016-05-09 5:07 UTC (permalink / raw)
To: Eric Dumazet
Cc: David Miller, Jesper Dangaard Brouer, Dave Täht, netdev,
moeller0
On Sun, May 8, 2016 at 9:31 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Sun, 2016-05-08 at 21:14 -0700, Cong Wang wrote:
>
>> So when the packet is dropped due to memory over limit, should
>> we return failure for this case? Or I miss anything?
>
> Same behavior than before.
>
> If we dropped some packets of this flow, we return NET_XMIT_CN
I think for the limited memory case, the upper layer is supposed
to stop sending more packets when hitting the limit.
* Re: [PATCH net-next] fq_codel: add memory limitation per queue
2016-05-09 5:07 ` Cong Wang
@ 2016-05-09 14:26 ` Eric Dumazet
2016-05-10 4:34 ` Cong Wang
0 siblings, 1 reply; 30+ messages in thread
From: Eric Dumazet @ 2016-05-09 14:26 UTC (permalink / raw)
To: Cong Wang
Cc: David Miller, Jesper Dangaard Brouer, Dave Täht, netdev,
moeller0
On Sun, 2016-05-08 at 22:07 -0700, Cong Wang wrote:
> On Sun, May 8, 2016 at 9:31 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > On Sun, 2016-05-08 at 21:14 -0700, Cong Wang wrote:
> >
> >> So when the packet is dropped due to memory over limit, should
> >> we return failure for this case? Or I miss anything?
> >
> > Same behavior than before.
> >
> > If we dropped some packets of this flow, we return NET_XMIT_CN
>
> I think for the limited memory case, the upper layer is supposed
> to stop sending more packets when hitting the limit.
They do. NET_XMIT_CN, for example, aborts IP fragmentation.
TCP flows will also instantly react.
* Re: [PATCH net-next] fq_codel: add memory limitation per queue
2016-05-09 14:26 ` Eric Dumazet
@ 2016-05-10 4:34 ` Cong Wang
2016-05-10 4:45 ` Eric Dumazet
0 siblings, 1 reply; 30+ messages in thread
From: Cong Wang @ 2016-05-10 4:34 UTC (permalink / raw)
To: Eric Dumazet
Cc: David Miller, Jesper Dangaard Brouer, Dave Täht, netdev,
moeller0
On Mon, May 9, 2016 at 7:26 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Sun, 2016-05-08 at 22:07 -0700, Cong Wang wrote:
>> On Sun, May 8, 2016 at 9:31 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> > On Sun, 2016-05-08 at 21:14 -0700, Cong Wang wrote:
>> >
>> >> So when the packet is dropped due to memory over limit, should
>> >> we return failure for this case? Or I miss anything?
>> >
>> > Same behavior than before.
>> >
>> > If we dropped some packets of this flow, we return NET_XMIT_CN
>>
>> I think for the limited memory case, the upper layer is supposed
>> to stop sending more packets when hitting the limit.
>
> They doe. NET_XMIT_CN for example aborts IP fragmentation.
>
> TCP flows will also instantly react.
But not for the NET_XMIT_SUCCESS case:
return ret == idx ? NET_XMIT_CN : NET_XMIT_SUCCESS;
* Re: [PATCH net-next] fq_codel: add memory limitation per queue
2016-05-10 4:34 ` Cong Wang
@ 2016-05-10 4:45 ` Eric Dumazet
2016-05-10 4:57 ` Cong Wang
0 siblings, 1 reply; 30+ messages in thread
From: Eric Dumazet @ 2016-05-10 4:45 UTC (permalink / raw)
To: Cong Wang
Cc: David Miller, Jesper Dangaard Brouer, Dave Täht, netdev,
moeller0
On Mon, 2016-05-09 at 21:34 -0700, Cong Wang wrote:
> On Mon, May 9, 2016 at 7:26 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > On Sun, 2016-05-08 at 22:07 -0700, Cong Wang wrote:
> >> On Sun, May 8, 2016 at 9:31 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >> > On Sun, 2016-05-08 at 21:14 -0700, Cong Wang wrote:
> >> >
> >> >> So when the packet is dropped due to memory over limit, should
> >> >> we return failure for this case? Or I miss anything?
> >> >
> >> > Same behavior than before.
> >> >
> >> > If we dropped some packets of this flow, we return NET_XMIT_CN
> >>
> >> I think for the limited memory case, the upper layer is supposed
> >> to stop sending more packets when hitting the limit.
> >
> > They doe. NET_XMIT_CN for example aborts IP fragmentation.
> >
> > TCP flows will also instantly react.
>
> But not for the NET_XMIT_SUCCESS case:
>
> return ret == idx ? NET_XMIT_CN : NET_XMIT_SUCCESS;
I believe you missed the whole point of FQ (SFQ, FQ_CODEL, FQ, ...).
If we dropped a packet of another flow because that other flow is an
elephant, why should we notify the mouse that we shot an elephant?
We return NET_XMIT_SUCCESS because we properly queued this packet for
this flow. This is absolutely right.
If you do not like fq, just use pfifo, and yes, you'll kill the mice.
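In terms of the enqueue code quoted above, the tail of fq_codel_enqueue()
is roughly:

  /* ret = flow index fq_codel_drop() dropped from (the fattest flow),
   * idx = flow index this skb was just queued on
   */
  return ret == idx ? NET_XMIT_CN       /* we dropped from our own flow   */
                    : NET_XMIT_SUCCESS; /* another flow paid; we got queued */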
* Re: [PATCH net-next] fq_codel: add memory limitation per queue
2016-05-10 4:45 ` Eric Dumazet
@ 2016-05-10 4:57 ` Cong Wang
2016-05-10 5:10 ` Eric Dumazet
0 siblings, 1 reply; 30+ messages in thread
From: Cong Wang @ 2016-05-10 4:57 UTC (permalink / raw)
To: Eric Dumazet
Cc: David Miller, Jesper Dangaard Brouer, Dave Täht, netdev,
moeller0
On Mon, May 9, 2016 at 9:45 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Mon, 2016-05-09 at 21:34 -0700, Cong Wang wrote:
>> On Mon, May 9, 2016 at 7:26 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> > On Sun, 2016-05-08 at 22:07 -0700, Cong Wang wrote:
>> >> On Sun, May 8, 2016 at 9:31 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> >> > On Sun, 2016-05-08 at 21:14 -0700, Cong Wang wrote:
>> >> >
>> >> >> So when the packet is dropped due to memory over limit, should
>> >> >> we return failure for this case? Or I miss anything?
>> >> >
>> >> > Same behavior than before.
>> >> >
>> >> > If we dropped some packets of this flow, we return NET_XMIT_CN
>> >>
>> >> I think for the limited memory case, the upper layer is supposed
>> >> to stop sending more packets when hitting the limit.
>> >
>> > They doe. NET_XMIT_CN for example aborts IP fragmentation.
>> >
>> > TCP flows will also instantly react.
>>
>> But not for the NET_XMIT_SUCCESS case:
>>
>> return ret == idx ? NET_XMIT_CN : NET_XMIT_SUCCESS;
>
>
> I believe you missed whole point of FQ (SFQ, FQ_CODEL, FQ, ...)
>
> If we dropped a packet of another flow because this other flow is an
> elephant, why should we notify the mouse that we shot an elephant ?
>
> We return NET_XMIT_SUCCESS because we properly queued this packet for
> this flow. This is absolutely right.
>
Sure, but we are talking about the memory-constrained case, aren't we?
If the whole system is suffering from memory pressure, shouldn't the whole
qdisc be halted?
* Re: [PATCH net-next] fq_codel: add memory limitation per queue
2016-05-10 4:57 ` Cong Wang
@ 2016-05-10 5:10 ` Eric Dumazet
0 siblings, 0 replies; 30+ messages in thread
From: Eric Dumazet @ 2016-05-10 5:10 UTC (permalink / raw)
To: Cong Wang
Cc: David Miller, Jesper Dangaard Brouer, Dave Täht, netdev,
moeller0
On Mon, 2016-05-09 at 21:57 -0700, Cong Wang wrote:
> Sure, but we are talking about memory constraint case, aren't we?
>
> If the whole system are suffering from memory pressure, the whole
> qdisc should be halted?
Please read the patch again.
I added a mem control, exactly to control memory usage in the first
place. If the admin allows this qdisc to consume 4MBytes, then we can
queue up to 4 Mbytes on it.
If we evict packets from _other_ flows because some limit is hit
(be it the number of packets or memory usage), we do not report to the
innocent guy that some packets were dropped.
The innocent guy's packet _is_ queued and _should_ be sent eventually.
Of course, if we could predict the future and know that 456 usec later the
packet would be lost anyway, we would notify the innocent guy right away.
But this is left for future improvement.
* Re: OpenWRT wrong adjustment of fq_codel defaults (Was: fq_codel_drop vs a udp flood)
2016-05-06 19:43 ` Dave Taht
@ 2016-05-15 22:34 ` Roman Yeryomin
2016-05-15 23:07 ` Eric Dumazet
` (2 more replies)
0 siblings, 3 replies; 30+ messages in thread
From: Roman Yeryomin @ 2016-05-15 22:34 UTC (permalink / raw)
To: Dave Taht
Cc: make-wifi-fast, Rafał Miłecki, ath10k,
netdev@vger.kernel.org, codel@lists.bufferbloat.net,
Jonathan Morton, OpenWrt Development List, Felix Fietkau
On 6 May 2016 at 22:43, Dave Taht <dave.taht@gmail.com> wrote:
> On Fri, May 6, 2016 at 11:56 AM, Roman Yeryomin <leroi.lists@gmail.com> wrote:
>> On 6 May 2016 at 21:43, Roman Yeryomin <leroi.lists@gmail.com> wrote:
>>> On 6 May 2016 at 15:47, Jesper Dangaard Brouer <brouer@redhat.com> wrote:
>>>>
>>>> I've created a OpenWRT ticket[1] on this issue, as it seems that someone[2]
>>>> closed Felix'es OpenWRT email account (bad choice! emails bouncing).
>>>> Sounds like OpenWRT and the LEDE https://www.lede-project.org/ project
>>>> is in some kind of conflict.
>>>>
>>>> OpenWRT ticket [1] https://dev.openwrt.org/ticket/22349
>>>>
>>>> [2] http://thread.gmane.org/gmane.comp.embedded.openwrt.devel/40298/focus=40335
>>>
>>> OK, so, after porting the patch to 4.1 openwrt kernel and playing a
>>> bit with fq_codel limits I was able to get 420Mbps UDP like this:
>>> tc qdisc replace dev wlan0 parent :1 fq_codel flows 16 limit 256
>>
>> Forgot to mention, I've reduced drop_batch_size down to 32
>
> 0) Not clear to me if that's the right line, there are 4 wifi queues,
> and the third one
> is the BE queue.
That was an example, sorry, I should have stated that. I've applied the same
settings to all 4 queues.
> That is too low a limit, also, for normal use. And:
> for the purpose of this particular UDP test, flows 16 is ok, but not
> ideal.
I played with different combinations, it doesn't make any
(significant) difference: 20-30Mbps, not more.
What numbers would you propose?
> 1) What's the tcp number (with a simultaneous ping) with this latest patchset?
> (I care about tcp performance a lot more than udp floods - surviving a
> udp flood yes, performance, no)
During the test (both TCP and UDP) it's roughly 5ms on average; when not
running tests, ~2ms. Actually I'm now wondering if target is working at
all, because I had the same result with target 80ms.
So, yes, latency is good, but performance is poor.
> before/after?
>
> tc -s qdisc show dev wlan0 during/after results?
during the test:
qdisc mq 0: root
Sent 1600496000 bytes 1057194 pkt (dropped 1421568, overlimits 0 requeues 17)
backlog 1545794b 1021p requeues 17
qdisc fq_codel 8001: parent :1 limit 1024p flows 16 quantum 1514
target 80.0ms ce_threshold 32us interval 100.0ms ecn
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
new_flows_len 0 old_flows_len 0
qdisc fq_codel 8002: parent :2 limit 1024p flows 16 quantum 1514
target 80.0ms ce_threshold 32us interval 100.0ms ecn
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
new_flows_len 0 old_flows_len 0
qdisc fq_codel 8003: parent :3 limit 1024p flows 16 quantum 1514
target 80.0ms ce_threshold 32us interval 100.0ms ecn
Sent 1601271168 bytes 1057706 pkt (dropped 1422304, overlimits 0 requeues 17)
backlog 1541252b 1018p requeues 17
maxpacket 1514 drop_overlimit 1422304 new_flow_count 35 ecn_mark 0
new_flows_len 0 old_flows_len 1
qdisc fq_codel 8004: parent :4 limit 1024p flows 16 quantum 1514
target 80.0ms ce_threshold 32us interval 100.0ms ecn
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
new_flows_len 0 old_flows_len 0
after the test (60sec):
qdisc mq 0: root
Sent 3084996052 bytes 2037744 pkt (dropped 2770176, overlimits 0 requeues 28)
backlog 0b 0p requeues 28
qdisc fq_codel 8001: parent :1 limit 1024p flows 16 quantum 1514
target 80.0ms ce_threshold 32us interval 100.0ms ecn
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
new_flows_len 0 old_flows_len 0
qdisc fq_codel 8002: parent :2 limit 1024p flows 16 quantum 1514
target 80.0ms ce_threshold 32us interval 100.0ms ecn
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
new_flows_len 0 old_flows_len 0
qdisc fq_codel 8003: parent :3 limit 1024p flows 16 quantum 1514
target 80.0ms ce_threshold 32us interval 100.0ms ecn
Sent 3084996052 bytes 2037744 pkt (dropped 2770176, overlimits 0 requeues 28)
backlog 0b 0p requeues 28
maxpacket 1514 drop_overlimit 2770176 new_flow_count 64 ecn_mark 0
new_flows_len 0 old_flows_len 1
qdisc fq_codel 8004: parent :4 limit 1024p flows 16 quantum 1514
target 80.0ms ce_threshold 32us interval 100.0ms ecn
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
new_flows_len 0 old_flows_len 0
> IF you are doing builds for the archer c7v2, I can join in on this... (?)
I'm not but I have c7 somewhere, so I can do a build for it and also
test, so we are on the same page.
> I did do a test of the ath10k "before", fq_codel *never engaged*, and
> tcp induced latencies under load, e at 100mbit, cracked 600ms, while
> staying flat (20ms) at 100mbit. (not the same patches you are testing)
> on x86. I have got tcp 300Mbit out of an osx box, similar latency,
> have yet to get anything more on anything I currently have
> before/after patchsets.
>
> I'll go add flooding to the tests, I just finished a series comparing
> two different speed stations and life was good on that.
>
> "before" - fq_codel never engages, we see seconds of latency under load.
>
> root@apu2:~# tc -s qdisc show dev wlp4s0
> qdisc mq 0: root
> Sent 8570563893 bytes 6326983 pkt (dropped 0, overlimits 0 requeues 0)
> backlog 0b 0p requeues 0
> qdisc fq_codel 0: parent :1 limit 10240p flows 1024 quantum 1514
> target 5.0ms interval 100.0ms ecn
> Sent 2262 bytes 17 pkt (dropped 0, overlimits 0 requeues 0)
> backlog 0b 0p requeues 0
> maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
> new_flows_len 0 old_flows_len 0
> qdisc fq_codel 0: parent :2 limit 10240p flows 1024 quantum 1514
> target 5.0ms interval 100.0ms ecn
> Sent 220486569 bytes 152058 pkt (dropped 0, overlimits 0 requeues 0)
> backlog 0b 0p requeues 0
> maxpacket 18168 drop_overlimit 0 new_flow_count 1 ecn_mark 0
> new_flows_len 0 old_flows_len 1
> qdisc fq_codel 0: parent :3 limit 10240p flows 1024 quantum 1514
> target 5.0ms interval 100.0ms ecn
> Sent 8340546509 bytes 6163431 pkt (dropped 0, overlimits 0 requeues 0)
> backlog 0b 0p requeues 0
> maxpacket 68130 drop_overlimit 0 new_flow_count 120050 ecn_mark 0
> new_flows_len 1 old_flows_len 3
> qdisc fq_codel 0: parent :4 limit 10240p flows 1024 quantum 1514
> target 5.0ms interval 100.0ms ecn
> Sent 9528553 bytes 11477 pkt (dropped 0, overlimits 0 requeues 0)
> backlog 0b 0p requeues 0
> maxpacket 66 drop_overlimit 0 new_flow_count 1 ecn_mark 0
> new_flows_len 1 old_flows_len 0
> ```
>
>
>>> This is certainly better than 30Mbps but still more than two times
>>> less than before (900).
>
> The number that I still am not sure we got is that you were sending
> 900mbit udp and recieving 900mbit on the prior tests?
900 was the sending rate, from the AP's point of view (the wifi client is downloading)
>>> TCP also improved a little (550 to ~590).
>
> The limit is probably a bit low, also. You might want to try target
> 20ms as well.
I've tried limit up to 1024 and target up to 80ms
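(For reference, a minimal sketch of how those overrides look when applied to each of the four wifi queues, following the tc line used earlier in the thread; the device name and the :1..:4 parent minors are assumed:)
```
# sketch only: same fq_codel override on all 4 mq children
for h in 1 2 3 4; do
    tc qdisc replace dev wlan0 parent :$h fq_codel flows 16 limit 1024 target 20ms
done
```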
>>>
>>> Felix, others, do you want to see the ported patch, maybe I did something wrong?
>>> Doesn't look like it will save ath10k from performance regression.
>
> what was tcp "before"? (I'm sorry, such a long thread)
750Mbps
>>>
>>>>
>>>> On Fri, 6 May 2016 11:42:43 +0200
>>>> Jesper Dangaard Brouer <brouer@redhat.com> wrote:
>>>>
>>>>> Hi Felix,
>>>>>
>>>>> This is an important fix for OpenWRT, please read!
>>>>>
>>>>> OpenWRT changed the default fq_codel sch->limit from 10240 to 1024,
>>>>> without also adjusting q->flows_cnt. Eric explains below that you must
>>>>> also adjust the buckets (q->flows_cnt) for this not to break. (Just
>>>>> adjust it to 128)
>>>>>
>>>>> Problematic OpenWRT commit in question:
>>>>> http://git.openwrt.org/?p=openwrt.git;a=patch;h=12cd6578084e
>>>>> 12cd6578084e ("kernel: revert fq_codel quantum override to prevent it from causing too much cpu load with higher speed (#21326)")
>>>>>
>>>>>
>>>>> I also highly recommend you cherry-pick this very recent commit:
>>>>> net-next: 9d18562a2278 ("fq_codel: add batch ability to fq_codel_drop()")
>>>>> https://git.kernel.org/davem/net-next/c/9d18562a227
>>>>>
>>>>> This should fix very high CPU usage in-case fq_codel goes into drop mode.
>>>>> The problem is that drop mode was considered rare, and implementation
>>>>> wise it was chosen to be more expensive (to save cycles on normal mode).
>>>>> Unfortunately is it easy to trigger with an UDP flood. Drop mode is
>>>>> especially expensive for smaller devices, as it scans a 4K big array,
>>>>> thus 64 cache misses for small devices!
>>>>>
>>>>> The fix is to allow drop-mode to bulk-drop more packets when entering
>>>>> drop-mode (default 64 bulk drop). That way we don't suddenly
>>>>> experience a significantly higher processing cost per packet, but
>>>>> instead can amortize this.
>>>>>
>>>>> To Eric, should we recommend OpenWRT to adjust default (max) 64 bulk
>>>>> drop, given we also recommend bucket size to be 128 ? (thus the amount
>>>>> of memory to scan is less, but their CPU is also much smaller).
>>>>>
>>>>> --Jesper
>>>>>
>>>>>
>>>>> On Thu, 05 May 2016 12:23:27 -0700 Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>>>>
>>>>> > On Thu, 2016-05-05 at 19:25 +0300, Roman Yeryomin wrote:
>>>>> > > On 5 May 2016 at 19:12, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>>>> > > > On Thu, 2016-05-05 at 17:53 +0300, Roman Yeryomin wrote:
>>>>> > > >
>>>>> > > >>
>>>>> > > >> qdisc fq_codel 0: dev eth0 root refcnt 2 limit 1024p flows 1024
>>>>> > > >> quantum 1514 target 5.0ms interval 100.0ms ecn
>>>>> > > >> Sent 12306 bytes 128 pkt (dropped 0, overlimits 0 requeues 0)
>>>>> > > >> backlog 0b 0p requeues 0
>>>>> > > >> maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>>>>> > > >> new_flows_len 0 old_flows_len 0
>>>>> > > >
>>>>> > > >
>>>>> > > > Limit of 1024 packets and 1024 flows is not wise I think.
>>>>> > > >
>>>>> > > > (If all buckets are in use, each bucket has a virtual queue of 1 packet,
>>>>> > > > which is almost the same than having no queue at all)
>>>>> > > >
>>>>> > > > I suggest to have at least 8 packets per bucket, to let Codel have a
>>>>> > > > chance to trigger.
>>>>> > > >
>>>>> > > > So you could either reduce number of buckets to 128 (if memory is
>>>>> > > > tight), or increase limit to 8192.
>>>>> > >
>>>>> > > Will try, but what I've posted is default, I didn't change/configure that.
>>>>> >
>>>>> > fq_codel has a default of 10240 packets and 1024 buckets.
>>>>> >
>>>>> > http://lxr.free-electrons.com/source/net/sched/sch_fq_codel.c#L413
>>>>> >
>>>>> > If someone changed that in the linux variant you use, he probably should
>>>>> > explain the rationale.
>>>>
>>>> --
>>>> Best regards,
>>>> Jesper Dangaard Brouer
>>>> MSc.CS, Principal Kernel Engineer at Red Hat
>>>> Author of http://www.iptv-analyzer.org
>>>> LinkedIn: http://www.linkedin.com/in/brouer
>
>
>
> --
> Dave Täht
> Let's go make home routers and wifi faster! With better software!
> http://blog.cerowrt.org
_______________________________________________
Codel mailing list
Codel@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/codel
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: OpenWRT wrong adjustment of fq_codel defaults (Was: [Codel] fq_codel_drop vs a udp flood)
2016-05-07 9:57 ` OpenWRT wrong adjustment of fq_codel defaults (Was: [Codel] " Kevin Darbyshire-Bryant
@ 2016-05-15 22:47 ` Roman Yeryomin
0 siblings, 0 replies; 30+ messages in thread
From: Roman Yeryomin @ 2016-05-15 22:47 UTC (permalink / raw)
To: Kevin Darbyshire-Bryant
Cc: Jesper Dangaard Brouer, Eric Dumazet, Felix Fietkau, Dave Taht,
make-wifi-fast, Rafał Miłecki, ath10k,
netdev@vger.kernel.org, codel@lists.bufferbloat.net,
Jonathan Morton
On 7 May 2016 at 12:57, Kevin Darbyshire-Bryant
<kevin@darbyshire-bryant.me.uk> wrote:
>
>
> On 06/05/16 10:42, Jesper Dangaard Brouer wrote:
>> Hi Felix,
>>
>> This is an important fix for OpenWRT, please read!
>>
>> OpenWRT changed the default fq_codel sch->limit from 10240 to 1024,
>> without also adjusting q->flows_cnt. Eric explains below that you must
>> also adjust the buckets (q->flows_cnt) for this not to break. (Just
>> adjust it to 128)
>>
>> Problematic OpenWRT commit in question:
>> http://git.openwrt.org/?p=openwrt.git;a=patch;h=12cd6578084e
>> 12cd6578084e ("kernel: revert fq_codel quantum override to prevent it from causing too much cpu load with higher speed (#21326)")
> I 'pull requested' this to the lede-staging tree on github.
> https://github.com/lede-project/staging/pull/11
>
> One way or another Felix & co should see the change :-)
If you would follow the white rabbit, you would see that it doesn't help
>>
>>
>> I also highly recommend you cherry-pick this very recent commit:
>> net-next: 9d18562a2278 ("fq_codel: add batch ability to fq_codel_drop()")
>> https://git.kernel.org/davem/net-next/c/9d18562a227
>>
>> This should fix very high CPU usage in-case fq_codel goes into drop mode.
>> The problem is that drop mode was considered rare, and implementation
>> wise it was chosen to be more expensive (to save cycles on normal mode).
>> Unfortunately is it easy to trigger with an UDP flood. Drop mode is
>> especially expensive for smaller devices, as it scans a 4K big array,
>> thus 64 cache misses for small devices!
>>
>> The fix is to allow drop-mode to bulk-drop more packets when entering
>> drop-mode (default 64 bulk drop). That way we don't suddenly
>> experience a significantly higher processing cost per packet, but
>> instead can amortize this.
> I haven't done the above cherry-pick patch & backport patch creation for
> 4.4/4.1/3.18 yet - maybe if $dayjob permits time and no one else beats
> me to it :-)
>
> Kevin
>
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: OpenWRT wrong adjustment of fq_codel defaults (Was: fq_codel_drop vs a udp flood)
2016-05-15 22:34 ` Roman Yeryomin
@ 2016-05-15 23:07 ` Eric Dumazet
2016-05-15 23:27 ` Roman Yeryomin
2016-05-16 8:12 ` [Make-wifi-fast] " David Lang
2016-05-16 8:14 ` Roman Yeryomin
2 siblings, 1 reply; 30+ messages in thread
From: Eric Dumazet @ 2016-05-15 23:07 UTC (permalink / raw)
To: Roman Yeryomin
Cc: make-wifi-fast, Rafał Miłecki, ath10k,
netdev@vger.kernel.org, codel@lists.bufferbloat.net,
Jonathan Morton, OpenWrt Development List, Felix Fietkau
On Mon, 2016-05-16 at 01:34 +0300, Roman Yeryomin wrote:
> qdisc fq_codel 8003: parent :3 limit 1024p flows 16 quantum 1514
> target 80.0ms ce_threshold 32us interval 100.0ms ecn
> Sent 1601271168 bytes 1057706 pkt (dropped 1422304, overlimits 0 requeues 17)
> backlog 1541252b 1018p requeues 17
> maxpacket 1514 drop_overlimit 1422304 new_flow_count 35 ecn_mark 0
> new_flows_len 0 old_flows_len 1
Why do you have ce_threshold set? You really should not (even if it
does not matter for the kind of traffic you have at this moment).
If your expected link speed is around 1Gbps, or 80,000 packets per
second, then you have to understand that a 1024-packet limit amounts to
about 12 ms at most.
Even if the queue is full, max sojourn time of a packet would be 12 ms.
I really do not see how 'target 80 ms' could be hit.
You basically have FQ, with no Codel effect, but with the associated
cost of Codel (having to take timestamps)
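(A quick back-of-the-envelope check of that 12 ms figure, assuming full-size ~1514-byte frames at 1 Gbps line rate:)
```
awk 'BEGIN {
    pps = 1e9 / (1514 * 8)    # ~82,500 full-size packets per second at 1 Gbps
    printf "%.0f pkt/s -> a 1024-packet queue drains in %.1f ms\n", pps, 1024 / pps * 1000
}'
```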
_______________________________________________
Codel mailing list
Codel@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/codel
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: OpenWRT wrong adjustment of fq_codel defaults (Was: fq_codel_drop vs a udp flood)
2016-05-15 23:07 ` Eric Dumazet
@ 2016-05-15 23:27 ` Roman Yeryomin
0 siblings, 0 replies; 30+ messages in thread
From: Roman Yeryomin @ 2016-05-15 23:27 UTC (permalink / raw)
To: Eric Dumazet
Cc: make-wifi-fast, Rafał Miłecki, ath10k,
netdev@vger.kernel.org, codel@lists.bufferbloat.net,
Jonathan Morton, OpenWrt Development List, Felix Fietkau
On 16 May 2016 at 02:07, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Mon, 2016-05-16 at 01:34 +0300, Roman Yeryomin wrote:
>
>> qdisc fq_codel 8003: parent :3 limit 1024p flows 16 quantum 1514
>> target 80.0ms ce_threshold 32us interval 100.0ms ecn
>> Sent 1601271168 bytes 1057706 pkt (dropped 1422304, overlimits 0 requeues 17)
>> backlog 1541252b 1018p requeues 17
>> maxpacket 1514 drop_overlimit 1422304 new_flow_count 35 ecn_mark 0
>> new_flows_len 0 old_flows_len 1
>
> Why do you have ce_threshold set ? You really should not (even if it
> does not matter for the kind of traffic you have at this moment)
No idea, it was there always. How do I unset it? Setting it to 0 doesn't help.
> If your expected link speed is around 1Gbps, or 80,000 packets per
> second, then you have to understand that 1024 packets limit is about 12
> ms at most.
>
> Even if the queue is full, max sojourn time of a packet would be 12 ms.
>
> I really do not see how 'target 80 ms' could be hit.
Well, as I said, I've tried different options. Neither target 20ms (as
Dave proposed) nor 12ms saves the situation.
> You basically have FQ, with no Codel effect, but with the associated
> cost of Codel (having to take timestamps)
>
>
>
_______________________________________________
Codel mailing list
Codel@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/codel
^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH net-next] fq_codel: fix memory limitation drift
2016-05-06 15:55 ` [PATCH net-next] fq_codel: add memory limitation per queue Eric Dumazet
2016-05-09 3:49 ` David Miller
2016-05-09 4:14 ` Cong Wang
@ 2016-05-16 1:16 ` Eric Dumazet
2016-05-17 1:57 ` David Miller
2 siblings, 1 reply; 30+ messages in thread
From: Eric Dumazet @ 2016-05-16 1:16 UTC (permalink / raw)
To: David Miller; +Cc: Jesper Dangaard Brouer, Dave Täht, netdev, moeller0
From: Eric Dumazet <edumazet@google.com>
memory_usage must be decreased in dequeue_func(), not in
fq_codel_dequeue(), otherwise packets dropped by Codel algo
are missing this decrease.
Also we need to clear memory_usage in fq_codel_reset()
Fixes: 95b58430abe7 ("fq_codel: add memory limitation per queue")
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
net/sched/sch_fq_codel.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c
index bb8bd9314629..6883a8971562 100644
--- a/net/sched/sch_fq_codel.c
+++ b/net/sched/sch_fq_codel.c
@@ -262,6 +262,7 @@ static struct sk_buff *dequeue_func(struct codel_vars *vars, void *ctx)
if (flow->head) {
skb = dequeue_head(flow);
q->backlogs[flow - q->flows] -= qdisc_pkt_len(skb);
+ q->memory_usage -= skb->truesize;
sch->q.qlen--;
sch->qstats.backlog -= qdisc_pkt_len(skb);
}
@@ -318,7 +319,6 @@ begin:
list_del_init(&flow->flowchain);
goto begin;
}
- q->memory_usage -= skb->truesize;
qdisc_bstats_update(sch, skb);
flow->deficit -= qdisc_pkt_len(skb);
/* We cant call qdisc_tree_reduce_backlog() if our qlen is 0,
@@ -355,6 +355,7 @@ static void fq_codel_reset(struct Qdisc *sch)
}
memset(q->backlogs, 0, q->flows_cnt * sizeof(u32));
sch->q.qlen = 0;
+ q->memory_usage = 0;
}
static const struct nla_policy fq_codel_policy[TCA_FQ_CODEL_MAX + 1] = {
^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [Make-wifi-fast] OpenWRT wrong adjustment of fq_codel defaults (Was: fq_codel_drop vs a udp flood)
2016-05-15 22:34 ` Roman Yeryomin
2016-05-15 23:07 ` Eric Dumazet
@ 2016-05-16 8:12 ` David Lang
2016-05-16 8:26 ` [Make-wifi-fast] OpenWRT wrong adjustment of fq_codel defaults (Was: [Codel] " Roman Yeryomin
2016-05-16 8:14 ` Roman Yeryomin
2 siblings, 1 reply; 30+ messages in thread
From: David Lang @ 2016-05-16 8:12 UTC (permalink / raw)
To: Roman Yeryomin
Cc: make-wifi-fast, Rafał Miłecki, ath10k,
codel@lists.bufferbloat.net, netdev@vger.kernel.org,
OpenWrt Development List, Felix Fietkau
On Mon, 16 May 2016, Roman Yeryomin wrote:
> On 6 May 2016 at 22:43, Dave Taht <dave.taht@gmail.com> wrote:
>> On Fri, May 6, 2016 at 11:56 AM, Roman Yeryomin <leroi.lists@gmail.com> wrote:
>>> On 6 May 2016 at 21:43, Roman Yeryomin <leroi.lists@gmail.com> wrote:
>>>> On 6 May 2016 at 15:47, Jesper Dangaard Brouer <brouer@redhat.com> wrote:
>>>>>
>
>> That is too low a limit, also, for normal use. And:
>> for the purpose of this particular UDP test, flows 16 is ok, but not
>> ideal.
>
> I played with different combinations, it doesn't make any
> (significant) difference: 20-30Mbps, not more.
> What numbers would you propose?
How many different flows did you have going at once? I believe that the reason
for higher numbers isn't for throughput, but to allow for more flows to be
isolated from each other. If you have too few buckets, different flows will end
up being combined into one bucket so that one will affect the other more.
David Lang
_______________________________________________
Codel mailing list
Codel@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/codel
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: OpenWRT wrong adjustment of fq_codel defaults (Was: fq_codel_drop vs a udp flood)
2016-05-15 22:34 ` Roman Yeryomin
2016-05-15 23:07 ` Eric Dumazet
2016-05-16 8:12 ` [Make-wifi-fast] " David Lang
@ 2016-05-16 8:14 ` Roman Yeryomin
2016-05-16 14:23 ` [Make-wifi-fast] " Eric Dumazet
2016-05-16 16:04 ` OpenWRT wrong adjustment of fq_codel defaults (Was: [Codel] " Dave Taht
2 siblings, 2 replies; 30+ messages in thread
From: Roman Yeryomin @ 2016-05-16 8:14 UTC (permalink / raw)
To: Rajkumar Manoharan, Michal Kazior
Cc: make-wifi-fast, Rafał Miłecki, ath10k,
netdev@vger.kernel.org, codel@lists.bufferbloat.net,
Jonathan Morton, OpenWrt Development List, Felix Fietkau
On 16 May 2016 at 01:34, Roman Yeryomin <leroi.lists@gmail.com> wrote:
> On 6 May 2016 at 22:43, Dave Taht <dave.taht@gmail.com> wrote:
>> On Fri, May 6, 2016 at 11:56 AM, Roman Yeryomin <leroi.lists@gmail.com> wrote:
>>> On 6 May 2016 at 21:43, Roman Yeryomin <leroi.lists@gmail.com> wrote:
>>>> On 6 May 2016 at 15:47, Jesper Dangaard Brouer <brouer@redhat.com> wrote:
>>>>>
>>>>> I've created a OpenWRT ticket[1] on this issue, as it seems that someone[2]
>>>>> closed Felix'es OpenWRT email account (bad choice! emails bouncing).
>>>>> Sounds like OpenWRT and the LEDE https://www.lede-project.org/ project
>>>>> is in some kind of conflict.
>>>>>
>>>>> OpenWRT ticket [1] https://dev.openwrt.org/ticket/22349
>>>>>
>>>>> [2] http://thread.gmane.org/gmane.comp.embedded.openwrt.devel/40298/focus=40335
>>>>
>>>> OK, so, after porting the patch to 4.1 openwrt kernel and playing a
>>>> bit with fq_codel limits I was able to get 420Mbps UDP like this:
>>>> tc qdisc replace dev wlan0 parent :1 fq_codel flows 16 limit 256
>>>
>>> Forgot to mention, I've reduced drop_batch_size down to 32
>>
>> 0) Not clear to me if that's the right line, there are 4 wifi queues,
>> and the third one
>> is the BE queue.
>
> That was an example, sorry, should have stated that. I've applied same
> settings to all 4 queues.
>
>> That is too low a limit, also, for normal use. And:
>> for the purpose of this particular UDP test, flows 16 is ok, but not
>> ideal.
>
> I played with different combinations, it doesn't make any
> (significant) difference: 20-30Mbps, not more.
> What numbers would you propose?
>
>> 1) What's the tcp number (with a simultaneous ping) with this latest patchset?
>> (I care about tcp performance a lot more than udp floods - surviving a
>> udp flood yes, performance, no)
>
> During the test (both TCP and UDP) it's roughly 5ms in average, not
> running tests ~2ms. Actually I'm now wondering if target is working at
> all, because I had same result with target 80ms..
> So, yes, latency is good, but performance is poor.
>
>> before/after?
>>
>> tc -s qdisc show dev wlan0 during/after results?
>
> during the test:
>
> qdisc mq 0: root
> Sent 1600496000 bytes 1057194 pkt (dropped 1421568, overlimits 0 requeues 17)
> backlog 1545794b 1021p requeues 17
> qdisc fq_codel 8001: parent :1 limit 1024p flows 16 quantum 1514
> target 80.0ms ce_threshold 32us interval 100.0ms ecn
> Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
> backlog 0b 0p requeues 0
> maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
> new_flows_len 0 old_flows_len 0
> qdisc fq_codel 8002: parent :2 limit 1024p flows 16 quantum 1514
> target 80.0ms ce_threshold 32us interval 100.0ms ecn
> Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
> backlog 0b 0p requeues 0
> maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
> new_flows_len 0 old_flows_len 0
> qdisc fq_codel 8003: parent :3 limit 1024p flows 16 quantum 1514
> target 80.0ms ce_threshold 32us interval 100.0ms ecn
> Sent 1601271168 bytes 1057706 pkt (dropped 1422304, overlimits 0 requeues 17)
> backlog 1541252b 1018p requeues 17
> maxpacket 1514 drop_overlimit 1422304 new_flow_count 35 ecn_mark 0
> new_flows_len 0 old_flows_len 1
> qdisc fq_codel 8004: parent :4 limit 1024p flows 16 quantum 1514
> target 80.0ms ce_threshold 32us interval 100.0ms ecn
> Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
> backlog 0b 0p requeues 0
> maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
> new_flows_len 0 old_flows_len 0
>
>
> after the test (60sec):
>
> qdisc mq 0: root
> Sent 3084996052 bytes 2037744 pkt (dropped 2770176, overlimits 0 requeues 28)
> backlog 0b 0p requeues 28
> qdisc fq_codel 8001: parent :1 limit 1024p flows 16 quantum 1514
> target 80.0ms ce_threshold 32us interval 100.0ms ecn
> Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
> backlog 0b 0p requeues 0
> maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
> new_flows_len 0 old_flows_len 0
> qdisc fq_codel 8002: parent :2 limit 1024p flows 16 quantum 1514
> target 80.0ms ce_threshold 32us interval 100.0ms ecn
> Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
> backlog 0b 0p requeues 0
> maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
> new_flows_len 0 old_flows_len 0
> qdisc fq_codel 8003: parent :3 limit 1024p flows 16 quantum 1514
> target 80.0ms ce_threshold 32us interval 100.0ms ecn
> Sent 3084996052 bytes 2037744 pkt (dropped 2770176, overlimits 0 requeues 28)
> backlog 0b 0p requeues 28
> maxpacket 1514 drop_overlimit 2770176 new_flow_count 64 ecn_mark 0
> new_flows_len 0 old_flows_len 1
> qdisc fq_codel 8004: parent :4 limit 1024p flows 16 quantum 1514
> target 80.0ms ce_threshold 32us interval 100.0ms ecn
> Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
> backlog 0b 0p requeues 0
> maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
> new_flows_len 0 old_flows_len 0
>
>
>> IF you are doing builds for the archer c7v2, I can join in on this... (?)
>
> I'm not but I have c7 somewhere, so I can do a build for it and also
> test, so we are on the same page.
>
>> I did do a test of the ath10k "before", fq_codel *never engaged*, and
>> tcp induced latencies under load, e at 100mbit, cracked 600ms, while
>> staying flat (20ms) at 100mbit. (not the same patches you are testing)
>> on x86. I have got tcp 300Mbit out of an osx box, similar latency,
>> have yet to get anything more on anything I currently have
>> before/after patchsets.
>>
>> I'll go add flooding to the tests, I just finished a series comparing
>> two different speed stations and life was good on that.
>>
>> "before" - fq_codel never engages, we see seconds of latency under load.
>>
>> root@apu2:~# tc -s qdisc show dev wlp4s0
>> qdisc mq 0: root
>> Sent 8570563893 bytes 6326983 pkt (dropped 0, overlimits 0 requeues 0)
>> backlog 0b 0p requeues 0
>> qdisc fq_codel 0: parent :1 limit 10240p flows 1024 quantum 1514
>> target 5.0ms interval 100.0ms ecn
>> Sent 2262 bytes 17 pkt (dropped 0, overlimits 0 requeues 0)
>> backlog 0b 0p requeues 0
>> maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>> new_flows_len 0 old_flows_len 0
>> qdisc fq_codel 0: parent :2 limit 10240p flows 1024 quantum 1514
>> target 5.0ms interval 100.0ms ecn
>> Sent 220486569 bytes 152058 pkt (dropped 0, overlimits 0 requeues 0)
>> backlog 0b 0p requeues 0
>> maxpacket 18168 drop_overlimit 0 new_flow_count 1 ecn_mark 0
>> new_flows_len 0 old_flows_len 1
>> qdisc fq_codel 0: parent :3 limit 10240p flows 1024 quantum 1514
>> target 5.0ms interval 100.0ms ecn
>> Sent 8340546509 bytes 6163431 pkt (dropped 0, overlimits 0 requeues 0)
>> backlog 0b 0p requeues 0
>> maxpacket 68130 drop_overlimit 0 new_flow_count 120050 ecn_mark 0
>> new_flows_len 1 old_flows_len 3
>> qdisc fq_codel 0: parent :4 limit 10240p flows 1024 quantum 1514
>> target 5.0ms interval 100.0ms ecn
>> Sent 9528553 bytes 11477 pkt (dropped 0, overlimits 0 requeues 0)
>> backlog 0b 0p requeues 0
>> maxpacket 66 drop_overlimit 0 new_flow_count 1 ecn_mark 0
>> new_flows_len 1 old_flows_len 0
>> ```
>>
>>
>>>> This is certainly better than 30Mbps but still more than two times
>>>> less than before (900).
>>
>> The number that I still am not sure we got is that you were sending
>> 900mbit udp and recieving 900mbit on the prior tests?
>
> 900 was sending, AP POV (wifi client is downloading)
>
>>>> TCP also improved a little (550 to ~590).
>>
>> The limit is probably a bit low, also. You might want to try target
>> 20ms as well.
>
> I've tried limit up to 1024 and target up to 80ms
>
>>>>
>>>> Felix, others, do you want to see the ported patch, maybe I did something wrong?
>>>> Doesn't look like it will save ath10k from performance regression.
>>
>> what was tcp "before"? (I'm sorry, such a long thread)
>
> 750Mbps
Michal, after retesting with your patch (sorry, it was late yesterday,
confused compat-wireless archives) I saw the difference.
So the progress looks like this (all with fq_codel flows 16 limit 1024
target 20ms):
no patches: 380Mbps UDP, 550 TCP
Eric's (fq_codel drop) patch: 420Mbps UDP, 590 TCP (+40Mbps), latency
5-6ms during test
Michal's (improve tx scheduling) patch: 580Mbps UDP, 660 TCP, latency
up to 30-40ms during test
after Rajkumar's proposal to "try without registering wake_tx_queue
callback": 820Mbps UDP, 690 TCP.
So, very close to "as before": 900Mbps UDP, 750 TCP.
But still, I was expecting performance improvements from the latest ath10k
code, not regressions.
I know that hw is capable of 800Mbps TCP, which I'm targeting.
Regards,
Roman
p.s. sorry for confusion
_______________________________________________
Codel mailing list
Codel@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/codel
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Make-wifi-fast] OpenWRT wrong adjustment of fq_codel defaults (Was: [Codel] fq_codel_drop vs a udp flood)
2016-05-16 8:12 ` [Make-wifi-fast] " David Lang
@ 2016-05-16 8:26 ` Roman Yeryomin
2016-05-16 8:46 ` [Make-wifi-fast] OpenWRT wrong adjustment of fq_codel defaults (Was: " David Lang
0 siblings, 1 reply; 30+ messages in thread
From: Roman Yeryomin @ 2016-05-16 8:26 UTC (permalink / raw)
To: David Lang
Cc: make-wifi-fast, Dave Taht, ath10k, codel@lists.bufferbloat.net,
netdev@vger.kernel.org, OpenWrt Development List
On 16 May 2016 at 11:12, David Lang <david@lang.hm> wrote:
> On Mon, 16 May 2016, Roman Yeryomin wrote:
>
>> On 6 May 2016 at 22:43, Dave Taht <dave.taht@gmail.com> wrote:
>>>
>>> On Fri, May 6, 2016 at 11:56 AM, Roman Yeryomin <leroi.lists@gmail.com>
>>> wrote:
>>>>
>>>> On 6 May 2016 at 21:43, Roman Yeryomin <leroi.lists@gmail.com> wrote:
>>>>>
>>>>> On 6 May 2016 at 15:47, Jesper Dangaard Brouer <brouer@redhat.com>
>>>>> wrote:
>>>>>>
>>>>>>
>>
>>> That is too low a limit, also, for normal use. And:
>>> for the purpose of this particular UDP test, flows 16 is ok, but not
>>> ideal.
>>
>>
>> I played with different combinations, it doesn't make any
>> (significant) difference: 20-30Mbps, not more.
>> What numbers would you propose?
>
>
> How many different flows did you have going at once? I believe that the
> reason for higher numbers isn't for throughput, but to allow for more flows
> to be isolated from each other. If you have too few buckets, different flows
> will end up being combined into one bucket so that one will affect the other
> more.
I'm testing with one flow; I never saw higher throughput with more
flows (e.g. -P8 with iperf3).
Regards,
Roman
_______________________________________________
openwrt-devel mailing list
openwrt-devel@lists.openwrt.org
https://lists.openwrt.org/cgi-bin/mailman/listinfo/openwrt-devel
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Make-wifi-fast] OpenWRT wrong adjustment of fq_codel defaults (Was: fq_codel_drop vs a udp flood)
2016-05-16 8:26 ` [Make-wifi-fast] OpenWRT wrong adjustment of fq_codel defaults (Was: [Codel] " Roman Yeryomin
@ 2016-05-16 8:46 ` David Lang
2016-05-16 10:34 ` [OpenWrt-Devel] " Sebastian Moeller
0 siblings, 1 reply; 30+ messages in thread
From: David Lang @ 2016-05-16 8:46 UTC (permalink / raw)
To: Roman Yeryomin
Cc: make-wifi-fast, Rafał Miłecki, ath10k,
codel@lists.bufferbloat.net, netdev@vger.kernel.org,
OpenWrt Development List, Felix Fietkau
On Mon, 16 May 2016, Roman Yeryomin wrote:
> On 16 May 2016 at 11:12, David Lang <david@lang.hm> wrote:
>> On Mon, 16 May 2016, Roman Yeryomin wrote:
>>
>>> On 6 May 2016 at 22:43, Dave Taht <dave.taht@gmail.com> wrote:
>>>>
>>>> On Fri, May 6, 2016 at 11:56 AM, Roman Yeryomin <leroi.lists@gmail.com>
>>>> wrote:
>>>>>
>>>>> On 6 May 2016 at 21:43, Roman Yeryomin <leroi.lists@gmail.com> wrote:
>>>>>>
>>>>>> On 6 May 2016 at 15:47, Jesper Dangaard Brouer <brouer@redhat.com>
>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>
>>>> That is too low a limit, also, for normal use. And:
>>>> for the purpose of this particular UDP test, flows 16 is ok, but not
>>>> ideal.
>>>
>>>
>>> I played with different combinations, it doesn't make any
>>> (significant) difference: 20-30Mbps, not more.
>>> What numbers would you propose?
>>
>>
>> How many different flows did you have going at once? I believe that the
>> reason for higher numbers isn't for throughput, but to allow for more flows
>> to be isolated from each other. If you have too few buckets, different flows
>> will end up being combined into one bucket so that one will affect the other
>> more.
>
> I'm testing with one flow, I never saw bigger performance with more
> flows (e.g. -P8 to iperf3).
The issue isn't performance, it's isolating a DNS request from a VoIP flow
from a streaming video flow from a DVD image download.
The question is how many buckets you need to isolate these in
practice. It depends on how many flows you have. The default was 1024 buckets, but
got changed to 128 for low memory devices, and that lower value got made into
the default, even for devices with lots of memory.
I'm wondering: instead of trying to size this based on device memory, could it
be resized on the fly and grow if too many flows/collisions are detected?
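(To put rough numbers on how often flows end up sharing a bucket, here is a small sketch assuming flows hash uniformly into the buckets; the 50 concurrent flows are just an illustrative assumption:)
```
awk 'BEGIN {
    k = 50                                # assumed number of concurrent flows
    split("16 128 1024", ms)              # bucket counts to compare ("flows" parameter)
    for (i = 1; i <= 3; i++) {
        m = ms[i]
        occupied = m * (1 - (1 - 1/m)^k)  # expected number of distinct buckets hit
        printf "flows %4d: ~%.1f of %d flows land in an already-used bucket\n", m, k - occupied, k
    }
}'
```
(With these assumptions, roughly 35 of 50 flows collide at 16 buckets, ~8 at 128, and ~1 at 1024.)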
David Lang
_______________________________________________
Codel mailing list
Codel@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/codel
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [OpenWrt-Devel] [Make-wifi-fast] OpenWRT wrong adjustment of fq_codel defaults (Was: fq_codel_drop vs a udp flood)
2016-05-16 8:46 ` [Make-wifi-fast] OpenWRT wrong adjustment of fq_codel defaults (Was: " David Lang
@ 2016-05-16 10:34 ` Sebastian Moeller
0 siblings, 0 replies; 30+ messages in thread
From: Sebastian Moeller @ 2016-05-16 10:34 UTC (permalink / raw)
To: David Lang, Roman Yeryomin
Cc: make-wifi-fast, ath10k, codel@lists.bufferbloat.net,
netdev@vger.kernel.org, OpenWrt Development List
Hi David,
On May 16, 2016 10:46:25 AM GMT+02:00, David Lang <david@lang.hm> wrote:
>On Mon, 16 May 2016, Roman Yeryomin wrote:
>
>> On 16 May 2016 at 11:12, David Lang <david@lang.hm> wrote:
>>> On Mon, 16 May 2016, Roman Yeryomin wrote:
>>>
>>>> On 6 May 2016 at 22:43, Dave Taht <dave.taht@gmail.com> wrote:
>>>>>
>>>>> On Fri, May 6, 2016 at 11:56 AM, Roman Yeryomin
><leroi.lists@gmail.com>
>>>>> wrote:
>>>>>>
>>>>>> On 6 May 2016 at 21:43, Roman Yeryomin <leroi.lists@gmail.com>
>wrote:
>>>>>>>
>>>>>>> On 6 May 2016 at 15:47, Jesper Dangaard Brouer
><brouer@redhat.com>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>>
>>>>
>>>>> That is too low a limit, also, for normal use. And:
>>>>> for the purpose of this particular UDP test, flows 16 is ok, but
>not
>>>>> ideal.
>>>>
>>>>
>>>> I played with different combinations, it doesn't make any
>>>> (significant) difference: 20-30Mbps, not more.
>>>> What numbers would you propose?
>>>
>>>
>>> How many different flows did you have going at once? I believe that
>the
>>> reason for higher numbers isn't for throughput, but to allow for
>more flows
>>> to be isolated from each other. If you have too few buckets,
>different flows
>>> will end up being combined into one bucket so that one will affect
>the other
>>> more.
>>
>> I'm testing with one flow, I never saw bigger performance with more
>> flows (e.g. -P8 to iperf3).
>
>The issue isn't performance, it's isolating a DNS request from a VoIP
>flow
>from a streaming video flow from a DVD image download.
>
>The question is how many buckets do you need to have to isolate these
>in
>practice? it depends how many flows you have. The default was 1024
>buckets, but
>got changed to 128 for low memory devices, and that lower value got
>made into
>the default, even for devices with lots of memory.
And I believe that the reduction was suboptimal; we need the hash buckets to spread the flows around to avoid shared fate due to shared buckets... So the 1024 flows make a lot of sense even if the number of real concurrent flows is lower (think birthday paradox).
The change came because at full saturation our reduced packet limit only allowed one packet per bucket, which is too low for decent performance... also, fewer hash buckets make searching faster.
Since we now can specify a memory limit in addition to the packet limit, we should set the packet limit back to its default of 10240 and instead set the memory limit to something sane for each platform. This will effectively have the same consequences as setting a packet limit, except it becomes clearer why performance degrades, and I at least would gladly take a performance hit over a forced OOM reboot...
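(A rough sketch of what that could look like, assuming a tc and kernel new enough to expose the memory_limit knob from Eric's patch; the 4Mb value is only an illustrative guess for a small router, not a tested recommendation:)
```
# sketch: keep the upstream packet-limit default, cap memory per queue instead
tc qdisc replace dev wlan0 parent :1 fq_codel limit 10240 flows 1024 memory_limit 4mb
```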
>
>I'm wondering if instead of trying to size this based on device memory,
>can it
>be resizable on the fly and grow if too many flows/collisions are
>detected?
>
>David Lang
>_______________________________________________
>openwrt-devel mailing list
>openwrt-devel@lists.openwrt.org
>https://lists.openwrt.org/cgi-bin/mailman/listinfo/openwrt-devel
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
_______________________________________________
Codel mailing list
Codel@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/codel
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Make-wifi-fast] OpenWRT wrong adjustment of fq_codel defaults (Was: fq_codel_drop vs a udp flood)
2016-05-16 8:14 ` Roman Yeryomin
@ 2016-05-16 14:23 ` Eric Dumazet
2016-05-16 16:04 ` OpenWRT wrong adjustment of fq_codel defaults (Was: [Codel] " Dave Taht
1 sibling, 0 replies; 30+ messages in thread
From: Eric Dumazet @ 2016-05-16 14:23 UTC (permalink / raw)
To: Roman Yeryomin
Cc: Rajkumar Manoharan, make-wifi-fast, Rafał Miłecki,
ath10k, codel@lists.bufferbloat.net, netdev@vger.kernel.org,
OpenWrt Development List, Felix Fietkau
On Mon, 2016-05-16 at 11:14 +0300, Roman Yeryomin wrote:
> So, very close to "as before": 900Mbps UDP, 750 TCP.
> But still, I was expecting performance improvements from latest ath10k
> code, not regressions.
> I know that hw is capable of 800Mbps TCP, which I'm targeting.
One flow can reach 800Mbps.
To get this, a simple pfifo is enough.
But _if_ you also want to get decent results with hundreds of flows
under stress, you need something else, and I do not see how 'something'
else would come for free.
You will see some 'regressions' because of additional cpu costs, unless
you have enough cpu cycles and KB of memory to burn for free.
If your goal is to get max throughput on a single TCP flow, in a clean
env and on cheap hardware, you absolutely should stick to pfifo. Nothing
could beat pfifo (well, pfifo could be improved using lockless
implementation, but that would matter if you have different cpus
queueing and dequeueing packets)
But I guess your issues mostly come from too-small packet limits, or
too-big TCP windows.
Basically, if you test a single TCP flow, fq_codel should behave like a
pfifo, unless maybe your kernel has a very slow ktime_get_ns()
implementation [1]
If you set a limit of 1024 packets on pfifo, you'll have the same amount
of drops and lower TCP throughput.
[1] We probably should have a self-test to have an estimation of
ktime_get_ns() cost
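(For anyone wanting to reproduce that comparison, a minimal sketch — same device and the same per-queue packet cap, pfifo first, then fq_codel; parent :3 is assumed to be the BE queue as earlier in the thread:)
```
# sketch: same packet cap, pfifo vs fq_codel on the BE queue
tc qdisc replace dev wlan0 parent :3 pfifo limit 1024
# ... run the single-flow TCP test, note throughput and drop counts ...
tc -s qdisc show dev wlan0

tc qdisc replace dev wlan0 parent :3 fq_codel limit 1024 flows 1024
# ... repeat the same test with the same load ...
tc -s qdisc show dev wlan0
```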
_______________________________________________
Codel mailing list
Codel@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/codel
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: OpenWRT wrong adjustment of fq_codel defaults (Was: [Codel] fq_codel_drop vs a udp flood)
2016-05-16 8:14 ` Roman Yeryomin
2016-05-16 14:23 ` [Make-wifi-fast] " Eric Dumazet
@ 2016-05-16 16:04 ` Dave Taht
2016-05-16 19:46 ` OpenWRT wrong adjustment of fq_codel defaults (Was: " Roman Yeryomin
1 sibling, 1 reply; 30+ messages in thread
From: Dave Taht @ 2016-05-16 16:04 UTC (permalink / raw)
To: Roman Yeryomin
Cc: Rajkumar Manoharan, Michal Kazior, Jesper Dangaard Brouer,
Felix Fietkau, Jonathan Morton, codel@lists.bufferbloat.net,
ath10k, make-wifi-fast, Rafał Miłecki,
netdev@vger.kernel.org, OpenWrt Development List
On Mon, May 16, 2016 at 1:14 AM, Roman Yeryomin <leroi.lists@gmail.com> wrote:
> On 16 May 2016 at 01:34, Roman Yeryomin <leroi.lists@gmail.com> wrote:
>> On 6 May 2016 at 22:43, Dave Taht <dave.taht@gmail.com> wrote:
>>> On Fri, May 6, 2016 at 11:56 AM, Roman Yeryomin <leroi.lists@gmail.com> wrote:
>>>> On 6 May 2016 at 21:43, Roman Yeryomin <leroi.lists@gmail.com> wrote:
>>>>> On 6 May 2016 at 15:47, Jesper Dangaard Brouer <brouer@redhat.com> wrote:
>>>>>>
>>>>>> I've created a OpenWRT ticket[1] on this issue, as it seems that someone[2]
>>>>>> closed Felix'es OpenWRT email account (bad choice! emails bouncing).
>>>>>> Sounds like OpenWRT and the LEDE https://www.lede-project.org/ project
>>>>>> is in some kind of conflict.
>>>>>>
>>>>>> OpenWRT ticket [1] https://dev.openwrt.org/ticket/22349
>>>>>>
>>>>>> [2] http://thread.gmane.org/gmane.comp.embedded.openwrt.devel/40298/focus=40335
>>>>>
>>>>> OK, so, after porting the patch to 4.1 openwrt kernel and playing a
>>>>> bit with fq_codel limits I was able to get 420Mbps UDP like this:
>>>>> tc qdisc replace dev wlan0 parent :1 fq_codel flows 16 limit 256
>>>>
>>>> Forgot to mention, I've reduced drop_batch_size down to 32
>>>
>>> 0) Not clear to me if that's the right line, there are 4 wifi queues,
>>> and the third one
>>> is the BE queue.
>>
>> That was an example, sorry, should have stated that. I've applied same
>> settings to all 4 queues.
>>
>>> That is too low a limit, also, for normal use. And:
>>> for the purpose of this particular UDP test, flows 16 is ok, but not
>>> ideal.
>>
>> I played with different combinations, it doesn't make any
>> (significant) difference: 20-30Mbps, not more.
>> What numbers would you propose?
>>
>>> 1) What's the tcp number (with a simultaneous ping) with this latest patchset?
>>> (I care about tcp performance a lot more than udp floods - surviving a
>>> udp flood yes, performance, no)
>>
>> During the test (both TCP and UDP) it's roughly 5ms in average, not
>> running tests ~2ms. Actually I'm now wondering if target is working at
>> all, because I had same result with target 80ms..
>> So, yes, latency is good, but performance is poor.
>>
>>> before/after?
>>>
>>> tc -s qdisc show dev wlan0 during/after results?
>>
>> during the test:
>>
>> qdisc mq 0: root
>> Sent 1600496000 bytes 1057194 pkt (dropped 1421568, overlimits 0 requeues 17)
>> backlog 1545794b 1021p requeues 17
>> qdisc fq_codel 8001: parent :1 limit 1024p flows 16 quantum 1514
>> target 80.0ms ce_threshold 32us interval 100.0ms ecn
>> Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>> backlog 0b 0p requeues 0
>> maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>> new_flows_len 0 old_flows_len 0
>> qdisc fq_codel 8002: parent :2 limit 1024p flows 16 quantum 1514
>> target 80.0ms ce_threshold 32us interval 100.0ms ecn
>> Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>> backlog 0b 0p requeues 0
>> maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>> new_flows_len 0 old_flows_len 0
>> qdisc fq_codel 8003: parent :3 limit 1024p flows 16 quantum 1514
>> target 80.0ms ce_threshold 32us interval 100.0ms ecn
>> Sent 1601271168 bytes 1057706 pkt (dropped 1422304, overlimits 0 requeues 17)
>> backlog 1541252b 1018p requeues 17
>> maxpacket 1514 drop_overlimit 1422304 new_flow_count 35 ecn_mark 0
>> new_flows_len 0 old_flows_len 1
>> qdisc fq_codel 8004: parent :4 limit 1024p flows 16 quantum 1514
>> target 80.0ms ce_threshold 32us interval 100.0ms ecn
>> Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>> backlog 0b 0p requeues 0
>> maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>> new_flows_len 0 old_flows_len 0
>>
>>
>> after the test (60sec):
>>
>> qdisc mq 0: root
>> Sent 3084996052 bytes 2037744 pkt (dropped 2770176, overlimits 0 requeues 28)
>> backlog 0b 0p requeues 28
>> qdisc fq_codel 8001: parent :1 limit 1024p flows 16 quantum 1514
>> target 80.0ms ce_threshold 32us interval 100.0ms ecn
>> Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>> backlog 0b 0p requeues 0
>> maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>> new_flows_len 0 old_flows_len 0
>> qdisc fq_codel 8002: parent :2 limit 1024p flows 16 quantum 1514
>> target 80.0ms ce_threshold 32us interval 100.0ms ecn
>> Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>> backlog 0b 0p requeues 0
>> maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>> new_flows_len 0 old_flows_len 0
>> qdisc fq_codel 8003: parent :3 limit 1024p flows 16 quantum 1514
>> target 80.0ms ce_threshold 32us interval 100.0ms ecn
>> Sent 3084996052 bytes 2037744 pkt (dropped 2770176, overlimits 0 requeues 28)
>> backlog 0b 0p requeues 28
>> maxpacket 1514 drop_overlimit 2770176 new_flow_count 64 ecn_mark 0
>> new_flows_len 0 old_flows_len 1
>> qdisc fq_codel 8004: parent :4 limit 1024p flows 16 quantum 1514
>> target 80.0ms ce_threshold 32us interval 100.0ms ecn
>> Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>> backlog 0b 0p requeues 0
>> maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>> new_flows_len 0 old_flows_len 0
>>
>>
>>> IF you are doing builds for the archer c7v2, I can join in on this... (?)
>>
>> I'm not but I have c7 somewhere, so I can do a build for it and also
>> test, so we are on the same page.
>>
>>> I did do a test of the ath10k "before", fq_codel *never engaged*, and
>>> tcp induced latencies under load, e at 100mbit, cracked 600ms, while
>>> staying flat (20ms) at 100mbit. (not the same patches you are testing)
>>> on x86. I have got tcp 300Mbit out of an osx box, similar latency,
>>> have yet to get anything more on anything I currently have
>>> before/after patchsets.
>>>
>>> I'll go add flooding to the tests, I just finished a series comparing
>>> two different speed stations and life was good on that.
>>>
>>> "before" - fq_codel never engages, we see seconds of latency under load.
>>>
>>> root@apu2:~# tc -s qdisc show dev wlp4s0
>>> qdisc mq 0: root
>>> Sent 8570563893 bytes 6326983 pkt (dropped 0, overlimits 0 requeues 0)
>>> backlog 0b 0p requeues 0
>>> qdisc fq_codel 0: parent :1 limit 10240p flows 1024 quantum 1514
>>> target 5.0ms interval 100.0ms ecn
>>> Sent 2262 bytes 17 pkt (dropped 0, overlimits 0 requeues 0)
>>> backlog 0b 0p requeues 0
>>> maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>>> new_flows_len 0 old_flows_len 0
>>> qdisc fq_codel 0: parent :2 limit 10240p flows 1024 quantum 1514
>>> target 5.0ms interval 100.0ms ecn
>>> Sent 220486569 bytes 152058 pkt (dropped 0, overlimits 0 requeues 0)
>>> backlog 0b 0p requeues 0
>>> maxpacket 18168 drop_overlimit 0 new_flow_count 1 ecn_mark 0
>>> new_flows_len 0 old_flows_len 1
>>> qdisc fq_codel 0: parent :3 limit 10240p flows 1024 quantum 1514
>>> target 5.0ms interval 100.0ms ecn
>>> Sent 8340546509 bytes 6163431 pkt (dropped 0, overlimits 0 requeues 0)
>>> backlog 0b 0p requeues 0
>>> maxpacket 68130 drop_overlimit 0 new_flow_count 120050 ecn_mark 0
>>> new_flows_len 1 old_flows_len 3
>>> qdisc fq_codel 0: parent :4 limit 10240p flows 1024 quantum 1514
>>> target 5.0ms interval 100.0ms ecn
>>> Sent 9528553 bytes 11477 pkt (dropped 0, overlimits 0 requeues 0)
>>> backlog 0b 0p requeues 0
>>> maxpacket 66 drop_overlimit 0 new_flow_count 1 ecn_mark 0
>>> new_flows_len 1 old_flows_len 0
>>> ```
>>>
>>>
>>>>> This is certainly better than 30Mbps but still more than two times
>>>>> less than before (900).
>>>
>>> The number that I still am not sure we got is that you were sending
>>> 900mbit udp and recieving 900mbit on the prior tests?
>>
>> 900 was sending, AP POV (wifi client is downloading)
>>
>>>>> TCP also improved a little (550 to ~590).
>>>
>>> The limit is probably a bit low, also. You might want to try target
>>> 20ms as well.
>>
>> I've tried limit up to 1024 and target up to 80ms
>>
>>>>>
>>>>> Felix, others, do you want to see the ported patch, maybe I did something wrong?
>>>>> Doesn't look like it will save ath10k from performance regression.
>>>
>>> what was tcp "before"? (I'm sorry, such a long thread)
>>
>> 750Mbps
>
> Michal, after retesting with your patch (sorry, it was late yesterday,
> confused compat-wireless archives) I saw the difference.
> So the progress looks like this (all with fq_codel flows 16 limit 1024
> target 20ms):
> no patches: 380Mbps UDP, 550 TCP
> Eric's (fq_codel drop) patch: 420Mbps UDP, 590 TCP (+40Mbps), latency
> 5-6ms during test
> Michal's (improve tx scheduling) patch: 580Mbps UDP, 660 TCP, latency
> up to 30-40ms during test
> after Rajkumar's proposal to "try without registering wake_tx_queue
> callback": 820Mbps UDP, 690 TCP.
And the simultaneous ping on the last test was?
> So, very close to "as before": 900Mbps UDP, 750 TCP.
> But still, I was expecting performance improvements from latest ath10k
> code, not regressions.
> I know that hw is capable of 800Mbps TCP, which I'm targeting.
>
> Regards,
> Roman
>
> p.s. sorry for confusion
--
Dave Täht
Let's go make home routers and wifi faster! With better software!
http://blog.cerowrt.org
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: OpenWRT wrong adjustment of fq_codel defaults (Was: fq_codel_drop vs a udp flood)
2016-05-16 16:04 ` OpenWRT wrong adjustment of fq_codel defaults (Was: [Codel] " Dave Taht
@ 2016-05-16 19:46 ` Roman Yeryomin
0 siblings, 0 replies; 30+ messages in thread
From: Roman Yeryomin @ 2016-05-16 19:46 UTC (permalink / raw)
To: Dave Taht
Cc: Rajkumar Manoharan, make-wifi-fast, Rafał Miłecki,
ath10k, netdev@vger.kernel.org, codel@lists.bufferbloat.net,
Jonathan Morton, OpenWrt Development List, Felix Fietkau
On 16 May 2016 at 19:04, Dave Taht <dave.taht@gmail.com> wrote:
> On Mon, May 16, 2016 at 1:14 AM, Roman Yeryomin <leroi.lists@gmail.com> wrote:
>> On 16 May 2016 at 01:34, Roman Yeryomin <leroi.lists@gmail.com> wrote:
>>> On 6 May 2016 at 22:43, Dave Taht <dave.taht@gmail.com> wrote:
>>>> On Fri, May 6, 2016 at 11:56 AM, Roman Yeryomin <leroi.lists@gmail.com> wrote:
>>>>> On 6 May 2016 at 21:43, Roman Yeryomin <leroi.lists@gmail.com> wrote:
>>>>>> On 6 May 2016 at 15:47, Jesper Dangaard Brouer <brouer@redhat.com> wrote:
>>>>>>>
>>>>>>> I've created a OpenWRT ticket[1] on this issue, as it seems that someone[2]
>>>>>>> closed Felix'es OpenWRT email account (bad choice! emails bouncing).
>>>>>>> Sounds like OpenWRT and the LEDE https://www.lede-project.org/ project
>>>>>>> is in some kind of conflict.
>>>>>>>
>>>>>>> OpenWRT ticket [1] https://dev.openwrt.org/ticket/22349
>>>>>>>
>>>>>>> [2] http://thread.gmane.org/gmane.comp.embedded.openwrt.devel/40298/focus=40335
>>>>>>
>>>>>> OK, so, after porting the patch to 4.1 openwrt kernel and playing a
>>>>>> bit with fq_codel limits I was able to get 420Mbps UDP like this:
>>>>>> tc qdisc replace dev wlan0 parent :1 fq_codel flows 16 limit 256
>>>>>
>>>>> Forgot to mention, I've reduced drop_batch_size down to 32
>>>>
>>>> 0) Not clear to me if that's the right line, there are 4 wifi queues,
>>>> and the third one
>>>> is the BE queue.
>>>
>>> That was an example, sorry, should have stated that. I've applied same
>>> settings to all 4 queues.
>>>
>>>> That is too low a limit, also, for normal use. And:
>>>> for the purpose of this particular UDP test, flows 16 is ok, but not
>>>> ideal.
>>>
>>> I played with different combinations, it doesn't make any
>>> (significant) difference: 20-30Mbps, not more.
>>> What numbers would you propose?
>>>
>>>> 1) What's the tcp number (with a simultaneous ping) with this latest patchset?
>>>> (I care about tcp performance a lot more than udp floods - surviving a
>>>> udp flood yes, performance, no)
>>>
>>> During the test (both TCP and UDP) it's roughly 5ms in average, not
>>> running tests ~2ms. Actually I'm now wondering if target is working at
>>> all, because I had same result with target 80ms..
>>> So, yes, latency is good, but performance is poor.
>>>
>>>> before/after?
>>>>
>>>> tc -s qdisc show dev wlan0 during/after results?
>>>
>>> during the test:
>>>
>>> qdisc mq 0: root
>>> Sent 1600496000 bytes 1057194 pkt (dropped 1421568, overlimits 0 requeues 17)
>>> backlog 1545794b 1021p requeues 17
>>> qdisc fq_codel 8001: parent :1 limit 1024p flows 16 quantum 1514
>>> target 80.0ms ce_threshold 32us interval 100.0ms ecn
>>> Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>>> backlog 0b 0p requeues 0
>>> maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>>> new_flows_len 0 old_flows_len 0
>>> qdisc fq_codel 8002: parent :2 limit 1024p flows 16 quantum 1514
>>> target 80.0ms ce_threshold 32us interval 100.0ms ecn
>>> Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>>> backlog 0b 0p requeues 0
>>> maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>>> new_flows_len 0 old_flows_len 0
>>> qdisc fq_codel 8003: parent :3 limit 1024p flows 16 quantum 1514
>>> target 80.0ms ce_threshold 32us interval 100.0ms ecn
>>> Sent 1601271168 bytes 1057706 pkt (dropped 1422304, overlimits 0 requeues 17)
>>> backlog 1541252b 1018p requeues 17
>>> maxpacket 1514 drop_overlimit 1422304 new_flow_count 35 ecn_mark 0
>>> new_flows_len 0 old_flows_len 1
>>> qdisc fq_codel 8004: parent :4 limit 1024p flows 16 quantum 1514
>>> target 80.0ms ce_threshold 32us interval 100.0ms ecn
>>> Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>>> backlog 0b 0p requeues 0
>>> maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>>> new_flows_len 0 old_flows_len 0
>>>
>>>
>>> after the test (60sec):
>>>
>>> qdisc mq 0: root
>>> Sent 3084996052 bytes 2037744 pkt (dropped 2770176, overlimits 0 requeues 28)
>>> backlog 0b 0p requeues 28
>>> qdisc fq_codel 8001: parent :1 limit 1024p flows 16 quantum 1514
>>> target 80.0ms ce_threshold 32us interval 100.0ms ecn
>>> Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>>> backlog 0b 0p requeues 0
>>> maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>>> new_flows_len 0 old_flows_len 0
>>> qdisc fq_codel 8002: parent :2 limit 1024p flows 16 quantum 1514
>>> target 80.0ms ce_threshold 32us interval 100.0ms ecn
>>> Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>>> backlog 0b 0p requeues 0
>>> maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>>> new_flows_len 0 old_flows_len 0
>>> qdisc fq_codel 8003: parent :3 limit 1024p flows 16 quantum 1514
>>> target 80.0ms ce_threshold 32us interval 100.0ms ecn
>>> Sent 3084996052 bytes 2037744 pkt (dropped 2770176, overlimits 0 requeues 28)
>>> backlog 0b 0p requeues 28
>>> maxpacket 1514 drop_overlimit 2770176 new_flow_count 64 ecn_mark 0
>>> new_flows_len 0 old_flows_len 1
>>> qdisc fq_codel 8004: parent :4 limit 1024p flows 16 quantum 1514
>>> target 80.0ms ce_threshold 32us interval 100.0ms ecn
>>> Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>>> backlog 0b 0p requeues 0
>>> maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>>> new_flows_len 0 old_flows_len 0
>>>
>>>
>>>> IF you are doing builds for the archer c7v2, I can join in on this... (?)
>>>
>>> I'm not but I have c7 somewhere, so I can do a build for it and also
>>> test, so we are on the same page.
>>>
>>>> I did do a test of the ath10k "before", fq_codel *never engaged*, and
>>>> tcp induced latencies under load, e at 100mbit, cracked 600ms, while
>>>> staying flat (20ms) at 100mbit. (not the same patches you are testing)
>>>> on x86. I have got tcp 300Mbit out of an osx box, similar latency,
>>>> have yet to get anything more on anything I currently have
>>>> before/after patchsets.
>>>>
>>>> I'll go add flooding to the tests, I just finished a series comparing
>>>> two different speed stations and life was good on that.
>>>>
>>>> "before" - fq_codel never engages, we see seconds of latency under load.
>>>>
>>>> root@apu2:~# tc -s qdisc show dev wlp4s0
>>>> qdisc mq 0: root
>>>> Sent 8570563893 bytes 6326983 pkt (dropped 0, overlimits 0 requeues 0)
>>>> backlog 0b 0p requeues 0
>>>> qdisc fq_codel 0: parent :1 limit 10240p flows 1024 quantum 1514
>>>> target 5.0ms interval 100.0ms ecn
>>>> Sent 2262 bytes 17 pkt (dropped 0, overlimits 0 requeues 0)
>>>> backlog 0b 0p requeues 0
>>>> maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>>>> new_flows_len 0 old_flows_len 0
>>>> qdisc fq_codel 0: parent :2 limit 10240p flows 1024 quantum 1514
>>>> target 5.0ms interval 100.0ms ecn
>>>> Sent 220486569 bytes 152058 pkt (dropped 0, overlimits 0 requeues 0)
>>>> backlog 0b 0p requeues 0
>>>> maxpacket 18168 drop_overlimit 0 new_flow_count 1 ecn_mark 0
>>>> new_flows_len 0 old_flows_len 1
>>>> qdisc fq_codel 0: parent :3 limit 10240p flows 1024 quantum 1514
>>>> target 5.0ms interval 100.0ms ecn
>>>> Sent 8340546509 bytes 6163431 pkt (dropped 0, overlimits 0 requeues 0)
>>>> backlog 0b 0p requeues 0
>>>> maxpacket 68130 drop_overlimit 0 new_flow_count 120050 ecn_mark 0
>>>> new_flows_len 1 old_flows_len 3
>>>> qdisc fq_codel 0: parent :4 limit 10240p flows 1024 quantum 1514
>>>> target 5.0ms interval 100.0ms ecn
>>>> Sent 9528553 bytes 11477 pkt (dropped 0, overlimits 0 requeues 0)
>>>> backlog 0b 0p requeues 0
>>>> maxpacket 66 drop_overlimit 0 new_flow_count 1 ecn_mark 0
>>>> new_flows_len 1 old_flows_len 0
>>>> ```
>>>>
>>>>
>>>>>> This is certainly better than 30Mbps but still more than two times
>>>>>> less than before (900).
>>>>
>>>> The number that I still am not sure we got is that you were sending
>>>> 900mbit udp and recieving 900mbit on the prior tests?
>>>
>>> 900 was sending, AP POV (wifi client is downloading)
>>>
>>>>>> TCP also improved a little (550 to ~590).
>>>>
>>>> The limit is probably a bit low, also. You might want to try target
>>>> 20ms as well.
>>>
>>> I've tried limit up to 1024 and target up to 80ms
>>>
>>>>>>
>>>>>> Felix, others, do you want to see the ported patch, maybe I did something wrong?
>>>>>> Doesn't look like it will save ath10k from performance regression.
>>>>
>>>> what was tcp "before"? (I'm sorry, such a long thread)
>>>
>>> 750Mbps
>>
>> Michal, after retesting with your patch (sorry, it was late yesterday,
>> confused compat-wireless archives) I saw the difference.
>> So the progress looks like this (all with fq_codel flows 16 limit 1024
>> target 20ms):
>> no patches: 380Mbps UDP, 550 TCP
>> Eric's (fq_codel drop) patch: 420Mbps UDP, 590 TCP (+40Mbps), latency
>> 5-6ms during test
>> Michal's (improve tx scheduling) patch: 580Mbps UDP, 660 TCP, latency
>> up to 30-40ms during test
>> after Rajkumar's proposal to "try without registering wake_tx_queue
>> callback": 820Mbps UDP, 690 TCP.
>
> And the simultaneous ping on the last test was?
same as previous: 30-40ms
Regards,
Roman
_______________________________________________
Codel mailing list
Codel@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/codel
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH net-next] fq_codel: fix memory limitation drift
2016-05-16 1:16 ` [PATCH net-next] fq_codel: fix memory limitation drift Eric Dumazet
@ 2016-05-17 1:57 ` David Miller
0 siblings, 0 replies; 30+ messages in thread
From: David Miller @ 2016-05-17 1:57 UTC (permalink / raw)
To: eric.dumazet; +Cc: brouer, dave.taht, netdev, moeller0
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Sun, 15 May 2016 18:16:38 -0700
> From: Eric Dumazet <edumazet@google.com>
>
> memory_usage must be decreased in dequeue_func(), not in
> fq_codel_dequeue(), otherwise packets dropped by Codel algo
> are missing this decrease.
>
> Also we need to clear memory_usage in fq_codel_reset()
>
> Fixes: 95b58430abe7 ("fq_codel: add memory limitation per queue")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
Applied.
^ permalink raw reply [flat|nested] 30+ messages in thread
end of thread, other threads:[~2016-05-17 1:57 UTC | newest]
Thread overview: 30+ messages
[not found] <CAA93jw6QLyx9EaS+ntB0D3duoysu_Z-UYyQfHnRa=pfqPDfWOw@mail.gmail.com>
[not found] ` <1462125592.5535.194.camel@edumazet-glaptop3.roam.corp.google.com>
[not found] ` <865DA393-262D-40B6-A9D3-1B978CD5F6C6@gmail.com>
[not found] ` <1462128385.5535.200.camel@edumazet-glaptop3.roam.corp.google.com>
[not found] ` <C5D365DA-18EE-446E-9D25-41F48B1C583E@gmail.com>
[not found] ` <1462136140.5535.219.camel@edumazet-glaptop3.roam.corp.google.com>
[not found] ` <CACiydbKUu11=zWitkDha0ddgk1-G_Z4-e1+=9ky776VktF5HHg@mail.gmail.com>
[not found] ` <1462201620.5535.250.camel@edumazet-glaptop3.roam.corp.google.com>
[not found] ` <CACiydbKeKUENncrc-NmYRcku-DGVeGqqzYMqsCqKdxPsR7yUOQ@mail.gmail.com>
[not found] ` <1462205669.5535.254.camel@edumazet-glaptop3.roam.corp.google.com>
[not found] ` <CACiydbL26Jj3EcEL4EmqaH=1Dm-Q0dpVwoWxqUSZ7ry10bRgeg@mail.gmail.com>
[not found] ` <1462464776.13075.18.camel@edumazet-glaptop3.roam.corp.google.com>
[not found] ` <CACiydbJYrFRMJvSHM6tJYWR8vmgV3xxnSUQsb-h5WcnLW=KXoQ@mail.gmail.com>
[not found] ` <1462476207.13075.20.camel@edumazet-glaptop3.roam.corp.google.com>
2016-05-06 9:42 ` OpenWRT wrong adjustment of fq_codel defaults (Was: fq_codel_drop vs a udp flood) Jesper Dangaard Brouer
2016-05-06 12:47 ` Jesper Dangaard Brouer
2016-05-06 18:43 ` Roman Yeryomin
2016-05-06 18:56 ` Roman Yeryomin
2016-05-06 19:43 ` Dave Taht
2016-05-15 22:34 ` Roman Yeryomin
2016-05-15 23:07 ` Eric Dumazet
2016-05-15 23:27 ` Roman Yeryomin
2016-05-16 8:12 ` [Make-wifi-fast] " David Lang
2016-05-16 8:26 ` [Make-wifi-fast] OpenWRT wrong adjustment of fq_codel defaults (Was: [Codel] " Roman Yeryomin
2016-05-16 8:46 ` [Make-wifi-fast] OpenWRT wrong adjustment of fq_codel defaults (Was: " David Lang
2016-05-16 10:34 ` [OpenWrt-Devel] " Sebastian Moeller
2016-05-16 8:14 ` Roman Yeryomin
2016-05-16 14:23 ` [Make-wifi-fast] " Eric Dumazet
2016-05-16 16:04 ` OpenWRT wrong adjustment of fq_codel defaults (Was: [Codel] " Dave Taht
2016-05-16 19:46 ` OpenWRT wrong adjustment of fq_codel defaults (Was: " Roman Yeryomin
2016-05-07 9:57 ` OpenWRT wrong adjustment of fq_codel defaults (Was: [Codel] " Kevin Darbyshire-Bryant
2016-05-15 22:47 ` Roman Yeryomin
[not found] ` <CAA93jw4XZ1+LLX1z6wQ6DHsp4vFCS5zmMz-uku_8SBNG_KuUxA@mail.gmail.com>
[not found] ` <542135C7-D7CC-4E33-B35B-C2AD259FA5AB@gmx.de>
[not found] ` <20160506133323.0b190f47@redhat.com>
[not found] ` <D29AB9B2-D514-4D14-9A16-CFB9ECD05B17@gmx.de>
[not found] ` <1462541156.13075.34.camel@edumazet-glaptop3.roam.corp.google.com>
2016-05-06 15:55 ` [PATCH net-next] fq_codel: add memory limitation per queue Eric Dumazet
2016-05-09 3:49 ` David Miller
2016-05-09 4:14 ` Cong Wang
2016-05-09 4:31 ` Eric Dumazet
2016-05-09 5:07 ` Cong Wang
2016-05-09 14:26 ` Eric Dumazet
2016-05-10 4:34 ` Cong Wang
2016-05-10 4:45 ` Eric Dumazet
2016-05-10 4:57 ` Cong Wang
2016-05-10 5:10 ` Eric Dumazet
2016-05-16 1:16 ` [PATCH net-next] fq_codel: fix memory limitation drift Eric Dumazet
2016-05-17 1:57 ` David Miller