* [PATCH net v3 0/2] net: sched: fix some issues
@ 2013-11-19 7:25 Yang Yingliang
2013-11-19 7:25 ` [PATCH net v3 1/2] net: sched: tbf: fix calculation of max_size Yang Yingliang
2013-11-19 7:25 ` [PATCH net v3 2/2] net: sched: htb: fix calculation of quantum Yang Yingliang
0 siblings, 2 replies; 19+ messages in thread
From: Yang Yingliang @ 2013-11-19 7:25 UTC (permalink / raw)
To: davem, netdev; +Cc: eric.dumazet, brouer, jpirko, jbrouer
fix a regression introduced by commit b757c9336d63f94c6b57532
(tbf: improved accuracy at high rates).
fix quantum calculation introduced by 64bit rates.
v2
patch 1/2: redescribe the regression.
patch 2/2: add Eric's ack.
v3
patch 1/2: use psched_l2t_ns to calculate max_size
and cleanup exit/done section suggested by Jesper.
Yang Yingliang (2):
net: sched: tbf: fix calculation of max_size
net: sched: htb: fix calculation of quantum
net/sched/sch_htb.c | 18 ++++++++------
net/sched/sch_tbf.c | 71 ++++++++++++++++++++++++++++-------------------------
2 files changed, 47 insertions(+), 42 deletions(-)
--
1.8.0
^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH net v3 1/2] net: sched: tbf: fix calculation of max_size
2013-11-19 7:25 [PATCH net v3 0/2] net: sched: fix some issues Yang Yingliang
@ 2013-11-19 7:25 ` Yang Yingliang
2013-11-19 9:38 ` Jesper Dangaard Brouer
2013-11-23 19:06 ` Eric Dumazet
2013-11-19 7:25 ` [PATCH net v3 2/2] net: sched: htb: fix calculation of quantum Yang Yingliang
1 sibling, 2 replies; 19+ messages in thread
From: Yang Yingliang @ 2013-11-19 7:25 UTC (permalink / raw)
To: davem, netdev; +Cc: eric.dumazet, brouer, jpirko, jbrouer
commit b757c9336d63f94c6b57532(tbf: improved accuracy at high rates)
introduce a regression.
With the follow command:
tc qdisc add dev eth1 root handle 1: tbf latency 50ms burst 10KB rate 30gbit mtu 64k
Without this patch, the max_size value is 10751(bytes).
But, in fact, the real max_size value should be smaller than 7440(bytes).
Or a packet whose length is bigger than 7440 will cause network congestion.
Because the packet is so big that can't get enough tokens. Even all the tokens
in the buffer is given to the packet.
With this patch, the max_size value is 7440(bytes).
The packets whose length is bigger than 7440(bytes) will be dropped or reshape
in tbf_enqueue().
Suggested-by: Jesper Dangaard Brouer <jbrouer@redhat.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
---
net/sched/sch_tbf.c | 71 ++++++++++++++++++++++++++++-------------------------
1 file changed, 37 insertions(+), 34 deletions(-)
diff --git a/net/sched/sch_tbf.c b/net/sched/sch_tbf.c
index 68f9859..c194129 100644
--- a/net/sched/sch_tbf.c
+++ b/net/sched/sch_tbf.c
@@ -291,33 +291,20 @@ static int tbf_change(struct Qdisc *sch, struct nlattr *opt)
goto done;
qopt = nla_data(tb[TCA_TBF_PARMS]);
- rtab = qdisc_get_rtab(&qopt->rate, tb[TCA_TBF_RTAB]);
- if (rtab == NULL)
- goto done;
-
- if (qopt->peakrate.rate) {
- if (qopt->peakrate.rate > qopt->rate.rate)
- ptab = qdisc_get_rtab(&qopt->peakrate, tb[TCA_TBF_PTAB]);
- if (ptab == NULL)
- goto done;
+ if (qopt->rate.linklayer == TC_LINKLAYER_UNAWARE) {
+ rtab = qdisc_get_rtab(&qopt->rate, tb[TCA_TBF_RTAB]);
+ if (rtab) {
+ qdisc_put_rtab(rtab);
+ rtab = NULL;
+ }
}
-
- for (n = 0; n < 256; n++)
- if (rtab->data[n] > qopt->buffer)
- break;
- max_size = (n << qopt->rate.cell_log) - 1;
- if (ptab) {
- int size;
-
- for (n = 0; n < 256; n++)
- if (ptab->data[n] > qopt->mtu)
- break;
- size = (n << qopt->peakrate.cell_log) - 1;
- if (size < max_size)
- max_size = size;
+ if (qopt->peakrate.linklayer == TC_LINKLAYER_UNAWARE) {
+ ptab = qdisc_get_rtab(&qopt->peakrate, tb[TCA_TBF_PTAB]);
+ if (ptab) {
+ qdisc_put_rtab(ptab);
+ ptab = NULL;
+ }
}
- if (max_size < 0)
- goto done;
if (q->qdisc != &noop_qdisc) {
err = fifo_set_limit(q->qdisc, qopt->limit);
@@ -339,30 +326,46 @@ static int tbf_change(struct Qdisc *sch, struct nlattr *opt)
}
q->limit = qopt->limit;
q->mtu = PSCHED_TICKS2NS(qopt->mtu);
- q->max_size = max_size;
q->buffer = PSCHED_TICKS2NS(qopt->buffer);
q->tokens = q->buffer;
q->ptokens = q->mtu;
if (tb[TCA_TBF_RATE64])
rate64 = nla_get_u64(tb[TCA_TBF_RATE64]);
- psched_ratecfg_precompute(&q->rate, &rtab->rate, rate64);
- if (ptab) {
+ psched_ratecfg_precompute(&q->rate, &qopt->rate, rate64);
+ if (!q->rate.rate_bytes_ps)
+ goto unlock_done;
+ for (n = 0; n < 65536; n++)
+ if (psched_l2t_ns(&q->rate, n) > q->buffer)
+ break;
+ max_size = min_t(u32, n, (256ULL << qopt->rate.cell_log) - 1);
+
+ if (qopt->peakrate.rate) {
+ int size = 0;
if (tb[TCA_TBF_PRATE64])
prate64 = nla_get_u64(tb[TCA_TBF_PRATE64]);
- psched_ratecfg_precompute(&q->peak, &ptab->rate, prate64);
+ psched_ratecfg_precompute(&q->peak, &qopt->peakrate, prate64);
+ for (n = 0; n < 65536; n++)
+ if (psched_l2t_ns(&q->peak, n) > q->mtu)
+ break;
+ size = min_t(u32, n, (256ULL << qopt->rate.cell_log) - 1);
+ max_size = min_t(u32, max_size, size);
q->peak_present = true;
} else {
q->peak_present = false;
}
+ if (!max_size)
+ goto unlock_done;
+ q->max_size = max_size;
+
sch_tree_unlock(sch);
- err = 0;
+ return 0;
+
+unlock_done:
+ sch_tree_unlock(sch);
+ err = -EINVAL;
done:
- if (rtab)
- qdisc_put_rtab(rtab);
- if (ptab)
- qdisc_put_rtab(ptab);
return err;
}
--
1.8.0
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH net v3 2/2] net: sched: htb: fix calculation of quantum
2013-11-19 7:25 [PATCH net v3 0/2] net: sched: fix some issues Yang Yingliang
2013-11-19 7:25 ` [PATCH net v3 1/2] net: sched: tbf: fix calculation of max_size Yang Yingliang
@ 2013-11-19 7:25 ` Yang Yingliang
1 sibling, 0 replies; 19+ messages in thread
From: Yang Yingliang @ 2013-11-19 7:25 UTC (permalink / raw)
To: davem, netdev; +Cc: eric.dumazet, brouer, jpirko, jbrouer
Now, 32bit rates may be not the true rate.
So use rate_bytes_ps which is from
max(rate32, rate64) to calcualte quantum.
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Acked-by: Eric Dumazet <edumazet@google.com>
---
net/sched/sch_htb.c | 18 ++++++++++--------
1 file changed, 10 insertions(+), 8 deletions(-)
diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index 0e1e38b..57c6678 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -1477,11 +1477,20 @@ static int htb_change_class(struct Qdisc *sch, u32 classid,
sch_tree_lock(sch);
}
+ rate64 = tb[TCA_HTB_RATE64] ? nla_get_u64(tb[TCA_HTB_RATE64]) : 0;
+
+ ceil64 = tb[TCA_HTB_CEIL64] ? nla_get_u64(tb[TCA_HTB_CEIL64]) : 0;
+
+ psched_ratecfg_precompute(&cl->rate, &hopt->rate, rate64);
+ psched_ratecfg_precompute(&cl->ceil, &hopt->ceil, ceil64);
+
/* it used to be a nasty bug here, we have to check that node
* is really leaf before changing cl->un.leaf !
*/
if (!cl->level) {
- cl->quantum = hopt->rate.rate / q->rate2quantum;
+ u64 quantum = div64_u64(cl->rate.rate_bytes_ps,
+ q->rate2quantum);
+ cl->quantum = min_t(u64, quantum, INT_MAX);
if (!hopt->quantum && cl->quantum < 1000) {
pr_warning(
"HTB: quantum of class %X is small. Consider r2q change.\n",
@@ -1500,13 +1509,6 @@ static int htb_change_class(struct Qdisc *sch, u32 classid,
cl->prio = TC_HTB_NUMPRIO - 1;
}
- rate64 = tb[TCA_HTB_RATE64] ? nla_get_u64(tb[TCA_HTB_RATE64]) : 0;
-
- ceil64 = tb[TCA_HTB_CEIL64] ? nla_get_u64(tb[TCA_HTB_CEIL64]) : 0;
-
- psched_ratecfg_precompute(&cl->rate, &hopt->rate, rate64);
- psched_ratecfg_precompute(&cl->ceil, &hopt->ceil, ceil64);
-
cl->buffer = PSCHED_TICKS2NS(hopt->buffer);
cl->cbuffer = PSCHED_TICKS2NS(hopt->cbuffer);
--
1.8.0
^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH net v3 1/2] net: sched: tbf: fix calculation of max_size
2013-11-19 7:25 ` [PATCH net v3 1/2] net: sched: tbf: fix calculation of max_size Yang Yingliang
@ 2013-11-19 9:38 ` Jesper Dangaard Brouer
2013-11-20 2:14 ` Yang Yingliang
2013-11-23 19:06 ` Eric Dumazet
1 sibling, 1 reply; 19+ messages in thread
From: Jesper Dangaard Brouer @ 2013-11-19 9:38 UTC (permalink / raw)
To: Yang Yingliang; +Cc: davem, netdev, eric.dumazet, brouer, jpirko
On Tue, 19 Nov 2013 15:25:38 +0800
Yang Yingliang <yangyingliang@huawei.com> wrote:
> commit b757c9336d63f94c6b57532(tbf: improved accuracy at high rates)
> introduce a regression.
>
> With the follow command:
> tc qdisc add dev eth1 root handle 1: tbf latency 50ms burst 10KB rate 30gbit mtu 64k
>
> Without this patch, the max_size value is 10751(bytes).
> But, in fact, the real max_size value should be smaller than 7440(bytes).
> Or a packet whose length is bigger than 7440 will cause network congestion.
> Because the packet is so big that can't get enough tokens. Even all the tokens
> in the buffer is given to the packet.
Sorry, but I don't like the commit message. The real problem is the
value in q->buffer, and that the userspace rate table cannot handle
these high rates, which you don't mention.
I would write something like:
The kernel no longer uses the userspace provided rate table. Thus, it
is wrong to calculate max_size based on this rate table. At high rates
this rate table gets very inaccurate, which can lead wrong calculation
of max_size.
Consequence of max_size being too large is severe, and cause packets
being stalled in tbf_dequeue() because it cannot get enough tokens.
The max_size guards against enqueuing packet sizes above q->buffer
"time" in tbf_enqueue().
This patch fixes the calculation of max_size. By ... add desc ...
perhaps also mention how it is connected to p->mtu (with is also a
"time" value).
The rest of the patch looks okay now, one point below though.
> diff --git a/net/sched/sch_tbf.c b/net/sched/sch_tbf.c
> index 68f9859..c194129 100644
> --- a/net/sched/sch_tbf.c
> +++ b/net/sched/sch_tbf.c
[...]
> @@ -339,30 +326,46 @@ static int tbf_change(struct Qdisc *sch, struct nlattr *opt)
[...]
> + for (n = 0; n < 65536; n++)
> + if (psched_l2t_ns(&q->rate, n) > q->buffer)
> + break;
> + max_size = min_t(u32, n, (256ULL << qopt->rate.cell_log) - 1);
I'm a little uncertain about, if using the 65536 constant is okay, or
considered "bad style".
I'm still a little confused/uncertain why we need the "qopt->rate.cell_log".
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Sr. Network Kernel Developer at Red Hat
Author of http://www.iptv-analyzer.org
LinkedIn: http://www.linkedin.com/in/brouer
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH net v3 1/2] net: sched: tbf: fix calculation of max_size
2013-11-19 9:38 ` Jesper Dangaard Brouer
@ 2013-11-20 2:14 ` Yang Yingliang
2013-11-20 10:04 ` Jesper Dangaard Brouer
0 siblings, 1 reply; 19+ messages in thread
From: Yang Yingliang @ 2013-11-20 2:14 UTC (permalink / raw)
To: Jesper Dangaard Brouer; +Cc: davem, netdev, eric.dumazet, jpirko
On 2013/11/19 17:38, Jesper Dangaard Brouer wrote:
> On Tue, 19 Nov 2013 15:25:38 +0800
> Yang Yingliang <yangyingliang@huawei.com> wrote:
>
>> commit b757c9336d63f94c6b57532(tbf: improved accuracy at high rates)
>> introduce a regression.
>>
>> With the follow command:
>> tc qdisc add dev eth1 root handle 1: tbf latency 50ms burst 10KB rate 30gbit mtu 64k
>>
>> Without this patch, the max_size value is 10751(bytes).
>> But, in fact, the real max_size value should be smaller than 7440(bytes).
>> Or a packet whose length is bigger than 7440 will cause network congestion.
>> Because the packet is so big that can't get enough tokens. Even all the tokens
>> in the buffer is given to the packet.
>
> Sorry, but I don't like the commit message. The real problem is the
> value in q->buffer, and that the userspace rate table cannot handle
> these high rates, which you don't mention.
>
> I would write something like:
>
> The kernel no longer uses the userspace provided rate table. Thus, it
> is wrong to calculate max_size based on this rate table. At high rates
> this rate table gets very inaccurate, which can lead wrong calculation
> of max_size.
>
> Consequence of max_size being too large is severe, and cause packets
> being stalled in tbf_dequeue() because it cannot get enough tokens.
> The max_size guards against enqueuing packet sizes above q->buffer
> "time" in tbf_enqueue().
>
> This patch fixes the calculation of max_size. By ... add desc ...
> perhaps also mention how it is connected to p->mtu (with is also a
> "time" value).
>
>
> The rest of the patch looks okay now, one point below though.
OK, change it in v4.
Thanks!
>
>
>> diff --git a/net/sched/sch_tbf.c b/net/sched/sch_tbf.c
>> index 68f9859..c194129 100644
>> --- a/net/sched/sch_tbf.c
>> +++ b/net/sched/sch_tbf.c
> [...]
>> @@ -339,30 +326,46 @@ static int tbf_change(struct Qdisc *sch, struct nlattr *opt)
> [...]
>> + for (n = 0; n < 65536; n++)
>> + if (psched_l2t_ns(&q->rate, n) > q->buffer)
>> + break;
>> + max_size = min_t(u32, n, (256ULL << qopt->rate.cell_log) - 1);
>
> I'm a little uncertain about, if using the 65536 constant is okay, or
> considered "bad style".
I'll use a MACRO to instead of this constant in v4.
Thanks!
>
> I'm still a little confused/uncertain why we need the "qopt->rate.cell_log".
>
Because we need max_size be smaller than mtu(user input in bytes).
E.g. if user inputs like this "... burst 100KB rate 100mbit mtu 4095",
without this patch, max_size is 4095.
But with this patch, if don't use cell_log, max_size is 102400.
I think it's not correct, so I used cell_log here.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH net v3 1/2] net: sched: tbf: fix calculation of max_size
2013-11-20 2:14 ` Yang Yingliang
@ 2013-11-20 10:04 ` Jesper Dangaard Brouer
2013-11-20 12:50 ` Yang Yingliang
0 siblings, 1 reply; 19+ messages in thread
From: Jesper Dangaard Brouer @ 2013-11-20 10:04 UTC (permalink / raw)
To: Yang Yingliang
Cc: Jesper Dangaard Brouer, davem, netdev, eric.dumazet, jpirko
On Wed, 20 Nov 2013 10:14:43 +0800
Yang Yingliang <yangyingliang@huawei.com> wrote:
> On 2013/11/19 17:38, Jesper Dangaard Brouer wrote:
> > On Tue, 19 Nov 2013 15:25:38 +0800
> > Yang Yingliang <yangyingliang@huawei.com> wrote:
> >
[...]
> >
> >> diff --git a/net/sched/sch_tbf.c b/net/sched/sch_tbf.c
> >> index 68f9859..c194129 100644
> >> --- a/net/sched/sch_tbf.c
> >> +++ b/net/sched/sch_tbf.c
> > [...]
> >> @@ -339,30 +326,46 @@ static int tbf_change(struct Qdisc *sch, struct nlattr *opt)
> > [...]
> >> + for (n = 0; n < 65536; n++)
> >> + if (psched_l2t_ns(&q->rate, n) > q->buffer)
> >> + break;
> >> + max_size = min_t(u32, n, (256ULL << qopt->rate.cell_log) - 1);
> >
> > I'm a little uncertain about, if using the 65536 constant is okay, or
> > considered "bad style".
>
> I'll use a MACRO to instead of this constant in v4.
I'm not saying you have to use a macro for this. I'm fine with using
65536 here, some other developers might have stronger opinions on this?
> > I'm still a little confused/uncertain why we need the "qopt->rate.cell_log".
> >
>
> Because we need max_size be smaller than mtu(user input in bytes).
> E.g. if user inputs like this "... burst 100KB rate 100mbit mtu 4095",
> without this patch, max_size is 4095.
> But with this patch, if don't use cell_log, max_size is 102400.
> I think it's not correct, so I used cell_log here.
Hmmmm... so you are using cell_log to approximate the MTU size, because
struct tc_ratespec, does not contain the a mtu parameter. You do
realize this is only an approximation. (This also smells like a bad
transition away from the userspace rate tables)
What about the p->mtu parameter from struct tc_tbf_qopt. I know it is
in a "time" format... but hey, you are already using this parameter
requoting your patch:
On Tue, 19 Nov 2013 15:25:38 +0800
Yang Yingliang <yangyingliang@huawei.com> wrote:
> diff --git a/net/sched/sch_tbf.c b/net/sched/sch_tbf.c
> index 68f9859..c194129 100644
> --- a/net/sched/sch_tbf.c
> +++ b/net/sched/sch_tbf.c
[... cut ...]
> @@ -339,30 +326,46 @@ static int tbf_change(struct Qdisc *sch, struct nlattr *opt)
> }
> q->limit = qopt->limit;
> q->mtu = PSCHED_TICKS2NS(qopt->mtu);
> - q->max_size = max_size;
> q->buffer = PSCHED_TICKS2NS(qopt->buffer);
> q->tokens = q->buffer;
> q->ptokens = q->mtu;
>
> if (tb[TCA_TBF_RATE64])
> rate64 = nla_get_u64(tb[TCA_TBF_RATE64]);
> - psched_ratecfg_precompute(&q->rate, &rtab->rate, rate64);
> - if (ptab) {
> + psched_ratecfg_precompute(&q->rate, &qopt->rate, rate64);
> + if (!q->rate.rate_bytes_ps)
> + goto unlock_done;
> + for (n = 0; n < 65536; n++)
> + if (psched_l2t_ns(&q->rate, n) > q->buffer)
> + break;
> + max_size = min_t(u32, n, (256ULL << qopt->rate.cell_log) - 1);
> +
> + if (qopt->peakrate.rate) {
> + int size = 0;
> if (tb[TCA_TBF_PRATE64])
> prate64 = nla_get_u64(tb[TCA_TBF_PRATE64]);
> - psched_ratecfg_precompute(&q->peak, &ptab->rate, prate64);
> + psched_ratecfg_precompute(&q->peak, &qopt->peakrate, prate64);
> + for (n = 0; n < 65536; n++)
> + if (psched_l2t_ns(&q->peak, n) > q->mtu)
You are using q->mtu here, BUT why are you only doing this, when there
is a qopt->peakrate.rate, set??? (I might have missed something)
> + break;
> + size = min_t(u32, n, (256ULL << qopt->rate.cell_log) - 1);
> + max_size = min_t(u32, max_size, size);
> q->peak_present = true;
> } else {
> q->peak_present = false;
> }
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Sr. Network Kernel Developer at Red Hat
Author of http://www.iptv-analyzer.org
LinkedIn: http://www.linkedin.com/in/brouer
For easy reference
------------------
(include/uapi/linux/pkt_sched.h)
struct tc_ratespec {
unsigned char cell_log;
__u8 linklayer; /* lower 4 bits */
unsigned short overhead;
short cell_align;
unsigned short mpu;
__u32 rate;
};
struct tc_tbf_qopt {
struct tc_ratespec rate;
struct tc_ratespec peakrate;
__u32 limit;
__u32 buffer;
__u32 mtu;
};
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH net v3 1/2] net: sched: tbf: fix calculation of max_size
2013-11-20 10:04 ` Jesper Dangaard Brouer
@ 2013-11-20 12:50 ` Yang Yingliang
0 siblings, 0 replies; 19+ messages in thread
From: Yang Yingliang @ 2013-11-20 12:50 UTC (permalink / raw)
To: jbrouer; +Cc: brouer, davem, netdev, eric.dumazet, jpirko
On 2013/11/20 18:04, Jesper Dangaard Brouer wrote:
> On Wed, 20 Nov 2013 10:14:43 +0800
> Yang Yingliang <yangyingliang@huawei.com> wrote:
>> On 2013/11/19 17:38, Jesper Dangaard Brouer wrote:
>>> On Tue, 19 Nov 2013 15:25:38 +0800
>>> Yang Yingliang <yangyingliang@huawei.com> wrote:
>>>
> [...]
>>>> diff --git a/net/sched/sch_tbf.c b/net/sched/sch_tbf.c
>>>> index 68f9859..c194129 100644
>>>> --- a/net/sched/sch_tbf.c
>>>> +++ b/net/sched/sch_tbf.c
>>> [...]
>>>> @@ -339,30 +326,46 @@ static int tbf_change(struct Qdisc *sch, struct nlattr *opt)
>>> [...]
>>>> + for (n = 0; n < 65536; n++)
>>>> + if (psched_l2t_ns(&q->rate, n) > q->buffer)
>>>> + break;
>>>> + max_size = min_t(u32, n, (256ULL << qopt->rate.cell_log) - 1);
>>> I'm a little uncertain about, if using the 65536 constant is okay, or
>>> considered "bad style".
>> I'll use a MACRO to instead of this constant in v4.
> I'm not saying you have to use a macro for this. I'm fine with using
> 65536 here, some other developers might have stronger opinions on this?
>
>
>>> I'm still a little confused/uncertain why we need the "qopt->rate.cell_log".
>>>
>> Because we need max_size be smaller than mtu(user input in bytes).
>> E.g. if user inputs like this "... burst 100KB rate 100mbit mtu 4095",
>> without this patch, max_size is 4095.
>> But with this patch, if don't use cell_log, max_size is 102400.
>> I think it's not correct, so I used cell_log here.
>
> Hmmmm... so you are using cell_log to approximate the MTU size, because
> struct tc_ratespec, does not contain the a mtu parameter. You do
> realize this is only an approximation. (This also smells like a bad
> transition away from the userspace rate tables)
Hmmm... not exactly, cell_log can be input by user,
if input like this "... burst 100KB/4 rate 100mbit mtu 4095",
with old calculation method, max_size would be 1023, but with
new calculation method, it's 65536. I use cell_log here make it
be same as before.
Maybe don't need be same, what's your opinion?
Thanks!
>
> What about the p->mtu parameter from struct tc_tbf_qopt. I know it is
> in a "time" format... but hey, you are already using this parameter
> requoting your patch:
>
>
> On Tue, 19 Nov 2013 15:25:38 +0800
> Yang Yingliang <yangyingliang@huawei.com> wrote:
>
>> diff --git a/net/sched/sch_tbf.c b/net/sched/sch_tbf.c
>> index 68f9859..c194129 100644
>> --- a/net/sched/sch_tbf.c
>> +++ b/net/sched/sch_tbf.c
> [... cut ...]
>> @@ -339,30 +326,46 @@ static int tbf_change(struct Qdisc *sch, struct nlattr *opt)
>> }
>> q->limit = qopt->limit;
>> q->mtu = PSCHED_TICKS2NS(qopt->mtu);
>> - q->max_size = max_size;
>> q->buffer = PSCHED_TICKS2NS(qopt->buffer);
>> q->tokens = q->buffer;
>> q->ptokens = q->mtu;
>>
>> if (tb[TCA_TBF_RATE64])
>> rate64 = nla_get_u64(tb[TCA_TBF_RATE64]);
>> - psched_ratecfg_precompute(&q->rate, &rtab->rate, rate64);
>> - if (ptab) {
>> + psched_ratecfg_precompute(&q->rate, &qopt->rate, rate64);
>> + if (!q->rate.rate_bytes_ps)
>> + goto unlock_done;
>> + for (n = 0; n < 65536; n++)
>> + if (psched_l2t_ns(&q->rate, n) > q->buffer)
>> + break;
>> + max_size = min_t(u32, n, (256ULL << qopt->rate.cell_log) - 1);
>> +
>> + if (qopt->peakrate.rate) {
>> + int size = 0;
>> if (tb[TCA_TBF_PRATE64])
>> prate64 = nla_get_u64(tb[TCA_TBF_PRATE64]);
>> - psched_ratecfg_precompute(&q->peak, &ptab->rate, prate64);
>> + psched_ratecfg_precompute(&q->peak, &qopt->peakrate, prate64);
>> + for (n = 0; n < 65536; n++)
>> + if (psched_l2t_ns(&q->peak, n) > q->mtu)
> You are using q->mtu here, BUT why are you only doing this, when there
> is a qopt->peakrate.rate, set??? (I might have missed something)
>
When qopt->peakrate.rate is not set, q->mtu is 0.
Regards,
Yang
>> + break;
>> + size = min_t(u32, n, (256ULL << qopt->rate.cell_log) - 1);
>> + max_size = min_t(u32, max_size, size);
>> q->peak_present = true;
>> } else {
>> q->peak_present = false;
>> }
>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH net v3 1/2] net: sched: tbf: fix calculation of max_size
2013-11-19 7:25 ` [PATCH net v3 1/2] net: sched: tbf: fix calculation of max_size Yang Yingliang
2013-11-19 9:38 ` Jesper Dangaard Brouer
@ 2013-11-23 19:06 ` Eric Dumazet
2013-11-24 7:28 ` Yang Yingliang
1 sibling, 1 reply; 19+ messages in thread
From: Eric Dumazet @ 2013-11-23 19:06 UTC (permalink / raw)
To: Yang Yingliang; +Cc: davem, netdev, brouer, jpirko, jbrouer
On Tue, 2013-11-19 at 15:25 +0800, Yang Yingliang wrote:
> commit b757c9336d63f94c6b57532(tbf: improved accuracy at high rates)
> introduce a regression.
>
> With the follow command:
> tc qdisc add dev eth1 root handle 1: tbf latency 50ms burst 10KB rate 30gbit mtu 64k
>
> Without this patch, the max_size value is 10751(bytes).
> But, in fact, the real max_size value should be smaller than 7440(bytes).
> Or a packet whose length is bigger than 7440 will cause network congestion.
> Because the packet is so big that can't get enough tokens. Even all the tokens
> in the buffer is given to the packet.
>
> With this patch, the max_size value is 7440(bytes).
> The packets whose length is bigger than 7440(bytes) will be dropped or reshape
> in tbf_enqueue().
This changelog makes no sense.
If userland asks 'burst 10KB', then burst is 10KB, not 7440.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH net v3 1/2] net: sched: tbf: fix calculation of max_size
2013-11-23 19:06 ` Eric Dumazet
@ 2013-11-24 7:28 ` Yang Yingliang
2013-11-24 18:40 ` Eric Dumazet
0 siblings, 1 reply; 19+ messages in thread
From: Yang Yingliang @ 2013-11-24 7:28 UTC (permalink / raw)
To: Eric Dumazet, Yang Yingliang; +Cc: davem, netdev, brouer, jpirko, jbrouer
On 2013/11/24 3:06, Eric Dumazet wrote:
> On Tue, 2013-11-19 at 15:25 +0800, Yang Yingliang wrote:
>> commit b757c9336d63f94c6b57532(tbf: improved accuracy at high rates)
>> introduce a regression.
>>
>> With the follow command:
>> tc qdisc add dev eth1 root handle 1: tbf latency 50ms burst 10KB rate 30gbit mtu 64k
>>
>> Without this patch, the max_size value is 10751(bytes).
>> But, in fact, the real max_size value should be smaller than 7440(bytes).
>> Or a packet whose length is bigger than 7440 will cause network congestion.
>> Because the packet is so big that can't get enough tokens. Even all the tokens
>> in the buffer is given to the packet.
>>
>> With this patch, the max_size value is 7440(bytes).
>> The packets whose length is bigger than 7440(bytes) will be dropped or reshape
>> in tbf_enqueue().
> This changelog makes no sense.
>
> If userland asks 'burst 10KB', then burst is 10KB, not 7440.
>
Ideally burst should be 10KB in kernel space.
But at hight rates, when burst is converted to
time in tick in userland, it gets much more loss
than low rates. So the burst can't actually
reach 10KB in kernel.
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH net v3 1/2] net: sched: tbf: fix calculation of max_size
2013-11-24 7:28 ` Yang Yingliang
@ 2013-11-24 18:40 ` Eric Dumazet
2013-11-25 3:43 ` Yang Yingliang
0 siblings, 1 reply; 19+ messages in thread
From: Eric Dumazet @ 2013-11-24 18:40 UTC (permalink / raw)
To: Yang Yingliang; +Cc: Yang Yingliang, davem, netdev, brouer, jpirko, jbrouer
On Sun, 2013-11-24 at 15:28 +0800, Yang Yingliang wrote:
> O>
> >> With the follow command:
> >> tc qdisc add dev eth1 root handle 1: tbf latency 50ms burst 10KB rate 30gbit mtu 64k
> >>
> >
> Ideally burst should be 10KB in kernel space.
> But at hight rates, when burst is converted to
> time in tick in userland, it gets much more loss
> than low rates. So the burst can't actually
> reach 10KB in kernel.
If you think tc can help to fix user choices, please provide an
iproute2 patch.
Quite frankly, using a burst of 10KB and a rate of 30gbit is simply a
user error. It cannot possibly work. At all.
As stated in many tbf docs, burst must be larger than device mtu (1514)
By extension, with GRO/GSO, burst should be larger than 68130, otherwise
we need to segment the packets, and this is horribly expensive for high
rates.
I personally tc/tbf needs some changes, because the logical way would be
to use the 1514 value for low rates, but if we use this value, the
kernel gets a value of 1511, which doesn't work.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH net v3 1/2] net: sched: tbf: fix calculation of max_size
2013-11-24 18:40 ` Eric Dumazet
@ 2013-11-25 3:43 ` Yang Yingliang
2013-11-25 12:04 ` [PATCH] " Yang Yingliang
0 siblings, 1 reply; 19+ messages in thread
From: Yang Yingliang @ 2013-11-25 3:43 UTC (permalink / raw)
To: Eric Dumazet; +Cc: netdev, davem, brouer, jpirko, jbrouer
On 2013/11/25 2:40, Eric Dumazet wrote:
> On Sun, 2013-11-24 at 15:28 +0800, Yang Yingliang wrote:
>> O>
>>>> With the follow command:
>>>> tc qdisc add dev eth1 root handle 1: tbf latency 50ms burst 10KB rate 30gbit mtu 64k
>>>>
>>>
>> Ideally burst should be 10KB in kernel space.
>> But at hight rates, when burst is converted to
>> time in tick in userland, it gets much more loss
>> than low rates. So the burst can't actually
>> reach 10KB in kernel.
>
> If you think tc can help to fix user choices, please provide an
> iproute2 patch.
Unfortunately, it can't. It always has some loss when burst in bytes
is converted to buffer in ticks, we cannot avoid it, except we send burst
to kernel directly, this need to modify the code both in iproute2 and kernel.
>
> Quite frankly, using a burst of 10KB and a rate of 30gbit is simply a
> user error. It cannot possibly work. At all.
Yep, I agree it's a user error. But the error may cause network down, and
we have a way to avoid it, I think we should fix it. :)
>
> As stated in many tbf docs, burst must be larger than device mtu (1514)
>
> By extension, with GRO/GSO, burst should be larger than 68130, otherwise
> we need to segment the packets, and this is horribly expensive for high
> rates.
>
> I personally tc/tbf needs some changes, because the logical way would be
> to use the 1514 value for low rates, but if we use this value, the
> kernel gets a value of 1511, which doesn't work.
Yep, I agree with your logical way. But I think current logic in kernel
cannot support to only do some changes in tc/tbf. We need changes both
in tc and kernel. I'll post a patch which uses TCA_TBF_BURST to get burst
from tc directly and this patch also need to keep backward compatible with
old tc.
Regards,
Yang
^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH] net: sched: tbf: fix calculation of max_size
2013-11-25 3:43 ` Yang Yingliang
@ 2013-11-25 12:04 ` Yang Yingliang
2013-11-25 12:22 ` David Laight
2013-12-02 16:45 ` David Miller
0 siblings, 2 replies; 19+ messages in thread
From: Yang Yingliang @ 2013-11-25 12:04 UTC (permalink / raw)
To: Eric Dumazet; +Cc: netdev, davem, brouer, jpirko, jbrouer
From: Yang Yingliang <yangyingliang@huawei.com>
Current max_size is caluated from rate table. Now, the rate table
has been replaced and it's wrong to caculate max_size based on this
rate table. It can lead wrong calculation of max_size.
The burst in kernel may be lower than user asked, because burst may gets
some loss when transform it to buffer(E.g. "burst 40kb rate 30mbit/s")
and it seems we cannot avoid this loss. And burst's value(max_size) based
on rate table may be equal user asked. If a packet's length is max_size,
this packet will be stalled in tbf_dequeue() because its length is above
the burst in kernel so that it cannot get enough tokens. The max_size guards
against enqueuing packet sizes above q->buffer "time" in tbf_enqueue().
This patch fixes the calculation of max_size by using psched_l2t_ns() and
q->buffer to recalculate burst(max_size).
Also, add support to get burst from userland directly. We can avoid loss
in byte-to-time transform by using burst directly. Iproute2 will need a
patch to send burst to kernel.
Suggested-by: Jesper Dangaard Brouer <jbrouer@redhat.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
---
include/uapi/linux/pkt_sched.h | 2 +
net/sched/sch_tbf.c | 110 ++++++++++++++++++++++++++---------------
2 files changed, 72 insertions(+), 40 deletions(-)
diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
index 307f293..b5a0976 100644
--- a/include/uapi/linux/pkt_sched.h
+++ b/include/uapi/linux/pkt_sched.h
@@ -173,6 +173,8 @@ enum {
TCA_TBF_PTAB,
TCA_TBF_RATE64,
TCA_TBF_PRATE64,
+ TCA_TBF_BURST,
+ TCA_TBF_PBURST,
__TCA_TBF_MAX,
};
diff --git a/net/sched/sch_tbf.c b/net/sched/sch_tbf.c
index a609005..3d01bf0 100644
--- a/net/sched/sch_tbf.c
+++ b/net/sched/sch_tbf.c
@@ -281,8 +281,12 @@ static const struct nla_policy tbf_policy[TCA_TBF_MAX + 1] = {
[TCA_TBF_PTAB] = { .type = NLA_BINARY, .len = TC_RTAB_SIZE },
[TCA_TBF_RATE64] = { .type = NLA_U64 },
[TCA_TBF_PRATE64] = { .type = NLA_U64 },
+ [TCA_TBF_BURST] = { .type = NLA_U32 },
+ [TCA_TBF_PBURST] = { .type = NLA_U32 },
};
+#define MAX_PKT_LEN 65535
+
static int tbf_change(struct Qdisc *sch, struct nlattr *opt)
{
int err;
@@ -292,7 +296,7 @@ static int tbf_change(struct Qdisc *sch, struct nlattr *opt)
struct qdisc_rate_table *rtab = NULL;
struct qdisc_rate_table *ptab = NULL;
struct Qdisc *child = NULL;
- int max_size, n;
+ int max_size;
u64 rate64 = 0, prate64 = 0;
err = nla_parse_nested(tb, TCA_TBF_MAX, opt, tbf_policy);
@@ -304,38 +308,20 @@ static int tbf_change(struct Qdisc *sch, struct nlattr *opt)
goto done;
qopt = nla_data(tb[TCA_TBF_PARMS]);
- rtab = qdisc_get_rtab(&qopt->rate, tb[TCA_TBF_RTAB]);
- if (rtab == NULL)
- goto done;
-
- if (qopt->peakrate.rate) {
- if (qopt->peakrate.rate > qopt->rate.rate)
- ptab = qdisc_get_rtab(&qopt->peakrate, tb[TCA_TBF_PTAB]);
- if (ptab == NULL)
- goto done;
+ if (qopt->rate.linklayer == TC_LINKLAYER_UNAWARE) {
+ rtab = qdisc_get_rtab(&qopt->rate, tb[TCA_TBF_RTAB]);
+ if (rtab) {
+ qdisc_put_rtab(rtab);
+ rtab = NULL;
+ }
}
-
- for (n = 0; n < 256; n++)
- if (rtab->data[n] > qopt->buffer)
- break;
- max_size = (n << qopt->rate.cell_log) - 1;
- if (ptab) {
- int size;
-
- for (n = 0; n < 256; n++)
- if (ptab->data[n] > qopt->mtu)
- break;
- size = (n << qopt->peakrate.cell_log) - 1;
- if (size < max_size)
- max_size = size;
+ if (qopt->peakrate.linklayer == TC_LINKLAYER_UNAWARE) {
+ ptab = qdisc_get_rtab(&qopt->peakrate, tb[TCA_TBF_PTAB]);
+ if (ptab) {
+ qdisc_put_rtab(ptab);
+ ptab = NULL;
+ }
}
- if (max_size < 0)
- goto done;
-
- if (max_size < psched_mtu(qdisc_dev(sch)))
- pr_warn_ratelimited("sch_tbf: burst %u is lower than device %s mtu (%u) !\n",
- max_size, qdisc_dev(sch)->name,
- psched_mtu(qdisc_dev(sch)));
if (q->qdisc != &noop_qdisc) {
err = fifo_set_limit(q->qdisc, qopt->limit);
@@ -357,30 +343,74 @@ static int tbf_change(struct Qdisc *sch, struct nlattr *opt)
}
q->limit = qopt->limit;
q->mtu = PSCHED_TICKS2NS(qopt->mtu);
- q->max_size = max_size;
q->buffer = PSCHED_TICKS2NS(qopt->buffer);
q->tokens = q->buffer;
q->ptokens = q->mtu;
if (tb[TCA_TBF_RATE64])
rate64 = nla_get_u64(tb[TCA_TBF_RATE64]);
- psched_ratecfg_precompute(&q->rate, &rtab->rate, rate64);
- if (ptab) {
+ psched_ratecfg_precompute(&q->rate, &qopt->rate, rate64);
+ if (!q->rate.rate_bytes_ps)
+ goto unlock_done;
+
+ if (tb[TCA_TBF_BURST]) {
+ u32 burst = nla_get_u32(tb[TCA_TBF_BURST]);
+ q->buffer = psched_l2t_ns(&q->rate, burst);
+ max_size = min_t(u32, burst, MAX_PKT_LEN);
+ } else {
+ for (max_size = 0; max_size < MAX_PKT_LEN; max_size++)
+ if (psched_l2t_ns(&q->rate, max_size) > q->buffer)
+ break;
+ max_size--;
+ if (max_size < 0)
+ goto unlock_done;
+ }
+
+ if (qopt->peakrate.rate) {
+ int size = 0;
if (tb[TCA_TBF_PRATE64])
prate64 = nla_get_u64(tb[TCA_TBF_PRATE64]);
- psched_ratecfg_precompute(&q->peak, &ptab->rate, prate64);
+ psched_ratecfg_precompute(&q->peak, &qopt->peakrate, prate64);
+ if (q->peak.rate_bytes_ps <= q->rate.rate_bytes_ps) {
+ pr_err("Rate must be lower than peakrate!\n");
+ goto unlock_done;
+ }
+
+ if (tb[TCA_TBF_PBURST]) {
+ u32 pburst = nla_get_u32(tb[TCA_TBF_PBURST]);
+ q->mtu = psched_l2t_ns(&q->peak, pburst);
+ size = min_t(u32, pburst, MAX_PKT_LEN);
+ } else {
+ for (size = 0; size < MAX_PKT_LEN; size++)
+ if (psched_l2t_ns(&q->peak, size) > q->mtu)
+ break;
+ size--;
+ if (size < 0)
+ goto unlock_done;
+ }
+ max_size = min_t(u32, max_size, size);
q->peak_present = true;
} else {
q->peak_present = false;
}
+ if (!max_size)
+ goto unlock_done;
+
+ if (max_size < psched_mtu(qdisc_dev(sch)))
+ pr_warn_ratelimited("sch_tbf: burst %u is lower than device %s mtu (%u) !\n",
+ max_size, qdisc_dev(sch)->name,
+ psched_mtu(qdisc_dev(sch)));
+
+ q->max_size = max_size;
+
sch_tree_unlock(sch);
- err = 0;
+ return 0;
+
+unlock_done:
+ sch_tree_unlock(sch);
+ err = -EINVAL;
done:
- if (rtab)
- qdisc_put_rtab(rtab);
- if (ptab)
- qdisc_put_rtab(ptab);
return err;
}
-- 1.8.0
^ permalink raw reply related [flat|nested] 19+ messages in thread
* RE: [PATCH] net: sched: tbf: fix calculation of max_size
2013-11-25 12:04 ` [PATCH] " Yang Yingliang
@ 2013-11-25 12:22 ` David Laight
2013-11-26 1:28 ` Yang Yingliang
2013-12-02 1:11 ` David Miller
2013-12-02 16:45 ` David Miller
1 sibling, 2 replies; 19+ messages in thread
From: David Laight @ 2013-11-25 12:22 UTC (permalink / raw)
To: Yang Yingliang, Eric Dumazet; +Cc: netdev, davem, brouer, jpirko, jbrouer
> From: Yang Yingliang <yangyingliang@huawei.com>
>
> Current max_size is caluated from rate table. Now, the rate table
> has been replaced and it's wrong to caculate max_size based on this
> rate table. It can lead wrong calculation of max_size.
>
> The burst in kernel may be lower than user asked, because burst may gets
> some loss when transform it to buffer(E.g. "burst 40kb rate 30mbit/s")
> and it seems we cannot avoid this loss. And burst's value(max_size) based
> on rate table may be equal user asked. If a packet's length is max_size,
> this packet will be stalled in tbf_dequeue() because its length is above
> the burst in kernel so that it cannot get enough tokens. The max_size guards
> against enqueuing packet sizes above q->buffer "time" in tbf_enqueue().
Why not adjust the calculations so that the number of allocated tokens
can go negative?
So allow the transfer if the number of tokens is +ve and then subtract
the number needed for the message itself.
I think this would change the semantics of the configured 'burst' value
very slightly (to 'at least' from 'at most') but the average would still
be correct.
FWIW I've done similar rate limiters that run directly in units of 'time'.
The fact that system time advances automatically generates credit.
David
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] net: sched: tbf: fix calculation of max_size
2013-11-25 12:22 ` David Laight
@ 2013-11-26 1:28 ` Yang Yingliang
2013-11-26 2:35 ` Yang Yingliang
2013-12-02 1:11 ` David Miller
1 sibling, 1 reply; 19+ messages in thread
From: Yang Yingliang @ 2013-11-26 1:28 UTC (permalink / raw)
To: David Laight; +Cc: Eric Dumazet, netdev, davem, brouer, jpirko, jbrouer
On 2013/11/25 20:22, David Laight wrote:
>> From: Yang Yingliang <yangyingliang@huawei.com>
>>
>> Current max_size is caluated from rate table. Now, the rate table
>> has been replaced and it's wrong to caculate max_size based on this
>> rate table. It can lead wrong calculation of max_size.
>>
>> The burst in kernel may be lower than user asked, because burst may gets
>> some loss when transform it to buffer(E.g. "burst 40kb rate 30mbit/s")
>> and it seems we cannot avoid this loss. And burst's value(max_size) based
>> on rate table may be equal user asked. If a packet's length is max_size,
>> this packet will be stalled in tbf_dequeue() because its length is above
>> the burst in kernel so that it cannot get enough tokens. The max_size guards
>> against enqueuing packet sizes above q->buffer "time" in tbf_enqueue().
>
> Why not adjust the calculations so that the number of allocated tokens
> can go negative?
>
> So allow the transfer if the number of tokens is +ve and then subtract
> the number needed for the message itself.
Thanks for your advice!
I had considered it before. But, I think that we calculate tokens from ns
but max_size is calculated based on rate table causes the problem. I think we
should make them consistent.
>
> I think this would change the semantics of the configured 'burst' value
> very slightly (to 'at least' from 'at most') but the average would still
> be correct.
Hmm, I'm not sure it is 'at least'. Maybe it could be lower.
Regards,
Yang
>
> FWIW I've done similar rate limiters that run directly in units of 'time'.
> The fact that system time advances automatically generates credit.
>
> David
>
>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] net: sched: tbf: fix calculation of max_size
2013-11-26 1:28 ` Yang Yingliang
@ 2013-11-26 2:35 ` Yang Yingliang
0 siblings, 0 replies; 19+ messages in thread
From: Yang Yingliang @ 2013-11-26 2:35 UTC (permalink / raw)
To: David Laight; +Cc: Eric Dumazet, netdev, davem, brouer, jpirko, jbrouer
On 2013/11/26 9:28, Yang Yingliang wrote:
> On 2013/11/25 20:22, David Laight wrote:
>>> From: Yang Yingliang <yangyingliang@huawei.com>
>>>
>>> Current max_size is caluated from rate table. Now, the rate table
>>> has been replaced and it's wrong to caculate max_size based on this
>>> rate table. It can lead wrong calculation of max_size.
>>>
>>> The burst in kernel may be lower than user asked, because burst may gets
>>> some loss when transform it to buffer(E.g. "burst 40kb rate 30mbit/s")
>>> and it seems we cannot avoid this loss. And burst's value(max_size) based
>>> on rate table may be equal user asked. If a packet's length is max_size,
>>> this packet will be stalled in tbf_dequeue() because its length is above
>>> the burst in kernel so that it cannot get enough tokens. The max_size guards
>>> against enqueuing packet sizes above q->buffer "time" in tbf_enqueue().
>>
>> Why not adjust the calculations so that the number of allocated tokens
>> can go negative?
>>
>> So allow the transfer if the number of tokens is +ve and then subtract
>> the number needed for the message itself.
>
> Thanks for your advice!
> I had considered it before. But, I think that we calculate tokens from ns
> but max_size is calculated based on rate table causes the problem. I think we
> should make them consistent.
>
Forgot to say, fixing calculation of max_size is also needed by 64bit rates.
Regards,
Yang
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] net: sched: tbf: fix calculation of max_size
2013-11-25 12:22 ` David Laight
2013-11-26 1:28 ` Yang Yingliang
@ 2013-12-02 1:11 ` David Miller
2013-12-02 10:29 ` David Laight
1 sibling, 1 reply; 19+ messages in thread
From: David Miller @ 2013-12-02 1:11 UTC (permalink / raw)
To: David.Laight; +Cc: yangyingliang, eric.dumazet, netdev, brouer, jpirko, jbrouer
From: "David Laight" <David.Laight@ACULAB.COM>
Date: Mon, 25 Nov 2013 12:22:30 -0000
>> From: Yang Yingliang <yangyingliang@huawei.com>
>>
>> Current max_size is caluated from rate table. Now, the rate table
>> has been replaced and it's wrong to caculate max_size based on this
>> rate table. It can lead wrong calculation of max_size.
>>
>> The burst in kernel may be lower than user asked, because burst may gets
>> some loss when transform it to buffer(E.g. "burst 40kb rate 30mbit/s")
>> and it seems we cannot avoid this loss. And burst's value(max_size) based
>> on rate table may be equal user asked. If a packet's length is max_size,
>> this packet will be stalled in tbf_dequeue() because its length is above
>> the burst in kernel so that it cannot get enough tokens. The max_size guards
>> against enqueuing packet sizes above q->buffer "time" in tbf_enqueue().
>
> Why not adjust the calculations so that the number of allocated tokens
> can go negative?
>
> So allow the transfer if the number of tokens is +ve and then subtract
> the number needed for the message itself.
>
> I think this would change the semantics of the configured 'burst' value
> very slightly (to 'at least' from 'at most') but the average would still
> be correct.
>
> FWIW I've done similar rate limiters that run directly in units of 'time'.
> The fact that system time advances automatically generates credit.
Yang has responded to your concerns, are they addressed?
^ permalink raw reply [flat|nested] 19+ messages in thread
* RE: [PATCH] net: sched: tbf: fix calculation of max_size
2013-12-02 1:11 ` David Miller
@ 2013-12-02 10:29 ` David Laight
0 siblings, 0 replies; 19+ messages in thread
From: David Laight @ 2013-12-02 10:29 UTC (permalink / raw)
To: David Miller; +Cc: yangyingliang, eric.dumazet, netdev, brouer, jpirko, jbrouer
> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org] On Behalf Of David Miller
> Sent: 02 December 2013 01:11
> To: David Laight
> Cc: yangyingliang@huawei.com; eric.dumazet@gmail.com; netdev@vger.kernel.org; brouer@redhat.com;
> jpirko@redhat.com; jbrouer@redhat.com
> Subject: Re: [PATCH] net: sched: tbf: fix calculation of max_size
>
> From: "David Laight" <David.Laight@ACULAB.COM>
> Date: Mon, 25 Nov 2013 12:22:30 -0000
>
> >> From: Yang Yingliang <yangyingliang@huawei.com>
> >>
> >> Current max_size is caluated from rate table. Now, the rate table
> >> has been replaced and it's wrong to caculate max_size based on this
> >> rate table. It can lead wrong calculation of max_size.
> >>
> >> The burst in kernel may be lower than user asked, because burst may gets
> >> some loss when transform it to buffer(E.g. "burst 40kb rate 30mbit/s")
> >> and it seems we cannot avoid this loss. And burst's value(max_size) based
> >> on rate table may be equal user asked. If a packet's length is max_size,
> >> this packet will be stalled in tbf_dequeue() because its length is above
> >> the burst in kernel so that it cannot get enough tokens. The max_size guards
> >> against enqueuing packet sizes above q->buffer "time" in tbf_enqueue().
> >
> > Why not adjust the calculations so that the number of allocated tokens
> > can go negative?
> >
> > So allow the transfer if the number of tokens is +ve and then subtract
> > the number needed for the message itself.
> >
> > I think this would change the semantics of the configured 'burst' value
> > very slightly (to 'at least' from 'at most') but the average would still
> > be correct.
> >
> > FWIW I've done similar rate limiters that run directly in units of 'time'.
> > The fact that system time advances automatically generates credit.
>
> Yang has responded to your concerns, are they addressed?
I was mostly making some suggestions that might have made fixing it easier.
So I'm not really worried.
David
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] net: sched: tbf: fix calculation of max_size
2013-11-25 12:04 ` [PATCH] " Yang Yingliang
2013-11-25 12:22 ` David Laight
@ 2013-12-02 16:45 ` David Miller
2013-12-03 0:59 ` Yang Yingliang
1 sibling, 1 reply; 19+ messages in thread
From: David Miller @ 2013-12-02 16:45 UTC (permalink / raw)
To: yangyingliang; +Cc: eric.dumazet, netdev, brouer, jpirko, jbrouer
From: Yang Yingliang <yangyingliang@huawei.com>
Date: Mon, 25 Nov 2013 20:04:23 +0800
> From: Yang Yingliang <yangyingliang@huawei.com>
>
> Current max_size is caluated from rate table. Now, the rate table
> has been replaced and it's wrong to caculate max_size based on this
> rate table. It can lead wrong calculation of max_size.
>
> The burst in kernel may be lower than user asked, because burst may gets
> some loss when transform it to buffer(E.g. "burst 40kb rate 30mbit/s")
> and it seems we cannot avoid this loss. And burst's value(max_size) based
> on rate table may be equal user asked. If a packet's length is max_size,
> this packet will be stalled in tbf_dequeue() because its length is above
> the burst in kernel so that it cannot get enough tokens. The max_size guards
> against enqueuing packet sizes above q->buffer "time" in tbf_enqueue().
>
> This patch fixes the calculation of max_size by using psched_l2t_ns() and
> q->buffer to recalculate burst(max_size).
>
> Also, add support to get burst from userland directly. We can avoid loss
> in byte-to-time transform by using burst directly. Iproute2 will need a
> patch to send burst to kernel.
>
> Suggested-by: Jesper Dangaard Brouer <jbrouer@redhat.com>
> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
I don't see why you need to add the userland explicit burst capability
to fix the calculation of max_size.
These two things are separate, the new netlink attributes are a new
feature.
Therefore, please submit these two things separately. First, submit
the pure max_size bug fix for 'net'. Then when I open 'net-next' back
up you can submit the support for the new netlink attributes.
Thanks.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] net: sched: tbf: fix calculation of max_size
2013-12-02 16:45 ` David Miller
@ 2013-12-03 0:59 ` Yang Yingliang
0 siblings, 0 replies; 19+ messages in thread
From: Yang Yingliang @ 2013-12-03 0:59 UTC (permalink / raw)
To: David Miller; +Cc: eric.dumazet, netdev, brouer, jpirko, jbrouer
On 2013/12/3 0:45, David Miller wrote:
> From: Yang Yingliang <yangyingliang@huawei.com>
> Date: Mon, 25 Nov 2013 20:04:23 +0800
>
>> From: Yang Yingliang <yangyingliang@huawei.com>
>>
>> Current max_size is caluated from rate table. Now, the rate table
>> has been replaced and it's wrong to caculate max_size based on this
>> rate table. It can lead wrong calculation of max_size.
>>
>> The burst in kernel may be lower than user asked, because burst may gets
>> some loss when transform it to buffer(E.g. "burst 40kb rate 30mbit/s")
>> and it seems we cannot avoid this loss. And burst's value(max_size) based
>> on rate table may be equal user asked. If a packet's length is max_size,
>> this packet will be stalled in tbf_dequeue() because its length is above
>> the burst in kernel so that it cannot get enough tokens. The max_size guards
>> against enqueuing packet sizes above q->buffer "time" in tbf_enqueue().
>>
>> This patch fixes the calculation of max_size by using psched_l2t_ns() and
>> q->buffer to recalculate burst(max_size).
>>
>> Also, add support to get burst from userland directly. We can avoid loss
>> in byte-to-time transform by using burst directly. Iproute2 will need a
>> patch to send burst to kernel.
>>
>> Suggested-by: Jesper Dangaard Brouer <jbrouer@redhat.com>
>> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
>
> I don't see why you need to add the userland explicit burst capability
> to fix the calculation of max_size.
>
> These two things are separate, the new netlink attributes are a new
> feature.
>
> Therefore, please submit these two things separately. First, submit
> the pure max_size bug fix for 'net'. Then when I open 'net-next' back
> up you can submit the support for the new netlink attributes.
>
> Thanks.
>
OK, I'll send a v4.
Regards,
Yang
>
> .
>
^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2013-12-03 0:59 UTC | newest]
Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-11-19 7:25 [PATCH net v3 0/2] net: sched: fix some issues Yang Yingliang
2013-11-19 7:25 ` [PATCH net v3 1/2] net: sched: tbf: fix calculation of max_size Yang Yingliang
2013-11-19 9:38 ` Jesper Dangaard Brouer
2013-11-20 2:14 ` Yang Yingliang
2013-11-20 10:04 ` Jesper Dangaard Brouer
2013-11-20 12:50 ` Yang Yingliang
2013-11-23 19:06 ` Eric Dumazet
2013-11-24 7:28 ` Yang Yingliang
2013-11-24 18:40 ` Eric Dumazet
2013-11-25 3:43 ` Yang Yingliang
2013-11-25 12:04 ` [PATCH] " Yang Yingliang
2013-11-25 12:22 ` David Laight
2013-11-26 1:28 ` Yang Yingliang
2013-11-26 2:35 ` Yang Yingliang
2013-12-02 1:11 ` David Miller
2013-12-02 10:29 ` David Laight
2013-12-02 16:45 ` David Miller
2013-12-03 0:59 ` Yang Yingliang
2013-11-19 7:25 ` [PATCH net v3 2/2] net: sched: htb: fix calculation of quantum Yang Yingliang
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).