* [PATCH net v6 0/2] net: sched: fix two issues
@ 2013-12-06 7:00 Yang Yingliang
2013-12-06 7:00 ` [PATCH net v6 1/2] net: sched: tbf: fix the calculation of max_size Yang Yingliang
2013-12-06 7:00 ` [PATCH net v6 2/2] net: sched: htb: fix the calculation of quantum Yang Yingliang
0 siblings, 2 replies; 14+ messages in thread
From: Yang Yingliang @ 2013-12-06 7:00 UTC (permalink / raw)
To: davem, netdev; +Cc: eric.dumazet, brouer, jpirko, jbrouer
Fix the calculation of max_size in tbf.
Fix the quantum calculation that was broken by the introduction of 64bit rates.
v6:
patch 1/2:
 1. Use a simpler and better way to calculate the length in psched_ns_t2l().
 2. Replace pr_err() with pr_warn_ratelimited().
patch 2/2:
 Replace div64_u64() with do_div(), as suggested by Eric.

v5:
patch 1/2:
 1. Remove the rtab and ptab usage, as suggested by Eric.
 2. Don't reduce max_size to 65536: as Eric pointed out, if burst is set to
    200KB we do not want tbf to use 64KB or even less, so add a helper
    psched_ns_t2l() to calculate max_size directly.

v4:
patch 1/2:
 1. Update the commit message, as suggested by Jesper.
 2. Use a macro to replace the 65535 constant.
 3. Return -EINVAL when peakrate is lower than rate.
 4. Don't use cell_log anymore.

v3:
patch 1/2: Use psched_l2t_ns() to calculate max_size and clean up the
 exit/done section, as suggested by Jesper.

v2:
patch 1/2: Re-describe the regression.
patch 2/2: Add Eric's ack.

Yang Yingliang (2):
net: sched: tbf: fix the calculation of max_size
net: sched: htb: fix the calculation of quantum
net/sched/sch_htb.c | 20 ++++++----
net/sched/sch_tbf.c | 104 ++++++++++++++++++++++++++++++----------------------
2 files changed, 73 insertions(+), 51 deletions(-)
--
1.8.0
^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH net v6 1/2] net: sched: tbf: fix the calculation of max_size
  2013-12-06 7:00 [PATCH net v6 0/2] net: sched: fix two issues Yang Yingliang
@ 2013-12-06 7:00 ` Yang Yingliang
  2013-12-06 10:56   ` David Laight
  2013-12-06 7:00 ` [PATCH net v6 2/2] net: sched: htb: fix the calculation of quantum Yang Yingliang
  1 sibling, 1 reply; 14+ messages in thread
From: Yang Yingliang @ 2013-12-06 7:00 UTC (permalink / raw)
To: davem, netdev; +Cc: eric.dumazet, brouer, jpirko, jbrouer

Currently, max_size is calculated from the rate table. Now that the rate
table has been replaced, it is wrong to calculate max_size based on it,
and doing so can lead to a wrong max_size.

The burst in the kernel may be lower than the user asked for, because some
precision may be lost when the burst is transformed into a buffer time
(e.g. "burst 40kb rate 30mbit/s"), and it seems we cannot avoid this loss.
A burst value (max_size) based on the rate table, however, may be equal to
what the user asked for. If a packet's length is exactly max_size, that
packet is stalled in tbf_dequeue(), because its length is above the burst
known to the kernel and it can never get enough tokens. max_size guards
against enqueuing packet sizes above the q->buffer "time" in tbf_enqueue().

To be consistent with the calculation of tokens, this patch adds a helper
psched_ns_t2l() that calculates the burst (max_size) directly, fixing the
problem.

After this fix, 64-bit rates can be used to calculate the burst as well.

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
---
 net/sched/sch_tbf.c | 104 ++++++++++++++++++++++++++++++----------------------
 1 file changed, 61 insertions(+), 43 deletions(-)

diff --git a/net/sched/sch_tbf.c b/net/sched/sch_tbf.c
index a609005..dd731f5 100644
--- a/net/sched/sch_tbf.c
+++ b/net/sched/sch_tbf.c
@@ -118,6 +118,30 @@ struct tbf_sched_data {
 };
 
+/* Time to Length, convert time in ns to length in bytes
+ * to determinate how many bytes can be sent in given time.
+ */
+static u64 psched_ns_t2l(const struct psched_ratecfg *r,
+			 u64 time_in_ns)
+{
+	/* The formula is :
+	 * len = (time_in_ns * r->rate_bytes_ps) / NSEC_PER_SEC
+	 */
+	u64 len = time_in_ns * r->rate_bytes_ps;
+
+	do_div(len, NSEC_PER_SEC);
+
+	if (unlikely(r->linklayer == TC_LINKLAYER_ATM))
+		len = (len / 53) * 48;
+
+	if (len > r->overhead)
+		len -= r->overhead;
+	else
+		len = 0;
+
+	return len;
+}
+
 /*
  * Return length of individual segments of a gso packet,
  * including all headers (MAC, IP, TCP/UDP)
@@ -289,10 +313,8 @@ static int tbf_change(struct Qdisc *sch, struct nlattr *opt)
 	struct tbf_sched_data *q = qdisc_priv(sch);
 	struct nlattr *tb[TCA_TBF_MAX + 1];
 	struct tc_tbf_qopt *qopt;
-	struct qdisc_rate_table *rtab = NULL;
-	struct qdisc_rate_table *ptab = NULL;
 	struct Qdisc *child = NULL;
-	int max_size, n;
+	u64 max_size;
 	u64 rate64 = 0, prate64 = 0;
 
 	err = nla_parse_nested(tb, TCA_TBF_MAX, opt, tbf_policy);
@@ -304,38 +326,13 @@ static int tbf_change(struct Qdisc *sch, struct nlattr *opt)
 		goto done;
 
 	qopt = nla_data(tb[TCA_TBF_PARMS]);
-	rtab = qdisc_get_rtab(&qopt->rate, tb[TCA_TBF_RTAB]);
-	if (rtab == NULL)
-		goto done;
-
-	if (qopt->peakrate.rate) {
-		if (qopt->peakrate.rate > qopt->rate.rate)
-			ptab = qdisc_get_rtab(&qopt->peakrate, tb[TCA_TBF_PTAB]);
-		if (ptab == NULL)
-			goto done;
-	}
+	if (qopt->rate.linklayer == TC_LINKLAYER_UNAWARE)
+		qdisc_put_rtab(qdisc_get_rtab(&qopt->rate,
+					      tb[TCA_TBF_RTAB]));
 
-	for (n = 0; n < 256; n++)
-		if (rtab->data[n] > qopt->buffer)
-			break;
-	max_size = (n << qopt->rate.cell_log) - 1;
-	if (ptab) {
-		int size;
-
-		for (n = 0; n < 256; n++)
-			if (ptab->data[n] > qopt->mtu)
-				break;
-		size = (n << qopt->peakrate.cell_log) - 1;
-		if (size < max_size)
-			max_size = size;
-	}
-	if (max_size < 0)
-		goto done;
-
-	if (max_size < psched_mtu(qdisc_dev(sch)))
-		pr_warn_ratelimited("sch_tbf: burst %u is lower than device %s mtu (%u) !\n",
-				    max_size, qdisc_dev(sch)->name,
-				    psched_mtu(qdisc_dev(sch)));
+	if (qopt->peakrate.linklayer == TC_LINKLAYER_UNAWARE)
+		qdisc_put_rtab(qdisc_get_rtab(&qopt->peakrate,
+					      tb[TCA_TBF_PTAB]));
 
 	if (q->qdisc != &noop_qdisc) {
 		err = fifo_set_limit(q->qdisc, qopt->limit);
@@ -357,30 +354,51 @@ static int tbf_change(struct Qdisc *sch, struct nlattr *opt)
 	}
 	q->limit = qopt->limit;
 	q->mtu = PSCHED_TICKS2NS(qopt->mtu);
-	q->max_size = max_size;
 	q->buffer = PSCHED_TICKS2NS(qopt->buffer);
 	q->tokens = q->buffer;
 	q->ptokens = q->mtu;
 
 	if (tb[TCA_TBF_RATE64])
 		rate64 = nla_get_u64(tb[TCA_TBF_RATE64]);
-	psched_ratecfg_precompute(&q->rate, &rtab->rate, rate64);
-	if (ptab) {
+	psched_ratecfg_precompute(&q->rate, &qopt->rate, rate64);
+	if (!q->rate.rate_bytes_ps)
+		goto unlock_done;
+
+	max_size = min_t(u64, psched_ns_t2l(&q->rate, q->buffer), ~0U);
+
+	if (qopt->peakrate.rate) {
 		if (tb[TCA_TBF_PRATE64])
 			prate64 = nla_get_u64(tb[TCA_TBF_PRATE64]);
-		psched_ratecfg_precompute(&q->peak, &ptab->rate, prate64);
+		psched_ratecfg_precompute(&q->peak, &qopt->peakrate, prate64);
+		if (q->peak.rate_bytes_ps <= q->rate.rate_bytes_ps) {
+			pr_warn_ratelimited("sch_tbf: peakrate %llu is lower than or equals to rate %llu !\n",
+					    q->peak.rate_bytes_ps, q->rate.rate_bytes_ps);
+			goto unlock_done;
+		}
+
+		max_size = min_t(u64, max_size, psched_ns_t2l(&q->peak, q->mtu));
 		q->peak_present = true;
 	} else {
 		q->peak_present = false;
 	}
 
+	if (max_size < psched_mtu(qdisc_dev(sch)))
+		pr_warn_ratelimited("sch_tbf: burst %llu is lower than device %s mtu (%u) !\n",
+				    max_size, qdisc_dev(sch)->name,
+				    psched_mtu(qdisc_dev(sch)));
+
+	if (!max_size)
+		goto unlock_done;
+
+	q->max_size = max_size;
+
 	sch_tree_unlock(sch);
-	err = 0;
+	return 0;
+
+unlock_done:
+	sch_tree_unlock(sch);
+	err = -EINVAL;
 done:
-	if (rtab)
-		qdisc_put_rtab(rtab);
-	if (ptab)
-		qdisc_put_rtab(ptab);
 	return err;
 }
-- 
1.8.0

^ permalink raw reply related	[flat|nested] 14+ messages in thread
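To see the precision loss the commit message describes, here is a minimal
user-space sketch of the same ns-to-bytes round trip (hypothetical helper,
plain Ethernet linklayer and zero overhead assumed; this is not the kernel
implementation):

#include <stdio.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* same core formula as psched_ns_t2l() above, without the ATM/overhead cases */
static uint64_t ns_to_len(uint64_t time_in_ns, uint64_t rate_bytes_ps)
{
	return time_in_ns * rate_bytes_ps / NSEC_PER_SEC;
}

int main(void)
{
	uint64_t rate = 30000000ULL / 8;	/* 30 Mbit/s -> 3750000 bytes/s */
	uint64_t burst = 40 * 1024;		/* "burst 40kb" as in the example */
	uint64_t buffer_ns = burst * NSEC_PER_SEC / rate;	/* ~10922666 ns */

	/* prints 40959: one byte short of the requested 40960, which is the
	 * rounding loss the commit message refers to */
	printf("max_size = %llu\n", (unsigned long long)ns_to_len(buffer_ns, rate));
	return 0;
}

Computing max_size with the same formula used for tokens keeps the two
consistent, whereas the old rate-table derivation could leave max_size
slightly larger than the tokens the buffer can ever hold.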
* RE: [PATCH net v6 1/2] net: sched: tbf: fix the calculation of max_size
  2013-12-06 7:00 ` [PATCH net v6 1/2] net: sched: tbf: fix the calculation of max_size Yang Yingliang
@ 2013-12-06 10:56   ` David Laight
  2013-12-09 2:26     ` Yang Yingliang
  2013-12-09 3:26     ` Yang Yingliang
  0 siblings, 2 replies; 14+ messages in thread
From: David Laight @ 2013-12-06 10:56 UTC (permalink / raw)
To: Yang Yingliang, davem, netdev; +Cc: eric.dumazet, brouer, jpirko, jbrouer

> From: Yang Yingliang
...
>
> +/* Time to Length, convert time in ns to length in bytes
> + * to determinate how many bytes can be sent in given time.
> + */
> +static u64 psched_ns_t2l(const struct psched_ratecfg *r,
> +			 u64 time_in_ns)
> +{
> +	/* The formula is :
> +	 * len = (time_in_ns * r->rate_bytes_ps) / NSEC_PER_SEC
> +	 */
> +	u64 len = time_in_ns * r->rate_bytes_ps;
> +
> +	do_div(len, NSEC_PER_SEC);

You are multiplying two values then dividing by 10**9
I'd guess that the intermediate value might exceed 2**64.

> +	if (unlikely(r->linklayer == TC_LINKLAYER_ATM))
> +		len = (len / 53) * 48;

You probably want to do the multiply first.
But why not scale rate_bytes_ps instead.

> +	if (len > r->overhead)
> +		len -= r->overhead;
> +	else
> +		len = 0;
> +
> +	return len;
> +}

Personally I'd work out how much time you have to send each byte.
So if you want a rate of 1MB/sec you have a 'time cost' per byte of 1000ns.
The cost of sending a packet is simply the length multiplied by this cost.
To work out whether a packet can be sent you have a credit variable that
tracks current time.
If 'credit > now' the packet can't be sent, queue and schedule a wakeup.
if 'credit + backlog < now' set credit = now - backlog.
if 'credit <= now' send the packet and add the packet's 'cost' to 'credit'.

In the non-ratelimited case this is almost no work.

You'd probably need to work in 1/1024ns time units and/or blocks of 16 bytes.

	David

^ permalink raw reply	[flat|nested] 14+ messages in thread
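For a sense of scale on the overflow concern: the product
time_in_ns * rate_bytes_ps only fits in 64 bits while rate_bytes_ps is at
most U64_MAX / time_in_ns, so a 4-second buffer (4e9 ns) already caps the
rate at roughly 4.6e9 bytes/s (about 37 Gbit/s). A minimal sketch of such a
guard, purely illustrative and not something the patch adds:

#include <stdint.h>
#include <stdbool.h>

/* true if time_in_ns * rate_bytes_ps would not fit in a u64 */
static bool t2l_would_overflow(uint64_t time_in_ns, uint64_t rate_bytes_ps)
{
	return time_in_ns != 0 && rate_bytes_ps > UINT64_MAX / time_in_ns;
}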
* Re: [PATCH net v6 1/2] net: sched: tbf: fix the calculation of max_size
  2013-12-06 10:56 ` David Laight
@ 2013-12-09 2:26   ` Yang Yingliang
  0 siblings, 0 replies; 14+ messages in thread
From: Yang Yingliang @ 2013-12-09 2:26 UTC (permalink / raw)
To: David Laight, davem, netdev; +Cc: eric.dumazet, brouer, jpirko, jbrouer

On 2013/12/6 18:56, David Laight wrote:
>> From: Yang Yingliang
> ...
>>
>> +/* Time to Length, convert time in ns to length in bytes
>> + * to determinate how many bytes can be sent in given time.
>> + */
>> +static u64 psched_ns_t2l(const struct psched_ratecfg *r,
>> +			 u64 time_in_ns)
>> +{
>> +	/* The formula is :
>> +	 * len = (time_in_ns * r->rate_bytes_ps) / NSEC_PER_SEC
>> +	 */
>> +	u64 len = time_in_ns * r->rate_bytes_ps;
>> +
>> +	do_div(len, NSEC_PER_SEC);
>
> You are multiplying two values then dividing by 10**9
> I'd guess that the intermediate value might exceed 2**64.

I thought the maximum value of len is the burst, which is a u32 sent from
userland, so the maximum value of (time_in_ns * r->rate_bytes_ps) should
be 64K*(10**9).

Hmm, maybe I should do something in this helper to avoid overflow.

^ permalink raw reply	[flat|nested] 14+ messages in thread
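The bound argued here can be sanity-checked quickly: since len cannot end
up above the configured burst, the intermediate product is roughly
burst * NSEC_PER_SEC, independent of the rate. A small illustration
(assumed burst sizes, sketch only):

#include <stdio.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

int main(void)
{
	/* 64K, 4M and 1G bursts -> products of ~2^46, ~2^52 and ~2^60,
	 * all below 2^64; the open question above is whether such a
	 * bound holds for every input the kernel can receive */
	uint64_t bursts[] = { 64 * 1024, 4 * 1024 * 1024, 1024 * 1024 * 1024 };
	int i;

	for (i = 0; i < 3; i++)
		printf("%llu\n", (unsigned long long)(bursts[i] * NSEC_PER_SEC));
	return 0;
}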
* Re: [PATCH net v6 1/2] net: sched: tbf: fix the calculation of max_size
  2013-12-06 10:56 ` David Laight
  2013-12-09 2:26 ` Yang Yingliang
@ 2013-12-09 3:26 ` Yang Yingliang
  2013-12-09 10:07   ` David Laight
  1 sibling, 1 reply; 14+ messages in thread
From: Yang Yingliang @ 2013-12-09 3:26 UTC (permalink / raw)
To: David Laight, davem, netdev; +Cc: eric.dumazet, brouer, jpirko, jbrouer

On 2013/12/6 18:56, David Laight wrote:
>> From: Yang Yingliang
> ...
>>
>> +/* Time to Length, convert time in ns to length in bytes
>> + * to determinate how many bytes can be sent in given time.
>> + */
>> +static u64 psched_ns_t2l(const struct psched_ratecfg *r,
>> +			 u64 time_in_ns)
>> +{
>> +	/* The formula is :
>> +	 * len = (time_in_ns * r->rate_bytes_ps) / NSEC_PER_SEC
>> +	 */
>> +	u64 len = time_in_ns * r->rate_bytes_ps;
>> +
>> +	do_div(len, NSEC_PER_SEC);
>
> You are multiplying two values then dividing by 10**9
> I'd guess that the intermediate value might exceed 2**64.
>
>> +	if (unlikely(r->linklayer == TC_LINKLAYER_ATM))
>> +		len = (len / 53) * 48;
>
> You probably want to do the multiply first.
> But why not scale rate_bytes_ps instead.
>

When the linklayer is ATM, the formula to calculate tokens is:

((u64)(DIV_ROUND_UP(len,48)*53) * r->mult) >> r->shift

So, I scale len here.

Regards,
Yang

^ permalink raw reply	[flat|nested] 14+ messages in thread
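The two directions being discussed can be put side by side. A small sketch
(illustrative only; DIV_ROUND_UP as in the kernel) of the per-packet
expansion charged by psched_l2t_ns() and the inverse applied to max_size:

#include <stdint.h>

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* payload bytes -> bytes on the wire: whole 53-byte cells,
 * 48 bytes of payload in each */
static uint64_t atm_wire_bytes(uint64_t len)
{
	return DIV_ROUND_UP(len, 48) * 53;
}

/* bytes on the wire -> payload bytes: the inverse used in psched_ns_t2l() */
static uint64_t atm_payload_bytes(uint64_t wire)
{
	return wire / 53 * 48;
}

For a 1500-byte packet, atm_wire_bytes() gives 32 cells = 1696 bytes, and
1696 wire bytes map back to 1536 payload bytes, so a max_size derived this
way still admits the packet.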
* RE: [PATCH net v6 1/2] net: sched: tbf: fix the calculation of max_size
  2013-12-09 3:26 ` Yang Yingliang
@ 2013-12-09 10:07   ` David Laight
  2013-12-09 12:21     ` Yang Yingliang
  0 siblings, 1 reply; 14+ messages in thread
From: David Laight @ 2013-12-09 10:07 UTC (permalink / raw)
To: Yang Yingliang, davem, netdev; +Cc: eric.dumazet, brouer, jpirko, jbrouer

> From: Yang Yingliang
> On 2013/12/6 18:56, David Laight wrote:
> >> From: Yang Yingliang
> > ...
> >
> > You are multiplying two values then dividing by 10**9
> > I'd guess that the intermediate value might exceed 2**64.
> >
> >> +	if (unlikely(r->linklayer == TC_LINKLAYER_ATM))
> >> +		len = (len / 53) * 48;
> >
> > You probably want to do the multiply first.
> > But why not scale rate_bytes_ps instead.
> >
>
> When the linklayer is ATM, the formula to calculate tokens is:
>
> ((u64)(DIV_ROUND_UP(len,48)*53) * r->mult) >> r->shift
>
> So, I scale len here.

Seems to me that there are a lot of unnecessary multiplies and
divides going on here.
Looks to me like the code was originally written to require one
multiply and one shift for each packet.

In any case the latter code is allowing for more of the ATM cell
overhead. I'm not at all sure the intent is to remove that when
setting up the constants.

OTOH should this code be worrying about the packet overheads at all?
Does it add in the ethernet pre-emable, CRC and inter packet gap?

I'd guess that the most the ATM code should do it round the length
up to a multiple of 48 (user payload in a cell).

	David

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: [PATCH net v6 1/2] net: sched: tbf: fix the calculation of max_size
  2013-12-09 10:07 ` David Laight
@ 2013-12-09 12:21   ` Yang Yingliang
  2013-12-09 13:10     ` [PATCH RFC ] " Yang Yingliang
  2013-12-09 13:11     ` [PATCH net v6 1/2] " David Laight
  0 siblings, 2 replies; 14+ messages in thread
From: Yang Yingliang @ 2013-12-09 12:21 UTC (permalink / raw)
To: David Laight, davem, netdev; +Cc: eric.dumazet, brouer, jpirko, jbrouer

On 2013/12/9 18:07, David Laight wrote:
>> From: Yang Yingliang
>> On 2013/12/6 18:56, David Laight wrote:
>>>> From: Yang Yingliang
>>> ...
>>>
>>> You are multiplying two values then dividing by 10**9
>>> I'd guess that the intermediate value might exceed 2**64.
>>>
>>>> +	if (unlikely(r->linklayer == TC_LINKLAYER_ATM))
>>>> +		len = (len / 53) * 48;
>>>
>>> You probably want to do the multiply first.
>>> But why not scale rate_bytes_ps instead.
>>>
>>
>> When the linklayer is ATM, the formula to calculate tokens is:
>>
>> ((u64)(DIV_ROUND_UP(len,48)*53) * r->mult) >> r->shift
>>
>> So, I scale len here.
>
> Seems to me that there are a lot of unnecessary multiplies and
> divides going on here.
> Looks to me like the code was originally written to require one
> multiply and one shift for each packet.

The way to calc tokens in psched_l2t_ns() means:
we got a payload which length is 'len'. To get the actual size we
need send, round len up to a multiple of 53. After getting the
size, we do multiply and shift to calculate tokens that we need.

>
> In any case the latter code is allowing for more of the ATM cell
> overhead. I'm not at all sure the intent is to remove that when
> setting up the constants.
>
> OTOH should this code be worrying about the packet overheads at all?
> Does it add in the ethernet pre-emable, CRC and inter packet gap?

This overhead is sent from userspace by tc.

>
> I'd guess that the most the ATM code should do it round the length
> up to a multiple of 48 (user payload in a cell).

53 is ATM cell size include header, sending a header needs tokens as
well. I'd guess round the length up to a multiple of 53 in psched_l2t_ns()
is necessary.

Regards,
Yang

^ permalink raw reply	[flat|nested] 14+ messages in thread
* [PATCH RFC ] net: sched: tbf: fix the calculation of max_size
  2013-12-09 12:21 ` Yang Yingliang
@ 2013-12-09 13:10   ` Yang Yingliang
  2013-12-09 15:12     ` Eric Dumazet
  1 sibling, 1 reply; 14+ messages in thread
From: Yang Yingliang @ 2013-12-09 13:10 UTC (permalink / raw)
To: davem, netdev; +Cc: David Laight, eric.dumazet, brouer, jpirko, jbrouer

From: Yang Yingliang <yangyingliang@huawei.com>

Currently, max_size is calculated from the rate table. Now that the rate
table has been replaced, it is wrong to calculate max_size based on it,
and doing so can lead to a wrong max_size.

The burst in the kernel may be lower than the user asked for, because some
precision may be lost when the burst is transformed into a buffer time
(e.g. "burst 40kb rate 30mbit/s"), and it seems we cannot avoid this loss.
A burst value (max_size) based on the rate table, however, may be equal to
what the user asked for. If a packet's length is exactly max_size, that
packet is stalled in tbf_dequeue(), because its length is above the burst
known to the kernel and it can never get enough tokens. max_size guards
against enqueuing packet sizes above the q->buffer "time" in tbf_enqueue().

To be consistent with the calculation of tokens, this patch adds a helper
psched_ns_t2l() that calculates the burst (max_size) directly, fixing the
problem.

After this fix, 64-bit rates can be used to calculate the burst as well.

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
---
 net/sched/sch_tbf.c | 203 +++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 155 insertions(+), 48 deletions(-)

diff --git a/net/sched/sch_tbf.c b/net/sched/sch_tbf.c
index a609005..b692d02 100644
--- a/net/sched/sch_tbf.c
+++ b/net/sched/sch_tbf.c
@@ -117,6 +117,115 @@ struct tbf_sched_data {
 	struct qdisc_watchdog watchdog;	/* Watchdog timer */
 };
 
+/* do (u64 + u64) with checking if overflows */
+static u64 do_plus64(u64 a, u64 b)
+{
+
+	short shift = 0;
+
+	if (a == ULLONG_MAX || b == ULLONG_MAX)
+		return ULLONG_MAX;
+
+	while (shift <= 63) {
+		u64 _a = a << shift;
+		u64 _b = b << shift;
+		if (_a & _b & (1ULL << 63)) {
+			/* overflow happens */
+			return ULLONG_MAX;
+		} else if (!((_a | _b) & (1ULL << 63))) {
+			/* won't overflow*/
+			return a + b;
+		}
+		shift++;
+	}
+
+	return a + b;
+}
+
+
+/* Time to Length, convert time in ns to length in bytes
+ * to determinate how many bytes can be sent in given time.
+ */
+static u64 psched_ns_t2l(const struct psched_ratecfg *r,
+			 u64 time_in_ns)
+{
+	/* The original formula is :
+	 * len = (time_in_ns * rate_bytes_ps) / NSEC_PER_SEC =>
+	 *
+	 * len = time_in_ns * (rate_bytes_ps / NSEC_PER_SEC) +
+	 *	 time_in_ns * ((rate_bytes_ps % NSEC_PER_SEC) / NSEC_PER_SEC)
+	 *
+	 * time_in_ns * ((rate_bytes_ps % NSEC_PER_SEC) / NSEC_PER_SEC) =>
+	 *	 (rate_bytes_ps % NSEC_PER_SEC) * (time_in_ns / NSEC_PER_SEC +
+	 *	 (time_in_ns % NSEC_PER_SEC) / NSEC_PER_SEC)
+	 *
+	 * The final formula is:
+	 * len = time_in_ns * (rate_bytes_ps / NSEC_PER_SEC) +
+	 *	 time_in_ns * ((rate_bytes_ps % NSEC_PER_SEC) / NSEC_PER_SEC) +
+	 *	 (time_in_ns % NSEC_PER_SEC) * (rate_bytes_ps % NSEC_PER_SEC) / NSEC_PER_SEC
+	 */
+	u64 r_integer = r->rate_bytes_ps / NSEC_PER_SEC;
+	u64 multi64 = 0;
+	u32 multi32 = 0;
+	u64 len = 0;
+
+	/* step1. calulate: time_in_ns * (rate_bytes_ps / NSEC_PER_SEC)
+	 *
+	 * u64 * u64, if both of them bigger than UINT_MAX, it will overflows,
+	 * so equals to u64 * u32 => (High1+Low1) * (0+Low2) =>
+	 * (H1*L2)<<32 + (L1*L2)
+	 */
+	if (time_in_ns >> 32 && r_integer >> 32) {
+		/* overflow happens */
+		len = ULLONG_MAX;
+	} else if (time_in_ns >> 32) {
+		multi64 = time_in_ns;
+		multi32 = r_integer;
+	} else if (r_integer >> 32) {
+		multi64 = r_integer;
+		multi32 = time_in_ns;
+	} else
+		len = r_integer * time_in_ns;
+
+	if (!len) {
+		/* len = H1 * L2 */
+		len = (multi64 >> 32) * multi32;
+		if (len > UINT_MAX) {
+			/* overflow happens */
+			len = ULLONG_MAX;
+		} else {
+			/* len = (H1 * L2) << 32 */
+			len <<= 32;
+
+			/* len = (H1 * L2) << 32 + (L1 * L2) */
+			len = do_plus64(len, (multi64 & ~0U) * multi32);
+		}
+	}
+
+	/* step2. calulate: len +
+	 * time_in_ns * ((rate_bytes_ps % NSEC_PER_SEC) / NSEC_PER_SEC)
+	 */
+	len = do_plus64(len,
+			time_in_ns *
+			((r->rate_bytes_ps % NSEC_PER_SEC) / NSEC_PER_SEC));
+
+	/* step3. calulate: len +
+	 * (time_in_ns % NSEC_PER_SEC) * (rate_bytes_ps % NSEC_PER_SEC) / NSEC_PER_SEC */
+	len = do_plus64(len,
+			(time_in_ns % NSEC_PER_SEC) *
+			(r->rate_bytes_ps % NSEC_PER_SEC) /
+			NSEC_PER_SEC);
+
+	if (unlikely(r->linklayer == TC_LINKLAYER_ATM))
+		len = (len / 53) * 48;
+
+	if (len > r->overhead)
+		len -= r->overhead;
+	else
+		len = 0;
+
+	return len;
+}
 
 /*
  * Return length of individual segments of a gso packet,
@@ -289,10 +398,9 @@ static int tbf_change(struct Qdisc *sch, struct nlattr *opt)
 	struct tbf_sched_data *q = qdisc_priv(sch);
 	struct nlattr *tb[TCA_TBF_MAX + 1];
 	struct tc_tbf_qopt *qopt;
-	struct qdisc_rate_table *rtab = NULL;
-	struct qdisc_rate_table *ptab = NULL;
 	struct Qdisc *child = NULL;
-	int max_size, n;
+	u64 max_size;
+	s64 buffer, mtu;
 	u64 rate64 = 0, prate64 = 0;
 
 	err = nla_parse_nested(tb, TCA_TBF_MAX, opt, tbf_policy);
@@ -304,38 +412,13 @@ static int tbf_change(struct Qdisc *sch, struct nlattr *opt)
 		goto done;
 
 	qopt = nla_data(tb[TCA_TBF_PARMS]);
-	rtab = qdisc_get_rtab(&qopt->rate, tb[TCA_TBF_RTAB]);
-	if (rtab == NULL)
-		goto done;
-
-	if (qopt->peakrate.rate) {
-		if (qopt->peakrate.rate > qopt->rate.rate)
-			ptab = qdisc_get_rtab(&qopt->peakrate, tb[TCA_TBF_PTAB]);
-		if (ptab == NULL)
-			goto done;
-	}
-
-	for (n = 0; n < 256; n++)
-		if (rtab->data[n] > qopt->buffer)
-			break;
-	max_size = (n << qopt->rate.cell_log) - 1;
-	if (ptab) {
-		int size;
-
-		for (n = 0; n < 256; n++)
-			if (ptab->data[n] > qopt->mtu)
-				break;
-		size = (n << qopt->peakrate.cell_log) - 1;
-		if (size < max_size)
-			max_size = size;
-	}
-	if (max_size < 0)
-		goto done;
+	if (qopt->rate.linklayer == TC_LINKLAYER_UNAWARE)
+		qdisc_put_rtab(qdisc_get_rtab(&qopt->rate,
+					      tb[TCA_TBF_RTAB]));
 
-	if (max_size < psched_mtu(qdisc_dev(sch)))
-		pr_warn_ratelimited("sch_tbf: burst %u is lower than device %s mtu (%u) !\n",
-				    max_size, qdisc_dev(sch)->name,
-				    psched_mtu(qdisc_dev(sch)));
+	if (qopt->peakrate.linklayer == TC_LINKLAYER_UNAWARE)
+		qdisc_put_rtab(qdisc_get_rtab(&qopt->peakrate,
+					      tb[TCA_TBF_PTAB]));
 
 	if (q->qdisc != &noop_qdisc) {
 		err = fifo_set_limit(q->qdisc, qopt->limit);
@@ -349,38 +432,62 @@ static int tbf_change(struct Qdisc *sch, struct nlattr *opt)
 		}
 	}
 
+	buffer = PSCHED_TICKS2NS(qopt->buffer);
+	mtu = PSCHED_TICKS2NS(qopt->mtu);
+
 	sch_tree_lock(sch);
 	if (child) {
 		qdisc_tree_decrease_qlen(q->qdisc, q->qdisc->q.qlen);
 		qdisc_destroy(q->qdisc);
 		q->qdisc = child;
 	}
-	q->limit = qopt->limit;
-	q->mtu = PSCHED_TICKS2NS(qopt->mtu);
-	q->max_size = max_size;
-	q->buffer = PSCHED_TICKS2NS(qopt->buffer);
-	q->tokens = q->buffer;
-	q->ptokens = q->mtu;
 
 	if (tb[TCA_TBF_RATE64])
 		rate64 = nla_get_u64(tb[TCA_TBF_RATE64]);
-	psched_ratecfg_precompute(&q->rate, &rtab->rate, rate64);
-	if (ptab) {
+	psched_ratecfg_precompute(&q->rate, &qopt->rate, rate64);
+	if (!q->rate.rate_bytes_ps)
+		goto unlock_done;
+
+	max_size = min_t(u64, psched_ns_t2l(&q->rate, buffer), ~0U);
+
+	if (qopt->peakrate.rate) {
 		if (tb[TCA_TBF_PRATE64])
 			prate64 = nla_get_u64(tb[TCA_TBF_PRATE64]);
-		psched_ratecfg_precompute(&q->peak, &ptab->rate, prate64);
+		psched_ratecfg_precompute(&q->peak, &qopt->peakrate, prate64);
+		if (q->peak.rate_bytes_ps <= q->rate.rate_bytes_ps) {
+			pr_warn_ratelimited("sch_tbf: peakrate %llu is lower than or equals to rate %llu !\n",
+					    q->peak.rate_bytes_ps, q->rate.rate_bytes_ps);
+			goto unlock_done;
+		}
+
+		max_size = min_t(u64, max_size, psched_ns_t2l(&q->peak, mtu));
 		q->peak_present = true;
 	} else {
 		q->peak_present = false;
 	}
 
+	if (!max_size)
+		goto unlock_done;
+
+	if (max_size < psched_mtu(qdisc_dev(sch)))
+		pr_warn_ratelimited("sch_tbf: burst %llu is lower than device %s mtu (%u) !\n",
+				    max_size, qdisc_dev(sch)->name,
+				    psched_mtu(qdisc_dev(sch)));
+
+	q->limit = qopt->limit;
+	q->mtu = mtu;
+	q->max_size = max_size;
+	q->buffer = buffer;
+	q->tokens = q->buffer;
+	q->ptokens = q->mtu;
+
 	sch_tree_unlock(sch);
-	err = 0;
+	return 0;
+
+unlock_done:
+	sch_tree_unlock(sch);
+	err = -EINVAL;
 done:
-	if (rtab)
-		qdisc_put_rtab(rtab);
-	if (ptab)
-		qdisc_put_rtab(ptab);
 	return err;
 }
-- 
1.8.0

^ permalink raw reply related	[flat|nested] 14+ messages in thread
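For comparison with the hand-rolled decomposition in the RFC: on compilers
that provide a 128-bit integer type, the same saturating conversion can be
written directly. This is only an illustration of what the RFC computes,
not a proposal (the kernel cannot assume __int128 on every architecture):

#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

static uint64_t ns_to_len_saturating(uint64_t time_in_ns, uint64_t rate_bytes_ps)
{
	unsigned __int128 len;

	len = (unsigned __int128)time_in_ns * rate_bytes_ps / NSEC_PER_SEC;
	return len > UINT64_MAX ? UINT64_MAX : (uint64_t)len;
}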
* Re: [PATCH RFC ] net: sched: tbf: fix the calculation of max_size
  2013-12-09 13:10 ` [PATCH RFC ] " Yang Yingliang
@ 2013-12-09 15:12   ` Eric Dumazet
  2013-12-10 2:29     ` Yang Yingliang
  0 siblings, 1 reply; 14+ messages in thread
From: Eric Dumazet @ 2013-12-09 15:12 UTC (permalink / raw)
To: Yang Yingliang; +Cc: davem, netdev, David Laight, brouer, jpirko, jbrouer

On Mon, 2013-12-09 at 21:10 +0800, Yang Yingliang wrote:
> From: Yang Yingliang <yangyingliang@huawei.com>
>
> Currently, max_size is calculated from the rate table. Now that the rate
> table has been replaced, it is wrong to calculate max_size based on it,
> and doing so can lead to a wrong max_size.
>
> The burst in the kernel may be lower than the user asked for, because some
> precision may be lost when the burst is transformed into a buffer time
> (e.g. "burst 40kb rate 30mbit/s"), and it seems we cannot avoid this loss.
> A burst value (max_size) based on the rate table, however, may be equal to
> what the user asked for. If a packet's length is exactly max_size, that
> packet is stalled in tbf_dequeue(), because its length is above the burst
> known to the kernel and it can never get enough tokens. max_size guards
> against enqueuing packet sizes above the q->buffer "time" in tbf_enqueue().
>
> To be consistent with the calculation of tokens, this patch adds a helper
> psched_ns_t2l() that calculates the burst (max_size) directly, fixing the
> problem.
>
> After this fix, 64-bit rates can be used to calculate the burst as well.
>
> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
> ---
>  net/sched/sch_tbf.c | 203 +++++++++++++++++++++++++++++++++++++++------------
>  1 file changed, 155 insertions(+), 48 deletions(-)

There is no way I am going to study this patch.

There is no way 32bit * 32bit multiply can overflow 64bit.

If the user gave a stupid input, like a buffer bigger than 4 sec, just
say no. Nobody ever did such stupid things in the past, and nobody will
do in the future because it is so wrong and useless.

q->mtu = max_t(u64, ~0U, PSCHED_TICKS2NS(qopt->mtu));

q->buffer = max_t(u64, ~0U, PSCHED_TICKS2NS(qopt->buffer));

Could we keep this code understandable, please ?

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: [PATCH RFC ] net: sched: tbf: fix the calculation of max_size
  2013-12-09 15:12 ` Eric Dumazet
@ 2013-12-10 2:29   ` Yang Yingliang
  2013-12-10 2:39     ` Eric Dumazet
  0 siblings, 1 reply; 14+ messages in thread
From: Yang Yingliang @ 2013-12-10 2:29 UTC (permalink / raw)
To: Eric Dumazet; +Cc: davem, netdev, David Laight, brouer, jpirko, jbrouer

On 2013/12/9 23:12, Eric Dumazet wrote:
> On Mon, 2013-12-09 at 21:10 +0800, Yang Yingliang wrote:
>> From: Yang Yingliang <yangyingliang@huawei.com>
>> ...
>>
>>  net/sched/sch_tbf.c | 203 +++++++++++++++++++++++++++++++++++++++------------
>>  1 file changed, 155 insertions(+), 48 deletions(-)
>
> There is no way I am going to study this patch.
>
> There is no way 32bit * 32bit multiply can overflow 64bit.
>
> If the user gave a stupid input, like a buffer bigger than 4 sec, just
> say no. Nobody ever did such stupid things in the past, and nobody will
> do in the future because it is so wrong and useless.
>
> q->mtu = max_t(u64, ~0U, PSCHED_TICKS2NS(qopt->mtu));
>
> q->buffer = max_t(u64, ~0U, PSCHED_TICKS2NS(qopt->buffer));
>
> Could we keep this code understandable, please ?

Hmm, I cannot figure out why max_t, did you mean min_t?

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: [PATCH RFC ] net: sched: tbf: fix the calculation of max_size
  2013-12-10 2:29 ` Yang Yingliang
@ 2013-12-10 2:39   ` Eric Dumazet
  0 siblings, 0 replies; 14+ messages in thread
From: Eric Dumazet @ 2013-12-10 2:39 UTC (permalink / raw)
To: Yang Yingliang; +Cc: davem, netdev, David Laight, brouer, jpirko, jbrouer

On Tue, 2013-12-10 at 10:29 +0800, Yang Yingliang wrote:
> Hmm, I cannot figure out why max_t, did you mean min_t?

probably !

^ permalink raw reply	[flat|nested] 14+ messages in thread
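What the corrected min_t() suggestion appears to amount to is clamping the
configured times at 2^32 - 1 ns (about 4.29 s) before any multiplication;
a sketch of that clamp (illustrative only, not the final kernel code):

#include <stdint.h>

/* clamp a time in ns to at most 2^32 - 1 ns (~4.29 s), which is what
 * min_t(u64, ~0U, PSCHED_TICKS2NS(...)) would do */
static uint64_t clamp_time_ns(uint64_t ns)
{
	return ns > UINT32_MAX ? UINT32_MAX : ns;
}

With the time bounded this way, overflowing the 64-bit product would also
require a rate above about 2^32 bytes/s combined with a multi-second
buffer, which is the "stupid input" case the reply above suggests simply
rejecting.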
* RE: [PATCH net v6 1/2] net: sched: tbf: fix the calculation of max_size
  2013-12-09 12:21 ` Yang Yingliang
  2013-12-09 13:10 ` [PATCH RFC ] " Yang Yingliang
@ 2013-12-09 13:11 ` David Laight
  2013-12-10 2:04   ` Yang Yingliang
  1 sibling, 1 reply; 14+ messages in thread
From: David Laight @ 2013-12-09 13:11 UTC (permalink / raw)
To: Yang Yingliang, davem, netdev; +Cc: eric.dumazet, brouer, jpirko, jbrouer

> From: Yang Yingliang [mailto:yangyingliang@huawei.com]
> On 2013/12/9 18:07, David Laight wrote:
> >> From: Yang Yingliang
> >> On 2013/12/6 18:56, David Laight wrote:
> >>>> From: Yang Yingliang
> >>> ...
> >>>
> >>> You are multiplying two values then dividing by 10**9
> >>> I'd guess that the intermediate value might exceed 2**64.
> >>>
> >>>> +	if (unlikely(r->linklayer == TC_LINKLAYER_ATM))
> >>>> +		len = (len / 53) * 48;
> >>>
> >>> You probably want to do the multiply first.
> >>> But why not scale rate_bytes_ps instead.
> >>>
> >>
> >> When the linklayer is ATM, the formula to calculate tokens is:
> >>
> >> ((u64)(DIV_ROUND_UP(len,48)*53) * r->mult) >> r->shift
> >>
> >> So, I scale len here.
> >
> > Seems to me that there are a lot of unnecessary multiplies and
> > divides going on here.
> > Looks to me like the code was originally written to require one
> > multiply and one shift for each packet.
>
> The way to calc tokens in psched_l2t_ns() means:
> we got a payload which length is 'len'. To get the actual size we
> need send, round len up to a multiple of 53. After getting the
> size, we do multiply and shift to calculate tokens that we need.
>
> >
> > In any case the latter code is allowing for more of the ATM cell
> > overhead. I'm not at all sure the intent is to remove that when
> > setting up the constants.
> >
> > OTOH should this code be worrying about the packet overheads at all?
> > Does it add in the ethernet pre-emable, CRC and inter packet gap?
>
> This overhead is sent from userspace by tc.
>
> >
> > I'd guess that the most the ATM code should do it round the length
> > up to a multiple of 48 (user payload in a cell).
>
> 53 is ATM cell size include header, sending a header needs tokens as
> well. I'd guess round the length up to a multiple of 53 in psched_l2t_ns()
> is necessary.

The point I'm trying to make is that in one place you multiply by 53/48
and in the other you multiply be 48/53.
These two operations almost certainly cancel each other out.
Or would modulo small rounding errors if you did the multiplies first.
So why do either of them?

Whether you should be rounding up ATM data to a whole number of cells
is a different question entirely.

	David

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: [PATCH net v6 1/2] net: sched: tbf: fix the calculation of max_size
  2013-12-09 13:11 ` [PATCH net v6 1/2] " David Laight
@ 2013-12-10 2:04   ` Yang Yingliang
  0 siblings, 0 replies; 14+ messages in thread
From: Yang Yingliang @ 2013-12-10 2:04 UTC (permalink / raw)
To: David Laight; +Cc: davem, netdev, eric.dumazet, brouer, jpirko, jbrouer

On 2013/12/9 21:11, David Laight wrote:
>> From: Yang Yingliang [mailto:yangyingliang@huawei.com]
>> On 2013/12/9 18:07, David Laight wrote:
>>>> From: Yang Yingliang
>>>> On 2013/12/6 18:56, David Laight wrote:
>>>>>> From: Yang Yingliang
>>>>> ...
>>>>>
>>>>> You are multiplying two values then dividing by 10**9
>>>>> I'd guess that the intermediate value might exceed 2**64.
>>>>>
>>>>>> +	if (unlikely(r->linklayer == TC_LINKLAYER_ATM))
>>>>>> +		len = (len / 53) * 48;
>>>>>
>>>>> You probably want to do the multiply first.
>>>>> But why not scale rate_bytes_ps instead.
>>>>>
>>>>
>>>> When the linklayer is ATM, the formula to calculate tokens is:
>>>>
>>>> ((u64)(DIV_ROUND_UP(len,48)*53) * r->mult) >> r->shift
>>>>
>>>> So, I scale len here.
>>>
>>> Seems to me that there are a lot of unnecessary multiplies and
>>> divides going on here.
>>> Looks to me like the code was originally written to require one
>>> multiply and one shift for each packet.
>>
>> The way to calc tokens in psched_l2t_ns() means:
>> we got a payload which length is 'len'. To get the actual size we
>> need send, round len up to a multiple of 53. After getting the
>> size, we do multiply and shift to calculate tokens that we need.
>>
>>>
>>> In any case the latter code is allowing for more of the ATM cell
>>> overhead. I'm not at all sure the intent is to remove that when
>>> setting up the constants.
>>>
>>> OTOH should this code be worrying about the packet overheads at all?
>>> Does it add in the ethernet pre-emable, CRC and inter packet gap?
>>
>> This overhead is sent from userspace by tc.
>>
>>>
>>> I'd guess that the most the ATM code should do it round the length
>>> up to a multiple of 48 (user payload in a cell).
>>
>> 53 is ATM cell size include header, sending a header needs tokens as
>> well. I'd guess round the length up to a multiple of 53 in psched_l2t_ns()
>> is necessary.
>
> The point I'm trying to make is that in one place you multiply by 53/48
> and in the other you multiply be 48/53.
> These two operations almost certainly cancel each other out.
> Or would modulo small rounding errors if you did the multiplies first.
> So why do either of them?

Current kernel does multiplication by 53/48 to calculate tokens in
psched_l2t_ns(). It expands a packet length. So I need reduce the
length which is used as max_size in psched_ns_t2l(), or it will be
too big to get enough tokens.

len = (len / 53) * 48;

Here I do division first, because the length to calculate tokens is
53bytes at least. If len < 53, means that no packet can pass, we
should set len to 0.

Regards,
Yang

>
> Whether you should be rounding up ATM data to a whole number of cells
> is a different question entirely.
>
> David

^ permalink raw reply	[flat|nested] 14+ messages in thread
* [PATCH net v6 2/2] net: sched: htb: fix the calculation of quantum
  2013-12-06 7:00 [PATCH net v6 0/2] net: sched: fix two issues Yang Yingliang
  2013-12-06 7:00 ` [PATCH net v6 1/2] net: sched: tbf: fix the calculation of max_size Yang Yingliang
@ 2013-12-06 7:00 ` Yang Yingliang
  1 sibling, 0 replies; 14+ messages in thread
From: Yang Yingliang @ 2013-12-06 7:00 UTC (permalink / raw)
To: davem, netdev; +Cc: eric.dumazet, brouer, jpirko, jbrouer

Now that rates can be 64 bits wide, the 32-bit rate field may not hold the
true rate. Use rate_bytes_ps, which is computed from max(rate32, rate64),
to calculate the quantum.

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Acked-by: Eric Dumazet <edumazet@google.com>
---
 net/sched/sch_htb.c | 20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index 0e1e38b..717b210 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -1477,11 +1477,22 @@ static int htb_change_class(struct Qdisc *sch, u32 classid,
 		sch_tree_lock(sch);
 	}
 
+	rate64 = tb[TCA_HTB_RATE64] ? nla_get_u64(tb[TCA_HTB_RATE64]) : 0;
+
+	ceil64 = tb[TCA_HTB_CEIL64] ? nla_get_u64(tb[TCA_HTB_CEIL64]) : 0;
+
+	psched_ratecfg_precompute(&cl->rate, &hopt->rate, rate64);
+	psched_ratecfg_precompute(&cl->ceil, &hopt->ceil, ceil64);
+
 	/* it used to be a nasty bug here, we have to check that node
 	 * is really leaf before changing cl->un.leaf !
 	 */
 	if (!cl->level) {
-		cl->quantum = hopt->rate.rate / q->rate2quantum;
+		u64 quantum = cl->rate.rate_bytes_ps;
+
+		do_div(quantum, q->rate2quantum);
+		cl->quantum = min_t(u64, quantum, INT_MAX);
+
 		if (!hopt->quantum && cl->quantum < 1000) {
 			pr_warning(
 			       "HTB: quantum of class %X is small. Consider r2q change.\n",
@@ -1500,13 +1511,6 @@ static int htb_change_class(struct Qdisc *sch, u32 classid,
 			cl->prio = TC_HTB_NUMPRIO - 1;
 	}
 
-	rate64 = tb[TCA_HTB_RATE64] ? nla_get_u64(tb[TCA_HTB_RATE64]) : 0;
-
-	ceil64 = tb[TCA_HTB_CEIL64] ? nla_get_u64(tb[TCA_HTB_CEIL64]) : 0;
-
-	psched_ratecfg_precompute(&cl->rate, &hopt->rate, rate64);
-	psched_ratecfg_precompute(&cl->ceil, &hopt->ceil, ceil64);
-
 	cl->buffer = PSCHED_TICKS2NS(hopt->buffer);
 	cl->cbuffer = PSCHED_TICKS2NS(hopt->cbuffer);
 
-- 
1.8.0

^ permalink raw reply related	[flat|nested] 14+ messages in thread
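A quick illustration of why the quantum now has to come from rate_bytes_ps
(hypothetical numbers, sketch only): with the default r2q of 10, a
40 Gbit/s class has rate_bytes_ps = 5e9, which does not even fit in the
32-bit hopt->rate.rate field the old code divided.

#include <stdio.h>
#include <stdint.h>
#include <limits.h>

static int htb_quantum(uint64_t rate_bytes_ps, int rate2quantum)
{
	uint64_t quantum = rate_bytes_ps / rate2quantum;

	return quantum > INT_MAX ? INT_MAX : (int)quantum;
}

int main(void)
{
	uint64_t rate_bytes_ps = 40000000000ULL / 8;	/* 40 Gbit/s */

	/* prints quantum = 500000000, instead of a value derived from a
	 * truncated 32-bit rate */
	printf("quantum = %d\n", htb_quantum(rate_bytes_ps, 10));
	return 0;
}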
end of thread, other threads:[~2013-12-10 2:39 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-12-06 7:00 [PATCH net v6 0/2] net: sched: fix two issues Yang Yingliang
2013-12-06 7:00 ` [PATCH net v6 1/2] net: sched: tbf: fix the calculation of max_size Yang Yingliang
2013-12-06 10:56   ` David Laight
2013-12-09 2:26     ` Yang Yingliang
2013-12-09 3:26     ` Yang Yingliang
2013-12-09 10:07       ` David Laight
2013-12-09 12:21         ` Yang Yingliang
2013-12-09 13:10           ` [PATCH RFC ] " Yang Yingliang
2013-12-09 15:12             ` Eric Dumazet
2013-12-10 2:29               ` Yang Yingliang
2013-12-10 2:39                 ` Eric Dumazet
2013-12-09 13:11           ` [PATCH net v6 1/2] " David Laight
2013-12-10 2:04             ` Yang Yingliang
2013-12-06 7:00 ` [PATCH net v6 2/2] net: sched: htb: fix the calculation of quantum Yang Yingliang
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).