From mboxrd@z Thu Jan 1 00:00:00 1970 From: Yang Yingliang Subject: Re: [PATCH net v2 1/2] net: sched: tbf: fix calculation of max_size Date: Tue, 19 Nov 2013 14:04:25 +0800 Message-ID: <528AFF69.5020603@huawei.com> References: <1384763964-5000-1-git-send-email-yangyingliang@huawei.com> <1384763964-5000-2-git-send-email-yangyingliang@huawei.com> <20131118133246.02651392@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset="ISO-8859-1" Content-Transfer-Encoding: 7bit Cc: , , , , To: Jesper Dangaard Brouer Return-path: Received: from szxga02-in.huawei.com ([119.145.14.65]:39657 "EHLO szxga02-in.huawei.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750707Ab3KSGFb (ORCPT ); Tue, 19 Nov 2013 01:05:31 -0500 In-Reply-To: <20131118133246.02651392@redhat.com> Sender: netdev-owner@vger.kernel.org List-ID: On 2013/11/18 20:32, Jesper Dangaard Brouer wrote: > On Mon, 18 Nov 2013 16:39:23 +0800 > Yang Yingliang wrote: > >> commit b757c9336d63f94c6b57532(tbf: improved accuracy at high rates) >> introduce a regression. >> >> With the follow command: >> tc qdisc add dev eth1 root handle 1: tbf latency 50ms burst 10KB rate 30gbit mtu 64k >> >> Without this patch, the max_size value is 10751(bytes). >> But, in fact, the real max_size value should be smaller than 7440(bytes). >> Or a packet whose length is bigger than 7440 will cause network congestion. >> Because the packet is so big that can't get enough tokens. Even all the tokens >> in the buffer is given to the packet. >> >> With this patch, the max_size value is 7440(bytes). >> The packets whose length is bigger than 7440(bytes) will be dropped or reshape >> in tbf_enqueue(). > > > I acknowledge that TBF seems to have some dependencies to the > userspace constructed rate table (which we do NOT use anymore in the > kernel). And that these should be fixed. > > But I'm not sure that your patch is the best solution... and the patch > also contains some issues, see inlined comments. > > The main annoying problem is *again* how the rate table system got > removed, in the kernel, but nobody fixed userspace. > > > So, the main problem is that qopt->buffer (send from userspace/tc) is > in a "time-format" (user input "burst" in bytes). Which used-to, make > sense because the rate table used the same "time-format". > > Now you are reversing this calculation of "q->buffer" (token burst) > back into bytes, so we can choose "max_size" (to avoid a problem in > tbf_dequeue()). > > I don't like this converting back-and-forth, I'm worried about > rounding errors. > > The easiest "hack" solution would be: > > for (n = 0; n < 65535; n++) > if (psched_l2t_ns(&qopt->rate, n) > q->buffer) > break; > max_size = n; > > Unfortunately we have to keep backward compat with iproute2/tc, but > IMHO it would be a lot easier, if we could fix userspace, and remove > all the length-to-time calculations, as they should now be the > responsibility of the kernel. Well, wishful thinking... > > > >> Signed-off-by: Yang Yingliang >> --- >> include/net/sch_generic.h | 46 ++++++++++++++++++++++++++++++++ >> net/sched/sch_tbf.c | 67 ++++++++++++++++++++++++++--------------------- >> 2 files changed, 83 insertions(+), 30 deletions(-) >> >> diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h >> index d0a6321..8da64f3 100644 >> --- a/include/net/sch_generic.h >> +++ b/include/net/sch_generic.h >> @@ -701,6 +701,52 @@ static inline u64 psched_l2t_ns(const struct psched_ratecfg *r, >> return ((u64)len * r->mult) >> r->shift; >> } >> >> +/* Time to Length, convert time in ns to length in bytes >> + * to determinate how many bytes can be sent in given time. >> + */ >> +static inline u64 psched_ns_t2l(const struct psched_ratecfg *r, >> + u64 time_in_ns) >> +{ >> + u64 len = time_in_ns; >> + u8 shift = r->shift; >> + bool is_div = false; >> + >> + /* The formula is : >> + * len = (time_in_ns << shift) / mult >> + * when time_in_ns does shift, it would overflow. >> + * If overflow happens first time, do division. >> + * Then do shift. If it happens again, >> + * set lenth to ~0ULL. >> + */ >> + while (shift) { >> + if (len & (1ULL << 63)) { >> + if (!is_div) { >> + len = div64_u64(len, r->mult); >> + is_div = true; >> + } else { >> + /* overflow happens */ >> + len = ~0ULL; >> + is_div = true; >> + break; >> + } >> + } >> + len <<= 1; >> + shift--; >> + } >> + if (!is_div) >> + len = div64_u64(len, r->mult); >> + >> + if (unlikely(r->linklayer == TC_LINKLAYER_ATM)) >> + len = (len / 53) * 48; >> + >> + if (len > r->overhead) >> + len -= r->overhead; >> + else >> + len = 0; >> + >> + return len; >> +} >> + > > Are we 100% sure, that the conversion is accurate between > psched_l2t_ns() and psched_ns_t2l for all possible rates. > > E.g. why is it that r->shift have to be recalculate (orig created in > psched_ratecfg_precompute()). It don't recalculate r->shift, it just do shift bit by bit in case that len overflows. If len overflows, set it to ~0ULL. I would like to use the way you suggest to calculate the max_size. So forget about this function.:) > > >> void psched_ratecfg_precompute(struct psched_ratecfg *r, >> const struct tc_ratespec *conf, >> u64 rate64); >> diff --git a/net/sched/sch_tbf.c b/net/sched/sch_tbf.c >> index 68f9859..eb9ce7b 100644 >> --- a/net/sched/sch_tbf.c >> +++ b/net/sched/sch_tbf.c >> @@ -279,7 +279,7 @@ static int tbf_change(struct Qdisc *sch, struct nlattr *opt) >> struct qdisc_rate_table *rtab = NULL; >> struct qdisc_rate_table *ptab = NULL; >> struct Qdisc *child = NULL; >> - int max_size, n; >> + u32 max_size = 0; >> u64 rate64 = 0, prate64 = 0; >> >> err = nla_parse_nested(tb, TCA_TBF_MAX, opt, tbf_policy); >> @@ -291,33 +291,20 @@ static int tbf_change(struct Qdisc *sch, struct nlattr *opt) >> goto done; >> >> qopt = nla_data(tb[TCA_TBF_PARMS]); >> - rtab = qdisc_get_rtab(&qopt->rate, tb[TCA_TBF_RTAB]); >> - if (rtab == NULL) >> - goto done; >> - >> - if (qopt->peakrate.rate) { >> - if (qopt->peakrate.rate > qopt->rate.rate) >> - ptab = qdisc_get_rtab(&qopt->peakrate, tb[TCA_TBF_PTAB]); >> - if (ptab == NULL) >> - goto done; >> + if (qopt->rate.linklayer == TC_LINKLAYER_UNAWARE) { >> + rtab = qdisc_get_rtab(&qopt->rate, tb[TCA_TBF_RTAB]); >> + if (rtab) { >> + qdisc_put_rtab(rtab); >> + rtab = NULL; >> + } > > This is correct code construct, for backward compatible reading of the > rate table. > > But, notice how your free this at once, which means you also should > cleanup the exit/done: section. Agreed, I will change it in v3. Thanks! > > > >> } >> - >> - for (n = 0; n < 256; n++) >> - if (rtab->data[n] > qopt->buffer) >> - break; >> - max_size = (n << qopt->rate.cell_log) - 1; > > This is here we could do the quick-and-dirty solution: > > for (n = 0; n < 65535; n++) > if (psched_l2t_ns(&qopt->rate, n) > q->buffer) > break; > max_size = n; > > > >> - if (ptab) { >> - int size; >> - >> - for (n = 0; n < 256; n++) >> - if (ptab->data[n] > qopt->mtu) >> - break; >> - size = (n << qopt->peakrate.cell_log) - 1; >> - if (size < max_size) >> - max_size = size; >> + if (qopt->peakrate.linklayer == TC_LINKLAYER_UNAWARE) { >> + ptab = qdisc_get_rtab(&qopt->peakrate, tb[TCA_TBF_PTAB]); >> + if (ptab) { >> + qdisc_put_rtab(ptab); >> + ptab = NULL; >> + } >> } >> - if (max_size < 0) >> - goto done; >> >> if (q->qdisc != &noop_qdisc) { >> err = fifo_set_limit(q->qdisc, qopt->limit); >> @@ -339,25 +326,45 @@ static int tbf_change(struct Qdisc *sch, struct nlattr *opt) >> } >> q->limit = qopt->limit; >> q->mtu = PSCHED_TICKS2NS(qopt->mtu); >> - q->max_size = max_size; >> q->buffer = PSCHED_TICKS2NS(qopt->buffer); >> q->tokens = q->buffer; >> q->ptokens = q->mtu; >> >> if (tb[TCA_TBF_RATE64]) >> rate64 = nla_get_u64(tb[TCA_TBF_RATE64]); >> - psched_ratecfg_precompute(&q->rate, &rtab->rate, rate64); >> - if (ptab) { >> + psched_ratecfg_precompute(&q->rate, &qopt->rate, rate64); >> + if (!q->rate.rate_bytes_ps) >> + goto unlock_done; >> + max_size = min_t(u64, psched_ns_t2l(&q->rate, q->buffer), ~0); >> + max_size = min_t(u32, max_size, (256 << qopt->rate.cell_log) - 1); > > The rate system and the rate.cell_log is not really used anymore, so > its a bit strange to use it. Perhaps it's even a bug to base a > calculation on this. cell_log is calculated from mtu(user input in bytes) in userspace. In the old calculation, max_size should be smaller than (256 << cell_log) which means it's smaller than mtu. > >> + >> + if (qopt->peakrate.rate) { >> + u64 size = 0; >> if (tb[TCA_TBF_PRATE64]) >> prate64 = nla_get_u64(tb[TCA_TBF_PRATE64]); >> - psched_ratecfg_precompute(&q->peak, &ptab->rate, prate64); >> + psched_ratecfg_precompute(&q->peak, &qopt->peakrate, prate64); >> + size = psched_ns_t2l(&q->peak, q->mtu); >> + max_size = min_t(u64, max_size, size); >> + max_size = min_t(u32, >> + max_size, >> + (256 << qopt->peakrate.cell_log) - 1); >> q->peak_present = true; >> } else { >> q->peak_present = false; >> } >> >> + if (!max_size) >> + goto unlock_done; >> + q->max_size = max_size; >> + >> sch_tree_unlock(sch); >> err = 0; >> + >> + if (0) { > > I really dislike this construct. I'm afraid what a dumb compile would > optimized this to. OK, change it in v3. Thanks! > >> +unlock_done: >> + sch_tree_unlock(sch); >> + err = -EINVAL; >> + } >> done: >> if (rtab) >> qdisc_put_rtab(rtab); > > This could be cleaned up, as we already have released the rtab's. > >