* [PATCH] sch_htb: ix the deficit overflows
@ 2009-11-27 8:14 Changli Gao
2009-11-28 0:04 ` Jarek Poplawski
0 siblings, 1 reply; 14+ messages in thread
From: Changli Gao @ 2009-11-27 8:14 UTC (permalink / raw)
To: Jamal Hadi Salim, David S. Miller; +Cc: netdev, xiaosuo
fix the deficit overflows.
HTB uses WDRR(Weighted Deficit Round Robin) algorithm to schedule the spare bandwidth, but it doesn't check if the deficit is big enough for the skb when dequeuing skb from a class. In some case(the quantum is smaller than the packet size), the deficit will be decreased, even when it is smaller than ZERO. At last, the deficit will overflows, and become MAX_INT.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
----
sch_htb.c | 25 ++++++++++++++++++++-----
1 file changed, 20 insertions(+), 5 deletions(-)
diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index 2e38d1a..293983e 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -783,6 +783,7 @@ static struct sk_buff *htb_dequeue_tree(struct htb_sched *q, int prio,
{
struct sk_buff *skb = NULL;
struct htb_class *cl, *start;
+ unsigned int len;
/* look initial class up in the row */
start = cl = htb_lookup_leaf(q->row[level] + prio, prio,
q->ptr[level] + prio,
@@ -815,9 +816,23 @@ next:
goto next;
}
- skb = cl->un.leaf.q->dequeue(cl->un.leaf.q);
- if (likely(skb != NULL))
- break;
+ skb = cl->un.leaf.q->ops->peek(cl->un.leaf.q);
+ if (likely(skb != NULL)) {
+ len = qdisc_pkt_len(skb);
+ if (len <= cl->un.leaf.deficit[level]) {
+ skb = qdisc_dequeue_peeked(cl->un.leaf.q);
+ break;
+ }
+ skb = NULL;
+ cl->un.leaf.deficit[level] += cl->quantum;
+ htb_next_rb_node((level ? cl->parent->un.inner.ptr :
+ q->ptr[0]) + prio);
+ cl = htb_lookup_leaf(q->row[level] + prio, prio,
+ q->ptr[level] + prio,
+ q->last_ptr_id[level] + prio);
+ start = cl;
+ goto next;
+ }
qdisc_warn_nonwc("htb", cl->un.leaf.q);
htb_next_rb_node((level ? cl->parent->un.inner.ptr : q->
@@ -829,8 +844,8 @@ next:
} while (cl != start);
if (likely(skb != NULL)) {
- cl->un.leaf.deficit[level] -= qdisc_pkt_len(skb);
- if (cl->un.leaf.deficit[level] < 0) {
+ cl->un.leaf.deficit[level] -= len;
+ if (cl->un.leaf.deficit[level] <= 0) {
cl->un.leaf.deficit[level] += cl->quantum;
htb_next_rb_node((level ? cl->parent->un.inner.ptr : q->
ptr[0]) + prio);
^ permalink raw reply related [flat|nested] 14+ messages in thread* Re: [PATCH] sch_htb: ix the deficit overflows
2009-11-27 8:14 [PATCH] sch_htb: ix the deficit overflows Changli Gao
@ 2009-11-28 0:04 ` Jarek Poplawski
2009-11-30 4:26 ` Changli Gao
0 siblings, 1 reply; 14+ messages in thread
From: Jarek Poplawski @ 2009-11-28 0:04 UTC (permalink / raw)
To: xiaosuo; +Cc: Jamal Hadi Salim, David S. Miller, netdev
Changli Gao wrote, On 11/27/2009 09:14 AM:
> fix the deficit overflows.
>
> HTB uses WDRR(Weighted Deficit Round Robin) algorithm to schedule the spare
> bandwidth, but it doesn't check if the deficit is big enough for the skb when
> dequeuing skb from a class. In some case(the quantum is smaller than the
> packet size), the deficit will be decreased, even when it is smaller than
> ZERO. At last, the deficit will overflows, and become MAX_INT.
This case of the quantum smaller than the packet size should be treated
as a broken config, so I don't think it's worth to do such a deep change
with additional delays and cpu cycles for all to fix it. A warning or
lower limit should be enough (if necessary at all).
Jarek P.
>
> Signed-off-by: Changli Gao <xiaosuo@gmail.com>
> ----
> sch_htb.c | 25 ++++++++++++++++++++-----
> 1 file changed, 20 insertions(+), 5 deletions(-)
> diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
> index 2e38d1a..293983e 100644
> --- a/net/sched/sch_htb.c
> +++ b/net/sched/sch_htb.c
> @@ -783,6 +783,7 @@ static struct sk_buff *htb_dequeue_tree(struct htb_sched *q, int prio,
> {
> struct sk_buff *skb = NULL;
> struct htb_class *cl, *start;
> + unsigned int len;
> /* look initial class up in the row */
> start = cl = htb_lookup_leaf(q->row[level] + prio, prio,
> q->ptr[level] + prio,
> @@ -815,9 +816,23 @@ next:
> goto next;
> }
>
> - skb = cl->un.leaf.q->dequeue(cl->un.leaf.q);
> - if (likely(skb != NULL))
> - break;
> + skb = cl->un.leaf.q->ops->peek(cl->un.leaf.q);
> + if (likely(skb != NULL)) {
> + len = qdisc_pkt_len(skb);
> + if (len <= cl->un.leaf.deficit[level]) {
> + skb = qdisc_dequeue_peeked(cl->un.leaf.q);
> + break;
> + }
> + skb = NULL;
> + cl->un.leaf.deficit[level] += cl->quantum;
> + htb_next_rb_node((level ? cl->parent->un.inner.ptr :
> + q->ptr[0]) + prio);
> + cl = htb_lookup_leaf(q->row[level] + prio, prio,
> + q->ptr[level] + prio,
> + q->last_ptr_id[level] + prio);
> + start = cl;
> + goto next;
> + }
>
> qdisc_warn_nonwc("htb", cl->un.leaf.q);
> htb_next_rb_node((level ? cl->parent->un.inner.ptr : q->
> @@ -829,8 +844,8 @@ next:
> } while (cl != start);
>
> if (likely(skb != NULL)) {
> - cl->un.leaf.deficit[level] -= qdisc_pkt_len(skb);
> - if (cl->un.leaf.deficit[level] < 0) {
> + cl->un.leaf.deficit[level] -= len;
> + if (cl->un.leaf.deficit[level] <= 0) {
> cl->un.leaf.deficit[level] += cl->quantum;
> htb_next_rb_node((level ? cl->parent->un.inner.ptr : q->
> ptr[0]) + prio);
>
> --
^ permalink raw reply [flat|nested] 14+ messages in thread* Re: [PATCH] sch_htb: ix the deficit overflows
2009-11-28 0:04 ` Jarek Poplawski
@ 2009-11-30 4:26 ` Changli Gao
2009-11-30 11:10 ` Jarek Poplawski
0 siblings, 1 reply; 14+ messages in thread
From: Changli Gao @ 2009-11-30 4:26 UTC (permalink / raw)
To: Jarek Poplawski; +Cc: Jamal Hadi Salim, David S. Miller, netdev
On Sat, Nov 28, 2009 at 8:04 AM, Jarek Poplawski <jarkao2@gmail.com> wrote:
> Changli Gao wrote, On 11/27/2009 09:14 AM:
>
>
> This case of the quantum smaller than the packet size should be treated
> as a broken config, so I don't think it's worth to do such a deep change
> with additional delays and cpu cycles for all to fix it. A warning or
> lower limit should be enough (if necessary at all).
>
I don't think this change is deep. HTB has it own lower limit for
quantum 1000, but the MTU is various, and maybe larger than that. And
if we use IMQ to shape traffic, the skb will be defragmented by
conntrack, and its size will be larger than MTU.
The previous patch indeed introduces some additional CPU cycles.
Review the new patch bellow please:
diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index 2e38d1a..d55382b 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -815,6 +815,17 @@ next:
goto next;
}
+ if (unlikely(cl->un.leaf.deficit[level] < 0)) {
+ cl->un.leaf.deficit[level] += cl->quantum;
+ htb_next_rb_node((level ? cl->parent->un.inner.ptr :
+ q->ptr[0]) + prio);
+ cl = htb_lookup_leaf(q->row[level] + prio, prio,
+ q->ptr[level] + prio,
+ q->last_ptr_id[level] + prio);
+ start = cl;
+ goto next;
+ }
+
skb = cl->un.leaf.q->dequeue(cl->un.leaf.q);
if (likely(skb != NULL))
break;
If you think it is acceptable, I'll resubmit it for inclusion.
--
Regards,
Changli Gao(xiaosuo@gmail.com)
^ permalink raw reply related [flat|nested] 14+ messages in thread* Re: [PATCH] sch_htb: ix the deficit overflows
2009-11-30 4:26 ` Changli Gao
@ 2009-11-30 11:10 ` Jarek Poplawski
2009-12-01 2:32 ` Changli Gao
2009-12-02 9:20 ` David Miller
0 siblings, 2 replies; 14+ messages in thread
From: Jarek Poplawski @ 2009-11-30 11:10 UTC (permalink / raw)
To: Changli Gao; +Cc: Jamal Hadi Salim, David S. Miller, netdev, Martin Devera
On Mon, Nov 30, 2009 at 12:26:33PM +0800, Changli Gao wrote:
> On Sat, Nov 28, 2009 at 8:04 AM, Jarek Poplawski <jarkao2@gmail.com> wrote:
> > Changli Gao wrote, On 11/27/2009 09:14 AM:
> >
> >
> > This case of the quantum smaller than the packet size should be treated
> > as a broken config, so I don't think it's worth to do such a deep change
> > with additional delays and cpu cycles for all to fix it. A warning or
> > lower limit should be enough (if necessary at all).
> >
>
> I don't think this change is deep. HTB has it own lower limit for
> quantum 1000, but the MTU is various, and maybe larger than that.
Users can control this with "r2q" and "quantum", and there is a hint
on quantum size in the user's guide.
> And
> if we use IMQ to shape traffic, the skb will be defragmented by
> conntrack, and its size will be larger than MTU.
IMQ is a very nice thing, but it's considered broken as well, so it
can't be the reason for changing HTB.
>
> The previous patch indeed introduces some additional CPU cycles.
> Review the new patch bellow please:
And this patch is very similar, except ->peek()/dequeue(). Additional
lookups are done instead of dequeuing the first found class, which
might be quite long in some cases.
>
> diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
> index 2e38d1a..d55382b 100644
> --- a/net/sched/sch_htb.c
> +++ b/net/sched/sch_htb.c
> @@ -815,6 +815,17 @@ next:
> goto next;
> }
>
> + if (unlikely(cl->un.leaf.deficit[level] < 0)) {
> + cl->un.leaf.deficit[level] += cl->quantum;
> + htb_next_rb_node((level ? cl->parent->un.inner.ptr :
> + q->ptr[0]) + prio);
> + cl = htb_lookup_leaf(q->row[level] + prio, prio,
> + q->ptr[level] + prio,
> + q->last_ptr_id[level] + prio);
> + start = cl;
> + goto next;
> + }
> +
> skb = cl->un.leaf.q->dequeue(cl->un.leaf.q);
> if (likely(skb != NULL))
> break;
>
> If you think it is acceptable, I'll resubmit it for inclusion.
It's not acceptable to me mainly because the real change done by this
patch is different than you describe: preventing an overflow might be
simple. You change the way DRR is implemented here, and even if it's
right, it should be written explicitly and proved with tests results.
Anyway, I think you should rather care for the author's acceptance,
because the way it's done doesn't look like accidental and has been
heavily tested btw. (I added Martin to CC.)
Regards,
Jarek P.
PS: Btw, this newer version of the patch is broken with spaces.
^ permalink raw reply [flat|nested] 14+ messages in thread* Re: [PATCH] sch_htb: ix the deficit overflows
2009-11-30 11:10 ` Jarek Poplawski
@ 2009-12-01 2:32 ` Changli Gao
2009-12-01 8:01 ` Jarek Poplawski
2009-12-02 9:20 ` David Miller
1 sibling, 1 reply; 14+ messages in thread
From: Changli Gao @ 2009-12-01 2:32 UTC (permalink / raw)
To: Jarek Poplawski; +Cc: Jamal Hadi Salim, David S. Miller, netdev, Martin Devera
On Mon, Nov 30, 2009 at 7:10 PM, Jarek Poplawski <jarkao2@gmail.com> wrote:
> On Mon, Nov 30, 2009 at 12:26:33PM +0800, Changli Gao wrote:
>
> Users can control this with "r2q" and "quantum", and there is a hint
> on quantum size in the user's guide.
Yes. But I think most of users will ignore it like me.
>
>> And
>> if we use IMQ to shape traffic, the skb will be defragmented by
>> conntrack, and its size will be larger than MTU.
>
> IMQ is a very nice thing, but it's considered broken as well, so it
> can't be the reason for changing HTB.
I find IMQ is used by many network equipments Linux based. Why not fix
and integrate it into official Linux?
> And this patch is very similar, except ->peek()/dequeue(). Additional
> lookups are done instead of dequeuing the first found class, which
> might be quite long in some cases.
If the quantum is set correctly, there isn't difference except of a
comparison. In the other case, I think some additional CPU cycles are
better than overflow.
>
> It's not acceptable to me mainly because the real change done by this
> patch is different than you describe: preventing an overflow might be
> simple. You change the way DRR is implemented here, and even if it's
> right, it should be written explicitly and proved with tests results.
>
This way is used by CBQ.
> Anyway, I think you should rather care for the author's acceptance,
> because the way it's done doesn't look like accidental and has been
> heavily tested btw. (I added Martin to CC.)
>
> Regards,
> Jarek P.
>
> PS: Btw, this newer version of the patch is broken with spaces.
>
Thanks. It is why I ask for posting again.
--
Regards,
Changli Gao(xiaosuo@gmail.com)
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] sch_htb: ix the deficit overflows
2009-12-01 2:32 ` Changli Gao
@ 2009-12-01 8:01 ` Jarek Poplawski
2009-12-01 8:43 ` Jarek Poplawski
0 siblings, 1 reply; 14+ messages in thread
From: Jarek Poplawski @ 2009-12-01 8:01 UTC (permalink / raw)
To: Changli Gao; +Cc: Jamal Hadi Salim, David S. Miller, netdev, Martin Devera
On Tue, Dec 01, 2009 at 10:32:26AM +0800, Changli Gao wrote:
> On Mon, Nov 30, 2009 at 7:10 PM, Jarek Poplawski <jarkao2@gmail.com> wrote:
> > On Mon, Nov 30, 2009 at 12:26:33PM +0800, Changli Gao wrote:
> >
> > Users can control this with "r2q" and "quantum", and there is a hint
> > on quantum size in the user's guide.
>
> Yes. But I think most of users will ignore it like me.
In most cases this shouldn't matter. Default r2q/quantum should be
OK for higher rates, and lower ones (< 10pps) are probably controlled
mainly by their state, so even an overflowed deficit doesn't have to
matter (unless your tests show something else ;-).
In other cases those users should see some problems or quantum
warnings, and that's when they should stop ignoring the docs.
>
> >
> >> And
> >> if we use IMQ to shape traffic, the skb will be defragmented by
> >> conntrack, and its size will be larger than MTU.
> >
> > IMQ is a very nice thing, but it's considered broken as well, so it
> > can't be the reason for changing HTB.
>
> I find IMQ is used by many network equipments Linux based. Why not fix
> and integrate it into official Linux?
Even I ;-) don't know exact reasons, but I believe some people here
know better.
>
> > And this patch is very similar, except ->peek()/dequeue(). Additional
> > lookups are done instead of dequeuing the first found class, which
> > might be quite long in some cases.
>
> If the quantum is set correctly, there isn't difference except of a
> comparison. In the other case, I think some additional CPU cycles are
> better than overflow.
No, my main point is there _is_ a difference when the quantum is set
correctly. Just these additional lookups.
>
> >
> > It's not acceptable to me mainly because the real change done by this
> > patch is different than you describe: preventing an overflow might be
> > simple. You change the way DRR is implemented here, and even if it's
> > right, it should be written explicitly and proved with tests results.
> >
>
> This way is used by CBQ.
HTB is different by design:
http://luxik.cdi.cz/~devik/qos/htb/manual/theory.htm
Regards,
Jarek P.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] sch_htb: ix the deficit overflows
2009-12-01 8:01 ` Jarek Poplawski
@ 2009-12-01 8:43 ` Jarek Poplawski
2009-12-01 9:18 ` Changli Gao
2009-12-01 19:12 ` Jarek Poplawski
0 siblings, 2 replies; 14+ messages in thread
From: Jarek Poplawski @ 2009-12-01 8:43 UTC (permalink / raw)
To: Changli Gao; +Cc: Jamal Hadi Salim, David S. Miller, netdev, Martin Devera
On Tue, Dec 01, 2009 at 08:01:51AM +0000, Jarek Poplawski wrote:
> On Tue, Dec 01, 2009 at 10:32:26AM +0800, Changli Gao wrote:
> > On Mon, Nov 30, 2009 at 7:10 PM, Jarek Poplawski <jarkao2@gmail.com> wrote:
...
> > > And this patch is very similar, except ->peek()/dequeue(). Additional
> > > lookups are done instead of dequeuing the first found class, which
> > > might be quite long in some cases.
> >
> > If the quantum is set correctly, there isn't difference except of a
> > comparison. In the other case, I think some additional CPU cycles are
> > better than overflow.
>
> No, my main point is there _is_ a difference when the quantum is set
> correctly. Just these additional lookups.
And, again, there are less invasive ways to fix such overflow, like
htb_dequeue_tree()
{
...
if (likely(skb != NULL)) {
cl->un.leaf.deficit[level] -= qdisc_pkt_len(skb);
if (cl->un.leaf.deficit[level] < 0) {
cl->un.leaf.deficit[level] += cl->quantum;
+ if (cl->un.leaf.deficit[level] < 0)
+ cl->un.leaf.deficit[level] = -cl->quantum;
+ /* or other limit */
htb_next_rb_node((level ? cl->parent->un.inner.ptr : q->
ptr[0]) + prio);
}
/* this used to be after charge_class but this constelation
gives us slightly better performance */
if (!cl->un.leaf.q->q.qlen)
htb_deactivate(q, cl);
htb_charge_class(q, cl, level, skb);
}
return skb;
}
or even always zeroing cl->un.leaf.deficit[level] on activation or
deactivation (it's seems unlikely one activity period is enough for
such an overflow).
Jarek P.
^ permalink raw reply [flat|nested] 14+ messages in thread* Re: [PATCH] sch_htb: ix the deficit overflows
2009-12-01 8:43 ` Jarek Poplawski
@ 2009-12-01 9:18 ` Changli Gao
2009-12-01 9:39 ` Jarek Poplawski
2009-12-01 19:12 ` Jarek Poplawski
1 sibling, 1 reply; 14+ messages in thread
From: Changli Gao @ 2009-12-01 9:18 UTC (permalink / raw)
To: Jarek Poplawski; +Cc: Jamal Hadi Salim, David S. Miller, netdev, Martin Devera
On Tue, Dec 1, 2009 at 4:43 PM, Jarek Poplawski <jarkao2@gmail.com> wrote:
> On Tue, Dec 01, 2009 at 08:01:51AM +0000, Jarek Poplawski wrote:
>> On Tue, Dec 01, 2009 at 10:32:26AM +0800, Changli Gao wrote:
>> > On Mon, Nov 30, 2009 at 7:10 PM, Jarek Poplawski <jarkao2@gmail.com> wrote:
> ...
>> > > And this patch is very similar, except ->peek()/dequeue(). Additional
>> > > lookups are done instead of dequeuing the first found class, which
>> > > might be quite long in some cases.
>> >
>> > If the quantum is set correctly, there isn't difference except of a
>> > comparison. In the other case, I think some additional CPU cycles are
>> > better than overflow.
>>
>> No, my main point is there _is_ a difference when the quantum is set
>> correctly. Just these additional lookups.
>
> And, again, there are less invasive ways to fix such overflow, like
>
> htb_dequeue_tree()
> {
> ...
> if (likely(skb != NULL)) {
> cl->un.leaf.deficit[level] -= qdisc_pkt_len(skb);
> if (cl->un.leaf.deficit[level] < 0) {
> cl->un.leaf.deficit[level] += cl->quantum;
>
> + if (cl->un.leaf.deficit[level] < 0)
> + cl->un.leaf.deficit[level] = -cl->quantum;
How about this:
if (cl->un.leaf.deficit[level] < 0) {
cl->un.leaf.deficit[level] = 0;
if (!(cl->warned & HTB_WARN_QUANTUM_SMALL)) {
printk(KERN_WARNING
"HTB: quantum of class %X is small.
Consider r2q change.\n",
cl->common.classid);
cl->warned |= HTB_WARN_QUANTUM_SMALL;
}
}
> + /* or other limit */
>
> htb_next_rb_node((level ? cl->parent->un.inner.ptr : q->
> ptr[0]) + prio);
> }
> /* this used to be after charge_class but this constelation
> gives us slightly better performance */
> if (!cl->un.leaf.q->q.qlen)
> htb_deactivate(q, cl);
> htb_charge_class(q, cl, level, skb);
> }
> return skb;
> }
>
> or even always zeroing cl->un.leaf.deficit[level] on activation or
> deactivation (it's seems unlikely one activity period is enough for
> such an overflow).
>
I found this from http://luxik.cdi.cz/~devik/qos/htb/manual/theory.htm:
HTB uses "real" DRR as defined in [4]. CBQ in Linux uses one where
the quantum can be lower than MTU - it is more generic but it is also
no longer O(1) complexity. It also means that you have to use right
scale for rate->quantum conversion so that all quantums are larger
than MTU.
To keep it O(1) complexity, HTB requires users use the right scale for
quantum. So my first two patches are in the wrong direction.
--
Regards,
Changli Gao(xiaosuo@gmail.com)
^ permalink raw reply [flat|nested] 14+ messages in thread* Re: [PATCH] sch_htb: ix the deficit overflows
2009-12-01 9:18 ` Changli Gao
@ 2009-12-01 9:39 ` Jarek Poplawski
0 siblings, 0 replies; 14+ messages in thread
From: Jarek Poplawski @ 2009-12-01 9:39 UTC (permalink / raw)
To: Changli Gao; +Cc: Jamal Hadi Salim, David S. Miller, netdev, Martin Devera
On Tue, Dec 01, 2009 at 05:18:32PM +0800, Changli Gao wrote:
> On Tue, Dec 1, 2009 at 4:43 PM, Jarek Poplawski <jarkao2@gmail.com> wrote:
> > On Tue, Dec 01, 2009 at 08:01:51AM +0000, Jarek Poplawski wrote:
> >> On Tue, Dec 01, 2009 at 10:32:26AM +0800, Changli Gao wrote:
> >> > On Mon, Nov 30, 2009 at 7:10 PM, Jarek Poplawski <jarkao2@gmail.com> wrote:
> > ...
> >> > > And this patch is very similar, except ->peek()/dequeue(). Additional
> >> > > lookups are done instead of dequeuing the first found class, which
> >> > > might be quite long in some cases.
> >> >
> >> > If the quantum is set correctly, there isn't difference except of a
> >> > comparison. In the other case, I think some additional CPU cycles are
> >> > better than overflow.
> >>
> >> No, my main point is there _is_ a difference when the quantum is set
> >> correctly. Just these additional lookups.
> >
> > And, again, there are less invasive ways to fix such overflow, like
> >
> > htb_dequeue_tree()
> > {
> > ...
> > if (likely(skb != NULL)) {
> > cl->un.leaf.deficit[level] -= qdisc_pkt_len(skb);
> > if (cl->un.leaf.deficit[level] < 0) {
> > cl->un.leaf.deficit[level] += cl->quantum;
> >
> > + if (cl->un.leaf.deficit[level] < 0)
> > + cl->un.leaf.deficit[level] = -cl->quantum;
>
> How about this:
> if (cl->un.leaf.deficit[level] < 0) {
> cl->un.leaf.deficit[level] = 0;
> if (!(cl->warned & HTB_WARN_QUANTUM_SMALL)) {
> printk(KERN_WARNING
> "HTB: quantum of class %X is small.
> Consider r2q change.\n",
> cl->common.classid);
> cl->warned |= HTB_WARN_QUANTUM_SMALL;
> }
> }
I guess you mean q->warned. Maybe unlikely() would be useful too.
Otherwise, it's acceptable to me, especially when you write you really
hit this problem (not theoretical only ;-)
Jarek P.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] sch_htb: ix the deficit overflows
2009-12-01 8:43 ` Jarek Poplawski
2009-12-01 9:18 ` Changli Gao
@ 2009-12-01 19:12 ` Jarek Poplawski
2009-12-01 19:18 ` Jarek Poplawski
1 sibling, 1 reply; 14+ messages in thread
From: Jarek Poplawski @ 2009-12-01 19:12 UTC (permalink / raw)
To: Changli Gao; +Cc: Jamal Hadi Salim, David S. Miller, netdev, Martin Devera
Jarek Poplawski wrote, On 12/01/2009 09:43 AM:
> And, again, there are less invasive ways to fix such overflow, like
>
> htb_dequeue_tree()
> {
> ...
> if (likely(skb != NULL)) {
> cl->un.leaf.deficit[level] -= qdisc_pkt_len(skb);
> if (cl->un.leaf.deficit[level] < 0) {
> cl->un.leaf.deficit[level] += cl->quantum;
>
> + if (cl->un.leaf.deficit[level] < 0)
> + cl->un.leaf.deficit[level] = -cl->quantum;
> + /* or other limit */
>
> htb_next_rb_node((level ? cl->parent->un.inner.ptr : q->
> ptr[0]) + prio);
> }
> /* this used to be after charge_class but this constelation
> gives us slightly better performance */
> if (!cl->un.leaf.q->q.qlen)
> htb_deactivate(q, cl);
> htb_charge_class(q, cl, level, skb);
> }
> return skb;
> }
>
> or even always zeroing cl->un.leaf.deficit[level] on activation or
> deactivation (it's seems unlikely one activity period is enough for
> such an overflow).
So, as I wrote before, your proposal based on the first variant is
acceptable to me, but please reconsider the second idea too, e.g.
done like this:
}
/* this used to be after charge_class but this constelation
gives us slightly better performance */
if (!cl->un.leaf.q->q.qlen) {
htb_deactivate(q, cl);
+ if (unlikely(cl->un.leaf.deficit[level] < cl->quantum)) {
+ { a warning }
+ cl->un.leaf.deficit[level] = 0;
+ }
}
htb_charge_class(q, cl, level, skb);
}
return skb;
}
Jarek P.
^ permalink raw reply [flat|nested] 14+ messages in thread* Re: [PATCH] sch_htb: ix the deficit overflows
2009-12-01 19:12 ` Jarek Poplawski
@ 2009-12-01 19:18 ` Jarek Poplawski
0 siblings, 0 replies; 14+ messages in thread
From: Jarek Poplawski @ 2009-12-01 19:18 UTC (permalink / raw)
To: Changli Gao; +Cc: Jamal Hadi Salim, David S. Miller, netdev, Martin Devera
Jarek Poplawski wrote, On 12/01/2009 08:12 PM:
> So, as I wrote before, your proposal based on the first variant is
> acceptable to me, but please reconsider the second idea too, e.g.
> done like this:
>
> }
> /* this used to be after charge_class but this constelation
> gives us slightly better performance */
> if (!cl->un.leaf.q->q.qlen) {
> htb_deactivate(q, cl);
> + if (unlikely(cl->un.leaf.deficit[level] < cl->quantum)) {
Should be:
+ if (unlikely(cl->un.leaf.deficit[level] < -cl->quantum)) {
Jarek P.
> + { a warning }
> + cl->un.leaf.deficit[level] = 0;
> + }
> }
> htb_charge_class(q, cl, level, skb);
> }
> return skb;
> }
>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] sch_htb: ix the deficit overflows
2009-11-30 11:10 ` Jarek Poplawski
2009-12-01 2:32 ` Changli Gao
@ 2009-12-02 9:20 ` David Miller
2009-12-02 10:32 ` Jarek Poplawski
1 sibling, 1 reply; 14+ messages in thread
From: David Miller @ 2009-12-02 9:20 UTC (permalink / raw)
To: jarkao2; +Cc: xiaosuo, hadi, netdev, martin.devera
From: Jarek Poplawski <jarkao2@gmail.com>
Date: Mon, 30 Nov 2009 11:10:20 +0000
> On Mon, Nov 30, 2009 at 12:26:33PM +0800, Changli Gao wrote:
>> And
>> if we use IMQ to shape traffic, the skb will be defragmented by
>> conntrack, and its size will be larger than MTU.
>
> IMQ is a very nice thing, but it's considered broken as well, so it
> can't be the reason for changing HTB.
If you don't like IMQ, fine. Simply consider TSO and GSO as another
set of mechanisms that can introduce this condition.
Because we toss large SKBs all over the strack quite freely,
protections like those suggested by Changli make perfect sense.
We really don't have an MTU for packets within our stack any more.
The code, by default, need to be able to handle anything.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] sch_htb: ix the deficit overflows
2009-12-02 9:20 ` David Miller
@ 2009-12-02 10:32 ` Jarek Poplawski
2009-12-02 11:07 ` Martin Devera
0 siblings, 1 reply; 14+ messages in thread
From: Jarek Poplawski @ 2009-12-02 10:32 UTC (permalink / raw)
To: David Miller; +Cc: xiaosuo, hadi, netdev, martin.devera
On Wed, Dec 02, 2009 at 01:20:17AM -0800, David Miller wrote:
> From: Jarek Poplawski <jarkao2@gmail.com>
> Date: Mon, 30 Nov 2009 11:10:20 +0000
>
> > On Mon, Nov 30, 2009 at 12:26:33PM +0800, Changli Gao wrote:
> >> And
> >> if we use IMQ to shape traffic, the skb will be defragmented by
> >> conntrack, and its size will be larger than MTU.
> >
> > IMQ is a very nice thing, but it's considered broken as well, so it
> > can't be the reason for changing HTB.
>
> If you don't like IMQ, fine. Simply consider TSO and GSO as another
> set of mechanisms that can introduce this condition.
>
> Because we toss large SKBs all over the strack quite freely,
> protections like those suggested by Changli make perfect sense.
>
> We really don't have an MTU for packets within our stack any more.
> The code, by default, need to be able to handle anything.
Alas, an MTU is still a crucial parameter for some schedulers. Anyway,
I hope Changli wasn't mislead to treat my private opinions in this or
any other thread as decisive. Otherwise, I'm sorry.
Jarek P.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] sch_htb: ix the deficit overflows
2009-12-02 10:32 ` Jarek Poplawski
@ 2009-12-02 11:07 ` Martin Devera
0 siblings, 0 replies; 14+ messages in thread
From: Martin Devera @ 2009-12-02 11:07 UTC (permalink / raw)
To: Jarek Poplawski; +Cc: David Miller, xiaosuo, hadi, netdev
Jarek Poplawski wrote:
>>
>> Because we toss large SKBs all over the strack quite freely,
>> protections like those suggested by Changli make perfect sense.
>>
>> We really don't have an MTU for packets within our stack any more.
>> The code, by default, need to be able to handle anything.
>
> Alas, an MTU is still a crucial parameter for some schedulers. Anyway,
> I hope Changli wasn't mislead to treat my private opinions in this or
> any other thread as decisive. Otherwise, I'm sorry.
It seems that the new code is not slower when quantums are larger
than MTU's. Only when it is not the case, the new code introduces
loop which will iteratively increase deficit[x] until some packet
fits.
This can slow dequeue down. I'm not sure how much - but imagine
working setup with small quantums which works just now (only
sharing ratios are wrong).
After installing new kernel with the change the load could go
up and router could start dropping packets.
While I don't believe the load will change so much, Changli
should measure this one IMHO.
^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2009-12-02 11:07 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-11-27 8:14 [PATCH] sch_htb: ix the deficit overflows Changli Gao
2009-11-28 0:04 ` Jarek Poplawski
2009-11-30 4:26 ` Changli Gao
2009-11-30 11:10 ` Jarek Poplawski
2009-12-01 2:32 ` Changli Gao
2009-12-01 8:01 ` Jarek Poplawski
2009-12-01 8:43 ` Jarek Poplawski
2009-12-01 9:18 ` Changli Gao
2009-12-01 9:39 ` Jarek Poplawski
2009-12-01 19:12 ` Jarek Poplawski
2009-12-01 19:18 ` Jarek Poplawski
2009-12-02 9:20 ` David Miller
2009-12-02 10:32 ` Jarek Poplawski
2009-12-02 11:07 ` Martin Devera
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).