Re: bad networking related lag in v2.6.22-rc2

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Re: bad networking related lag in v2.6.22-rc2
       [not found]       ` <20070523063052.GB26814@elte.hu>
@ 2007-05-23 10:56         ` Patrick McHardy
  2007-05-23 11:05           ` Ingo Molnar
  2007-05-23 11:25           ` Herbert Xu
  0 siblings, 2 replies; 11+ messages in thread
From: Patrick McHardy @ 2007-05-23 10:56 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Anant Nitya, linux-kernel, Linus Torvalds, Andrew Morton,
	Thomas Gleixner, David S. Miller, Linux Netdev List, Herbert Xu

[-- Attachment #1: Type: text/plain, Size: 1275 bytes --]

Ingo Molnar wrote:
> if you feel inclined to try the git-bisection then by all means please 
> do it (it will certainly be helpful and educative), but it's optional: i 
> dont think you should 'need' to go through extra debugging chores, my 
> analysis based on the excellent trace you provided still holds and 
> whoever modified htb_dequeue()'s logic recently ought to be able to 
> figure that out (or send you a debug patch to further narrow the problem 
> down).
>
> The trace shows a _clearly_ anomalous loop: for example there's 56396 
> (!) calls to rb_first() in htb_dequeue() [without the kernel ever 
> exiting that function]:
> 
>   earth4:~/s> grep rb_first trace-to-ingo.txt  | wc -l
>   56396


How is this trace to be understood? Is it simply a call trace in
execution-order? If thats the case than we are exiting htb_dequeue,
each call to qdisc_watchdog_schedule happens at the very end of
that function, which would imply a bug in __qdisc_run.

Looking at the recent changes to __qdisc_run, this indeed seems
to be the case, when the qdisc is throttled and has packets queued
we return a value != 0, causing __qdisc_run to loop until all
packets have been sent, which may be a long time.

Anant, can you please verify by testing the attached patch? Thanks.

[-- Attachment #2: x --]
[-- Type: text/plain, Size: 318 bytes --]

diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index f28bb2d..f536060 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -174,7 +174,7 @@ requeue:
 
 out:
 	BUG_ON((int) q->q.qlen < 0);
-	return q->q.qlen;
+	return skb ? q->q.qlen : 0;
 }
 
 void __qdisc_run(struct net_device *dev)

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: bad networking related lag in v2.6.22-rc2
  2007-05-23 10:56         ` bad networking related lag in v2.6.22-rc2 Patrick McHardy
@ 2007-05-23 11:05           ` Ingo Molnar
  2007-05-23 11:25           ` Herbert Xu
  1 sibling, 0 replies; 11+ messages in thread
From: Ingo Molnar @ 2007-05-23 11:05 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: Anant Nitya, linux-kernel, Linus Torvalds, Andrew Morton,
	Thomas Gleixner, David S. Miller, Linux Netdev List, Herbert Xu


* Patrick McHardy <kaber@trash.net> wrote:

> How is this trace to be understood? Is it simply a call trace in 
> execution-order? [...]

yeah. There's a help section at the top of the trace which explains the 
other fields too:

                 _------=> CPU#
                / _-----=> irqs-off
               | / _----=> need-resched
               || / _---=> hardirq/softirq
               ||| / _--=> preempt-depth
               |||| /
               |||||     delay
   cmd     pid ||||| time  |   caller
      \   /    |||||   \   |   /
 privoxy-12926 1.Ns1    0us : ktime_get_ts (ktime_get)

the function name in braces is the parent function. So in this case the 
trace entry means we called ktime_get_ts() from ktime_get().

	Ingo

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: bad networking related lag in v2.6.22-rc2
  2007-05-23 10:56         ` bad networking related lag in v2.6.22-rc2 Patrick McHardy
  2007-05-23 11:05           ` Ingo Molnar
@ 2007-05-23 11:25           ` Herbert Xu
  2007-05-23 11:33             ` Patrick McHardy
  2007-05-23 11:40             ` Ingo Molnar
  1 sibling, 2 replies; 11+ messages in thread
From: Herbert Xu @ 2007-05-23 11:25 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: Ingo Molnar, Anant Nitya, linux-kernel, Linus Torvalds,
	Andrew Morton, Thomas Gleixner, David S. Miller,
	Linux Netdev List

On Wed, May 23, 2007 at 12:56:04PM +0200, Patrick McHardy wrote:
> 
> Looking at the recent changes to __qdisc_run, this indeed seems
> to be the case, when the qdisc is throttled and has packets queued
> we return a value != 0, causing __qdisc_run to loop until all
> packets have been sent, which may be a long time.

Good catch! I was obviously half awake at the time :)

We could also fix it this way:

[NET_SCHED]: Fix qdisc_restart return value when dequeue is empty

My previous patch that changed the return value of qdisc_restart
incorrectly made the case where dequeue returns empty continue
processing packets.

This patch is based on diagnosis and fix by Patrick McHardy.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index f28bb2d..cbefe22 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -169,8 +169,8 @@ requeue:
 		else
 			q->ops->requeue(skb, q);
 		netif_schedule(dev);
-		return 0;
 	}
+	return 0;
 
 out:
 	BUG_ON((int) q->q.qlen < 0);

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: bad networking related lag in v2.6.22-rc2
  2007-05-23 11:25           ` Herbert Xu
@ 2007-05-23 11:33             ` Patrick McHardy
  2007-05-23 15:00               ` Linus Torvalds
  2007-05-23 11:40             ` Ingo Molnar
  1 sibling, 1 reply; 11+ messages in thread
From: Patrick McHardy @ 2007-05-23 11:33 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Ingo Molnar, Anant Nitya, linux-kernel, Linus Torvalds,
	Andrew Morton, Thomas Gleixner, David S. Miller,
	Linux Netdev List

Herbert Xu wrote:
> On Wed, May 23, 2007 at 12:56:04PM +0200, Patrick McHardy wrote:
> 
>>Looking at the recent changes to __qdisc_run, this indeed seems
>>to be the case, when the qdisc is throttled and has packets queued
>>we return a value != 0, causing __qdisc_run to loop until all
>>packets have been sent, which may be a long time.
> 
> 
> Good catch! I was obviously half awake at the time :)
> 
> We could also fix it this way:


Yes, that looks better, thanks.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: bad networking related lag in v2.6.22-rc2
  2007-05-23 11:33             ` Patrick McHardy
@ 2007-05-23 15:00               ` Linus Torvalds
  2007-05-23 17:16                 ` Patrick McHardy
  0 siblings, 1 reply; 11+ messages in thread
From: Linus Torvalds @ 2007-05-23 15:00 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: Herbert Xu, Ingo Molnar, Anant Nitya, linux-kernel, Andrew Morton,
	Thomas Gleixner, David S. Miller, Linux Netdev List

On Wed, 23 May 2007, Patrick McHardy wrote:
> 
> Yes, that looks better, thanks.

There appear to be other obvious problems in the recent "cleanups" in this 
area..

Look at

	psched_tdiff_bounded(psched_time_t tv1, psched_time_t tv2, psched_time_t bound)
	{
		return min(tv1 - tv2, bound);
	}

and compare it to the previous code:

	#define PSCHED_TDIFF_SAFE(tv1, tv2, bound) \
		min_t(long long, (tv1) - (tv2), bound)

and ponder how that "trivial cleanup" totally broke the thing. 

Hint: "psched_time_t" is an "u64". What does that mean for

	min(tv1 - tv2, bound);

again, when "tv2" is larger than tv1. It _used_ to return a negative 
value. Now it returns a positive "bound" upper bound, because "tv1-tv2" 
will be used as a huge unsigned (and thus _positive_) integer. And was 
that accidental, or done on purpose?

Sounds accidental to me, since you then want to return a "psched_tdiff_t", 
which is typedeffed to be "long".

Doesn't sound very safe to me, especially since the commit message for 
this is "[NET_SCHED]: turn PSCHED_TDIFF_SAFE into inline function", and 
there's no indication that anybody realized that it changed semantics in 
the process.

Hmm? What _should_ that thing do?

		Linus

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: bad networking related lag in v2.6.22-rc2
  2007-05-23 15:00               ` Linus Torvalds
@ 2007-05-23 17:16                 ` Patrick McHardy
  0 siblings, 0 replies; 11+ messages in thread
From: Patrick McHardy @ 2007-05-23 17:16 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Herbert Xu, Ingo Molnar, Anant Nitya, linux-kernel, Andrew Morton,
	Thomas Gleixner, David S. Miller, Linux Netdev List

Linus Torvalds wrote:
> There appear to be other obvious problems in the recent "cleanups" in this 
> area..
> 
> Look at
> 
> 	psched_tdiff_bounded(psched_time_t tv1, psched_time_t tv2, psched_time_t bound)
> 	{
> 		return min(tv1 - tv2, bound);
> 	}
> 
> and compare it to the previous code:
> 
> 	#define PSCHED_TDIFF_SAFE(tv1, tv2, bound) \
> 		min_t(long long, (tv1) - (tv2), bound)
> 
> and ponder how that "trivial cleanup" totally broke the thing. 
> 
> Hint: "psched_time_t" is an "u64". What does that mean for
> 
> 	min(tv1 - tv2, bound);
> 
> again, when "tv2" is larger than tv1. It _used_ to return a negative 
> value. Now it returns a positive "bound" upper bound, because "tv1-tv2" 
> will be used as a huge unsigned (and thus _positive_) integer. And was 
> that accidental, or done on purpose?
> 
> Sounds accidental to me, since you then want to return a "psched_tdiff_t", 
> which is typedeffed to be "long".
>
> Doesn't sound very safe to me, especially since the commit message for 
> this is "[NET_SCHED]: turn PSCHED_TDIFF_SAFE into inline function", and 
> there's no indication that anybody realized that it changed semantics in 
> the process.


I did realize it, but tv2 > tv1 can't happen and makes no sense for
the users of this function. I probably should have provided a more
detailed changelog entry.

> Hmm? What _should_ that thing do?


It is used to calculate the amount of tokens a tocken bucket has
accumulated since the last refill, thus we always have tv1 >= tv2
(modulo ktime wraps). In fact tv2 > tv1 was never properly
supported. This macro would have returned the negative long long
value, but all users assign it to a psched_tdiff_t (long), so
depending on the exact values, it might still be interpreted as a
large positive value. Additionally there was a second implementation
for the gettimeofday clocksource that didn't return the negative
difference but the bound value.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: bad networking related lag in v2.6.22-rc2
  2007-05-23 11:25           ` Herbert Xu
  2007-05-23 11:33             ` Patrick McHardy
@ 2007-05-23 11:40             ` Ingo Molnar
  2007-05-23 21:30               ` David Miller
  1 sibling, 1 reply; 11+ messages in thread
From: Ingo Molnar @ 2007-05-23 11:40 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Patrick McHardy, Anant Nitya, linux-kernel, Linus Torvalds,
	Andrew Morton, Thomas Gleixner, David S. Miller,
	Linux Netdev List


* Herbert Xu <herbert@gondor.apana.org.au> wrote:

> [NET_SCHED]: Fix qdisc_restart return value when dequeue is empty
> 
> My previous patch that changed the return value of qdisc_restart 
> incorrectly made the case where dequeue returns empty continue 
> processing packets.
> 
> This patch is based on diagnosis and fix by Patrick McHardy.
> 
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

also:

  Reported-and-debugged-by: Anant Nitya <kernel@prachanda.info>

...

i gave your patch a quick test-boot and networking still works fine.

	Ingo

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: bad networking related lag in v2.6.22-rc2
  2007-05-23 11:40             ` Ingo Molnar
@ 2007-05-23 21:30               ` David Miller
  2007-05-24  5:41                 ` Patrick McHardy
  2007-05-24  7:12                 ` Anant Nitya
  0 siblings, 2 replies; 11+ messages in thread
From: David Miller @ 2007-05-23 21:30 UTC (permalink / raw)
  To: mingo; +Cc: herbert, kaber, kernel, linux-kernel, torvalds, akpm, tglx,
	netdev

From: Ingo Molnar <mingo@elte.hu>
Date: Wed, 23 May 2007 13:40:21 +0200

> 
> * Herbert Xu <herbert@gondor.apana.org.au> wrote:
> 
> > [NET_SCHED]: Fix qdisc_restart return value when dequeue is empty
> > 
> > My previous patch that changed the return value of qdisc_restart 
> > incorrectly made the case where dequeue returns empty continue 
> > processing packets.
> > 
> > This patch is based on diagnosis and fix by Patrick McHardy.
> > 
> > Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
> 
> also:
> 
>   Reported-and-debugged-by: Anant Nitya <kernel@prachanda.info>

Applied, thanks everyone.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: bad networking related lag in v2.6.22-rc2
  2007-05-23 21:30               ` David Miller
@ 2007-05-24  5:41                 ` Patrick McHardy
  2007-05-24  6:40                   ` David Miller
  2007-05-24  7:12                 ` Anant Nitya
  1 sibling, 1 reply; 11+ messages in thread
From: Patrick McHardy @ 2007-05-24  5:41 UTC (permalink / raw)
  To: David Miller
  Cc: mingo, herbert, kernel, linux-kernel, torvalds, akpm, tglx,
	netdev

[-- Attachment #1: Type: text/plain, Size: 259 bytes --]

David Miller wrote:
>>* Herbert Xu <herbert@gondor.apana.org.au> wrote:
>>
>>>[NET_SCHED]: Fix qdisc_restart return value when dequeue is empty
>
> Applied, thanks everyone.


Even though it didn't fix this problem, this patch I sent earlier is
also needed.


[-- Attachment #2: x --]
[-- Type: text/plain, Size: 1132 bytes --]

[NET_SCHED]: sch_htb: fix event cache time calculation

The event cache time must be an absolute value, when no event exists it is
incorrectly set to 1s instead of 1s in the future.

Signed-off-by: Patrick McHardy <kaber@trash.net>

---
commit 49d1023ea0ea8377e740123d5954e88a00f78b7c
tree 031c210f1b5e37ade5a4fa519f5808cd49225b89
parent 637fc540b0ad22bf7971929e906e704236af06cd
author Patrick McHardy <kaber@trash.net> Mon, 21 May 2007 23:24:16 +0200
committer Patrick McHardy <kaber@trash.net> Mon, 21 May 2007 23:25:51 +0200

 net/sched/sch_htb.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index 99bcec8..035788c 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -976,8 +976,9 @@ static struct sk_buff *htb_dequeue(struct Qdisc *sch)
 
 		if (q->now >= q->near_ev_cache[level]) {
 			event = htb_do_events(q, level);
-			q->near_ev_cache[level] = event ? event :
-							  PSCHED_TICKS_PER_SEC;
+			if (!event)
+				event = q->now + PSCHED_TICKS_PER_SEC;
+			q->near_ev_cache[level] = event;
 		} else
 			event = q->near_ev_cache[level];
 

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: bad networking related lag in v2.6.22-rc2
  2007-05-24  5:41                 ` Patrick McHardy
@ 2007-05-24  6:40                   ` David Miller
  0 siblings, 0 replies; 11+ messages in thread
From: David Miller @ 2007-05-24  6:40 UTC (permalink / raw)
  To: kaber; +Cc: mingo, herbert, kernel, linux-kernel, torvalds, akpm, tglx,
	netdev

From: Patrick McHardy <kaber@trash.net>
Date: Thu, 24 May 2007 07:41:00 +0200

> David Miller wrote:
> >>* Herbert Xu <herbert@gondor.apana.org.au> wrote:
> >>
> >>>[NET_SCHED]: Fix qdisc_restart return value when dequeue is empty
> >
> > Applied, thanks everyone.
> 
> 
> Even though it didn't fix this problem, this patch I sent earlier is
> also needed.

Thanks a lot for reminding me about this patch Patrick, applied.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: bad networking related lag in v2.6.22-rc2
  2007-05-23 21:30               ` David Miller
  2007-05-24  5:41                 ` Patrick McHardy
@ 2007-05-24  7:12                 ` Anant Nitya
  1 sibling, 0 replies; 11+ messages in thread
From: Anant Nitya @ 2007-05-24  7:12 UTC (permalink / raw)
  To: linux-kernel
  Cc: David Miller, mingo, herbert, kaber, torvalds, akpm, tglx, netdev

On Thursday 24 May 2007 03:00:56 David Miller wrote:
> From: Ingo Molnar <mingo@elte.hu>
> Date: Wed, 23 May 2007 13:40:21 +0200
>
> > * Herbert Xu <herbert@gondor.apana.org.au> wrote:
> > > [NET_SCHED]: Fix qdisc_restart return value when dequeue is empty
> > >
> > > My previous patch that changed the return value of qdisc_restart
> > > incorrectly made the case where dequeue returns empty continue
> > > processing packets.
> > >
> > > This patch is based on diagnosis and fix by Patrick McHardy.
> > >
> > > Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
> >
> > also:
> >
> >   Reported-and-debugged-by: Anant Nitya <kernel@prachanda.info>
>
> Applied, thanks everyone.

Networking lag I been seeing since 2.6.22-rc1, disappeared after applying this 
patch. Thanks to everyone who helped me run my system sane again. :)

Reagards
Ananitya

-- 
Out of many thousands, one may endeavor for perfection, and of
those who have achieved perfection, hardly one knows Me in truth.
				-- Gita Sutra Of Mysticism

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2007-05-24  7:14 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20070517174533.GA538@elte.hu>
     [not found] ` <200705221147.56571.kernel@prachanda.hub>
     [not found]   ` <20070522062233.GA20002@elte.hu>
     [not found]     ` <200705231110.44526.kernel@prachanda.hub>
     [not found]       ` <20070523063052.GB26814@elte.hu>
2007-05-23 10:56         ` bad networking related lag in v2.6.22-rc2 Patrick McHardy
2007-05-23 11:05           ` Ingo Molnar
2007-05-23 11:25           ` Herbert Xu
2007-05-23 11:33             ` Patrick McHardy
2007-05-23 15:00               ` Linus Torvalds
2007-05-23 17:16                 ` Patrick McHardy
2007-05-23 11:40             ` Ingo Molnar
2007-05-23 21:30               ` David Miller
2007-05-24  5:41                 ` Patrick McHardy
2007-05-24  6:40                   ` David Miller
2007-05-24  7:12                 ` Anant Nitya

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).