* [PATCH] mv643xx_eth: fix SMI bus access timeouts
@ 2008-11-01 5:32 Lennert Buytenhek
2008-11-01 5:47 ` Roland Dreier
` (2 more replies)
0 siblings, 3 replies; 5+ messages in thread
From: Lennert Buytenhek @ 2008-11-01 5:32 UTC (permalink / raw)
To: jeff; +Cc: netdev
The mv643xx_eth mii bus implementation uses wait_event_timeout() to
wait for SMI completion interrupts.
If wait_event_timeout() would return zero, mv643xx_eth would conclude
that the SMI access timed out, but this is not necessarily true --
wait_event_timeout() can also return zero in the case where the SMI
completion interrupt did happen in time but where it took longer than
the requested timeout for the process performing the SMI access to be
scheduled again. This would lead to occasional SMI access timeouts
when the system would be under heavy load.
The fix is to ignore the return value of wait_event_timeout(), and
to re-check the SMI done bit after wait_event_timeout() returns to
determine whether or not the SMI access timed out.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
---
The commit that introduced this was added in the .28 dev cycle, so
this fix is for .28 only. Thanks!
diff --git a/drivers/net/mv643xx_eth.c b/drivers/net/mv643xx_eth.c
index d25a302..0e94ed3 100644
--- a/drivers/net/mv643xx_eth.c
+++ b/drivers/net/mv643xx_eth.c
@@ -1065,9 +1065,12 @@ static int smi_wait_ready(struct mv643xx_eth_shared_private *msp)
return 0;
}
- if (!wait_event_timeout(msp->smi_busy_wait, smi_is_done(msp),
- msecs_to_jiffies(100)))
- return -ETIMEDOUT;
+ if (!smi_is_done(msp)) {
+ wait_event_timeout(msp->smi_busy_wait, smi_is_done(msp),
+ msecs_to_jiffies(100));
+ if (!smi_is_done(msp))
+ return -ETIMEDOUT;
+ }
return 0;
}
^ permalink raw reply related [flat|nested] 5+ messages in thread* Re: [PATCH] mv643xx_eth: fix SMI bus access timeouts
2008-11-01 5:32 [PATCH] mv643xx_eth: fix SMI bus access timeouts Lennert Buytenhek
@ 2008-11-01 5:47 ` Roland Dreier
2008-11-01 5:53 ` Lennert Buytenhek
2008-11-01 5:48 ` Lennert Buytenhek
2008-11-03 20:25 ` Jeff Garzik
2 siblings, 1 reply; 5+ messages in thread
From: Roland Dreier @ 2008-11-01 5:47 UTC (permalink / raw)
To: Lennert Buytenhek; +Cc: jeff, netdev
> If wait_event_timeout() would return zero, mv643xx_eth would conclude
> that the SMI access timed out, but this is not necessarily true --
> wait_event_timeout() can also return zero in the case where the SMI
> completion interrupt did happen in time but where it took longer than
> the requested timeout for the process performing the SMI access to be
> scheduled again. This would lead to occasional SMI access timeouts
> when the system would be under heavy load.
Would it make more sense to fix this in the wait_event_timeout() code
itself a la bb10ed09 ("sched: fix wait_for_completion_timeout() spurious
failure under heavy load")?
- R.
^ permalink raw reply [flat|nested] 5+ messages in thread* Re: [PATCH] mv643xx_eth: fix SMI bus access timeouts
2008-11-01 5:47 ` Roland Dreier
@ 2008-11-01 5:53 ` Lennert Buytenhek
0 siblings, 0 replies; 5+ messages in thread
From: Lennert Buytenhek @ 2008-11-01 5:53 UTC (permalink / raw)
To: Roland Dreier; +Cc: jeff, netdev
On Fri, Oct 31, 2008 at 10:47:47PM -0700, Roland Dreier wrote:
> > If wait_event_timeout() would return zero, mv643xx_eth would conclude
> > that the SMI access timed out, but this is not necessarily true --
> > wait_event_timeout() can also return zero in the case where the SMI
> > completion interrupt did happen in time but where it took longer than
> > the requested timeout for the process performing the SMI access to be
> > scheduled again. This would lead to occasional SMI access timeouts
> > when the system would be under heavy load.
>
> Would it make more sense to fix this in the wait_event_timeout() code
> itself a la bb10ed09 ("sched: fix wait_for_completion_timeout() spurious
> failure under heavy load")?
Well, wait_event_timeout() does (or did, before that commit) exactly
what its docbook comment says it does:
* The function returns 0 if the @timeout elapsed, and the remaining
* jiffies if the condition evaluated to true before the timeout elapsed.
Making it return 1 jiffy seems a bit hacky. Why not go all the way
and just make it return 0 or 1 in all cases and audit all the callers
(and update the docbook)?
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] mv643xx_eth: fix SMI bus access timeouts
2008-11-01 5:32 [PATCH] mv643xx_eth: fix SMI bus access timeouts Lennert Buytenhek
2008-11-01 5:47 ` Roland Dreier
@ 2008-11-01 5:48 ` Lennert Buytenhek
2008-11-03 20:25 ` Jeff Garzik
2 siblings, 0 replies; 5+ messages in thread
From: Lennert Buytenhek @ 2008-11-01 5:48 UTC (permalink / raw)
To: jeff; +Cc: netdev
On Sat, Nov 01, 2008 at 06:32:20AM +0100, Lennert Buytenhek wrote:
> The mv643xx_eth mii bus implementation uses wait_event_timeout() to
> wait for SMI completion interrupts.
>
> If wait_event_timeout() would return zero, mv643xx_eth would conclude
> that the SMI access timed out, but this is not necessarily true --
> wait_event_timeout() can also return zero in the case where the SMI
> completion interrupt did happen in time but where it took longer than
> the requested timeout for the process performing the SMI access to be
> scheduled again. This would lead to occasional SMI access timeouts
> when the system would be under heavy load.
FWIW, where I've been seeing this is mostly during heavy softirq
load, e.g. when doing routing when the box can't keep up.
When the system is being hammered like this, simple things like
querying the switch chip for its statistics counters (by doing
"ethtool -S <switch interface>") can take seconds, since querying the
hardware switch stats consists of doing a lot of MII accesses, and
each of those MII accesses takes tens of milliseconds to return because
the issuer of the MII access goes to sleep after issuing the MII access
waiting for an MII done interrupt, but won't get scheduled again to
issue its next MII access until the rx softirq has decided that it has
done enough looping.
This patch makes it a lot more bearable -- but it's still only a bit
of a stopgap:
diff --git a/kernel/softirq.c b/kernel/softirq.c
index c506f26..f7fd630 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -215,7 +215,7 @@ restart:
local_irq_disable();
pending = local_softirq_pending();
- if (pending && --max_restart)
+ if (pending && !need_resched() && --max_restart)
goto restart;
if (pending)
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH] mv643xx_eth: fix SMI bus access timeouts
2008-11-01 5:32 [PATCH] mv643xx_eth: fix SMI bus access timeouts Lennert Buytenhek
2008-11-01 5:47 ` Roland Dreier
2008-11-01 5:48 ` Lennert Buytenhek
@ 2008-11-03 20:25 ` Jeff Garzik
2 siblings, 0 replies; 5+ messages in thread
From: Jeff Garzik @ 2008-11-03 20:25 UTC (permalink / raw)
To: Lennert Buytenhek; +Cc: netdev
Lennert Buytenhek wrote:
> The mv643xx_eth mii bus implementation uses wait_event_timeout() to
> wait for SMI completion interrupts.
>
> If wait_event_timeout() would return zero, mv643xx_eth would conclude
> that the SMI access timed out, but this is not necessarily true --
> wait_event_timeout() can also return zero in the case where the SMI
> completion interrupt did happen in time but where it took longer than
> the requested timeout for the process performing the SMI access to be
> scheduled again. This would lead to occasional SMI access timeouts
> when the system would be under heavy load.
>
> The fix is to ignore the return value of wait_event_timeout(), and
> to re-check the SMI done bit after wait_event_timeout() returns to
> determine whether or not the SMI access timed out.
>
> Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
> ---
> The commit that introduced this was added in the .28 dev cycle, so
> this fix is for .28 only. Thanks!
>
> diff --git a/drivers/net/mv643xx_eth.c b/drivers/net/mv643xx_eth.c
> index d25a302..0e94ed3 100644
> --- a/drivers/net/mv643xx_eth.c
> +++ b/drivers/net/mv643xx_eth.c
> @@ -1065,9 +1065,12 @@ static int smi_wait_ready(struct mv643xx_eth_shared_private *msp)
> return 0;
> }
>
> - if (!wait_event_timeout(msp->smi_busy_wait, smi_is_done(msp),
> - msecs_to_jiffies(100)))
> - return -ETIMEDOUT;
> + if (!smi_is_done(msp)) {
> + wait_event_timeout(msp->smi_busy_wait, smi_is_done(msp),
> + msecs_to_jiffies(100));
> + if (!smi_is_done(msp))
> + return -ETIMEDOUT;
> + }
applied
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2008-11-03 20:25 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-11-01 5:32 [PATCH] mv643xx_eth: fix SMI bus access timeouts Lennert Buytenhek
2008-11-01 5:47 ` Roland Dreier
2008-11-01 5:53 ` Lennert Buytenhek
2008-11-01 5:48 ` Lennert Buytenhek
2008-11-03 20:25 ` Jeff Garzik
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).