* [PATCH] MIPS: Kernel hangs occasionally during boot.
@ 2011-11-08 14:59 Al Cooper
2011-11-08 17:55 ` Ralf Baechle
0 siblings, 1 reply; 6+ messages in thread
From: Al Cooper @ 2011-11-08 14:59 UTC (permalink / raw)
To: ralf, linux-mips, linux-kernel; +Cc: Al Cooper
The kernel occasionally hangs during boot after
"Calibrating delay loop..". This is caused by the
c0_compare_int_usable() routine in cevt-r4k.c returning false, which
causes the system to disable the timer and hang later. The false
return happens because the routine uses a series of four calls to
irq_disable_hazard() as a delay while it waits for the timer changes
to propagate to the cp0 cause register. On newer MIPS cores, like the 74K,
each irq_disable_hazard() call turns into an ehb instruction, and all
four instructions can complete in as little as a few clock ticks. This
is not enough of a delay, so the routine thinks the timer is not working.
This fix instead waits for up to a maximum number of cycle counter ticks
and uses back_to_back_c0_hazard() instead of irq_disable_hazard() to
handle the hazard condition between cp0 writes and cp0 reads.
Signed-off-by: Al Cooper <alcooperx@gmail.com>
---
arch/mips/kernel/cevt-r4k.c | 38 +++++++++++++++++++-------------------
1 files changed, 19 insertions(+), 19 deletions(-)
diff --git a/arch/mips/kernel/cevt-r4k.c b/arch/mips/kernel/cevt-r4k.c
index 98c5a97..e2d8e19 100644
--- a/arch/mips/kernel/cevt-r4k.c
+++ b/arch/mips/kernel/cevt-r4k.c
@@ -103,19 +103,10 @@ static int c0_compare_int_pending(void)
/*
* Compare interrupt can be routed and latched outside the core,
- * so a single execution hazard barrier may not be enough to give
- * it time to clear as seen in the Cause register. 4 time the
- * pipeline depth seems reasonably conservative, and empirically
- * works better in configurations with high CPU/bus clock ratios.
+ * so wait up to worst case number of cycle counter ticks for timer interrupt
+ * changes to propagate to the cause register.
*/
-
-#define compare_change_hazard() \
- do { \
- irq_disable_hazard(); \
- irq_disable_hazard(); \
- irq_disable_hazard(); \
- irq_disable_hazard(); \
- } while (0)
+#define COMPARE_INT_SEEN_TICKS 50
int c0_compare_int_usable(void)
{
@@ -126,8 +117,12 @@ int c0_compare_int_usable(void)
* IP7 already pending? Try to clear it by acking the timer.
*/
if (c0_compare_int_pending()) {
- write_c0_compare(read_c0_count());
- compare_change_hazard();
+ cnt = read_c0_count();
+ write_c0_compare(cnt);
+ back_to_back_c0_hazard();
+ while (read_c0_count() < (cnt + COMPARE_INT_SEEN_TICKS))
+ if (!c0_compare_int_pending())
+ break;
if (c0_compare_int_pending())
return 0;
}
@@ -136,7 +131,7 @@ int c0_compare_int_usable(void)
cnt = read_c0_count();
cnt += delta;
write_c0_compare(cnt);
- compare_change_hazard();
+ back_to_back_c0_hazard();
if ((int)(read_c0_count() - cnt) < 0)
break;
/* increase delta if the timer was already expired */
@@ -145,12 +140,17 @@ int c0_compare_int_usable(void)
while ((int)(read_c0_count() - cnt) <= 0)
; /* Wait for expiry */
- compare_change_hazard();
+ while (read_c0_count() < (cnt + COMPARE_INT_SEEN_TICKS))
+ if (c0_compare_int_pending())
+ break;
if (!c0_compare_int_pending())
return 0;
-
- write_c0_compare(read_c0_count());
- compare_change_hazard();
+ cnt = read_c0_count();
+ write_c0_compare(cnt);
+ back_to_back_c0_hazard();
+ while (read_c0_count() < (cnt + COMPARE_INT_SEEN_TICKS))
+ if (!c0_compare_int_pending())
+ break;
if (c0_compare_int_pending())
return 0;
--
1.7.6
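
A minimal user-space sketch of the bounded polling the patch introduces, for illustration only. The sim_timer structure and the sim_* helpers are hypothetical stand-ins for the CP0 Count register and the Cause.IP7 bit; they are not kernel APIs. The sketch also bounds the wait with a subtract-then-compare test, the same idiom c0_compare_int_usable() already uses in (int)(read_c0_count() - cnt), so that the bound still holds if Count wraps during the wait. That differs from the comparison in the patch as posted and is an assumption of this sketch, not the applied code.

```c
#include <stdbool.h>
#include <stdint.h>

#define COMPARE_INT_SEEN_TICKS 50

/* Hypothetical model of the timer state; not a kernel structure. */
struct sim_timer {
	uint32_t count;    /* simulated CP0 Count register */
	uint32_t clear_at; /* Count value at which the pending bit drops */
};

/* Each read advances the simulated counter by one tick. */
static uint32_t sim_read_count(struct sim_timer *t)
{
	return t->count++;
}

/* Pending until Count reaches clear_at; signed difference survives wraparound. */
static bool sim_int_pending(const struct sim_timer *t)
{
	return (int32_t)(t->count - t->clear_at) < 0;
}

/*
 * Ack the interrupt, then poll for at most COMPARE_INT_SEEN_TICKS ticks.
 * 'latency' models how long the ack takes to show up in the pending bit.
 * Returns true if the pending bit was seen to clear within the bound.
 */
static bool sim_ack_and_wait(struct sim_timer *t, uint32_t latency)
{
	uint32_t start = sim_read_count(t);

	t->clear_at = start + latency;
	/* Elapsed ticks are computed by subtraction, so wraparound is harmless. */
	while ((uint32_t)(sim_read_count(t) - start) < COMPARE_INT_SEEN_TICKS)
		if (!sim_int_pending(t))
			break;

	return !sim_int_pending(t);
}
```

With a latency of 10 ticks the helper reports success; with a latency well above COMPARE_INT_SEEN_TICKS it gives up and reports failure, which corresponds to the "return 0" paths in the patch.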
* Re: [PATCH] MIPS: Kernel hangs occasionally during boot.
2011-11-08 14:59 [PATCH] MIPS: Kernel hangs occasionally during boot Al Cooper
@ 2011-11-08 17:55 ` Ralf Baechle
2011-11-09 7:40 ` Gleb O. Raiko
0 siblings, 1 reply; 6+ messages in thread
From: Ralf Baechle @ 2011-11-08 17:55 UTC (permalink / raw)
To: Al Cooper; +Cc: linux-mips, linux-kernel
On Tue, Nov 08, 2011 at 09:59:01AM -0500, Al Cooper wrote:
> arch/mips/kernel/cevt-r4k.c | 38 +++++++++++++++++++-------------------
> 1 files changed, 19 insertions(+), 19 deletions(-)
>
> diff --git a/arch/mips/kernel/cevt-r4k.c b/arch/mips/kernel/cevt-r4k.c
> index 98c5a97..e2d8e19 100644
> --- a/arch/mips/kernel/cevt-r4k.c
> +++ b/arch/mips/kernel/cevt-r4k.c
> @@ -103,19 +103,10 @@ static int c0_compare_int_pending(void)
>
> /*
> * Compare interrupt can be routed and latched outside the core,
> - * so a single execution hazard barrier may not be enough to give
> - * it time to clear as seen in the Cause register. 4 time the
> - * pipeline depth seems reasonably conservative, and empirically
> - * works better in configurations with high CPU/bus clock ratios.
> + * so wait up to worst case number of cycle counter ticks for timer interrupt
> + * changes to propagate to the cause register.
> */
> -
> -#define compare_change_hazard() \
> - do { \
> - irq_disable_hazard(); \
> - irq_disable_hazard(); \
> - irq_disable_hazard(); \
> - irq_disable_hazard(); \
> - } while (0)
> +#define COMPARE_INT_SEEN_TICKS 50
>
> int c0_compare_int_usable(void)
> {
> @@ -126,8 +117,12 @@ int c0_compare_int_usable(void)
> * IP7 already pending? Try to clear it by acking the timer.
> */
> if (c0_compare_int_pending()) {
> - write_c0_compare(read_c0_count());
> - compare_change_hazard();
> + cnt = read_c0_count();
> + write_c0_compare(cnt);
> + back_to_back_c0_hazard();
back_to_back_c0_hazard is to separate cp0 writes from subsequent reads from
the same cp0 register. So I think no back_to_back_c0_hazard() is needed
here.
> + while (read_c0_count() < (cnt + COMPARE_INT_SEEN_TICKS))
> + if (!c0_compare_int_pending())
> + break;
> if (c0_compare_int_pending())
> return 0;
> }
> @@ -136,7 +131,7 @@ int c0_compare_int_usable(void)
> cnt = read_c0_count();
> cnt += delta;
> write_c0_compare(cnt);
> - compare_change_hazard();
> + back_to_back_c0_hazard();
Same comment as above.
> if ((int)(read_c0_count() - cnt) < 0)
> break;
> /* increase delta if the timer was already expired */
> @@ -145,12 +140,17 @@ int c0_compare_int_usable(void)
> while ((int)(read_c0_count() - cnt) <= 0)
> ; /* Wait for expiry */
>
> - compare_change_hazard();
> + while (read_c0_count() < (cnt + COMPARE_INT_SEEN_TICKS))
> + if (c0_compare_int_pending())
> + break;
> if (!c0_compare_int_pending())
> return 0;
> -
> - write_c0_compare(read_c0_count());
> - compare_change_hazard();
> + cnt = read_c0_count();
> + write_c0_compare(cnt);
> + back_to_back_c0_hazard();
> + while (read_c0_count() < (cnt + COMPARE_INT_SEEN_TICKS))
> + if (!c0_compare_int_pending())
> + break;
> if (c0_compare_int_pending())
> return 0;
I've applied your patch but we may need another hazard barrier to
replace back_to_back_c0_hazard().
Ralf
* Re: [PATCH] MIPS: Kernel hangs occasionally during boot.
2011-11-08 17:55 ` Ralf Baechle
@ 2011-11-09 7:40 ` Gleb O. Raiko
2011-11-09 9:13 ` Ralf Baechle
2011-11-09 10:34 ` Ralf Baechle
0 siblings, 2 replies; 6+ messages in thread
From: Gleb O. Raiko @ 2011-11-09 7:40 UTC (permalink / raw)
To: Ralf Baechle; +Cc: Al Cooper, linux-mips, linux-kernel
On 08.11.2011 21:55, Ralf Baechle wrote:
> but we may need another hazard barrier to
> replace back_to_back_c0_hazard().
Urgently. We need some ticks to wait until the counter state machine has
been updated. The number of ticks may happen to be the same as for
back_to_back_c0_hazard on some cpus. It's completely different for
others, I'm sure. The original compare_change_hazard waits up to 12 ticks
on r4k. While I don't think this number should depend on irq_disable_hazard
as the old code assumes, we may still need 12 or so ticks for old cpus.
> Author: Al Cooper <alcooperx@gmail.com> Tue Nov 8 09:59:01 2011 -0500
> Comitter: Ralf Baechle <ralf@linux-mips.org> Tue Nov 8 16:52:51 2011 +0000
> Commit: 9121470d99c029493bd55daa11607b398fe9aea3
> Gitweb: http://git.linux-mips.org/g/linux/9121470d
Could you fix those links? They have been broken since you moved the git repo.
Regards,
Gleb.
* Re: [PATCH] MIPS: Kernel hangs occasionally during boot.
2011-11-09 7:40 ` Gleb O. Raiko
@ 2011-11-09 9:13 ` Ralf Baechle
2011-11-09 10:34 ` Ralf Baechle
1 sibling, 0 replies; 6+ messages in thread
From: Ralf Baechle @ 2011-11-09 9:13 UTC (permalink / raw)
To: Gleb O. Raiko; +Cc: Al Cooper, linux-mips, linux-kernel
On Wed, Nov 09, 2011 at 11:40:21AM +0400, Gleb O. Raiko wrote:
> >Author: Al Cooper <alcooperx@gmail.com> Tue Nov 8 09:59:01 2011 -0500
> >Comitter: Ralf Baechle <ralf@linux-mips.org> Tue Nov 8 16:52:51 2011 +0000
> >Commit: 9121470d99c029493bd55daa11607b398fe9aea3
> >Gitweb: http://git.linux-mips.org/g/linux/9121470d
> Could you fix those links? They have been broken since you moved the git repo.
Future emails will use URLs that look like
http://git.linux-mips.org/g/ralf/linux/9121470d
with the full repository path sans the .git suffix. There also is a
compat hack to keep the URLs from a few thousand old commit mails working.
Thanks for reporting!
Ralf
* Re: [PATCH] MIPS: Kernel hangs occasionally during boot.
2011-11-09 7:40 ` Gleb O. Raiko
2011-11-09 9:13 ` Ralf Baechle
@ 2011-11-09 10:34 ` Ralf Baechle
2011-11-09 11:26 ` Gleb O. Raiko
1 sibling, 1 reply; 6+ messages in thread
From: Ralf Baechle @ 2011-11-09 10:34 UTC (permalink / raw)
To: Gleb O. Raiko; +Cc: Al Cooper, linux-mips, linux-kernel
On Wed, Nov 09, 2011 at 11:40:21AM +0400, Gleb O. Raiko wrote:
> On 08.11.2011 21:55, Ralf Baechle wrote:
> >but we may need another hazard barrier to
> >replace back_to_back_c0_hazard().
> Urgently. We need some ticks to wait until the counter state machine has
> been updated. The number of ticks may happen to be the same as for
> back_to_back_c0_hazard on some cpus. It's completely different for
> others, I'm sure. The original compare_change_hazard waits up to 12
> ticks on r4k. While I don't think this number should depend on
> irq_disable_hazard as the old code assumes, we may still need 12 or so
> ticks for old cpus.
Hmm... Looking at the R4000 manual, which generally has the longest
pipeline hazards, mtc0 gets executed at stage 7 and interrupts get sampled
at stage 3, meaning there is a (7 - 3 - 1) = 3 cycle hazard. Does
that one satisfy your constraints? Or are additional cycles needed
for a hazard that's generated outside of the CPU's pipeline?
Ralf
* Re: [PATCH] MIPS: Kernel hangs occasionally during boot.
2011-11-09 10:34 ` Ralf Baechle
@ 2011-11-09 11:26 ` Gleb O. Raiko
0 siblings, 0 replies; 6+ messages in thread
From: Gleb O. Raiko @ 2011-11-09 11:26 UTC (permalink / raw)
To: Ralf Baechle; +Cc: Al Cooper, linux-mips, linux-kernel
On 09.11.2011 14:34, Ralf Baechle wrote:
> Hmm... Looking at the R4000 manual, which generally has the longest
> pipeline hazards, mtc0 gets executed at stage 7 and interrupts get sampled
> at stage 3, meaning there is a (7 - 3 - 1) = 3 cycle hazard. Does
> that one satisfy your constraints? Or are additional cycles needed
> for a hazard that's generated outside of the CPU's pipeline?
In fact, the current back_to_back_c0_hazard is more than enough for the cpus
I deal with. I guess the required wait equals the number of stages between
the EX (or RD) and WB stages on modern cpus, because CP0 CAUSE is updated
during WB nowadays.
I suspect the time required to update the internal counter logic on the
original r4k might be longer, though. At least the old code waited 12 cycles
(4 * irq_disable_hazard, which is 3 cycles on r4k). Perhaps we should keep
that code and insert the same number of nops for old cpus at least.
Regards,
Gleb.
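
The cycle counts quoted in this thread can be put side by side in a short sketch. The function names below are hypothetical and exist only for this illustration; the stage numbers (7 and 3), the four irq_disable_hazard() calls at 3 cycles each on r4k, and the COMPARE_INT_SEEN_TICKS value of 50 are taken from the messages and the patch above. Pipeline cycles and Count ticks may not be the same unit on every core, so the comparison is only indicative.

```c
#define COMPARE_INT_SEEN_TICKS 50 /* upper bound from the patch */

/*
 * Hazard between an mtc0 taking effect and interrupts being sampled,
 * computed as in the thread: (mtc0 stage - sample stage - 1).
 */
static int pipeline_hazard_cycles(int mtc0_stage, int irq_sample_stage)
{
	return mtc0_stage - irq_sample_stage - 1;
}

/* Delay of the removed compare_change_hazard(): N hazard calls of M cycles. */
static int old_delay_cycles(int hazard_calls, int cycles_per_call)
{
	return hazard_calls * cycles_per_call;
}

/* True if the new polling bound is at least as long as the old fixed delay. */
static int new_bound_covers_old_delay(int hazard_calls, int cycles_per_call)
{
	return COMPARE_INT_SEEN_TICKS >=
	       old_delay_cycles(hazard_calls, cycles_per_call);
}
```

For the R4000 figures given above, pipeline_hazard_cycles(7, 3) yields 3 and old_delay_cycles(4, 3) yields 12, so under the stated assumption the 50-tick polling bound is longer than the 12 cycles the old code waited.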