[PATCH] MIPS: Kernel hangs occasionally during boot.

public inbox for linux-kernel@vger.kernel.org

* [PATCH] MIPS: Kernel hangs occasionally during boot.
@ 2011-11-08 14:59 Al Cooper
  2011-11-08 17:55 ` Ralf Baechle
  0 siblings, 1 reply; 6+ messages in thread
From: Al Cooper @ 2011-11-08 14:59 UTC
  To: ralf, linux-mips, linux-kernel; +Cc: Al Cooper

The kernel occasionally hangs during boot after
"Calibrating delay loop..". This is caused by the
c0_compare_int_usable() routine in cevt-r4k.c returning false, which
causes the system to disable the timer and hang later. The false
return happens because the routine uses a series of four calls to
irq_disable_hazard() as a delay while it waits for the timer changes
to propagate to the cp0 Cause register. On newer MIPS cores, like the 74K,
each irq_disable_hazard() call turns into an ehb instruction, and all
four instructions can complete in as little as a few clock ticks. This
is not enough of a delay, so the routine thinks the timer is not working.
This fix waits up to a maximum number of cycle counter ticks instead
and uses back_to_back_c0_hazard() instead of irq_disable_hazard() to
handle the hazard condition between cp0 writes and cp0 reads.

Signed-off-by: Al Cooper <alcooperx@gmail.com>
---
 arch/mips/kernel/cevt-r4k.c |   38 +++++++++++++++++++-------------------
 1 files changed, 19 insertions(+), 19 deletions(-)

diff --git a/arch/mips/kernel/cevt-r4k.c b/arch/mips/kernel/cevt-r4k.c
index 98c5a97..e2d8e19 100644
--- a/arch/mips/kernel/cevt-r4k.c
+++ b/arch/mips/kernel/cevt-r4k.c
@@ -103,19 +103,10 @@ static int c0_compare_int_pending(void)
 
 /*
  * Compare interrupt can be routed and latched outside the core,
- * so a single execution hazard barrier may not be enough to give
- * it time to clear as seen in the Cause register.  4 time the
- * pipeline depth seems reasonably conservative, and empirically
- * works better in configurations with high CPU/bus clock ratios.
+ * so wait up to worst case number of cycle counter ticks for timer interrupt
+ * changes to propagate to the cause register.
  */
-
-#define compare_change_hazard() \
-	do { \
-		irq_disable_hazard(); \
-		irq_disable_hazard(); \
-		irq_disable_hazard(); \
-		irq_disable_hazard(); \
-	} while (0)
+#define COMPARE_INT_SEEN_TICKS 50
 
 int c0_compare_int_usable(void)
 {
@@ -126,8 +117,12 @@ int c0_compare_int_usable(void)
 	 * IP7 already pending?  Try to clear it by acking the timer.
 	 */
 	if (c0_compare_int_pending()) {
-		write_c0_compare(read_c0_count());
-		compare_change_hazard();
+		cnt = read_c0_count();
+		write_c0_compare(cnt);
+		back_to_back_c0_hazard();
+		while (read_c0_count() < (cnt  + COMPARE_INT_SEEN_TICKS))
+			if (!c0_compare_int_pending())
+				break;
 		if (c0_compare_int_pending())
 			return 0;
 	}
@@ -136,7 +131,7 @@ int c0_compare_int_usable(void)
 		cnt = read_c0_count();
 		cnt += delta;
 		write_c0_compare(cnt);
-		compare_change_hazard();
+		back_to_back_c0_hazard();
 		if ((int)(read_c0_count() - cnt) < 0)
 		    break;
 		/* increase delta if the timer was already expired */
@@ -145,12 +140,17 @@ int c0_compare_int_usable(void)
 	while ((int)(read_c0_count() - cnt) <= 0)
 		;	/* Wait for expiry  */
 
-	compare_change_hazard();
+	while (read_c0_count() < (cnt + COMPARE_INT_SEEN_TICKS))
+		if (c0_compare_int_pending())
+			break;
 	if (!c0_compare_int_pending())
 		return 0;
-
-	write_c0_compare(read_c0_count());
-	compare_change_hazard();
+	cnt = read_c0_count();
+	write_c0_compare(cnt);
+	back_to_back_c0_hazard();
+	while (read_c0_count() < (cnt + COMPARE_INT_SEEN_TICKS))
+		if (!c0_compare_int_pending())
+			break;
 	if (c0_compare_int_pending())
 		return 0;
 
-- 
1.7.6
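
The bounded wait described in the commit message can be sketched in user
space as follows. This is only an illustration under stated assumptions:
sim_count, sim_clear_at, sim_read_count(), sim_int_pending() and
wait_for_clear() are hypothetical stand-ins for the cp0 Count register,
the Cause.IP7 bit and the polling loop, and none of them exist in the
kernel. The sketch also differs from the patch in one respect: it bounds
the wait with the signed-difference comparison that
c0_compare_int_usable() already uses for its expiry check, because a
plain unsigned "<" against cnt + COMPARE_INT_SEEN_TICKS can end the wait
early when the sum wraps past 2^32.

```c
#include <stdbool.h>
#include <stdint.h>

#define COMPARE_INT_SEEN_TICKS 50

/* Hypothetical stand-in for the free-running cp0 Count register. */
static uint32_t sim_count;
/* Hypothetical Count value at which the pending bit is seen to clear. */
static uint32_t sim_clear_at;

/* Each read advances the simulated counter by one tick. */
static uint32_t sim_read_count(void)
{
	return sim_count++;
}

/* Pending until the counter reaches sim_clear_at (wrap-safe compare). */
static bool sim_int_pending(void)
{
	return (int32_t)(sim_count - sim_clear_at) < 0;
}

/*
 * Poll for the pending bit to clear, giving up once
 * COMPARE_INT_SEEN_TICKS counter ticks have elapsed since 'start'.
 * The signed difference stays correct across counter wraparound.
 */
static bool wait_for_clear(uint32_t start)
{
	while ((int32_t)(sim_read_count() - start) < COMPARE_INT_SEEN_TICKS)
		if (!sim_int_pending())
			return true;
	return !sim_int_pending();
}
```

With the counter started just below the wrap point and the bit clearing
20 ticks later, wait_for_clear() still reports success; if the bit never
clears within the window, it gives up after the tick budget is spent.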



* Re: [PATCH] MIPS: Kernel hangs occasionally during boot.
  2011-11-08 14:59 [PATCH] MIPS: Kernel hangs occasionally during boot Al Cooper
@ 2011-11-08 17:55 ` Ralf Baechle
  2011-11-09  7:40   ` Gleb O. Raiko
  0 siblings, 1 reply; 6+ messages in thread
From: Ralf Baechle @ 2011-11-08 17:55 UTC
  To: Al Cooper; +Cc: linux-mips, linux-kernel

On Tue, Nov 08, 2011 at 09:59:01AM -0500, Al Cooper wrote:

>  arch/mips/kernel/cevt-r4k.c |   38 +++++++++++++++++++-------------------
>  1 files changed, 19 insertions(+), 19 deletions(-)
> 
> diff --git a/arch/mips/kernel/cevt-r4k.c b/arch/mips/kernel/cevt-r4k.c
> index 98c5a97..e2d8e19 100644
> --- a/arch/mips/kernel/cevt-r4k.c
> +++ b/arch/mips/kernel/cevt-r4k.c
> @@ -103,19 +103,10 @@ static int c0_compare_int_pending(void)
>  
>  /*
>   * Compare interrupt can be routed and latched outside the core,
> - * so a single execution hazard barrier may not be enough to give
> - * it time to clear as seen in the Cause register.  4 time the
> - * pipeline depth seems reasonably conservative, and empirically
> - * works better in configurations with high CPU/bus clock ratios.
> + * so wait up to worst case number of cycle counter ticks for timer interrupt
> + * changes to propagate to the cause register.
>   */
> -
> -#define compare_change_hazard() \
> -	do { \
> -		irq_disable_hazard(); \
> -		irq_disable_hazard(); \
> -		irq_disable_hazard(); \
> -		irq_disable_hazard(); \
> -	} while (0)
> +#define COMPARE_INT_SEEN_TICKS 50
>  
>  int c0_compare_int_usable(void)
>  {
> @@ -126,8 +117,12 @@ int c0_compare_int_usable(void)
>  	 * IP7 already pending?  Try to clear it by acking the timer.
>  	 */
>  	if (c0_compare_int_pending()) {
> -		write_c0_compare(read_c0_count());
> -		compare_change_hazard();
> +		cnt = read_c0_count();
> +		write_c0_compare(cnt);
> +		back_to_back_c0_hazard();

back_to_back_c0_hazard is to separate cp0 writes from subsequent reads from
the same cp0 register.  So I think no back_to_back_c0_hazard() is needed
here.

> +		while (read_c0_count() < (cnt  + COMPARE_INT_SEEN_TICKS))
> +			if (!c0_compare_int_pending())
> +				break;
>  		if (c0_compare_int_pending())
>  			return 0;
>  	}
> @@ -136,7 +131,7 @@ int c0_compare_int_usable(void)
>  		cnt = read_c0_count();
>  		cnt += delta;
>  		write_c0_compare(cnt);
> -		compare_change_hazard();
> +		back_to_back_c0_hazard();

Same comment as above.

>  		if ((int)(read_c0_count() - cnt) < 0)
>  		    break;
>  		/* increase delta if the timer was already expired */
> @@ -145,12 +140,17 @@ int c0_compare_int_usable(void)
>  	while ((int)(read_c0_count() - cnt) <= 0)
>  		;	/* Wait for expiry  */
>  
> -	compare_change_hazard();
> +	while (read_c0_count() < (cnt + COMPARE_INT_SEEN_TICKS))
> +		if (c0_compare_int_pending())
> +			break;
>  	if (!c0_compare_int_pending())
>  		return 0;
> -
> -	write_c0_compare(read_c0_count());
> -	compare_change_hazard();
> +	cnt = read_c0_count();
> +	write_c0_compare(cnt);
> +	back_to_back_c0_hazard();
> +	while (read_c0_count() < (cnt + COMPARE_INT_SEEN_TICKS))
> +		if (!c0_compare_int_pending())
> +			break;
>  	if (c0_compare_int_pending())
>  		return 0;

I've applied your patch but we may need another hazard barrier to
replace back_to_back_c0_hazard().

  Ralf

* Re: [PATCH] MIPS: Kernel hangs occasionally during boot.
  2011-11-08 17:55 ` Ralf Baechle
@ 2011-11-09  7:40   ` Gleb O. Raiko
  2011-11-09  9:13     ` Ralf Baechle
  2011-11-09 10:34     ` Ralf Baechle
  0 siblings, 2 replies; 6+ messages in thread
From: Gleb O. Raiko @ 2011-11-09  7:40 UTC
  To: Ralf Baechle; +Cc: Al Cooper, linux-mips, linux-kernel

On 08.11.2011 21:55, Ralf Baechle wrote:
> but we may need another hazard barrier to
> replace back_to_back_c0_hazard().
Urgently. We need to wait some ticks until the counter state machine has
been updated. The number of ticks may happen to be the same as for
back_to_back_c0_hazard on some CPUs. It's completely different for
others, I'm sure. The original compare_change_hazard waits up to 12 ticks
on r4k. While I don't think this amount should depend on irq_disable_hazard
as the old code assumes, we may still need 12 or so ticks for old CPUs.

> Author: Al Cooper <alcooperx@gmail.com> Tue Nov 8 09:59:01 2011 -0500
> Committer: Ralf Baechle <ralf@linux-mips.org> Tue Nov 8 16:52:51 2011 +0000
> Commit: 9121470d99c029493bd55daa11607b398fe9aea3
> Gitweb: http://git.linux-mips.org/g/linux/9121470d
Could you fix those links? They have been broken since you moved the git repo.

Regards,
Gleb.

* Re: [PATCH] MIPS: Kernel hangs occasionally during boot.
  2011-11-09  7:40   ` Gleb O. Raiko
@ 2011-11-09  9:13     ` Ralf Baechle
  2011-11-09 10:34     ` Ralf Baechle
  1 sibling, 0 replies; 6+ messages in thread
From: Ralf Baechle @ 2011-11-09  9:13 UTC
  To: Gleb O. Raiko; +Cc: Al Cooper, linux-mips, linux-kernel

On Wed, Nov 09, 2011 at 11:40:21AM +0400, Gleb O. Raiko wrote:

> >Author: Al Cooper <alcooperx@gmail.com> Tue Nov 8 09:59:01 2011 -0500
> >Committer: Ralf Baechle <ralf@linux-mips.org> Tue Nov 8 16:52:51 2011 +0000
> >Commit: 9121470d99c029493bd55daa11607b398fe9aea3
> >Gitweb: http://git.linux-mips.org/g/linux/9121470d
> Could you fix those links? They have been broken since you moved the git repo.

Future emails will use URLs that look like

   http://git.linux-mips.org/g/ralf/linux/9121470d

with the full repository path sans the .git suffix.  There also is a
compat hack to keep the URLs from a few thousand old commit mails working.

Thanks for reporting!

  Ralf

* Re: [PATCH] MIPS: Kernel hangs occasionally during boot.
  2011-11-09  7:40   ` Gleb O. Raiko
  2011-11-09  9:13     ` Ralf Baechle
@ 2011-11-09 10:34     ` Ralf Baechle
  2011-11-09 11:26       ` Gleb O. Raiko
  1 sibling, 1 reply; 6+ messages in thread
From: Ralf Baechle @ 2011-11-09 10:34 UTC
  To: Gleb O. Raiko; +Cc: Al Cooper, linux-mips, linux-kernel

On Wed, Nov 09, 2011 at 11:40:21AM +0400, Gleb O. Raiko wrote:

> On 08.11.2011 21:55, Ralf Baechle wrote:
> >but we may need another hazard barrier to
> >replace back_to_back_c0_hazard().
> Urgently. We need to wait some ticks until the counter state machine has
> been updated. The number of ticks may happen to be the same as for
> back_to_back_c0_hazard on some CPUs. It's completely different for
> others, I'm sure. The original compare_change_hazard waits up to 12
> ticks on r4k. While I don't think this amount should depend on
> irq_disable_hazard as the old code assumes, we may still need 12 or so
> ticks for old CPUs.

Hmm...  Looking at the R4000 manual, which generally has the longest
pipeline hazards, mtc0 gets executed at stage 7 and interrupts get sampled
at stage 3, meaning there is a (7 - 3 - 1) = 3-cycle hazard.  Does
that one satisfy your constraints?  Or are additional cycles needed
for a hazard that's generated outside of the CPU's pipeline?

  Ralf
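
The hazard arithmetic above can be written out as a small helper. This is
a sketch only: pipeline_hazard_cycles() is a hypothetical name, and the
stage numbers passed to it are the ones quoted from the R4000 manual in
this message, not values read from hardware.

```c
/*
 * Hypothetical helper: number of hazard cycles between a cp0 write that
 * takes effect at pipeline stage 'write_stage' and a later instruction
 * whose interrupt state is sampled at stage 'sample_stage'.
 */
static int pipeline_hazard_cycles(int write_stage, int sample_stage)
{
	return write_stage - sample_stage - 1;
}
```

For the figures quoted here, pipeline_hazard_cycles(7, 3) evaluates to 3.
It covers only the in-pipeline hazard; any delay from an interrupt that is
routed and latched outside the core would come on top of it.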

* Re: [PATCH] MIPS: Kernel hangs occasionally during boot.
  2011-11-09 10:34     ` Ralf Baechle
@ 2011-11-09 11:26       ` Gleb O. Raiko
  0 siblings, 0 replies; 6+ messages in thread
From: Gleb O. Raiko @ 2011-11-09 11:26 UTC
  To: Ralf Baechle; +Cc: Al Cooper, linux-mips, linux-kernel

On 09.11.2011 14:34, Ralf Baechle wrote:
> Hmm...  Looking at the R4000 manual, which generally has the longest
> pipeline hazards, mtc0 gets executed at stage 7 and interrupts get sampled
> at stage 3, meaning there is a (7 - 3 - 1) = 3-cycle hazard.  Does
> that one satisfy your constraints?  Or are additional cycles needed
> for a hazard that's generated outside of the CPU's pipeline?
In fact, the current back_to_back_c0_hazard is more than enough for the
CPUs I deal with. I guess the required wait equals the number of stages
between the EX (or RD) and WB stages on modern CPUs, because CP0 Cause is
updated during WB nowadays.

I suspect the time required to update the internal counter logic on the
original r4k might be longer, though. At least the old code waited 12 cycles
(4 * irq_disable_hazard, which is 3 cycles on r4k). Perhaps we should keep
that code and insert the same number of nops, at least for old CPUs.

Regards,
Gleb.
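
The 12-cycle figure for the old code follows from the numbers in this
thread and can be sketched as below. OLD_HAZARD_CALLS and
old_delay_cycles() are hypothetical names used only for illustration; the
per-call cost is an input, since it differs between cores (3 cycles on
r4k according to this message, far less on cores where each call is a
single ehb).

```c
/* The removed compare_change_hazard() issued four irq_disable_hazard() calls. */
#define OLD_HAZARD_CALLS 4

/*
 * Hypothetical helper: total delay of the old fixed sequence, given the
 * cost in cycles of one irq_disable_hazard() on a particular core.
 */
static unsigned int old_delay_cycles(unsigned int cycles_per_hazard)
{
	return OLD_HAZARD_CALLS * cycles_per_hazard;
}
```

old_delay_cycles(3) gives the 12 cycles mentioned for r4k. The patch
replaces this fixed delay with a poll bounded by COMPARE_INT_SEEN_TICKS
(50) ticks of the Count register, whose rate relative to the CPU clock is
implementation-specific, so the two numbers are not directly comparable.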
