All of lore.kernel.org
 help / color / mirror / Atom feed
* Patch: Fix pending timer stuck condition
@ 2005-10-21 18:02 Langsdorf, Mark
  2005-10-21 19:19 ` Dave Jones
  0 siblings, 1 reply; 4+ messages in thread
From: Langsdorf, Mark @ 2005-10-21 18:02 UTC (permalink / raw)
  To: cpufreq

[-- Attachment #1: Type: text/plain, Size: 667 bytes --]

AMD recently discovered that on some hardware,
there is a race condition possible when a C-state
change request goes onto the bus at the same time
as a P-state change request.  Both requests happen,
but the southbridge hardware only acknowledges the
C-state change.  The PowerNow! driver is then 
stuck in a loop, waiting for the P-state change
acknowledgement.  The driver eventually times out,
but can no longer perform P-state changes.

It turns out the solution is to resend the P-state
change, which the southbridge will acknowledge
normally.

Thanks to Johannes Winkelmann for reporting this
and testing the fix.

-Mark Langsdorf
AMD, Inc.


[-- Attachment #2: pn-collision-retry.patch --]
[-- Type: application/octet-stream, Size: 2226 bytes --]

--- linux-2.6.13.4/arch/i386/kernel/cpu/cpufreq/powernow-k8.c	2005-10-10 13:54:29.000000000 -0500
+++ linux-revamp/arch/i386/kernel/cpu/cpufreq/powernow-k8-2.c	2005-10-21 12:44:28.000000000 -0500
@@ -44,7 +44,7 @@
 
 #define PFX "powernow-k8: "
 #define BFX PFX "BIOS error: "
-#define VERSION "version 1.50.3"
+#define VERSION "version 1.50.4"
 #include "powernow-k8.h"
 
 /* serialize freq changes  */
@@ -111,7 +111,7 @@
 	u32 i = 0;
 
 	do {
-		if (i++ > 0x1000000) {
+		if (i++ > 10000) {
 			printk(KERN_ERR PFX "detected change pending stuck\n");
 			return 1;
 		}
@@ -159,6 +159,7 @@
 {
 	u32 lo;
 	u32 savevid = data->currvid;
+	u32 i = 0;
 
 	if ((fid & INVALID_FID_MASK) || (data->currvid & INVALID_VID_MASK)) {
 		printk(KERN_ERR PFX "internal error - overflow on fid write\n");
@@ -170,10 +171,13 @@
 	dprintk("writing fid 0x%x, lo 0x%x, hi 0x%x\n",
 		fid, lo, data->plllock * PLL_LOCK_CONVERSION);
 
-	wrmsr(MSR_FIDVID_CTL, lo, data->plllock * PLL_LOCK_CONVERSION);
-
-	if (query_current_values_with_pending_wait(data))
-		return 1;
+	do {
+		wrmsr(MSR_FIDVID_CTL, lo, data->plllock * PLL_LOCK_CONVERSION);
+		if (i++ > 100) {
+			printk(KERN_ERR PFX "internal error - pending bit very stuck - no further pstate changes possible\n");
+			retrun 1;
+		}			
+	} while (query_current_values_with_pending_wait(data));
 
 	count_off_irt(data);
 
@@ -197,6 +201,7 @@
 {
 	u32 lo;
 	u32 savefid = data->currfid;
+	int i = 0;
 
 	if ((data->currfid & INVALID_FID_MASK) || (vid & INVALID_VID_MASK)) {
 		printk(KERN_ERR PFX "internal error - overflow on vid write\n");
@@ -208,10 +213,13 @@
 	dprintk("writing vid 0x%x, lo 0x%x, hi 0x%x\n",
 		vid, lo, STOP_GRANT_5NS);
 
-	wrmsr(MSR_FIDVID_CTL, lo, STOP_GRANT_5NS);
-
-	if (query_current_values_with_pending_wait(data))
-		return 1;
+	do {
+		wrmsr(MSR_FIDVID_CTL, lo, STOP_GRANT_5NS);
+                if (i++ > 100) {
+                        printk(KERN_ERR PFX "internal error - pending bit very stuck - no further pstate changes possible\n");
+                        return 1;
+                }
+	} while (query_current_values_with_pending_wait(data));
 
 	if (savefid != data->currfid) {
 		printk(KERN_ERR PFX "fid changed on vid trans, old 0x%x new 0x%x\n",

[-- Attachment #3: Type: text/plain, Size: 147 bytes --]

_______________________________________________
Cpufreq mailing list
Cpufreq@lists.linux.org.uk
http://lists.linux.org.uk/mailman/listinfo/cpufreq

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Patch: Fix pending timer stuck condition
  2005-10-21 18:02 Patch: Fix pending timer stuck condition Langsdorf, Mark
@ 2005-10-21 19:19 ` Dave Jones
  0 siblings, 0 replies; 4+ messages in thread
From: Dave Jones @ 2005-10-21 19:19 UTC (permalink / raw)
  To: Langsdorf, Mark; +Cc: cpufreq

On Fri, Oct 21, 2005 at 01:02:48PM -0500, Langsdorf, Mark wrote:
 > AMD recently discovered that on some hardware,
 > there is a race condition possible when a C-state
 > change request goes onto the bus at the same time
 > as a P-state change request.  Both requests happen,
 > but the southbridge hardware only acknowledges the
 > C-state change.  The PowerNow! driver is then 
 > stuck in a loop, waiting for the P-state change
 > acknowledgement.  The driver eventually times out,
 > but can no longer perform P-state changes.
 > 
 > It turns out the solution is to resend the P-state
 > change, which the southbridge will acknowledge
 > normally.
 > 
 > Thanks to Johannes Winkelmann for reporting this
 > and testing the fix.

Excellent detective work!
I'll get this sent along to Linus for inclusion in 2.6.14
as there have been a number of reports of this.

One question though, this part..

-               if (i++ > 0x1000000) {
+               if (i++ > 10000) {
                        printk(KERN_ERR PFX "detected change pending stuck\n");
                        return 1;


means that on affected hardware, we'll see that msg a lot more
regularly now. Was this intentional, or a leftover debugging change ?

Final nit, please add a Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com>
line for future patches.  (I still apply them if they lack that line,
but following the established protocol is a nice bonus touch ;-)

		Dave

^ permalink raw reply	[flat|nested] 4+ messages in thread

* RE: Patch: Fix pending timer stuck condition
@ 2005-10-21 19:42 Langsdorf, Mark
  0 siblings, 0 replies; 4+ messages in thread
From: Langsdorf, Mark @ 2005-10-21 19:42 UTC (permalink / raw)
  To: Dave Jones; +Cc: cpufreq

[-- Attachment #1: Type: text/plain, Size: 2211 bytes --]

> On Fri, Oct 21, 2005 at 01:02:48PM -0500, Langsdorf, Mark 
> wrote:  > AMD recently discovered that on some hardware,  > 
> there is a race condition possible when a C-state  > change 
> request goes onto the bus at the same time  > as a P-state 
> change request.  Both requests happen,  > but the southbridge 
> hardware only acknowledges the  > C-state change.  The 
> PowerNow! driver is then 
>  > stuck in a loop, waiting for the P-state change
>  > acknowledgement.  The driver eventually times out,
>  > but can no longer perform P-state changes.
>  > 
>  > It turns out the solution is to resend the P-state
>  > change, which the southbridge will acknowledge
>  > normally.
>  > 
>  > Thanks to Johannes Winkelmann for reporting this
>  > and testing the fix.
> 
> Excellent detective work!

Thanks.

> I'll get this sent along to Linus for inclusion in 2.6.14
> as there have been a number of reports of this.

I know, I know...
 
> One question though, this part..
> 
> -               if (i++ > 0x1000000) {
> +               if (i++ > 10000) {
>                         printk(KERN_ERR PFX "detected change 
> pending stuck\n");
>                         return 1;
> 
> 
> means that on affected hardware, we'll see that msg a lot 
> more regularly now. Was this intentional, or a leftover 
> debugging change ?

Originally, that code read an MSR 16 million times and
then timed out.  Even with a fast processor, that takes
some time.  Under testing, Johannes noted that he lost
his keyboard for 2-3 seconds during a retried frequency
transition.  Bringing down the timeout made it much
more responsive.

You've got a point, though - that message should really
be a dprintk now, not a KERN_ERR.  It's harmless with
the retry unless the retry fails after 100 tries.

> Final nit, please add a Signed-off-by: Mark Langsdorf 
> <mark.langsdorf@amd.com> line for future patches.  (I still 
> apply them if they lack that line, but following the 
> established protocol is a nice bonus touch ;-)

Doh!  Will do.

Resubmitting the patch with the dprintk

Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com>

-Mark Langsdorf
AMD, Inc.

[-- Attachment #2: pn-collision-retry.patch --]
[-- Type: application/octet-stream, Size: 2311 bytes --]

--- linux-2.6.13.4/arch/i386/kernel/cpu/cpufreq/powernow-k8.c	2005-10-21 14:39:50.000000000 -0500
+++ linux-revamp/arch/i386/kernel/cpu/cpufreq/powernow-k8-2.c	2005-10-21 14:30:25.000000000 -0500
@@ -44,7 +44,7 @@
 
 #define PFX "powernow-k8: "
 #define BFX PFX "BIOS error: "
-#define VERSION "version 1.50.3"
+#define VERSION "version 1.50.4"
 #include "powernow-k8.h"
 
 /* serialize freq changes  */
@@ -111,8 +111,8 @@
 	u32 i = 0;
 
 	do {
-		if (i++ > 0x1000000) {
-			printk(KERN_ERR PFX "detected change pending stuck\n");
+		if (i++ > 10000) {
+			dprintk("detected change pending stuck\n");
 			return 1;
 		}
 		rdmsr(MSR_FIDVID_STATUS, lo, hi);
@@ -159,6 +159,7 @@
 {
 	u32 lo;
 	u32 savevid = data->currvid;
+	u32 i = 0;
 
 	if ((fid & INVALID_FID_MASK) || (data->currvid & INVALID_VID_MASK)) {
 		printk(KERN_ERR PFX "internal error - overflow on fid write\n");
@@ -170,10 +171,13 @@
 	dprintk("writing fid 0x%x, lo 0x%x, hi 0x%x\n",
 		fid, lo, data->plllock * PLL_LOCK_CONVERSION);
 
-	wrmsr(MSR_FIDVID_CTL, lo, data->plllock * PLL_LOCK_CONVERSION);
-
-	if (query_current_values_with_pending_wait(data))
-		return 1;
+	do {
+		wrmsr(MSR_FIDVID_CTL, lo, data->plllock * PLL_LOCK_CONVERSION);
+		if (i++ > 100) {
+			printk(KERN_ERR PFX "internal error - pending bit very stuck - no further pstate changes possible\n");
+			retrun 1;
+		}			
+	} while (query_current_values_with_pending_wait(data));
 
 	count_off_irt(data);
 
@@ -197,6 +201,7 @@
 {
 	u32 lo;
 	u32 savefid = data->currfid;
+	int i = 0;
 
 	if ((data->currfid & INVALID_FID_MASK) || (vid & INVALID_VID_MASK)) {
 		printk(KERN_ERR PFX "internal error - overflow on vid write\n");
@@ -208,10 +213,13 @@
 	dprintk("writing vid 0x%x, lo 0x%x, hi 0x%x\n",
 		vid, lo, STOP_GRANT_5NS);
 
-	wrmsr(MSR_FIDVID_CTL, lo, STOP_GRANT_5NS);
-
-	if (query_current_values_with_pending_wait(data))
-		return 1;
+	do {
+		wrmsr(MSR_FIDVID_CTL, lo, STOP_GRANT_5NS);
+                if (i++ > 100) {
+                        printk(KERN_ERR PFX "internal error - pending bit very stuck - no further pstate changes possible\n");
+                        return 1;
+                }
+	} while (query_current_values_with_pending_wait(data));
 
 	if (savefid != data->currfid) {
 		printk(KERN_ERR PFX "fid changed on vid trans, old 0x%x new 0x%x\n",

[-- Attachment #3: Type: text/plain, Size: 147 bytes --]

_______________________________________________
Cpufreq mailing list
Cpufreq@lists.linux.org.uk
http://lists.linux.org.uk/mailman/listinfo/cpufreq

^ permalink raw reply	[flat|nested] 4+ messages in thread

* RE: Patch: Fix pending timer stuck condition
@ 2005-10-21 21:34 Devriendt, Paul
  0 siblings, 0 replies; 4+ messages in thread
From: Devriendt, Paul @ 2005-10-21 21:34 UTC (permalink / raw)
  To: Dave Jones, Langsdorf, Mark; +Cc: cpufreq

 
> One question though, this part..
> 
> -               if (i++ > 0x1000000) {
> +               if (i++ > 10000) {
>                         printk(KERN_ERR PFX "detected change 
> pending stuck\n");
>                         return 1;
> 
> 
> means that on affected hardware, we'll see that msg a lot more
> regularly now. Was this intentional, or a leftover debugging change ?

The original huge number was dreamt up by myself after measuring 
the worst case for the pending bit to clear, and then multiplying
by 100,000 or something equally ridiculous, to make sure it never
triggered by accident. As the hardware guarantees that the pending
bit will clear, the driver was supposed to loop for ever, but
polling for ever in a kernel driver seemed like a bad idea. The
lower value makes sense now that the condition occurs and is 
understood.

Paul.

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2005-10-21 21:34 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-10-21 18:02 Patch: Fix pending timer stuck condition Langsdorf, Mark
2005-10-21 19:19 ` Dave Jones
  -- strict thread matches above, loose matches on Subject: below --
2005-10-21 19:42 Langsdorf, Mark
2005-10-21 21:34 Devriendt, Paul

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.