public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* Re: Background memory scrubbing
@ 2011-04-20 14:40 Robert Whitton
  2011-04-20 15:19 ` Clemens Ladisch
  0 siblings, 1 reply; 11+ messages in thread
From: Robert Whitton @ 2011-04-20 14:40 UTC (permalink / raw)
  To: Clemens Ladisch, rwhitton; +Cc: linux-kernel

> Robert Whitton wrote:
> > I have a home grown module that performs background memory scrubbing
> > to eliminate single bit memory errors before they become a problem.
> > ... it is specifically targeted at the AMD64 PC architecture
> 
> Then why don't you use the memory controller's automatic background
> memory scrubbing support? Doesn't your BIOS have this option?
> 
> Regards,
> Clemens
> 

Hi,

Unfortunately in common with a large number of hardware platforms background scrubbing isn't supported in the hardware (even though ECC error correction is supported) and thus there is no BIOS option to enable it. The software solution has always been fine and the CPU load negligible as it's only necessary to complete one complete scrub every day or so. I just need to find a solution to making this work on newer Linux kernels.

Rob

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Background memory scrubbing
  2011-04-20 14:40 Background memory scrubbing Robert Whitton
@ 2011-04-20 15:19 ` Clemens Ladisch
  2011-04-20 15:35   ` Borislav Petkov
  2011-04-20 19:23   ` Background memory scrubbing Bill Gatliff
  0 siblings, 2 replies; 11+ messages in thread
From: Clemens Ladisch @ 2011-04-20 15:19 UTC (permalink / raw)
  To: rwhitton; +Cc: linux-kernel

Robert Whitton wrote:
> > Robert Whitton wrote:
> > > I have a home grown module that performs background memory scrubbing
> > > to eliminate single bit memory errors before they become a problem.
> > > ... it is specifically targeted at the AMD64 PC architecture
> > 
> > Then why don't you use the memory controller's automatic background
> > memory scrubbing support? Doesn't your BIOS have this option?
> 
> Unfortunately in common with a large number of hardware platforms
> background scrubbing isn't supported in the hardware (even though ECC
> error correction is supported) and thus there is no BIOS option to
> enable it.

Which hardware platform is this?  AFAICT all architectures with ECC
(old AMD64, Family 0Fh, Family 10h) also have scrubbing support.
If your BIOS is too dumb, just try enabling it directly (bits 0-4 of
PCI configuration register 0x58 in function 3 of the CPU's northbridge
device, see the BIOS and Kernel's Developer's Guide for details).


Regards,
Clemens

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Background memory scrubbing
  2011-04-20 15:19 ` Clemens Ladisch
@ 2011-04-20 15:35   ` Borislav Petkov
  2011-04-20 15:46     ` Markus Trippelsdorf
  2011-04-20 19:23   ` Background memory scrubbing Bill Gatliff
  1 sibling, 1 reply; 11+ messages in thread
From: Borislav Petkov @ 2011-04-20 15:35 UTC (permalink / raw)
  To: rwhitton; +Cc: Clemens Ladisch, linux-kernel

On Wed, Apr 20, 2011 at 05:19:41PM +0200, Clemens Ladisch wrote:
> > Unfortunately in common with a large number of hardware platforms
> > background scrubbing isn't supported in the hardware (even though ECC
> > error correction is supported) and thus there is no BIOS option to
> > enable it.
> 
> Which hardware platform is this?  AFAICT all architectures with ECC
> (old AMD64, Family 0Fh, Family 10h) also have scrubbing support.
> If your BIOS is too dumb, just try enabling it directly (bits 0-4 of
> PCI configuration register 0x58 in function 3 of the CPU's northbridge
> device, see the BIOS and Kernel's Developer's Guide for details).

Or even better, if on AMD, you can build the amd64_edac module
(CONFIG_EDAC_AMD64) and do

echo <x> > /sys/devices/system/edac/mc/mc<y>/sdram_scrub_rate

where x is the scrubbing bandwidth in bytes/sec and y is the memory
controller on the machine, i.e. node.

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Background memory scrubbing
  2011-04-20 15:35   ` Borislav Petkov
@ 2011-04-20 15:46     ` Markus Trippelsdorf
  2011-04-20 15:58       ` Borislav Petkov
  0 siblings, 1 reply; 11+ messages in thread
From: Markus Trippelsdorf @ 2011-04-20 15:46 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: rwhitton, Clemens Ladisch, linux-kernel

On 2011.04.20 at 17:35 +0200, Borislav Petkov wrote:
> On Wed, Apr 20, 2011 at 05:19:41PM +0200, Clemens Ladisch wrote:
> > > Unfortunately in common with a large number of hardware platforms
> > > background scrubbing isn't supported in the hardware (even though ECC
> > > error correction is supported) and thus there is no BIOS option to
> > > enable it.
> > 
> > Which hardware platform is this?  AFAICT all architectures with ECC
> > (old AMD64, Family 0Fh, Family 10h) also have scrubbing support.
> > If your BIOS is too dumb, just try enabling it directly (bits 0-4 of
> > PCI configuration register 0x58 in function 3 of the CPU's northbridge
> > device, see the BIOS and Kernel's Developer's Guide for details).
> 
> Or even better, if on AMD, you can build the amd64_edac module
> (CONFIG_EDAC_AMD64) and do
> 
> echo <x> > /sys/devices/system/edac/mc/mc<y>/sdram_scrub_rate
> 
> where x is the scrubbing bandwidth in bytes/sec and y is the memory
> controller on the machine, i.e. node.

BTW is it really necessary to print the following to syslog:

EDAC amd64: pci-read, sdram scrub control value: 15
EDAC MC: Read scrub rate: 97650

everytime one runs:
 # cat /sys/devices/system/edac/mc/mc0/sdram_scrub_rate
97650

?
-- 
Markus

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Background memory scrubbing
  2011-04-20 15:46     ` Markus Trippelsdorf
@ 2011-04-20 15:58       ` Borislav Petkov
  2011-04-20 16:45         ` Markus Trippelsdorf
  0 siblings, 1 reply; 11+ messages in thread
From: Borislav Petkov @ 2011-04-20 15:58 UTC (permalink / raw)
  To: Markus Trippelsdorf; +Cc: rwhitton, Clemens Ladisch, linux-kernel

On Wed, Apr 20, 2011 at 05:46:58PM +0200, Markus Trippelsdorf wrote:
> BTW is it really necessary to print the following to syslog:
> 
> EDAC amd64: pci-read, sdram scrub control value: 15
> EDAC MC: Read scrub rate: 97650
> 
> everytime one runs:
>  # cat /sys/devices/system/edac/mc/mc0/sdram_scrub_rate
> 97650
> 
> ?

This is KERN_DEBUG since .38 (commit 24f9a7fe3f19f3fd310f556364d01a22911724b3)
and it shouldn't appear on the console if you don't change your default log level.

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Background memory scrubbing
  2011-04-20 15:58       ` Borislav Petkov
@ 2011-04-20 16:45         ` Markus Trippelsdorf
  2011-04-20 16:55           ` Markus Trippelsdorf
  0 siblings, 1 reply; 11+ messages in thread
From: Markus Trippelsdorf @ 2011-04-20 16:45 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: rwhitton, Clemens Ladisch, linux-kernel

On 2011.04.20 at 17:58 +0200, Borislav Petkov wrote:
> On Wed, Apr 20, 2011 at 05:46:58PM +0200, Markus Trippelsdorf wrote:
> > BTW is it really necessary to print the following to syslog:
> > 
> > EDAC amd64: pci-read, sdram scrub control value: 15
> > EDAC MC: Read scrub rate: 97650
> > 
> > everytime one runs:
> >  # cat /sys/devices/system/edac/mc/mc0/sdram_scrub_rate
> > 97650
> > 
> > ?
> 
> This is KERN_DEBUG since .38 (commit 24f9a7fe3f19f3fd310f556364d01a22911724b3)
> and it shouldn't appear on the console if you don't change your default log level.

Yes. Sorry, but I was referring to dmesg and not the console. 
What I mean is that maybe debugf1 or debugf2 is more appropriate than
amd64_debug?

-- 
Markus

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Background memory scrubbing
  2011-04-20 16:45         ` Markus Trippelsdorf
@ 2011-04-20 16:55           ` Markus Trippelsdorf
  2011-04-20 17:36             ` Borislav Petkov
  0 siblings, 1 reply; 11+ messages in thread
From: Markus Trippelsdorf @ 2011-04-20 16:55 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: rwhitton, Clemens Ladisch, linux-kernel

On 2011.04.20 at 18:45 +0200, Markus Trippelsdorf wrote:
> On 2011.04.20 at 17:58 +0200, Borislav Petkov wrote:
> > On Wed, Apr 20, 2011 at 05:46:58PM +0200, Markus Trippelsdorf wrote:
> > > BTW is it really necessary to print the following to syslog:
> > > 
> > > EDAC amd64: pci-read, sdram scrub control value: 15
> > > EDAC MC: Read scrub rate: 97650
> > > 
> > > everytime one runs:
> > >  # cat /sys/devices/system/edac/mc/mc0/sdram_scrub_rate
> > > 97650
> > > 
> > > ?
> > 
> > This is KERN_DEBUG since .38 (commit 24f9a7fe3f19f3fd310f556364d01a22911724b3)
> > and it shouldn't appear on the console if you don't change your default log level.
> 
> Yes. Sorry, but I was referring to dmesg and not the console. 
> What I mean is that maybe debugf1 or debugf2 is more appropriate than
> amd64_debug?

In other words:

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 31e71c4f..13b107e 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -211,7 +211,7 @@ static int amd64_get_scrub_rate(struct mem_ctl_info *mci)
 
 	scrubval = scrubval & 0x001F;
 
-	amd64_debug("pci-read, sdram scrub control value: %d\n", scrubval);
+	debugf1 ("pci-read, sdram scrub control value: %d\n", scrubval);
 
 	for (i = 0; i < ARRAY_SIZE(scrubrates); i++) {
 		if (scrubrates[i].scrubval == scrubval) {
diff --git a/drivers/edac/edac_mc_sysfs.c b/drivers/edac/edac_mc_sysfs.c
index 26343fd..8a34aea 100644
--- a/drivers/edac/edac_mc_sysfs.c
+++ b/drivers/edac/edac_mc_sysfs.c
@@ -459,7 +459,7 @@ static ssize_t mci_sdram_scrub_rate_store(struct mem_ctl_info *mci,
 
 	new_bw = mci->set_sdram_scrub_rate(mci, bandwidth);
 	if (new_bw >= 0) {
-		edac_printk(KERN_DEBUG, EDAC_MC, "Scrub rate set to %d\n", new_bw);
+		debugf1 ("Scrub rate set to %d\n", new_bw);
 		return count;
 	}
 
@@ -483,7 +483,7 @@ static ssize_t mci_sdram_scrub_rate_show(struct mem_ctl_info *mci, char *data)
 		return bandwidth;
 	}
 
-	edac_printk(KERN_DEBUG, EDAC_MC, "Read scrub rate: %d\n", bandwidth);
+	debugf1 ("Read scrub rate: %d\n", bandwidth);
 	return sprintf(data, "%d\n", bandwidth);
 }
 

-- 
Markus

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: Background memory scrubbing
  2011-04-20 16:55           ` Markus Trippelsdorf
@ 2011-04-20 17:36             ` Borislav Petkov
  2011-04-20 18:28               ` [PATCH] edac: Remove debugging output in scrub rate handling Markus Trippelsdorf
  0 siblings, 1 reply; 11+ messages in thread
From: Borislav Petkov @ 2011-04-20 17:36 UTC (permalink / raw)
  To: Markus Trippelsdorf
  Cc: Borislav Petkov, rwhitton@iee.org, Clemens Ladisch,
	linux-kernel@vger.kernel.org

On Wed, Apr 20, 2011 at 12:55:33PM -0400, Markus Trippelsdorf wrote:
> > > This is KERN_DEBUG since .38 (commit 24f9a7fe3f19f3fd310f556364d01a22911724b3)
> > > and it shouldn't appear on the console if you don't change your default log level.
> > 
> > Yes. Sorry, but I was referring to dmesg and not the console. 
> > What I mean is that maybe debugf1 or debugf2 is more appropriate than
> > amd64_debug?
> 
> In other words:

Ok, that whole debugging output there is partly historical remains and
we don't really need them if we're smart about it. But you'll have to
change your patch and make a proper commit message :).

> 
> diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
> index 31e71c4f..13b107e 100644
> --- a/drivers/edac/amd64_edac.c
> +++ b/drivers/edac/amd64_edac.c
> @@ -211,7 +211,7 @@ static int amd64_get_scrub_rate(struct mem_ctl_info *mci)
>  
>  	scrubval = scrubval & 0x001F;
>  
> -	amd64_debug("pci-read, sdram scrub control value: %d\n", scrubval);
> +	debugf1 ("pci-read, sdram scrub control value: %d\n", scrubval);

remove this one completely.

>  
>  	for (i = 0; i < ARRAY_SIZE(scrubrates); i++) {
>  		if (scrubrates[i].scrubval == scrubval) {
> diff --git a/drivers/edac/edac_mc_sysfs.c b/drivers/edac/edac_mc_sysfs.c
> index 26343fd..8a34aea 100644
> --- a/drivers/edac/edac_mc_sysfs.c
> +++ b/drivers/edac/edac_mc_sysfs.c
> @@ -459,7 +459,7 @@ static ssize_t mci_sdram_scrub_rate_store(struct mem_ctl_info *mci,
>  
>  	new_bw = mci->set_sdram_scrub_rate(mci, bandwidth);
>  	if (new_bw >= 0) {
> -		edac_printk(KERN_DEBUG, EDAC_MC, "Scrub rate set to %d\n", new_bw);
> +		debugf1 ("Scrub rate set to %d\n", new_bw);
>  		return count;
>  	}

Make here the success case implicit and issue a warning only if we fail
setting the scrub rate:


	new_bw = mci->set_sdram_scrub_rate(mci, bandwidth);
	if (new_bw < 0) {
		edac_printk(KERN_WARNING, EDAC_MC,
			    "Error setting scrub rate to: %lu\n", bandwidth);
		return -EINVAL;
	}

and do the same thing with mci_sdram_scrub_rate_show() below.

>  
> @@ -483,7 +483,7 @@ static ssize_t mci_sdram_scrub_rate_show(struct mem_ctl_info *mci, char *data)
>  		return bandwidth;
>  	}
>  
> -	edac_printk(KERN_DEBUG, EDAC_MC, "Read scrub rate: %d\n", bandwidth);
> +	debugf1 ("Read scrub rate: %d\n", bandwidth);

This one can go too since the success-case is in sysfs anyway.

>  	return sprintf(data, "%d\n", bandwidth);
>  }

Would you like to do that?

Thanks.

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH] edac: Remove debugging output in scrub rate handling
  2011-04-20 17:36             ` Borislav Petkov
@ 2011-04-20 18:28               ` Markus Trippelsdorf
  2011-04-21 11:55                 ` Borislav Petkov
  0 siblings, 1 reply; 11+ messages in thread
From: Markus Trippelsdorf @ 2011-04-20 18:28 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: rwhitton@iee.org, Clemens Ladisch, linux-kernel@vger.kernel.org

On 2011.04.20 at 19:36 +0200, Borislav Petkov wrote:
> On Wed, Apr 20, 2011 at 12:55:33PM -0400, Markus Trippelsdorf wrote:
> > > > This is KERN_DEBUG since .38 (commit 24f9a7fe3f19f3fd310f556364d01a22911724b3)
> > > > and it shouldn't appear on the console if you don't change your default log level.
> > > 
> > > Yes. Sorry, but I was referring to dmesg and not the console. 
> > > What I mean is that maybe debugf1 or debugf2 is more appropriate than
> > > amd64_debug?
> > 
> > In other words:
> 
> Ok, that whole debugging output there is partly historical remains and
> we don't really need them if we're smart about it. But you'll have to
> change your patch and make a proper commit message :).

This patch removes superfluous debugging output in the sysfs scrub rate
handler. It also consolidates the error handling in
mci_sdram_scrub_rate_store.

Signed-off-by: Markus Trippelsdorf <markus@trippelsdorf.de>
---
 drivers/edac/amd64_edac.c    |    2 --
 drivers/edac/edac_mc_sysfs.c |   11 +++++------
 2 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 31e71c4f..4b4071e 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -211,8 +211,6 @@ static int amd64_get_scrub_rate(struct mem_ctl_info *mci)
 
 	scrubval = scrubval & 0x001F;
 
-	amd64_debug("pci-read, sdram scrub control value: %d\n", scrubval);
-
 	for (i = 0; i < ARRAY_SIZE(scrubrates); i++) {
 		if (scrubrates[i].scrubval == scrubval) {
 			retval = scrubrates[i].bandwidth;
diff --git a/drivers/edac/edac_mc_sysfs.c b/drivers/edac/edac_mc_sysfs.c
index 26343fd..6ffe438 100644
--- a/drivers/edac/edac_mc_sysfs.c
+++ b/drivers/edac/edac_mc_sysfs.c
@@ -458,13 +458,13 @@ static ssize_t mci_sdram_scrub_rate_store(struct mem_ctl_info *mci,
 		return -EINVAL;
 
 	new_bw = mci->set_sdram_scrub_rate(mci, bandwidth);
-	if (new_bw >= 0) {
-		edac_printk(KERN_DEBUG, EDAC_MC, "Scrub rate set to %d\n", new_bw);
-		return count;
+	if (new_bw < 0) {
+		edac_printk(KERN_WARNING, EDAC_MC,
+			"Error setting scrub rate to: %lu\n", bandwidth);
+		return -EINVAL;
 	}
 
-	edac_printk(KERN_DEBUG, EDAC_MC, "Error setting scrub rate to: %lu\n", bandwidth);
-	return -EINVAL;
+	return count;
 }
 
 /*
@@ -483,7 +483,6 @@ static ssize_t mci_sdram_scrub_rate_show(struct mem_ctl_info *mci, char *data)
 		return bandwidth;
 	}
 
-	edac_printk(KERN_DEBUG, EDAC_MC, "Read scrub rate: %d\n", bandwidth);
 	return sprintf(data, "%d\n", bandwidth);
 }
 
-- 
Markus

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: Background memory scrubbing
  2011-04-20 15:19 ` Clemens Ladisch
  2011-04-20 15:35   ` Borislav Petkov
@ 2011-04-20 19:23   ` Bill Gatliff
  1 sibling, 0 replies; 11+ messages in thread
From: Bill Gatliff @ 2011-04-20 19:23 UTC (permalink / raw)
  To: Clemens Ladisch; +Cc: rwhitton, linux-kernel

Ladisch:


On Wed, Apr 20, 2011 at 10:19 AM, Clemens Ladisch <clemens@ladisch.de> wrote:
> Which hardware platform is this?  AFAICT all architectures with ECC
> (old AMD64, Family 0Fh, Family 10h) also have scrubbing support.
> If your BIOS is too dumb, just try enabling it directly (bits 0-4 of
> PCI configuration register 0x58 in function 3 of the CPU's northbridge
> device, see the BIOS and Kernel's Developer's Guide for details).

That won't help non-AMD64 platforms that want to scrub.

Is there a way to make Robert's approach work?  I'm aware of a few
non-AMD64, somewhat-exotic platforms that require scrubbing the way
that Robert is proposing...


b.g.
-- 
Bill Gatliff
bgat@billgatliff.com

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] edac: Remove debugging output in scrub rate handling
  2011-04-20 18:28               ` [PATCH] edac: Remove debugging output in scrub rate handling Markus Trippelsdorf
@ 2011-04-21 11:55                 ` Borislav Petkov
  0 siblings, 0 replies; 11+ messages in thread
From: Borislav Petkov @ 2011-04-21 11:55 UTC (permalink / raw)
  To: Markus Trippelsdorf
  Cc: rwhitton@iee.org, Clemens Ladisch, linux-kernel@vger.kernel.org

On Wed, Apr 20, 2011 at 02:28:45PM -0400, Markus Trippelsdorf wrote:
>
> This patch removes superfluous debugging output in the sysfs scrub rate
> handler. It also consolidates the error handling in
> mci_sdram_scrub_rate_store.
> 
> Signed-off-by: Markus Trippelsdorf <markus@trippelsdorf.de>

Applied, thanks.

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2011-04-21 11:55 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-04-20 14:40 Background memory scrubbing Robert Whitton
2011-04-20 15:19 ` Clemens Ladisch
2011-04-20 15:35   ` Borislav Petkov
2011-04-20 15:46     ` Markus Trippelsdorf
2011-04-20 15:58       ` Borislav Petkov
2011-04-20 16:45         ` Markus Trippelsdorf
2011-04-20 16:55           ` Markus Trippelsdorf
2011-04-20 17:36             ` Borislav Petkov
2011-04-20 18:28               ` [PATCH] edac: Remove debugging output in scrub rate handling Markus Trippelsdorf
2011-04-21 11:55                 ` Borislav Petkov
2011-04-20 19:23   ` Background memory scrubbing Bill Gatliff

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox