public inbox for linux-kernel@vger.kernel.org
* Re: Background memory scrubbing
@ 2011-04-20 15:46 Robert Whitton
  2011-04-20 16:01 ` Borislav Petkov
  0 siblings, 1 reply; 18+ messages in thread
From: Robert Whitton @ 2011-04-20 15:46 UTC (permalink / raw)
  To: Borislav Petkov, rwhitton; +Cc: Clemens Ladisch, linux-kernel


> On Wed, Apr 20, 2011 at 05:19:41PM +0200, Clemens Ladisch wrote:
> > > Unfortunately in common with a large number of hardware platforms
> > > background scrubbing isn't supported in the hardware (even though ECC
> > > error correction is supported) and thus there is no BIOS option to
> > > enable it.
> > 
> > Which hardware platform is this? AFAICT all architectures with ECC
> > (old AMD64, Family 0Fh, Family 10h) also have scrubbing support.
> > If your BIOS is too dumb, just try enabling it directly (bits 0-4 of
> > PCI configuration register 0x58 in function 3 of the CPU's northbridge
> > device, see the BIOS and Kernel's Developer's Guide for details).
> 
> Or even better, if on AMD, you can build the amd64_edac module
> (CONFIG_EDAC_AMD64) and do
> 
> echo <x> > /sys/devices/system/edac/mc/mc<y>/sdram_scrub_rate
> 
> where x is the scrubbing bandwidth in bytes/sec and y is the memory
> controller on the machine, i.e. node.
> 
> -- 
> Regards/Gruss,
> Boris.
> 
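Both suggestions can be sketched in user-space C. This is a minimal sketch under stated assumptions, not code from amd64_edac or the BKDG. `scrub_sysfs_path()` and `write_scrub_rate()` are hypothetical helpers that build and write the per-controller sysfs file Boris describes; they assume root access and a loaded amd64_edac module. `set_dram_scrub_bits()` only shows the read-modify-write of bits 0-4 of the value read from register 0x58. The BKDG defines which 5-bit code selects which scrub rate, and that table is not reproduced here. The actual PCI configuration-space access is also left out.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: build the sysfs path for memory controller `node`
 * (the "y" above). Returns 0 on success, -1 if buf is too small. */
static int scrub_sysfs_path(char *buf, size_t len, unsigned int node)
{
	int n = snprintf(buf, len,
	                 "/sys/devices/system/edac/mc/mc%u/sdram_scrub_rate",
	                 node);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Hypothetical helper: write a scrub bandwidth in bytes/sec (the "x" above)
 * to the given file. Returns 0 on success, -1 on error. */
static int write_scrub_rate(const char *path, unsigned long bytes_per_sec)
{
	FILE *f = fopen(path, "w");
	if (!f)
		return -1;
	int ok = fprintf(f, "%lu\n", bytes_per_sec) > 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/* Replace bits 0-4 of a value read from F3x58 with a 5-bit scrub-rate code,
 * leaving all other bits untouched. Code meanings come from the BKDG. */
static uint32_t set_dram_scrub_bits(uint32_t reg58, uint32_t rate_code)
{
	const uint32_t mask = 0x1fu; /* bits 0-4 */
	return (reg58 & ~mask) | (rate_code & mask);
}
```

For example, `scrub_sysfs_path(buf, sizeof buf, 0)` yields /sys/devices/system/edac/mc/mc0/sdram_scrub_rate for node 0.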

Unfortunately that also isn't an option on my platform(s). There must surely be a way for a module to obtain a mapping for each physical page of memory in the system and use it to perform atomic reads/writes to scrub the memory.

* Re: Background memory scrubbing
@ 2011-04-20 17:05 Robert Whitton
  2011-04-20 17:53 ` Rik van Riel
  0 siblings, 1 reply; 18+ messages in thread
From: Robert Whitton @ 2011-04-20 17:05 UTC (permalink / raw)
  To: Rik van Riel, rwhitton; +Cc: linux-kernel


On Wed 20/04/11 6:45 PM, Rik van Riel <riel@redhat.com> wrote:

> On 04/20/2011 03:58 AM, Robert Whitton wrote:
> 
> > for each PFN from 256 to the highest valid PFN
> > {
> >   if (pfn_valid(PFN))
> >   {
> >     page = pfn_to_page(PFN)
> >     va = kmap(page)
> >     atomic_scrub(va, PAGE_SIZE)
> >     kunmap(page)
> >   }
> >
> >   sleep(for_a_while)
> > }
> 
> What exactly does atomic_scrub do?

atomic_scrub is part of the EDAC subsystem; see arch/x86/include/asm/edac.h. It simply does a locked add of zero to each DWORD in the specified range.

(It's a shame that on 64-bit platforms it doesn't use QWORDs, but that's just an optimisation.)
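A user-space sketch of the same operation, plus the QWORD variant, follows. It illustrates the technique and is not the kernel's code; `scrub_dwords()` and `scrub_qwords()` are hypothetical names. On x86 it issues a locked add of zero through GCC/Clang inline assembly. On other architectures it falls back to the `__atomic_fetch_add` builtin, which assumes the compiler emits a genuine read-modify-write there. Adding zero leaves the data unchanged. The point is that the locked read-modify-write makes the CPU read each word, with ECC correcting a single-bit error on the way in, and then write the corrected value back to memory.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical analogue of atomic_scrub(): locked add of zero to every
 * 32-bit word in [va, va + size). Assumes va is 4-byte aligned. */
static void scrub_dwords(void *va, size_t size)
{
	uint32_t *p = va;
	for (size_t i = 0; i < size / sizeof(*p); i++, p++) {
#if defined(__x86_64__) || defined(__i386__)
		__asm__ __volatile__("lock; addl $0, %0" : "+m"(*p));
#else
		/* Assumption: a real atomic RMW is emitted for an add of zero. */
		__atomic_fetch_add(p, 0, __ATOMIC_SEQ_CST);
#endif
	}
}

/* QWORD variant: half as many locked operations on 64-bit x86.
 * Assumes va is 8-byte aligned and size is a multiple of 8. */
static void scrub_qwords(void *va, size_t size)
{
	uint64_t *p = va;
	for (size_t i = 0; i < size / sizeof(*p); i++, p++) {
#if defined(__x86_64__)
		__asm__ __volatile__("lock; addq $0, %0" : "+m"(*p));
#else
		__atomic_fetch_add(p, 0, __ATOMIC_SEQ_CST);
#endif
	}
}
```

Inline assembly is used on x86 because some compilers may lower an atomic add of zero to a fenced plain load. That would skip the write-back that does the actual scrubbing.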

> 
> > This code works absolutely fine up to a short distance beyond the 16MB
> > boundary (specifically it seems to always fail on my hardware at PFN
> > 4105). At this point despite the fact that kmap returns a valid virtual
> > address (and it is the virtual address that I expect - 0xffff880001009000)
> > I get the kernel oops - "unable to handle kernel paging request".
> 
> Looks like you might be making some of the kernel code that
> is running at that moment unreachable, leading to a kernel
> page fault.

* Re: Background memory scrubbing
@ 2011-04-20 14:40 Robert Whitton
  2011-04-20 15:19 ` Clemens Ladisch
  0 siblings, 1 reply; 18+ messages in thread
From: Robert Whitton @ 2011-04-20 14:40 UTC (permalink / raw)
  To: Clemens Ladisch, rwhitton; +Cc: linux-kernel

> Robert Whitton wrote:
> > I have a home grown module that performs background memory scrubbing
> > to eliminate single bit memory errors before they become a problem.
> > ... it is specifically targeted at the AMD64 PC architecture
> 
> Then why don't you use the memory controller's automatic background
> memory scrubbing support? Doesn't your BIOS have this option?
> 
> Regards,
> Clemens
> 

Hi,

Unfortunately, in common with a large number of hardware platforms, background scrubbing isn't supported in the hardware (even though ECC error correction is supported), and so there is no BIOS option to enable it. The software solution has always worked fine, and the CPU load is negligible because only one full scrub is needed every day or so. I just need to find a way to make this work on newer Linux kernels.
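Here is what "one complete scrub every day or so" means in numbers. Covering R bytes of RAM in T seconds takes R/T bytes/sec of scrub bandwidth, the same unit as sdram_scrub_rate. With 4 KiB pages, it also means pausing about T/(R/4096) between pages. The helpers below are hypothetical, they assume 4 KiB pages, and they ignore the time spent scrubbing. The 4 GiB figure in the usage note is only an example.

```c
#include <stdint.h>

#define SCRUB_PAGE_SIZE 4096ULL /* assumption: 4 KiB pages */

/* Bytes/sec needed to cover ram_bytes once every pass_seconds (rounded down). */
static uint64_t scrub_bandwidth(uint64_t ram_bytes, uint64_t pass_seconds)
{
	return pass_seconds ? ram_bytes / pass_seconds : 0;
}

/* Microseconds to pause after each page so that one pass over ram_bytes
 * takes roughly pass_seconds (rounded down; 0 if there are no pages). */
static uint64_t scrub_pause_us(uint64_t ram_bytes, uint64_t pass_seconds)
{
	uint64_t pages = ram_bytes / SCRUB_PAGE_SIZE;
	return pages ? pass_seconds * 1000000ULL / pages : 0;
}
```

For 4 GiB scrubbed once a day, `scrub_bandwidth(4ULL << 30, 86400)` gives 49710 bytes/sec, and `scrub_pause_us(4ULL << 30, 86400)` gives 82397 µs. That is roughly twelve pages per second, which helps explain the negligible CPU load.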

Rob

* Background memory scrubbing
@ 2011-04-20  7:58 Robert Whitton
  2011-04-20 13:30 ` Clemens Ladisch
  2011-04-20 16:45 ` Rik van Riel
  0 siblings, 2 replies; 18+ messages in thread
From: Robert Whitton @ 2011-04-20  7:58 UTC (permalink / raw)
  To: linux-kernel; +Cc: rwhitton

Hi,

I have a home-grown module that performs background memory scrubbing to eliminate single-bit memory errors before they become a problem. It has been working on 2.6.26 kernels for some time (it is specifically targeted at the AMD64 PC architecture). I have now moved to the 2.6.32 kernel, and it fails with "unable to handle kernel paging request" after a couple of minutes. In summary, the code runs in a kernel thread and works as follows:

for each PFN from 256 to the highest valid PFN
{
  if (pfn_valid(PFN))
  {
    page = pfn_to_page(PFN)
    va = kmap(page)
    atomic_scrub(va, PAGE_SIZE)
    kunmap(page)
  }

  sleep(for_a_while)
}
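The loop above can be modelled in user space to check the walk and pacing logic separately from the kernel-mapping question. This sketch is not the author's module. `struct scrub_ops` and its callbacks are hypothetical stand-ins for pfn_valid(), the kmap()/atomic_scrub()/kunmap() sequence and the sleep. Only the control flow mirrors the pseudocode.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel calls in the pseudocode. */
struct scrub_ops {
	bool (*pfn_valid)(unsigned long pfn, void *ctx);
	void (*scrub_pfn)(unsigned long pfn, void *ctx); /* kmap + atomic_scrub + kunmap */
	void (*pause)(void *ctx);                          /* sleep(for_a_while) */
	void *ctx;
};

/* One pass from first_pfn to max_pfn inclusive (assumes max_pfn < ULONG_MAX).
 * As in the pseudocode, the pause follows every PFN, valid or not.
 * Returns the number of pages scrubbed. */
static size_t scrub_pass(unsigned long first_pfn, unsigned long max_pfn,
                         const struct scrub_ops *ops)
{
	size_t scrubbed = 0;
	for (unsigned long pfn = first_pfn; pfn <= max_pfn; pfn++) {
		if (ops->pfn_valid(pfn, ops->ctx)) {
			ops->scrub_pfn(pfn, ops->ctx);
			scrubbed++;
		}
		ops->pause(ops->ctx);
	}
	return scrubbed;
}
```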


This code works absolutely fine up to a short distance beyond the 16MB boundary (specifically, it always seems to fail on my hardware at PFN 4105). At this point, despite the fact that kmap returns a valid virtual address (and it is the virtual address that I expect: 0xffff880001009000), I get the kernel oops "unable to handle kernel paging request".
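A quick calculation shows that the figures in the report are consistent with each other. On x86-64 there is no highmem, so kmap() simply returns the page's address in the kernel's direct mapping. The sketch assumes that mapping starts at 0xffff880000000000, which matches the address quoted here but is specific to kernels of this era. `demo_pfn_to_phys()` and `demo_pfn_to_va()` are hypothetical helpers, not kernel APIs.

```c
#include <stdint.h>

#define DEMO_PAGE_SHIFT  12                     /* 4 KiB pages */
#define DEMO_PAGE_OFFSET 0xffff880000000000ULL  /* assumed direct-map base */

/* Physical address of the first byte of a page frame. */
static uint64_t demo_pfn_to_phys(uint64_t pfn)
{
	return pfn << DEMO_PAGE_SHIFT;
}

/* Direct-mapping virtual address of a page frame. */
static uint64_t demo_pfn_to_va(uint64_t pfn)
{
	return DEMO_PAGE_OFFSET + demo_pfn_to_phys(pfn);
}
```

PFN 4105 maps to physical address 0x1009000, which is nine 4 KiB pages (36 KiB) past the 16MB boundary at PFN 4096. Its virtual address is 0xffff880001009000, the value kmap returned.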

My immediate thought was to check the kernel page tables and avoid pages that are marked as not present or read-only; however, it appears that init_mm and pgd_offset_k have both been deprecated. I have also looked at page->flags, but the flags for the first page that fails are exactly the same as those for the previous page, which works absolutely fine, so I don't appear to be able to use page->flags to make a valid distinction.
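The check described here skips pages whose kernel mapping is not present or is read-only, since a locked add writes to the page. On x86 it comes down to two bits in each page-table entry: _PAGE_PRESENT (bit 0) and _PAGE_RW (bit 1). With CR0.WP set, as Linux runs, a kernel write succeeds only if every level of the translation is present and writable, not just the leaf. The sketch below models that decision only, on raw 64-bit entry values. Obtaining the entries for a given virtual address is the open question in this message and is not shown. `walk_allows_scrub()` is a hypothetical name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define X86_PTE_PRESENT (1ULL << 0) /* _PAGE_PRESENT */
#define X86_PTE_RW      (1ULL << 1) /* _PAGE_RW */

/* entries[] holds the table entries visited while translating one kernel
 * virtual address, from the top level down to the leaf (which may be a
 * 2 MiB or 1 GiB large-page entry). Returns true only if a supervisor
 * write would be allowed, i.e. every level is present and writable. */
static bool walk_allows_scrub(const uint64_t *entries, size_t levels)
{
	if (levels == 0)
		return false;
	for (size_t i = 0; i < levels; i++) {
		uint64_t e = entries[i];
		if (!(e & X86_PTE_PRESENT) || !(e & X86_PTE_RW))
			return false;
	}
	return true;
}
```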

So I'm looking for any hints on how to fix the original code, i.e. how can I sensibly detect "a priori" whether a PFN/page has a valid mapping in the kernel page tables, such that I can read/write that page via a kmap(ped) virtual address? Alternatively, since init_mm and pgd_offset_k have been deprecated, how can I gain access to the kernel page tables?

Thanks in advance for any help.

Rob


(please CC me in on any responses)







Thread overview: 18+ messages
2011-04-20 15:46 Background memory scrubbing Robert Whitton
2011-04-20 16:01 ` Borislav Petkov
2011-04-25 16:53   ` Chris Friesen
  -- strict thread matches above, loose matches on Subject: below --
2011-04-20 17:05 Robert Whitton
2011-04-20 17:53 ` Rik van Riel
2011-04-24 20:47   ` Pavel Machek
2011-04-20 14:40 Robert Whitton
2011-04-20 15:19 ` Clemens Ladisch
2011-04-20 15:35   ` Borislav Petkov
2011-04-20 15:46     ` Markus Trippelsdorf
2011-04-20 15:58       ` Borislav Petkov
2011-04-20 16:45         ` Markus Trippelsdorf
2011-04-20 16:55           ` Markus Trippelsdorf
2011-04-20 17:36             ` Borislav Petkov
2011-04-20 19:23   ` Bill Gatliff
2011-04-20  7:58 Robert Whitton
2011-04-20 13:30 ` Clemens Ladisch
2011-04-20 16:45 ` Rik van Riel
