linux-mm.kvack.org archive mirror
From: Andrew Morton <akpm@linux-foundation.org>
To: Stefan Assmann <sassmann@kpanic.de>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	tony.luck@intel.com, andi@firstfloor.org, mingo@elte.hu,
	hpa@zytor.com, rick@vanrein.org, rdunlap@xenotime.net,
	Nancy Yuen <yuenn@google.com>, Michael Ditto <mditto@google.com>
Subject: Re: [PATCH v2 0/3] support for broken memory modules (BadRAM)
Date: Wed, 22 Jun 2011 11:00:34 -0700
Message-ID: <20110622110034.89ee399c.akpm@linux-foundation.org>
In-Reply-To: <1308741534-6846-1-git-send-email-sassmann@kpanic.de>

On Wed, 22 Jun 2011 13:18:51 +0200 Stefan Assmann <sassmann@kpanic.de> wrote:

> Following the RFC for the BadRAM feature here's the updated version with
> spelling fixes, thanks go to Randy Dunlap. Also the code is now less verbose,
> as requested by Andi Kleen.
> v2 with even more spelling fixes suggested by Randy.
> Patches are against vanilla 2.6.39.
> 
> The idea is to allow the user to specify RAM addresses that shouldn't be
> touched by the OS, because they are broken in some way. Not all machines have
> hardware support for hwpoison, ECC RAM, etc., so here's a solution that uses
> bitmasks to mask address patterns, passed via the new "badram" kernel command
> line parameter.
> Memtest86 has an option to generate these patterns since v2.3 so the only thing
> for the user to do should be:
> - run Memtest86
> - note down the pattern
> - add badram=<pattern> to the kernel command line
> 
> The affected pages are then marked with the hwpoison flag and thus won't be
> used by the memory management system.
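
For readers unfamiliar with the pattern format, here is a minimal user-space
sketch of how one address/mask pair can select pages.  It assumes the matching
rule described in the original BadRAM documentation: an address is treated as
faulty when it agrees with the pattern address on every bit that is set in the
mask.  The helper names badram_match() and count_bad_pages() and the 4 KiB page
size are illustrative assumptions; they are not taken from the patches under
review.

```c
#include <stdint.h>

#define PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Return 1 if addr agrees with pattern on every bit set in mask. */
static int badram_match(uint64_t addr, uint64_t pattern, uint64_t mask)
{
	return ((addr ^ pattern) & mask) == 0;
}

/*
 * Count the pages in [start, end) whose base address matches the
 * pattern.  A real implementation would mark each matching page
 * instead of counting it.
 */
static unsigned long count_bad_pages(uint64_t start, uint64_t end,
				     uint64_t pattern, uint64_t mask)
{
	unsigned long n = 0;
	uint64_t addr;

	for (addr = start; addr < end; addr += 1ULL << PAGE_SHIFT)
		if (badram_match(addr, pattern, mask))
			n++;
	return n;
}
```

With a mask that covers every page-number bit, a pattern selects exactly one
page; each page-number bit cleared in the mask doubles the number of pages the
pattern can match, which is why a single pair can describe a repeating fault.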

The google kernel has a similar capability.  I asked Nancy to comment
on these patches and she said:

: One, the bad addresses are passed via the kernel command line, which
: has a limited length.  It's okay if the addresses can be fit into a
: pattern, but that's not necessarily the case in the google kernel.  And
: even with patterns, the limit on the command line length limits the
: number of patterns that a user can specify.  Instead we use lilo to pass
: a file containing the bad pages in e820 format to the kernel.
: 
: Second, the BadRAM patch expands the address patterns from the command
: line into individual entries in the kernel's e820 table.  The e820
: table is a fixed buffer that supports a very small, hard coded number
: of entries (128).  We require a much larger number of entries (on
: the order of a few thousand), so much of the google kernel patch deals
: with expanding the e820 table. Also, with the BadRAM patch, entries
: that don't fit in the table are silently dropped and this isn't
: appropriate for us.
: 
: Another caveat concerns mapping out too much bad memory in general.  If too
: much memory is removed from low memory, a system may not boot.  We
: solve this by generating good maps.  Our userspace tools do not map out
: memory below a certain limit, and they verify against a system's iomap
: that only addresses from memory are mapped out.
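
The checks Nancy describes might be sketched roughly as follows.  This is not
the google tooling, which is not shown in this thread; it is a hypothetical
filter that drops bad ranges below a low-memory limit, drops ranges that are
not fully inside a known RAM range, and returns an error instead of silently
discarding entries when the output table is full.  LOW_LIMIT, struct range and
filter_bad_ranges() are all assumed names, and the caller is expected to supply
the RAM ranges, since the thread does not say how the tools read the iomap.

```c
#include <stddef.h>
#include <stdint.h>

struct range {
	uint64_t start;	/* inclusive */
	uint64_t end;	/* exclusive */
};

/* Assumption: stand-in for the "certain limit"; 16 MiB is arbitrary. */
#define LOW_LIMIT 0x1000000ULL

/* Return 1 if r lies entirely inside one of the nram RAM ranges. */
static int inside_ram(const struct range *r,
		      const struct range *ram, size_t nram)
{
	size_t i;

	for (i = 0; i < nram; i++)
		if (r->start >= ram[i].start && r->end <= ram[i].end)
			return 1;
	return 0;
}

/*
 * Copy the acceptable bad ranges into out[].  Returns the number of
 * ranges kept, or -1 if more than cap entries would be needed, so the
 * caller learns about the overflow.
 */
static long filter_bad_ranges(const struct range *bad, size_t nbad,
			      const struct range *ram, size_t nram,
			      struct range *out, size_t cap)
{
	size_t i, kept = 0;

	for (i = 0; i < nbad; i++) {
		if (bad[i].start >= bad[i].end)
			continue;	/* empty or inverted range */
		if (bad[i].start < LOW_LIMIT)
			continue;	/* never map out low memory */
		if (!inside_ram(&bad[i], ram, nram))
			continue;	/* not backed by RAM */
		if (kept == cap)
			return -1;	/* table full: report, don't drop */
		out[kept++] = bad[i];
	}
	return (long)kept;
}
```

Returning an error on overflow is one possible answer to the "silently dropped"
concern; growing the table, as the google patch reportedly does, is another.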

I have a couple of thoughts here:

- If this patchset is merged and a major user such as google is
  unable to use it and has to continue to carry a separate patch then
  that's a regrettable situation for the upstream kernel.

- Google's is, afaik, the largest use case we know of: zillions of
  machines for a number of years.  And this real-world experience tells
  us that the badram patchset has shortcomings.  Shortcomings which we
  can expect other users to experience.

So.  What are your thoughts on these issues?

Thanks

