public inbox for linux-kernel@vger.kernel.org
* Document handling of bad memory
@ 2008-11-26 16:15 Pavel Machek
  2008-11-26 16:25 ` Jan-Simon Möller
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Pavel Machek @ 2008-11-26 16:15 UTC (permalink / raw)
  To: kernel list, mtk.manpages, dl9pf, rdunlap, linux-doc,
	Andrew Morton, Trivial patch monkey


Document how to deal with bad memory reported with memtest.

Signed-off-by: Pavel Machek <pavel@suse.cz>

diff --git a/Documentation/bad_memory.txt b/Documentation/bad_memory.txt
new file mode 100644
index 0000000..df84162
--- /dev/null
+++ b/Documentation/bad_memory.txt
@@ -0,0 +1,45 @@
+March 2008
+Jan-Simon Moeller, dl9pf@gmx.de
+
+
+How to deal with bad memory e.g. reported by memtest86+ ?
+#########################################################
+
+There are three possibilities I know of:
+
+1) Reinsert/swap the memory modules
+
+2) Buy new modules (best!) or try to exchange the memory
+   if you have spare-parts
+
+3) Use BadRAM or memmap
+
+This Howto is about number 3) .
+
+
+BadRAM
+######
+BadRAM is the actively developed and available as kernel-patch
+here:  http://rick.vanrein.org/linux/badram/
+
+For more details see the BadRAM documentation.
+
+memmap
+######
+
+memmap is already in the kernel and usable as kernel-parameter at
+boot-time.  Its syntax is slightly strange and you may need to
+calculate the values by yourself!
+
+Syntax to exclude a memory area (see kernel-parameters.txt for details):
+memmap=<size>$<address>
+
+Example: memtest86+ reported here errors at address 0x18691458, 0x18698424 and
+         some others. All had 0x1869xxxx in common, so I chose a pattern of
+         0x18690000,0xffff0000.
+
+With the numbers of the example above:
+memmap=64K$0x18690000
+ or
+memmap=0x10000$0x18690000
+

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

* Re: Document handling of bad memory
  2008-11-26 16:15 Document handling of bad memory Pavel Machek
@ 2008-11-26 16:25 ` Jan-Simon Möller
  2008-11-27  0:42 ` Jiri Kosina
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 12+ messages in thread
From: Jan-Simon Möller @ 2008-11-26 16:25 UTC (permalink / raw)
  To: Pavel Machek
  Cc: kernel list, mtk.manpages, rdunlap, linux-doc, Andrew Morton,
	Trivial patch monkey

On Wednesday 26 November 2008 17:15:21, Pavel Machek wrote:
> 
> Document how to deal with bad memory reported with memtest.
> 
> Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Jan-Simon Möller <dl9pf@gmx.de>
 
> diff --git a/Documentation/bad_memory.txt b/Documentation/bad_memory.txt
[...]

Best regards,
Jan-Simon

* Re: Document handling of bad memory
  2008-11-26 16:15 Document handling of bad memory Pavel Machek
  2008-11-26 16:25 ` Jan-Simon Möller
@ 2008-11-27  0:42 ` Jiri Kosina
  2008-11-28  9:00 ` Rob Landley
  2008-12-01 18:56 ` Randy Dunlap
  3 siblings, 0 replies; 12+ messages in thread
From: Jiri Kosina @ 2008-11-27  0:42 UTC (permalink / raw)
  To: Pavel Machek
  Cc: kernel list, mtk.manpages, dl9pf, rdunlap, linux-doc,
	Andrew Morton, Trivial patch monkey, linux-doc


[ linux-doc@vger.kernel.org added, these should be the proper guys to 
  merge this ]

On Wed, 26 Nov 2008, Pavel Machek wrote:

> 
> Document how to deal with bad memory reported with memtest.
> 
> Signed-off-by: Pavel Machek <pavel@suse.cz>
> 
> diff --git a/Documentation/bad_memory.txt b/Documentation/bad_memory.txt
> new file mode 100644
> index 0000000..df84162
> --- /dev/null
> +++ b/Documentation/bad_memory.txt
> @@ -0,0 +1,45 @@
> +March 2008
> +Jan-Simon Moeller, dl9pf@gmx.de
> +
> +
> +How to deal with bad memory e.g. reported by memtest86+ ?
> +#########################################################
> +
> +There are three possibilities I know of:
> +
> +1) Reinsert/swap the memory modules
> +
> +2) Buy new modules (best!) or try to exchange the memory
> +   if you have spare-parts
> +
> +3) Use BadRAM or memmap
> +
> +This Howto is about number 3) .
> +
> +
> +BadRAM
> +######
> +BadRAM is the actively developed and available as kernel-patch
> +here:  http://rick.vanrein.org/linux/badram/
> +
> +For more details see the BadRAM documentation.
> +
> +memmap
> +######
> +
> +memmap is already in the kernel and usable as kernel-parameter at
> +boot-time.  Its syntax is slightly strange and you may need to
> +calculate the values by yourself!
> +
> +Syntax to exclude a memory area (see kernel-parameters.txt for details):
> +memmap=<size>$<address>
> +
> +Example: memtest86+ reported here errors at address 0x18691458, 0x18698424 and
> +         some others. All had 0x1869xxxx in common, so I chose a pattern of
> +         0x18690000,0xffff0000.
> +
> +With the numbers of the example above:
> +memmap=64K$0x18690000
> + or
> +memmap=0x10000$0x18690000
> +
> 
> -- 
> (english) http://www.livejournal.com/~pavelmachek
> (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
> 

-- 
Jiri Kosina
SUSE Labs

* Re: Document handling of bad memory
  2008-11-26 16:15 Document handling of bad memory Pavel Machek
  2008-11-26 16:25 ` Jan-Simon Möller
  2008-11-27  0:42 ` Jiri Kosina
@ 2008-11-28  9:00 ` Rob Landley
  2008-11-28  9:47   ` Jan-Simon Möller
                     ` (2 more replies)
  2008-12-01 18:56 ` Randy Dunlap
  3 siblings, 3 replies; 12+ messages in thread
From: Rob Landley @ 2008-11-28  9:00 UTC (permalink / raw)
  To: Pavel Machek
  Cc: kernel list, mtk.manpages, dl9pf, rdunlap, linux-doc,
	Andrew Morton, Trivial patch monkey

On Wednesday 26 November 2008 10:15:21 Pavel Machek wrote:
> Document how to deal with bad memory reported with memtest.
...
> +BadRAM
> +######
> +BadRAM is the actively developed and available as kernel-patch
> +here:  http://rick.vanrein.org/linux/badram/

So the patch isn't worth merging, but documentation about the out-of-tree 
patch is worth merging?

I'm not objecting; I'm just confused about what the merge criteria are...

Rob

* Re: Document handling of bad memory
  2008-11-28  9:00 ` Rob Landley
@ 2008-11-28  9:47   ` Jan-Simon Möller
  2008-11-28 12:18   ` Pavel Machek
  2008-11-29  6:50   ` Andrew Morton
  2 siblings, 0 replies; 12+ messages in thread
From: Jan-Simon Möller @ 2008-11-28  9:47 UTC (permalink / raw)
  To: Rob Landley
  Cc: Pavel Machek, kernel list, mtk.manpages, rdunlap, linux-doc,
	Andrew Morton, Trivial patch monkey

On Friday 28 November 2008 10:00:26, Rob Landley wrote:
> 
> So the patch isn't worth merging, but documentation about the out-of-tree 
> patch is worth merging?
Good point.

IIRC we tried to get the patch merged, but without luck at the time. We
were told that there is another method (with an even <irony>better</irony>
syntax) that can also handle this case, and that it would be better to do
some hacking so that the parsed syntax uses the functions of this already
in-kernel method.
I don't know the status of this (my guess: nothing happened).
What I do know: badmem worked really well here. (But in the meantime I
bought new RAM.)

Best regards,
Jan-Simon

* Re: Document handling of bad memory
  2008-11-28  9:00 ` Rob Landley
  2008-11-28  9:47   ` Jan-Simon Möller
@ 2008-11-28 12:18   ` Pavel Machek
  2008-11-29  5:28     ` Rob Landley
  2008-11-29  6:50   ` Andrew Morton
  2 siblings, 1 reply; 12+ messages in thread
From: Pavel Machek @ 2008-11-28 12:18 UTC (permalink / raw)
  To: Rob Landley
  Cc: kernel list, mtk.manpages, dl9pf, rdunlap, linux-doc,
	Andrew Morton, Trivial patch monkey

On Fri 2008-11-28 03:00:26, Rob Landley wrote:
> On Wednesday 26 November 2008 10:15:21 Pavel Machek wrote:
> > Document how to deal with bad memory reported with memtest.
> ...
> > +BadRAM
> > +######
> > +BadRAM is the actively developed and available as kernel-patch
> > +here:  http://rick.vanrein.org/linux/badram/
> 
> So the patch isn't worth merging, but documentation about the out-of-tree 
> patch is worth merging?

Well, why not. The patch is unnecessary, but for the poor souls hit
by bad memory, a one-line pointer can help...
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

* Re: Document handling of bad memory
  2008-11-28 12:18   ` Pavel Machek
@ 2008-11-29  5:28     ` Rob Landley
  0 siblings, 0 replies; 12+ messages in thread
From: Rob Landley @ 2008-11-29  5:28 UTC (permalink / raw)
  To: Pavel Machek
  Cc: kernel list, mtk.manpages, dl9pf, rdunlap, linux-doc,
	Andrew Morton, Trivial patch monkey

On Friday 28 November 2008 06:18:38 Pavel Machek wrote:
> On Fri 2008-11-28 03:00:26, Rob Landley wrote:
> > So the patch isn't worth merging, but documentation about the out-of-tree
> > patch is worth merging?
>
> Well, why not. The patch is unnecessary, but for the poor souls hit
> by bad memory, a one-line pointer can help...
> 									Pavel

Define "unnecessary".

Rob

* Re: Document handling of bad memory
  2008-11-28  9:00 ` Rob Landley
  2008-11-28  9:47   ` Jan-Simon Möller
  2008-11-28 12:18   ` Pavel Machek
@ 2008-11-29  6:50   ` Andrew Morton
  2 siblings, 0 replies; 12+ messages in thread
From: Andrew Morton @ 2008-11-29  6:50 UTC (permalink / raw)
  To: Rob Landley
  Cc: Pavel Machek, kernel list, mtk.manpages, dl9pf, rdunlap,
	linux-doc, Trivial patch monkey

On Fri, 28 Nov 2008 03:00:26 -0600 Rob Landley <rob@landley.net> wrote:

> On Wednesday 26 November 2008 10:15:21 Pavel Machek wrote:
> > Document how to deal with bad memory reported with memtest.
> ...
> > +BadRAM
> > +######
> > +BadRAM is the actively developed and available as kernel-patch
> > +here:  http://rick.vanrein.org/linux/badram/
> 
> So the patch isn't worth merging, but documentation about the out-of-tree 
> patch is worth merging?
> 
> I'm not objecting; I'm just confused about what the merge criteria are...
> 

mm..  If someone finds it useful (and I assume that at least one person
would have found it useful, hence the effort to write the patch) then
why not?

(And yeah, yeah, someone might find a .gif of a parrot useful too.  Go
do some work.)


* Re: Document handling of bad memory
  2008-11-26 16:15 Document handling of bad memory Pavel Machek
                   ` (2 preceding siblings ...)
  2008-11-28  9:00 ` Rob Landley
@ 2008-12-01 18:56 ` Randy Dunlap
  2008-12-09 12:31   ` Pavel Machek
  3 siblings, 1 reply; 12+ messages in thread
From: Randy Dunlap @ 2008-12-01 18:56 UTC (permalink / raw)
  To: Pavel Machek
  Cc: kernel list, mtk.manpages, dl9pf, rdunlap, linux-doc,
	Andrew Morton, Trivial patch monkey

On Wed, 26 Nov 2008 17:15:21 +0100 Pavel Machek wrote:

> Document how to deal with bad memory reported with memtest.
> 
> Signed-off-by: Pavel Machek <pavel@suse.cz>
> 
> diff --git a/Documentation/bad_memory.txt b/Documentation/bad_memory.txt
> new file mode 100644
> index 0000000..df84162
> --- /dev/null
> +++ b/Documentation/bad_memory.txt
> @@ -0,0 +1,45 @@
> +March 2008
> +Jan-Simon Moeller, dl9pf@gmx.de
> +
> +
> +How to deal with bad memory e.g. reported by memtest86+ ?
> +#########################################################
> +
> +There are three possibilities I know of:
> +
> +1) Reinsert/swap the memory modules
> +
> +2) Buy new modules (best!) or try to exchange the memory
> +   if you have spare-parts
> +
> +3) Use BadRAM or memmap
> +
> +This Howto is about number 3) .

No space between 3) and '.'.

> +
> +
> +BadRAM
> +######
> +BadRAM is the actively developed and available as kernel-patch
> +here:  http://rick.vanrein.org/linux/badram/
> +
> +For more details see the BadRAM documentation.
> +
> +memmap
> +######
> +
> +memmap is already in the kernel and usable as kernel-parameter at

                                                 a kernel parameter at

> +boot-time.  Its syntax is slightly strange and you may need to

   boot time.

> +calculate the values by yourself!

s/!/./

> +
> +Syntax to exclude a memory area (see kernel-parameters.txt for details):
> +memmap=<size>$<address>
> +
> +Example: memtest86+ reported here errors at address 0x18691458, 0x18698424 and

s/here //

> +         some others. All had 0x1869xxxx in common, so I chose a pattern of
> +         0x18690000,0xffff0000.

What is the 0xffff0000 for?  Needs explanation.

> +
> +With the numbers of the example above:
> +memmap=64K$0x18690000
> + or
> +memmap=0x10000$0x18690000
> +

Please lose the last empty line.

and thanks for the patch/new file.
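
For reference on the 0xffff0000 question: in BadRAM notation a pattern is
an address,mask pair, and, as far as the BadRAM documentation describes it,
an address counts as bad when it agrees with the pattern address in every
bit where the mask is 1. Under that reading, 0x18690000,0xffff0000 pins the
top 16 bits and leaves the low 16 bits free, i.e. a 64K region. The sketch
below (Python 3) converts such a pair into the memmap form. The helper name
badram_to_memmap and the 32-bit address width are assumptions made for
illustration; nothing here is part of the kernel or of BadRAM.

```python
def badram_to_memmap(address, mask, bits=32):
    """Translate one BadRAM address,mask pair into memmap=<size>$<address>.

    Assumes the BadRAM convention that a mask bit of 1 means "this address
    bit must match". Hypothetical helper, for illustration only.
    """
    full = (1 << bits) - 1
    free = ~mask & full              # bits the mask leaves unconstrained
    # memmap can only express one contiguous range, so the free bits must
    # be a solid run of low-order ones (mask looks like 1...10...0).
    if free & (free + 1):
        raise ValueError("mask 0x%x does not describe one contiguous region" % mask)
    size = free + 1                  # number of addresses matched
    base = address & mask            # start of the matched region
    return "memmap=0x%x$0x%x" % (size, base)


print(badram_to_memmap(0x18690000, 0xffff0000))  # memmap=0x10000$0x18690000
```

A mask with holes in it (for example 0xff00ff00) matches several separate
regions, which is why the sketch refuses it rather than guessing.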

---
~Randy

* Re: Document handling of bad memory
  2008-12-01 18:56 ` Randy Dunlap
@ 2008-12-09 12:31   ` Pavel Machek
  2008-12-09 21:40     ` Rob Landley
  0 siblings, 1 reply; 12+ messages in thread
From: Pavel Machek @ 2008-12-09 12:31 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: kernel list, mtk.manpages, dl9pf, rdunlap, linux-doc,
	Andrew Morton, Trivial patch monkey


I cleaned the document up according to Randy's comments (thanks!). I don't
actually know enough about DRAM error characteristics; I guess rounding the
size of the bad region up to the nearest 2^n makes sense.

Signed-off-by: Pavel Machek <pavel@suse.cz>

diff --git a/Documentation/bad_memory.txt b/Documentation/bad_memory.txt
index df84162..a2a8703 100644
--- a/Documentation/bad_memory.txt
+++ b/Documentation/bad_memory.txt
@@ -14,12 +14,12 @@ There are three possibilities I know of:
 
 3) Use BadRAM or memmap
 
-This Howto is about number 3) .
+This Howto is about number 3).
 
 
 BadRAM
 ######
-BadRAM is the actively developed and available as kernel-patch
+BadRAM is the actively developed and available as a kernel patch
 here:  http://rick.vanrein.org/linux/badram/
 
 For more details see the BadRAM documentation.
@@ -27,19 +27,20 @@ For more details see the BadRAM documentation.
 memmap
 ######
 
-memmap is already in the kernel and usable as kernel-parameter at
-boot-time.  Its syntax is slightly strange and you may need to
-calculate the values by yourself!
+memmap is already in the kernel and usable as a kernel parameter at
+boot time.  Its syntax is slightly strange and you may need to
+calculate the values by yourself.
 
 Syntax to exclude a memory area (see kernel-parameters.txt for details):
 memmap=<size>$<address>
 
-Example: memtest86+ reported here errors at address 0x18691458, 0x18698424 and
+Example: memtest86+ reported errors at address 0x18691458, 0x18698424 and
          some others. All had 0x1869xxxx in common, so I chose a pattern of
-         0x18690000,0xffff0000.
+         0x18690000 and size of 0x10000. (Size needs to cover at least all
+	 known bad places, and rounding to nearest power of 2 makes sense
+	 'just to be safe').
 
 With the numbers of the example above:
 memmap=64K$0x18690000
  or
 memmap=0x10000$0x18690000
-
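
The calculation that the memmap example asks the reader to do by hand can
be sketched as follows (Python 3). This is only an illustration under
stated assumptions: the helper name memmap_exclude is made up, the 4 KiB
starting size assumes an x86-style page, and insisting on a naturally
aligned power-of-two window is a conservative choice taken from the "just
to be safe" remark in the text, not a kernel requirement.

```python
def memmap_exclude(bad_addresses, min_size=0x1000):
    """Build a memmap=<size>$<address> string covering all bad addresses.

    Grows a naturally aligned power-of-two window until every reported
    address fits inside it. Hypothetical helper, for illustration only.
    """
    lo, hi = min(bad_addresses), max(bad_addresses)
    size = min_size                  # must itself be a power of two
    while True:
        base = lo & ~(size - 1)      # align the start down to a multiple of size
        if hi < base + size:         # highest bad address falls inside the window
            return "memmap=0x%x$0x%x" % (size, base)
        size <<= 1                   # otherwise double the window and retry


# The two addresses quoted in the howto give the same result as the text.
print(memmap_exclude([0x18691458, 0x18698424]))  # memmap=0x10000$0x18690000
```

Feed it every address memtest86+ reports, not just the first few; the
window is only as good as the list of failures it was computed from.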

-- 

(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

* Re: Document handling of bad memory
  2008-12-09 12:31   ` Pavel Machek
@ 2008-12-09 21:40     ` Rob Landley
  2008-12-09 23:11       ` Pavel Machek
  0 siblings, 1 reply; 12+ messages in thread
From: Rob Landley @ 2008-12-09 21:40 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Randy Dunlap, kernel list, mtk.manpages, dl9pf, rdunlap,
	linux-doc, Andrew Morton, Trivial patch monkey

On Tuesday 09 December 2008 06:31:52 Pavel Machek wrote:
> I cleaned the document up according to Randy's comments (thanks!). I don't
> actually know enough about DRAM error characteristics; I guess rounding the
> size of the bad region up to the nearest 2^n makes sense.
>
> Signed-off-by: Pavel Machek <pavel@suse.cz>
>
> diff --git a/Documentation/bad_memory.txt b/Documentation/bad_memory.txt
...
> +This Howto is about number 3).
>
>
>  BadRAM
>  ######
> -BadRAM is the actively developed and available as kernel-patch
> +BadRAM is the actively developed and available as a kernel patch
>  here:  http://rick.vanrein.org/linux/badram/

Ok, once again: the point of this patch is to document an out of tree patch.

The out of tree patch is here:
http://rick.vanrein.org/linux/badram/software/BadRAM-2.6.27.1.patch

It has its own Documentation/badram.txt file and it patches 
Documentation/memory.txt, as acknowledged here:

>  For more details see the BadRAM documentation.
> @@ -27,19 +27,20 @@ For more details see the BadRAM documentation.
>  memmap
>  ######

Now what I don't understand is, why add something to the tree formalizing the 
out-of-tree status of this other patch?  Why not just merge it?  If it's 
interesting enough to have documentation about the patch in the tree, why is 
the patch itself not interesting enough to merge?  It's clearly got an active 
maintainer, and has for years.  (Is there something specific about it that 
needs to be cleaned up?)

Adding this extra documentation to the badram patch sounds great.  Merging the 
badram patch into the linux kernel sounds useful; obviously _this_ patch is 
inherently an expression of interest in it.  Adding documentation about the 
badram patch to the linux kernel tree but _not_ adding the badram patch itself 
seems kind of crazy.

Would someone please explain the reasoning here?  I don't understand it.

Rob

* Re: Document handling of bad memory
  2008-12-09 21:40     ` Rob Landley
@ 2008-12-09 23:11       ` Pavel Machek
  0 siblings, 0 replies; 12+ messages in thread
From: Pavel Machek @ 2008-12-09 23:11 UTC (permalink / raw)
  To: Rob Landley
  Cc: Randy Dunlap, kernel list, mtk.manpages, dl9pf, rdunlap,
	linux-doc, Andrew Morton, Trivial patch monkey

On Tue 2008-12-09 15:40:41, Rob Landley wrote:
> On Tuesday 09 December 2008 06:31:52 Pavel Machek wrote:
> > I cleaned the document up according to Randy's comments (thanks!). I don't
> > actually know enough about DRAM error characteristics; I guess rounding the
> > size of the bad region up to the nearest 2^n makes sense.
> >
> > Signed-off-by: Pavel Machek <pavel@suse.cz>
> >
> > diff --git a/Documentation/bad_memory.txt b/Documentation/bad_memory.txt
> ...
> > +This Howto is about number 3).
> >
> >
> >  BadRAM
> >  ######
> > -BadRAM is the actively developed and available as kernel-patch
> > +BadRAM is the actively developed and available as a kernel patch
> >  here:  http://rick.vanrein.org/linux/badram/
> 
> Ok, once again: the point of this patch is to document an out of tree patch.

No; the point of this piece of documentation is to tell people how to
work _without_ that patch. Because it is simple enough.

> The out of tree patch is here:
> http://rick.vanrein.org/linux/badram/software/BadRAM-2.6.27.1.patch
> 
> It has its own Documentation/badram.txt file and it patches 
> Documentation/memory.txt, as acknowledged here:
> 
> >  For more details see the BadRAM documentation.
> > @@ -27,19 +27,20 @@ For more details see the BadRAM documentation.
> >  memmap
> >  ######
> 
> Now what I don't understand is, why add something to the tree formalizing the 
> out-of-tree status of this other patch?  Why not just merge it?  If
> it's 

Take a look at that patch. It is seriously overengineered. This should
not need a config option, should not introduce a new page flag, etc.

We already have a perfectly working interface for excluding specific
addresses; maybe we need better documentation, and maybe the kernel
command-line interface should be changed to be more user-friendly, but
we certainly don't want to take the badram patch.

This excerpt should be enough:

diff -pruN linux-2.6.27/include/linux/page-flags.h linux-2.6.27-new/include/linux/page-flags.h
--- linux-2.6.27/include/linux/page-flags.h	2008-10-10 03:43:53.000000000 +0530
+++ linux-2.6.27-new/include/linux/page-flags.h	2008-10-15 10:04:48.000000000 +0530
@@ -93,6 +93,9 @@ enum pageflags {
 	PG_mappedtodisk,	/* Has blocks allocated on-disk */
 	PG_reclaim,		/* To be reclaimed asap */
 	PG_buddy,		/* Page is free, on buddy lists */
+#ifdef CONFIG_BADRAM
+	PG_badram,              /* BadRam page */
+#endif
 #ifdef CONFIG_IA64_UNCACHED_ALLOCATOR
 	PG_uncached,		/* Page has been mapped as uncached */
 #
 

									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
