From: Andrew Morton <akpm@zip.com.au>
To: Hugh Dickins <hugh@veritas.com>
Cc: Marcelo Tosatti <marcelo@conectiva.com.br>,
Benjamin LaHaise <bcrl@redhat.com>,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH] __free_pages_ok oops
Date: Wed, 06 Feb 2002 13:11:43 -0800
Message-ID: <3C619C0F.367A6E67@zip.com.au>
In-Reply-To: <3C618863.DA7AC3B9@zip.com.au> <Pine.LNX.4.21.0202061958100.2009-100000@localhost.localdomain>
Hugh Dickins wrote:
>
> On Wed, 6 Feb 2002, Andrew Morton wrote:
> > Hugh Dickins wrote:
> > >
> > > Sorry, no solution, but maybe another oops in __free_pages_ok might help?
> >
> > What problem are you trying to solve?
>
> Amidst all the prune_dcache and other kswapd oopses reported
> (which I'd love to solve, but still can't work out), there have
> been a couple in shrink_cache itself, where the page from the
> inactive_list is not marked as on LRU, or is marked as Active;
> and also I think a couple in rmqueue, where the free page is
> found to be on LRU.

You noticed too, hey :)

I've been collecting these reports for five or six weeks,
also getting things like .config, machine usage patterns,
machine history, etc.

It's like grabbing shadows, really. A significant number
of the reporters are using netfilter, and that's basically
the only thing I have to go on at this time. And a lot of
people use netfilter, so it's probably coincidental.

A number of the reports were confirmed to be against flakey
hardware. A few more were on cranky old P150's and such,
which I'm tending to dismiss. In fact the great majority
of reports are likely to be hardware failures.

But I don't recall seeing this volume of reports against
2.2.x.

And we have things like zeus.kernel.org's death yesterday.
Peter is quite certain that it was a software failure.

It certainly looks like random memory corruption. Quite
frequently the faulting address is just "data". Examples
from my growing vm-oopses folder include:

	364d0a11
	16a1842f
	5f33f59b
	410a0d26
	d70f589b
	6964656e
	3562726b
	0017e980
	65726198
	008209dc

Many more are null-pointer derefs.

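To illustrate what a faulting address that is "just data" can look like, here is a minimal userspace C sketch (not code from this thread) that prints the four bytes of a 32-bit word as text. The helper names word_to_ascii and printable_bytes are hypothetical, and the little-endian byte order is an assumption based on these being x86 reports.

```c
#include <stdint.h>

/*
 * Render the four bytes of a 32-bit word as text, lowest byte first
 * (the order they occupy in memory on little-endian x86).
 * Bytes outside printable ASCII (0x20..0x7e) are shown as '.'.
 * out must have room for 5 chars, including the terminating NUL.
 */
static void word_to_ascii(uint32_t word, char out[5])
{
    for (int i = 0; i < 4; i++) {
        unsigned char b = (unsigned char)((word >> (8 * i)) & 0xff);
        out[i] = (b >= 0x20 && b <= 0x7e) ? (char)b : '.';
    }
    out[4] = '\0';
}

/* Count how many of the four bytes are printable ASCII. */
static int printable_bytes(uint32_t word)
{
    int count = 0;

    for (int i = 0; i < 4; i++) {
        unsigned char b = (unsigned char)((word >> (8 * i)) & 0xff);
        if (b >= 0x20 && b <= 0x7e)
            count++;
    }
    return count;
}
```

Fed the values above, 3562726b comes out as "krb5" and 6964656e as "nedi", so at least some of these "pointers" consist entirely of printable bytes. Whether they really are fragments of strings that overwrote a pointer is a guess, not something the oops reports establish.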
> Some of those may have been memtest86ed out of contention since,
> and some may have been on SMP and so not candidates; but it did
> just occur to me that we'd like to be sure nothing is messing
> with the LRU at interrupt time, hence the patch. Which of
> course solves nothing, but might shed some light.

Sure. I can't think of any way of chasing this down (if it
exists) apart from putting special-purpose debug code into
the mainstream kernel.

Al suggests a `honey pot' kernel thread which ticks over,
allocating, validating and releasing memory, waiting for
it to get stomped on. If it gets corrupted we can dump
lots of memory and a task list, I guess.
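
The honey pot idea can be sketched in userspace C roughly as follows. This is only an illustration of the allocate / validate / release cycle, not kernel code and not anything posted in this thread: the fill byte, the function names and the use of malloc() are all assumptions. A real kernel thread would use the kernel allocators and sleep between the fill and the check.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define HONEYPOT_PATTERN 0xa5u	/* hypothetical fill byte */

/* Allocate a buffer and fill it with the known pattern. NULL on failure. */
static unsigned char *honeypot_alloc(size_t len)
{
    unsigned char *p = malloc(len);

    if (p)
        memset(p, HONEYPOT_PATTERN, len);
    return p;
}

/*
 * Validate the buffer. Returns the offset of the first byte that no
 * longer holds the pattern, or -1 if the buffer is intact.
 */
static long honeypot_check(const unsigned char *p, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (p[i] != HONEYPOT_PATTERN)
            return (long)i;
    return -1;
}

/*
 * One tick of the honey pot: allocate, validate, release.
 * Returns the corrupted offset, or -1 if nothing was stomped on
 * (or if the allocation failed).
 */
static long honeypot_tick(size_t len)
{
    unsigned char *p = honeypot_alloc(len);
    long bad;

    if (!p)
        return -1;
    /* A real thread would sleep here so a stray write has time to land. */
    bad = honeypot_check(p, len);
    free(p);
    return bad;
}
```

A caller would loop over honeypot_tick() and, on any return value other than -1, trigger the memory and task-list dump described above.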

We could also re-enable slab debugging.

Also we can add some magic numbers to inodes and dentries,
validate addresses and memory locations as we walk the lists,
mainly on the shrink_cache path. If corruption is detected
then we dump out lots of memory and look through it for
suspicious kernel addresses.
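
As a rough picture of the magic-number check, here is a userspace C sketch of validating a circular doubly-linked list while walking it. The struct, the magic value and the function names are hypothetical stand-ins and do not reflect the kernel's inode, dentry or list_head definitions.

```c
#include <stddef.h>

#define NODE_MAGIC 0x444e5452u	/* hypothetical magic value */

struct node {
    unsigned int magic;
    struct node *next, *prev;
};

/* Initialise an empty circular list. */
static void list_init(struct node *head)
{
    head->magic = NODE_MAGIC;
    head->next = head;
    head->prev = head;
}

/* Stamp a node with the magic number and append it to the list. */
static void list_add_tail_node(struct node *n, struct node *head)
{
    n->magic = NODE_MAGIC;
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/*
 * Walk the list, checking each node's magic number and that its
 * successor points back at it. Returns NULL if the list is consistent,
 * otherwise the first suspicious node. max_nodes bounds the walk; if
 * the walk does not return to head within that many steps, head itself
 * is returned. A kernel version would also have to check that
 * n->next is a plausible address before dereferencing it.
 */
static struct node *list_validate(struct node *head, size_t max_nodes)
{
    struct node *n = head;

    for (size_t i = 0; i < max_nodes; i++) {
        if (n->magic != NODE_MAGIC)
            return n;
        if (n->next == NULL || n->next->prev != n)
            return n;
        n = n->next;
        if (n == head)
            return NULL;
    }
    return head;
}
```

On a non-NULL return, the caller would dump memory around the returned node, which is the point at which one would go looking for suspicious kernel addresses.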
Any other ideas?