public inbox for linux-kernel@vger.kernel.org
From: Andrew Morton <akpm@zip.com.au>
To: Hugh Dickins <hugh@veritas.com>
Cc: Marcelo Tosatti <marcelo@conectiva.com.br>,
	Benjamin LaHaise <bcrl@redhat.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] __free_pages_ok oops
Date: Wed, 06 Feb 2002 13:11:43 -0800
Message-ID: <3C619C0F.367A6E67@zip.com.au>
In-Reply-To: <3C618863.DA7AC3B9@zip.com.au> <Pine.LNX.4.21.0202061958100.2009-100000@localhost.localdomain>

Hugh Dickins wrote:
> 
> On Wed, 6 Feb 2002, Andrew Morton wrote:
> > Hugh Dickins wrote:
> > >
> > > Sorry, no solution, but maybe another oops in __free_pages_ok might help?
> >
> > What problem are you trying to solve?
> 
> Amidst all the prune_dcache and other kswapd oopses reported
> (which I'd love to solve, but still can't work out), there have
> been a couple in shrink_cache itself, where the page from the
> inactive_list is not marked as on LRU, or is marked as Active;
> and also I think a couple in rmqueue, where the free page is
> found to be on LRU.

You noticed too, hey :)

I've been collecting these reports for five or six weeks,
also getting things like .config, machine usage patterns,
machine history, etc.

It's like grabbing at shadows, really.  A significant number
of the reporters are using netfilter, and that's basically
the only thing I have to go on at this time.  And a lot of
people use netfilter, so it's probably coincidental.

A number of the reports were confirmed to be against flaky
hardware.  A few more were on cranky old P150s and such,
which I'm tending to dismiss.  In fact the great majority
of reports are likely to be hardware failures.

But I don't recall seeing this volume of reports against
2.2.x.

And we have things like zeus.kernel.org's death yesterday.
Peter is quite certain that it was a software failure.

It certainly looks like random memory corruption.  Quite
frequently the faulting address is just "data".   Examples
from my growing vm-oopses folder include:

364d0a11
16a1842f
5f33f59b
410a0d26
d70f589b
6964656e
3562726b
0017e980
65726198
008209dc

Many more are null-pointer derefs.
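One quick way to see what "just data" means is to lay each address out as the four bytes it would have occupied in memory. A minimal sketch, assuming a little-endian i386 box; the helper names are hypothetical. Read that way, 3562726b from the list above is the ASCII string "krb5", and 410a0d26 is "&", CR, LF, "A", which smells like a stray string overwriting a pointer. That is a hint, not proof.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <string.h>

/*
 * Render a 32-bit oops address as the bytes it would occupy in memory on a
 * little-endian machine (lowest byte first).  Printable ASCII is shown as-is,
 * anything else as '.'.  out must have room for 5 chars.
 */
static void addr_as_bytes(uint32_t addr, char out[5])
{
    for (int i = 0; i < 4; i++) {
        unsigned char b = (addr >> (8 * i)) & 0xff;
        out[i] = isprint(b) ? (char)b : '.';
    }
    out[4] = '\0';
}

/* Count how many of the four bytes look like text (printable, CR, LF, TAB). */
static int texty_bytes(uint32_t addr)
{
    int n = 0;

    for (int i = 0; i < 4; i++) {
        unsigned char b = (addr >> (8 * i)) & 0xff;
        if (isprint(b) || b == '\r' || b == '\n' || b == '\t')
            n++;
    }
    return n;
}
```

Addresses scoring 4 out of 4 are the ones worth grepping for in a memory dump; something like 0017e980 scores 0 and is more likely a mangled pointer or counter.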

> Some of those may have been memtest86ed out of contention since,
> and some may have been on SMP and so not candidates; but it did
> just occur to me that we'd like to be sure nothing is messing
> with the LRU at interrupt time, hence the patch.  Which of
> course solves nothing, but might shed some light.

Sure.  I can't think of any way of chasing this down (if it
exists) apart from putting special-purpose debug code into
the mainstream kernel.
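As an illustration of the kind of special-purpose check meant here, the invariants behind the shrink_cache and rmqueue reports quoted above can be written down directly. This is a user-space model only: struct fake_page and the FAKE_PG_* bits are hypothetical stand-ins for struct page and its PG_lru/PG_active flags, and in a kernel build the fprintf would become a printk followed by BUG().

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical flag bits standing in for PG_lru and PG_active. */
enum { FAKE_PG_LRU = 1 << 0, FAKE_PG_ACTIVE = 1 << 1 };

struct fake_page {
    unsigned long flags;
};

/* A page taken off the inactive list must be on the LRU and not active. */
static int check_inactive_page(const struct fake_page *p)
{
    if (!(p->flags & FAKE_PG_LRU)) {
        fprintf(stderr, "inactive page %p not on LRU\n", (const void *)p);
        return 0;
    }
    if (p->flags & FAKE_PG_ACTIVE) {
        fprintf(stderr, "inactive page %p marked active\n", (const void *)p);
        return 0;
    }
    return 1;
}

/* A page handed out by the free-page allocator must not be on the LRU. */
static int check_free_page(const struct fake_page *p)
{
    if (p->flags & FAKE_PG_LRU) {
        fprintf(stderr, "free page %p still on LRU\n", (const void *)p);
        return 0;
    }
    return 1;
}
```

The value of checks like these is less in catching the bug than in catching it earlier, closer to whoever did the stomping.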

Al suggests a `honey pot' kernel thread which ticks over,
allocating, validating and releasing memory, waiting for
it to get stomped on.   If it gets corrupted we can dump
lots of memory and a task list, I guess.
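Al's honey pot might look roughly like this, shown in user space for brevity. A minimal sketch: honeypot_tick, HONEY_PATTERN and the pause callback are hypothetical. In the kernel the body would run in a kernel thread that sleeps between filling and checking, and a nonzero result would trigger the memory and task-list dump.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

#define HONEY_PATTERN 0xA5u  /* hypothetical fill byte */

/* Expected byte at offset i: position-dependent, so a shifted copy is caught too. */
static unsigned char honey_byte(size_t i)
{
    return (unsigned char)(HONEY_PATTERN ^ (i & 0xff));
}

/*
 * One tick: allocate, fill with the pattern, let time pass via pause_fn
 * (may be NULL), re-check every byte, free.  Returns the number of
 * corrupted bytes (0 = clean), or -1 if the allocation failed.
 */
static long honeypot_tick(size_t len, void (*pause_fn)(unsigned char *, size_t))
{
    unsigned char *buf = malloc(len);
    long bad = 0;

    if (!buf)
        return -1;
    for (size_t i = 0; i < len; i++)
        buf[i] = honey_byte(i);
    if (pause_fn)
        pause_fn(buf, len);  /* stand-in for sleeping while the system runs */
    for (size_t i = 0; i < len; i++) {
        if (buf[i] != honey_byte(i)) {
            if (bad < 8)  /* report only the first few offenders */
                fprintf(stderr, "honeypot %p+%zu: got %02x want %02x\n",
                        (void *)buf, i, buf[i], honey_byte(i));
            bad++;
        }
    }
    free(buf);
    return bad;
}

/* Test helper: flip one byte in the middle, as a stray write would. */
static void simulate_stray_write(unsigned char *buf, size_t len)
{
    buf[len / 2] ^= 0xff;
}
```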

We could also re-enable slab debugging.
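The two slab-debugging techniques most relevant to random stomps are red zones (guard bytes around each object, checked on free) and poisoning (filling freed objects with a known byte and checking it on reuse). A user-space sketch of both, assuming a one-object "cache"; the dbg_* names, guard size and fill bytes are hypothetical, not the real slab allocator's.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define RZ           8     /* guard bytes on each side (hypothetical size) */
#define REDZONE_BYTE 0x5a  /* hypothetical guard fill */
#define POISON_BYTE  0x6b  /* hypothetical fill for freed objects */

/* Allocate len usable bytes surrounded by RZ guard bytes on each side. */
static unsigned char *dbg_alloc(size_t len)
{
    unsigned char *raw = malloc(len + 2 * RZ);

    if (!raw)
        return NULL;
    memset(raw, REDZONE_BYTE, RZ);
    memset(raw + RZ + len, REDZONE_BYTE, RZ);
    return raw + RZ;
}

/* Returns 1 if both guard areas around obj are intact. */
static int dbg_redzones_ok(const unsigned char *obj, size_t len)
{
    const unsigned char *before = obj - RZ, *after = obj + len;

    for (size_t i = 0; i < RZ; i++)
        if (before[i] != REDZONE_BYTE || after[i] != REDZONE_BYTE)
            return 0;
    return 1;
}

/* "Free" into the cache: overwrite the object with poison but keep it. */
static void dbg_poison(unsigned char *obj, size_t len)
{
    memset(obj, POISON_BYTE, len);
}

/* On reuse: any byte that is no longer poison was written after free. */
static int dbg_poison_ok(const unsigned char *obj, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (obj[i] != POISON_BYTE)
            return 0;
    return 1;
}

/* Really release the object, guards included. */
static void dbg_release(unsigned char *obj)
{
    free(obj - RZ);
}
```

The cost is memory and CPU on every allocation, which is why this sort of thing tends to be a debug option rather than on by default.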

Also we can add some magic numbers to inodes and dentries,
validate addresses and memory locations as we walk the lists,
mainly on the shrink_cache path.  If corruption is detected
then we dump out lots of memory and look through it for
suspicious kernel addresses.
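Magic numbers plus list validation could look something like the following user-space sketch. struct fake_dentry, DENTRY_MAGIC and walk_checked are hypothetical, not the real struct dentry. Each node's magic is checked before its next pointer is trusted, misaligned next pointers are rejected (several of the faulting addresses above, such as 364d0a11, are not even word-aligned), and a node count bounds the walk so a corrupted list that loops back on itself is reported too.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define DENTRY_MAGIC 0xdeadbeefu  /* hypothetical magic value */

struct fake_dentry {
    uint32_t magic;  /* set at allocation, cleared at free */
    struct fake_dentry *next;
};

/*
 * Walk a singly linked list, validating each node before following it.
 * Returns the number of nodes, or -1 on the first problem (after reporting it).
 */
static long walk_checked(const struct fake_dentry *head, long max_nodes)
{
    long n = 0;

    for (const struct fake_dentry *d = head; d; d = d->next) {
        if (d->magic != DENTRY_MAGIC) {
            fprintf(stderr, "node %ld at %p: bad magic %08lx\n",
                    n, (const void *)d, (unsigned long)d->magic);
            return -1;
        }
        /* Assumes nodes are pointer-aligned; garbage often is not. */
        if ((uintptr_t)d->next & (sizeof(void *) - 1)) {
            fprintf(stderr, "node %ld at %p: misaligned next %p\n",
                    n, (const void *)d, (const void *)d->next);
            return -1;
        }
        if (++n > max_nodes) {  /* cycle or runaway list */
            fprintf(stderr, "list longer than %ld nodes, probable loop\n",
                    max_nodes);
            return -1;
        }
    }
    return n;
}
```

In the kernel, the -1 path is where the "dump out lots of memory" step would go.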

Any other ideas?

-

