From: Andrea Arcangeli <andrea@suse.de>
To: linux-kernel@vger.kernel.org
Subject: Re: 2.6.5-rc1-aa1
Date: Thu, 18 Mar 2004 23:14:47 +0100
Message-ID: <20040318221447.GA3248@dualathlon.random>
In-Reply-To: <20040318022201.GE2113@dualathlon.random>

After one day of stress testing I could reproduce this:

------------[ cut here ]------------
kernel BUG at mm/objrmap.c:271!
invalid operand: 0000 [#1]
SMP
CPU:    0
EIP:    0060:[<c014df45>]    Not tainted
EFLAGS: 00210246   (2.6.5-rc1-aa1)
EIP is at page_add_rmap+0x145/0x180
eax: 00000000   ebx: c18d8740   ecx: 00200246   edx: c1a0e6a0
esi: e2c26ea8   edi: 4040a8cc   ebp: 00ab3d00   esp: c4323ec8
ds: 007b   es: 007b   ss: 0068
Process python (pid: 16772, threadinfo=c4322000 task=e63285f0)
Stack: 389c8025 00000000 f43c9028 c0149803 00000000 f43ca338 f3e50400 c18fb628
       c197d970 00000001 c18d8740 f3e50404 4040a8cc e2c26ea8 e27d3040 e27d3040
       e27d3060 e2c26ea8 e63285f0 c0117b24 00000000 c1a14040 00000000 4040a8cc
Call Trace:
 [<c0149803>] handle_mm_fault+0x4c3/0x900
 [<c0117b24>] do_page_fault+0x164/0x534
 [<c011945a>] recalc_task_prio+0x8a/0x1c0
 [<c011b413>] schedule+0x1e3/0x680
 [<c014c04a>] do_munmap+0x2da/0x430
 [<c01179c0>] do_page_fault+0x0/0x534
 [<c0106d01>] error_code+0x2d/0x38

Code: 0f 0b 0f 01 13 00 39 c0 eb 90 0f 0b eb 00 13 00 39 c0 e9 2f


After some more debugging I realized what happened. It's a race condition
that is very hard to trigger.

There's one task with some anonymous memory swapped out, so the pte points
to the swp_entry. This task fork()s and the swp_entry is duplicated.

One of the two tasks generates a swapin with a _read_ (so the page remains
a swapcache cow), so one of the two ptes is replaced with a pointer to the
swapcache page instead of the swp_entry. The page has mapcount == 1 and
count == 2.

Then memory pressure causes try_to_unmap_one to unmap the swapcache page,
setting the pte back to the swp_entry, and since there was only 1
mapping, I also clear the PG_anon bitflag. But right before
try_to_unmap_one clears PG_anon, the other process takes a minor fault
like this:

	process 1			process 2
	----------			------------
	swapout
					do_swap_page
					lookup_swap_cache
					SetPageAnon
	ClearPageAnon
					page_add_rmap
					BUG_ON(!page->as.mapping) <- crash
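
The interleaving in the diagram can be replayed step by step in a small
userspace model. This is only a sketch: struct model_page and all function
names below are hypothetical, and the model collapses the kernel's check
into a test on the flag, so the actual condition at mm/objrmap.c:271 is not
reproduced here.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for struct page (not the kernel layout). */
struct model_page {
    int  mapcount;  /* number of ptes mapping the page */
    bool anon;      /* models the PG_anon bitflag */
};

/* Process 1, first half of the unmap: the last mapping goes away. */
static void swapout_drop_mapping(struct model_page *p)
{
    p->mapcount--;
}

/* Process 1, second half: clear the flag because mapcount reached 0. */
static void swapout_clear_anon(struct model_page *p)
{
    p->anon = false;
}

/* Process 2: the minor fault found the page in the swap cache. */
static void fault_set_anon(struct model_page *p)
{
    p->anon = true;
}

/* Process 2: models page_add_rmap; returns false where the kernel would BUG. */
static bool fault_add_rmap(struct model_page *p)
{
    if (!p->anon)
        return false;
    p->mapcount++;
    return true;
}

/* Replays the racy order from the diagram; true means the model hit the BUG. */
bool replay_race(void)
{
    struct model_page page = { .mapcount = 1, .anon = true };

    swapout_drop_mapping(&page);  /* process 1: swapout starts        */
    fault_set_anon(&page);        /* process 2: SetPageAnon           */
    swapout_clear_anon(&page);    /* process 1: ClearPageAnon (late)  */
    return !fault_add_rmap(&page);/* process 2: page_add_rmap         */
}
```

In this model the flag set by process 2 is overwritten by the late clear
from process 1, which is the lost update the diagram describes.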

The window for the race is incredibly small: it takes hours of heavy
swapping on a real-life system doing fork, sleeping and touching ram
read-only to trigger it (my testbox has not triggered it yet despite the
load).

The fix is simple: always set and clear PG_anon under the page_map_lock.
This will avoid the race since every ClearPageAnon already runs under the
page_map_lock. I will implement and test it in a few hours.
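
A minimal userspace sketch of that locking rule, assuming a pthread mutex
as a stand-in for the page_map_lock. struct locked_page and the model_*
functions are hypothetical names and this is not the actual patch; it only
shows the flag changing together with the mapcount transition while the
lock is held.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page; the mutex models page_map_lock. */
struct locked_page {
    pthread_mutex_t map_lock;
    int  mapcount;
    bool anon;      /* models the PG_anon bitflag */
};

void model_page_init(struct locked_page *p)
{
    pthread_mutex_init(&p->map_lock, NULL);
    p->mapcount = 0;
    p->anon = false;
}

/* Set the flag on the 0 -> 1 mapcount transition, with the lock held. */
void model_add_rmap(struct locked_page *p)
{
    pthread_mutex_lock(&p->map_lock);
    if (p->mapcount++ == 0)
        p->anon = true;
    pthread_mutex_unlock(&p->map_lock);
}

/* Clear the flag on the 1 -> 0 mapcount transition, with the lock held. */
void model_remove_rmap(struct locked_page *p)
{
    pthread_mutex_lock(&p->map_lock);
    if (--p->mapcount == 0)
        p->anon = false;
    pthread_mutex_unlock(&p->map_lock);
}

/* Invariant the lock protects: the flag is set exactly while mapped. */
bool model_page_consistent(struct locked_page *p)
{
    pthread_mutex_lock(&p->map_lock);
    bool ok = (p->anon == (p->mapcount > 0));
    pthread_mutex_unlock(&p->map_lock);
    return ok;
}
```

Because both the set and the clear happen inside the same critical section
as the mapcount update, a concurrent fault can no longer have its flag
wiped by an unmap that already decided to clear it.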

The other way to fix it is to go back to doing it like Dave, that is to
clear PageAnon implicitly in __free_pages_ok, but I don't like that since
it's not robust. If we lose a bitflag with my code the kernel will oops
immediately, so it's much easier to find the path that lost the bitflag.
I prefer the objrmap code to manage PG_anon entirely explicitly
(atomically during the !mapcount++ and !--mapcount transitions) for both
the setting and the clearing of the bitflag, and if we lose it we crash
immediately in __free_pages_ok (instead of silently clearing the bitflag
as would happen in the objrmap patch). I find this more robust.
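
The difference between the two approaches can be sketched with two
hypothetical free routines. Neither is the kernel's __free_pages_ok;
struct freed_page and both function names are made up for illustration,
and the strict variant returns false where the kernel would oops.

```c
#include <stdbool.h>

/* Hypothetical page model holding only the bit under discussion. */
struct freed_page {
    bool anon;  /* models the PG_anon bitflag */
};

/* Implicit style: a leaked flag is silently cleared at free time. */
bool free_page_implicit(struct freed_page *p)
{
    p->anon = false;
    return true;    /* the leak goes unnoticed */
}

/* Explicit style: a leaked flag is reported at free time. */
bool free_page_strict(const struct freed_page *p)
{
    return !p->anon;    /* false models the immediate oops */
}
```

With the strict variant a path that forgot to clear the flag fails at the
first free, which points at the bug; with the implicit variant the same
path appears to work.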
