public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Andrea Arcangeli <andrea@suse.de>
To: Rik van Riel <riel@redhat.com>
Cc: Hugh Dickins <hugh@veritas.com>,
	Matt_Domsch@Dell.com, marcelo.tosatti@cyclades.com,
	linux-kernel@vger.kernel.org, benh@kernel.crashing.org
Subject: Re: [PATCH] page->flags corruption fix
Date: Sat, 11 Oct 2003 15:48:31 +0200	[thread overview]
Message-ID: <20031011134831.GJ16013@velociraptor.random> (raw)
In-Reply-To: <Pine.LNX.4.44.0310081156320.5568-100000@chimarrao.boston.redhat.com>

On Wed, Oct 08, 2003 at 11:59:06AM -0400, Rik van Riel wrote:
> On Wed, 8 Oct 2003, Hugh Dickins wrote:
> > On Wed, 8 Oct 2003 Matt_Domsch@Dell.com wrote:
> 
> > > We've seen a similar failure with the RHEL2.1 kernel w/o RMAP patches
> > > too.  So we fully believe it's possible in stock 2.4.x.
> > 
> > A similar failure - but what exactly?
> > And what is the actual race which would account for it?
> > 
> > I don't mind you and Rik fixing bugs!
> > I'd just like to understand the bug before it's fixed.
> 
> 1) cpu A adds page P to the swap cache, loading page->flags
>    and modifying it locally
> 
> 2) a second thread scans a page table entry and sees that
>    the page was accessed, so cpu B moves page P to the
>    active list
> 
> 3) cpu A undoes the PG_inactive -> PG_active bit change,
>    corrupting the page->flags of P
> 
> The -rmap VM doesn't do anything to this bug, except making
> it easy to trigger due to some side effects.

I believe the more correct fix is to hold the pagemap_lru_lock during
__add_to_page_cache. The race exists between pages with PG_lru set (in
the lru) that are being added to the pagecache/swapcache. Holding both
spinlocks really avoids the race, your patch sounds less obviously safe
(since the race still happens but it's more "controlled") and a single
spinlock should be more efficient than the flood of atomic bitops
anyways. Comments?  Hugh?

I also can't see why you care about the page->flags in __free_pages_ok.
by that time, if the page is still visible anywhere that's a _real_
huge bug, so that second part really looks wrong and it should be backed
out IMHO. For sure at that time the page can't be in any lru list, at
least in my tree, see what the correct code is there, this's the only
way to handle that case right (the in_interrupt() part is needed only if
you've aio like in your tree like in my tree). So your changes in
free_pages are definitely superflous in page_alloc.c, and I guess
they're not closing all the bugs in your tree too (you need the below
too, if you allow anon pages to be freed while in the lru still with
asynchronous io).

The first part of your patch I agree fixes the race, but I'd prefer just
taking one more spinlock to avoid the race at all, instead of trying to
control it with a flood of bitops.

static void __free_pages_ok (struct page *page, unsigned int order)
{
	unsigned long index, page_idx, mask, flags;
	free_area_t *area;
	struct page *base;
	zone_t *zone;
#ifdef CONFIG_SMP
	per_cpu_pages_t *per_cpu_pages;
#endif

	/*
	 * Yes, think what happens when other parts of the kernel take 
	 * a reference to a page in order to pin it for io. -ben
	 */
	if (PageLRU(page)) {
		if (unlikely(in_interrupt())) {
			unsigned long flags;

			spin_lock_irqsave(&free_pages_ok_no_irq_lock, flags);

			page->next_hash = free_pages_ok_no_irq_head;
			free_pages_ok_no_irq_head = page;
			page->index = order;

			spin_unlock_irqrestore(&free_pages_ok_no_irq_lock, flags);

			schedule_task(&free_pages_ok_no_irq_task);
			return;
		}
		lru_cache_del(page);
	}
[..]

Andrea - If you prefer relying on open source software, check these links:
	    rsync.kernel.org::pub/scm/linux/kernel/bkcvs/linux-2.[45]/
	    http://www.cobite.com/cvsps/

  parent reply	other threads:[~2003-10-11 13:47 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2003-10-08 15:31 [PATCH] page->flags corruption fix Matt_Domsch
2003-10-08 15:53 ` Hugh Dickins
2003-10-08 15:59   ` Rik van Riel
2003-10-08 17:15     ` Hugh Dickins
2003-10-08 17:41       ` Marcelo Tosatti
2003-10-08 17:52         ` Rik van Riel
2003-10-11 13:48     ` Andrea Arcangeli [this message]
2003-10-11 16:03       ` Andrea Arcangeli
2003-10-12 11:15         ` Hugh Dickins
2003-10-12 13:21           ` Andrea Arcangeli
2003-10-12 13:35             ` Andrea Arcangeli
2003-10-12 14:11           ` Rik van Riel
2003-10-12 14:36             ` Andrea Arcangeli
2003-10-12 17:20               ` Rik van Riel
2003-10-12 20:40                 ` Andrea Arcangeli
  -- strict thread matches above, loose matches on Subject: below --
2003-10-07 16:26 Rik van Riel
2003-10-08 14:49 ` Hugh Dickins
2003-10-08 14:57   ` David S. Miller
2003-10-08 15:10     ` Rik van Riel
2003-10-08 15:47       ` Hugh Dickins
2003-10-08 15:52         ` Rik van Riel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20031011134831.GJ16013@velociraptor.random \
    --to=andrea@suse.de \
    --cc=Matt_Domsch@Dell.com \
    --cc=benh@kernel.crashing.org \
    --cc=hugh@veritas.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=marcelo.tosatti@cyclades.com \
    --cc=riel@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox