public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] page->flags corruption fix
@ 2003-10-07 16:26 Rik van Riel
  2003-10-08 14:49 ` Hugh Dickins
  0 siblings, 1 reply; 21+ messages in thread
From: Rik van Riel @ 2003-10-07 16:26 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Matt Domsch, linux-kernel, benh

In the "better safe than sorry" category. Thanks go out to
Matt Domsch and Robert Hentosh. A similar fix went into the
2.6 kernel. Please apply.

# This is a BitKeeper generated patch for the following project:
# Project Name: Linux kernel tree
# This patch format is intended for GNU patch command version 2.5 or higher.
# This patch includes the following deltas:
#	           ChangeSet	1.1192  -> 1.1193 
#	  include/linux/mm.h	1.43    -> 1.44   
#	     mm/page_alloc.c	1.63    -> 1.64   
#	        mm/filemap.c	1.88    -> 1.89   
#
# The following is the BitKeeper ChangeSet Log
# --------------------------------------------
# 03/10/07	riel@cessna.boston.redhat.com	1.1193
# fix page->flags corruption due to races between atomic and non-atomic
# accesses, originally found and fixed by Robert Hentosh and Matt Domsch
# --------------------------------------------
#
diff -Nru a/include/linux/mm.h b/include/linux/mm.h
--- a/include/linux/mm.h	Tue Oct  7 12:19:34 2003
+++ b/include/linux/mm.h	Tue Oct  7 12:19:34 2003
@@ -322,9 +322,11 @@
 #define TryLockPage(page)	test_and_set_bit(PG_locked, &(page)->flags)
 #define PageChecked(page)	test_bit(PG_checked, &(page)->flags)
 #define SetPageChecked(page)	set_bit(PG_checked, &(page)->flags)
+#define ClearPageChecked(page)	clear_bit(PG_checked, &(page)->flags)
 #define PageLaunder(page)	test_bit(PG_launder, &(page)->flags)
 #define SetPageLaunder(page)	set_bit(PG_launder, &(page)->flags)
 #define ClearPageLaunder(page)	clear_bit(PG_launder, &(page)->flags)
+#define ClearPageArch1(page)	clear_bit(PG_arch_1, &(page)->flags)
 
 /*
  * The zone field is never updated after free_area_init_core()
diff -Nru a/mm/filemap.c b/mm/filemap.c
--- a/mm/filemap.c	Tue Oct  7 12:19:34 2003
+++ b/mm/filemap.c	Tue Oct  7 12:19:34 2003
@@ -654,10 +654,13 @@
 	struct address_space *mapping, unsigned long offset,
 	struct page **hash)
 {
-	unsigned long flags;
-
-	flags = page->flags & ~(1 << PG_uptodate | 1 << PG_error | 1 << PG_dirty | 1 << PG_referenced | 1 << PG_arch_1 | 1 << PG_checked);
-	page->flags = flags | (1 << PG_locked);
+	ClearPageUptodate(page);
+	ClearPageError(page);
+	ClearPageDirty(page);
+	ClearPageReferenced(page);
+	ClearPageArch1(page);
+	ClearPageChecked(page);
+	LockPage(page);
 	page_cache_get(page);
 	page->index = offset;
 	add_page_to_inode_queue(mapping, page);
diff -Nru a/mm/page_alloc.c b/mm/page_alloc.c
--- a/mm/page_alloc.c	Tue Oct  7 12:19:34 2003
+++ b/mm/page_alloc.c	Tue Oct  7 12:19:34 2003
@@ -109,7 +109,8 @@
 		BUG();
 	if (PageActive(page))
 		BUG();
-	page->flags &= ~((1<<PG_referenced) | (1<<PG_dirty));
+	ClearPageReferenced(page);
+	ClearPageDirty(page);
 
 	if (current->flags & PF_FREE_PAGES)
 		goto local_freelist;


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] page->flags corruption fix
  2003-10-07 16:26 Rik van Riel
@ 2003-10-08 14:49 ` Hugh Dickins
  2003-10-08 14:57   ` David S. Miller
  0 siblings, 1 reply; 21+ messages in thread
From: Hugh Dickins @ 2003-10-08 14:49 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Marcelo Tosatti, Matt Domsch, linux-kernel, benh

On Tue, 7 Oct 2003, Rik van Riel wrote:

> In the "better safe than sorry" category. Thanks go out to
> Matt Domsch and Robert Hentosh. A similar fix went into the
> 2.6 kernel. Please apply.

Seven atomic ops in a row, isn't that rather inefficient?
The 2.6 version clears those PG_flags all together in one
non-atomic op - but elsewhere, in prep_new_page.

Is there an actual test case for why 2.4 now needs this change?

If there's something trying to lock random pages, of course it would
be needed; or should that something be taking a reference instead?

Hugh


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] page->flags corruption fix
  2003-10-08 14:49 ` Hugh Dickins
@ 2003-10-08 14:57   ` David S. Miller
  2003-10-08 15:10     ` Rik van Riel
  0 siblings, 1 reply; 21+ messages in thread
From: David S. Miller @ 2003-10-08 14:57 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: riel, marcelo.tosatti, Matt_Domsch, linux-kernel, benh

On Wed, 8 Oct 2003 15:49:34 +0100 (BST)
Hugh Dickins <hugh@veritas.com> wrote:

> Seven atomic ops in a row, isn't that rather inefficient?
> The 2.6 version clears those PG_flags all together in one
> non-atomic op - but elsewhere, in prep_new_page.
> 
> Is there an actual test case for why 2.4 now needs this change?

It's not a new bug; we've always had this bug in 2.4.x.

There is so much code doing atomic bit ops asynchronously on
page->flags that it is not safe for any context to make
non-atomic changes to the values there.

The patch Rik posted is the safest solution to this bug.  If you
want to make it faster by creating a "multiple bit" set of bitops,
go right ahead, but be nice enough to make sure every single architecture
has an implementation before suggesting that the change gets merged
into 2.4.x.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] page->flags corruption fix
  2003-10-08 14:57   ` David S. Miller
@ 2003-10-08 15:10     ` Rik van Riel
  2003-10-08 15:47       ` Hugh Dickins
  0 siblings, 1 reply; 21+ messages in thread
From: Rik van Riel @ 2003-10-08 15:10 UTC (permalink / raw)
  To: David S. Miller
  Cc: Hugh Dickins, marcelo.tosatti, Matt_Domsch, linux-kernel, benh

On Wed, 8 Oct 2003, David S. Miller wrote:
> On Wed, 8 Oct 2003 15:49:34 +0100 (BST)
> Hugh Dickins <hugh@veritas.com> wrote:
> 
> > Seven atomic ops in a row, isn't that rather inefficient?

Absolutely.

> > Is there an actual test case for why 2.4 now needs this change?
> 
> It's not a new bug, we've always had this bug in 2.4.x

Though I suspect it's gotten worse since 2.4.14 or so, when we
moved the final lru_cache_del() into __free_pages_ok(), combined with
the fact that anonymous pages are now on the lru lists.

It's quite possible that one CPU adds the page to the swap
cache, while another CPU moves the page around on the inactive
list. At that point both CPUs could be fiddling around with
the page->flags simultaneously.

In fact, this has been observed in heavy stress testing by
Matt Domsch and Robert Hentosh...

-- 
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] page->flags corruption fix
@ 2003-10-08 15:31 Matt_Domsch
  2003-10-08 15:53 ` Hugh Dickins
  0 siblings, 1 reply; 21+ messages in thread
From: Matt_Domsch @ 2003-10-08 15:31 UTC (permalink / raw)
  To: hugh; +Cc: riel, marcelo.tosatti, linux-kernel, benh

On Wed, 2003-10-08 at 09:49, Hugh Dickins wrote:
> On Tue, 7 Oct 2003, Rik van Riel wrote:
> 
> > In the "better safe than sorry" category. Thanks go out to
> > Matt Domsch and Robert Hentosh. A similar fix went into the
> > 2.6 kernel. Please apply.
> 
> Seven atomic ops in a row, isn't that rather inefficient?

Not all arches have atomic_set_mask() and atomic_clear_mask().
Only these do:
asm-arm
asm-arm26
asm-h8300
asm-i386
asm-m68k
asm-m68knommu
asm-ppc
asm-s390
asm-sh
asm-v850
asm-x86_64


> The 2.6 version clears those PG_flags all together in one
> non-atomic op - but elsewhere, in prep_new_page.
>
> Is there an actual test case for why 2.4 now needs this change?

There definitely is when RMAP is present - we've reproduced it
repeatedly in our labs.

We've seen a similar failure with the RHEL2.1 kernel w/o RMAP patches
too.  So we fully believe it's possible in stock 2.4.x.

Thanks,
Matt

-- 
Matt Domsch
Sr. Software Engineer, Lead Engineer
Dell Linux Solutions www.dell.com/linux
Linux on Dell mailing lists @ http://lists.us.dell.com


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] page->flags corruption fix
  2003-10-08 15:10     ` Rik van Riel
@ 2003-10-08 15:47       ` Hugh Dickins
  2003-10-08 15:52         ` Rik van Riel
  0 siblings, 1 reply; 21+ messages in thread
From: Hugh Dickins @ 2003-10-08 15:47 UTC (permalink / raw)
  To: Rik van Riel
  Cc: David S. Miller, marcelo.tosatti, Matt_Domsch, linux-kernel, benh

On Wed, 8 Oct 2003, Rik van Riel wrote:
> 
> Though I suspect it's gotten worse since 2.4.14 or so, where
> we moved the final lru_cache_del() into __free_pages_ok() and
> the fact that anonymous pages are on the lru lists.

I agree both of those make it all more fragile;
but I still don't quite see where it's broken.

The existing shortcuts are normally dealing with either a freed
page or a just allocated, not yet published, page.  And you're
right that the anon->swap case is an exception, less obvious.

> It's quite possible that one CPU adds the page to the swap
> cache, while another CPU moves the page around on the inactive
> list. At that point both CPUs could be fiddling around with
> the page->flags simultaneously.

Don't both of those TryLockPage (in one case holding page_table_lock,
in the other case holding pagemap_lru_lock, to hold the page while
doing so)?

> In fact, this has been observed in heavy stress testing by
> Matt Domsch and Robert Hentosh...

Perhaps Matt could add description of what they observed, in support
of the patch.  I'm not yet convinced that it's necessarily the fix.

Hugh


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] page->flags corruption fix
  2003-10-08 15:47       ` Hugh Dickins
@ 2003-10-08 15:52         ` Rik van Riel
  0 siblings, 0 replies; 21+ messages in thread
From: Rik van Riel @ 2003-10-08 15:52 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: David S. Miller, marcelo.tosatti, Matt_Domsch, linux-kernel, benh

On Wed, 8 Oct 2003, Hugh Dickins wrote:

> > It's quite possible that one CPU adds the page to the swap
> > cache, while another CPU moves the page around on the inactive
> > list. At that point both CPUs could be fiddling around with
> > the page->flags simultaneously.
> 
> Don't both of those TryLockPage (in one case holding page_table_lock,
> in the other case holding pagemap_lru_lock, to hold the page while
> doing so)?

Nope, I'm afraid they don't. Think of e.g. mark_page_accessed
moving the page to the active list ...

> > In fact, this has been observed in heavy stress testing by
> > Matt Domsch and Robert Hentosh...
> 
> Perhaps Matt could add description of what they observed, in support
> of the patch.  I'm not yet convinced that it's necessarily the fix.

I'm not convinced all of the operations need to be atomic
(especially the ones in __free_pages_ok()), but since I've
seen the bug happen I'd rather play safe...

-- 
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] page->flags corruption fix
  2003-10-08 15:31 [PATCH] page->flags corruption fix Matt_Domsch
@ 2003-10-08 15:53 ` Hugh Dickins
  2003-10-08 15:59   ` Rik van Riel
  0 siblings, 1 reply; 21+ messages in thread
From: Hugh Dickins @ 2003-10-08 15:53 UTC (permalink / raw)
  To: Matt_Domsch; +Cc: riel, marcelo.tosatti, linux-kernel, benh

On Wed, 8 Oct 2003 Matt_Domsch@Dell.com wrote:
> 
> There definitely is when RMAP is present - we've reproduced it
> repeatedly in our labs.

That would be an argument for changing the rmap patch.

> We've seen a similar failure with the RHEL2.1 kernel w/o RMAP patches
> too.  So we fully believe it's possible in stock 2.4.x.

A similar failure - but what exactly?
And what is the actual race which would account for it?

I don't mind you and Rik fixing bugs!
I'd just like to understand the bug before it's fixed.

Hugh


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] page->flags corruption fix
  2003-10-08 15:53 ` Hugh Dickins
@ 2003-10-08 15:59   ` Rik van Riel
  2003-10-08 17:15     ` Hugh Dickins
  2003-10-11 13:48     ` Andrea Arcangeli
  0 siblings, 2 replies; 21+ messages in thread
From: Rik van Riel @ 2003-10-08 15:59 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Matt_Domsch, marcelo.tosatti, linux-kernel, benh

On Wed, 8 Oct 2003, Hugh Dickins wrote:
> On Wed, 8 Oct 2003 Matt_Domsch@Dell.com wrote:

> > We've seen a similar failure with the RHEL2.1 kernel w/o RMAP patches
> > too.  So we fully believe it's possible in stock 2.4.x.
> 
> A similar failure - but what exactly?
> And what is the actual race which would account for it?
> 
> I don't mind you and Rik fixing bugs!
> I'd just like to understand the bug before it's fixed.

1) cpu A adds page P to the swap cache, loading page->flags
   and modifying it locally

2) a second thread scans a page table entry and sees that
   the page was accessed, so cpu B moves page P to the
   active list

3) cpu A undoes the PG_inactive -> PG_active bit change,
   corrupting the page->flags of P

The -rmap VM doesn't do anything to this bug, except making
it easy to trigger due to some side effects.

-- 
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] page->flags corruption fix
  2003-10-08 15:59   ` Rik van Riel
@ 2003-10-08 17:15     ` Hugh Dickins
  2003-10-08 17:41       ` Marcelo Tosatti
  2003-10-11 13:48     ` Andrea Arcangeli
  1 sibling, 1 reply; 21+ messages in thread
From: Hugh Dickins @ 2003-10-08 17:15 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Matt_Domsch, marcelo.tosatti, linux-kernel, benh

On Wed, 8 Oct 2003, Rik van Riel wrote:
> 
> 1) cpu A adds page P to the swap cache, loading page->flags
>    and modifying it locally

Right, the add_to_swap_cache in try_to_swap_out.

> 2) a second thread scans a page table entry and sees that
>    the page was accessed, so cpu B moves page P to the
>    active list

Right, the mark_page_accessed in try_to_swap_out
(I don't see any other mark_page_accessed as problematic).
Or the del_page_from_active_list in refill_inactive.

> 3) cpu A undoes the PG_inactive -> PG_active bit change,
>    corrupting the page->flags of P

(Mainline is easier than -rmap since no separate PG_inactive bit.)

Thanks a lot for explaining, I see it now.  Personally I'd prefer a
lighter weight patch, either pagemap_lru_lock within add_to_swap_cache,
or better moving the PG_flags clearing from __add_to_page_cache into
__free_pages_ok, where I still believe it can be done non-atomically.

But you've proved your point, and I understand you preferring a more
straightforward and future-proof way.  I'm not writing an alternative
patch, don't let me stand in the way of progress...

A little of the above explanation in the change comment would be nice.

Thanks,
Hugh


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] page->flags corruption fix
  2003-10-08 17:15     ` Hugh Dickins
@ 2003-10-08 17:41       ` Marcelo Tosatti
  2003-10-08 17:52         ` Rik van Riel
  0 siblings, 1 reply; 21+ messages in thread
From: Marcelo Tosatti @ 2003-10-08 17:41 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Rik van Riel, Matt_Domsch, marcelo.tosatti, linux-kernel, benh



On Wed, 8 Oct 2003, Hugh Dickins wrote:

> On Wed, 8 Oct 2003, Rik van Riel wrote:
> > 
> > 1) cpu A adds page P to the swap cache, loading page->flags
> >    and modifying it locally
> 
> Right, the add_to_swap_cache in try_to_swap_out.
> 
> > 2) a second thread scans a page table entry and sees that
> >    the page was accessed, so cpu B moves page P to the
> >    active list
> 
> Right, the mark_page_accessed in try_to_swap_out
> (I don't see any other mark_page_accessed as problematic).
> Or the del_page_from_active_list in refill_inactive.
> 
> > 3) cpu A undoes the PG_inactive -> PG_active bit change,
> >    corrupting the page->flags of P
> 
> (Mainline is easier than -rmap since no separate PG_inactive bit.)
> 
> Thanks a lot for explaining, I see it now.  Personally I'd prefer a
> lighter weight patch, either pagemap_lru_lock within add_to_swap_cache,
> or better moving the PG_flags clearing from __add_to_page_cache into
> __free_pages_ok, where I still believe it can be done non-atomically.
> 
> But you've proved your point, and I understand you preferring a more
> straightforward and future-proof way.  I'm not writing an alternative
> patch, don't let me stand in the way of progress...
> 
> A little of the above explanation in the change comment would be nice.

Maybe comments on top of the code? 

Rik? :)


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] page->flags corruption fix
  2003-10-08 17:41       ` Marcelo Tosatti
@ 2003-10-08 17:52         ` Rik van Riel
  0 siblings, 0 replies; 21+ messages in thread
From: Rik van Riel @ 2003-10-08 17:52 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Hugh Dickins, Matt_Domsch, linux-kernel, benh

On Wed, 8 Oct 2003, Marcelo Tosatti wrote:

> > A little of the above explanation in the change comment would be nice.
> 
> Maybe comments on top of the code? 
> 
> Rik? :)

Here you are:

# This is a BitKeeper generated patch for the following project:
# Project Name: Linux kernel tree
# This patch format is intended for GNU patch command version 2.5 or higher.
# This patch includes the following deltas:
#	           ChangeSet	1.1193  -> 1.1194 
#	        mm/filemap.c	1.89    -> 1.90   
#
# The following is the BitKeeper ChangeSet Log
# --------------------------------------------
# 03/10/08	riel@cessna.boston.redhat.com	1.1194
# comment on why the atomic updates are needed, on request
# --------------------------------------------
#
diff -Nru a/mm/filemap.c b/mm/filemap.c
--- a/mm/filemap.c	Wed Oct  8 13:47:12 2003
+++ b/mm/filemap.c	Wed Oct  8 13:47:12 2003
@@ -654,6 +654,12 @@
 	struct address_space *mapping, unsigned long offset,
 	struct page **hash)
 {
+	/*
+	 * Yes this is inefficient, however it is needed.  The problem
+	 * is that we could be adding a page to the swap cache while
+	 * another CPU is also modifying page->flags, so the updates
+	 * really do need to be atomic.  -- Rik
+	 */
 	ClearPageUptodate(page);
 	ClearPageError(page);
 	ClearPageDirty(page);


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] page->flags corruption fix
  2003-10-08 15:59   ` Rik van Riel
  2003-10-08 17:15     ` Hugh Dickins
@ 2003-10-11 13:48     ` Andrea Arcangeli
  2003-10-11 16:03       ` Andrea Arcangeli
  1 sibling, 1 reply; 21+ messages in thread
From: Andrea Arcangeli @ 2003-10-11 13:48 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Hugh Dickins, Matt_Domsch, marcelo.tosatti, linux-kernel, benh

On Wed, Oct 08, 2003 at 11:59:06AM -0400, Rik van Riel wrote:
> On Wed, 8 Oct 2003, Hugh Dickins wrote:
> > On Wed, 8 Oct 2003 Matt_Domsch@Dell.com wrote:
> 
> > > We've seen a similar failure with the RHEL2.1 kernel w/o RMAP patches
> > > too.  So we fully believe it's possible in stock 2.4.x.
> > 
> > A similar failure - but what exactly?
> > And what is the actual race which would account for it?
> > 
> > I don't mind you and Rik fixing bugs!
> > I'd just like to understand the bug before it's fixed.
> 
> 1) cpu A adds page P to the swap cache, loading page->flags
>    and modifying it locally
> 
> 2) a second thread scans a page table entry and sees that
>    the page was accessed, so cpu B moves page P to the
>    active list
> 
> 3) cpu A undoes the PG_inactive -> PG_active bit change,
>    corrupting the page->flags of P
> 
> The -rmap VM doesn't do anything to this bug, except making
> it easy to trigger due to some side effects.

I believe the more correct fix is to hold the pagemap_lru_lock during
__add_to_page_cache. The race exists for pages with PG_lru set (in
the lru) that are being added to the pagecache/swapcache. Holding both
spinlocks really avoids the race; your patch sounds less obviously safe
(since the race still happens but it's more "controlled"), and a single
spinlock should be more efficient than the flood of atomic bitops
anyway. Comments?  Hugh?

I also can't see why you care about the page->flags in __free_pages_ok.
By that time, if the page is still visible anywhere that's a _real_
huge bug, so that second part really looks wrong and it should be backed
out IMHO. For sure at that time the page can't be in any lru list, at
least in my tree; see what the correct code is there, this is the only
way to handle that case right (the in_interrupt() part is needed only if
you have aio, as in your tree and in mine). So your changes in
free_pages are definitely superfluous in page_alloc.c, and I guess
they're not closing all the bugs in your tree either (you need the below
too, if you allow anon pages to be freed while still in the lru with
asynchronous io).

The first part of your patch, I agree, fixes the race, but I'd prefer
just taking one more spinlock to avoid the race altogether, instead of
trying to control it with a flood of bitops.

static void __free_pages_ok (struct page *page, unsigned int order)
{
	unsigned long index, page_idx, mask, flags;
	free_area_t *area;
	struct page *base;
	zone_t *zone;
#ifdef CONFIG_SMP
	per_cpu_pages_t *per_cpu_pages;
#endif

	/*
	 * Yes, think what happens when other parts of the kernel take 
	 * a reference to a page in order to pin it for io. -ben
	 */
	if (PageLRU(page)) {
		if (unlikely(in_interrupt())) {
			unsigned long flags;

			spin_lock_irqsave(&free_pages_ok_no_irq_lock, flags);

			page->next_hash = free_pages_ok_no_irq_head;
			free_pages_ok_no_irq_head = page;
			page->index = order;

			spin_unlock_irqrestore(&free_pages_ok_no_irq_lock, flags);

			schedule_task(&free_pages_ok_no_irq_task);
			return;
		}
		lru_cache_del(page);
	}
[..]

Andrea - If you prefer relying on open source software, check these links:
	    rsync.kernel.org::pub/scm/linux/kernel/bkcvs/linux-2.[45]/
	    http://www.cobite.com/cvsps/

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] page->flags corruption fix
  2003-10-11 13:48     ` Andrea Arcangeli
@ 2003-10-11 16:03       ` Andrea Arcangeli
  2003-10-12 11:15         ` Hugh Dickins
  0 siblings, 1 reply; 21+ messages in thread
From: Andrea Arcangeli @ 2003-10-11 16:03 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Hugh Dickins, Matt_Domsch, marcelo.tosatti, linux-kernel, benh

On Sat, Oct 11, 2003 at 03:48:31PM +0200, Andrea Arcangeli wrote:
> On Wed, Oct 08, 2003 at 11:59:06AM -0400, Rik van Riel wrote:
> > On Wed, 8 Oct 2003, Hugh Dickins wrote:
> > > On Wed, 8 Oct 2003 Matt_Domsch@Dell.com wrote:
> > 
> > > > We've seen a similar failure with the RHEL2.1 kernel w/o RMAP patches
> > > > too.  So we fully believe it's possible in stock 2.4.x.
> > > 
> > > A similar failure - but what exactly?
> > > And what is the actual race which would account for it?
> > > 
> > > I don't mind you and Rik fixing bugs!
> > > I'd just like to understand the bug before it's fixed.
> > 
> > 1) cpu A adds page P to the swap cache, loading page->flags
> >    and modifying it locally
> > 
> > 2) a second thread scans a page table entry and sees that
> >    the page was accessed, so cpu B moves page P to the
> >    active list
> > 
> > 3) cpu A undoes the PG_inactive -> PG_active bit change,
> >    corrupting the page->flags of P
> > 
> > The -rmap VM doesn't do anything to this bug, except making
> > it easy to trigger due to some side effects.
> 
> I believe the more correct fix is to hold the pagemap_lru_lock during
> __add_to_page_cache. The race exists between pages with PG_lru set (in
> the lru) that are being added to the pagecache/swapcache. Holding both
> spinlocks really avoids the race, your patch sounds less obviously safe
> (since the race still happens but it's more "controlled") and a single
> spinlock should be more efficient than the flood of atomic bitops
> anyways. Comments?  Hugh?

Basically the race could only happen by running __add_to_page_cache on a
page that was already in the lru. So it could happen with anon pages and
shm too, but never with plain pagecache. It's all about the PG_lru and
PG_active bitflags: both of them can only be changed while holding the
pagemap_lru_lock, so taking the pagemap_lru_lock will avoid the race.
Can anybody see any race with this patch applied on top of 2.4.23pre7?

	http://www.us.kernel.org/pub/linux/kernel/people/andrea/patches/v2.4/2.4.23pre7/anon-lru-race-better-fix-1

I believe this is cleaner and faster than the flood of bitops, since
it's 2 atomic instructions only and it only happens in the paths where
the race had a chance to trigger, so all the critical pagecache paths
aren't affected at all anymore by the slowdown.

For the page_alloc.c changes, they must not make a difference so I
backed them out.

Comments welcome, thanks.

side note: this won't break any architectural code.

diff -urNp --exclude CVS --exclude BitKeeper --exclude {arch} --exclude .arch-ids 2.4.23pre7/include/linux/mm.h race/include/linux/mm.h
--- 2.4.23pre7/include/linux/mm.h	2003-10-10 08:08:27.000000000 +0200
+++ race/include/linux/mm.h	2003-10-11 17:32:02.000000000 +0200
@@ -322,11 +322,9 @@ typedef struct page {
 #define TryLockPage(page)	test_and_set_bit(PG_locked, &(page)->flags)
 #define PageChecked(page)	test_bit(PG_checked, &(page)->flags)
 #define SetPageChecked(page)	set_bit(PG_checked, &(page)->flags)
-#define ClearPageChecked(page)	clear_bit(PG_checked, &(page)->flags)
 #define PageLaunder(page)	test_bit(PG_launder, &(page)->flags)
 #define SetPageLaunder(page)	set_bit(PG_launder, &(page)->flags)
 #define ClearPageLaunder(page)	clear_bit(PG_launder, &(page)->flags)
-#define ClearPageArch1(page)	clear_bit(PG_arch_1, &(page)->flags)
 
 /*
  * The zone field is never updated after free_area_init_core()
diff -urNp --exclude CVS --exclude BitKeeper --exclude {arch} --exclude .arch-ids 2.4.23pre7/mm/filemap.c race/mm/filemap.c
--- 2.4.23pre7/mm/filemap.c	2003-10-10 08:08:32.000000000 +0200
+++ race/mm/filemap.c	2003-10-11 17:48:32.000000000 +0200
@@ -656,19 +656,10 @@ static inline void __add_to_page_cache(s
 	struct address_space *mapping, unsigned long offset,
 	struct page **hash)
 {
-	/*
-	 * Yes this is inefficient, however it is needed.  The problem
-	 * is that we could be adding a page to the swap cache while
-	 * another CPU is also modifying page->flags, so the updates
-	 * really do need to be atomic.  -- Rik
-	 */
-	ClearPageUptodate(page);
-	ClearPageError(page);
-	ClearPageDirty(page);
-	ClearPageReferenced(page);
-	ClearPageArch1(page);
-	ClearPageChecked(page);
-	LockPage(page);
+	unsigned long flags;
+
+	flags = page->flags & ~(1 << PG_uptodate | 1 << PG_error | 1 << PG_dirty | 1 << PG_referenced | 1 << PG_arch_1 | 1 << PG_checked);
+	page->flags = flags | (1 << PG_locked);
 	page_cache_get(page);
 	page->index = offset;
 	add_page_to_inode_queue(mapping, page);
@@ -690,6 +681,7 @@ int add_to_page_cache_unique(struct page
 	int err;
 	struct page *alias;
 
+	spin_lock(&pagemap_lru_lock);
 	spin_lock(&pagecache_lock);
 	alias = __find_page_nolock(mapping, offset, *hash);
 
@@ -700,6 +692,7 @@ int add_to_page_cache_unique(struct page
 	}
 
 	spin_unlock(&pagecache_lock);
+	spin_unlock(&pagemap_lru_lock);
 	if (!err)
 		lru_cache_add(page);
 	return err;
diff -urNp --exclude CVS --exclude BitKeeper --exclude {arch} --exclude .arch-ids 2.4.23pre7/mm/page_alloc.c race/mm/page_alloc.c
--- 2.4.23pre7/mm/page_alloc.c	2003-10-10 08:08:32.000000000 +0200
+++ race/mm/page_alloc.c	2003-10-11 17:32:02.000000000 +0200
@@ -109,8 +109,7 @@ static void __free_pages_ok (struct page
 		BUG();
 	if (PageActive(page))
 		BUG();
-	ClearPageReferenced(page);
-	ClearPageDirty(page);
+	page->flags &= ~((1<<PG_referenced) | (1<<PG_dirty));
 
 	if (current->flags & PF_FREE_PAGES)
 		goto local_freelist;

Andrea - If you prefer relying on open source software, check these links:
	    rsync.kernel.org::pub/scm/linux/kernel/bkcvs/linux-2.[45]/
	    http://www.cobite.com/cvsps/

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] page->flags corruption fix
  2003-10-11 16:03       ` Andrea Arcangeli
@ 2003-10-12 11:15         ` Hugh Dickins
  2003-10-12 13:21           ` Andrea Arcangeli
  2003-10-12 14:11           ` Rik van Riel
  0 siblings, 2 replies; 21+ messages in thread
From: Hugh Dickins @ 2003-10-12 11:15 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Rik van Riel, Matt_Domsch, marcelo.tosatti, linux-kernel, benh

On Sat, 11 Oct 2003, Andrea Arcangeli wrote:
> On Sat, Oct 11, 2003 at 03:48:31PM +0200, Andrea Arcangeli wrote:
> > 
> > I believe the more correct fix is to hold the pagemap_lru_lock during
> > __add_to_page_cache. The race exists between pages with PG_lru set (in
> > the lru) that are being added to the pagecache/swapcache. Holding both
> > spinlocks really avoids the race, your patch sounds less obviously safe
> > (since the race still happens but it's more "controlled") and a single
> > spinlock should be more efficient than the flood of atomic bitops
> > anyways. Comments?  Hugh?

I agree.  (I imagine some processors might implement that atomic flood
quite efficiently, but most not.)

> basically the race could only happen by running __add_to_page_cache on a
> page that was in the lru already. So it could happen with anon pages and
> shm too, never with plain pagecache. It's all about the PG_lru and
> PG_active bitflag. Both of them can only be changed inside the
> pagemap_lru_lock. So taking the pagemap_lru_lock will avoid the race.
> Can anybody see any race with this patch applied on top of 2.4.23pre7?
> 
> 	http://www.us.kernel.org/pub/linux/kernel/people/andrea/patches/v2.4/2.4.23pre7/anon-lru-race-better-fix-1
> 
> I believe this is cleaner and faster than the flood of bitops, since
> it's 2 atomic instructions only and it only happens in the paths where
> the race had a chance to trigger, so all the critical pagecache paths
> aren't affected at all anymore by the slowdown.

I believe your patch is correct, and prefer it to the atomic flood.

But I thought it almost coincidental that the racy cases happen to use
add_to_page_cache_unique rather than add_to_page_cache, and deserved
comment: until I noticed _nothing_ now uses add_to_page_cache, so
please just delete it from filemap.c and pagemap.h - it wasn't
exported, so no issue with out-of-tree modules (sorry, looks like
cleanup I should have done when I updated shmem.c in 2.4.22).

And maybe add __lru_cache_add (inline? FASTCALL?) for the
		if (!TestSetPageLRU(page))
			add_page_to_inactive_list(page);
so you don't have to drop pagemap_lru_lock before lru_cache_add.

> For the page_alloc.c changes, they must not make a difference so I
> backed them out.

I agree on that too: I think Rik did it for atomicity throughout,
to make the safety obvious; but in practice it was already safe.

However, I'm still not satisfied by your patch.  The patch
which will satisfy me is to remove all that bit clearing from
__add_to_page_cache, leaving just a LockPage, doing the clearing
in one go in __free_pages_ok.  In most cases, a page cannot be added
to page cache more than once before it's freed, so in those cases
it doesn't matter which stage we clear them.

The exceptions are normal swap (being deleted from swap cache under
pressure then readded later) and tmpfs: but __add_to_page_cache's
clearing does not help those cases; they compensate for it as much as
they benefit.  PG_arch_1 needs more checking than I have time for now
(though I think it'll turn out okay), and -pre8 is probably the
wrong point for a change of this kind.

Hugh


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] page->flags corruption fix
  2003-10-12 11:15         ` Hugh Dickins
@ 2003-10-12 13:21           ` Andrea Arcangeli
  2003-10-12 13:35             ` Andrea Arcangeli
  2003-10-12 14:11           ` Rik van Riel
  1 sibling, 1 reply; 21+ messages in thread
From: Andrea Arcangeli @ 2003-10-12 13:21 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Rik van Riel, Matt_Domsch, marcelo.tosatti, linux-kernel, benh

Hi Hugh,

On Sun, Oct 12, 2003 at 12:15:23PM +0100, Hugh Dickins wrote:
> But I thought it almost coincidental that the racy cases happen to use
> add_to_page_cache_unique rather than add_to_page_cache, and deserved
> comment: until I noticed _nothing_ now uses add_to_page_cache, so

Correct, nothing uses it. I also noticed that in the process, and yes, it's
not exported.

> please just delete it from filemap.c and pagemap.h - it wasn't

Agreed. I didn't touch it because it was not related to this bug; the dead
code likely gets dropped during linking anyway. But I can clean it up, of course.

> exported, so no issue with out-of-tree modules (sorry, looks like
> cleanup I should have done when I updated shmem.c in 2.4.22).
> 
> And maybe add __lru_cache_add (inline? FASTCALL?) for the
> 		if (!TestSetPageLRU(page))
> 			add_page_to_inactive_list(page);
> so you don't have to drop pagemap_lru_lock before lru_cache_add.

I thought about it too; I just didn't think it was a big issue. If we
already took the lock, it'll still be in L1 cache since the unlock and
relock are immediate (and even on CPUs using load-locked/store-conditional
for all atomic ops, a single additional spinlock didn't sound like a
painful thing). But I can optimize this bit, agreed (this is especially
useful in my tree with the vm_anon_lru sysctl set to 0 by default).

> > For the page_alloc.c changes, they must not make a difference so I
> > backed them out.
> 
> I agree on that too: I think Rik did it for atomicity throughout,
> to make the safety obvious; but in practice it was already safe.

Agreed.

> However, I'm still not satisfied by your patch.  The patch
> which will satisfy me is to remove all that bit clearing from
> __add_to_page_cache, leaving just a LockPage, doing the clearing
> in one go in __free_pages_ok.  In most cases, a page cannot be added
> to page cache more than once before it's freed, so in those cases
> it doesn't matter which stage we clear them.

;) Well, this goes well beyond the scope of this race condition; I agree,
but I didn't consider merging those changes into the same patch ;)

> The exceptions are normal swap (being deleted from swap cache under
> pressure then readded later) and tmpfs: but __add_to_page_cache's
> clearing does not help those cases, they compensate for it as much
> as benefit.  PG_arch_1 needs more checking than I've time for now
> (though I think it'll turn out okay), and probably -pre8 is the
> wrong point for a change of this kind.

Yes, I think pre8 isn't the best point, and this change has nothing to
do with the race condition in question anyway.

So I would suggest cooking a patch for this orthogonal optimization
during 2.4.24pre.

So here is a new patch addressing the removal of add_to_page_cache and
the micro-optimization to avoid a spurious unlock/relock cycle; I also
added some commentary.

	http://www.us.kernel.org/pub/linux/kernel/people/andrea/patches/v2.4/2.4.23pre7/anon-lru-race-better-fix-2

I would be very interested if Dell could test this (so far untested ;)
patch, to be sure it really fixes all the troubles they've experienced
in practice.

Thanks,

diff -urNp --exclude CVS --exclude BitKeeper --exclude {arch} --exclude .arch-ids 2.4.23pre7/include/linux/mm.h race/include/linux/mm.h
--- 2.4.23pre7/include/linux/mm.h	2003-10-10 08:08:27.000000000 +0200
+++ race/include/linux/mm.h	2003-10-11 17:32:02.000000000 +0200
@@ -322,11 +322,9 @@ typedef struct page {
 #define TryLockPage(page)	test_and_set_bit(PG_locked, &(page)->flags)
 #define PageChecked(page)	test_bit(PG_checked, &(page)->flags)
 #define SetPageChecked(page)	set_bit(PG_checked, &(page)->flags)
-#define ClearPageChecked(page)	clear_bit(PG_checked, &(page)->flags)
 #define PageLaunder(page)	test_bit(PG_launder, &(page)->flags)
 #define SetPageLaunder(page)	set_bit(PG_launder, &(page)->flags)
 #define ClearPageLaunder(page)	clear_bit(PG_launder, &(page)->flags)
-#define ClearPageArch1(page)	clear_bit(PG_arch_1, &(page)->flags)
 
 /*
  * The zone field is never updated after free_area_init_core()
diff -urNp --exclude CVS --exclude BitKeeper --exclude {arch} --exclude .arch-ids 2.4.23pre7/include/linux/pagemap.h race/include/linux/pagemap.h
--- 2.4.23pre7/include/linux/pagemap.h	2003-10-10 08:08:28.000000000 +0200
+++ race/include/linux/pagemap.h	2003-10-12 14:42:47.000000000 +0200
@@ -85,7 +85,6 @@ extern void FASTCALL(unlock_page(struct 
 	__find_lock_page(mapping, index, page_hash(mapping, index))
 extern struct page *find_trylock_page(struct address_space *, unsigned long);
 
-extern void add_to_page_cache(struct page * page, struct address_space *mapping, unsigned long index);
 extern void add_to_page_cache_locked(struct page * page, struct address_space *mapping, unsigned long index);
 extern int add_to_page_cache_unique(struct page * page, struct address_space *mapping, unsigned long index, struct page **hash);
 
diff -urNp --exclude CVS --exclude BitKeeper --exclude {arch} --exclude .arch-ids 2.4.23pre7/include/linux/swap.h race/include/linux/swap.h
--- 2.4.23pre7/include/linux/swap.h	2003-10-10 08:08:29.000000000 +0200
+++ race/include/linux/swap.h	2003-10-12 15:02:08.000000000 +0200
@@ -103,6 +103,7 @@ struct sysinfo;
 struct zone_t;
 
 /* linux/mm/swap.c */
+extern void FASTCALL(__lru_cache_add(struct page *));
 extern void FASTCALL(lru_cache_add(struct page *));
 extern void FASTCALL(__lru_cache_del(struct page *));
 extern void FASTCALL(lru_cache_del(struct page *));
diff -urNp --exclude CVS --exclude BitKeeper --exclude {arch} --exclude .arch-ids 2.4.23pre7/mm/filemap.c race/mm/filemap.c
--- 2.4.23pre7/mm/filemap.c	2003-10-10 08:08:32.000000000 +0200
+++ race/mm/filemap.c	2003-10-12 15:06:36.000000000 +0200
@@ -656,33 +656,16 @@ static inline void __add_to_page_cache(s
 	struct address_space *mapping, unsigned long offset,
 	struct page **hash)
 {
-	/*
-	 * Yes this is inefficient, however it is needed.  The problem
-	 * is that we could be adding a page to the swap cache while
-	 * another CPU is also modifying page->flags, so the updates
-	 * really do need to be atomic.  -- Rik
-	 */
-	ClearPageUptodate(page);
-	ClearPageError(page);
-	ClearPageDirty(page);
-	ClearPageReferenced(page);
-	ClearPageArch1(page);
-	ClearPageChecked(page);
-	LockPage(page);
+	unsigned long flags;
+
+	flags = page->flags & ~(1 << PG_uptodate | 1 << PG_error | 1 << PG_dirty | 1 << PG_referenced | 1 << PG_arch_1 | 1 << PG_checked);
+	page->flags = flags | (1 << PG_locked);
 	page_cache_get(page);
 	page->index = offset;
 	add_page_to_inode_queue(mapping, page);
 	add_page_to_hash_queue(page, hash);
 }
 
-void add_to_page_cache(struct page * page, struct address_space * mapping, unsigned long offset)
-{
-	spin_lock(&pagecache_lock);
-	__add_to_page_cache(page, mapping, offset, page_hash(mapping, offset));
-	spin_unlock(&pagecache_lock);
-	lru_cache_add(page);
-}
-
 int add_to_page_cache_unique(struct page * page,
 	struct address_space *mapping, unsigned long offset,
 	struct page **hash)
@@ -690,6 +673,23 @@ int add_to_page_cache_unique(struct page
 	int err;
 	struct page *alias;
 
+	/*
+	 * This function is the only pagecache entry point that
+	 * is allowed to deal with pages outside the pagecache/swapcache,
+	 * but that might be already queued in the VM lru lists
+	 * (one example is the anonymous ram).
+	 *
+	 * For this reason here we have to execute the
+	 * __add_to_page_cache under the pagemap_lru_lock too,
+	 * to avoid VM lru operations like activate_page to
+	 * race with the page->flags clearing in __add_to_page_cache
+	 * or whatever else similar race condition. It's just
+	 * much safer, simpler and more performant (more performant for
+	 * the other common pagecache operations), to avoid the race,
+	 * rather than trying to control it to avoid damages when
+	 * it triggers.
+	 */
+	spin_lock(&pagemap_lru_lock);
 	spin_lock(&pagecache_lock);
 	alias = __find_page_nolock(mapping, offset, *hash);
 
@@ -701,7 +701,8 @@ int add_to_page_cache_unique(struct page
 
 	spin_unlock(&pagecache_lock);
 	if (!err)
-		lru_cache_add(page);
+		__lru_cache_add(page);
+	spin_unlock(&pagemap_lru_lock);
 	return err;
 }
 
diff -urNp --exclude CVS --exclude BitKeeper --exclude {arch} --exclude .arch-ids 2.4.23pre7/mm/page_alloc.c race/mm/page_alloc.c
--- 2.4.23pre7/mm/page_alloc.c	2003-10-10 08:08:32.000000000 +0200
+++ race/mm/page_alloc.c	2003-10-11 17:32:02.000000000 +0200
@@ -109,8 +109,7 @@ static void __free_pages_ok (struct page
 		BUG();
 	if (PageActive(page))
 		BUG();
-	ClearPageReferenced(page);
-	ClearPageDirty(page);
+	page->flags &= ~((1<<PG_referenced) | (1<<PG_dirty));
 
 	if (current->flags & PF_FREE_PAGES)
 		goto local_freelist;
diff -urNp --exclude CVS --exclude BitKeeper --exclude {arch} --exclude .arch-ids 2.4.23pre7/mm/swap.c race/mm/swap.c
--- 2.4.23pre7/mm/swap.c	2003-10-10 08:08:33.000000000 +0200
+++ race/mm/swap.c	2003-10-12 15:02:54.000000000 +0200
@@ -54,6 +54,19 @@ void activate_page(struct page * page)
 /**
  * lru_cache_add: add a page to the page lists
  * @page: the page to add
+ *
+ * This function is for when the caller already holds
+ * the pagemap_lru_lock.
+ */
+void __lru_cache_add(struct page * page)
+{
+	if (!PageLRU(page) && !TestSetPageLRU(page))
+		add_page_to_inactive_list(page);
+}
+
+/**
+ * lru_cache_add: add a page to the page lists
+ * @page: the page to add
  */
 void lru_cache_add(struct page * page)
 {

Andrea - If you prefer relying on open source software, check these links:
	    rsync.kernel.org::pub/scm/linux/kernel/bkcvs/linux-2.[45]/
	    http://www.cobite.com/cvsps/


* Re: [PATCH] page->flags corruption fix
  2003-10-12 13:21           ` Andrea Arcangeli
@ 2003-10-12 13:35             ` Andrea Arcangeli
  0 siblings, 0 replies; 21+ messages in thread
From: Andrea Arcangeli @ 2003-10-12 13:35 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Rik van Riel, Matt_Domsch, marcelo.tosatti, linux-kernel, benh

On Sun, Oct 12, 2003 at 03:21:32PM +0200, Andrea Arcangeli wrote:
> +++ race/mm/swap.c	2003-10-12 15:02:54.000000000 +0200
> @@ -54,6 +54,19 @@ void activate_page(struct page * page)
>  /**
>   * lru_cache_add: add a page to the page lists
      ^ While porting the fix to UL I noticed this comment needs __ in
front of it. I'm not spamming the list with another full patch since it
makes no functional difference; this is an incremental:

--- race/mm/swap.c.~1~	2003-10-12 15:02:54.000000000 +0200
+++ race/mm/swap.c	2003-10-12 15:31:58.000000000 +0200
@@ -52,7 +52,7 @@ void activate_page(struct page * page)
 }
 
 /**
- * lru_cache_add: add a page to the page lists
+ * __lru_cache_add: add a page to the page lists
  * @page: the page to add
  *
  * This function is for when the caller already holds

I uploaded the full update here anyways:

	http://www.us.kernel.org/pub/linux/kernel/people/andrea/patches/v2.4/2.4.23pre7/anon-lru-race-better-fix-3

Andrea


* Re: [PATCH] page->flags corruption fix
  2003-10-12 11:15         ` Hugh Dickins
  2003-10-12 13:21           ` Andrea Arcangeli
@ 2003-10-12 14:11           ` Rik van Riel
  2003-10-12 14:36             ` Andrea Arcangeli
  1 sibling, 1 reply; 21+ messages in thread
From: Rik van Riel @ 2003-10-12 14:11 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrea Arcangeli, Matt_Domsch, marcelo.tosatti, linux-kernel,
	benh

On Sun, 12 Oct 2003, Hugh Dickins wrote:

> I agree on that too: I think Rik did it for atomicity throughout,
> to make the safety obvious; but in practice it was already safe.

Also to keep this fix future-proof.  Who knows whether
a super-optimised patch that masks just this particular
race condition will still be safe when XFS is merged?

I really don't like keeping the code fragile and making
it easy to reintroduce bugs.

-- 
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan



* Re: [PATCH] page->flags corruption fix
  2003-10-12 14:11           ` Rik van Riel
@ 2003-10-12 14:36             ` Andrea Arcangeli
  2003-10-12 17:20               ` Rik van Riel
  0 siblings, 1 reply; 21+ messages in thread
From: Andrea Arcangeli @ 2003-10-12 14:36 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Hugh Dickins, Matt_Domsch, marcelo.tosatti, linux-kernel, benh

On Sun, Oct 12, 2003 at 10:11:43AM -0400, Rik van Riel wrote:
> On Sun, 12 Oct 2003, Hugh Dickins wrote:
> 
> > I agree on that too: I think Rik did it for atomicity throughout,
> > to make the safety obvious; but in practice it was already safe.
> 
> Also to keep this fix future proof.  Who knows whether
> a super-optimised patch that masks just this particular
> race condition will still be safe when XFS is merged ?
> 
> I really don't like keeping the code fragile and making
> it easy to reintroduce bugs.

As said, if a free_pages caller is buggy, the corruption can still
trigger no matter what you do there: at that deep point of free_pages an
alloc_page user will be able to get the page and corrupt it anyway, so
the change can't help, and it will never be able to help no matter what
XFS or whatever else gets merged. If it made a difference, alloc_page
users would still break it.

We can argue about the filemap.c part, but I'm sure the page_alloc.c
part can't make any sense ever.

Andrea - If you prefer relying on open source software, check these links:
	    rsync.kernel.org::pub/scm/linux/kernel/bkcvs/linux-2.[45]/
	    http://www.cobite.com/cvsps/


* Re: [PATCH] page->flags corruption fix
  2003-10-12 14:36             ` Andrea Arcangeli
@ 2003-10-12 17:20               ` Rik van Riel
  2003-10-12 20:40                 ` Andrea Arcangeli
  0 siblings, 1 reply; 21+ messages in thread
From: Rik van Riel @ 2003-10-12 17:20 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Hugh Dickins, Matt_Domsch, marcelo.tosatti, linux-kernel, benh

On Sun, 12 Oct 2003, Andrea Arcangeli wrote:

> we can argue about the filemap.c part, but I'm sure the page_alloc.c
> part can't make any sense ever.

Agreed, the __free_pages_ok() change can almost certainly be
undone and made more efficient.

-- 
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan



* Re: [PATCH] page->flags corruption fix
  2003-10-12 17:20               ` Rik van Riel
@ 2003-10-12 20:40                 ` Andrea Arcangeli
  0 siblings, 0 replies; 21+ messages in thread
From: Andrea Arcangeli @ 2003-10-12 20:40 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Hugh Dickins, Matt_Domsch, marcelo.tosatti, linux-kernel, benh

On Sun, Oct 12, 2003 at 01:20:27PM -0400, Rik van Riel wrote:
> Agreed, the __free_pages_ok() change can almost certainly be
> undone and made more efficient.

Ok, great, many thanks for double-checking.

Andrea - If you prefer relying on open source software, check these links:
	    rsync.kernel.org::pub/scm/linux/kernel/bkcvs/linux-2.[45]/
	    http://www.cobite.com/cvsps/


end of thread [~2003-10-12 20:39 UTC | newest]

Thread overview: 21+ messages
-- links below jump to the message on this page --
2003-10-08 15:31 [PATCH] page->flags corruption fix Matt_Domsch
2003-10-08 15:53 ` Hugh Dickins
2003-10-08 15:59   ` Rik van Riel
2003-10-08 17:15     ` Hugh Dickins
2003-10-08 17:41       ` Marcelo Tosatti
2003-10-08 17:52         ` Rik van Riel
2003-10-11 13:48     ` Andrea Arcangeli
2003-10-11 16:03       ` Andrea Arcangeli
2003-10-12 11:15         ` Hugh Dickins
2003-10-12 13:21           ` Andrea Arcangeli
2003-10-12 13:35             ` Andrea Arcangeli
2003-10-12 14:11           ` Rik van Riel
2003-10-12 14:36             ` Andrea Arcangeli
2003-10-12 17:20               ` Rik van Riel
2003-10-12 20:40                 ` Andrea Arcangeli
  -- strict thread matches above, loose matches on Subject: below --
2003-10-07 16:26 Rik van Riel
2003-10-08 14:49 ` Hugh Dickins
2003-10-08 14:57   ` David S. Miller
2003-10-08 15:10     ` Rik van Riel
2003-10-08 15:47       ` Hugh Dickins
2003-10-08 15:52         ` Rik van Riel
