linux-fsdevel.vger.kernel.org archive mirror
From: Nick Piggin <npiggin@suse.de>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-fsdevel@vger.kernel.org, david@fromorbit.com,
	chris.mason@oracle.com
Subject: Re: [patch 7/9] mm: write_cache_pages terminate quickly
Date: Fri, 31 Oct 2008 08:29:01 +0100
Message-ID: <20081031072901.GE19268@wotan.suse.de>
In-Reply-To: <20081030160746.7800ba3e.akpm@linux-foundation.org>

On Thu, Oct 30, 2008 at 04:07:46PM -0700, Andrew Morton wrote:
> On Wed, 29 Oct 2008 01:47:22 +1100
> npiggin@suse.de wrote:
> 
> > Terminate the write_cache_pages loop upon encountering the first page past
> > end, without locking the page. Pages cannot have their index change when we
> > have a reference on them (truncate, eg truncate_inode_pages_range performs
> > the same check without the page lock).
> > 
> 
> Traditionally lock_page() is used to stabilise ->index and ->mapping. 

Well, mapping. The index is of course irrelevant without the mapping,
*except* as a "where did we get to" kind of thing. But it has been used
in that way for a long time.


> Here you introduce a new and very subtle sort-of-locking rule without
> actually really introducing it at all.  OK, there's a little comment
> buried way down in this function.  But there's a contradictory comment
> over truncate_inode_pages_range() ("When looking at...").

That comment is actually wrong. The index won't change. If the index could
change randomly, then we could skip pages here whenever it jumps forwards.
The pagevec pagecache tag lookup functions would actually be broken in
general.
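
To make the early-termination argument concrete, here is a minimal userspace
sketch of the loop shape the patch aims for. It is a model only, not the
kernel code: struct model_page, model_write_pages() and the lock_count field
are hypothetical names invented for illustration, and the sketch simply
assumes that the index of a referenced page does not change.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct page; not the kernel type. */
struct model_page {
	unsigned long index;	/* assumed stable while we hold a reference */
	int dirty;
	int lock_count;		/* how often the model "locked" this page */
};

/*
 * Walk pages in ascending index order, as a tagged lookup would return
 * them. Returns the number of pages written and stores the resume point
 * in *done_index.
 */
static size_t model_write_pages(struct model_page *pages, size_t nr,
				unsigned long end, unsigned long *done_index)
{
	size_t written = 0;
	size_t i;

	for (i = 0; i < nr; i++) {
		struct model_page *page = &pages[i];

		/* Checked before locking: relies on index being stable. */
		if (page->index > end)
			break;

		*done_index = page->index + 1;

		page->lock_count++;		/* models lock_page() */
		if (page->dirty) {
			page->dirty = 0;	/* models ->writepage() */
			written++;
		}
		/* models unlock_page() */
	}
	return written;
}
```

In this model the first page past end is never locked, which is the whole
point of the change. If the index could move under us, the check before the
lock would be meaningless and the loop could stop early or skip pages.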

 
> How do we make this new locking rule maintainable?  How do we avoid
> breaking it in the future?  How do we prevent accidental breakage from
> slipping past developers' and reviewers' attention?

It's actually fairly fundamental. Even more fundamental than the
functions I cited above.

If we have any place that does:

	lock_page(page);
	if (!page->mapping) /* truncate got to it */

but does not check the index of the page (which most don't), then the page
could have moved from where we first found it (which would not always be a
bug, but often could be).
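
As a rough illustration of what callers would need if the index were *not*
stable, here is a userspace sketch of a lock-then-revalidate helper. All
names (model_mapping, model_page, model_lock_and_check, model_unlock) are
hypothetical and the "lock" is just a flag, so this shows only the shape of
the checks, not real kernel locking.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for address_space and struct page. */
struct model_mapping {
	int id;
};

struct model_page {
	int locked;			/* models the page lock bit */
	struct model_mapping *mapping;	/* NULL once truncated */
	unsigned long index;
};

static void model_unlock(struct model_page *page)
{
	page->locked = 0;
}

/*
 * Lock the page and recheck that it is still the page we looked up.
 * Returns 1 with the page locked, or 0 with the page unlocked.
 */
static int model_lock_and_check(struct model_page *page,
				struct model_mapping *mapping,
				unsigned long index)
{
	page->locked = 1;		/* models lock_page() */

	if (page->mapping != mapping) {	/* truncate got to it */
		model_unlock(page);
		return 0;
	}

	/*
	 * The extra check most callers omit. Under the rule that a
	 * referenced page never changes index, it can never fail.
	 */
	if (page->index != index) {
		model_unlock(page);
		return 0;
	}
	return 1;
}
```

Most existing callers do only the first check, which is safe only as long as
the second one is guaranteed to be redundant.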

The read(2) syscall also doesn't lock the page by default. Having the
page move somewhere else would be a disaster for it.

I guess it's not explicitly documented AFAICS, but I thought it was a
hard rule. Is there anywhere useful we can write it that people will
actually read?

OTOH, there aren't a lot of places that could be doing this. Some wild
filesystem might think it owns the pagecache, I guess. I know that when
it came up in splice, I told Jens we can't move a page with references
on it even if it is locked...

 
> Given the additional maintenance burdens, is this change worth doing
> at all?
> 
> 
> > ---
> > Index: linux-2.6/mm/page-writeback.c
> > ===================================================================
> > --- linux-2.6.orig/mm/page-writeback.c
> > +++ linux-2.6/mm/page-writeback.c
> > @@ -911,15 +911,24 @@ retry:
> >  		for (i = 0; i < nr_pages; i++) {
> >  			struct page *page = pvec.pages[i];
> >  
> > -			done_index = page->index + 1;
> > -
> >  			/*
> > -			 * At this point we hold neither mapping->tree_lock nor
> > -			 * lock on the page itself: the page may be truncated or
> > -			 * invalidated (changing page->mapping to NULL), or even
> > -			 * swizzled back from swapper_space to tmpfs file
> > -			 * mapping
> > +			 * At this point, the page may be truncated or
> > +			 * invalidated (changing page->mapping to NULL), or
> > +			 * even swizzled back from swapper_space to tmpfs file
> > +			 * mapping. However, page->index will not change
> > +			 * because we have a reference on the page.
> >  			 */
> > +			if (page->index > end) {
> > +				/*
> > +				 * can't be range_cyclic (1st pass) because
> > +				 * end == -1 in that case.
> > +				 */
> > +				done = 1;
> > +				break;
> > +			}
> > +
> > +			done_index = page->index + 1;
> > +
> >  			lock_page(page);
> >  
> >  			/*
> > @@ -936,15 +945,6 @@ continue_unlock:
> >  				continue;
> >  			}
> >  
> > -			if (page->index > end) {
> > -				/*
> > -				 * can't be range_cyclic (1st pass) because
> > -				 * end == -1 in that case.
> > -				 */
> > -				done = 1;
> > -				goto continue_unlock;
> > -			}
> > -
> >  			if (!PageDirty(page)) {
> >  				/* someone wrote it for us */
> >  				goto continue_unlock;
> > 
> > -- 
