linux-fsdevel.vger.kernel.org archive mirror
From: Nick Piggin <npiggin@suse.de>
To: Chris Mason <chris.mason@oracle.com>
Cc: Linux Memory Management List <linux-mm@kvack.org>,
	linux-fsdevel@vger.kernel.org
Subject: Re: [patch][rfc] mm: new address space calls
Date: Sat, 28 Feb 2009 06:52:21 +0100
Message-ID: <20090228055221.GB28496@wotan.suse.de>
In-Reply-To: <1235742767.10511.7.camel@think.oraclecorp.com>

On Fri, Feb 27, 2009 at 08:52:47AM -0500, Chris Mason wrote:
> On Fri, 2009-02-27 at 12:26 +0100, Nick Piggin wrote:
> > Well I don't see how that limits us? Either we prefer to keep the
> > metadata, or we throw it away and it is inevitable that we lose
> > information. 
> > 
> 
> We can't have metadata that isn't freed by releasepage unless we want to
> pin the page completely.  There was a time when the btrfs metadata had a
> bit for 'this block needs defrag', and I ended up not being able to use
> it because releasepage was consistently freeing my extra data while the
> page was still around.

Hmm, it sounds like that data perhaps is more a property of the
filesystem / block management rather than the pagecache (OK, it's
a blurry line)...

But I mean 'this block needs defrag' sounds like important metadata
even if the page is *not* still around? (but the block is)

Having your own private metadata, perhaps with the ->shrinker callback
is an option. In fsblock actually for the block mapping cache tree,
I don't use a shrinker, because (I'm lazy and) reclaim will eventually
reclaim the inode in which case the tree will be taken down with the
new aop->release callback.

But in theory even when the in-memory inode goes away, the block mapping
is still valid metadata, so you could keep it around somewhere (in which
case it would need a shrinker callback).
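
To make the "private metadata plus shrinker" option concrete, here is a
rough userspace sketch, not kernel code. Every name in it (blk_meta,
meta_mark_defrag, meta_lookup, meta_shrink) is hypothetical. It assumes the
metadata is keyed by block number and kept outside the page, so that
releasing the page does not touch it; only the shrinker-style call frees
entries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical per-block metadata, owned by the filesystem, not the page. */
struct blk_meta {
	unsigned long blocknr;
	bool needs_defrag;
	struct blk_meta *next;
};

static struct blk_meta *meta_list;	/* most recently added entry first */
static unsigned long meta_count;

/* Return the entry for blocknr, or NULL if none is cached. */
static struct blk_meta *meta_lookup(unsigned long blocknr)
{
	for (struct blk_meta *m = meta_list; m; m = m->next)
		if (m->blocknr == blocknr)
			return m;
	return NULL;
}

/* Record 'this block needs defrag'. Returns 0, or -1 if allocation fails. */
static int meta_mark_defrag(unsigned long blocknr)
{
	struct blk_meta *m = meta_lookup(blocknr);

	if (!m) {
		m = malloc(sizeof(*m));
		if (!m)
			return -1;
		m->blocknr = blocknr;
		m->next = meta_list;
		meta_list = m;
		meta_count++;
	}
	m->needs_defrag = true;
	return 0;
}

/*
 * Shrinker-style callback: free up to nr_to_scan entries and report how
 * many were freed. This sketch simply evicts from the head of the list;
 * a real policy would more likely evict by age or usefulness.
 */
static unsigned long meta_shrink(unsigned long nr_to_scan)
{
	unsigned long freed = 0;

	while (meta_list && freed < nr_to_scan) {
		struct blk_meta *m = meta_list;

		assert(meta_count > 0);
		meta_list = m->next;
		free(m);
		meta_count--;
		freed++;
	}
	return freed;
}
```

The point of the split is that the defrag bit lives or dies with memory
pressure on this cache, not with the lifetime of any one page.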


> > > I'd like a form of releasepage that knows if the vm is going to really
> > > get rid of the page.  Or another callback that happens when the VM is
> > > sure the page will be freed so we can drop extra metadata that doesn't
> > > pin the page, but we always want to stay with the page.
> > 
> > Well, for page reclaim/invalidate/truncate, we have releasepage that you
> > can use even if the metadata is stored outside the page, just set PagePrivate
> > and it will still get called when the page is about to be freed.
> > 
> 
> For clean pages, shrink_page_list seems to check the page count after
> the releasepage call.  It was a big enough window for me to see it in
> practice under normal workloads.

Oh yes, you would see it, but it just shouldn't be *too* common I think.
It's a hard race to close. You would need to effectively take a spinlock
to prevent pagecache lookup over the releasepage call (OK, with lockless
pagecache it is no longer really tree_lock, but setting page->_count to
0, which causes lookup to basically do equivalent spinning anyway).
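
As an illustration of the "set the count to 0" idea, here is a small
userspace sketch using C11 atomics. It is not the kernel implementation,
and the names (fake_page, freeze_refs, unfreeze_refs, get_ref_unless_zero)
are made up for the example. The assumption is that the reclaim side may
only zero the count if it sees exactly the references it expects, and that
a lookup refuses to take a reference while the count is zero.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for a page; only the reference count matters here. */
struct fake_page {
	atomic_int count;
};

/*
 * Reclaim side: atomically swap the count from 'expected' to 0.
 * Fails if anyone else holds a reference we did not account for.
 */
static bool freeze_refs(struct fake_page *p, int expected)
{
	int old = expected;

	return atomic_compare_exchange_strong(&p->count, &old, 0);
}

/* Reclaim side: give up and restore the count of a frozen page. */
static void unfreeze_refs(struct fake_page *p, int count)
{
	assert(atomic_load(&p->count) == 0);
	atomic_store(&p->count, count);
}

/*
 * Lookup side: take a reference unless the count is zero. A caller that
 * gets 'false' would retry the lookup, which is the "equivalent
 * spinning" while the page is frozen.
 */
static bool get_ref_unless_zero(struct fake_page *p)
{
	int old = atomic_load(&p->count);

	while (old != 0) {
		/* On failure 'old' is reloaded with the current count. */
		if (atomic_compare_exchange_weak(&p->count, &old, old + 1))
			return true;
	}
	return false;
}
```

In this model, closing the race means holding the page frozen across the
whole releasepage call, so every concurrent lookup spins for that long.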

Of course it still may be closed with a new callback at pagecache
removal time... but I'm not convinced you need one yet ;) Maybe I don't
understand the requirements properly yet.

 
> > There are *some* races that can result in the page subsequently not being
> > freed, but I don't think that should be a big deal. I don't want to add
> > a callback in the pagecache remove path if possible, but we can try to
> > rework or improve things if btrfs needs something specific..
> 
> Btrfs doesn't need it today, but it should help once I finally get
> subpage blocks going again (and metadata defrag as well).


Thread overview: 11+ messages
2009-02-25 10:48 [patch][rfc] mm: new address space calls Nick Piggin
2009-02-25 20:59 ` Chris Mason
2009-02-26  5:17   ` Nick Piggin
2009-02-26 13:21     ` Chris Mason
2009-02-27 11:26       ` Nick Piggin
2009-02-27 13:52         ` Chris Mason
2009-02-28  5:52           ` Nick Piggin [this message]
2009-02-28 23:19   ` Christoph Hellwig
2009-03-01  2:38     ` Nick Piggin
2009-02-28 23:24 ` Christoph Hellwig
2009-03-01  2:45   ` Nick Piggin
