From: Chris Mason <chris.mason@oracle.com>
To: Nick Piggin <npiggin@suse.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>,
Trond Myklebust <trond.myklebust@fys.uio.no>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Linux Memory Management List <linux-mm@kvack.org>,
linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH RFC] extent mapped page cache
Date: Wed, 25 Jul 2007 22:10:07 -0400
Message-ID: <20070725221007.0edcc2dc@think.oraclecorp.com>
In-Reply-To: <20070726013728.GB20727@wotan.suse.de>
On Thu, 26 Jul 2007 03:37:28 +0200
Nick Piggin <npiggin@suse.de> wrote:
>
> > One advantage to the state tree is that it separates the state from
> > the memory being described, allowing a simple kmap style interface
> > that covers subpages, highmem and superpages.
>
> I suppose so, although we should have added those interfaces long
> ago ;) The variants in fsblock are pretty good, and you could always
> do an arbitrary extent (rather than block) based API using the
> pagecache tree if it would be helpful.
Yes, you could use fsblock for the state bits and make a separate API
to map the actual pages.
>
>
> > It also more naturally matches the way we want to do IO, making for
> > easy clustering.
>
> Well the pagecache tree is used to reasonable effect for that now.
> OK the code isn't beautiful ;). Granted, this might be an area where
> the separate state tree ends up being better. We'll see.
>
One thing it gains us is finding the start of the cluster. Even if
called by kswapd, the state tree allows writepage to find the start of
the cluster and send down a big bio (provided I implement trylock to
avoid various deadlocks).
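
To illustrate the cluster lookup, here is a rough userspace sketch. It
is not code from the patch: the sorted array stands in for the state
tree, and struct state_range, STATE_DIRTY and find_dirty_cluster() are
hypothetical names made up for this example. Given one dirty offset
(say, the page kswapd handed to writepage), it walks to the neighboring
records to find the whole contiguous dirty range that could go down as
one bio.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STATE_DIRTY 0x1u	/* hypothetical state bit */

/* One record of the stand-in state tree: bytes [start, end], inclusive. */
struct state_range {
	uint64_t start;
	uint64_t end;
	unsigned bits;
};

/* Binary search a sorted, non-overlapping array; returns index or -1. */
static long find_range(const struct state_range *r, size_t n, uint64_t offset)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (offset < r[mid].start)
			hi = mid;
		else if (offset > r[mid].end)
			lo = mid + 1;
		else
			return (long)mid;
	}
	return -1;
}

/*
 * Find the contiguous dirty cluster containing offset.
 * Returns 0 and fills *start and *end, or -1 if offset is not dirty.
 */
static int find_dirty_cluster(const struct state_range *r, size_t n,
			      uint64_t offset, uint64_t *start, uint64_t *end)
{
	long i = find_range(r, n, offset);
	size_t first, last;

	assert(start && end);
	if (i < 0 || !(r[i].bits & STATE_DIRTY))
		return -1;

	first = last = (size_t)i;
	/* Walk backwards while the previous record is dirty and adjacent. */
	while (first > 0 && (r[first - 1].bits & STATE_DIRTY) &&
	       r[first - 1].end + 1 == r[first].start)
		first--;
	/* Walk forwards the same way to find the end of the cluster. */
	while (last + 1 < n && (r[last + 1].bits & STATE_DIRTY) &&
	       r[last].end + 1 == r[last + 1].start)
		last++;

	*start = r[first].start;
	*end = r[last].end;
	return 0;
}
```

A real tree would use prev/next on the tree nodes instead of array
indexes, and would need the trylock mentioned above before touching the
neighbors.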
>
> > O_DIRECT becomes a special case of readpages and writepages....the
> > memory used for IO just comes from userland instead of the page
> > cache.
>
> Could be, although you'll probably also need to teach the mm about
> the state tree and/or still manipulate the pagecache tree to prevent
> concurrency?
Well, it isn't coded yet, but I should be able to do it from the FS
specific ops.
>
> But isn't the main aim of O_DIRECT to do as little locking and
> synchronisation with the pagecache as possible? I thought this is
> why your race fixing patches got put on the back burner (although
> they did look fairly nice from a correctness POV).
I put the placeholder patches on hold because of a corner case where
userland does O_DIRECT from an mmap'd region of the same file (Linus
pointed it out to me). Basically my patches had to work in 64k chunks
to avoid a deadlock in get_user_pages. With the state tree, I can
allow the page to be faulted in but still properly deal with it.
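
For reference, working in 64k chunks just means carving the request up
before each step. The userspace sketch below shows only that
arithmetic, not the placeholder patches themselves: DIO_CHUNK,
for_each_chunk() and count_chunk() are hypothetical names, and
splitting on 64k-aligned boundaries is an assumption of this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DIO_CHUNK (64u * 1024u)	/* chunk size from the text above */

/* Hypothetical per-chunk callback; nonzero return stops the walk. */
typedef int (*chunk_fn)(uint64_t pos, size_t len, void *arg);

/*
 * Call fn on each piece of [pos, pos + len), never crossing a
 * 64k-aligned boundary (the alignment is an assumption).
 */
static int for_each_chunk(uint64_t pos, size_t len, chunk_fn fn, void *arg)
{
	assert(fn != NULL);

	while (len > 0) {
		/* Bytes left until the next 64k boundary. */
		size_t this_len = DIO_CHUNK - (size_t)(pos % DIO_CHUNK);
		int ret;

		if (this_len > len)
			this_len = len;

		ret = fn(pos, this_len, arg);
		if (ret)
			return ret;

		pos += this_len;
		len -= this_len;
	}
	return 0;
}

/* Example callback: count how many chunks were visited. */
static int count_chunk(uint64_t pos, size_t len, void *arg)
{
	(void)pos;
	(void)len;
	(*(size_t *)arg)++;
	return 0;
}
```

In the real thing the callback would be the lock, get_user_pages and
submit step for one chunk, which is exactly the per-chunk overhead the
state tree should let me avoid.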
>
> Well I'm kind of handwaving when it comes to O_DIRECT ;) It does look
> like this might be another advantage of the state tree (although you
> aren't allowed to slow down buffered IO to achieve the locking ;)).
;) The O_DIRECT benefit is a fringe thing. I've long wanted to help
clean up that code, but the real point of the patch is to make general
usage faster and less complex. If I can't get there, the O_DIRECT
stuff doesn't matter.
>
>
> > The ability to put in additional tracking info like the process that
> > first dirtied a range is also significant. So, I think it is worth
> > trying.
>
> Definitely, and I'm glad you are. You haven't converted me yet, but
> I look forward to finding the best ideas from our two approaches when
> the patches are further along (ext2 port of fsblock coming along, so
> we'll be able to have races soon :P).
I'm sure we can find some river in Cambridge, winner gets to throw
Axboe in.
-chris