Re: [RFC PATCH 1/3] Introduce a place holder page for the pagecache.

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Chris Mason <chris.mason@oracle.com>
To: Suparna Bhattacharya <suparna@in.ibm.com>
Cc: linux-fsdevel@vger.kernel.org, akpm@osdl.org, zach.brown@oracle.com
Subject: Re: [RFC PATCH 1/3] Introduce a place holder page for the pagecache.
Date: Wed, 1 Nov 2006 07:41:53 -0500	[thread overview]
Message-ID: <20061101124153.GF5620@think.oraclecorp.com> (raw)
In-Reply-To: <20061101102350.GA25882@in.ibm.com>

On Wed, Nov 01, 2006 at 03:53:50PM +0530, Suparna Bhattacharya wrote:
> 
> Really appreciate your biting the bullet on this one. I have a few
> minor comments/questions as I'm trying to compare this to what we
> have discussed in the past and the last attempt approach outlined in
> http://www.kernel.org/pub/linux/kernel/people/suparna/DIO-simplify.txt
> back in Feb this year.

The main difference from DIO-simplify.txt is that I'm making truncate
wait on the placeholder pages instead of using i_alloc_sem.  I haven't
coded the TAG_LOCKED part, but more on that below.  As you
noticed, I'm using expanding truncates to update i_size for extending
the file, which has pluses and minuses.

> 
> I recall that the first time you experimented with dummy struct pages
> (placeholder pages as they are now called), you ended up putting it on
> hold due to regressions in performance which we attributed to radix
> tree insertion overheads. I think you mention in an earlier note that
> you've managed to optimize some of that -- I'm just wondering what helped
> there - radix_tree_preload or something more than that ?

My original code was fairly dumb and didn't use gang lookups.  It also
dropped the radix tree lock after every insert/delete.  The code posted
here is about 2x as fast (or 1/2 the regression, depending on how you
look at it).   For smaller read ios, the performance hit is about zero.
The cost of radix tree insertion is offset by less mutex and semaphore
locking.

> 
> This is sort of why we had discussed the alternative of tagged locking
> in the radix tree.
> 	  Lookup and lock pages in the range by tagging the radix tree
> 	  slots as "TAG_LOCKED", and locking pages that were present.
> 	  Modify find_get_page et al to return a dummy locked cache page
> 	  if TAG_LOCKED. (Alternatively put the check in add_to_page_cache,
> 	  instead)
> Putting the check in add_to_page_cache is more consistent with what you
> do now.
> 
> Where I think we were stuck at that point was that even this helps only upto
> a point in saving on leaf nodes, but did not avoid intermediate levels of
> nodes from being created. With Nick's changes to the radix tree .. I have 
> been thinking that some kind of sliding height scheme could become possible.
> 
> Anyway, the other reason for bringing this up is because I am wondering if
> some of the complexity in find_and_insert_placeholder_pages and a few other
> places could be reduced with a tagged locking approach where we do not
> actually stuff placeholder pages explicitly in the page cache and hence
> possibly can avoid some of the checks you have below. I need to take a closer
> look to be completely sure, but just a thought ...
> 
> I guess the important thing at this point is to get the interfaces right
> in a way that allows for future optimization.

Yes, my goal was to get the code fast enough for most machines, so that
we could test the result and see what impact it would have on the rest
of the page cache.  From here we can decide if additional radix tuning
is a good idea, or if we just want to stick with the mutex/semaphore
combo in use now.

I suspect that we won't be able to tune the radix tree enough to make it
suitable for the huge SGI 512 cpu test cases.  So we might want to stop
tuning now (letting XFS keep their current code) or switch away from
radix to something that can lock down an extent.

Radix tree gang lookups, insertions and deletions all have room for
optimization though (even without the LOCKED tag).  A new tag in the
radix tree isn't free, Andrew might prefer not to increase the space
usage across the board just for O_DIRECT.

-chris

next prev parent reply	other threads:[~2006-11-01 12:42 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-10-27 18:22 [RFC PATCH 0/3] O_DIRECT locking rework Chris Mason
2006-10-27 18:23 ` [RFC PATCH 1/3] Introduce a place holder page for the pagecache Chris Mason
2006-11-01 10:23   ` Suparna Bhattacharya
2006-11-01 12:41     ` Chris Mason [this message]
2006-11-03  7:12       ` Suparna Bhattacharya
2006-11-03 14:10         ` Chris Mason
2006-10-27 18:25 ` [RFC PATCH 2/3] Change O_DIRECT to use placeholders Chris Mason
2006-10-27 18:26 ` [RFC PATCH 3/3] DIO: don't fall back to buffered writes Chris Mason
2006-11-01 10:51   ` Suparna Bhattacharya
2006-11-01 12:47     ` Chris Mason
2006-11-03  7:25       ` Suparna Bhattacharya

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20061101124153.GF5620@think.oraclecorp.com \
    --to=chris.mason@oracle.com \
    --cc=akpm@osdl.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=suparna@in.ibm.com \
    --cc=zach.brown@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).