linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Suparna Bhattacharya <suparna@in.ibm.com>
To: Chris Mason <chris.mason@oracle.com>
Cc: linux-fsdevel@vger.kernel.org, akpm@osdl.org, zach.brown@oracle.com
Subject: Re: [RFC PATCH 1/3] Introduce a place holder page for the pagecache.
Date: Fri, 3 Nov 2006 12:42:57 +0530	[thread overview]
Message-ID: <20061103071257.GA13999@in.ibm.com> (raw)
In-Reply-To: <20061101124153.GF5620@think.oraclecorp.com>

On Wed, Nov 01, 2006 at 07:41:53AM -0500, Chris Mason wrote:
> On Wed, Nov 01, 2006 at 03:53:50PM +0530, Suparna Bhattacharya wrote:
> > 
> > Really appreciate your biting the bullet on this one. I have a few
> > minor comments/questions as I'm trying to compare this to what we
> > have discussed in the past and the last attempt approach outlined in
> > http://www.kernel.org/pub/linux/kernel/people/suparna/DIO-simplify.txt
> > back in Feb this year.
> 
> The main difference from DIO-simplify.txt is that I'm making truncate
> wait on the placeholder pages instead of using i_alloc_sem.  I haven't

Ah OK, that is true.

> coded the TAG_LOCKED part, but more on that below.  As you
> noticed, I'm using expanding truncates to update i_size for extending
> the file, which has pluses and minuses.
> 
> > 
> > I recall that the first time you experimented with dummy struct pages
> > (placeholder pages as they are now called), you ended up putting it on
> > hold due to regressions in performance which we attributed to radix
> > tree insertion overheads. I think you mention in an earlier note that
> > you've managed to optimize some of that -- I'm just wondering what helped
> > there - radix_tree_preload or something more than that ?
> 
> My original code was fairly dumb and didn't use gang lookups.  It also
> dropped the radix tree lock after every insert/delete.  The code posted
> here is about 2x as fast (or 1/2 the regression, depending on how you
> look at it).   For smaller read ios, the performance hit is about zero.
> The cost of radix tree insertion is offset by less mutex and semaphore
> locking.

OK, so you think the main overheads had to do with radix-tree locking,
rather than with the additional node insertions ?

> 
> > 
> > This is sort of why we had discussed the alternative of tagged locking
> > in the radix tree.
> > 	  Lookup and lock pages in the range by tagging the radix tree
> > 	  slots as "TAG_LOCKED", and locking pages that were present.
> > 	  Modify find_get_page et al to return a dummy locked cache page
> > 	  if TAG_LOCKED. (Alternatively put the check in add_to_page_cache,
> > 	  instead)
> > Putting the check in add_to_page_cache is more consistent with what you
> > do now.
> > 
> > Where I think we were stuck at that point was that even this helps only upto
> > a point in saving on leaf nodes, but did not avoid intermediate levels of
> > nodes from being created. With Nick's changes to the radix tree .. I have 
> > been thinking that some kind of sliding height scheme could become possible.
> > 
> > Anyway, the other reason for bringing this up is because I am wondering if
> > some of the complexity in find_and_insert_placeholder_pages and a few other
> > places could be reduced with a tagged locking approach where we do not
> > actually stuff placeholder pages explicitly in the page cache and hence
> > possibly can avoid some of the checks you have below. I need to take a closer
> > look to be completely sure, but just a thought ...
> > 
> > I guess the important thing at this point is to get the interfaces right
> > in a way that allows for future optimization.
> 
> Yes, my goal was to get the code fast enough for most machines, so that
> we could test the result and see what impact it would have on the rest
> of the page cache.  From here we can decide if additional radix tuning
> is a good idea, or if we just want to stick with the mutex/semaphore
> combo in use now.

Yup, that makes sense.

The complexity in find_and_insert_placeholder_pages does worry me though,
wouldn't want to end up debugging tricky problems in that code after all
this. Do you have any ideas on that ?

> 
> I suspect that we won't be able to tune the radix tree enough to make it
> suitable for the huge SGI 512 cpu test cases.  So we might want to stop
> tuning now (letting XFS keep their current code) or switch away from
> radix to something that can lock down an extent.
> 
> Radix tree gang lookups, insertions and deletions all have room for
> optimization though (even without the LOCKED tag).  A new tag in the
> radix tree isn't free, Andrew might prefer not to increase the space
> usage across the board just for O_DIRECT.

Yes that was a consideration on my mind as well. There are costs both
ways.

Regards
Suparna
> 
> -chris

-- 
Suparna Bhattacharya (suparna@in.ibm.com)
Linux Technology Center
IBM Software Lab, India


  reply	other threads:[~2006-11-03  7:10 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-10-27 18:22 [RFC PATCH 0/3] O_DIRECT locking rework Chris Mason
2006-10-27 18:23 ` [RFC PATCH 1/3] Introduce a place holder page for the pagecache Chris Mason
2006-11-01 10:23   ` Suparna Bhattacharya
2006-11-01 12:41     ` Chris Mason
2006-11-03  7:12       ` Suparna Bhattacharya [this message]
2006-11-03 14:10         ` Chris Mason
2006-10-27 18:25 ` [RFC PATCH 2/3] Change O_DIRECT to use placeholders Chris Mason
2006-10-27 18:26 ` [RFC PATCH 3/3] DIO: don't fall back to buffered writes Chris Mason
2006-11-01 10:51   ` Suparna Bhattacharya
2006-11-01 12:47     ` Chris Mason
2006-11-03  7:25       ` Suparna Bhattacharya

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20061103071257.GA13999@in.ibm.com \
    --to=suparna@in.ibm.com \
    --cc=akpm@osdl.org \
    --cc=chris.mason@oracle.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=zach.brown@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).