From mboxrd@z Thu Jan 1 00:00:00 1970 From: Suparna Bhattacharya Subject: Re: [RFC PATCH 1/3] Introduce a place holder page for the pagecache. Date: Fri, 3 Nov 2006 12:42:57 +0530 Message-ID: <20061103071257.GA13999@in.ibm.com> References: <20061027182215.GJ19895@think.oraclecorp.com> <20061027182358.GK19895@think.oraclecorp.com> <20061101102350.GA25882@in.ibm.com> <20061101124153.GF5620@think.oraclecorp.com> Reply-To: suparna@in.ibm.com Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: linux-fsdevel@vger.kernel.org, akpm@osdl.org, zach.brown@oracle.com Return-path: Received: from e36.co.us.ibm.com ([32.97.110.154]:8607 "EHLO e36.co.us.ibm.com") by vger.kernel.org with ESMTP id S1753161AbWKCHKI (ORCPT ); Fri, 3 Nov 2006 02:10:08 -0500 Received: from d03relay04.boulder.ibm.com (d03relay04.boulder.ibm.com [9.17.195.106]) by e36.co.us.ibm.com (8.13.8/8.12.11) with ESMTP id kA37A7oD005275 for ; Fri, 3 Nov 2006 02:10:07 -0500 Received: from d03av03.boulder.ibm.com (d03av03.boulder.ibm.com [9.17.195.169]) by d03relay04.boulder.ibm.com (8.13.6/8.13.6/NCO v8.1.1) with ESMTP id kA37A7p3331310 for ; Fri, 3 Nov 2006 00:10:07 -0700 Received: from d03av03.boulder.ibm.com (loopback [127.0.0.1]) by d03av03.boulder.ibm.com (8.12.11.20060308/8.13.3) with ESMTP id kA37A638010511 for ; Fri, 3 Nov 2006 00:10:07 -0700 To: Chris Mason Content-Disposition: inline In-Reply-To: <20061101124153.GF5620@think.oraclecorp.com> Sender: linux-fsdevel-owner@vger.kernel.org List-Id: linux-fsdevel.vger.kernel.org On Wed, Nov 01, 2006 at 07:41:53AM -0500, Chris Mason wrote: > On Wed, Nov 01, 2006 at 03:53:50PM +0530, Suparna Bhattacharya wrote: > > > > Really appreciate your biting the bullet on this one. I have a few > > minor comments/questions as I'm trying to compare this to what we > > have discussed in the past and the last attempt approach outlined in > > http://www.kernel.org/pub/linux/kernel/people/suparna/DIO-simplify.txt > > back in Feb this year. > > The main difference from DIO-simplify.txt is that I'm making truncate > wait on the placeholder pages instead of using i_alloc_sem. I haven't Ah OK, that is true. > coded the TAG_LOCKED part, but more on that below. As you > noticed, I'm using expanding truncates to update i_size for extending > the file, which has pluses and minuses. > > > > > I recall that the first time you experimented with dummy struct pages > > (placeholder pages as they are now called), you ended up putting it on > > hold due to regressions in performance which we attributed to radix > > tree insertion overheads. I think you mention in an earlier note that > > you've managed to optimize some of that -- I'm just wondering what helped > > there - radix_tree_preload or something more than that ? > > My original code was fairly dumb and didn't use gang lookups. It also > dropped the radix tree lock after every insert/delete. The code posted > here is about 2x as fast (or 1/2 the regression, depending on how you > look at it). For smaller read ios, the performance hit is about zero. > The cost of radix tree insertion is offset by less mutex and semaphore > locking. OK, so you think the main overheads had to do with radix-tree locking, rather than with the additional node insertions ? > > > > > This is sort of why we had discussed the alternative of tagged locking > > in the radix tree. > > Lookup and lock pages in the range by tagging the radix tree > > slots as "TAG_LOCKED", and locking pages that were present. > > Modify find_get_page et al to return a dummy locked cache page > > if TAG_LOCKED. (Alternatively put the check in add_to_page_cache, > > instead) > > Putting the check in add_to_page_cache is more consistent with what you > > do now. > > > > Where I think we were stuck at that point was that even this helps only upto > > a point in saving on leaf nodes, but did not avoid intermediate levels of > > nodes from being created. With Nick's changes to the radix tree .. I have > > been thinking that some kind of sliding height scheme could become possible. > > > > Anyway, the other reason for bringing this up is because I am wondering if > > some of the complexity in find_and_insert_placeholder_pages and a few other > > places could be reduced with a tagged locking approach where we do not > > actually stuff placeholder pages explicitly in the page cache and hence > > possibly can avoid some of the checks you have below. I need to take a closer > > look to be completely sure, but just a thought ... > > > > I guess the important thing at this point is to get the interfaces right > > in a way that allows for future optimization. > > Yes, my goal was to get the code fast enough for most machines, so that > we could test the result and see what impact it would have on the rest > of the page cache. From here we can decide if additional radix tuning > is a good idea, or if we just want to stick with the mutex/semaphore > combo in use now. Yup, that makes sense. The complexity in find_and_insert_placeholder_pages does worry me though, wouldn't want to end up debugging tricky problems in that code after all this. Do you have any ideas on that ? > > I suspect that we won't be able to tune the radix tree enough to make it > suitable for the huge SGI 512 cpu test cases. So we might want to stop > tuning now (letting XFS keep their current code) or switch away from > radix to something that can lock down an extent. > > Radix tree gang lookups, insertions and deletions all have room for > optimization though (even without the LOCKED tag). A new tag in the > radix tree isn't free, Andrew might prefer not to increase the space > usage across the board just for O_DIRECT. Yes that was a consideration on my mind as well. There are costs both ways. Regards Suparna > > -chris -- Suparna Bhattacharya (suparna@in.ibm.com) Linux Technology Center IBM Software Lab, India