Date: Fri, 26 Aug 2011 12:19:32 +1000
From: Dave Chinner
Subject: Re: [PATCH 2/6] xfs: don't serialise adjacent concurrent direct IO appending writes
Message-ID: <20110826021932.GX3162@dastard>
References: <1314256626-11136-1-git-send-email-david@fromorbit.com>
 <1314256626-11136-3-git-send-email-david@fromorbit.com>
 <1314306483.3136.105.camel@doink>
In-Reply-To: <1314306483.3136.105.camel@doink>
To: Alex Elder
Cc: xfs@oss.sgi.com

On Thu, Aug 25, 2011 at 04:08:03PM -0500, Alex Elder wrote:
> On Thu, 2011-08-25 at 17:17 +1000, Dave Chinner wrote:
> > For append write workloads, extending the file requires a certain
> > amount of exclusive locking to be done up front to ensure sanity in
> > things like zeroing any allocated regions between the old EOF and
> > the start of the new IO.
> >
> > For single threads, this typically isn't a problem, and for large
> > IOs we don't serialise enough for it to be a problem for two
> > threads on really fast block devices. However, for smaller IOs and
> > larger thread counts we have a problem.
> >
> > Take 4 concurrent sequential, single-block-sized and aligned IOs.
> > After the first IO is submitted but before it completes, we end up
> > with this state:
> >
> >    IO 1    IO 2    IO 3    IO 4
> >  +-------+-------+-------+-------+
> >  ^       ^
> >  |       |
> >  |       |
> >  |       |
> >  |       \- ip->i_new_size
> >  \- ip->i_size
> >
> > And the IO is done without exclusive locking because offset <=
> > ip->i_size. When we submit IO 2, we see offset > ip->i_size, and
> > grab the IO lock exclusive, because there is a chance we need to do
> > EOF zeroing. However, there is already an IO in progress that avoids
> > the need for EOF zeroing because offset <= ip->i_new_size; hence we
> > could avoid holding the IO lock exclusive for this. Hence after
> > submission of the second IO, we'd end up in this state:
> >
> >    IO 1    IO 2    IO 3    IO 4
> >  +-------+-------+-------+-------+
> >  ^               ^
> >  |               |
> >  |               |
> >  |               |
> >  |               \- ip->i_new_size
> >  \- ip->i_size
> >
> > There is no need to grab the i_mutex or the IO lock in exclusive
> > mode if we don't need to invalidate the page cache. Taking these
> > locks on every direct IO effectively serialises them, as taking the
> > IO lock in exclusive mode has to wait for all shared holders to drop
> > the lock. That only happens when IO is complete, so effectively it
> > prevents dispatch of concurrent direct IO writes to the same inode.
> >
> > And so you can see that for the third concurrent IO, we'd avoid
> > exclusive locking for the same reason we avoided the exclusive lock
> > for the second IO.
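To make the rule above concrete, here is a rough userspace model of the
check the patch is aiming for (toy types and names, not the kernel code;
struct toy_inode, needs_excl_iolock() and the 4096-byte block size are
just for illustration): an appending direct IO write only needs the IO
lock exclusive when it starts beyond both ip->i_size and ip->i_new_size,
i.e. when EOF zeroing might actually be required.

	/* Toy model of the appending direct IO locking decision (not kernel code). */
	#include <stdbool.h>
	#include <stdio.h>

	struct toy_inode {
		long long i_size;	/* current EOF */
		long long i_new_size;	/* highest end offset of any in-flight write */
	};

	/*
	 * Exclusive locking is only needed when the write starts beyond both
	 * i_size and i_new_size, because only then might the region between
	 * EOF and the start of the write need zeroing.
	 */
	static bool needs_excl_iolock(const struct toy_inode *ip, long long offset)
	{
		if (offset <= ip->i_size)
			return false;	/* write starts at or inside EOF */
		if (offset <= ip->i_new_size)
			return false;	/* an in-flight write already covers the gap */
		return true;		/* potential EOF zeroing: serialise */
	}

	int main(void)
	{
		struct toy_inode ip = { .i_size = 0, .i_new_size = 0 };
		const long long blk = 4096;

		/* Submit 4 adjacent single-block appending writes, as in the example. */
		for (int i = 0; i < 4; i++) {
			long long offset = i * blk;

			printf("IO %d at offset %lld: %s IO lock\n", i + 1, offset,
			       needs_excl_iolock(&ip, offset) ? "exclusive" : "shared");

			/* At submission, push out the in-flight high-water mark. */
			if (offset + blk > ip.i_new_size)
				ip.i_new_size = offset + blk;
		}
		return 0;
	}

With a check like this, all four IOs in the example submit with the lock
held shared; the plain offset > ip->i_size test would have forced IOs 2-4
to take it exclusive and so serialise against the in-flight writes.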
> >
> > Fixing this is a bit more complex than that, because we need to
> > hold a write-submission local value of ip->i_new_size so that
> > clearing the value is only done if no other thread has updated it
> > before our IO completes.....
> >
> > Signed-off-by: Dave Chinner
>
> This looks good.  What did you do with the little
> "If the IO is clearly not beyond the on-disk inode size,
> return before we take locks" optimization in xfs_setfilesize()
> from the last time you posted this?

That's taken care of in Christoph's recent patch set.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs