From: Dave Chinner <david@fromorbit.com>
To: Alex Elder <aelder@sgi.com>
Cc: xfs@oss.sgi.com
Subject: Re: [PATCH 2/6] xfs: don't serialise adjacent concurrent direct IO appending writes
Date: Fri, 26 Aug 2011 12:19:32 +1000
Message-ID: <20110826021932.GX3162@dastard>
In-Reply-To: <1314306483.3136.105.camel@doink>

On Thu, Aug 25, 2011 at 04:08:03PM -0500, Alex Elder wrote:
> On Thu, 2011-08-25 at 17:17 +1000, Dave Chinner wrote:
> > For append write workloads, extending the file requires a certain
> > amount of exclusive locking to be done up front to ensure sanity in
> > things like ensuring that we've zeroed any allocated regions
> > between the old EOF and the start of the new IO.
> >
> > For single threads, this typically isn't a problem, and for large
> > IOs we don't serialise enough for it to be a problem for two
> > threads on really fast block devices. However for smaller IO and
> > larger thread counts we have a problem.
> >
> > Take 4 concurrent sequential, single block sized and aligned IOs.
> > After the first IO is submitted but before it completes, we end up
> > with this state:
> >
> >         IO 1    IO 2    IO 3    IO 4
> >       +-------+-------+-------+-------+
> >       ^       ^
> >       |       |
> >       |       |
> >       |       |
> >       |       \- ip->i_new_size
> >       \- ip->i_size
> >
> > And the IO is done without exclusive locking because offset <=
> > ip->i_size. When we submit IO 2, we see offset > ip->i_size, and
> > grab the IO lock exclusive, because there is a chance we need to do
> > EOF zeroing. However, there is already an IO in progress that avoids
> > the need for EOF zeroing because offset <= ip->i_new_size. Hence we
> > could avoid holding the IO lock exclusive for this. Hence after
> > submission of the second IO, we'd end up in this state:
> >
> >         IO 1    IO 2    IO 3    IO 4
> >       +-------+-------+-------+-------+
> >       ^               ^
> >       |               |
> >       |               |
> >       |               |
> >       |               \- ip->i_new_size
> >       \- ip->i_size
> >
> > There is no need to grab the i_mutex or the IO lock in exclusive
> > mode if we don't need to invalidate the page cache. Taking these
> > locks on every direct IO effectively serialises them, as taking the IO
> > lock in exclusive mode has to wait for all shared holders to drop
> > the lock. That only happens when IO is complete, so effectively it
> > prevents dispatch of concurrent direct IO writes to the same inode.
> >
> > And so you can see that for the third concurrent IO, we'd avoid
> > exclusive locking for the same reason we avoided the exclusive lock
> > for the second IO.
> >
> > Fixing this is a bit more complex than that, because we need to hold
> > a write-submission local value of ip->i_new_size so that clearing
> > the value is only done if no other thread has updated it before our
> > IO completes.
> >
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
>
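
[Illustration, not part of the original patch.] The locking rule and the
completion-time clearing described in the quoted patch description can be
modelled in user space roughly as follows. This is only a sketch under
stated assumptions: the struct, enum and function names are hypothetical,
only i_size and i_new_size come from the description, the choice to advance
i_size at completion is a modelling assumption, and no real locking is done.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model; only the two field names come from the description. */
struct model_inode {
	int64_t i_size;		/* current in-memory EOF */
	int64_t i_new_size;	/* end of furthest in-flight extending write, 0 if none */
};

enum model_iolock { MODEL_IOLOCK_SHARED, MODEL_IOLOCK_EXCL };

/* Behaviour described as the problem: any write starting past i_size goes exclusive. */
static enum model_iolock lock_mode_old(const struct model_inode *ip, int64_t offset)
{
	return offset > ip->i_size ? MODEL_IOLOCK_EXCL : MODEL_IOLOCK_SHARED;
}

/* Proposed rule: stay shared if an in-flight write already reaches offset. */
static enum model_iolock lock_mode_new(const struct model_inode *ip, int64_t offset)
{
	if (offset <= ip->i_size || offset <= ip->i_new_size)
		return MODEL_IOLOCK_SHARED;
	return MODEL_IOLOCK_EXCL;
}

/*
 * Submission: returns the submission-local new size, or 0 if the write
 * does not extend the file. i_new_size only ever moves forward here.
 */
static int64_t submit_write(struct model_inode *ip, int64_t offset, int64_t count)
{
	int64_t new_size = offset + count;

	if (new_size <= ip->i_size)
		return 0;
	if (new_size > ip->i_new_size)
		ip->i_new_size = new_size;
	return new_size;
}

/*
 * Completion: advance i_size (modelling assumption), and clear i_new_size
 * only if no later submission has replaced the value we set.
 */
static void complete_write(struct model_inode *ip, int64_t new_size)
{
	if (new_size == 0)
		return;
	if (new_size > ip->i_size)
		ip->i_size = new_size;
	if (ip->i_new_size == new_size)
		ip->i_new_size = 0;
}
```

With 4096-byte IOs, submitting IO 1 at offset 0 leaves i_new_size at 4096,
so lock_mode_old() asks for the exclusive lock for IO 2 at offset 4096 while
lock_mode_new() stays shared. If IO 1 completes after IO 2 was submitted,
i_new_size is left alone because it no longer matches IO 1's local value.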
> This looks good. What did you do with the little
> "If the IO is clearly not beyond the on-disk inode size,
> return before we take locks" optimization in xfs_setfilesize()
> from the last time you posted this?
That's taken care of in Christoph's recent patch set.
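
[Illustration, not part of the original message.] The optimisation Alex
quotes, returning before taking locks when the IO is clearly not beyond the
on-disk inode size, might look something like the sketch below. It is a
user-space model only: the struct, its fields and model_setfilesize() are
hypothetical names, the lock is represented by a counter, and it does not
claim to match how Christoph's patch set handles this. Whether the unlocked
read is safe in the real code depends on how the on-disk size can change,
which this thread does not spell out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the sizes involved; not the kernel's structures. */
struct size_model {
	int64_t i_size;		/* in-memory file size */
	int64_t di_size;	/* on-disk file size */
	int	lock_count;	/* how many times the model "lock" was taken */
};

static void model_setfilesize(struct size_model *m, int64_t io_end)
{
	int64_t isize;

	/* Fast path: the IO ends at or below the on-disk size, nothing to do. */
	if (io_end <= m->di_size)
		return;

	m->lock_count++;	/* stands in for taking the inode lock */

	/* Recheck under the "lock", clamping to the in-memory size. */
	isize = io_end < m->i_size ? io_end : m->i_size;
	if (isize > m->di_size)
		m->di_size = isize;

	/* the real lock would be dropped here */
}
```

In this model an IO ending at or below di_size never touches lock_count,
which is the whole point of the early return.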
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
Thread overview: 17+ messages
2011-08-25 7:17 [PATCH 0/6] xfs: patch queue for Linux 3.2 Dave Chinner
2011-08-25 7:17 ` [PATCH 1/6] xfs: don't serialise direct IO reads on page cache checks Dave Chinner
2011-08-25 7:17 ` [PATCH 2/6] xfs: don't serialise adjacent concurrent direct IO appending writes Dave Chinner
2011-08-25 21:08 ` Alex Elder
2011-08-26 2:19 ` Dave Chinner [this message]
2011-08-25 7:17 ` [PATCH 3/6] xfs: Don't allocate new buffers on every call to _xfs_buf_find Dave Chinner
2011-08-25 20:56 ` Alex Elder
2011-08-25 23:57 ` Dave Chinner
2011-08-25 7:17 ` [PATCH 4/6] xfs: reduce the number of log forces from tail pushing Dave Chinner
2011-08-25 20:57 ` Alex Elder
2011-08-25 23:47 ` Dave Chinner
2011-08-25 7:17 ` [PATCH 5/6] xfs: re-arrange all the xfsbufd delwri queue code Dave Chinner
2011-08-25 20:57 ` Alex Elder
2011-08-25 7:17 ` [PATCH 6/6] xfs: convert xfsbufd to use a workqueue Dave Chinner
2011-08-25 20:57 ` Alex Elder
2011-08-25 23:46 ` Dave Chinner
2011-08-26 0:18 ` Dave Chinner