From: Avi Kivity <avi@scylladb.com>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: xfs_buf_lock vs aio
Date: Sun, 25 Feb 2018 19:47:09 +0200 [thread overview]
Message-ID: <ea00b0d5-3d29-d8a7-13f9-83e6a4404192@scylladb.com> (raw)
In-Reply-To: <20180219024055.GW7000@dastard>
Sorry for the late reply, I was on vacation.
On 02/19/2018 04:40 AM, Dave Chinner wrote:
> On Fri, Feb 16, 2018 at 10:07:55AM +0200, Avi Kivity wrote:
>> On 02/15/2018 11:30 PM, Dave Chinner wrote:
>>> On Thu, Feb 15, 2018 at 11:36:54AM +0200, Avi Kivity wrote:
>>>> On 02/15/2018 01:56 AM, Dave Chinner wrote:
>>>> A little bird whispered in my ear to try XFS_IOC_OPEN_BY_HANDLE to
>>>> avoid the the time update lock, so we'll be trying that next, to
>>>> emulate lazytime.
>>> Biggest problem with that is it requires root permissions. It's not
>>> a solution that can be deployed in practice, so I haven't bothered
>>> suggesting it as something to try.
>>>
>>> If you want to try lazytime, an easier test might be to rebuild the
>>> kernel with this change below to support the lazytime mount option
>>> and not log the timestamp updates. This is essentially the mechanism
>>> that I'll use for this, but it will need to grow more stuff to have
>>> the correct lazytime semantics...
>>>
>> We tried open by handle to see if lazytime would provide relief, but
>> it looks like it just pushes the lock acquisition to another place:
> Whack-a-mole.
>
> This is the whole problem with driving the "nowait" semantics into
> the filesystem implementations - every time we fix one blocking
> point, we find a deeper one, and we have to drive the "nowait"
> semantics deeper into code that should not have to care about IO
> level blocking semantics. And by doing it in a "slap-a-bandaid on
> it" process, we end up with spagetti code that is fragile and
> unmaintainable...
I agree it's not pretty. It would be worse if the kernel did not have
strict error handling hygiene that allows you to add those EAGAIN
returns without too much fear.
>
>> However, that function can EAGAIN (it does for IOLOCK) so maybe we
>> can change xfs_ilock to xfs_ilock_nowait and teach it about not
>> waiting for ILOCK too.
> If only it were that simple. Why, exactly, does the direct IO write
> code require the ILOCK exclusive? Indeed, if it goes to allocate
> blocks, we do this:
>
> /*
> * xfs_iomap_write_direct() expects the shared lock. It
> * is unlocked on return.
> */
> if (lockmode == XFS_ILOCK_EXCL)
> xfs_ilock_demote(ip, lockmode);
>
> We demote the lock to shared before we call into the allocation
> code. And for pure direct IO writes, all we care about is ensuring
> the extent map does not change while we do the lookup and check.
> That only requires a shared lock.
>
> So now I've got to go work out why need_excl_ilock() says we need
> an exclusive ilock for direct IO writes when it looks pretty clear
> to me that we don't.
>
> But that's only half the problem. The other problem is that even if
> we take it shared, we're still going to block on IO completion
> taking the ILOCK exclusive to do things like unwritten extent
> completion. So we still need to hack about with "trylock" operations
> into functions into various functions (xfs_ilock_data_map_shared()
> for one).
I'll try your patch and report, thanks.
> What a mess....
>
> Cheers,
>
> Dave.
prev parent reply other threads:[~2018-02-25 17:47 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-02-07 17:20 xfs_buf_lock vs aio Avi Kivity
2018-02-07 23:33 ` Dave Chinner
2018-02-08 8:24 ` Avi Kivity
2018-02-08 22:11 ` Dave Chinner
2018-02-09 12:11 ` Avi Kivity
2018-02-09 23:10 ` Dave Chinner
2018-02-12 9:33 ` Avi Kivity
2018-02-13 5:18 ` Dave Chinner
2018-02-13 23:14 ` Darrick J. Wong
2018-02-14 2:16 ` Dave Chinner
2018-02-14 12:01 ` Avi Kivity
2018-02-14 12:07 ` Avi Kivity
2018-02-14 12:18 ` Avi Kivity
2018-02-14 23:56 ` Dave Chinner
2018-02-15 9:36 ` Avi Kivity
2018-02-15 21:30 ` Dave Chinner
2018-02-16 8:07 ` Avi Kivity
2018-02-19 2:40 ` Dave Chinner
2018-02-19 4:48 ` Dave Chinner
2018-02-25 17:47 ` Avi Kivity [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ea00b0d5-3d29-d8a7-13f9-83e6a4404192@scylladb.com \
--to=avi@scylladb.com \
--cc=david@fromorbit.com \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).