linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Avi Kivity <avi@scylladb.com>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: xfs_buf_lock vs aio
Date: Sun, 25 Feb 2018 19:47:09 +0200	[thread overview]
Message-ID: <ea00b0d5-3d29-d8a7-13f9-83e6a4404192@scylladb.com> (raw)
In-Reply-To: <20180219024055.GW7000@dastard>

Sorry for the late reply, I was on vacation.


On 02/19/2018 04:40 AM, Dave Chinner wrote:
> On Fri, Feb 16, 2018 at 10:07:55AM +0200, Avi Kivity wrote:
>> On 02/15/2018 11:30 PM, Dave Chinner wrote:
>>> On Thu, Feb 15, 2018 at 11:36:54AM +0200, Avi Kivity wrote:
>>>> On 02/15/2018 01:56 AM, Dave Chinner wrote:
>>>> A little bird whispered in my ear to try XFS_IOC_OPEN_BY_HANDLE to
>>>> avoid the the time update lock, so we'll be trying that next, to
>>>> emulate lazytime.
>>> Biggest problem with that is it requires root permissions. It's not
>>> a solution that can be deployed in practice, so I haven't bothered
>>> suggesting it as something to try.
>>>
>>> If you want to try lazytime, an easier test might be to rebuild the
>>> kernel with this change below to support the lazytime mount option
>>> and not log the timestamp updates. This is essentially the mechanism
>>> that I'll use for this, but it will need to grow more stuff to have
>>> the correct lazytime semantics...
>>>
>> We tried open by handle to see if lazytime would provide relief, but
>> it looks like it just pushes the lock acquisition to another place:
> Whack-a-mole.
>
> This is the whole problem with driving the "nowait" semantics into
> the filesystem implementations - every time we fix one blocking
> point, we find a deeper one, and we have to drive the "nowait"
> semantics deeper into code that should not have to care about IO
> level blocking semantics. And by doing it in a "slap-a-bandaid on
> it" process, we end up with spagetti code that is fragile and
> unmaintainable...

I agree it's not pretty. It would be worse if the kernel did not have 
strict error handling hygiene that allows you to add those EAGAIN 
returns without too much fear.

>
>> However, that function can EAGAIN (it does for IOLOCK) so maybe we
>> can change xfs_ilock to xfs_ilock_nowait and teach it about not
>> waiting for ILOCK too.
> If only it were that simple. Why, exactly, does the direct IO write
> code require the ILOCK exclusive? Indeed, if it goes to allocate
> blocks, we do this:
>
>                  /*
>                   * xfs_iomap_write_direct() expects the shared lock. It
>                   * is unlocked on return.
>                   */
>                  if (lockmode == XFS_ILOCK_EXCL)
>                          xfs_ilock_demote(ip, lockmode);
>
> We demote the lock to shared before we call into the allocation
> code. And for pure direct IO writes, all we care about is ensuring
> the extent map does not change while we do the lookup and check.
> That only requires a shared lock.
>
> So now I've got to go work out why need_excl_ilock() says we need
> an exclusive ilock for direct IO writes when it looks pretty clear
> to me that we don't.
>
> But that's only half the problem. The other problem is that even if
> we take it shared, we're still going to block on IO completion
> taking the ILOCK exclusive to do things like unwritten extent
> completion. So we still need to hack about with "trylock" operations
> into functions into various functions (xfs_ilock_data_map_shared()
> for one).

I'll try your patch and report, thanks.

> What a mess....
>
> Cheers,
>
> Dave.


      parent reply	other threads:[~2018-02-25 17:47 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-02-07 17:20 xfs_buf_lock vs aio Avi Kivity
2018-02-07 23:33 ` Dave Chinner
2018-02-08  8:24   ` Avi Kivity
2018-02-08 22:11     ` Dave Chinner
2018-02-09 12:11       ` Avi Kivity
2018-02-09 23:10         ` Dave Chinner
2018-02-12  9:33           ` Avi Kivity
2018-02-13  5:18             ` Dave Chinner
2018-02-13 23:14               ` Darrick J. Wong
2018-02-14  2:16                 ` Dave Chinner
2018-02-14 12:01                   ` Avi Kivity
2018-02-14 12:07               ` Avi Kivity
2018-02-14 12:18                 ` Avi Kivity
2018-02-14 23:56                 ` Dave Chinner
2018-02-15  9:36                   ` Avi Kivity
2018-02-15 21:30                     ` Dave Chinner
2018-02-16  8:07                       ` Avi Kivity
2018-02-19  2:40                         ` Dave Chinner
2018-02-19  4:48                           ` Dave Chinner
2018-02-25 17:47                           ` Avi Kivity [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ea00b0d5-3d29-d8a7-13f9-83e6a4404192@scylladb.com \
    --to=avi@scylladb.com \
    --cc=david@fromorbit.com \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).