From: Avi Kivity <avi@scylladb.com>
To: Christoph Hellwig <hch@infradead.org>, Brian Foster <bfoster@redhat.com>
Cc: Tomasz Grabiec <tgrabiec@scylladb.com>, linux-xfs@vger.kernel.org
Subject: Re: io_submit() blocks for writes for substantial amount of time
Date: Tue, 19 Sep 2017 19:31:04 +0300
Message-ID: <04cb3ee7-e7d5-6bba-6adb-8ac1c28e68dc@scylladb.com>
In-Reply-To: <20170919145827.GA21523@infradead.org>

On 09/19/2017 05:58 PM, Christoph Hellwig wrote:
> On Tue, Sep 19, 2017 at 08:27:05AM -0400, Brian Foster wrote:
>>> Please advise, is this a known bug? When can it happen? Is there a way
>>> to work it around to avoid blocking?
>>>
>> I'm not sure how either could be considered a bug based on the stack
>> trace information alone. Allocations may require reading metadata and
>> reads are synchronous. This all seems like pretty basic filesystem
>> behavior.
>>
>> I suppose performance may be a separate question. For the latter issue,
>> I'd be curious whether leaving more free space available in the
>> filesystem would help avoid running into busy extents. Perhaps having
>> more memory and thus a larger buffer cache for btree blocks could help
>> mitigate the former issue..? The deterministic workaround for both is to
>> preallocate the associated file. If the file would be too large, another
>> option may be to set an extent size hint to allocate the file in larger
>> chunks and amortize the cost of the allocations over multiple writes.
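
The two workarounds suggested above (preallocating the file, or setting an extent size hint) can be sketched from userspace roughly as follows. This is a minimal sketch, not code from the thread: the helper names `preallocate` and `set_extsize_hint` are made up for illustration, and it assumes kernel headers new enough to provide `FS_IOC_FSGETXATTR` in `<linux/fs.h>`.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <linux/fs.h>    /* struct fsxattr, FS_IOC_FS[GS]ETXATTR, FS_XFLAG_EXTSIZE */
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: reserve `len` bytes of blocks up front so that
 * later writes inside that range do not have to allocate extents.
 * Mode 0 also extends the file size to `len`.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int preallocate(int fd, off_t len)
{
    return fallocate(fd, 0, 0, len);
}

/*
 * Hypothetical helper: ask the filesystem to allocate in chunks of
 * `bytes`, so the allocation cost is amortized over several writes.
 * Assumptions: the file is still empty (XFS is expected to refuse the
 * change once extents exist) and `bytes` is a multiple of the
 * filesystem block size. Filesystems without this ioctl return an error.
 */
static int set_extsize_hint(int fd, unsigned int bytes)
{
    struct fsxattr fsx;

    if (ioctl(fd, FS_IOC_FSGETXATTR, &fsx) < 0)
        return -1;
    fsx.fsx_xflags |= FS_XFLAG_EXTSIZE;
    fsx.fsx_extsize = bytes;
    return ioctl(fd, FS_IOC_FSSETXATTR, &fsx);
}
```

A caller would pick one of the two right after creating the file: `preallocate(fd, expected_size)` when the final size is known and affordable, otherwise something like `set_extsize_hint(fd, 1 << 20)` before the first write.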
> Note that Linux 4.13 and later support a RWF_NOWAIT flag, that will
> return -EAGAIN from io_submit for these conditions so they can be
> handled by a thread pool.
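
A minimal sketch of the RWF_NOWAIT approach, using the raw AIO syscalls so that it does not depend on libaio. The helper names (`should_fall_back`, `write_nowait`, `write_with_fallback`) are hypothetical, and the inline `pwrite()` only stands in for handing the request to a thread pool. As far as I can tell the -EAGAIN may surface either as the return value of io_submit() or as the `res` field of the completion event, depending on the kernel, so the sketch checks both. Treating -EOPNOTSUPP and -EINVAL as "NOWAIT is not available here" is a policy assumption of this sketch.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>           /* O_DIRECT etc. for callers opening the file */
#include <linux/aio_abi.h>   /* aio_context_t, struct iocb, struct io_event */
#include <linux/fs.h>        /* RWF_NOWAIT on recent kernel headers */
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef RWF_NOWAIT
#define RWF_NOWAIT 0x00000008   /* value from the Linux UAPI headers */
#endif

/* Hypothetical policy: results that mean "retry where blocking is fine". */
static int should_fall_back(long res)
{
    return res == -EAGAIN || res == -EOPNOTSUPP || res == -EINVAL;
}

/*
 * Submit one write with RWF_NOWAIT and wait for its completion.
 * Returns the number of bytes written or a negative errno value.
 * With O_DIRECT the caller must align buf, len and off appropriately.
 */
static long write_nowait(aio_context_t ctx, int fd,
                         const void *buf, size_t len, off_t off)
{
    struct iocb cb;
    struct iocb *list[1] = { &cb };
    struct io_event ev;
    long n;

    memset(&cb, 0, sizeof(cb));
    cb.aio_lio_opcode = IOCB_CMD_PWRITE;
    cb.aio_fildes = fd;
    cb.aio_buf = (uint64_t)(uintptr_t)buf;
    cb.aio_nbytes = len;
    cb.aio_offset = off;
    cb.aio_rw_flags = RWF_NOWAIT;   /* fail instead of blocking */

    n = syscall(SYS_io_submit, ctx, 1L, list);
    if (n < 0)
        return -errno;              /* error reported by io_submit() itself */
    if (n != 1)
        return -EAGAIN;             /* nothing was queued */

    n = syscall(SYS_io_getevents, ctx, 1L, 1L, &ev, NULL);
    if (n < 0)
        return -errno;
    return (long)ev.res;            /* may itself be -EAGAIN */
}

/*
 * Try the non-blocking path first; if the kernel says the write would
 * block, redo it with a blocking pwrite(). A real reactor would queue
 * this step to a worker thread instead of calling it inline.
 */
static long write_with_fallback(aio_context_t ctx, int fd,
                                const void *buf, size_t len, off_t off)
{
    long res = write_nowait(ctx, fd, buf, len, off);

    if (should_fall_back(res)) {
        ssize_t n = pwrite(fd, buf, len, off);
        res = n < 0 ? -errno : (long)n;
    }
    return res;
}
```

The context is created once with `syscall(SYS_io_setup, nr_events, &ctx)` and released with `syscall(SYS_io_destroy, ctx)`. Waiting for the completion right after submitting keeps the example short; an event loop would reap completions asynchronously and make the same fallback decision there.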
>
> Note that until a few years ago we performed all allocations from
> a workqueue, this was changed by:
>
> commit cf11da9c5d374962913ca5ba0ce0886b58286224
> Author: Dave Chinner <dchinner@redhat.com>
> Date: Tue Jul 15 07:08:24 2014 +1000
>
> xfs: refine the allocation stack switch
>
> to only defer btree splits to a workqueue. With that previous scheme
> there might have been an option to defer AIO allocations to a workqueue,
> but the main issue with that is that the worker thread which is then
> going to do the actual data transfer would have to "borrow" the
> mm_struct from the submitter. That's the primary reason why something
> like that was never implemented in mainline Linux.

For DIO, does it really need the mm_struct? It can just pin the pages
and pass them to the workqueue function.