linux-xfs.vger.kernel.org archive mirror
From: Christoph Hellwig <hch@infradead.org>
To: Brian Foster <bfoster@redhat.com>
Cc: Tomasz Grabiec <tgrabiec@scylladb.com>, linux-xfs@vger.kernel.org
Subject: Re: io_submit() blocks for writes for substantial amount of time
Date: Tue, 19 Sep 2017 07:58:27 -0700
Message-ID: <20170919145827.GA21523@infradead.org>
In-Reply-To: <20170919122704.GA3487@bfoster.bfoster>

On Tue, Sep 19, 2017 at 08:27:05AM -0400, Brian Foster wrote:
> > Please advise, is this a known bug? When can it happen? Is there a way
> > to work it around to avoid blocking?
> > 
> 
> I'm not sure how either could be considered a bug based on the stack
> trace information alone. Allocations may require reading metadata and
> reads are synchronous. This all seems like pretty basic filesystem
> behavior.
> 
> I suppose performance may be a separate question. For the latter issue,
> I'd be curious whether leaving more free space available in the
> filesystem would help avoid running into busy extents. Perhaps having
> more memory and thus a larger buffer cache for btree blocks could help
> mitigate the former issue..? The deterministic workaround for both is to
> preallocate the associated file. If the file would be too large, another
> option may be to set an extent size hint to allocate the file in larger
> chunks and amortize the cost of the allocations over multiple writes.
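
The two deterministic workarounds mentioned above can be sketched in C roughly as follows. This assumes Linux with the generic FS_IOC_FSGETXATTR/FS_IOC_FSSETXATTR ioctls and struct fsxattr from <linux/fs.h>. The helper names preallocate(), apply_extsize_hint() and set_extsize_hint() are hypothetical. The hint is meant to be set on a freshly created, still-empty file, because XFS is expected to refuse to change it once data extents exist. The size should also be a multiple of the filesystem block size.

```c
/* Sketch only: preallocation and an extent size hint as ways to keep
 * block allocation out of the AIO write path. Helper names are
 * hypothetical, not part of any library. */
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <linux/fs.h>

/* Reserve blocks for [0, len) up front (mode 0 also extends i_size),
 * so later writes into that range need no allocation. */
int preallocate(int fd, off_t len)
{
    return fallocate(fd, 0, 0, len) < 0 ? -errno : 0;
}

/* Set the extent-size-hint flag and size, preserving other xflags. */
void apply_extsize_hint(struct fsxattr *fa, unsigned int extsize_bytes)
{
    fa->fsx_xflags |= FS_XFLAG_EXTSIZE;
    fa->fsx_extsize = extsize_bytes;
}

/* Read-modify-write the inode's fsxattr. Assumed to be called on an
 * empty file; extsize_bytes assumed to be a multiple of the block size. */
int set_extsize_hint(int fd, unsigned int extsize_bytes)
{
    struct fsxattr fa;

    if (ioctl(fd, FS_IOC_FSGETXATTR, &fa) < 0)
        return -errno;
    apply_extsize_hint(&fa, extsize_bytes);
    if (ioctl(fd, FS_IOC_FSSETXATTR, &fa) < 0)
        return -errno;
    return 0;
}
```

For example, calling set_extsize_hint(fd, 16u << 20) right after creating a file would ask XFS to allocate it in 16 MiB chunks. A run of small appending writes would then pay the allocation cost roughly once per chunk rather than once per write.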

Note that Linux 4.13 and later support an RWF_NOWAIT flag that will
return -EAGAIN from io_submit for these conditions, so the caller can
hand those requests off to a thread pool instead of blocking.
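
Below is a minimal sketch of that pattern. It assumes a Linux 4.13+ kernel and headers, which provide the aio_rw_flags field in struct iocb, and a file descriptor opened with O_DIRECT. It calls io_submit(2) and io_getevents(2) through syscall(2) instead of libaio. All helper names are hypothetical. The sketch does not rely on exactly where the kernel reports the condition: it treats -EAGAIN as "would block" whether it comes back from io_submit() itself or in the completed event's res field. The fallback is a plain blocking pwrite(2), standing in for queuing the request to a worker thread pool.

```c
/* Sketch only: RWF_NOWAIT AIO write with a blocking fallback.
 * Assumes one request in flight per aio context. */
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/aio_abi.h>
#include <linux/fs.h>

#ifndef RWF_NOWAIT
#define RWF_NOWAIT 0x00000008 /* value used by Linux 4.13+ <linux/fs.h> */
#endif

/* Fill an iocb for a positional write that must not block. */
void prep_nowait_write(struct iocb *cb, int fd, const void *buf,
                       size_t len, long long off)
{
    memset(cb, 0, sizeof(*cb));
    cb->aio_lio_opcode = IOCB_CMD_PWRITE;
    cb->aio_fildes = (uint32_t)fd;
    cb->aio_buf = (uint64_t)(uintptr_t)buf;
    cb->aio_nbytes = len;
    cb->aio_offset = off;
    cb->aio_rw_flags = RWF_NOWAIT;
}

/* submit_ret: io_submit() result (count or -errno).
 * event_res: io_event.res, only meaningful when submit_ret == 1.
 * Note io_submit() also returns -EAGAIN when the ring lacks resources;
 * falling back to a blocking write is still correct in that case. */
int should_fallback(long submit_ret, long long event_res)
{
    if (submit_ret == -EAGAIN)
        return 1;
    return submit_ret == 1 && event_res == -EAGAIN;
}

static long submit_one(aio_context_t ctx, struct iocb *cb)
{
    struct iocb *cbs[1] = { cb };
    long r = syscall(SYS_io_submit, ctx, 1L, cbs);
    return r < 0 ? -errno : r;
}

/* Returns bytes written or -errno. ctx is assumed to come from io_setup(2). */
long long write_nowait_or_fallback(aio_context_t ctx, int fd, const void *buf,
                                   size_t len, long long off)
{
    struct iocb cb;
    struct io_event ev;
    long long res = 0;

    prep_nowait_write(&cb, fd, buf, len, off);
    long sub = submit_one(ctx, &cb);
    if (sub == 1) {
        /* Wait for the single completion; res is bytes or -errno. */
        long got = syscall(SYS_io_getevents, ctx, 1L, 1L, &ev, NULL);
        if (got < 0)
            return -errno;
        res = ev.res;
    }
    if (should_fallback(sub, res)) {
        /* Stand-in for handing the request to a thread pool. */
        ssize_t n = pwrite(fd, buf, len, off);
        return n < 0 ? -errno : n;
    }
    return sub == 1 ? res : sub;
}
```

Keeping the decision in should_fallback() separate from the syscalls makes the policy easy to change. For example, a real application could enqueue the iocb to worker threads there instead of writing inline.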

Also note that until a few years ago we performed all allocations from
a workqueue; this was changed by:

commit cf11da9c5d374962913ca5ba0ce0886b58286224
Author: Dave Chinner <dchinner@redhat.com>
Date:   Tue Jul 15 07:08:24 2014 +1000

    xfs: refine the allocation stack switch

to only defer btree splits to a workqueue.  Under the previous scheme
there might have been an option to defer AIO allocations to a workqueue
as well, but the main problem is that the worker thread doing the actual
data transfer would then have to "borrow" the mm_struct from the
submitter.  That's the primary reason something like that was never
implemented in mainline Linux.

Thread overview: 17+ messages
2017-09-19  8:50 io_submit() blocks for writes for substantial amount of time Tomasz Grabiec
2017-09-19 12:27 ` Brian Foster
2017-09-19 14:58   ` Christoph Hellwig [this message]
2017-09-19 16:31     ` Avi Kivity
2017-09-19 17:39       ` Brian Foster
2017-09-19 20:34         ` Christoph Hellwig
2017-09-20  6:17         ` Avi Kivity
2017-09-20 10:50           ` Brian Foster
2017-09-20 11:11             ` Avi Kivity
2017-09-20 14:49               ` Christoph Hellwig
2017-09-23 18:23                 ` Avi Kivity
2017-09-19 20:34       ` Christoph Hellwig
2017-09-20  6:14         ` Avi Kivity
2017-09-19 16:29   ` Avi Kivity
2017-09-19 17:38     ` Brian Foster
2017-09-19 17:53       ` Tomasz Grabiec
2017-09-19 23:38         ` Dave Chinner