From: Avi Kivity <avi@scylladb.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Brian Foster <bfoster@redhat.com>,
Glauber Costa <glauber@scylladb.com>,
xfs@oss.sgi.com
Subject: Re: sleeps and waits during io_submit
Date: Tue, 1 Dec 2015 23:24:13 +0200
Message-ID: <565E0FFD.70507@scylladb.com>
In-Reply-To: <20151201210417.GY19199@dastard>
On 12/01/2015 11:04 PM, Dave Chinner wrote:
> On Tue, Dec 01, 2015 at 05:22:38PM +0200, Avi Kivity wrote:
>> On 12/01/2015 04:56 PM, Brian Foster wrote:
>>> On Tue, Dec 01, 2015 at 03:58:28PM +0200, Avi Kivity wrote:
>>>>> io_submit() can probably block in a variety of
>>>>> places afaict... it might have to read in the inode extent map, allocate
>>>>> blocks, take inode/ag locks, reserve log space for transactions, etc.
>>>> Any chance of changing all that to be asynchronous? Doesn't sound too hard,
>>>> if somebody else has to do it.
>>>>
>>> I'm not following... if the fs needs to read in the inode extent map to
>>> prepare for an allocation, what else can the thread do but wait? Are you
>>> suggesting the request kick off whatever the blocking action happens to
>>> be asynchronously and return with an error such that the request can be
>>> retried later?
>> Not quite, it should be invisible to the caller.
> I have a pony I can sell you.
You already sold me a pony.
>> That is, the code called by io_submit()
>> (file_operations::write_iter, it seems to be called today) can kick
>> off this operation and have it continue from where it left off.
> This is a problem that people have tried to solve in the past (e.g.
> syslets, etc) where the thread executes until it has to block, and
> then it's handed off to a worker thread/syslet to block and the
> main process returns with EIOCBQUEUED.
Yes, I remember that.
> Basically, you're asking for a real AIO infrastructure to
> be introduced into the kernel, and I think that's beyond what us XFS
> guys can do...
Sure you can, Dave. In fact you feel an irresistible urge to do it.
But I don't think the EIOCBQUEUED thing need be repeated. We can have a
simpler implementation:
- Add a task flag TIF_AIO, which causes any new I/O to fail with
EAIOWOULDBLOCK.
- have __blockdev_direct_IO() do its block-mapping operations with
TIF_AIO set (but remove it just before issuing the bio).
- sys_aio_submit() catches EAIOWOULDBLOCK and resubmits the aio in a
work item, this time without TIF_AIO games.
The effect would be similar to EIOCBQUEUED, but simpler, as instead of
issuing any metadata I/O you abort the operation and restart it from
scratch.
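
To make the control flow of the three steps above concrete, here is a
minimal user-space sketch in Python. It is not kernel code and not an
existing interface: TIF_AIO and EAIOWOULDBLOCK are only the names proposed
in this message, and Task, read_metadata, the dict used as a metadata cache
and the list used as a work queue are hypothetical stand-ins chosen for
illustration.

```python
class AioWouldBlock(Exception):
    """Stands in for the proposed EAIOWOULDBLOCK error (hypothetical)."""


class Task:
    """Stands in for a task; tif_aio models the proposed TIF_AIO flag."""

    def __init__(self):
        self.tif_aio = False


def read_metadata(task, cache, inode):
    """Return the block mapping for inode, doing 'I/O' if it is not cached.

    With the flag set, any new I/O fails instead of blocking.
    """
    if inode in cache:
        return cache[inode]
    if task.tif_aio:
        raise AioWouldBlock(inode)
    cache[inode] = f"extent-map-for-{inode}"  # models a blocking metadata read
    return cache[inode]


def blockdev_direct_io(task, cache, inode, nonblocking):
    """Do the block mapping under the flag, then 'issue the bio' without it."""
    task.tif_aio = nonblocking
    try:
        mapping = read_metadata(task, cache, inode)
    finally:
        task.tif_aio = False  # flag is removed just before issuing the bio
    return f"bio issued using {mapping}"


def aio_submit(task, cache, inode, workqueue):
    """Try the fast path; on AioWouldBlock, restart from scratch in a work item."""
    try:
        return blockdev_direct_io(task, cache, inode, nonblocking=True)
    except AioWouldBlock:
        # The work item runs in its own context and is allowed to block.
        workqueue.append(
            lambda: blockdev_direct_io(Task(), cache, inode, nonblocking=False)
        )
        return "queued"
```

The first submission against a cold cache is queued rather than blocking
the submitter; once the work item has run and populated the cache, later
submissions for the same inode complete on the fast path. Nothing is saved
from the aborted attempt, which is the "restart it from scratch"
simplification relative to EIOCBQUEUED.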
>
>>>>> Reducing the frequency of block allocation/frees might also be
>>>>> another help (e.g., preallocate and reuse files,
>>>> Isn't that discouraged for SSDs?
>>>>
>>> Perhaps, if you're referring to the fact that the blocks are never freed
>>> and thus never discarded..? Are you running fstrim?
>> mount -o discard. And yes, overwrites are supposedly more expensive
>> than trim old data + allocate new data, but maybe if you compare it
>> with the work XFS has to do, perhaps the tradeoff is bad.
> Oh, you do realise that using "-o discard" causes significant delays
> in journal commit processing? i.e. the journal commit completion
> blocks until all the discards have been submitted and waited on
> *synchronously*. This is a problem with the linux block layer in
> that blkdev_issue_discard() is a synchronous operation.....
I do now. What's the unicode for a crying face?
> Hence if you are seeing delays in transactions (e.g. timestamp updates)
> it's entirely possible that things will get much better if you
> remove the discard mount option. It's much better from a performance
> perspective to use the fstrim command every so often - fstrim issues
> discard operations in the context of the fstrim process - it does
> not interact with the transaction subsystem at all.
>
>
All right. On the other hand we have to know when to issue it. That
would be when nn% of the disk area has been rewritten. Is there some
counter I can poll every minute or so for this? Not doing the fstrim in
time would cause the disk performance to tank.
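
One possible approximation, not something suggested anywhere in this thread,
is to poll the block layer's per-device statistics in
/sys/block/<dev>/stat, whose seventh field is the number of 512-byte
sectors written. The Python sketch below assumes that counter is an
acceptable proxy; it counts every write to the device, not unique rewritten
area, and it resets at boot. The function names, the threshold and the
sample numbers are hypothetical.

```python
SECTOR_BYTES = 512  # the stat file counts sectors in 512-byte units


def sectors_written(stat_line):
    """Extract 'sectors written' (7th field) from a /sys/block/<dev>/stat line."""
    return int(stat_line.split()[6])


def should_trim(written_at_last_trim, written_now, disk_bytes, threshold):
    """True once the bytes written since the last trim reach threshold * disk size."""
    rewritten_bytes = (written_now - written_at_last_trim) * SECTOR_BYTES
    return rewritten_bytes >= threshold * disk_bytes
```

A poller would read the stat line every minute or so, call should_trim()
with the value saved at the previous fstrim, and run fstrim and record the
new baseline whenever it returns True.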