From: Jens Axboe <axboe@kernel.dk>
To: Jan Kara <jack@suse.cz>, Goldwyn Rodrigues <rgoldwyn@suse.de>
Cc: Shaohua Li <shli@kernel.org>,
linux-block@vger.kernel.org, hch@infradead.org,
linux-raid@vger.kernel.org, dm-devel@redhat.com
Subject: Re: [PATCH 1/9] QUEUE_FLAG_NOWAIT to indicate device supports nowait
Date: Thu, 10 Aug 2017 08:28:28 -0600 [thread overview]
Message-ID: <e0c72d10-388e-efac-242d-1b3542490085@kernel.dk> (raw)
In-Reply-To: <20170810142515.GC14925@quack2.suse.cz>
On 08/10/2017 08:25 AM, Jan Kara wrote:
> On Thu 10-08-17 06:49:53, Goldwyn Rodrigues wrote:
>> On 08/09/2017 09:17 PM, Jens Axboe wrote:
>>> On 08/09/2017 08:07 PM, Goldwyn Rodrigues wrote:
>>>>>>>>>> No, from a multi-device point of view, this is inconsistent. I
>>>>>>>>>> have tried the request bio returns -EAGAIN before the split, but
>>>>>>>>>> I shall check again. Where do you see this happening?
>>>>>>>>>
>>>>>>>>> No, this isn't multi-device specific, any driver can do it.
>>>>>>>>> Please see blk_queue_split.
>>>>>>>>>
>>>>>>>>
>>>>>>>> In that case, the bio end_io function is chained and the bio of
>>>>>>>> the split will replicate the error to the parent (if not already
>>>>>>>> set).
>>>>>>>
>>>>>>> this doesn't answer my question. So if a bio returns -EAGAIN, part
>>>>>>> of the bio probably already dispatched to disk (if the bio is
>>>>>>> splitted to 2 bios, one returns -EAGAIN, the other one doesn't
>>>>>>> block and dispatch to disk), what will application be going to do?
>>>>>>> I think this is different to other IO errors. FOr other IO errors,
>>>>>>> application will handle the error, while we ask app to retry the
>>>>>>> whole bio here and app doesn't know part of bio is already written
>>>>>>> to disk.
>>>>>>
>>>>>> It is the same as for other I/O errors as well, such as EIO. You do
>>>>>> not know which bio of all submitted bio's returned the error EIO.
>>>>>> The application would and should consider the whole I/O as failed.
>>>>>>
>>>>>> The user application does not know of bios, or how it is going to be
>>>>>> split in the underlying layers. It knows at the system call level.
>>>>>> In this case, the EAGAIN will be returned to the user for the whole
>>>>>> I/O not as a part of the I/O. It is up to application to try the I/O
>>>>>> again with or without RWF_NOWAIT set. In direct I/O, it is bubbled
>>>>>> out using dio->io_error. You can read about it at the patch header
>>>>>> for the initial patchset at [1].
>>>>>>
>>>>>> Use case: It is for applications having two threads, a compute
>>>>>> thread and an I/O thread. It would try to push AIO as much as
>>>>>> possible in the compute thread using RWF_NOWAIT, and if it fails,
>>>>>> would pass it on to I/O thread which would perform without
>>>>>> RWF_NOWAIT. End result if done right is you save on context switches
>>>>>> and all the synchronization/messaging machinery to perform I/O.
>>>>>>
>>>>>> [1] http://marc.info/?l=linux-block&m=149789003305876&w=2
>>>>>
>>>>> Yes, I knew the concept, but I didn't see previous patches mentioned
>>>>> the -EAGAIN actually should be taken as a real IO error. This means a
>>>>> lot to applications and make the API hard to use. I'm wondering if we
>>>>> should disable bio split for NOWAIT bio, which will make the -EAGAIN
>>>>> only mean 'try again'.
>>>>
>>>> Don't take it as EAGAIN, but read it as EWOULDBLOCK. Why do you say
>>>> the API is hard to use? Do you have a case to back it up?
>>>
>>> Because it is hard to use, and potentially suboptimal. Let's say you're
>>> doing a 1MB write, we hit EWOULDBLOCK for the last split. Do we return a
>>> short write, or do we return EWOULDBLOCK? If the latter, then that
>>> really sucks from an API point of view.
>>>
>>>> No, not splitting the bio does not make sense here. I do not see any
>>>> advantage in it, unless you can present a case otherwise.
>>>
>>> It ties back into the "hard to use" that I do agree with IFF we don't
>>> return the short write. It's hard for an application to use that
>>> efficiently, if we write 1MB-128K but get EWOULDBLOCK, the re-write the
>>> full 1MB from a different context.
>>>
>>
>> It returns the error code only and not short reads/writes. But isn't
>> that true for all system calls in case of error?
>>
>> For aio, there are two result fields in io_event out of which one could
>> be used for error while the other be used for amount of writes/reads
>> performed. However, only one is used. This will not work with
>> pread()/pwrite() calls though because of the limitation of return values.
>>
>> Finally, what if the EWOULDBLOCK is returned for an earlier bio (say
>> offset 128k) for a 1MB pwrite(), while the rest of the 7 128K are
>> successful. What short return value should the system call return?
>
> This is indeed tricky. If an application submits 1MB write, I don't think
> we can afford to just write arbitrary subset of it. That just IMHO too much
> violates how writes traditionally behaved. Even short writes trigger bugs
> in various applications but I'm willing to require that applications using
> NOWAIT IO can handle these. However writing arbitrary subset looks like a
> nasty catch. IMHO we should not submit further bios until we are sure
> current one does not return EWOULDBLOCK when splitting a larger one...
Exactly, that's the point that both Shaohua and I was getting at. Short
writes should be fine, especially if NOWAIT is set. Discontig writes
should also be OK, but it's horrible and inefficient. If we do that,
then using this feature is a net-loss, not a win by any stretch.
--
Jens Axboe
next prev parent reply other threads:[~2017-08-10 14:28 UTC|newest]
Thread overview: 38+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-07-26 23:57 [PATCH 0/9] Nowait feature for stacked block devices Goldwyn Rodrigues
2017-07-26 23:57 ` [PATCH 1/9] QUEUE_FLAG_NOWAIT to indicate device supports nowait Goldwyn Rodrigues
2017-08-08 20:32 ` Shaohua Li
2017-08-08 20:36 ` Jens Axboe
2017-08-10 2:18 ` Jens Axboe
2017-08-10 11:38 ` Goldwyn Rodrigues
2017-08-10 14:14 ` Jens Axboe
2017-08-10 17:15 ` Goldwyn Rodrigues
2017-08-10 17:17 ` Jens Axboe
2017-08-09 11:44 ` Goldwyn Rodrigues
2017-08-09 15:02 ` Shaohua Li
2017-08-09 15:35 ` Goldwyn Rodrigues
2017-08-09 20:21 ` Shaohua Li
2017-08-09 22:16 ` Goldwyn Rodrigues
2017-08-10 1:17 ` Shaohua Li
2017-08-10 2:07 ` Goldwyn Rodrigues
2017-08-10 2:17 ` Jens Axboe
2017-08-10 11:49 ` Goldwyn Rodrigues
2017-08-10 14:23 ` Jens Axboe
2017-08-10 14:25 ` Jan Kara
2017-08-10 14:28 ` Jens Axboe [this message]
2017-08-10 17:15 ` Goldwyn Rodrigues
2017-08-10 17:20 ` Jens Axboe
2017-07-26 23:57 ` [PATCH 2/9] md: Add nowait support to md Goldwyn Rodrigues
2017-08-08 20:34 ` Shaohua Li
2017-07-26 23:58 ` [PATCH 3/9] md: raid1 nowait support Goldwyn Rodrigues
2017-08-08 20:39 ` Shaohua Li
2017-08-09 11:45 ` Goldwyn Rodrigues
2017-07-26 23:58 ` [PATCH 4/9] md: raid5 " Goldwyn Rodrigues
2017-08-08 20:43 ` Shaohua Li
2017-08-09 11:45 ` Goldwyn Rodrigues
2017-07-26 23:58 ` [PATCH 5/9] md: raid10 " Goldwyn Rodrigues
2017-08-08 20:40 ` Shaohua Li
2017-07-26 23:58 ` [PATCH 6/9] dm: add " Goldwyn Rodrigues
2017-07-26 23:58 ` [PATCH 7/9] dm: Add nowait support to raid1 Goldwyn Rodrigues
2017-07-26 23:58 ` [PATCH 8/9] dm: Add nowait support to dm-delay Goldwyn Rodrigues
2017-07-26 23:58 ` [PATCH 9/9] dm-mpath: Add nowait support Goldwyn Rodrigues
-- strict thread matches above, loose matches on Subject: below --
2017-10-04 13:55 [PATCH v2 0/9] Nowait support for stacked block devices Goldwyn Rodrigues
2017-10-04 13:55 ` [PATCH 1/9] QUEUE_FLAG_NOWAIT to indicate device supports nowait Goldwyn Rodrigues
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=e0c72d10-388e-efac-242d-1b3542490085@kernel.dk \
--to=axboe@kernel.dk \
--cc=dm-devel@redhat.com \
--cc=hch@infradead.org \
--cc=jack@suse.cz \
--cc=linux-block@vger.kernel.org \
--cc=linux-raid@vger.kernel.org \
--cc=rgoldwyn@suse.de \
--cc=shli@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox