public inbox for linux-block@vger.kernel.org
 help / color / mirror / Atom feed
From: Jens Axboe <axboe@kernel.dk>
To: Goldwyn Rodrigues <rgoldwyn@suse.de>, Jan Kara <jack@suse.cz>
Cc: Shaohua Li <shli@kernel.org>,
	linux-block@vger.kernel.org, hch@infradead.org,
	linux-raid@vger.kernel.org, dm-devel@redhat.com
Subject: Re: [PATCH 1/9] QUEUE_FLAG_NOWAIT to indicate device supports nowait
Date: Thu, 10 Aug 2017 11:20:02 -0600	[thread overview]
Message-ID: <c6b91e66-dd5b-5bbb-7b19-9ec478f1ab94@kernel.dk> (raw)
In-Reply-To: <05106658-7bc4-4054-fbda-3b33ea32e215@suse.de>

On 08/10/2017 11:15 AM, Goldwyn Rodrigues wrote:
> 
> 
> On 08/10/2017 09:28 AM, Jens Axboe wrote:
>> On 08/10/2017 08:25 AM, Jan Kara wrote:
>>> On Thu 10-08-17 06:49:53, Goldwyn Rodrigues wrote:
>>>> On 08/09/2017 09:17 PM, Jens Axboe wrote:
>>>>> On 08/09/2017 08:07 PM, Goldwyn Rodrigues wrote:
>>>>>>>>>>>> No, from a multi-device point of view, this is inconsistent. I
>>>>>>>>>>>> have tried the request bio returns -EAGAIN before the split, but
>>>>>>>>>>>> I shall check again. Where do you see this happening?
>>>>>>>>>>>
>>>>>>>>>>> No, this isn't multi-device specific, any driver can do it.
>>>>>>>>>>> Please see blk_queue_split.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> In that case, the bio end_io function is chained and the bio of
>>>>>>>>>> the split will replicate the error to the parent (if not already
>>>>>>>>>> set).
>>>>>>>>>
>>>>>>>>> this doesn't answer my question. So if a bio returns -EAGAIN, part
>>>>>>>>> of the bio probably already dispatched to disk (if the bio is
>>>>>>>>> splitted to 2 bios, one returns -EAGAIN, the other one doesn't
>>>>>>>>> block and dispatch to disk), what will application be going to do?
>>>>>>>>> I think this is different to other IO errors. FOr other IO errors,
>>>>>>>>> application will handle the error, while we ask app to retry the
>>>>>>>>> whole bio here and app doesn't know part of bio is already written
>>>>>>>>> to disk.
>>>>>>>>
>>>>>>>> It is the same as for other I/O errors as well, such as EIO. You do
>>>>>>>> not know which bio of all submitted bio's returned the error EIO.
>>>>>>>> The application would and should consider the whole I/O as failed.
>>>>>>>>
>>>>>>>> The user application does not know of bios, or how it is going to be
>>>>>>>> split in the underlying layers. It knows at the system call level.
>>>>>>>> In this case, the EAGAIN will be returned to the user for the whole
>>>>>>>> I/O not as a part of the I/O. It is up to application to try the I/O
>>>>>>>> again with or without RWF_NOWAIT set. In direct I/O, it is bubbled
>>>>>>>> out using dio->io_error. You can read about it at the patch header
>>>>>>>> for the initial patchset at [1].
>>>>>>>>
>>>>>>>> Use case: It is for applications having two threads, a compute
>>>>>>>> thread and an I/O thread. It would try to push AIO as much as
>>>>>>>> possible in the compute thread using RWF_NOWAIT, and if it fails,
>>>>>>>> would pass it on to I/O thread which would perform without
>>>>>>>> RWF_NOWAIT. End result if done right is you save on context switches
>>>>>>>> and all the synchronization/messaging machinery to perform I/O.
>>>>>>>>
>>>>>>>> [1] http://marc.info/?l=linux-block&m=149789003305876&w=2
>>>>>>>
>>>>>>> Yes, I knew the concept, but I didn't see previous patches mentioned
>>>>>>> the -EAGAIN actually should be taken as a real IO error. This means a
>>>>>>> lot to applications and make the API hard to use. I'm wondering if we
>>>>>>> should disable bio split for NOWAIT bio, which will make the -EAGAIN
>>>>>>> only mean 'try again'.
>>>>>>
>>>>>> Don't take it as EAGAIN, but read it as EWOULDBLOCK. Why do you say
>>>>>> the API is hard to use? Do you have a case to back it up?
>>>>>
>>>>> Because it is hard to use, and potentially suboptimal. Let's say you're
>>>>> doing a 1MB write, we hit EWOULDBLOCK for the last split. Do we return a
>>>>> short write, or do we return EWOULDBLOCK? If the latter, then that
>>>>> really sucks from an API point of view.
>>>>>
>>>>>> No, not splitting the bio does not make sense here. I do not see any
>>>>>> advantage in it, unless you can present a case otherwise.
>>>>>
>>>>> It ties back into the "hard to use" that I do agree with IFF we don't
>>>>> return the short write. It's hard for an application to use that
>>>>> efficiently, if we write 1MB-128K but get EWOULDBLOCK, the re-write the
>>>>> full 1MB from a different context.
>>>>>
>>>>
>>>> It returns the error code only and not short reads/writes. But isn't
>>>> that true for all system calls in case of error?
>>>>
>>>> For aio, there are two result fields in io_event out of which one could
>>>> be used for error while the other be used for amount of writes/reads
>>>> performed. However, only one is used. This will not work with
>>>> pread()/pwrite() calls though because of the limitation of return values.
>>>>
>>>> Finally, what if the EWOULDBLOCK is returned for an earlier bio (say
>>>> offset 128k) for a 1MB pwrite(), while the rest of the 7 128K are
>>>> successful. What short return value should the system call return?
>>>
>>> This is indeed tricky. If an application submits 1MB write, I don't think
>>> we can afford to just write arbitrary subset of it. That just IMHO too much
>>> violates how writes traditionally behaved. Even short writes trigger bugs
>>> in various applications but I'm willing to require that applications using
>>> NOWAIT IO can handle these. However writing arbitrary subset looks like a
>>> nasty catch. IMHO we should not submit further bios until we are sure
>>> current one does not return EWOULDBLOCK when splitting a larger one...
>>
>> Exactly, that's the point that both Shaohua and I was getting at. Short
>> writes should be fine, especially if NOWAIT is set. Discontig writes
>> should also be OK, but it's horrible and inefficient. If we do that,
>> then using this feature is a net-loss, not a win by any stretch.
>>
> 
> To make sure I understand this, we disable bio splits for NOWAIT bio so
> we return EWOULDBLOCK for the entire I/O.

That's also not great, since splits is a common operation, and the majority
of splits can proceed without hitting out-of-resources. So ideally we'd
handle that case, but in a saner fashion than the laissez faire approach
that the current patchset takes.

-- 
Jens Axboe

  reply	other threads:[~2017-08-10 17:20 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-07-26 23:57 [PATCH 0/9] Nowait feature for stacked block devices Goldwyn Rodrigues
2017-07-26 23:57 ` [PATCH 1/9] QUEUE_FLAG_NOWAIT to indicate device supports nowait Goldwyn Rodrigues
2017-08-08 20:32   ` Shaohua Li
2017-08-08 20:36     ` Jens Axboe
2017-08-10  2:18       ` Jens Axboe
2017-08-10 11:38         ` Goldwyn Rodrigues
2017-08-10 14:14           ` Jens Axboe
2017-08-10 17:15             ` Goldwyn Rodrigues
2017-08-10 17:17               ` Jens Axboe
2017-08-09 11:44     ` Goldwyn Rodrigues
2017-08-09 15:02       ` Shaohua Li
2017-08-09 15:35         ` Goldwyn Rodrigues
2017-08-09 20:21           ` Shaohua Li
2017-08-09 22:16             ` Goldwyn Rodrigues
2017-08-10  1:17               ` Shaohua Li
2017-08-10  2:07                 ` Goldwyn Rodrigues
2017-08-10  2:17                   ` Jens Axboe
2017-08-10 11:49                     ` Goldwyn Rodrigues
2017-08-10 14:23                       ` Jens Axboe
2017-08-10 14:25                       ` Jan Kara
2017-08-10 14:28                         ` Jens Axboe
2017-08-10 17:15                           ` Goldwyn Rodrigues
2017-08-10 17:20                             ` Jens Axboe [this message]
2017-07-26 23:57 ` [PATCH 2/9] md: Add nowait support to md Goldwyn Rodrigues
2017-08-08 20:34   ` Shaohua Li
2017-07-26 23:58 ` [PATCH 3/9] md: raid1 nowait support Goldwyn Rodrigues
2017-08-08 20:39   ` Shaohua Li
2017-08-09 11:45     ` Goldwyn Rodrigues
2017-07-26 23:58 ` [PATCH 4/9] md: raid5 " Goldwyn Rodrigues
2017-08-08 20:43   ` Shaohua Li
2017-08-09 11:45     ` Goldwyn Rodrigues
2017-07-26 23:58 ` [PATCH 5/9] md: raid10 " Goldwyn Rodrigues
2017-08-08 20:40   ` Shaohua Li
2017-07-26 23:58 ` [PATCH 6/9] dm: add " Goldwyn Rodrigues
2017-07-26 23:58 ` [PATCH 7/9] dm: Add nowait support to raid1 Goldwyn Rodrigues
2017-07-26 23:58 ` [PATCH 8/9] dm: Add nowait support to dm-delay Goldwyn Rodrigues
2017-07-26 23:58 ` [PATCH 9/9] dm-mpath: Add nowait support Goldwyn Rodrigues
  -- strict thread matches above, loose matches on Subject: below --
2017-10-04 13:55 [PATCH v2 0/9] Nowait support for stacked block devices Goldwyn Rodrigues
2017-10-04 13:55 ` [PATCH 1/9] QUEUE_FLAG_NOWAIT to indicate device supports nowait Goldwyn Rodrigues

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c6b91e66-dd5b-5bbb-7b19-9ec478f1ab94@kernel.dk \
    --to=axboe@kernel.dk \
    --cc=dm-devel@redhat.com \
    --cc=hch@infradead.org \
    --cc=jack@suse.cz \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-raid@vger.kernel.org \
    --cc=rgoldwyn@suse.de \
    --cc=shli@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox