public inbox for linux-block@vger.kernel.org
From: Shaohua Li <shli@kernel.org>
To: Goldwyn Rodrigues <rgoldwyn@suse.de>
Cc: linux-block@vger.kernel.org, hch@infradead.org, jack@suse.cz,
	linux-raid@vger.kernel.org, dm-devel@redhat.com
Subject: Re: [PATCH 1/9] QUEUE_FLAG_NOWAIT to indicate device supports nowait
Date: Wed, 9 Aug 2017 18:17:42 -0700
Message-ID: <20170810011742.s45ugh55jslvkguu@kernel.org>
In-Reply-To: <b3017adc-619d-06f7-ffb5-5ba14f01ffc2@suse.de>

On Wed, Aug 09, 2017 at 05:16:23PM -0500, Goldwyn Rodrigues wrote:
> 
> 
> On 08/09/2017 03:21 PM, Shaohua Li wrote:
> > On Wed, Aug 09, 2017 at 10:35:39AM -0500, Goldwyn Rodrigues wrote:
> >>
> >>
> >> On 08/09/2017 10:02 AM, Shaohua Li wrote:
> >>> On Wed, Aug 09, 2017 at 06:44:55AM -0500, Goldwyn Rodrigues wrote:
> >>>>
> >>>>
> >>>> On 08/08/2017 03:32 PM, Shaohua Li wrote:
> >>>>> On Wed, Jul 26, 2017 at 06:57:58PM -0500, Goldwyn Rodrigues wrote:
> >>>>>> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
> >>>>>>
> >>>>>> Nowait is a feature of direct AIO, where users can request
> >>>>>> to return immediately if the I/O is going to block. This translates
> >>>>>> to REQ_NOWAIT in bio.bi_opf flags. While request based devices
> >>>>>> don't wait, stacked devices such as md/dm will.
> >>>>>>
> >>>>>> In order to explicitly mark stacked devices as supported, we
> >>>>>> set the QUEUE_FLAG_NOWAIT in the queue_flags and return -EAGAIN
> >>>>>> whenever the device would block.
> >>>>>
> >>>>> probably you should route this patch to Jens first, DM/MD are different trees.
> >>>>
> >>>> Yes, I have sent it to linux-block as well, and he has commented as well.
> >>>>
> >>>>
> >>>>>  
> >>>>>> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
> >>>>>> ---
> >>>>>>  block/blk-core.c       | 3 ++-
> >>>>>>  include/linux/blkdev.h | 2 ++
> >>>>>>  2 files changed, 4 insertions(+), 1 deletion(-)
> >>>>>>
> >>>>>> diff --git a/block/blk-core.c b/block/blk-core.c
> >>>>>> index 970b9c9638c5..1c9a981d88e5 100644
> >>>>>> --- a/block/blk-core.c
> >>>>>> +++ b/block/blk-core.c
> >>>>>> @@ -2025,7 +2025,8 @@ generic_make_request_checks(struct bio *bio)
> >>>>>>  	 * if queue is not a request based queue.
> >>>>>>  	 */
> >>>>>>  
> >>>>>> -	if ((bio->bi_opf & REQ_NOWAIT) && !queue_is_rq_based(q))
> >>>>>> +	if ((bio->bi_opf & REQ_NOWAIT) && !queue_is_rq_based(q) &&
> >>>>>> +	    !blk_queue_supports_nowait(q))
> >>>>>>  		goto not_supported;
> >>>>>>  
> >>>>>>  	part = bio->bi_bdev->bd_part;
> >>>>>> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> >>>>>> index 25f6a0cb27d3..fae021ebec1b 100644
> >>>>>> --- a/include/linux/blkdev.h
> >>>>>> +++ b/include/linux/blkdev.h
> >>>>>> @@ -633,6 +633,7 @@ struct request_queue {
> >>>>>>  #define QUEUE_FLAG_REGISTERED  29	/* queue has been registered to a disk */
> >>>>>>  #define QUEUE_FLAG_SCSI_PASSTHROUGH 30	/* queue supports SCSI commands */
> >>>>>>  #define QUEUE_FLAG_QUIESCED    31	/* queue has been quiesced */
> >>>>>> +#define QUEUE_FLAG_NOWAIT      32	/* stack device driver supports REQ_NOWAIT */
> >>>>>>  
> >>>>>>  #define QUEUE_FLAG_DEFAULT	((1 << QUEUE_FLAG_IO_STAT) |		\
> >>>>>>  				 (1 << QUEUE_FLAG_STACKABLE)	|	\
> >>>>>> @@ -732,6 +733,7 @@ static inline void queue_flag_clear(unsigned int flag, struct request_queue *q)
> >>>>>>  #define blk_queue_dax(q)	test_bit(QUEUE_FLAG_DAX, &(q)->queue_flags)
> >>>>>>  #define blk_queue_scsi_passthrough(q)	\
> >>>>>>  	test_bit(QUEUE_FLAG_SCSI_PASSTHROUGH, &(q)->queue_flags)
> >>>>>> +#define blk_queue_supports_nowait(q)	test_bit(QUEUE_FLAG_NOWAIT, &(q)->queue_flags)
> >>>>>
> >>>>> Should this bit consider under layer disks? For example, one raid array disk
> >>>>> doesn't support NOWAIT, shouldn't we disable NOWAIT for the array?
> >>>>
> >>>> Yes, it should. I will add a check before setting the flag. Thanks.
> >>>> Request-based devices don't wait. So, they would not have this flag set.
> >>>> It is only the bio-based devices, with the make_request_fn hook, which need this.
> >>>>
> >>>>>  
> >>>>> I have another generic question. If a bio is split into 2 bios, one bio
> >>>>> doesn't need to wait but the other needs to wait. We will return -EAGAIN for the
> >>>>> second bio, so the whole bio will return -EAGAIN, but the first bio is already
> >>>>> dispatched to disk. Is this correct behavior?
> >>>>>
> >>>>
> >>>> No, from a multi-device point of view, this is inconsistent. I have
> >>>> tried to have the request bio return -EAGAIN before the split, but I shall
> >>>> check again. Where do you see this happening?
> >>>
> >>> No, this isn't multi-device specific, any driver can do it. Please see blk_queue_split.
> >>>
> >>
> >> In that case, the bio end_io function is chained and the bio of the
> >> split will replicate the error to the parent (if not already set).
> > 
> > This doesn't answer my question. If a bio returns -EAGAIN, part of the bio
> > has probably already been dispatched to disk (if the bio is split into 2 bios, one
> > returns -EAGAIN while the other doesn't block and is dispatched to disk). What is
> > the application going to do? I think this is different from other IO errors. For
> > other IO errors, the application will handle the error, while here we ask the app to
> > retry the whole bio and the app doesn't know part of the bio is already written to disk.
> 
> It is the same as for other I/O errors as well, such as EIO. You do not
> know which bio of all submitted bios returned the error EIO. The
> application would and should consider the whole I/O as failed.
> 
> The user application does not know of bios, or how it is going to be
> split in the underlying layers. It knows at the system call level. In
> this case, the EAGAIN will be returned to the user for the whole I/O not
> as a part of the I/O. It is up to application to try the I/O again with
> or without RWF_NOWAIT set. In direct I/O, it is bubbled out using
> dio->io_error. You can read about it at the patch header for the initial
> patchset at [1].
> 
> Use case: It is for applications having two threads, a compute thread
> and an I/O thread. It would try to push AIO as much as possible in the
> compute thread using RWF_NOWAIT, and if it fails, would pass it on to
> I/O thread which would perform without RWF_NOWAIT. End result if done
> right is you save on context switches and all the
> synchronization/messaging machinery to perform I/O.
> 
> [1] http://marc.info/?l=linux-block&m=149789003305876&w=2
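
For reference, the two-thread use case quoted above can be sketched as a
small userspace model. This is only an illustration under stated
assumptions: toy_req, defer_queue, toy_submit, compute_submit and
io_thread_drain are hypothetical names invented here, the would_block
field stands in for whatever makes a real submission wait, and no locking
between the two threads is shown. It is not the AIO interface itself.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define DEFER_MAX 16

/* Hypothetical request type for this sketch; not an AIO or kernel structure. */
struct toy_req {
	int id;
	bool would_block;	/* simulated: the device would have to wait */
};

/* Hand-off queue from the compute thread to the I/O thread (no locking shown). */
struct defer_queue {
	struct toy_req items[DEFER_MAX];
	size_t count;
};

/* Stand-in for a submission: with nowait set it fails rather than blocks. */
static int toy_submit(const struct toy_req *req, bool nowait)
{
	if (nowait && req->would_block)
		return -EAGAIN;
	return 0;	/* submitted, possibly after waiting when nowait is false */
}

/* Compute thread: 0 if submitted inline, 1 if deferred, negative errno otherwise. */
static int compute_submit(struct defer_queue *q, const struct toy_req *req)
{
	int err = toy_submit(req, true);

	if (err != -EAGAIN)
		return err;
	assert(q->count < DEFER_MAX);	/* sketch only: no overflow handling */
	q->items[q->count++] = *req;	/* hand the whole request to the I/O thread */
	return 1;
}

/* I/O thread: retry everything deferred, this time allowing it to block. */
static size_t io_thread_drain(struct defer_queue *q)
{
	size_t done = 0;

	for (size_t i = 0; i < q->count; i++)
		if (toy_submit(&q->items[i], false) == 0)
			done++;
	q->count = 0;
	return done;
}
```

Note that the deferred path resubmits the whole request, which matches the
statement above that -EAGAIN applies to the whole I/O at the system call
level.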

Yes, I understood the concept, but I didn't see the previous patches mention
that -EAGAIN should actually be treated as a real IO error. This matters a lot
to applications and makes the API hard to use. I'm wondering if we should
disable bio splitting for NOWAIT bios, so that -EAGAIN only means 'try again'.
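
To make the concern concrete, here is a toy userspace model of the two
behaviors. It is a sketch under assumptions, not kernel code: toy_dev,
toy_bio and the toy_submit_* helpers are hypothetical, the "busy" range is
a stand-in for whatever would make a piece wait, and rejecting an oversized
NOWAIT bio up front is only one possible reading of "disable bio split".

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy model only: hypothetical types, not the kernel's struct bio or queue. */
struct toy_dev {
	unsigned int max_sectors;	/* largest piece accepted without a split */
	unsigned int busy_from;		/* a piece reaching past this sector would block */
	unsigned int dispatched;	/* sectors already sent to the "disk" */
};

struct toy_bio {
	unsigned int sector;
	unsigned int nr_sectors;
	bool nowait;
};

/* Submit one piece; with nowait, fail instead of waiting. */
static int toy_submit_piece(struct toy_dev *dev, unsigned int sector,
			    unsigned int nr, bool nowait)
{
	if (nowait && sector + nr > dev->busy_from)
		return -EAGAIN;
	dev->dispatched += nr;
	return 0;
}

/* Split at max_sectors and chain errors: the first error wins for the whole bio. */
static int toy_submit_split(struct toy_dev *dev, const struct toy_bio *bio)
{
	int status = 0;
	unsigned int done = 0;

	assert(dev->max_sectors > 0);	/* otherwise the loop cannot make progress */
	while (done < bio->nr_sectors) {
		unsigned int nr = bio->nr_sectors - done;

		if (nr > dev->max_sectors)
			nr = dev->max_sectors;
		int err = toy_submit_piece(dev, bio->sector + done, nr, bio->nowait);
		if (err && !status)
			status = err;
		done += nr;
	}
	return status;
}

/* Assumed alternative: never split a nowait bio, reject it before any dispatch. */
static int toy_submit_nosplit(struct toy_dev *dev, const struct toy_bio *bio)
{
	if (bio->nowait && bio->nr_sectors > dev->max_sectors)
		return -EAGAIN;
	return toy_submit_split(dev, bio);
}
```

With max_sectors = 8 and busy_from = 8, a 16-sector NOWAIT bio at sector 0
returns -EAGAIN from toy_submit_split() with 8 sectors already dispatched,
while toy_submit_nosplit() returns -EAGAIN with nothing dispatched.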

Thanks,
Shaohua
