* [PATCH 1/9] QUEUE_FLAG_NOWAIT to indicate device supports nowait
From: Goldwyn Rodrigues @ 2017-10-04 13:55 UTC
To: linux-block; +Cc: axboe, shli, Goldwyn Rodrigues
From: Goldwyn Rodrigues <rgoldwyn@suse.com>
Nowait is a feature of direct AIO, where a user can request that
the call return immediately if the I/O is going to block. This
translates to the REQ_NOWAIT flag in bio.bi_opf. While request-based
devices don't wait, stacked devices such as md/dm will.
To explicitly mark a stacked device as supported, we set
QUEUE_FLAG_NOWAIT in queue_flags and return -EAGAIN whenever the
device would block.
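For illustration, the decision this patch introduces can be sketched in userspace as follows. The struct and its fields are simplified stand-ins, not the real kernel definitions from include/linux/blkdev.h: a request-based queue always qualifies, while a bio-based stacked queue qualifies only if it has opted in via QUEUE_FLAG_NOWAIT.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model; the real definitions live in include/linux/blkdev.h. */
#define QUEUE_FLAG_NOWAIT 29

struct mock_queue {
	unsigned long queue_flags;
	void *request_fn;   /* non-NULL for legacy request-based queues */
	void *mq_ops;       /* non-NULL for blk-mq queues */
};

/* Mirrors queue_is_rq_based(): request-based queues never block on
 * submission, so they implicitly support REQ_NOWAIT. */
static bool queue_is_rq_based(const struct mock_queue *q)
{
	return q->request_fn != NULL || q->mq_ops != NULL;
}

/* Mirrors the blk_queue_supports_nowait() helper added by this patch:
 * a bio-based (stacked) queue qualifies only if it opted in by setting
 * QUEUE_FLAG_NOWAIT. */
static bool blk_queue_supports_nowait(const struct mock_queue *q)
{
	return queue_is_rq_based(q) ||
	       (q->queue_flags & (1UL << QUEUE_FLAG_NOWAIT));
}
```

A plain bio-based queue that never sets the flag fails the check, which is exactly what makes generic_make_request_checks() reject its REQ_NOWAIT bios.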
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
block/blk-core.c | 4 ++--
include/linux/blkdev.h | 6 ++++++
2 files changed, 8 insertions(+), 2 deletions(-)
diff --git a/block/blk-core.c b/block/blk-core.c
index 048be4aa6024..8de633f8c633 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2044,10 +2044,10 @@ generic_make_request_checks(struct bio *bio)
/*
* For a REQ_NOWAIT based request, return -EOPNOTSUPP
- * if queue is not a request based queue.
+ * if queue cannot handle nowait bios
*/
- if ((bio->bi_opf & REQ_NOWAIT) && !queue_is_rq_based(q))
+ if ((bio->bi_opf & REQ_NOWAIT) && !blk_queue_supports_nowait(q))
goto not_supported;
if (should_fail_request(&bio->bi_disk->part0, bio->bi_iter.bi_size))
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 02fa42d24b52..1d0da2a9cf46 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -631,6 +631,7 @@ struct request_queue {
#define QUEUE_FLAG_REGISTERED 26 /* queue has been registered to a disk */
#define QUEUE_FLAG_SCSI_PASSTHROUGH 27 /* queue supports SCSI commands */
#define QUEUE_FLAG_QUIESCED 28 /* queue has been quiesced */
+#define QUEUE_FLAG_NOWAIT 29 /* stack device driver supports REQ_NOWAIT */
#define QUEUE_FLAG_DEFAULT ((1 << QUEUE_FLAG_IO_STAT) | \
(1 << QUEUE_FLAG_STACKABLE) | \
@@ -759,6 +760,11 @@ static inline bool queue_is_rq_based(struct request_queue *q)
return q->request_fn || q->mq_ops;
}
+static inline bool blk_queue_supports_nowait(struct request_queue *q)
+{
+ return queue_is_rq_based(q) || test_bit(QUEUE_FLAG_NOWAIT, &q->queue_flags);
+}
+
static inline unsigned int blk_queue_cluster(struct request_queue *q)
{
return q->limits.cluster;
--
2.14.2
* [PATCH 2/9] md: Add nowait support to md
From: Goldwyn Rodrigues @ 2017-10-04 13:55 UTC
To: linux-block; +Cc: axboe, shli, Goldwyn Rodrigues
From: Goldwyn Rodrigues <rgoldwyn@suse.com>
Set QUEUE_FLAG_NOWAIT in the queue flags to indicate that REQ_NOWAIT
will be handled. If any of the underlying devices does not support
the nowait feature, we do not set the flag.
If the device is suspended, the bio is failed with -EWOULDBLOCK.
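The suspension check the patch adds is an interval-overlap test. Here is a minimal userspace sketch of it; the struct names and fields are simplified stand-ins for mddev/bio, not the real kernel types:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t sector_t;

/* Simplified stand-ins for the mddev/bio fields the check reads. */
struct mock_mddev {
	int suspended;
	sector_t suspend_lo, suspend_hi;
};

struct mock_bio {
	int is_read;
	sector_t bi_sector;        /* first sector of the I/O */
	sector_t bi_size_sectors;  /* length in sectors */
};

static sector_t bio_end_sector(const struct mock_bio *bio)
{
	return bio->bi_sector + bio->bi_size_sectors;
}

/* Mirrors the is_suspended() helper from this patch: reads always pass;
 * writes are blocked while the whole array is suspended, or when the
 * bio overlaps the half-open [suspend_lo, suspend_hi) window. */
static bool is_suspended(const struct mock_mddev *mddev,
			 const struct mock_bio *bio)
{
	if (bio->is_read)
		return false;
	if (mddev->suspended)
		return true;
	return mddev->suspend_lo < bio_end_sector(bio) &&
	       mddev->suspend_hi > bio->bi_sector;
}
```

When this returns true for a REQ_NOWAIT bio, md_make_request() fails it with bio_wouldblock_error() instead of waiting for the suspension to lift.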
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
drivers/md/md.c | 35 +++++++++++++++++++++++++++++++++++
1 file changed, 35 insertions(+)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 0ff1bbf6c90e..7325f8be36b4 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -237,6 +237,19 @@ EXPORT_SYMBOL_GPL(md_new_event);
static LIST_HEAD(all_mddevs);
static DEFINE_SPINLOCK(all_mddevs_lock);
+static bool is_suspended(struct mddev *mddev, struct bio *bio)
+{
+ /* We can serve READ requests when device is suspended */
+ if (bio_data_dir(bio) == READ)
+ return false;
+
+ if (mddev->suspended)
+ return true;
+
+ return (mddev->suspend_lo < bio_end_sector(bio) &&
+ mddev->suspend_hi > bio->bi_iter.bi_sector);
+}
+
/*
* iterates through all used mddevs in the system.
* We take care to grab the all_mddevs_lock whenever navigating
@@ -316,6 +329,11 @@ static blk_qc_t md_make_request(struct request_queue *q, struct bio *bio)
bio_endio(bio);
return BLK_QC_T_NONE;
}
+ /* Bail out if we would have to wait for suspended device */
+ if ((bio->bi_opf & REQ_NOWAIT) && is_suspended(mddev, bio)) {
+ bio_wouldblock_error(bio);
+ return BLK_QC_T_NONE;
+ }
/*
* save the sectors now since our bio can
@@ -5404,6 +5422,7 @@ int md_run(struct mddev *mddev)
int err;
struct md_rdev *rdev;
struct md_personality *pers;
+ bool nowait = true;
if (list_empty(&mddev->disks))
/* cannot run an array with no devices.. */
@@ -5470,8 +5489,15 @@ int md_run(struct mddev *mddev)
}
}
sysfs_notify_dirent_safe(rdev->sysfs_state);
+ if (!blk_queue_supports_nowait(rdev->bdev->bd_queue))
+ nowait = false;
}
+ /* Set the NOWAIT flags if all underlying devices support it */
+ if (nowait)
+ queue_flag_set_unlocked(QUEUE_FLAG_NOWAIT, mddev->queue);
+
+
if (mddev->bio_set == NULL) {
mddev->bio_set = bioset_create(BIO_POOL_SIZE, 0, BIOSET_NEED_BVECS);
if (!mddev->bio_set)
@@ -6554,6 +6580,15 @@ static int hot_add_disk(struct mddev *mddev, dev_t dev)
set_bit(MD_SB_CHANGE_DEVS, &mddev->sb_flags);
if (!mddev->thread)
md_update_sb(mddev, 1);
+
+ /* If the new disk does not support REQ_NOWAIT, disable it on the whole MD */
+ if (!blk_queue_supports_nowait(rdev->bdev->bd_queue)) {
+ pr_info("%s: Disabling nowait because %s does not support nowait\n",
+ mdname(mddev), bdevname(rdev->bdev,b));
+ queue_flag_clear_unlocked(QUEUE_FLAG_NOWAIT, mddev->queue);
+ }
+
/*
* Kick recovery, maybe this spare has to be added to the
* array immediately.
--
2.14.2
* [PATCH 3/9] md: raid1 nowait support
From: Goldwyn Rodrigues @ 2017-10-04 13:55 UTC
To: linux-block; +Cc: axboe, shli, Goldwyn Rodrigues
From: Goldwyn Rodrigues <rgoldwyn@suse.com>
The RAID1 driver will bail with EAGAIN in case:
+ I/O has to wait for a barrier
+ the array is frozen
+ the area is suspended
+ there are too many pending I/Os, so the request would be queued
To propagate the error for wait barriers, wait_barrier() now
returns bool: true if the wait completed (or no wait was
required), false if a wait was required but was not performed.
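The contract described above can be sketched in userspace like this. The struct and counter handling are simplified stand-ins for conf->nr_pending[idx]/nr_waiting[idx] and carry none of the real locking or per-bucket indexing:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the _wait_barrier() nowait contract: the caller bumps a
 * pending count optimistically; if a barrier is up and nowait was
 * requested, it backs out and reports false instead of sleeping. */
struct mock_conf {
	int barrier;      /* resync barrier raised */
	int nr_pending;
	int nr_waiting;
};

static bool mock_wait_barrier(struct mock_conf *conf, bool nowait)
{
	conf->nr_pending++;
	if (!conf->barrier)
		return true;            /* fast path: no barrier, no wait */

	conf->nr_pending--;
	conf->nr_waiting++;
	if (nowait) {
		conf->nr_waiting--;
		return false;           /* caller fails the bio with -EAGAIN */
	}
	/* The real code sleeps here via wait_event_lock_irq() until the
	 * barrier drops; model that as the barrier having dropped. */
	conf->barrier = 0;
	conf->nr_waiting--;
	conf->nr_pending++;
	return true;
}
```

The important invariant is that a false return leaves both counters untouched, so the caller can simply end the bio with bio_wouldblock_error() and return.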
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
drivers/md/raid1.c | 88 +++++++++++++++++++++++++++++++++++++++++-------------
1 file changed, 68 insertions(+), 20 deletions(-)
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index f3f3e40dc9d8..37ec283e67af 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -891,8 +891,9 @@ static void lower_barrier(struct r1conf *conf, sector_t sector_nr)
wake_up(&conf->wait_barrier);
}
-static void _wait_barrier(struct r1conf *conf, int idx)
+static bool _wait_barrier(struct r1conf *conf, int idx, bool nowait)
{
+ bool ret = true;
/*
* We need to increase conf->nr_pending[idx] very early here,
* then raise_barrier() can be blocked when it waits for
@@ -923,7 +924,7 @@ static void _wait_barrier(struct r1conf *conf, int idx)
*/
if (!READ_ONCE(conf->array_frozen) &&
!atomic_read(&conf->barrier[idx]))
- return;
+ return ret;
/*
* After holding conf->resync_lock, conf->nr_pending[idx]
@@ -941,18 +942,29 @@ static void _wait_barrier(struct r1conf *conf, int idx)
*/
wake_up(&conf->wait_barrier);
/* Wait for the barrier in same barrier unit bucket to drop. */
- wait_event_lock_irq(conf->wait_barrier,
- !conf->array_frozen &&
- !atomic_read(&conf->barrier[idx]),
- conf->resync_lock);
+ if (conf->array_frozen || atomic_read(&conf->barrier[idx])) {
+ if (nowait) {
+ ret = false;
+ goto dec_waiting;
+ } else {
+ wait_event_lock_irq(conf->wait_barrier,
+ !conf->array_frozen &&
+ !atomic_read(&conf->barrier[idx]),
+ conf->resync_lock);
+ }
+ }
atomic_inc(&conf->nr_pending[idx]);
+dec_waiting:
atomic_dec(&conf->nr_waiting[idx]);
spin_unlock_irq(&conf->resync_lock);
+ return ret;
}
-static void wait_read_barrier(struct r1conf *conf, sector_t sector_nr)
+static bool wait_read_barrier(struct r1conf *conf, sector_t sector_nr,
+ bool nowait)
{
int idx = sector_to_idx(sector_nr);
+ bool ret = true;
/*
* Very similar to _wait_barrier(). The difference is, for read
@@ -964,7 +976,7 @@ static void wait_read_barrier(struct r1conf *conf, sector_t sector_nr)
atomic_inc(&conf->nr_pending[idx]);
if (!READ_ONCE(conf->array_frozen))
- return;
+ return ret;
spin_lock_irq(&conf->resync_lock);
atomic_inc(&conf->nr_waiting[idx]);
@@ -975,19 +987,31 @@ static void wait_read_barrier(struct r1conf *conf, sector_t sector_nr)
*/
wake_up(&conf->wait_barrier);
/* Wait for array to be unfrozen */
- wait_event_lock_irq(conf->wait_barrier,
- !conf->array_frozen,
- conf->resync_lock);
+ if (conf->array_frozen) {
+ /* If nowait flag is set, return false to
+ * show we did not wait
+ */
+ if (nowait) {
+ ret = false;
+ goto dec_waiting;
+ } else {
+ wait_event_lock_irq(conf->wait_barrier,
+ !conf->array_frozen,
+ conf->resync_lock);
+ }
+ }
atomic_inc(&conf->nr_pending[idx]);
+dec_waiting:
atomic_dec(&conf->nr_waiting[idx]);
spin_unlock_irq(&conf->resync_lock);
+ return ret;
}
-static void wait_barrier(struct r1conf *conf, sector_t sector_nr)
+static bool wait_barrier(struct r1conf *conf, sector_t sector_nr, bool nowait)
{
int idx = sector_to_idx(sector_nr);
- _wait_barrier(conf, idx);
+ return _wait_barrier(conf, idx, nowait);
}
static void wait_all_barriers(struct r1conf *conf)
@@ -995,7 +1019,7 @@ static void wait_all_barriers(struct r1conf *conf)
int idx;
for (idx = 0; idx < BARRIER_BUCKETS_NR; idx++)
- _wait_barrier(conf, idx);
+ _wait_barrier(conf, idx, false);
}
static void _allow_barrier(struct r1conf *conf, int idx)
@@ -1212,7 +1236,11 @@ static void raid1_read_request(struct mddev *mddev, struct bio *bio,
* Still need barrier for READ in case that whole
* array is frozen.
*/
- wait_read_barrier(conf, bio->bi_iter.bi_sector);
+ if (!wait_read_barrier(conf, bio->bi_iter.bi_sector,
+ bio->bi_opf & REQ_NOWAIT)) {
+ bio_wouldblock_error(bio);
+ return;
+ }
if (!r1_bio)
r1_bio = alloc_r1bio(mddev, bio);
@@ -1321,6 +1349,11 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
* an interruptible wait.
*/
DEFINE_WAIT(w);
+ if (bio->bi_opf & REQ_NOWAIT) {
+ bio_wouldblock_error(bio);
+ return;
+ }
+
for (;;) {
sigset_t full, old;
prepare_to_wait(&conf->wait_barrier,
@@ -1339,17 +1372,26 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
}
finish_wait(&conf->wait_barrier, &w);
}
- wait_barrier(conf, bio->bi_iter.bi_sector);
-
- r1_bio = alloc_r1bio(mddev, bio);
- r1_bio->sectors = max_write_sectors;
+ if (!wait_barrier(conf, bio->bi_iter.bi_sector,
+ bio->bi_opf & REQ_NOWAIT)) {
+ bio_wouldblock_error(bio);
+ return;
+ }
if (conf->pending_count >= max_queued_requests) {
md_wakeup_thread(mddev->thread);
+ if (bio->bi_opf & REQ_NOWAIT) {
+ bio_wouldblock_error(bio);
+ return;
+ }
raid1_log(mddev, "wait queued");
wait_event(conf->wait_barrier,
conf->pending_count < max_queued_requests);
}
+
+ r1_bio = alloc_r1bio(mddev, bio);
+ r1_bio->sectors = max_write_sectors;
+
/* first select target devices under rcu_lock and
* inc refcount on their rdev. Record them by setting
* bios[x] to bio
@@ -1435,9 +1477,15 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
rdev_dec_pending(conf->mirrors[j].rdev, mddev);
r1_bio->state = 0;
allow_barrier(conf, bio->bi_iter.bi_sector);
+
+ if (bio->bi_opf & REQ_NOWAIT) {
+ bio_wouldblock_error(bio);
+ free_r1bio(r1_bio);
+ return;
+ }
raid1_log(mddev, "wait rdev %d blocked", blocked_rdev->raid_disk);
md_wait_for_blocked_rdev(blocked_rdev, mddev);
- wait_barrier(conf, bio->bi_iter.bi_sector);
+ wait_barrier(conf, bio->bi_iter.bi_sector, false);
goto retry_write;
}
--
2.14.2
* [PATCH 4/9] md: raid10 nowait support
From: Goldwyn Rodrigues @ 2017-10-04 13:55 UTC
To: linux-block; +Cc: axboe, shli, Goldwyn Rodrigues
From: Goldwyn Rodrigues <rgoldwyn@suse.com>
Bail with EAGAIN if raid10 is going to wait for:
+ barriers
+ a reshape operation
+ too many queued requests
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
drivers/md/raid10.c | 67 ++++++++++++++++++++++++++++++++++++++++-------------
1 file changed, 51 insertions(+), 16 deletions(-)
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 374df5796649..b0701f5751fe 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -969,8 +969,9 @@ static void lower_barrier(struct r10conf *conf)
wake_up(&conf->wait_barrier);
}
-static void wait_barrier(struct r10conf *conf)
+static bool wait_barrier(struct r10conf *conf, bool nowait)
{
+ bool ret = true;
spin_lock_irq(&conf->resync_lock);
if (conf->barrier) {
conf->nr_waiting++;
@@ -984,19 +985,25 @@ static void wait_barrier(struct r10conf *conf)
* count down.
*/
raid10_log(conf->mddev, "wait barrier");
- wait_event_lock_irq(conf->wait_barrier,
- !conf->barrier ||
- (atomic_read(&conf->nr_pending) &&
- current->bio_list &&
- (!bio_list_empty(¤t->bio_list[0]) ||
- !bio_list_empty(¤t->bio_list[1]))),
- conf->resync_lock);
+ if (!nowait)
+ wait_event_lock_irq(conf->wait_barrier,
+ !conf->barrier ||
+ (atomic_read(&conf->nr_pending) &&
+ current->bio_list &&
+ (!bio_list_empty(¤t->bio_list[0]) ||
+ !bio_list_empty(¤t->bio_list[1]))),
+ conf->resync_lock);
+ else
+ ret = false;
conf->nr_waiting--;
if (!conf->nr_waiting)
wake_up(&conf->wait_barrier);
}
- atomic_inc(&conf->nr_pending);
+ /* Do not increment nr_pending if we didn't wait */
+ if (ret)
+ atomic_inc(&conf->nr_pending);
spin_unlock_irq(&conf->resync_lock);
+ return ret;
}
static void allow_barrier(struct r10conf *conf)
@@ -1148,7 +1155,10 @@ static void raid10_read_request(struct mddev *mddev, struct bio *bio,
* thread has put up a bar for new requests.
* Continue immediately if no resync is active currently.
*/
- wait_barrier(conf);
+ if (!wait_barrier(conf, bio->bi_opf & REQ_NOWAIT)) {
+ bio_wouldblock_error(bio);
+ return;
+ }
sectors = r10_bio->sectors;
while (test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery) &&
@@ -1159,12 +1169,16 @@ static void raid10_read_request(struct mddev *mddev, struct bio *bio,
* pass
*/
raid10_log(conf->mddev, "wait reshape");
+ if (bio->bi_opf & REQ_NOWAIT) {
+ bio_wouldblock_error(bio);
+ return;
+ }
allow_barrier(conf);
wait_event(conf->wait_barrier,
conf->reshape_progress <= bio->bi_iter.bi_sector ||
conf->reshape_progress >= bio->bi_iter.bi_sector +
sectors);
- wait_barrier(conf);
+ wait_barrier(conf, false);
}
rdev = read_balance(conf, r10_bio, &max_sectors);
@@ -1298,12 +1312,19 @@ static void raid10_write_request(struct mddev *mddev, struct bio *bio,
* thread has put up a bar for new requests.
* Continue immediately if no resync is active currently.
*/
- wait_barrier(conf);
+ if (!wait_barrier(conf, bio->bi_opf & REQ_NOWAIT)) {
+ bio_wouldblock_error(bio);
+ return;
+ }
sectors = r10_bio->sectors;
while (test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery) &&
bio->bi_iter.bi_sector < conf->reshape_progress &&
bio->bi_iter.bi_sector + sectors > conf->reshape_progress) {
+ if (bio->bi_opf & REQ_NOWAIT) {
+ bio_wouldblock_error(bio);
+ return;
+ }
/*
* IO spans the reshape position. Need to wait for reshape to
* pass
@@ -1314,7 +1335,7 @@ static void raid10_write_request(struct mddev *mddev, struct bio *bio,
conf->reshape_progress <= bio->bi_iter.bi_sector ||
conf->reshape_progress >= bio->bi_iter.bi_sector +
sectors);
- wait_barrier(conf);
+ wait_barrier(conf, false);
}
if (test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery) &&
@@ -1328,6 +1349,10 @@ static void raid10_write_request(struct mddev *mddev, struct bio *bio,
set_mask_bits(&mddev->sb_flags, 0,
BIT(MD_SB_CHANGE_DEVS) | BIT(MD_SB_CHANGE_PENDING));
md_wakeup_thread(mddev->thread);
+ if (bio->bi_opf & REQ_NOWAIT) {
+ bio_wouldblock_error(bio);
+ return;
+ }
raid10_log(conf->mddev, "wait reshape metadata");
wait_event(mddev->sb_wait,
!test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags));
@@ -1337,6 +1362,10 @@ static void raid10_write_request(struct mddev *mddev, struct bio *bio,
if (conf->pending_count >= max_queued_requests) {
md_wakeup_thread(mddev->thread);
+ if (bio->bi_opf & REQ_NOWAIT) {
+ bio_wouldblock_error(bio);
+ return;
+ }
raid10_log(mddev, "wait queued");
wait_event(conf->wait_barrier,
conf->pending_count < max_queued_requests);
@@ -1462,9 +1491,15 @@ static void raid10_write_request(struct mddev *mddev, struct bio *bio,
}
}
allow_barrier(conf);
+
+ /* Don't wait for REQ_NOWAIT */
+ if (bio->bi_opf & REQ_NOWAIT) {
+ bio_wouldblock_error(bio);
+ return;
+ }
raid10_log(conf->mddev, "wait rdev %d blocked", blocked_rdev->raid_disk);
md_wait_for_blocked_rdev(blocked_rdev, mddev);
- wait_barrier(conf);
+ wait_barrier(conf, false);
goto retry_write;
}
@@ -1693,7 +1728,7 @@ static void print_conf(struct r10conf *conf)
static void close_sync(struct r10conf *conf)
{
- wait_barrier(conf);
+ wait_barrier(conf, false);
allow_barrier(conf);
mempool_destroy(conf->r10buf_pool);
@@ -4365,7 +4400,7 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr,
if (need_flush ||
time_after(jiffies, conf->reshape_checkpoint + 10*HZ)) {
/* Need to update reshape_position in metadata */
- wait_barrier(conf);
+ wait_barrier(conf, false);
mddev->reshape_position = conf->reshape_progress;
if (mddev->reshape_backwards)
mddev->curr_resync_completed = raid10_size(mddev, 0, 0)
--
2.14.2
* [PATCH 5/9] md: raid5 nowait support
From: Goldwyn Rodrigues @ 2017-10-04 13:55 UTC
To: linux-block; +Cc: axboe, shli, Goldwyn Rodrigues
From: Goldwyn Rodrigues <rgoldwyn@suse.com>
Return EAGAIN in case RAID5 would block while waiting for a
reshape to pass.
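The condition this patch checks is whether the bio's sector range still lies in the region a reshape has not safely passed, and the direction of the reshape flips which side of reshape_safe is unsafe. A minimal userspace sketch, with the check factored into a hypothetical helper (the function name is mine; the kernel code tests this inline):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t sector_t;
#define MaxSector ((sector_t)~0ULL)   /* "no reshape in flight" marker */

/* For a backwards reshape the not-yet-safe region is below
 * reshape_safe; for a forward reshape it is at or above it. */
static bool raid5_would_wait_for_reshape(sector_t logical_sector,
					 sector_t last_sector,
					 sector_t reshape_progress,
					 sector_t reshape_safe,
					 bool reshape_backwards)
{
	if (reshape_progress == MaxSector)
		return false;   /* no reshape running */
	return reshape_backwards ? logical_sector < reshape_safe
				 : last_sector >= reshape_safe;
}
```

When this is true for a REQ_NOWAIT bio, raid5_make_request() ends it with bio_wouldblock_error() before entering the per-stripe loop.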
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
drivers/md/raid5.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 928e24a07133..707a3907d100 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -5601,6 +5601,13 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
last_sector = bio_end_sector(bi);
bi->bi_next = NULL;
+ if ((bi->bi_opf & REQ_NOWAIT) &&
+ conf->reshape_progress != MaxSector &&
+ (mddev->reshape_backwards ? logical_sector < conf->reshape_safe : last_sector >= conf->reshape_safe)) {
+ bio_wouldblock_error(bi);
+ return true;
+ }
+
prepare_to_wait(&conf->wait_for_overlap, &w, TASK_UNINTERRUPTIBLE);
for (;logical_sector < last_sector; logical_sector += STRIPE_SECTORS) {
int previous;
--
2.14.2
* [PATCH 6/9] dm: add nowait support
From: Goldwyn Rodrigues @ 2017-10-04 13:55 UTC
To: linux-block; +Cc: axboe, shli, Goldwyn Rodrigues
From: Goldwyn Rodrigues <rgoldwyn@suse.com>
Add support for bio-based dm devices, which exclusively set a
make_request_fn(). Request-based devices are supported by default.
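The table-wide check the patch adds is a simple all-devices-must-support conjunction. A userspace sketch of that logic, with a flat device array standing in for the real iterate_devices() callback plumbing:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of dm_table_supports_nowait(): a table supports nowait
 * only if every underlying device's queue does. One non-conforming
 * device disables the flag for the whole table, just as a target
 * without an iterate_devices callback does in the real code. */
struct mock_dev {
	bool supports_nowait;
};

static bool table_supports_nowait(const struct mock_dev *devs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (!devs[i].supports_nowait)
			return false;
	return true;
}
```

Only when this holds does dm_setup_md_queue() set QUEUE_FLAG_NOWAIT on the mapped device's queue, so the block layer will let REQ_NOWAIT bios through to it.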
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
drivers/md/dm.c | 28 +++++++++++++++++++++++++++-
1 file changed, 27 insertions(+), 1 deletion(-)
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 6e54145969c5..b721716862a2 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1522,6 +1522,8 @@ static blk_qc_t dm_make_request(struct request_queue *q, struct bio *bio)
if (!(bio->bi_opf & REQ_RAHEAD))
queue_io(md, bio);
+ else if (bio->bi_opf & REQ_NOWAIT)
+ bio_wouldblock_error(bio);
else
bio_io_error(bio);
return BLK_QC_T_NONE;
@@ -2011,6 +2013,29 @@ struct queue_limits *dm_get_queue_limits(struct mapped_device *md)
}
EXPORT_SYMBOL_GPL(dm_get_queue_limits);
+static int device_supports_nowait(struct dm_target *ti, struct dm_dev *dev,
+ sector_t start, sector_t len, void *data)
+{
+ struct request_queue *q = bdev_get_queue(dev->bdev);
+ return q && blk_queue_supports_nowait(q);
+}
+
+static bool dm_table_supports_nowait(struct dm_table *t)
+{
+ struct dm_target *ti;
+ unsigned i;
+
+ for (i = 0; i < dm_table_get_num_targets(t); i++) {
+ ti = dm_table_get_target(t, i);
+
+ if (!ti->type->iterate_devices ||
+ !ti->type->iterate_devices(ti, device_supports_nowait, NULL))
+ return false;
+ }
+
+ return true;
+}
+
/*
* Setup the DM device's queue based on md's type
*/
@@ -2044,7 +2069,8 @@ int dm_setup_md_queue(struct mapped_device *md, struct dm_table *t)
*/
bioset_free(md->queue->bio_split);
md->queue->bio_split = NULL;
-
+ if (dm_table_supports_nowait(t))
+ queue_flag_set_unlocked(QUEUE_FLAG_NOWAIT, md->queue);
if (type == DM_TYPE_DAX_BIO_BASED)
queue_flag_set_unlocked(QUEUE_FLAG_DAX, md->queue);
break;
--
2.14.2
* [PATCH 7/9] dm: Add nowait support to raid1
From: Goldwyn Rodrigues @ 2017-10-04 13:55 UTC
To: linux-block; +Cc: axboe, shli, Goldwyn Rodrigues
From: Goldwyn Rodrigues <rgoldwyn@suse.com>
If the I/O would block because the devices are syncing, bail.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
drivers/md/dm-raid1.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/drivers/md/dm-raid1.c b/drivers/md/dm-raid1.c
index c0b82136b2d1..96044b7787f9 100644
--- a/drivers/md/dm-raid1.c
+++ b/drivers/md/dm-raid1.c
@@ -1204,6 +1204,14 @@ static int mirror_map(struct dm_target *ti, struct bio *bio)
if (rw == WRITE) {
/* Save region for mirror_end_io() handler */
bio_record->write_region = dm_rh_bio_to_region(ms->rh, bio);
+ if (bio->bi_opf & REQ_NOWAIT) {
+ int state = dm_rh_get_state(ms->rh,
+ bio_record->write_region, 1);
+ if (state != DM_RH_CLEAN && state != DM_RH_DIRTY) {
+ bio_wouldblock_error(bio);
+ return DM_MAPIO_SUBMITTED;
+ }
+ }
queue_bio(ms, bio, rw);
return DM_MAPIO_SUBMITTED;
}
@@ -1219,6 +1227,11 @@ static int mirror_map(struct dm_target *ti, struct bio *bio)
if (bio->bi_opf & REQ_RAHEAD)
return DM_MAPIO_KILL;
+ if (bio->bi_opf & REQ_NOWAIT) {
+ bio_wouldblock_error(bio);
+ return DM_MAPIO_SUBMITTED;
+ }
+
queue_bio(ms, bio, rw);
return DM_MAPIO_SUBMITTED;
}
--
2.14.2
* [PATCH 8/9] dm: Add nowait support to dm-delay
From: Goldwyn Rodrigues @ 2017-10-04 13:55 UTC
To: linux-block; +Cc: axboe, shli, Goldwyn Rodrigues
From: Goldwyn Rodrigues <rgoldwyn@suse.com>
The I/O should bail out if any delay value is set.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
drivers/md/dm-delay.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/md/dm-delay.c b/drivers/md/dm-delay.c
index 2209a9700acd..e67a7042ae68 100644
--- a/drivers/md/dm-delay.c
+++ b/drivers/md/dm-delay.c
@@ -240,6 +240,10 @@ static int delay_bio(struct delay_c *dc, int delay, struct bio *bio)
if (!delay || !atomic_read(&dc->may_delay))
return DM_MAPIO_REMAPPED;
+ if (bio->bi_opf & REQ_NOWAIT) {
+ bio_wouldblock_error(bio);
+ return DM_MAPIO_SUBMITTED;
+ }
delayed = dm_per_bio_data(bio, sizeof(struct dm_delay_info));
delayed->context = dc;
--
2.14.2
* [PATCH 9/9] dm-mpath: Add nowait support
From: Goldwyn Rodrigues @ 2017-10-04 13:55 UTC
To: linux-block; +Cc: axboe, shli, Goldwyn Rodrigues
From: Goldwyn Rodrigues <rgoldwyn@suse.com>
If the bio would be queued for the daemon to resubmit, bail with
EAGAIN when REQ_NOWAIT is set instead of queueing up the I/O.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
drivers/md/dm-mpath.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index 11f273d2f018..d714c1b1b066 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -542,6 +542,11 @@ static int __multipath_map_bio(struct multipath *m, struct bio *bio, struct dm_m
if ((pgpath && queue_io) ||
(!pgpath && test_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags))) {
+ /* Bail if nowait is set */
+ if (bio->bi_opf & REQ_NOWAIT) {
+ bio_wouldblock_error(bio);
+ return DM_MAPIO_SUBMITTED;
+ }
/* Queue for the daemon to resubmit */
spin_lock_irqsave(&m->lock, flags);
bio_list_add(&m->queued_bios, bio);
--
2.14.2
* Re: [PATCH v2 0/9] Nowait support for stacked block devices
From: Shaohua Li @ 2017-10-05 17:19 UTC
To: Goldwyn Rodrigues; +Cc: linux-block, axboe
On Wed, Oct 04, 2017 at 08:55:02AM -0500, Goldwyn Rodrigues wrote:
> This is a continuation of the nowait support which was incorporated
> a while back. We introduced REQ_NOWAIT which would return immediately
> if the call would block at the block layer. Request based-devices
> do not wait. However, bio based devices (the ones which exclusively
> call make_request_fn) need to be trained to handle REQ_NOWAIT.
>
> This effort covers the devices under MD and DM which would block
> for any reason. If there should be more devices or situations
> which need to be covered, please let me know.
>
> The problem with partial writes discussed during v1 turned out
> to be a bug in partial writes during direct I/O and is fixed
> by the submitted patch[1].
>
> Changes since v1:
> - mddev to return early in case the device is suspended, within the md code as opposed to ->make_request()
> - Check for nowait support with all the lower devices. Same with if adding a device which does not support nowait.
> - Nowait under each raid is checked before the final I/O submission for the entire I/O.
>
> [1] https://patchwork.kernel.org/patch/9979887/
Does this fix the partial IO issue we discussed before? It doesn't look
like it to me. The partial IO that is bailed out could be any part of an
IO, so simply returning the size that succeeded doesn't help. Am I
missing anything? I didn't follow the discussion; maybe Jens knew.
Thanks,
Shaohua
* Re: [PATCH v2 0/9] Nowait support for stacked block devices
From: Goldwyn Rodrigues @ 2017-10-06 12:01 UTC
To: Shaohua Li; +Cc: linux-block, axboe
On 10/05/2017 12:19 PM, Shaohua Li wrote:
> On Wed, Oct 04, 2017 at 08:55:02AM -0500, Goldwyn Rodrigues wrote:
>> This is a continuation of the nowait support which was incorporated
>> a while back. We introduced REQ_NOWAIT which would return immediately
>> if the call would block at the block layer. Request based-devices
>> do not wait. However, bio based devices (the ones which exclusively
>> call make_request_fn) need to be trained to handle REQ_NOWAIT.
>>
>> This effort covers the devices under MD and DM which would block
>> for any reason. If there should be more devices or situations
>> which need to be covered, please let me know.
>>
>> The problem with partial writes discussed during v1 turned out
>> to be a bug in partial writes during direct I/O and is fixed
>> by the submitted patch[1].
>>
>> Changes since v1:
>> - mddev to return early in case the device is suspended, within the md code as opposed to ->make_request()
>> - Check for nowait support with all the lower devices. Same with if adding a device which does not support nowait.
>> - Nowait under each raid is checked before the final I/O submission for the entire I/O.
>>
>> [1] https://patchwork.kernel.org/patch/9979887/
>
> Does this fix the partial IO issue we discussed before? It looks not to me. The
> partial IO bailed out could be any part of an IO, so simply returning the
> successed size doesn't help. Am I missing anything? I didn't follow the
> discussion, maybe Jens knew.
>
If the partial IO bailed out is any part of IO, isn't it supposed to
return the size of the IO succeeded _so far_? If a latter part of the IO
succeeds (with a failure in between) what are you supposed to return to
user in case of direct write()s? Would that even be correct in case it
is a file overwrite?
--
Goldwyn
* Re: [PATCH v2 0/9] Nowait support for stacked block devices
From: Shaohua Li @ 2017-10-10 22:36 UTC
To: Goldwyn Rodrigues; +Cc: linux-block, axboe
On Fri, Oct 06, 2017 at 07:01:19AM -0500, Goldwyn Rodrigues wrote:
>
>
> On 10/05/2017 12:19 PM, Shaohua Li wrote:
> > On Wed, Oct 04, 2017 at 08:55:02AM -0500, Goldwyn Rodrigues wrote:
> >> This is a continuation of the nowait support which was incorporated
> >> a while back. We introduced REQ_NOWAIT which would return immediately
> >> if the call would block at the block layer. Request based-devices
> >> do not wait. However, bio based devices (the ones which exclusively
> >> call make_request_fn) need to be trained to handle REQ_NOWAIT.
> >>
> >> This effort covers the devices under MD and DM which would block
> >> for any reason. If there should be more devices or situations
> >> which need to be covered, please let me know.
> >>
> >> The problem with partial writes discussed during v1 turned out
> >> to be a bug in partial writes during direct I/O and is fixed
> >> by the submitted patch[1].
> >>
> >> Changes since v1:
> >> - mddev to return early in case the device is suspended, within the md code as opposed to ->make_request()
> >> - Check for nowait support with all the lower devices. Same with if adding a device which does not support nowait.
> >> - Nowait under each raid is checked before the final I/O submission for the entire I/O.
> >>
> >> [1] https://patchwork.kernel.org/patch/9979887/
> >
> > Does this fix the partial IO issue we discussed before? It looks not to me. The
> > partial IO bailed out could be any part of an IO, so simply returning the
> > successed size doesn't help. Am I missing anything? I didn't follow the
> > discussion, maybe Jens knew.
> >
>
> If the partial IO bailed out is any part of IO, isn't it supposed to
> return the size of the IO succeeded _so far_? If a latter part of the IO
> succeeds (with a failure in between) what are you supposed to return to
> user in case of direct write()s? Would that even be correct in case it
> is a file overwrite?
I didn't argue about the return value. To me, the partial IO issue can't
be fixed simply by any choice of return value.