linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3 v2] Preallocate flush bio
@ 2017-06-16 14:04 David Sterba
  2017-06-16 14:04 ` [PATCH 1/3 v2] btrfs: preallocate device " David Sterba
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: David Sterba @ 2017-06-16 14:04 UTC (permalink / raw)
  To: linux-btrfs; +Cc: anand.jain, David Sterba

This patchset follows the updates in the write_dev_flush function. The flush
bio can be preallocated at the device creation time, so we avoid repeated
alloc/free.

v2:
- merge 'flush_bio_sent' from 4 to 1 and dropped patch 4
- dropped sysfs tunable (patch 5)
- fix typo in patch 1 changelog

David Sterba (3):
  btrfs: preallocate device flush bio
  btrfs: account as waiting for IO, while waiting fot the flush bio
    completion
  btrfs: move dev stats accounting out of wait_dev_flush

 fs/btrfs/disk-io.c | 38 +++++++++++---------------------------
 fs/btrfs/volumes.c | 12 ++++++++++++
 fs/btrfs/volumes.h |  1 +
 3 files changed, 24 insertions(+), 27 deletions(-)

-- 
2.13.0


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH 1/3 v2] btrfs: preallocate device flush bio
  2017-06-16 14:04 [PATCH 0/3 v2] Preallocate flush bio David Sterba
@ 2017-06-16 14:04 ` David Sterba
  2017-06-16 14:04 ` [PATCH 2/3 v2] btrfs: account as waiting for IO, while waiting fot the flush bio completion David Sterba
  2017-06-16 14:04 ` [PATCH 3/3 v2] btrfs: move dev stats accounting out of wait_dev_flush David Sterba
  2 siblings, 0 replies; 5+ messages in thread
From: David Sterba @ 2017-06-16 14:04 UTC (permalink / raw)
  To: linux-btrfs; +Cc: anand.jain, David Sterba

For devices that support flushing, we allocate a bio, submit, wait for
it and then free it. The bio allocation does not fail so ENOMEM is not a
problem but we still may unnecessarily stress the allocation subsystem.

Instead, we can allocate the bio at the same time we allocate the device
and reuse it each time we need to flush the barriers. The bio is reset
before each use. Reference counting is simplified to just device
allocation (get) and freeing (put).

The bio used to be submitted through the integrity checker which will
find out that bio has no data attached and call submit_bio.

Status of the bio in flight needs to be tracked separately in case the
device caches get switched off between write and wait.

Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/disk-io.c | 25 +++++++------------------
 fs/btrfs/volumes.c | 12 ++++++++++++
 fs/btrfs/volumes.h |  1 +
 3 files changed, 20 insertions(+), 18 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 2b00ebff13f8..d824c0fc6821 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3482,9 +3482,7 @@ static int write_dev_supers(struct btrfs_device *device,
  */
 static void btrfs_end_empty_barrier(struct bio *bio)
 {
-	if (bio->bi_private)
-		complete(bio->bi_private);
-	bio_put(bio);
+	complete(bio->bi_private);
 }
 
 /*
@@ -3494,26 +3492,20 @@ static void btrfs_end_empty_barrier(struct bio *bio)
 static void write_dev_flush(struct btrfs_device *device)
 {
 	struct request_queue *q = bdev_get_queue(device->bdev);
-	struct bio *bio;
+	struct bio *bio = device->flush_bio;
 
 	if (!test_bit(QUEUE_FLAG_WC, &q->queue_flags))
 		return;
 
-	/*
-	 * one reference for us, and we leave it for the
-	 * caller
-	 */
-	device->flush_bio = NULL;
-	bio = btrfs_io_bio_alloc(0);
+	bio_reset(bio);
 	bio->bi_end_io = btrfs_end_empty_barrier;
 	bio->bi_bdev = device->bdev;
 	bio->bi_opf = REQ_OP_WRITE | REQ_SYNC | REQ_PREFLUSH;
 	init_completion(&device->flush_wait);
 	bio->bi_private = &device->flush_wait;
-	device->flush_bio = bio;
 
-	bio_get(bio);
-	btrfsic_submit_bio(bio);
+	submit_bio(bio);
+	device->flush_bio_sent = 1;
 }
 
 /*
@@ -3524,9 +3516,10 @@ static int wait_dev_flush(struct btrfs_device *device)
 	int ret = 0;
 	struct bio *bio = device->flush_bio;
 
-	if (!bio)
+	if (!device->flush_bio_sent)
 		return 0;
 
+	device->flush_bio_sent = 0;
 	wait_for_completion(&device->flush_wait);
 
 	if (bio->bi_error) {
@@ -3535,10 +3528,6 @@ static int wait_dev_flush(struct btrfs_device *device)
 				BTRFS_DEV_STAT_FLUSH_ERRS);
 	}
 
-	/* drop the reference from the wait == 0 run */
-	bio_put(bio);
-	device->flush_bio = NULL;
-
 	return ret;
 }
 
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 8bb1f4e5905a..251ae81e4363 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -242,6 +242,17 @@ static struct btrfs_device *__alloc_device(void)
 	if (!dev)
 		return ERR_PTR(-ENOMEM);
 
+	/*
+	 * Preallocate a bio that's always going to be used for flushing device
+	 * barriers and matches the device lifespan
+	 */
+	dev->flush_bio = bio_alloc_bioset(GFP_KERNEL, 0, NULL);
+	if (!dev->flush_bio) {
+		kfree(dev);
+		return ERR_PTR(-ENOMEM);
+	}
+	bio_get(dev->flush_bio);
+
 	INIT_LIST_HEAD(&dev->dev_list);
 	INIT_LIST_HEAD(&dev->dev_alloc_list);
 	INIT_LIST_HEAD(&dev->resized_list);
@@ -838,6 +849,7 @@ static void __free_device(struct work_struct *work)
 
 	device = container_of(work, struct btrfs_device, rcu_work);
 	rcu_string_free(device->name);
+	bio_put(device->flush_bio);
 	kfree(device);
 }
 
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 35327efecdbb..6f45fd60d15a 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -75,6 +75,7 @@ struct btrfs_device {
 	int can_discard;
 	int is_tgtdev_for_dev_replace;
 	int last_flush_error;
+	int flush_bio_sent;
 
 #ifdef __BTRFS_NEED_DEVICE_DATA_ORDERED
 	seqcount_t data_seqcount;
-- 
2.13.0


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH 2/3 v2] btrfs: account as waiting for IO, while waiting fot the flush bio completion
  2017-06-16 14:04 [PATCH 0/3 v2] Preallocate flush bio David Sterba
  2017-06-16 14:04 ` [PATCH 1/3 v2] btrfs: preallocate device " David Sterba
@ 2017-06-16 14:04 ` David Sterba
  2017-06-16 14:04 ` [PATCH 3/3 v2] btrfs: move dev stats accounting out of wait_dev_flush David Sterba
  2 siblings, 0 replies; 5+ messages in thread
From: David Sterba @ 2017-06-16 14:04 UTC (permalink / raw)
  To: linux-btrfs; +Cc: anand.jain, David Sterba

Similar to what submit_bio_wait does, we should account for IO while
waiting for a bio completion. This has marginal visible effects, flush
bio is short-lived.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/disk-io.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index d824c0fc6821..85f931c5e63c 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3520,7 +3520,7 @@ static int wait_dev_flush(struct btrfs_device *device)
 		return 0;
 
 	device->flush_bio_sent = 0;
-	wait_for_completion(&device->flush_wait);
+	wait_for_completion_io(&device->flush_wait);
 
 	if (bio->bi_error) {
 		ret = bio->bi_error;
-- 
2.13.0


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH 3/3 v2] btrfs: move dev stats accounting out of wait_dev_flush
  2017-06-16 14:04 [PATCH 0/3 v2] Preallocate flush bio David Sterba
  2017-06-16 14:04 ` [PATCH 1/3 v2] btrfs: preallocate device " David Sterba
  2017-06-16 14:04 ` [PATCH 2/3 v2] btrfs: account as waiting for IO, while waiting fot the flush bio completion David Sterba
@ 2017-06-16 14:04 ` David Sterba
  2017-06-17 13:53   ` Anand Jain
  2 siblings, 1 reply; 5+ messages in thread
From: David Sterba @ 2017-06-16 14:04 UTC (permalink / raw)
  To: linux-btrfs; +Cc: anand.jain, David Sterba

We should really just wait in wait_dev_flush and let the caller decide
what to do with the error value.

Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/disk-io.c | 11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 85f931c5e63c..5992e42c5e3c 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3513,7 +3513,6 @@ static void write_dev_flush(struct btrfs_device *device)
  */
 static int wait_dev_flush(struct btrfs_device *device)
 {
-	int ret = 0;
 	struct bio *bio = device->flush_bio;
 
 	if (!device->flush_bio_sent)
@@ -3522,13 +3521,7 @@ static int wait_dev_flush(struct btrfs_device *device)
 	device->flush_bio_sent = 0;
 	wait_for_completion_io(&device->flush_wait);
 
-	if (bio->bi_error) {
-		ret = bio->bi_error;
-		btrfs_dev_stat_inc_and_print(device,
-				BTRFS_DEV_STAT_FLUSH_ERRS);
-	}
-
-	return ret;
+	return bio->bi_error;
 }
 
 static int check_barrier_error(struct btrfs_fs_devices *fsdevs)
@@ -3587,6 +3580,8 @@ static int barrier_all_devices(struct btrfs_fs_info *info)
 		ret = wait_dev_flush(dev);
 		if (ret) {
 			dev->last_flush_error = ret;
+			btrfs_dev_stat_inc_and_print(dev,
+					BTRFS_DEV_STAT_FLUSH_ERRS);
 			errors_wait++;
 		}
 	}
-- 
2.13.0


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH 3/3 v2] btrfs: move dev stats accounting out of wait_dev_flush
  2017-06-16 14:04 ` [PATCH 3/3 v2] btrfs: move dev stats accounting out of wait_dev_flush David Sterba
@ 2017-06-17 13:53   ` Anand Jain
  0 siblings, 0 replies; 5+ messages in thread
From: Anand Jain @ 2017-06-17 13:53 UTC (permalink / raw)
  To: David Sterba, linux-btrfs



On 06/16/2017 10:04 PM, David Sterba wrote:
> We should really just wait in wait_dev_flush and let the caller decide
> what to do with the error value.

Reviewed-by: Anand Jain <anand.jain@oracle.com>

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2017-06-17 13:48 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-06-16 14:04 [PATCH 0/3 v2] Preallocate flush bio David Sterba
2017-06-16 14:04 ` [PATCH 1/3 v2] btrfs: preallocate device " David Sterba
2017-06-16 14:04 ` [PATCH 2/3 v2] btrfs: account as waiting for IO, while waiting fot the flush bio completion David Sterba
2017-06-16 14:04 ` [PATCH 3/3 v2] btrfs: move dev stats accounting out of wait_dev_flush David Sterba
2017-06-17 13:53   ` Anand Jain

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).