linux-btrfs.vger.kernel.org archive mirror
* [PATCH 0/5] Preallocate flush bio, sysfs tunable
@ 2017-06-15 16:49 David Sterba
  2017-06-15 16:49 ` [PATCH 1/5] btrfs: preallocate device flush bio David Sterba
                   ` (4 more replies)
  0 siblings, 5 replies; 13+ messages in thread
From: David Sterba @ 2017-06-15 16:49 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

This patchset follows the updates in the write_dev_flush function. The flush
bio can be preallocated at the device creation time, so we avoid repeated
alloc/free.

Next, there's a new sysfs tunable to enable forced dev flushes for devices that
do not support the barriers. This helps to test the new code but is not meant
for any non-debugging use.

I've tested lightly with some workloads and toggled the sysfs knob during that,
all fine.

David Sterba (5):
  btrfs: preallocate device flush bio
  btrfs: account as waiting for IO, while waiting for the flush bio
    completion
  btrfs: move dev stats accounting out of wait_dev_flush
  btrfs: add fs flag to force device flushing
  btrfs: sysfs: export the force_dev_flush flag

 fs/btrfs/ctree.h   |  1 +
 fs/btrfs/disk-io.c | 41 +++++++++++++----------------------------
 fs/btrfs/sysfs.c   | 47 +++++++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/volumes.c | 12 ++++++++++++
 fs/btrfs/volumes.h |  1 +
 5 files changed, 74 insertions(+), 28 deletions(-)

-- 
2.13.0



* [PATCH 1/5] btrfs: preallocate device flush bio
  2017-06-15 16:49 [PATCH 0/5] Preallocate flush bio, sysfs tunable David Sterba
@ 2017-06-15 16:49 ` David Sterba
  2017-06-15 21:53   ` Anand Jain
  2017-06-15 16:49 ` [PATCH 2/5] btrfs: account as waiting for IO, while waiting for the flush bio completion David Sterba
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 13+ messages in thread
From: David Sterba @ 2017-06-15 16:49 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

For devices that support flushing, we allocate a bio, submit, wait for
it and then free it. The bio allocation does not fail so ENOMEM is not a
problem but we still may unnecessarily stress the allocation subsystem.

Instead, we can allocate the bio at the same time we allocate the
device and reuse it each time we need to flush the barriers. The bio is
reset before each use. Reference counting is simplified to just device
allocation (get) and freeing (put).

Note for wait_dev_flush: we check the queue flush status again as we
can't use the existence of bio as before.

Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/disk-io.c | 24 ++++++------------------
 fs/btrfs/volumes.c | 12 ++++++++++++
 2 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 2b00ebff13f8..27d44d6ab775 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3482,9 +3482,7 @@ static int write_dev_supers(struct btrfs_device *device,
  */
 static void btrfs_end_empty_barrier(struct bio *bio)
 {
-	if (bio->bi_private)
-		complete(bio->bi_private);
-	bio_put(bio);
+	complete(bio->bi_private);
 }
 
 /*
@@ -3494,26 +3492,19 @@ static void btrfs_end_empty_barrier(struct bio *bio)
 static void write_dev_flush(struct btrfs_device *device)
 {
 	struct request_queue *q = bdev_get_queue(device->bdev);
-	struct bio *bio;
+	struct bio *bio = device->flush_bio;
 
 	if (!test_bit(QUEUE_FLAG_WC, &q->queue_flags))
 		return;
 
-	/*
-	 * one reference for us, and we leave it for the
-	 * caller
-	 */
-	device->flush_bio = NULL;
-	bio = btrfs_io_bio_alloc(0);
+	bio_reset(bio);
 	bio->bi_end_io = btrfs_end_empty_barrier;
 	bio->bi_bdev = device->bdev;
 	bio->bi_opf = REQ_OP_WRITE | REQ_SYNC | REQ_PREFLUSH;
 	init_completion(&device->flush_wait);
 	bio->bi_private = &device->flush_wait;
-	device->flush_bio = bio;
 
-	bio_get(bio);
-	btrfsic_submit_bio(bio);
+	submit_bio(bio);
 }
 
 /*
@@ -3522,9 +3513,10 @@ static void write_dev_flush(struct btrfs_device *device)
 static int wait_dev_flush(struct btrfs_device *device)
 {
 	int ret = 0;
+	struct request_queue *q = bdev_get_queue(device->bdev);
 	struct bio *bio = device->flush_bio;
 
-	if (!bio)
+	if (!test_bit(QUEUE_FLAG_WC, &q->queue_flags))
 		return 0;
 
 	wait_for_completion(&device->flush_wait);
@@ -3535,10 +3527,6 @@ static int wait_dev_flush(struct btrfs_device *device)
 				BTRFS_DEV_STAT_FLUSH_ERRS);
 	}
 
-	/* drop the reference from the wait == 0 run */
-	bio_put(bio);
-	device->flush_bio = NULL;
-
 	return ret;
 }
 
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 8bb1f4e5905a..251ae81e4363 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -242,6 +242,17 @@ static struct btrfs_device *__alloc_device(void)
 	if (!dev)
 		return ERR_PTR(-ENOMEM);
 
+	/*
+	 * Preallocate a bio that's always going to be used for flushing device
+	 * barriers and matches the device lifespan
+	 */
+	dev->flush_bio = bio_alloc_bioset(GFP_KERNEL, 0, NULL);
+	if (!dev->flush_bio) {
+		kfree(dev);
+		return ERR_PTR(-ENOMEM);
+	}
+	bio_get(dev->flush_bio);
+
 	INIT_LIST_HEAD(&dev->dev_list);
 	INIT_LIST_HEAD(&dev->dev_alloc_list);
 	INIT_LIST_HEAD(&dev->resized_list);
@@ -838,6 +849,7 @@ static void __free_device(struct work_struct *work)
 
 	device = container_of(work, struct btrfs_device, rcu_work);
 	rcu_string_free(device->name);
+	bio_put(device->flush_bio);
 	kfree(device);
 }
 
-- 
2.13.0
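
The preallocate-and-reset pattern described in patch 1/5 can be illustrated with a small userspace sketch. This is not the kernel code: `fake_bio`, `fake_device` and the `fake_*` helpers are hypothetical stand-ins for `struct bio`, `struct btrfs_device`, `bio_reset()` and the alloc/free paths, and the reference counting is simplified to a single reference held by the device.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for struct bio and struct btrfs_device. */
struct fake_bio {
	int refcount;	/* models the reference held for the device lifetime */
	int opf;	/* per-use state, cleared by reset */
	void *private;	/* per-use state, cleared by reset */
};

struct fake_device {
	struct fake_bio *flush_bio;
};

/* Allocate the device and its flush request together (cf. __alloc_device). */
struct fake_device *fake_alloc_device(void)
{
	struct fake_device *dev = calloc(1, sizeof(*dev));

	if (!dev)
		return NULL;
	dev->flush_bio = calloc(1, sizeof(*dev->flush_bio));
	if (!dev->flush_bio) {
		free(dev);
		return NULL;
	}
	dev->flush_bio->refcount = 1;	/* simplification: one reference only */
	return dev;
}

/* Clear the per-use state but keep the allocation and the reference. */
void fake_bio_reset(struct fake_bio *bio)
{
	int refs = bio->refcount;

	memset(bio, 0, sizeof(*bio));
	bio->refcount = refs;
}

/* Prepare the same request for another flush; nothing is allocated here. */
struct fake_bio *fake_prepare_flush(struct fake_device *dev, void *waiter)
{
	struct fake_bio *bio = dev->flush_bio;

	assert(bio != NULL);
	fake_bio_reset(bio);
	bio->opf = 1;		/* placeholder for the flush flags */
	bio->private = waiter;
	return bio;
}

/* Drop the reference when the device goes away (cf. __free_device). */
void fake_free_device(struct fake_device *dev)
{
	if (--dev->flush_bio->refcount == 0)
		free(dev->flush_bio);
	free(dev);
}
```

Calling `fake_prepare_flush()` twice on the same device returns the same object both times, which is the point of the change: the flush path no longer touches the allocator.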



* [PATCH 2/5] btrfs: account as waiting for IO, while waiting for the flush bio completion
  2017-06-15 16:49 [PATCH 0/5] Preallocate flush bio, sysfs tunable David Sterba
  2017-06-15 16:49 ` [PATCH 1/5] btrfs: preallocate device flush bio David Sterba
@ 2017-06-15 16:49 ` David Sterba
  2017-06-15 16:49 ` [PATCH 3/5] btrfs: move dev stats accounting out of wait_dev_flush David Sterba
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 13+ messages in thread
From: David Sterba @ 2017-06-15 16:49 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

Similar to what submit_bio_wait does, we should account for IO while
waiting for a bio completion. This has only marginal visible effects, as
the flush bio is short-lived.

Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/disk-io.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 27d44d6ab775..05ff81ecb887 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3519,7 +3519,7 @@ static int wait_dev_flush(struct btrfs_device *device)
 	if (!test_bit(QUEUE_FLAG_WC, &q->queue_flags))
 		return 0;
 
-	wait_for_completion(&device->flush_wait);
+	wait_for_completion_io(&device->flush_wait);
 
 	if (bio->bi_error) {
 		ret = bio->bi_error;
-- 
2.13.0



* [PATCH 3/5] btrfs: move dev stats accounting out of wait_dev_flush
  2017-06-15 16:49 [PATCH 0/5] Preallocate flush bio, sysfs tunable David Sterba
  2017-06-15 16:49 ` [PATCH 1/5] btrfs: preallocate device flush bio David Sterba
  2017-06-15 16:49 ` [PATCH 2/5] btrfs: account as waiting for IO, while waiting for the flush bio completion David Sterba
@ 2017-06-15 16:49 ` David Sterba
  2017-06-15 22:00   ` Anand Jain
  2017-06-15 16:49 ` [PATCH 4/5] btrfs: add fs flag to force device flushing David Sterba
  2017-06-15 16:49 ` [PATCH 5/5] btrfs: sysfs: export the force_dev_flush flag David Sterba
  4 siblings, 1 reply; 13+ messages in thread
From: David Sterba @ 2017-06-15 16:49 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

We should really just wait in wait_dev_flush and let the caller decide
what to do with the error value.

Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/disk-io.c | 11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 05ff81ecb887..59a732a13370 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3512,7 +3512,6 @@ static void write_dev_flush(struct btrfs_device *device)
  */
 static int wait_dev_flush(struct btrfs_device *device)
 {
-	int ret = 0;
 	struct request_queue *q = bdev_get_queue(device->bdev);
 	struct bio *bio = device->flush_bio;
 
@@ -3521,13 +3520,7 @@ static int wait_dev_flush(struct btrfs_device *device)
 
 	wait_for_completion_io(&device->flush_wait);
 
-	if (bio->bi_error) {
-		ret = bio->bi_error;
-		btrfs_dev_stat_inc_and_print(device,
-				BTRFS_DEV_STAT_FLUSH_ERRS);
-	}
-
-	return ret;
+	return bio->bi_error;
 }
 
 static int check_barrier_error(struct btrfs_fs_devices *fsdevs)
@@ -3586,6 +3579,8 @@ static int barrier_all_devices(struct btrfs_fs_info *info)
 		ret = wait_dev_flush(dev);
 		if (ret) {
 			dev->last_flush_error = ret;
+			btrfs_dev_stat_inc_and_print(dev,
+					BTRFS_DEV_STAT_FLUSH_ERRS);
 			errors_wait++;
 		}
 	}
-- 
2.13.0
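
The split of responsibilities in patch 3/5 (the waiter only reports, the caller does the accounting) can be sketched in userspace as follows. The `stat_dev` structure and both functions are hypothetical models; `flush_errs` stands in for the `BTRFS_DEV_STAT_FLUSH_ERRS` counter and `flush_result` for the value the completed bio would carry.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the per-device state touched by the patch. */
struct stat_dev {
	int flush_result;	/* what the completed flush reported, 0 on success */
	int last_flush_error;
	int flush_errs;		/* models the BTRFS_DEV_STAT_FLUSH_ERRS counter */
};

/* Waiter: returns the result and does no accounting of its own. */
int stat_wait_dev_flush(const struct stat_dev *dev)
{
	assert(dev != NULL);
	return dev->flush_result;
}

/* Caller: decides what an error means and updates the statistics. */
int stat_barrier_all(struct stat_dev *devs, size_t n)
{
	int errors_wait = 0;

	for (size_t i = 0; i < n; i++) {
		int ret = stat_wait_dev_flush(&devs[i]);

		if (ret) {
			devs[i].last_flush_error = ret;
			devs[i].flush_errs++;
			errors_wait++;
		}
	}
	return errors_wait;
}
```

Keeping the waiter free of side effects means a different caller could, in principle, wait for the flush without bumping the statistics.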



* [PATCH 4/5] btrfs: add fs flag to force device flushing
  2017-06-15 16:49 [PATCH 0/5] Preallocate flush bio, sysfs tunable David Sterba
                   ` (2 preceding siblings ...)
  2017-06-15 16:49 ` [PATCH 3/5] btrfs: move dev stats accounting out of wait_dev_flush David Sterba
@ 2017-06-15 16:49 ` David Sterba
  2017-06-15 22:08   ` Anand Jain
  2017-06-15 16:49 ` [PATCH 5/5] btrfs: sysfs: export the force_dev_flush flag David Sterba
  4 siblings, 1 reply; 13+ messages in thread
From: David Sterba @ 2017-06-15 16:49 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

We need a device capable of device barriers so we can test the flush
code. To aid testing, add a per-filesystem status flag that affects the
barriers regardless of the device capabilities and obviously does not
give the same guarantees.

It's off by default; a sysfs tunable will follow.

Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/ctree.h   | 1 +
 fs/btrfs/disk-io.c | 8 +++++---
 fs/btrfs/volumes.h | 1 +
 3 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index f0f5f28784b6..dcf4404f7d61 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -716,6 +716,7 @@ struct btrfs_delayed_root;
 #define BTRFS_FS_LOG1_ERR			12
 #define BTRFS_FS_LOG2_ERR			13
 #define BTRFS_FS_QUOTA_OVERRIDE			14
+#define BTRFS_FS_FORCE_DEV_FLUSH		15
 
 /*
  * Indicate that a whole-filesystem exclusive operation is running
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 59a732a13370..659a3b4645d2 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3494,7 +3494,8 @@ static void write_dev_flush(struct btrfs_device *device)
 	struct request_queue *q = bdev_get_queue(device->bdev);
 	struct bio *bio = device->flush_bio;
 
-	if (!test_bit(QUEUE_FLAG_WC, &q->queue_flags))
+	if (!test_bit(BTRFS_FS_FORCE_DEV_FLUSH, &device->fs_info->flags)
+	    && !test_bit(QUEUE_FLAG_WC, &q->queue_flags))
 		return;
 
 	bio_reset(bio);
@@ -3505,6 +3506,7 @@ static void write_dev_flush(struct btrfs_device *device)
 	bio->bi_private = &device->flush_wait;
 
 	submit_bio(bio);
+	device->flush_bio_sent = 1;
 }
 
 /*
@@ -3512,12 +3514,12 @@ static void write_dev_flush(struct btrfs_device *device)
  */
 static int wait_dev_flush(struct btrfs_device *device)
 {
-	struct request_queue *q = bdev_get_queue(device->bdev);
 	struct bio *bio = device->flush_bio;
 
-	if (!test_bit(QUEUE_FLAG_WC, &q->queue_flags))
+	if (!device->flush_bio_sent)
 		return 0;
 
+	device->flush_bio_sent = 0;
 	wait_for_completion_io(&device->flush_wait);
 
 	return bio->bi_error;
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 35327efecdbb..6f45fd60d15a 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -75,6 +75,7 @@ struct btrfs_device {
 	int can_discard;
 	int is_tgtdev_for_dev_replace;
 	int last_flush_error;
+	int flush_bio_sent;
 
 #ifdef __BTRFS_NEED_DEVICE_DATA_ORDERED
 	seqcount_t data_seqcount;
-- 
2.13.0
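
The early-return condition that patch 4/5 introduces in write_dev_flush can be read as a two-input predicate. A minimal sketch, where `should_submit_flush` is a hypothetical helper that does not exist in the patch:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Returns true when write_dev_flush would go on to submit the flush bio.
 * Mirrors the patched check: return early only if the force flag is
 * clear AND the queue does not advertise a write cache.
 */
bool should_submit_flush(bool force_dev_flush, bool queue_has_wc)
{
	if (!force_dev_flush && !queue_has_wc)
		return false;
	return true;
}
```

Of the four input combinations, the force flag changes the outcome only when the queue has no write cache. For a device with a write cache the flush is submitted either way.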



* [PATCH 5/5] btrfs: sysfs: export the force_dev_flush flag
  2017-06-15 16:49 [PATCH 0/5] Preallocate flush bio, sysfs tunable David Sterba
                   ` (3 preceding siblings ...)
  2017-06-15 16:49 ` [PATCH 4/5] btrfs: add fs flag to force device flushing David Sterba
@ 2017-06-15 16:49 ` David Sterba
  2017-06-15 22:24   ` Anand Jain
  4 siblings, 1 reply; 13+ messages in thread
From: David Sterba @ 2017-06-15 16:49 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

Add per-filesystem tunable to switch on/off the device barriers,
regardless of the actual device support. This is a debugging feature.

The path to the sysfs file is /sys/fs/btrfs/UUID/force_dev_flush,
allowed values are 0 and 1. Default is 0.

Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/sysfs.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 47 insertions(+)

diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index c2d5f3580b4c..197d43911936 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -487,12 +487,59 @@ static ssize_t quota_override_store(struct kobject *kobj,
 
 BTRFS_ATTR_RW(quota_override, quota_override_show, quota_override_store);
 
+static ssize_t force_dev_flush_show(struct kobject *kobj,
+				   struct kobj_attribute *a, char *buf)
+{
+	struct btrfs_fs_info *fs_info = to_fs_info(kobj);
+	int force;
+
+	if (!fs_info)
+		return -EPERM;
+
+	force = !!(test_bit(BTRFS_FS_FORCE_DEV_FLUSH, &fs_info->flags));
+	return snprintf(buf, PAGE_SIZE, "%d\n", force);
+}
+
+static ssize_t force_dev_flush_store(struct kobject *kobj,
+				    struct kobj_attribute *a,
+				    const char *buf, size_t len)
+{
+	struct btrfs_fs_info *fs_info = to_fs_info(kobj);
+	unsigned long force;
+	int err;
+
+	if (!fs_info)
+		return -EPERM;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	err = kstrtoul(buf, 10, &force);
+	if (err)
+		return err;
+	if (force > 1)
+		return -EINVAL;
+
+	if (force) {
+		set_bit(BTRFS_FS_FORCE_DEV_FLUSH, &fs_info->flags);
+		btrfs_info(fs_info, "Forced device flushes enabled");
+	} else {
+		clear_bit(BTRFS_FS_FORCE_DEV_FLUSH, &fs_info->flags);
+		btrfs_info(fs_info, "Forced device flushes disabled");
+	}
+
+	return len;
+}
+
+BTRFS_ATTR_RW(force_dev_flush, force_dev_flush_show, force_dev_flush_store);
+
 static const struct attribute *btrfs_attrs[] = {
 	BTRFS_ATTR_PTR(label),
 	BTRFS_ATTR_PTR(nodesize),
 	BTRFS_ATTR_PTR(sectorsize),
 	BTRFS_ATTR_PTR(clone_alignment),
 	BTRFS_ATTR_PTR(quota_override),
+	BTRFS_ATTR_PTR(force_dev_flush),
 	NULL,
 };
 
-- 
2.13.0
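
The input validation in force_dev_flush_store (parse a base-10 number, accept only 0 or 1) can be approximated in userspace. This sketch uses `strtoul` in place of the kernel's `kstrtoul`, so it is slightly more permissive (for example, `strtoul` skips leading whitespace). `parse_force_flag` is a hypothetical name.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical userspace analogue of the store handler's parsing.
 * Returns 0 and sets *force on success, a negative errno value otherwise.
 */
int parse_force_flag(const char *buf, int *force)
{
	char *end;
	unsigned long val;

	assert(buf != NULL && force != NULL);

	errno = 0;
	val = strtoul(buf, &end, 10);
	if (end == buf)
		return -EINVAL;		/* no digits at all */
	if (errno == ERANGE)
		return -ERANGE;		/* value does not fit */
	if (*end == '\n')		/* a write from echo ends with a newline */
		end++;
	if (*end != '\0')
		return -EINVAL;		/* trailing garbage */
	if (val > 1)
		return -EINVAL;		/* only 0 and 1 are allowed */

	*force = (int)val;
	return 0;
}
```

With this, "1\n" and "0" are accepted, while "2\n", "abc" and "1x" are rejected with -EINVAL, matching the "allowed values are 0 and 1" rule in the changelog.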



* Re: [PATCH 1/5] btrfs: preallocate device flush bio
  2017-06-15 16:49 ` [PATCH 1/5] btrfs: preallocate device flush bio David Sterba
@ 2017-06-15 21:53   ` Anand Jain
  2017-06-16 13:17     ` David Sterba
  0 siblings, 1 reply; 13+ messages in thread
From: Anand Jain @ 2017-06-15 21:53 UTC (permalink / raw)
  To: David Sterba, linux-btrfs



On 06/16/2017 12:49 AM, David Sterba wrote:
> For devices that support flushing, we allocate a bio, submit, wait for
> it and then free it. The bio allocation does not fail so ENOMEM is not a
> problem but we still may unnecessarily stress the allocation subsystem.
> 
> Instead, we can allocate the bio at the same time we allocate the
> device and reuse it each time we need to flush the barriers. The bio is
> reset before each use. Reference counting is simplified to just device
> allocation (get) and freeing (put).
> 
> Note for wait_dev_flush: we check the queue flush status again as we
> can't use the existence of bio as before.

  Looks good, a few items below.

> Signed-off-by: David Sterba <dsterba@suse.com>
> ---
>   fs/btrfs/disk-io.c | 24 ++++++------------------
>   fs/btrfs/volumes.c | 12 ++++++++++++
>   2 files changed, 18 insertions(+), 18 deletions(-)
> 
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 2b00ebff13f8..27d44d6ab775 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -3482,9 +3482,7 @@ static int write_dev_supers(struct btrfs_device *device,
>    */
>   static void btrfs_end_empty_barrier(struct bio *bio)
>   {
> -	if (bio->bi_private)
> -		complete(bio->bi_private);
> -	bio_put(bio);
> +	complete(bio->bi_private);
>   }
>   
>   /*
> @@ -3494,26 +3492,19 @@ static void btrfs_end_empty_barrier(struct bio *bio)
>   static void write_dev_flush(struct btrfs_device *device)
>   {
>   	struct request_queue *q = bdev_get_queue(device->bdev);
> -	struct bio *bio;
> +	struct bio *bio = device->flush_bio;
>   
>   	if (!test_bit(QUEUE_FLAG_WC, &q->queue_flags))
>   		return;
>   
> -	/*
> -	 * one reference for us, and we leave it for the
> -	 * caller
> -	 */
> -	device->flush_bio = NULL;
> -	bio = btrfs_io_bio_alloc(0);
> +	bio_reset(bio);
>   	bio->bi_end_io = btrfs_end_empty_barrier;
>   	bio->bi_bdev = device->bdev;
>   	bio->bi_opf = REQ_OP_WRITE | REQ_SYNC | REQ_PREFLUSH;
>   	init_completion(&device->flush_wait);
>   	bio->bi_private = &device->flush_wait;
> -	device->flush_bio = bio;
>   
> -	bio_get(bio);
> -	btrfsic_submit_bio(bio);
> +	submit_bio(bio);

  Originally it went through btrfsic. If this change is not an
  oversight, the changelog should mention it.

>   }
>   
>   /*
> @@ -3522,9 +3513,10 @@ static void write_dev_flush(struct btrfs_device *device)
>   static int wait_dev_flush(struct btrfs_device *device)
>   {
>   	int ret = 0;
> +	struct request_queue *q = bdev_get_queue(device->bdev);
>   	struct bio *bio = device->flush_bio;
>   
> -	if (!bio)
> +	if (!test_bit(QUEUE_FLAG_WC, &q->queue_flags))
>   		return 0;

  It returns here if the device is write through, but that setting can be
  toggled after write_dev_flush() has been called, such as:

   echo "write back" > /sys/block/sdd/queue/write_cache
   write_dev_flush(sdd)
   echo "write through" > /sys/block/sdd/queue/write_cache
   wait_dev_flush(sdd)

  So it would fail to check the error.
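
The sequence above can be replayed in a small userspace model. Everything here is hypothetical (`race_dev` and the `race_*` functions are not kernel code): `write_cache` stands in for `QUEUE_FLAG_WC`, and `flush_sent` for a per-device marker such as the `flush_bio_sent` field added in patch 4/5.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the state involved in the toggle scenario. */
struct race_dev {
	bool write_cache;	/* models QUEUE_FLAG_WC, may change at any time */
	bool in_flight;		/* a flush was submitted and not yet waited for */
	bool flush_sent;	/* per-device marker, cf. flush_bio_sent in 4/5 */
	int error;		/* result the flush will report */
};

/* Submit side: only sends a flush when the queue has a write cache. */
void race_write_flush(struct race_dev *d, int result)
{
	if (!d->write_cache)
		return;
	d->in_flight = true;
	d->flush_sent = true;
	d->error = result;
}

/* Wait side as in patch 1/5: re-checks the queue flag. */
int race_wait_recheck_flag(struct race_dev *d)
{
	if (!d->write_cache)
		return 0;	/* skips the wait even if a flush is in flight */
	d->in_flight = false;
	return d->error;
}

/* Wait side as in patch 4/5: consults what was actually sent. */
int race_wait_sent_marker(struct race_dev *d)
{
	if (!d->flush_sent)
		return 0;
	d->flush_sent = false;
	d->in_flight = false;
	return d->error;
}
```

If `write_cache` is cleared between `race_write_flush()` and the wait, the flag-based wait returns 0 and leaves the request in flight, while the marker-based wait still collects the error.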


>   	wait_for_completion(&device->flush_wait);
> @@ -3535,10 +3527,6 @@ static int wait_dev_flush(struct btrfs_device *device)
>   				BTRFS_DEV_STAT_FLUSH_ERRS);
>   	}
>   
> -	/* drop the reference from the wait == 0 run */
> -	bio_put(bio);
> -	device->flush_bio = NULL;
> -
>   	return ret;
>   }
>   
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 8bb1f4e5905a..251ae81e4363 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -242,6 +242,17 @@ static struct btrfs_device *__alloc_device(void)
>   	if (!dev)
>   		return ERR_PTR(-ENOMEM);
>   
> +	/*
> +	 * Preallocate a bio that's always going to be used for flushing device
> +	 * barriers and matches the device lifespan
> +	 */
> +	dev->flush_bio = bio_alloc_bioset(GFP_KERNEL, 0, NULL);

   Nice.

Thanks, Anand


> +	if (!dev->flush_bio) {
> +		kfree(dev);
> +		return ERR_PTR(-ENOMEM);
> +	}
> +	bio_get(dev->flush_bio);
> +
>   	INIT_LIST_HEAD(&dev->dev_list);
>   	INIT_LIST_HEAD(&dev->dev_alloc_list);
>   	INIT_LIST_HEAD(&dev->resized_list);
> @@ -838,6 +849,7 @@ static void __free_device(struct work_struct *work)
>   
>   	device = container_of(work, struct btrfs_device, rcu_work);
>   	rcu_string_free(device->name);
> +	bio_put(device->flush_bio);
>   	kfree(device);
>   }
>   
> 


* Re: [PATCH 3/5] btrfs: move dev stats accounting out of wait_dev_flush
  2017-06-15 16:49 ` [PATCH 3/5] btrfs: move dev stats accounting out of wait_dev_flush David Sterba
@ 2017-06-15 22:00   ` Anand Jain
  0 siblings, 0 replies; 13+ messages in thread
From: Anand Jain @ 2017-06-15 22:00 UTC (permalink / raw)
  To: David Sterba, linux-btrfs



On 06/16/2017 12:49 AM, David Sterba wrote:
> We should really just wait in wait_dev_flush and let the caller decide
> what to do with the error value.

  Nice.
  Reviewed-by: Anand Jain <anand.jain@oracle.com>


> Signed-off-by: David Sterba <dsterba@suse.com>
> ---
>   fs/btrfs/disk-io.c | 11 +++--------
>   1 file changed, 3 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 05ff81ecb887..59a732a13370 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -3512,7 +3512,6 @@ static void write_dev_flush(struct btrfs_device *device)
>    */
>   static int wait_dev_flush(struct btrfs_device *device)
>   {
> -	int ret = 0;
>   	struct request_queue *q = bdev_get_queue(device->bdev);
>   	struct bio *bio = device->flush_bio;
>   
> @@ -3521,13 +3520,7 @@ static int wait_dev_flush(struct btrfs_device *device)
>   
>   	wait_for_completion_io(&device->flush_wait);
>   
> -	if (bio->bi_error) {
> -		ret = bio->bi_error;
> -		btrfs_dev_stat_inc_and_print(device,
> -				BTRFS_DEV_STAT_FLUSH_ERRS);
> -	}
> -
> -	return ret;
> +	return bio->bi_error;
>   }
>   
>   static int check_barrier_error(struct btrfs_fs_devices *fsdevs)
> @@ -3586,6 +3579,8 @@ static int barrier_all_devices(struct btrfs_fs_info *info)
>   		ret = wait_dev_flush(dev);
>   		if (ret) {
>   			dev->last_flush_error = ret;
> +			btrfs_dev_stat_inc_and_print(dev,
> +					BTRFS_DEV_STAT_FLUSH_ERRS);
>   			errors_wait++;
>   		}
>   	}
> 


* Re: [PATCH 4/5] btrfs: add fs flag to force device flushing
  2017-06-15 16:49 ` [PATCH 4/5] btrfs: add fs flag to force device flushing David Sterba
@ 2017-06-15 22:08   ` Anand Jain
  2017-06-15 22:27     ` Anand Jain
  0 siblings, 1 reply; 13+ messages in thread
From: Anand Jain @ 2017-06-15 22:08 UTC (permalink / raw)
  To: David Sterba, linux-btrfs



On 06/16/2017 12:49 AM, David Sterba wrote:
> We need a device capable of device barriers so we can test the flush
> code. To aid testing, add a per-filesystem status flag that affects the
> barriers regardless of the device capabilities and obviously does not
> give the same guarantees.
> 
> It's off by default, sysfs tunable will follow.
> 
> Signed-off-by: David Sterba <dsterba@suse.com>
> ---
>   fs/btrfs/ctree.h   | 1 +
>   fs/btrfs/disk-io.c | 8 +++++---
>   fs/btrfs/volumes.h | 1 +
>   3 files changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index f0f5f28784b6..dcf4404f7d61 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -716,6 +716,7 @@ struct btrfs_delayed_root;
>   #define BTRFS_FS_LOG1_ERR			12
>   #define BTRFS_FS_LOG2_ERR			13
>   #define BTRFS_FS_QUOTA_OVERRIDE			14
> +#define BTRFS_FS_FORCE_DEV_FLUSH		15
>   
>   /*
>    * Indicate that a whole-filesystem exclusive operation is running
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 59a732a13370..659a3b4645d2 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -3494,7 +3494,8 @@ static void write_dev_flush(struct btrfs_device *device)
>   	struct request_queue *q = bdev_get_queue(device->bdev);
>   	struct bio *bio = device->flush_bio;
>   
> -	if (!test_bit(QUEUE_FLAG_WC, &q->queue_flags))
> +	if (!test_bit(BTRFS_FS_FORCE_DEV_FLUSH, &device->fs_info->flags)
> +	    && !test_bit(QUEUE_FLAG_WC, &q->queue_flags))
>   		return;


  Now I understand what you meant. But the most common case in our test
  setup is a device with write cache, so BTRFS_FS_FORCE_DEV_FLUSH does
  not bring any additional force, IMO.

Thanks, Anand

>   	bio_reset(bio);
> @@ -3505,6 +3506,7 @@ static void write_dev_flush(struct btrfs_device *device)
>   	bio->bi_private = &device->flush_wait;
>   
>   	submit_bio(bio);
> +	device->flush_bio_sent = 1;
>   }
>   
>   /*
> @@ -3512,12 +3514,12 @@ static void write_dev_flush(struct btrfs_device *device)
>    */
>   static int wait_dev_flush(struct btrfs_device *device)
>   {
> -	struct request_queue *q = bdev_get_queue(device->bdev);
>   	struct bio *bio = device->flush_bio;
>   
> -	if (!test_bit(QUEUE_FLAG_WC, &q->queue_flags))
> +	if (!device->flush_bio_sent)
>   		return 0;
>   
> +	device->flush_bio_sent = 0;
>   	wait_for_completion_io(&device->flush_wait);
>   
>   	return bio->bi_error;
> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
> index 35327efecdbb..6f45fd60d15a 100644
> --- a/fs/btrfs/volumes.h
> +++ b/fs/btrfs/volumes.h
> @@ -75,6 +75,7 @@ struct btrfs_device {
>   	int can_discard;
>   	int is_tgtdev_for_dev_replace;
>   	int last_flush_error;
> +	int flush_bio_sent;
>   
>   #ifdef __BTRFS_NEED_DEVICE_DATA_ORDERED
>   	seqcount_t data_seqcount;
> 


* Re: [PATCH 5/5] btrfs: sysfs: export the force_dev_flush flag
  2017-06-15 16:49 ` [PATCH 5/5] btrfs: sysfs: export the force_dev_flush flag David Sterba
@ 2017-06-15 22:24   ` Anand Jain
  0 siblings, 0 replies; 13+ messages in thread
From: Anand Jain @ 2017-06-15 22:24 UTC (permalink / raw)
  To: David Sterba, linux-btrfs



On 06/16/2017 12:49 AM, David Sterba wrote:
> Add per-filesystem tunable to switch on/off the device barriers,
> regardless of the actual device support. This is a debugging feature.
> 
> The path to the sysfs file is /sys/fs/btrfs/UUID/force_dev_flush,
> allowed values are 0 and 1. Default is 0.


  Anyway the flush command won't touch the device if it's write through.
  So unless I am missing something, IMO we don't need something
  like this.

-------------------
generic_make_request_checks(struct bio *bio)
::
	/*
	 * Filter flush bio's early so that make_request based
	 * drivers without flush support don't have to worry
	 * about them.
	 */
	if (op_is_flush(bio->bi_opf) &&
	    !test_bit(QUEUE_FLAG_WC, &q->queue_flags)) {
		bio->bi_opf &= ~(REQ_PREFLUSH | REQ_FUA);
		if (!nr_sectors) {
			err = 0;
			goto end_io;
		}
	}
-------------------



Thanks, Anand


> Signed-off-by: David Sterba <dsterba@suse.com>
> ---
>   fs/btrfs/sysfs.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 47 insertions(+)
> 
> diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
> index c2d5f3580b4c..197d43911936 100644
> --- a/fs/btrfs/sysfs.c
> +++ b/fs/btrfs/sysfs.c
> @@ -487,12 +487,59 @@ static ssize_t quota_override_store(struct kobject *kobj,
>   
>   BTRFS_ATTR_RW(quota_override, quota_override_show, quota_override_store);
>   
> +static ssize_t force_dev_flush_show(struct kobject *kobj,
> +				   struct kobj_attribute *a, char *buf)
> +{
> +	struct btrfs_fs_info *fs_info = to_fs_info(kobj);
> +	int force;
> +
> +	if (!fs_info)
> +		return -EPERM;
> +
> +	force = !!(test_bit(BTRFS_FS_FORCE_DEV_FLUSH, &fs_info->flags));
> +	return snprintf(buf, PAGE_SIZE, "%d\n", force);
> +}
> +
> +static ssize_t force_dev_flush_store(struct kobject *kobj,
> +				    struct kobj_attribute *a,
> +				    const char *buf, size_t len)
> +{
> +	struct btrfs_fs_info *fs_info = to_fs_info(kobj);
> +	unsigned long force;
> +	int err;
> +
> +	if (!fs_info)
> +		return -EPERM;
> +
> +	if (!capable(CAP_SYS_ADMIN))
> +		return -EPERM;
> +
> +	err = kstrtoul(buf, 10, &force);
> +	if (err)
> +		return err;
> +	if (force > 1)
> +		return -EINVAL;
> +
> +	if (force) {
> +		set_bit(BTRFS_FS_FORCE_DEV_FLUSH, &fs_info->flags);
> +		btrfs_info(fs_info, "Forced device flushes enabled");
> +	} else {
> +		clear_bit(BTRFS_FS_FORCE_DEV_FLUSH, &fs_info->flags);
> +		btrfs_info(fs_info, "Forced device flushes disabled");
> +	}
> +
> +	return len;
> +}
> +
> +BTRFS_ATTR_RW(force_dev_flush, force_dev_flush_show, force_dev_flush_store);
> +
>   static const struct attribute *btrfs_attrs[] = {
>   	BTRFS_ATTR_PTR(label),
>   	BTRFS_ATTR_PTR(nodesize),
>   	BTRFS_ATTR_PTR(sectorsize),
>   	BTRFS_ATTR_PTR(clone_alignment),
>   	BTRFS_ATTR_PTR(quota_override),
> +	BTRFS_ATTR_PTR(force_dev_flush),
>   	NULL,
>   };
>   
> 


* Re: [PATCH 4/5] btrfs: add fs flag to force device flushing
  2017-06-15 22:08   ` Anand Jain
@ 2017-06-15 22:27     ` Anand Jain
  2017-06-16 14:03       ` David Sterba
  0 siblings, 1 reply; 13+ messages in thread
From: Anand Jain @ 2017-06-15 22:27 UTC (permalink / raw)
  To: David Sterba, linux-btrfs





>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>> index 59a732a13370..659a3b4645d2 100644
>> --- a/fs/btrfs/disk-io.c
>> +++ b/fs/btrfs/disk-io.c
>> @@ -3494,7 +3494,8 @@ static void write_dev_flush(struct btrfs_device 
>> *device)
>>       struct request_queue *q = bdev_get_queue(device->bdev);
>>       struct bio *bio = device->flush_bio;
>> -    if (!test_bit(QUEUE_FLAG_WC, &q->queue_flags))
>> +    if (!test_bit(BTRFS_FS_FORCE_DEV_FLUSH, &device->fs_info->flags)
>> +        && !test_bit(QUEUE_FLAG_WC, &q->queue_flags))
>>           return;
> 
> 
>   Now I understand what you meant. But the most common case in our test
>   setup is a device with write cache, so BTRFS_FS_FORCE_DEV_FLUSH does
>   not bring any additional force, IMO.
> 
> Thanks, Anand

  Another idea is that we could remove

    !test_bit(QUEUE_FLAG_WC, &q->queue_flags)

  whose only purpose is to fail early.

  If we remove it, our code behaves consistently with or
  without the write cache.

Thanks, Anand


* Re: [PATCH 1/5] btrfs: preallocate device flush bio
  2017-06-15 21:53   ` Anand Jain
@ 2017-06-16 13:17     ` David Sterba
  0 siblings, 0 replies; 13+ messages in thread
From: David Sterba @ 2017-06-16 13:17 UTC (permalink / raw)
  To: Anand Jain; +Cc: linux-btrfs

On Fri, Jun 16, 2017 at 05:53:12AM +0800, Anand Jain wrote:
> On 06/16/2017 12:49 AM, David Sterba wrote:
> > For devices that support flushing, we allocate a bio, submit, wait for
> > it and then free it. The bio allocation does not fail so ENOMEM is not a
> > problem but we still may unnecessarily stress the allocation subsystem.
> > 
> > Instead, we can allocate the bio at the same time we allocate the
> > device and reuse it each time we need to flush the barriers. The bio is
> > reset before each use. Reference counting is simplified to just device
> > allocation (get) and freeing (put).
> > 
> > Note for wait_dev_flush: we check the queue flush status again as we
> > can't use the existence of bio as before.
> 
>   Looks good few items as below..
> 
> > Signed-off-by: David Sterba <dsterba@suse.com>
> > ---
> >   fs/btrfs/disk-io.c | 24 ++++++------------------
> >   fs/btrfs/volumes.c | 12 ++++++++++++
> >   2 files changed, 18 insertions(+), 18 deletions(-)
> > 
> > diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> > index 2b00ebff13f8..27d44d6ab775 100644
> > --- a/fs/btrfs/disk-io.c
> > +++ b/fs/btrfs/disk-io.c
> > @@ -3482,9 +3482,7 @@ static int write_dev_supers(struct btrfs_device *device,
> >    */
> >   static void btrfs_end_empty_barrier(struct bio *bio)
> >   {
> > -	if (bio->bi_private)
> > -		complete(bio->bi_private);
> > -	bio_put(bio);
> > +	complete(bio->bi_private);
> >   }
> >   
> >   /*
> > @@ -3494,26 +3492,19 @@ static void btrfs_end_empty_barrier(struct bio *bio)
> >   static void write_dev_flush(struct btrfs_device *device)
> >   {
> >   	struct request_queue *q = bdev_get_queue(device->bdev);
> > -	struct bio *bio;
> > +	struct bio *bio = device->flush_bio;
> >   
> >   	if (!test_bit(QUEUE_FLAG_WC, &q->queue_flags))
> >   		return;
> >   
> > -	/*
> > -	 * one reference for us, and we leave it for the
> > -	 * caller
> > -	 */
> > -	device->flush_bio = NULL;
> > -	bio = btrfs_io_bio_alloc(0);
> > +	bio_reset(bio);
> >   	bio->bi_end_io = btrfs_end_empty_barrier;
> >   	bio->bi_bdev = device->bdev;
> >   	bio->bi_opf = REQ_OP_WRITE | REQ_SYNC | REQ_PREFLUSH;
> >   	init_completion(&device->flush_wait);
> >   	bio->bi_private = &device->flush_wait;
> > -	device->flush_bio = bio;
> >   
> > -	bio_get(bio);
> > -	btrfsic_submit_bio(bio);
> > +	submit_bio(bio);
> 
>   Originally it went through btrfsic. If this change is not an
>   oversight, the changelog should mention it.

Right, avoiding it is intentional, I just forgot to mention it in the
changelog. The bio has no data attached so the integrity checker will skip
it.

> >   /*
> > @@ -3522,9 +3513,10 @@ static void write_dev_flush(struct btrfs_device *device)
> >   static int wait_dev_flush(struct btrfs_device *device)
> >   {
> >   	int ret = 0;
> > +	struct request_queue *q = bdev_get_queue(device->bdev);
> >   	struct bio *bio = device->flush_bio;
> >   
> > -	if (!bio)
> > +	if (!test_bit(QUEUE_FLAG_WC, &q->queue_flags))
> >   		return 0;
> 
>   It returns here if the device is write through, but that setting can be
>   toggled after write_dev_flush() has been called, such as:
> 
>    echo "write back" > /sys/block/sdd/queue/write_cache
>    write_dev_flush(sdd)
>    echo "write through" > /sys/block/sdd/queue/write_cache
>    wait_dev_flush(sdd)
> 
>   So it would fail to check the error.

Yeah, the bio would stay in flight. I had to read more about the flushes
but I apparently mixed it up with FUA. Toggling the write cache needs to be
handled properly, which means pulling in the relevant bits from patch 4/5.
The force_dev_flush sysfs knob does not make sense, as you noted.
Thanks.


* Re: [PATCH 4/5] btrfs: add fs flag to force device flushing
  2017-06-15 22:27     ` Anand Jain
@ 2017-06-16 14:03       ` David Sterba
  0 siblings, 0 replies; 13+ messages in thread
From: David Sterba @ 2017-06-16 14:03 UTC (permalink / raw)
  To: Anand Jain; +Cc: linux-btrfs

On Fri, Jun 16, 2017 at 06:27:20AM +0800, Anand Jain wrote:
> 
> 
> 
> 
> >> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> >> index 59a732a13370..659a3b4645d2 100644
> >> --- a/fs/btrfs/disk-io.c
> >> +++ b/fs/btrfs/disk-io.c
> >> @@ -3494,7 +3494,8 @@ static void write_dev_flush(struct btrfs_device 
> >> *device)
> >>       struct request_queue *q = bdev_get_queue(device->bdev);
> >>       struct bio *bio = device->flush_bio;
> >> -    if (!test_bit(QUEUE_FLAG_WC, &q->queue_flags))
> >> +    if (!test_bit(BTRFS_FS_FORCE_DEV_FLUSH, &device->fs_info->flags)
> >> +        && !test_bit(QUEUE_FLAG_WC, &q->queue_flags))
> >>           return;
> > 
> > 
> >   Now I understand what you meant. But the most common case in our test
> >   setup is a device with write cache, so BTRFS_FS_FORCE_DEV_FLUSH does
> >   not bring any additional force, IMO.
> > 
> > Thanks, Anand
> 
>   Another idea is that we could remove
> 
>     !test_bit(QUEUE_FLAG_WC, &q->queue_flags)
> 
>   whose only purpose is to fail early.

I'd prefer to keep it that way.

>   If we remove it, our code behaves consistently with or
>   without the write cache.


Thread overview: 13+ messages
2017-06-15 16:49 [PATCH 0/5] Preallocate flush bio, sysfs tunable David Sterba
2017-06-15 16:49 ` [PATCH 1/5] btrfs: preallocate device flush bio David Sterba
2017-06-15 21:53   ` Anand Jain
2017-06-16 13:17     ` David Sterba
2017-06-15 16:49 ` [PATCH 2/5] btrfs: account as waiting for IO, while waiting for the flush bio completion David Sterba
2017-06-15 16:49 ` [PATCH 3/5] btrfs: move dev stats accounting out of wait_dev_flush David Sterba
2017-06-15 22:00   ` Anand Jain
2017-06-15 16:49 ` [PATCH 4/5] btrfs: add fs flag to force device flushing David Sterba
2017-06-15 22:08   ` Anand Jain
2017-06-15 22:27     ` Anand Jain
2017-06-16 14:03       ` David Sterba
2017-06-15 16:49 ` [PATCH 5/5] btrfs: sysfs: export the force_dev_flush flag David Sterba
2017-06-15 22:24   ` Anand Jain
