linux-raid.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/3] Bitmap percentage flushing
@ 2022-10-13 22:41 Jonathan Derrick
  2022-10-13 22:41 ` [PATCH v2 1/3] md/bitmap: Add chunk-threshold unplugging Jonathan Derrick
                   ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: Jonathan Derrick @ 2022-10-13 22:41 UTC (permalink / raw)
  To: Song Liu
  Cc: linux-raid, linux-kernel, jonathan.derrick, jonathanx.sk.derrick,
	Mariusz Tkaczyk, Jonathan Derrick

This introduces a percentage-flushing mechanism that works in-tandem to the
mdadm delay timer. The percentage argument is based on the number of chunks
dirty (rather than percentage), due to large drives requiring smaller and
smaller percentages (eg, 32TB drives-> 1% is 320GB).

This set hopes to provide a way to make the bitmap flushing more consistent. It
was observed that a synchronous, random write qd1 workload, could make bitmap
writes easily become almost half of the I/O. And in similar workloads with
different timing, it was several minutes between bitmap updates. This is too
inconsistent to be reliable.

This first and second patches adds the flush_threshold parameter. The default
value of 0 defines the default behavior: unplugging immediately just as before.
With a flush-threshold value of 1, it becomes more consistent and paranoid,
flushing on nearly every I/O, leading to a 40% or greater situation. From
there, the flush_threshold can be defined higher for those situations where
power loss is rare and full resync can be tolerated.

The third patch converts the daemon worker to an actual timer. This makes it
more consistent and removes some ugly code.

Jonathan Derrick (3):
  md/bitmap: Add chunk-threshold unplugging
  md/bitmap: Add sysfs interface for flush threshold
  md/bitmap: Convert daemon_work to proper timer

 Documentation/admin-guide/md.rst |  5 ++
 drivers/md/md-bitmap.c           | 98 +++++++++++++++++++++++++-------
 drivers/md/md-bitmap.h           |  4 +-
 drivers/md/md.c                  |  9 ++-
 drivers/md/md.h                  |  2 +
 5 files changed, 93 insertions(+), 25 deletions(-)

-- 
2.31.1


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH v2 1/3] md/bitmap: Add chunk-threshold unplugging
  2022-10-13 22:41 [PATCH v2 0/3] Bitmap percentage flushing Jonathan Derrick
@ 2022-10-13 22:41 ` Jonathan Derrick
  2022-10-14  1:11   ` Jonathan Derrick
  2022-10-13 22:41 ` [PATCH v2 2/3] md/bitmap: Add sysfs interface for flush threshold Jonathan Derrick
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 7+ messages in thread
From: Jonathan Derrick @ 2022-10-13 22:41 UTC (permalink / raw)
  To: Song Liu
  Cc: linux-raid, linux-kernel, jonathan.derrick, jonathanx.sk.derrick,
	Mariusz Tkaczyk, Jonathan Derrick

Add a mechanism to allow bitmap unplugging and flushing to wait until it
has surpassed a defined threshold of dirty chunks. This allows certain
high I/O write workloads to make good forward progress between bitmap
updates or provide reliable bitmap consistency. The default behavior is
previous behavior of always unplugging when called.

Signed-off-by: Jonathan Derrick <jonathan.derrick@linux.dev>
---
 drivers/md/md-bitmap.c | 35 +++++++++++++++++++++++++++++++----
 drivers/md/md-bitmap.h |  1 +
 drivers/md/md.h        |  1 +
 3 files changed, 33 insertions(+), 4 deletions(-)

diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c
index bf6dffadbe6f..c5c77f8371a8 100644
--- a/drivers/md/md-bitmap.c
+++ b/drivers/md/md-bitmap.c
@@ -1004,7 +1004,7 @@ static int md_bitmap_file_test_bit(struct bitmap *bitmap, sector_t block)
 /* this gets called when the md device is ready to unplug its underlying
  * (slave) device queues -- before we let any writes go down, we need to
  * sync the dirty pages of the bitmap file to disk */
-void md_bitmap_unplug(struct bitmap *bitmap)
+static void __md_bitmap_unplug(struct bitmap *bitmap)
 {
 	unsigned long i;
 	int dirty, need_write;
@@ -1038,6 +1038,33 @@ void md_bitmap_unplug(struct bitmap *bitmap)
 	if (test_bit(BITMAP_WRITE_ERROR, &bitmap->flags))
 		md_bitmap_file_kick(bitmap);
 }
+
+/*
+ * Conditional unplug based on user-defined parameter
+ * Defaults to unconditional behavior
+ */
+void md_bitmap_unplug(struct bitmap *bitmap)
+{
+	unsigned int flush_threshold = bitmap->mddev->bitmap_info.flush_threshold;
+
+	if (!flush_threshold) {
+		__md_bitmap_unplug(bitmap);
+	} else {
+		struct bitmap_page *bp = bitmap->counts.bp;
+		unsigned long pages = bitmap->counts.pages;
+		unsigned long k, count = 0;
+
+		for (k = 0; k < pages; k++)
+			if (bp[k].map && !bp[k].hijacked)
+				count += bp[k].count;
+
+		if (count - bitmap->unplugged_count > flush_threshold) {
+			bitmap->unplugged_count = count;
+			md_bitmap_daemon_work(&bitmap->mddev->daemon_timer);
+			__md_bitmap_unplug(bitmap);
+		}
+	}
+}
 EXPORT_SYMBOL(md_bitmap_unplug);
 
 static void md_bitmap_set_memory_bits(struct bitmap *bitmap, sector_t offset, int needed);
@@ -2012,9 +2039,9 @@ int md_bitmap_copy_from_slot(struct mddev *mddev, int slot,
 		for (i = 0; i < bitmap->storage.file_pages; i++)
 			if (test_page_attr(bitmap, i, BITMAP_PAGE_PENDING))
 				set_page_attr(bitmap, i, BITMAP_PAGE_NEEDWRITE);
-		md_bitmap_unplug(bitmap);
+		__md_bitmap_unplug(bitmap);
 	}
-	md_bitmap_unplug(mddev->bitmap);
+	__md_bitmap_unplug(mddev->bitmap);
 	*low = lo;
 	*high = hi;
 	md_bitmap_free(bitmap);
@@ -2246,7 +2273,7 @@ int md_bitmap_resize(struct bitmap *bitmap, sector_t blocks,
 	spin_unlock_irq(&bitmap->counts.lock);
 
 	if (!init) {
-		md_bitmap_unplug(bitmap);
+		__md_bitmap_unplug(bitmap);
 		bitmap->mddev->pers->quiesce(bitmap->mddev, 0);
 	}
 	ret = 0;
diff --git a/drivers/md/md-bitmap.h b/drivers/md/md-bitmap.h
index cfd7395de8fd..49a93d8ff307 100644
--- a/drivers/md/md-bitmap.h
+++ b/drivers/md/md-bitmap.h
@@ -223,6 +223,7 @@ struct bitmap {
 	unsigned long daemon_lastrun; /* jiffies of last run */
 	unsigned long last_end_sync; /* when we lasted called end_sync to
 				      * update bitmap with resync progress */
+	unsigned long unplugged_count; /* last dirty count from md_bitmap_unplug */
 
 	atomic_t pending_writes; /* pending writes to the bitmap file */
 	wait_queue_head_t write_wait;
diff --git a/drivers/md/md.h b/drivers/md/md.h
index b4e2d8b87b61..1a558cb18bd4 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -501,6 +501,7 @@ struct mddev {
 		int			external;
 		int			nodes; /* Maximum number of nodes in the cluster */
 		char                    cluster_name[64]; /* Name of the cluster */
+		unsigned int		flush_threshold; /* how many dirty chunks between updates */
 	} bitmap_info;
 
 	atomic_t			max_corr_read_errors; /* max read retries */
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH v2 2/3] md/bitmap: Add sysfs interface for flush threshold
  2022-10-13 22:41 [PATCH v2 0/3] Bitmap percentage flushing Jonathan Derrick
  2022-10-13 22:41 ` [PATCH v2 1/3] md/bitmap: Add chunk-threshold unplugging Jonathan Derrick
@ 2022-10-13 22:41 ` Jonathan Derrick
  2022-10-13 22:41 ` [PATCH v2 3/3] md/bitmap: Convert daemon_work to proper timer Jonathan Derrick
  2022-10-14 21:10 ` [PATCH v2 0/3] Bitmap percentage flushing John Stoffel
  3 siblings, 0 replies; 7+ messages in thread
From: Jonathan Derrick @ 2022-10-13 22:41 UTC (permalink / raw)
  To: Song Liu
  Cc: linux-raid, linux-kernel, jonathan.derrick, jonathanx.sk.derrick,
	Mariusz Tkaczyk, Jonathan Derrick

Adds a sysfs interface in the bitmap device for setting the chunk flush
threshold. This is an unsigned integer value which defines the amount of
dirty chunks allowed to be pending between bitmap flushes.

Signed-off-by: Jonathan Derrick <jonathan.derrick@linux.dev>
---
 Documentation/admin-guide/md.rst |  5 +++++
 drivers/md/md-bitmap.c           | 33 ++++++++++++++++++++++++++++++++
 2 files changed, 38 insertions(+)

diff --git a/Documentation/admin-guide/md.rst b/Documentation/admin-guide/md.rst
index d8fc9a59c086..d688ae4065cf 100644
--- a/Documentation/admin-guide/md.rst
+++ b/Documentation/admin-guide/md.rst
@@ -401,6 +401,11 @@ All md devices contain:
      once the array becomes non-degraded, and this fact has been
      recorded in the metadata.
 
+  bitmap/flush_threshold
+     The number of outstanding dirty chunks that are allowed to be pending
+     before unplugging the bitmap queue. The default behavior is to always
+     unplugging the queue when requested.
+
   consistency_policy
      This indicates how the array maintains consistency in case of unexpected
      shutdown. It can be:
diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c
index c5c77f8371a8..cd8250368860 100644
--- a/drivers/md/md-bitmap.c
+++ b/drivers/md/md-bitmap.c
@@ -2652,6 +2652,38 @@ static struct md_sysfs_entry max_backlog_used =
 __ATTR(max_backlog_used, S_IRUGO | S_IWUSR,
        behind_writes_used_show, behind_writes_used_reset);
 
+static ssize_t
+bitmap_flush_threshold_show(struct mddev *mddev, char *page)
+{
+	ssize_t ret;
+	spin_lock(&mddev->lock);
+	if (mddev->bitmap == NULL)
+		ret = sprintf(page, "0\n");
+	else
+		ret = sprintf(page, "%u\n",
+			      mddev->bitmap_info.flush_threshold);
+	spin_unlock(&mddev->lock);
+	return ret;
+}
+
+static ssize_t
+bitmap_flush_threshold_store(struct mddev *mddev, const char *buf, size_t len)
+{
+	unsigned int thresh;
+	int ret;
+	if (!mddev->bitmap)
+		return -ENOENT;
+	ret = kstrtouint(buf, 10, &thresh);
+	if (ret)
+		return ret;
+	mddev->bitmap_info.flush_threshold = thresh;
+	return len;
+}
+
+static struct md_sysfs_entry bitmap_flush_threshold =
+__ATTR(flush_threshold, S_IRUGO | S_IWUSR,
+       bitmap_flush_threshold_show, bitmap_flush_threshold_store);
+
 static struct attribute *md_bitmap_attrs[] = {
 	&bitmap_location.attr,
 	&bitmap_space.attr,
@@ -2661,6 +2693,7 @@ static struct attribute *md_bitmap_attrs[] = {
 	&bitmap_metadata.attr,
 	&bitmap_can_clear.attr,
 	&max_backlog_used.attr,
+	&bitmap_flush_threshold.attr,
 	NULL
 };
 const struct attribute_group md_bitmap_group = {
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH v2 3/3] md/bitmap: Convert daemon_work to proper timer
  2022-10-13 22:41 [PATCH v2 0/3] Bitmap percentage flushing Jonathan Derrick
  2022-10-13 22:41 ` [PATCH v2 1/3] md/bitmap: Add chunk-threshold unplugging Jonathan Derrick
  2022-10-13 22:41 ` [PATCH v2 2/3] md/bitmap: Add sysfs interface for flush threshold Jonathan Derrick
@ 2022-10-13 22:41 ` Jonathan Derrick
  2022-10-14 21:10 ` [PATCH v2 0/3] Bitmap percentage flushing John Stoffel
  3 siblings, 0 replies; 7+ messages in thread
From: Jonathan Derrick @ 2022-10-13 22:41 UTC (permalink / raw)
  To: Song Liu
  Cc: linux-raid, linux-kernel, jonathan.derrick, jonathanx.sk.derrick,
	Mariusz Tkaczyk, Jonathan Derrick

It was observed that with certain high I/O workloads that the daemon
work may never run, preventing the bitmap from fully flushing. Ensure
the bitmap fully flushes by converting the daemon worker to a proper
timer.

Signed-off-by: Jonathan Derrick <jonathan.derrick@linux.dev>
---
 drivers/md/md-bitmap.c | 30 +++++++++++++-----------------
 drivers/md/md-bitmap.h |  3 +--
 drivers/md/md.c        |  9 +++++++--
 drivers/md/md.h        |  1 +
 4 files changed, 22 insertions(+), 21 deletions(-)

diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c
index cd8250368860..34feb906243f 100644
--- a/drivers/md/md-bitmap.c
+++ b/drivers/md/md-bitmap.c
@@ -1250,8 +1250,9 @@ static bitmap_counter_t *md_bitmap_get_counter(struct bitmap_counts *bitmap,
  *			out to disk
  */
 
-void md_bitmap_daemon_work(struct mddev *mddev)
+void md_bitmap_daemon_work(struct timer_list *t)
 {
+	struct mddev *mddev = from_timer(mddev, t, daemon_timer);
 	struct bitmap *bitmap;
 	unsigned long j;
 	unsigned long nextpage;
@@ -1267,11 +1268,7 @@ void md_bitmap_daemon_work(struct mddev *mddev)
 		mutex_unlock(&mddev->bitmap_info.mutex);
 		return;
 	}
-	if (time_before(jiffies, bitmap->daemon_lastrun
-			+ mddev->bitmap_info.daemon_sleep))
-		goto done;
 
-	bitmap->daemon_lastrun = jiffies;
 	if (bitmap->allclean) {
 		mddev->thread->timeout = MAX_SCHEDULE_TIMEOUT;
 		goto done;
@@ -1372,6 +1369,7 @@ void md_bitmap_daemon_work(struct mddev *mddev)
 	if (bitmap->allclean == 0)
 		mddev->thread->timeout =
 			mddev->bitmap_info.daemon_sleep;
+	mod_timer(&mddev->daemon_timer, jiffies + mddev->bitmap_info.daemon_sleep);
 	mutex_unlock(&mddev->bitmap_info.mutex);
 }
 
@@ -1735,21 +1733,16 @@ void md_bitmap_dirty_bits(struct bitmap *bitmap, unsigned long s, unsigned long
 void md_bitmap_flush(struct mddev *mddev)
 {
 	struct bitmap *bitmap = mddev->bitmap;
-	long sleep;
 
 	if (!bitmap) /* there was no bitmap */
 		return;
 
 	/* run the daemon_work three time to ensure everything is flushed
-	 * that can be
-	 */
-	sleep = mddev->bitmap_info.daemon_sleep * 2;
-	bitmap->daemon_lastrun -= sleep;
-	md_bitmap_daemon_work(mddev);
-	bitmap->daemon_lastrun -= sleep;
-	md_bitmap_daemon_work(mddev);
-	bitmap->daemon_lastrun -= sleep;
-	md_bitmap_daemon_work(mddev);
+	* that can be
+	*/
+	md_bitmap_daemon_work(&mddev->daemon_timer);
+	md_bitmap_daemon_work(&mddev->daemon_timer);
+	md_bitmap_daemon_work(&mddev->daemon_timer);
 	if (mddev->bitmap_info.external)
 		md_super_wait(mddev);
 	md_bitmap_update_sb(bitmap);
@@ -1826,7 +1819,7 @@ void md_bitmap_destroy(struct mddev *mddev)
 	mutex_unlock(&mddev->bitmap_info.mutex);
 	if (mddev->thread)
 		mddev->thread->timeout = MAX_SCHEDULE_TIMEOUT;
-
+	del_timer_sync(&mddev->daemon_timer);
 	md_bitmap_free(bitmap);
 }
 
@@ -1904,7 +1897,10 @@ struct bitmap *md_bitmap_create(struct mddev *mddev, int slot)
 	if (err)
 		goto error;
 
-	bitmap->daemon_lastrun = jiffies;
+	timer_setup(&mddev->daemon_timer, md_bitmap_daemon_work, 0);
+	mddev->daemon_timer.expires = jiffies + mddev->bitmap_info.daemon_sleep;
+	add_timer(&mddev->daemon_timer);
+
 	err = md_bitmap_resize(bitmap, blocks, mddev->bitmap_info.chunksize, 1);
 	if (err)
 		goto error;
diff --git a/drivers/md/md-bitmap.h b/drivers/md/md-bitmap.h
index 49a93d8ff307..b7e8f2266bf2 100644
--- a/drivers/md/md-bitmap.h
+++ b/drivers/md/md-bitmap.h
@@ -220,7 +220,6 @@ struct bitmap {
 	 * the bitmap daemon - periodically wakes up and sweeps the bitmap
 	 * file, cleaning up bits and flushing out pages to disk as necessary
 	 */
-	unsigned long daemon_lastrun; /* jiffies of last run */
 	unsigned long last_end_sync; /* when we lasted called end_sync to
 				      * update bitmap with resync progress */
 	unsigned long unplugged_count; /* last dirty count from md_bitmap_unplug */
@@ -265,7 +264,7 @@ void md_bitmap_sync_with_cluster(struct mddev *mddev,
 				 sector_t new_lo, sector_t new_hi);
 
 void md_bitmap_unplug(struct bitmap *bitmap);
-void md_bitmap_daemon_work(struct mddev *mddev);
+void md_bitmap_daemon_work(struct timer_list *t);
 
 int md_bitmap_resize(struct bitmap *bitmap, sector_t blocks,
 		     int chunksize, int init);
diff --git a/drivers/md/md.c b/drivers/md/md.c
index afaf36b2f6ab..9f8a9f62b3db 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -9265,8 +9265,13 @@ void md_check_recovery(struct mddev *mddev)
 	if (mddev->suspended)
 		return;
 
-	if (mddev->bitmap)
-		md_bitmap_daemon_work(mddev);
+	if (mddev->bitmap) {
+		spin_lock(&pers_lock);
+		if (mddev->bitmap->allclean == 0)
+			mddev->thread->timeout =
+				mddev->bitmap_info.daemon_sleep;
+		spin_unlock(&pers_lock);
+	}
 
 	if (signal_pending(current)) {
 		if (mddev->pers->sync_request && !mddev->external) {
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 1a558cb18bd4..578cc496c325 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -503,6 +503,7 @@ struct mddev {
 		char                    cluster_name[64]; /* Name of the cluster */
 		unsigned int		flush_threshold; /* how many dirty chunks between updates */
 	} bitmap_info;
+	struct timer_list		daemon_timer;
 
 	atomic_t			max_corr_read_errors; /* max read retries */
 	struct list_head		all_mddevs;
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH v2 1/3] md/bitmap: Add chunk-threshold unplugging
  2022-10-13 22:41 ` [PATCH v2 1/3] md/bitmap: Add chunk-threshold unplugging Jonathan Derrick
@ 2022-10-14  1:11   ` Jonathan Derrick
  0 siblings, 0 replies; 7+ messages in thread
From: Jonathan Derrick @ 2022-10-14  1:11 UTC (permalink / raw)
  To: Song Liu
  Cc: linux-raid, linux-kernel, jonathan.derrick, jonathanx.sk.derrick,
	Mariusz Tkaczyk



On 10/13/2022 4:41 PM, Jonathan Derrick wrote:
> Add a mechanism to allow bitmap unplugging and flushing to wait until it
> has surpassed a defined threshold of dirty chunks. This allows certain
> high I/O write workloads to make good forward progress between bitmap
> updates or provide reliable bitmap consistency. The default behavior is
> previous behavior of always unplugging when called.
> 
> Signed-off-by: Jonathan Derrick <jonathan.derrick@linux.dev>
> ---
>  drivers/md/md-bitmap.c | 35 +++++++++++++++++++++++++++++++----
>  drivers/md/md-bitmap.h |  1 +
>  drivers/md/md.h        |  1 +
>  3 files changed, 33 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c
> index bf6dffadbe6f..c5c77f8371a8 100644
> --- a/drivers/md/md-bitmap.c
> +++ b/drivers/md/md-bitmap.c
> @@ -1004,7 +1004,7 @@ static int md_bitmap_file_test_bit(struct bitmap *bitmap, sector_t block)
>  /* this gets called when the md device is ready to unplug its underlying
>   * (slave) device queues -- before we let any writes go down, we need to
>   * sync the dirty pages of the bitmap file to disk */
> -void md_bitmap_unplug(struct bitmap *bitmap)
> +static void __md_bitmap_unplug(struct bitmap *bitmap)
>  {
>  	unsigned long i;
>  	int dirty, need_write;
> @@ -1038,6 +1038,33 @@ void md_bitmap_unplug(struct bitmap *bitmap)
>  	if (test_bit(BITMAP_WRITE_ERROR, &bitmap->flags))
>  		md_bitmap_file_kick(bitmap);
>  }
> +
> +/*
> + * Conditional unplug based on user-defined parameter
> + * Defaults to unconditional behavior
> + */
> +void md_bitmap_unplug(struct bitmap *bitmap)
> +{
> +	unsigned int flush_threshold = bitmap->mddev->bitmap_info.flush_threshold;
> +
> +	if (!flush_threshold) {
> +		__md_bitmap_unplug(bitmap);
> +	} else {
> +		struct bitmap_page *bp = bitmap->counts.bp;
> +		unsigned long pages = bitmap->counts.pages;
> +		unsigned long k, count = 0;
> +
> +		for (k = 0; k < pages; k++)
> +			if (bp[k].map && !bp[k].hijacked)
> +				count += bp[k].count;
> +
> +		if (count - bitmap->unplugged_count > flush_threshold) {
> +			bitmap->unplugged_count = count;
> +			md_bitmap_daemon_work(&bitmap->mddev->daemon_timer);
I just noticed I call daemon_timer before adding it in 3/3
I'll fix that in v3

> +			__md_bitmap_unplug(bitmap);
> +		}
> +	}
> +}
>  EXPORT_SYMBOL(md_bitmap_unplug);
>  
>  static void md_bitmap_set_memory_bits(struct bitmap *bitmap, sector_t offset, int needed);
> @@ -2012,9 +2039,9 @@ int md_bitmap_copy_from_slot(struct mddev *mddev, int slot,
>  		for (i = 0; i < bitmap->storage.file_pages; i++)
>  			if (test_page_attr(bitmap, i, BITMAP_PAGE_PENDING))
>  				set_page_attr(bitmap, i, BITMAP_PAGE_NEEDWRITE);
> -		md_bitmap_unplug(bitmap);
> +		__md_bitmap_unplug(bitmap);
>  	}
> -	md_bitmap_unplug(mddev->bitmap);
> +	__md_bitmap_unplug(mddev->bitmap);
>  	*low = lo;
>  	*high = hi;
>  	md_bitmap_free(bitmap);
> @@ -2246,7 +2273,7 @@ int md_bitmap_resize(struct bitmap *bitmap, sector_t blocks,
>  	spin_unlock_irq(&bitmap->counts.lock);
>  
>  	if (!init) {
> -		md_bitmap_unplug(bitmap);
> +		__md_bitmap_unplug(bitmap);
>  		bitmap->mddev->pers->quiesce(bitmap->mddev, 0);
>  	}
>  	ret = 0;
> diff --git a/drivers/md/md-bitmap.h b/drivers/md/md-bitmap.h
> index cfd7395de8fd..49a93d8ff307 100644
> --- a/drivers/md/md-bitmap.h
> +++ b/drivers/md/md-bitmap.h
> @@ -223,6 +223,7 @@ struct bitmap {
>  	unsigned long daemon_lastrun; /* jiffies of last run */
>  	unsigned long last_end_sync; /* when we lasted called end_sync to
>  				      * update bitmap with resync progress */
> +	unsigned long unplugged_count; /* last dirty count from md_bitmap_unplug */
>  
>  	atomic_t pending_writes; /* pending writes to the bitmap file */
>  	wait_queue_head_t write_wait;
> diff --git a/drivers/md/md.h b/drivers/md/md.h
> index b4e2d8b87b61..1a558cb18bd4 100644
> --- a/drivers/md/md.h
> +++ b/drivers/md/md.h
> @@ -501,6 +501,7 @@ struct mddev {
>  		int			external;
>  		int			nodes; /* Maximum number of nodes in the cluster */
>  		char                    cluster_name[64]; /* Name of the cluster */
> +		unsigned int		flush_threshold; /* how many dirty chunks between updates */
>  	} bitmap_info;
>  
>  	atomic_t			max_corr_read_errors; /* max read retries */

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2 0/3] Bitmap percentage flushing
  2022-10-13 22:41 [PATCH v2 0/3] Bitmap percentage flushing Jonathan Derrick
                   ` (2 preceding siblings ...)
  2022-10-13 22:41 ` [PATCH v2 3/3] md/bitmap: Convert daemon_work to proper timer Jonathan Derrick
@ 2022-10-14 21:10 ` John Stoffel
  2022-10-15 22:27   ` Jonathan Derrick
  3 siblings, 1 reply; 7+ messages in thread
From: John Stoffel @ 2022-10-14 21:10 UTC (permalink / raw)
  To: Jonathan Derrick
  Cc: Song Liu, linux-raid, linux-kernel, jonathan.derrick,
	jonathanx.sk.derrick, Mariusz Tkaczyk

>>>>> "Jonathan" == Jonathan Derrick <jonathan.derrick@linux.dev> writes:

> This introduces a percentage-flushing mechanism that works in-tandem to the
> mdadm delay timer. The percentage argument is based on the number of chunks
> dirty (rather than percentage), due to large drives requiring smaller and
> smaller percentages (eg, 32TB drives-> 1% is 320GB).

I've been reading and re-reading this and I still don't understand
what you're saying here.  You say you're adding a percentage based
mechanism, but then you say it's based on chunk counts, not
percentages.  I think you need to clean this up and re-word it.

Maybe you're trying to say that you only take a percentage of the
available write bandwidth per second or something like that? 


> This set hopes to provide a way to make the bitmap flushing more consistent. It
> was observed that a synchronous, random write qd1 workload, could make bitmap
> writes easily become almost half of the I/O. And in similar workloads with
> different timing, it was several minutes between bitmap updates. This is too
> inconsistent to be reliable.

> This first and second patches adds the flush_threshold parameter. The default
> value of 0 defines the default behavior: unplugging immediately just as before.
> With a flush-threshold value of 1, it becomes more consistent and paranoid,
> flushing on nearly every I/O, leading to a 40% or greater situation. From

What situation?  Please be more clear here.  

> there, the flush_threshold can be defined higher for those situations where
> power loss is rare and full resync can be tolerated.

> The third patch converts the daemon worker to an actual timer. This makes it
> more consistent and removes some ugly code.

> Jonathan Derrick (3):
>   md/bitmap: Add chunk-threshold unplugging
>   md/bitmap: Add sysfs interface for flush threshold
>   md/bitmap: Convert daemon_work to proper timer

>  Documentation/admin-guide/md.rst |  5 ++
>  drivers/md/md-bitmap.c           | 98 +++++++++++++++++++++++++-------
>  drivers/md/md-bitmap.h           |  4 +-
>  drivers/md/md.c                  |  9 ++-
>  drivers/md/md.h                  |  2 +
>  5 files changed, 93 insertions(+), 25 deletions(-)

> -- 
> 2.31.1


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2 0/3] Bitmap percentage flushing
  2022-10-14 21:10 ` [PATCH v2 0/3] Bitmap percentage flushing John Stoffel
@ 2022-10-15 22:27   ` Jonathan Derrick
  0 siblings, 0 replies; 7+ messages in thread
From: Jonathan Derrick @ 2022-10-15 22:27 UTC (permalink / raw)
  To: John Stoffel
  Cc: Song Liu, linux-raid, linux-kernel, jonathan.derrick,
	jonathanx.sk.derrick, Mariusz Tkaczyk



On 10/14/2022 3:10 PM, John Stoffel wrote:
>>>>>> "Jonathan" == Jonathan Derrick <jonathan.derrick@linux.dev> writes:
> 
>> This introduces a percentage-flushing mechanism that works in-tandem to the
>> mdadm delay timer. The percentage argument is based on the number of chunks
>> dirty (rather than percentage), due to large drives requiring smaller and
>> smaller percentages (eg, 32TB drives-> 1% is 320GB).
> 
> I've been reading and re-reading this and I still don't understand
> what you're saying here.  You say you're adding a percentage based
> mechanism, but then you say it's based on chunk counts, not
> percentages.  I think you need to clean this up and re-word it.> 
> Maybe you're trying to say that you only take a percentage of the
> available write bandwidth per second or something like that? 
I'll adjust it to chunk-count-based in the cover letter and make sure it
specifies bandwidth. I figured the chunk-count-based was a good way to
cover the desired percentage-based feature [1]. 

[1] https://elixir.bootlin.com/linux/latest/source/drivers/md/md-bitmap.c#L16


> 
> 
>> This set hopes to provide a way to make the bitmap flushing more consistent. It
>> was observed that a synchronous, random write qd1 workload, could make bitmap
>> writes easily become almost half of the I/O. And in similar workloads with
>> different timing, it was several minutes between bitmap updates. This is too
>> inconsistent to be reliable.
> 
>> This first and second patches adds the flush_threshold parameter. The default
>> value of 0 defines the default behavior: unplugging immediately just as before.
>> With a flush-threshold value of 1, it becomes more consistent and paranoid,
>> flushing on nearly every I/O, leading to a 40% or greater situation. From
> 
> What situation?  Please be more clear here.  

40% or more of given workload I/Os being bitmap flushes.
Will be more clear in v3

> 
>> there, the flush_threshold can be defined higher for those situations where
>> power loss is rare and full resync can be tolerated.
> 
>> The third patch converts the daemon worker to an actual timer. This makes it
>> more consistent and removes some ugly code.
> 
>> Jonathan Derrick (3):
>>   md/bitmap: Add chunk-threshold unplugging
>>   md/bitmap: Add sysfs interface for flush threshold
>>   md/bitmap: Convert daemon_work to proper timer
> 
>>  Documentation/admin-guide/md.rst |  5 ++
>>  drivers/md/md-bitmap.c           | 98 +++++++++++++++++++++++++-------
>>  drivers/md/md-bitmap.h           |  4 +-
>>  drivers/md/md.c                  |  9 ++-
>>  drivers/md/md.h                  |  2 +
>>  5 files changed, 93 insertions(+), 25 deletions(-)
> 
>> -- 
>> 2.31.1
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2022-10-15 22:27 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-10-13 22:41 [PATCH v2 0/3] Bitmap percentage flushing Jonathan Derrick
2022-10-13 22:41 ` [PATCH v2 1/3] md/bitmap: Add chunk-threshold unplugging Jonathan Derrick
2022-10-14  1:11   ` Jonathan Derrick
2022-10-13 22:41 ` [PATCH v2 2/3] md/bitmap: Add sysfs interface for flush threshold Jonathan Derrick
2022-10-13 22:41 ` [PATCH v2 3/3] md/bitmap: Convert daemon_work to proper timer Jonathan Derrick
2022-10-14 21:10 ` [PATCH v2 0/3] Bitmap percentage flushing John Stoffel
2022-10-15 22:27   ` Jonathan Derrick

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).