All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] dm-delay: don't busy-wait in kthread
@ 2025-04-11 20:56 Benjamin Marzinski
  2025-04-11 21:12 ` Benjamin Marzinski
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Benjamin Marzinski @ 2025-04-11 20:56 UTC (permalink / raw)
  To: Mikulas Patocka, Mike Snitzer; +Cc: dm-devel, Damien Le Moal, Christian Loehle

When using a kthread to delay the IOs, dm-delay would continuously loop,
checking if IOs were ready to submit. It had a cond_resched() call in
the loop, but might still loop hundreds of millions of times waiting for
an IO that was scheduled to be submitted 10s of ms in the future. With
the change to make dm-delay over zoned devices always use kthreads
regardless of the length of the delay, this wasted work only gets worse.

To solve this and still keep roughly the same precision for very short
delays, dm-delay now calls fsleep() for 1/8th of the smallest non-zero
delay it will place on IOs, or 1 ms, whichever is smaller. The reason
that dm-delay doesn't just use the actual expiration time of the next
delayed IO to calculated the sleep time is that delay_dtr() must wait
for the kthread to finish before deleting the table. If a zoned device
with a long delay queued an IO shortly before being suspended and
removed, the IO would be flushed in delay_presuspend(), but the removing
the device would still have to wait for the remainder of the long delay.
This time is now capped at 1 ms.

Fixes: 70bbeb29fab09 ("dm delay: for short delays, use kthread instead of timers and wq")
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
This patch is meant to apply on top of Damien Le Moal's "dm-delay:
Prevent zoned write reordering on suspend" patch. If people think it's
important to avoid either this much smaller amount of looping or the
possible 1 ms delay on deleting a table, I can send a patch that uses
usleep_range_state() and msleep_interruptible() to do an interruptible
sleep with a duration based on the expiration time of the next delayed
IO.

 drivers/md/dm-delay.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/drivers/md/dm-delay.c b/drivers/md/dm-delay.c
index c665b2ab1115..23027aa3fdca 100644
--- a/drivers/md/dm-delay.c
+++ b/drivers/md/dm-delay.c
@@ -14,11 +14,14 @@
 #include <linux/bio.h>
 #include <linux/slab.h>
 #include <linux/kthread.h>
+#include <linux/delay.h>
 
 #include <linux/device-mapper.h>
 
 #define DM_MSG_PREFIX "delay"
 
+#define SLEEP_SHIFT 3
+
 struct delay_class {
 	struct dm_dev *dev;
 	sector_t start;
@@ -34,6 +37,7 @@ struct delay_c {
 	struct work_struct flush_expired_bios;
 	struct list_head delayed_bios;
 	struct task_struct *worker;
+	unsigned int worker_sleep_ns;
 	bool may_delay;
 
 	struct delay_class read;
@@ -136,7 +140,8 @@ static int flush_worker_fn(void *data)
 			schedule();
 		} else {
 			spin_unlock(&dc->delayed_bios_lock);
-			cond_resched();
+			if (dc->may_delay)
+				fsleep(dc->worker_sleep_ns);
 		}
 	}
 
@@ -212,7 +217,7 @@ static int delay_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 {
 	struct delay_c *dc;
 	int ret;
-	unsigned int max_delay;
+	unsigned int max_delay, min_delay;
 	bool is_zoned = false;
 
 	if (argc != 3 && argc != 6 && argc != 9) {
@@ -236,7 +241,7 @@ static int delay_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 	ret = delay_class_ctr(ti, &dc->read, argv);
 	if (ret)
 		goto bad;
-	max_delay = dc->read.delay;
+	min_delay = max_delay = dc->read.delay;
 	is_zoned = bdev_is_zoned(dc->read.dev->bdev);
 
 	if (argc == 3) {
@@ -253,6 +258,7 @@ static int delay_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 	if (ret)
 		goto bad;
 	max_delay = max(max_delay, dc->write.delay);
+	min_delay = min_not_zero(min_delay, dc->write.delay);
 	is_zoned = is_zoned || bdev_is_zoned(dc->write.dev->bdev);
 
 	if (argc == 6) {
@@ -266,6 +272,7 @@ static int delay_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 	if (ret)
 		goto bad;
 	max_delay = max(max_delay, dc->flush.delay);
+	min_delay = min_not_zero(min_delay, dc->flush.delay);
 	is_zoned = is_zoned || bdev_is_zoned(dc->flush.dev->bdev);
 
 out:
@@ -276,6 +283,10 @@ static int delay_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 	 * suspend.
 	 */
 	if (max_delay < 50 || is_zoned) {
+		if (min_delay >> SLEEP_SHIFT)
+			dc->worker_sleep_ns = 1000;
+		else
+			dc->worker_sleep_ns = (min_delay * 1000) >> SLEEP_SHIFT;
 		dc->worker = kthread_run(&flush_worker_fn, dc, "dm-delay-flush-worker");
 		if (IS_ERR(dc->worker)) {
 			ret = PTR_ERR(dc->worker);
-- 
2.48.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] dm-delay: don't busy-wait in kthread
  2025-04-11 20:56 [PATCH] dm-delay: don't busy-wait in kthread Benjamin Marzinski
@ 2025-04-11 21:12 ` Benjamin Marzinski
  2025-04-14  0:00 ` Damien Le Moal
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 8+ messages in thread
From: Benjamin Marzinski @ 2025-04-11 21:12 UTC (permalink / raw)
  To: Mikulas Patocka, Mike Snitzer; +Cc: dm-devel, Damien Le Moal, Christian Loehle

On Fri, Apr 11, 2025 at 04:56:56PM -0400, Benjamin Marzinski wrote:
> When using a kthread to delay the IOs, dm-delay would continuously loop,
> checking if IOs were ready to submit. It had a cond_resched() call in
> the loop, but might still loop hundreds of millions of times waiting for
> an IO that was scheduled to be submitted 10s of ms in the future. With
> the change to make dm-delay over zoned devices always use kthreads
> regardless of the length of the delay, this wasted work only gets worse.
> 
> To solve this and still keep roughly the same precision for very short
> delays, dm-delay now calls fsleep() for 1/8th of the smallest non-zero
> delay it will place on IOs, or 1 ms, whichever is smaller. The reason
> that dm-delay doesn't just use the actual expiration time of the next
> delayed IO to calculated the sleep time is that delay_dtr() must wait
> for the kthread to finish before deleting the table. If a zoned device
> with a long delay queued an IO shortly before being suspended and
> removed, the IO would be flushed in delay_presuspend(), but the removing
> the device would still have to wait for the remainder of the long delay.
> This time is now capped at 1 ms.

I just realized that this same issue can occur during suspending. If new
IOs come in while or after the IOs are flushed in delay_presuspend(),
they will wait until the kthread runs again, which is again capped at
1 ms.

> 
> Fixes: 70bbeb29fab09 ("dm delay: for short delays, use kthread instead of timers and wq")
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> ---
> This patch is meant to apply on top of Damien Le Moal's "dm-delay:
> Prevent zoned write reordering on suspend" patch. If people think it's
> important to avoid either this much smaller amount of looping or the
> possible 1 ms delay on deleting a table, I can send a patch that uses
> usleep_range_state() and msleep_interruptible() to do an interruptible
> sleep with a duration based on the expiration time of the next delayed
> IO.
> 
>  drivers/md/dm-delay.c | 17 ++++++++++++++---
>  1 file changed, 14 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/md/dm-delay.c b/drivers/md/dm-delay.c
> index c665b2ab1115..23027aa3fdca 100644
> --- a/drivers/md/dm-delay.c
> +++ b/drivers/md/dm-delay.c
> @@ -14,11 +14,14 @@
>  #include <linux/bio.h>
>  #include <linux/slab.h>
>  #include <linux/kthread.h>
> +#include <linux/delay.h>
>  
>  #include <linux/device-mapper.h>
>  
>  #define DM_MSG_PREFIX "delay"
>  
> +#define SLEEP_SHIFT 3
> +
>  struct delay_class {
>  	struct dm_dev *dev;
>  	sector_t start;
> @@ -34,6 +37,7 @@ struct delay_c {
>  	struct work_struct flush_expired_bios;
>  	struct list_head delayed_bios;
>  	struct task_struct *worker;
> +	unsigned int worker_sleep_ns;
>  	bool may_delay;
>  
>  	struct delay_class read;
> @@ -136,7 +140,8 @@ static int flush_worker_fn(void *data)
>  			schedule();
>  		} else {
>  			spin_unlock(&dc->delayed_bios_lock);
> -			cond_resched();
> +			if (dc->may_delay)
> +				fsleep(dc->worker_sleep_ns);
>  		}
>  	}
>  
> @@ -212,7 +217,7 @@ static int delay_ctr(struct dm_target *ti, unsigned int argc, char **argv)
>  {
>  	struct delay_c *dc;
>  	int ret;
> -	unsigned int max_delay;
> +	unsigned int max_delay, min_delay;
>  	bool is_zoned = false;
>  
>  	if (argc != 3 && argc != 6 && argc != 9) {
> @@ -236,7 +241,7 @@ static int delay_ctr(struct dm_target *ti, unsigned int argc, char **argv)
>  	ret = delay_class_ctr(ti, &dc->read, argv);
>  	if (ret)
>  		goto bad;
> -	max_delay = dc->read.delay;
> +	min_delay = max_delay = dc->read.delay;
>  	is_zoned = bdev_is_zoned(dc->read.dev->bdev);
>  
>  	if (argc == 3) {
> @@ -253,6 +258,7 @@ static int delay_ctr(struct dm_target *ti, unsigned int argc, char **argv)
>  	if (ret)
>  		goto bad;
>  	max_delay = max(max_delay, dc->write.delay);
> +	min_delay = min_not_zero(min_delay, dc->write.delay);
>  	is_zoned = is_zoned || bdev_is_zoned(dc->write.dev->bdev);
>  
>  	if (argc == 6) {
> @@ -266,6 +272,7 @@ static int delay_ctr(struct dm_target *ti, unsigned int argc, char **argv)
>  	if (ret)
>  		goto bad;
>  	max_delay = max(max_delay, dc->flush.delay);
> +	min_delay = min_not_zero(min_delay, dc->flush.delay);
>  	is_zoned = is_zoned || bdev_is_zoned(dc->flush.dev->bdev);
>  
>  out:
> @@ -276,6 +283,10 @@ static int delay_ctr(struct dm_target *ti, unsigned int argc, char **argv)
>  	 * suspend.
>  	 */
>  	if (max_delay < 50 || is_zoned) {
> +		if (min_delay >> SLEEP_SHIFT)
> +			dc->worker_sleep_ns = 1000;
> +		else
> +			dc->worker_sleep_ns = (min_delay * 1000) >> SLEEP_SHIFT;
>  		dc->worker = kthread_run(&flush_worker_fn, dc, "dm-delay-flush-worker");
>  		if (IS_ERR(dc->worker)) {
>  			ret = PTR_ERR(dc->worker);
> -- 
> 2.48.1
> 


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] dm-delay: don't busy-wait in kthread
  2025-04-11 20:56 [PATCH] dm-delay: don't busy-wait in kthread Benjamin Marzinski
  2025-04-11 21:12 ` Benjamin Marzinski
@ 2025-04-14  0:00 ` Damien Le Moal
  2025-04-14  0:16 ` Damien Le Moal
  2025-04-14 13:52 ` Mikulas Patocka
  3 siblings, 0 replies; 8+ messages in thread
From: Damien Le Moal @ 2025-04-14  0:00 UTC (permalink / raw)
  To: Benjamin Marzinski, Mikulas Patocka, Mike Snitzer
  Cc: dm-devel, Christian Loehle

On 4/12/25 5:56 AM, Benjamin Marzinski wrote:
> When using a kthread to delay the IOs, dm-delay would continuously loop,
> checking if IOs were ready to submit. It had a cond_resched() call in
> the loop, but might still loop hundreds of millions of times waiting for
> an IO that was scheduled to be submitted 10s of ms in the future. With
> the change to make dm-delay over zoned devices always use kthreads
> regardless of the length of the delay, this wasted work only gets worse.
> 
> To solve this and still keep roughly the same precision for very short
> delays, dm-delay now calls fsleep() for 1/8th of the smallest non-zero
> delay it will place on IOs, or 1 ms, whichever is smaller. The reason
> that dm-delay doesn't just use the actual expiration time of the next
> delayed IO to calculated the sleep time is that delay_dtr() must wait
> for the kthread to finish before deleting the table. If a zoned device
> with a long delay queued an IO shortly before being suspended and
> removed, the IO would be flushed in delay_presuspend(), but the removing
> the device would still have to wait for the remainder of the long delay.
> This time is now capped at 1 ms.
> 
> Fixes: 70bbeb29fab09 ("dm delay: for short delays, use kthread instead of timers and wq")
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>

Nice !

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>

-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] dm-delay: don't busy-wait in kthread
  2025-04-11 20:56 [PATCH] dm-delay: don't busy-wait in kthread Benjamin Marzinski
  2025-04-11 21:12 ` Benjamin Marzinski
  2025-04-14  0:00 ` Damien Le Moal
@ 2025-04-14  0:16 ` Damien Le Moal
  2025-04-14 13:52 ` Mikulas Patocka
  3 siblings, 0 replies; 8+ messages in thread
From: Damien Le Moal @ 2025-04-14  0:16 UTC (permalink / raw)
  To: Benjamin Marzinski, Mikulas Patocka, Mike Snitzer
  Cc: dm-devel, Christian Loehle

On 4/12/25 5:56 AM, Benjamin Marzinski wrote:
> When using a kthread to delay the IOs, dm-delay would continuously loop,
> checking if IOs were ready to submit. It had a cond_resched() call in
> the loop, but might still loop hundreds of millions of times waiting for
> an IO that was scheduled to be submitted 10s of ms in the future. With
> the change to make dm-delay over zoned devices always use kthreads
> regardless of the length of the delay, this wasted work only gets worse.
> 
> To solve this and still keep roughly the same precision for very short
> delays, dm-delay now calls fsleep() for 1/8th of the smallest non-zero
> delay it will place on IOs, or 1 ms, whichever is smaller. The reason
> that dm-delay doesn't just use the actual expiration time of the next
> delayed IO to calculated the sleep time is that delay_dtr() must wait
> for the kthread to finish before deleting the table. If a zoned device
> with a long delay queued an IO shortly before being suspended and
> removed, the IO would be flushed in delay_presuspend(), but the removing
> the device would still have to wait for the remainder of the long delay.
> This time is now capped at 1 ms.
> 
> Fixes: 70bbeb29fab09 ("dm delay: for short delays, use kthread instead of timers and wq")
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>

Works fine !

Tested-by: Damien Le Moal <dlemoal@kernel.org>

-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] dm-delay: don't busy-wait in kthread
  2025-04-11 20:56 [PATCH] dm-delay: don't busy-wait in kthread Benjamin Marzinski
                   ` (2 preceding siblings ...)
  2025-04-14  0:16 ` Damien Le Moal
@ 2025-04-14 13:52 ` Mikulas Patocka
  2025-04-14 16:53   ` Benjamin Marzinski
  3 siblings, 1 reply; 8+ messages in thread
From: Mikulas Patocka @ 2025-04-14 13:52 UTC (permalink / raw)
  To: Benjamin Marzinski
  Cc: Mike Snitzer, dm-devel, Damien Le Moal, Christian Loehle



On Fri, 11 Apr 2025, Benjamin Marzinski wrote:

> When using a kthread to delay the IOs, dm-delay would continuously loop,
> checking if IOs were ready to submit. It had a cond_resched() call in
> the loop, but might still loop hundreds of millions of times waiting for
> an IO that was scheduled to be submitted 10s of ms in the future. With
> the change to make dm-delay over zoned devices always use kthreads
> regardless of the length of the delay, this wasted work only gets worse.
> 
> To solve this and still keep roughly the same precision for very short
> delays, dm-delay now calls fsleep() for 1/8th of the smallest non-zero
> delay it will place on IOs, or 1 ms, whichever is smaller. The reason
> that dm-delay doesn't just use the actual expiration time of the next
> delayed IO to calculated the sleep time is that delay_dtr() must wait
> for the kthread to finish before deleting the table. If a zoned device
> with a long delay queued an IO shortly before being suspended and
> removed, the IO would be flushed in delay_presuspend(), but the removing
> the device would still have to wait for the remainder of the long delay.
> This time is now capped at 1 ms.
> 
> Fixes: 70bbeb29fab09 ("dm delay: for short delays, use kthread instead of timers and wq")
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> ---
> This patch is meant to apply on top of Damien Le Moal's "dm-delay:
> Prevent zoned write reordering on suspend" patch. If people think it's
> important to avoid either this much smaller amount of looping or the
> possible 1 ms delay on deleting a table, I can send a patch that uses
> usleep_range_state() and msleep_interruptible() to do an interruptible
> sleep with a duration based on the expiration time of the next delayed
> IO.

Hi

worker_sleep_ns should be worker_sleep_us - as the value is in 
microseconds.

fsleep in flush_worker_fn should be called unconditionally, to not consume 
100% CPU when suspending.

cond_resched() shouldn't be removed because fsleep may fall back to 
udelay.

The patch should increase target version.

I fixed the patch so that it applies on the top Linus' tree and applied 
it to the linux-dm tree.

BTW. do we need to backport this to the stable kernels? I think not, but 
if you have some reason why should we backport it, explain it.

Mikulas


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] dm-delay: don't busy-wait in kthread
  2025-04-14 13:52 ` Mikulas Patocka
@ 2025-04-14 16:53   ` Benjamin Marzinski
  2025-04-14 17:06     ` Benjamin Marzinski
  2025-04-14 20:09     ` Mikulas Patocka
  0 siblings, 2 replies; 8+ messages in thread
From: Benjamin Marzinski @ 2025-04-14 16:53 UTC (permalink / raw)
  To: Mikulas Patocka; +Cc: Mike Snitzer, dm-devel, Damien Le Moal, Christian Loehle

On Mon, Apr 14, 2025 at 03:52:18PM +0200, Mikulas Patocka wrote:
> 
> 
> On Fri, 11 Apr 2025, Benjamin Marzinski wrote:
> 
> > When using a kthread to delay the IOs, dm-delay would continuously loop,
> > checking if IOs were ready to submit. It had a cond_resched() call in
> > the loop, but might still loop hundreds of millions of times waiting for
> > an IO that was scheduled to be submitted 10s of ms in the future. With
> > the change to make dm-delay over zoned devices always use kthreads
> > regardless of the length of the delay, this wasted work only gets worse.
> > 
> > To solve this and still keep roughly the same precision for very short
> > delays, dm-delay now calls fsleep() for 1/8th of the smallest non-zero
> > delay it will place on IOs, or 1 ms, whichever is smaller. The reason
> > that dm-delay doesn't just use the actual expiration time of the next
> > delayed IO to calculated the sleep time is that delay_dtr() must wait
> > for the kthread to finish before deleting the table. If a zoned device
> > with a long delay queued an IO shortly before being suspended and
> > removed, the IO would be flushed in delay_presuspend(), but the removing
> > the device would still have to wait for the remainder of the long delay.
> > This time is now capped at 1 ms.
> > 
> > Fixes: 70bbeb29fab09 ("dm delay: for short delays, use kthread instead of timers and wq")
> > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> > ---
> > This patch is meant to apply on top of Damien Le Moal's "dm-delay:
> > Prevent zoned write reordering on suspend" patch. If people think it's
> > important to avoid either this much smaller amount of looping or the
> > possible 1 ms delay on deleting a table, I can send a patch that uses
> > usleep_range_state() and msleep_interruptible() to do an interruptible
> > sleep with a duration based on the expiration time of the next delayed
> > IO.
> 
> Hi
> 
> worker_sleep_ns should be worker_sleep_us - as the value is in 
> microseconds.

Oops.

> fsleep in flush_worker_fn should be called unconditionally, to not consume 
> 100% CPU when suspending.

This is fine, but flush_worker_fn() won't busy-wait while suspending.
Since it is calling flush_delayed_bios() with flush_all equal to true,
it will handle all the bios on the list. As long as bios keep getting
added to the list while it's flushing the last batch, it will keep
looping to flush them. But it will be doing necessary work on each loop,
and flush_delayed_bios() has a cond_resched(), so even if there are a
flood of bios, it will still take breaks. As soon as bios stop
continuously arriving, flush_worker_fn() will see the empty list and the
kthread will stop.

Unconditionally sleeping here makes it more likely that dm-delay will
end up sleeping unnecessarily while a there are just a few remaining
bios on the list. Like I said in my commit message, this sleep will be
capped at 1 ms, so it's not that big of a deal. But it's a trade-off.
I'm o.k. with your version. I just not sure that it is the better
trade-off. Perhaps I'm overlooking something.

> cond_resched() shouldn't be removed because fsleep may fall back to 
> udelay.

Again, your version is fine, but I'm not sure that cond_resched() was
ever necessary, since there already is one in flush_delayed_bios().
Also, at least the way it's currently coded, fsleep() will only resort
to busy-waiting when the delay is 10 us or less, and the shortest it can
be with this code is 62 us, so I don't think this cond_resched() will
ever do anything.

> The patch should increase target version.
> 
> I fixed the patch so that it applies on the top Linus' tree and applied 
> it to the linux-dm tree.
> 
> BTW. do we need to backport this to the stable kernels? I think not, but 
> if you have some reason why should we backport it, explain it.

dm-delay is basically a testing target, so I agree that it seems
unnecessary to backport this.

-Ben

> 
> Mikulas


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] dm-delay: don't busy-wait in kthread
  2025-04-14 16:53   ` Benjamin Marzinski
@ 2025-04-14 17:06     ` Benjamin Marzinski
  2025-04-14 20:09     ` Mikulas Patocka
  1 sibling, 0 replies; 8+ messages in thread
From: Benjamin Marzinski @ 2025-04-14 17:06 UTC (permalink / raw)
  To: Mikulas Patocka; +Cc: Mike Snitzer, dm-devel, Damien Le Moal, Christian Loehle

On Mon, Apr 14, 2025 at 12:53:15PM -0400, Benjamin Marzinski wrote:
> On Mon, Apr 14, 2025 at 03:52:18PM +0200, Mikulas Patocka wrote:
> > 
> > 
> > On Fri, 11 Apr 2025, Benjamin Marzinski wrote:
> > 
> > > When using a kthread to delay the IOs, dm-delay would continuously loop,
> > > checking if IOs were ready to submit. It had a cond_resched() call in
> > > the loop, but might still loop hundreds of millions of times waiting for
> > > an IO that was scheduled to be submitted 10s of ms in the future. With
> > > the change to make dm-delay over zoned devices always use kthreads
> > > regardless of the length of the delay, this wasted work only gets worse.
> > > 
> > > To solve this and still keep roughly the same precision for very short
> > > delays, dm-delay now calls fsleep() for 1/8th of the smallest non-zero
> > > delay it will place on IOs, or 1 ms, whichever is smaller. The reason
> > > that dm-delay doesn't just use the actual expiration time of the next
> > > delayed IO to calculated the sleep time is that delay_dtr() must wait
> > > for the kthread to finish before deleting the table. If a zoned device
> > > with a long delay queued an IO shortly before being suspended and
> > > removed, the IO would be flushed in delay_presuspend(), but the removing
> > > the device would still have to wait for the remainder of the long delay.
> > > This time is now capped at 1 ms.
> > > 
> > > Fixes: 70bbeb29fab09 ("dm delay: for short delays, use kthread instead of timers and wq")
> > > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> > > ---
> > > This patch is meant to apply on top of Damien Le Moal's "dm-delay:
> > > Prevent zoned write reordering on suspend" patch. If people think it's
> > > important to avoid either this much smaller amount of looping or the
> > > possible 1 ms delay on deleting a table, I can send a patch that uses
> > > usleep_range_state() and msleep_interruptible() to do an interruptible
> > > sleep with a duration based on the expiration time of the next delayed
> > > IO.
> > 
> > Hi
> > 
> > worker_sleep_ns should be worker_sleep_us - as the value is in 
> > microseconds.
> 
> Oops.
> 
> > fsleep in flush_worker_fn should be called unconditionally, to not consume 
> > 100% CPU when suspending.
> 
> This is fine, but flush_worker_fn() won't busy-wait while suspending.
> Since it is calling flush_delayed_bios() with flush_all equal to true,
> it will handle all the bios on the list. As long as bios keep getting
> added to the list while it's flushing the last batch, it will keep
> looping to flush them. But it will be doing necessary work on each loop,
> and flush_delayed_bios() has a cond_resched(), so even if there are a
> flood of bios, it will still take breaks. As soon as bios stop
> continuously arriving, flush_worker_fn() will see the empty list and the
> kthread will stop.
> 
> Unconditionally sleeping here makes it more likely that dm-delay will
> end up sleeping unnecessarily while a there are just a few remaining
> bios on the list. Like I said in my commit message, this sleep will be
> capped at 1 ms, so it's not that big of a deal. But it's a trade-off.
> I'm o.k. with your version. I just not sure that it is the better
> trade-off. Perhaps I'm overlooking something.

And the thing that I overlooked was you NAK'ing Damien's patch. Oops.
Please ignore this.

-Ben
 
> > cond_resched() shouldn't be removed because fsleep may fall back to 
> > udelay.
> 
> Again, your version is fine, but I'm not sure that cond_resched() was
> ever necessary, since there already is one in flush_delayed_bios().
> Also, at least the way it's currently coded, fsleep() will only resort
> to busy-waiting when the delay is 10 us or less, and the shortest it can
> be with this code is 62 us, so I don't think this cond_resched() will
> ever do anything.
> 
> > The patch should increase target version.
> > 
> > I fixed the patch so that it applies on the top Linus' tree and applied 
> > it to the linux-dm tree.
> > 
> > BTW. do we need to backport this to the stable kernels? I think not, but 
> > if you have some reason why should we backport it, explain it.
> 
> dm-delay is basically a testing target, so I agree that it seems
> unnecessary to backport this.
> 
> -Ben
> 
> > 
> > Mikulas
> 


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] dm-delay: don't busy-wait in kthread
  2025-04-14 16:53   ` Benjamin Marzinski
  2025-04-14 17:06     ` Benjamin Marzinski
@ 2025-04-14 20:09     ` Mikulas Patocka
  1 sibling, 0 replies; 8+ messages in thread
From: Mikulas Patocka @ 2025-04-14 20:09 UTC (permalink / raw)
  To: Benjamin Marzinski
  Cc: Mike Snitzer, dm-devel, Damien Le Moal, Christian Loehle



On Mon, 14 Apr 2025, Benjamin Marzinski wrote:

> > cond_resched() shouldn't be removed because fsleep may fall back to 
> > udelay.
> 
> Again, your version is fine, but I'm not sure that cond_resched() was
> ever necessary, since there already is one in flush_delayed_bios().
> Also, at least the way it's currently coded, fsleep() will only resort
> to busy-waiting when the delay is 10 us or less, and the shortest it can
> be with this code is 62 us, so I don't think this cond_resched() will
> ever do anything.

Yes, but this is implementation detail that may change. Someone may change 
fsleep to spin for larger timeout without knowing that dm-delay depends on 
fsleep not spinning.

> > The patch should increase target version.
> > 
> > I fixed the patch so that it applies on the top Linus' tree and applied 
> > it to the linux-dm tree.
> > 
> > BTW. do we need to backport this to the stable kernels? I think not, but 
> > if you have some reason why should we backport it, explain it.
> 
> dm-delay is basically a testing target, so I agree that it seems
> unnecessary to backport this.
> 
> -Ben

OK, so I removed the "Fixes:" tag.

Mikulas


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2025-04-14 20:09 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-04-11 20:56 [PATCH] dm-delay: don't busy-wait in kthread Benjamin Marzinski
2025-04-11 21:12 ` Benjamin Marzinski
2025-04-14  0:00 ` Damien Le Moal
2025-04-14  0:16 ` Damien Le Moal
2025-04-14 13:52 ` Mikulas Patocka
2025-04-14 16:53   ` Benjamin Marzinski
2025-04-14 17:06     ` Benjamin Marzinski
2025-04-14 20:09     ` Mikulas Patocka

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.