From: Benjamin Marzinski <bmarzins@redhat.com>
To: Mikulas Patocka <mpatocka@redhat.com>, Mike Snitzer <snitzer@redhat.com>
Cc: dm-devel@lists.linux.dev, Damien Le Moal <dlemoal@kernel.org>,
Christian Loehle <christian.loehle@arm.com>
Subject: Re: [PATCH] dm-delay: don't busy-wait in kthread
Date: Fri, 11 Apr 2025 17:12:20 -0400 [thread overview]
Message-ID: <Z_mFtK6H4CFdjHWZ@redhat.com> (raw)
In-Reply-To: <20250411205656.60709-1-bmarzins@redhat.com>
On Fri, Apr 11, 2025 at 04:56:56PM -0400, Benjamin Marzinski wrote:
> When using a kthread to delay the IOs, dm-delay would continuously loop,
> checking if IOs were ready to submit. It had a cond_resched() call in
> the loop, but might still loop hundreds of millions of times waiting for
> an IO that was scheduled to be submitted 10s of ms in the future. With
> the change to make dm-delay over zoned devices always use kthreads
> regardless of the length of the delay, this wasted work only gets worse.
>
> To solve this and still keep roughly the same precision for very short
> delays, dm-delay now calls fsleep() for 1/8th of the smallest non-zero
> delay it will place on IOs, or 1 ms, whichever is smaller. The reason
> that dm-delay doesn't just use the actual expiration time of the next
> delayed IO to calculated the sleep time is that delay_dtr() must wait
> for the kthread to finish before deleting the table. If a zoned device
> with a long delay queued an IO shortly before being suspended and
> removed, the IO would be flushed in delay_presuspend(), but the removing
> the device would still have to wait for the remainder of the long delay.
> This time is now capped at 1 ms.
I just realized that this same issue can occur during suspending. If new
IOs come in while or after the IOs are flushed in delay_presuspend(),
they will wait until the kthread runs again, which is again capped at
1 ms.
>
> Fixes: 70bbeb29fab09 ("dm delay: for short delays, use kthread instead of timers and wq")
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> ---
> This patch is meant to apply on top of Damien Le Moal's "dm-delay:
> Prevent zoned write reordering on suspend" patch. If people think it's
> important to avoid either this much smaller amount of looping or the
> possible 1 ms delay on deleting a table, I can send a patch that uses
> usleep_range_state() and msleep_interruptible() to do an interruptible
> sleep with a duration based on the expiration time of the next delayed
> IO.
>
> drivers/md/dm-delay.c | 17 ++++++++++++++---
> 1 file changed, 14 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/md/dm-delay.c b/drivers/md/dm-delay.c
> index c665b2ab1115..23027aa3fdca 100644
> --- a/drivers/md/dm-delay.c
> +++ b/drivers/md/dm-delay.c
> @@ -14,11 +14,14 @@
> #include <linux/bio.h>
> #include <linux/slab.h>
> #include <linux/kthread.h>
> +#include <linux/delay.h>
>
> #include <linux/device-mapper.h>
>
> #define DM_MSG_PREFIX "delay"
>
> +#define SLEEP_SHIFT 3
> +
> struct delay_class {
> struct dm_dev *dev;
> sector_t start;
> @@ -34,6 +37,7 @@ struct delay_c {
> struct work_struct flush_expired_bios;
> struct list_head delayed_bios;
> struct task_struct *worker;
> + unsigned int worker_sleep_ns;
> bool may_delay;
>
> struct delay_class read;
> @@ -136,7 +140,8 @@ static int flush_worker_fn(void *data)
> schedule();
> } else {
> spin_unlock(&dc->delayed_bios_lock);
> - cond_resched();
> + if (dc->may_delay)
> + fsleep(dc->worker_sleep_ns);
> }
> }
>
> @@ -212,7 +217,7 @@ static int delay_ctr(struct dm_target *ti, unsigned int argc, char **argv)
> {
> struct delay_c *dc;
> int ret;
> - unsigned int max_delay;
> + unsigned int max_delay, min_delay;
> bool is_zoned = false;
>
> if (argc != 3 && argc != 6 && argc != 9) {
> @@ -236,7 +241,7 @@ static int delay_ctr(struct dm_target *ti, unsigned int argc, char **argv)
> ret = delay_class_ctr(ti, &dc->read, argv);
> if (ret)
> goto bad;
> - max_delay = dc->read.delay;
> + min_delay = max_delay = dc->read.delay;
> is_zoned = bdev_is_zoned(dc->read.dev->bdev);
>
> if (argc == 3) {
> @@ -253,6 +258,7 @@ static int delay_ctr(struct dm_target *ti, unsigned int argc, char **argv)
> if (ret)
> goto bad;
> max_delay = max(max_delay, dc->write.delay);
> + min_delay = min_not_zero(min_delay, dc->write.delay);
> is_zoned = is_zoned || bdev_is_zoned(dc->write.dev->bdev);
>
> if (argc == 6) {
> @@ -266,6 +272,7 @@ static int delay_ctr(struct dm_target *ti, unsigned int argc, char **argv)
> if (ret)
> goto bad;
> max_delay = max(max_delay, dc->flush.delay);
> + min_delay = min_not_zero(min_delay, dc->flush.delay);
> is_zoned = is_zoned || bdev_is_zoned(dc->flush.dev->bdev);
>
> out:
> @@ -276,6 +283,10 @@ static int delay_ctr(struct dm_target *ti, unsigned int argc, char **argv)
> * suspend.
> */
> if (max_delay < 50 || is_zoned) {
> + if (min_delay >> SLEEP_SHIFT)
> + dc->worker_sleep_ns = 1000;
> + else
> + dc->worker_sleep_ns = (min_delay * 1000) >> SLEEP_SHIFT;
> dc->worker = kthread_run(&flush_worker_fn, dc, "dm-delay-flush-worker");
> if (IS_ERR(dc->worker)) {
> ret = PTR_ERR(dc->worker);
> --
> 2.48.1
>
next prev parent reply other threads:[~2025-04-11 21:12 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-04-11 20:56 [PATCH] dm-delay: don't busy-wait in kthread Benjamin Marzinski
2025-04-11 21:12 ` Benjamin Marzinski [this message]
2025-04-14 0:00 ` Damien Le Moal
2025-04-14 0:16 ` Damien Le Moal
2025-04-14 13:52 ` Mikulas Patocka
2025-04-14 16:53 ` Benjamin Marzinski
2025-04-14 17:06 ` Benjamin Marzinski
2025-04-14 20:09 ` Mikulas Patocka
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Z_mFtK6H4CFdjHWZ@redhat.com \
--to=bmarzins@redhat.com \
--cc=christian.loehle@arm.com \
--cc=dlemoal@kernel.org \
--cc=dm-devel@lists.linux.dev \
--cc=mpatocka@redhat.com \
--cc=snitzer@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.