* [PATCH] dm-delay: Prevent zoned write reordering on suspend
@ 2025-04-10 4:33 Damien Le Moal
2025-04-10 4:43 ` Damien Le Moal
2025-04-10 7:54 ` Christoph Hellwig
0 siblings, 2 replies; 6+ messages in thread
From: Damien Le Moal @ 2025-04-10 4:33 UTC (permalink / raw)
To: dm-devel, Mike Snitzer, Mikulas Patocka
Cc: Christoph Hellwig, Benjamin Marzinski
When a dm-delay devie is being suspended, the .presuspend() operation is
first executed (delay_presuspend()) to immediately issue all the BIOs
present in the delayed list of the device and also sets the device
may_delay boolean to false. At the same time, any new BIO is issued to
the device will not be delayed and immediately issued with delay_bio()
returning DM_MAPIO_REMAPPED. This creates a situation where potentially
2 different contexts may be issuing write BIOs to the same zone of a
zone device without respecting the issuing order from the user, that is,
a newly issued write BIO may be issued before other write BIOs for the
same target zone that are in the device delayed list. If such situation
occurs, write BIOs may be failed by the underlying zoned device due to
an unaligned write error.
Prevent this situation from happening by always handling newly issued
write BIOs using the delayed list of BIOs, even when the device is being
suspended. This is done by forcing the use of the worker kthread for
zoned devices, and by modifying flush_worker_fn() to always flush all
delayed BIOs if the device may_delay boolean is false.
Fixes: d43929ef65a6 ("dm-delay: support zoned devices")
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
---
drivers/md/dm-delay.c | 29 +++++++++++++++++++++--------
1 file changed, 21 insertions(+), 8 deletions(-)
diff --git a/drivers/md/dm-delay.c b/drivers/md/dm-delay.c
index d4cf0ac2a7aa..c665b2ab1115 100644
--- a/drivers/md/dm-delay.c
+++ b/drivers/md/dm-delay.c
@@ -128,7 +128,7 @@ static int flush_worker_fn(void *data)
struct delay_c *dc = data;
while (!kthread_should_stop()) {
- flush_delayed_bios(dc, false);
+ flush_delayed_bios(dc, !dc->may_delay);
spin_lock(&dc->delayed_bios_lock);
if (unlikely(list_empty(&dc->delayed_bios))) {
set_current_state(TASK_INTERRUPTIBLE);
@@ -213,6 +213,7 @@ static int delay_ctr(struct dm_target *ti, unsigned int argc, char **argv)
struct delay_c *dc;
int ret;
unsigned int max_delay;
+ bool is_zoned = false;
if (argc != 3 && argc != 6 && argc != 9) {
ti->error = "Requires exactly 3, 6 or 9 arguments";
@@ -236,6 +237,7 @@ static int delay_ctr(struct dm_target *ti, unsigned int argc, char **argv)
if (ret)
goto bad;
max_delay = dc->read.delay;
+ is_zoned = bdev_is_zoned(dc->read.dev->bdev);
if (argc == 3) {
ret = delay_class_ctr(ti, &dc->write, argv);
@@ -251,6 +253,7 @@ static int delay_ctr(struct dm_target *ti, unsigned int argc, char **argv)
if (ret)
goto bad;
max_delay = max(max_delay, dc->write.delay);
+ is_zoned = is_zoned || bdev_is_zoned(dc->write.dev->bdev);
if (argc == 6) {
ret = delay_class_ctr(ti, &dc->flush, argv + 3);
@@ -263,13 +266,16 @@ static int delay_ctr(struct dm_target *ti, unsigned int argc, char **argv)
if (ret)
goto bad;
max_delay = max(max_delay, dc->flush.delay);
+ is_zoned = is_zoned || bdev_is_zoned(dc->flush.dev->bdev);
out:
- if (max_delay < 50) {
- /*
- * In case of small requested delays, use kthread instead of
- * timers and workqueue to achieve better latency.
- */
+ /*
+ * In case of small requested delays, use kthread instead of timers and
+ * workqueue to achieve better latency. Also use a kthread for a zoned
+ * device so that we can preserve the order of write operations during
+ * suspend.
+ */
+ if (max_delay < 50 || is_zoned) {
dc->worker = kthread_run(&flush_worker_fn, dc, "dm-delay-flush-worker");
if (IS_ERR(dc->worker)) {
ret = PTR_ERR(dc->worker);
@@ -313,8 +319,15 @@ static int delay_bio(struct delay_c *dc, struct delay_class *c, struct bio *bio)
spin_lock(&dc->delayed_bios_lock);
if (unlikely(!dc->may_delay)) {
- spin_unlock(&dc->delayed_bios_lock);
- return DM_MAPIO_REMAPPED;
+ /*
+ * Issue the BIO immediately if the device is not zoned. FOr a
+ * zoned device, preserver the ordering of write operations by
+ * using the delay list.
+ */
+ if (!bdev_is_zoned(c->dev->bdev) || c != &dc->write) {
+ spin_unlock(&dc->delayed_bios_lock);
+ return DM_MAPIO_REMAPPED;
+ }
}
c->ops++;
list_add_tail(&delayed->list, &dc->delayed_bios);
--
2.49.0
^ permalink raw reply related [flat|nested] 6+ messages in thread* Re: [PATCH] dm-delay: Prevent zoned write reordering on suspend
2025-04-10 4:33 [PATCH] dm-delay: Prevent zoned write reordering on suspend Damien Le Moal
@ 2025-04-10 4:43 ` Damien Le Moal
2025-04-10 7:54 ` Christoph Hellwig
1 sibling, 0 replies; 6+ messages in thread
From: Damien Le Moal @ 2025-04-10 4:43 UTC (permalink / raw)
To: dm-devel, Mike Snitzer, Mikulas Patocka
Cc: Christoph Hellwig, Benjamin Marzinski
On 4/10/25 1:33 PM, Damien Le Moal wrote:
> When a dm-delay devie is being suspended, the .presuspend() operation is
> first executed (delay_presuspend()) to immediately issue all the BIOs
> present in the delayed list of the device and also sets the device
> may_delay boolean to false. At the same time, any new BIO is issued to
> the device will not be delayed and immediately issued with delay_bio()
> returning DM_MAPIO_REMAPPED. This creates a situation where potentially
> 2 different contexts may be issuing write BIOs to the same zone of a
> zone device without respecting the issuing order from the user, that is,
> a newly issued write BIO may be issued before other write BIOs for the
> same target zone that are in the device delayed list. If such situation
> occurs, write BIOs may be failed by the underlying zoned device due to
> an unaligned write error.
>
> Prevent this situation from happening by always handling newly issued
> write BIOs using the delayed list of BIOs, even when the device is being
> suspended. This is done by forcing the use of the worker kthread for
> zoned devices, and by modifying flush_worker_fn() to always flush all
> delayed BIOs if the device may_delay boolean is false.
>
> Fixes: d43929ef65a6 ("dm-delay: support zoned devices")
> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
I probably should also add:
Reported-by: Benjamin Marzinski <bmarzins@redhat.com>
> ---
> drivers/md/dm-delay.c | 29 +++++++++++++++++++++--------
> 1 file changed, 21 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/md/dm-delay.c b/drivers/md/dm-delay.c
> index d4cf0ac2a7aa..c665b2ab1115 100644
> --- a/drivers/md/dm-delay.c
> +++ b/drivers/md/dm-delay.c
> @@ -128,7 +128,7 @@ static int flush_worker_fn(void *data)
> struct delay_c *dc = data;
>
> while (!kthread_should_stop()) {
> - flush_delayed_bios(dc, false);
> + flush_delayed_bios(dc, !dc->may_delay);
> spin_lock(&dc->delayed_bios_lock);
> if (unlikely(list_empty(&dc->delayed_bios))) {
> set_current_state(TASK_INTERRUPTIBLE);
> @@ -213,6 +213,7 @@ static int delay_ctr(struct dm_target *ti, unsigned int argc, char **argv)
> struct delay_c *dc;
> int ret;
> unsigned int max_delay;
> + bool is_zoned = false;
>
> if (argc != 3 && argc != 6 && argc != 9) {
> ti->error = "Requires exactly 3, 6 or 9 arguments";
> @@ -236,6 +237,7 @@ static int delay_ctr(struct dm_target *ti, unsigned int argc, char **argv)
> if (ret)
> goto bad;
> max_delay = dc->read.delay;
> + is_zoned = bdev_is_zoned(dc->read.dev->bdev);
>
> if (argc == 3) {
> ret = delay_class_ctr(ti, &dc->write, argv);
> @@ -251,6 +253,7 @@ static int delay_ctr(struct dm_target *ti, unsigned int argc, char **argv)
> if (ret)
> goto bad;
> max_delay = max(max_delay, dc->write.delay);
> + is_zoned = is_zoned || bdev_is_zoned(dc->write.dev->bdev);
>
> if (argc == 6) {
> ret = delay_class_ctr(ti, &dc->flush, argv + 3);
> @@ -263,13 +266,16 @@ static int delay_ctr(struct dm_target *ti, unsigned int argc, char **argv)
> if (ret)
> goto bad;
> max_delay = max(max_delay, dc->flush.delay);
> + is_zoned = is_zoned || bdev_is_zoned(dc->flush.dev->bdev);
>
> out:
> - if (max_delay < 50) {
> - /*
> - * In case of small requested delays, use kthread instead of
> - * timers and workqueue to achieve better latency.
> - */
> + /*
> + * In case of small requested delays, use kthread instead of timers and
> + * workqueue to achieve better latency. Also use a kthread for a zoned
> + * device so that we can preserve the order of write operations during
> + * suspend.
> + */
> + if (max_delay < 50 || is_zoned) {
> dc->worker = kthread_run(&flush_worker_fn, dc, "dm-delay-flush-worker");
> if (IS_ERR(dc->worker)) {
> ret = PTR_ERR(dc->worker);
> @@ -313,8 +319,15 @@ static int delay_bio(struct delay_c *dc, struct delay_class *c, struct bio *bio)
>
> spin_lock(&dc->delayed_bios_lock);
> if (unlikely(!dc->may_delay)) {
> - spin_unlock(&dc->delayed_bios_lock);
> - return DM_MAPIO_REMAPPED;
> + /*
> + * Issue the BIO immediately if the device is not zoned. FOr a
> + * zoned device, preserver the ordering of write operations by
> + * using the delay list.
> + */
> + if (!bdev_is_zoned(c->dev->bdev) || c != &dc->write) {
> + spin_unlock(&dc->delayed_bios_lock);
> + return DM_MAPIO_REMAPPED;
> + }
> }
> c->ops++;
> list_add_tail(&delayed->list, &dc->delayed_bios);
--
Damien Le Moal
Western Digital Research
^ permalink raw reply [flat|nested] 6+ messages in thread* Re: [PATCH] dm-delay: Prevent zoned write reordering on suspend
2025-04-10 4:33 [PATCH] dm-delay: Prevent zoned write reordering on suspend Damien Le Moal
2025-04-10 4:43 ` Damien Le Moal
@ 2025-04-10 7:54 ` Christoph Hellwig
2025-04-10 8:02 ` Damien Le Moal
1 sibling, 1 reply; 6+ messages in thread
From: Christoph Hellwig @ 2025-04-10 7:54 UTC (permalink / raw)
To: Damien Le Moal
Cc: dm-devel, Mike Snitzer, Mikulas Patocka, Christoph Hellwig,
Benjamin Marzinski
On Thu, Apr 10, 2025 at 01:33:11PM +0900, Damien Le Moal wrote:
> When a dm-delay devie is being suspended, the .presuspend() operation is
s/devie/device/
> first executed (delay_presuspend()) to immediately issue all the BIOs
> present in the delayed list of the device and also sets the device
> may_delay boolean to false. At the same time, any new BIO is issued to
> the device will not be delayed and immediately issued with delay_bio()
> returning DM_MAPIO_REMAPPED. This creates a situation where potentially
> 2 different contexts may be issuing write BIOs to the same zone of a
> zone device without respecting the issuing order from the user, that is,
> a newly issued write BIO may be issued before other write BIOs for the
> same target zone that are in the device delayed list. If such situation
> occurs, write BIOs may be failed by the underlying zoned device due to
> an unaligned write error.
>
> Prevent this situation from happening by always handling newly issued
> write BIOs using the delayed list of BIOs, even when the device is being
> suspended. This is done by forcing the use of the worker kthread for
> zoned devices, and by modifying flush_worker_fn() to always flush all
> delayed BIOs if the device may_delay boolean is false.
Is that scenario specific to dm-delay? I think the same applies to
any other target that supports passing on bios to zoned devices.
> spin_lock(&dc->delayed_bios_lock);
> if (unlikely(!dc->may_delay)) {
> - spin_unlock(&dc->delayed_bios_lock);
> - return DM_MAPIO_REMAPPED;
> + /*
> + * Issue the BIO immediately if the device is not zoned. FOr a
> + * zoned device, preserver the ordering of write operations by
> + * using the delay list.
> + */
> + if (!bdev_is_zoned(c->dev->bdev) || c != &dc->write) {
> + spin_unlock(&dc->delayed_bios_lock);
> + return DM_MAPIO_REMAPPED;
Don't we only need to do that if anything is queued up due to a
suspension? Then again having a different patch might be premature
optimization, it's not like anyone cares about dm-delay performance
almost by definition :)
^ permalink raw reply [flat|nested] 6+ messages in thread* Re: [PATCH] dm-delay: Prevent zoned write reordering on suspend
2025-04-10 7:54 ` Christoph Hellwig
@ 2025-04-10 8:02 ` Damien Le Moal
2025-04-10 18:10 ` Benjamin Marzinski
0 siblings, 1 reply; 6+ messages in thread
From: Damien Le Moal @ 2025-04-10 8:02 UTC (permalink / raw)
To: Christoph Hellwig
Cc: dm-devel, Mike Snitzer, Mikulas Patocka, Benjamin Marzinski
On 4/10/25 4:54 PM, Christoph Hellwig wrote:
> On Thu, Apr 10, 2025 at 01:33:11PM +0900, Damien Le Moal wrote:
>> When a dm-delay devie is being suspended, the .presuspend() operation is
>
> s/devie/device/
>
>> first executed (delay_presuspend()) to immediately issue all the BIOs
>> present in the delayed list of the device and also sets the device
>> may_delay boolean to false. At the same time, any new BIO is issued to
>> the device will not be delayed and immediately issued with delay_bio()
>> returning DM_MAPIO_REMAPPED. This creates a situation where potentially
>> 2 different contexts may be issuing write BIOs to the same zone of a
>> zone device without respecting the issuing order from the user, that is,
>> a newly issued write BIO may be issued before other write BIOs for the
>> same target zone that are in the device delayed list. If such situation
>> occurs, write BIOs may be failed by the underlying zoned device due to
>> an unaligned write error.
>>
>> Prevent this situation from happening by always handling newly issued
>> write BIOs using the delayed list of BIOs, even when the device is being
>> suspended. This is done by forcing the use of the worker kthread for
>> zoned devices, and by modifying flush_worker_fn() to always flush all
>> delayed BIOs if the device may_delay boolean is false.
>
> Is that scenario specific to dm-delay? I think the same applies to
> any other target that supports passing on bios to zoned devices.
Don't think so. The delayed BIO list is a dm-delay thing only. dm-linear,
dm-error or dm-crypt do not delay BIOs like dm-delay so we do not have BIOs
being issued in the pre-suspend context.
>
>> spin_lock(&dc->delayed_bios_lock);
>> if (unlikely(!dc->may_delay)) {
>> - spin_unlock(&dc->delayed_bios_lock);
>> - return DM_MAPIO_REMAPPED;
>> + /*
>> + * Issue the BIO immediately if the device is not zoned. FOr a
>> + * zoned device, preserver the ordering of write operations by
>> + * using the delay list.
>> + */
>> + if (!bdev_is_zoned(c->dev->bdev) || c != &dc->write) {
>> + spin_unlock(&dc->delayed_bios_lock);
>> + return DM_MAPIO_REMAPPED;
>
> Don't we only need to do that if anything is queued up due to a
> suspension? Then again having a different patch might be premature
> optimization, it's not like anyone cares about dm-delay performance
> almost by definition :)
True. I can add a list_empty check here. Will send a v2 with that.
--
Damien Le Moal
Western Digital Research
^ permalink raw reply [flat|nested] 6+ messages in thread* Re: [PATCH] dm-delay: Prevent zoned write reordering on suspend
2025-04-10 8:02 ` Damien Le Moal
@ 2025-04-10 18:10 ` Benjamin Marzinski
2025-04-10 23:56 ` Damien Le Moal
0 siblings, 1 reply; 6+ messages in thread
From: Benjamin Marzinski @ 2025-04-10 18:10 UTC (permalink / raw)
To: Damien Le Moal; +Cc: Christoph Hellwig, dm-devel, Mike Snitzer, Mikulas Patocka
On Thu, Apr 10, 2025 at 05:02:00PM +0900, Damien Le Moal wrote:
> On 4/10/25 4:54 PM, Christoph Hellwig wrote:
> > On Thu, Apr 10, 2025 at 01:33:11PM +0900, Damien Le Moal wrote:
> >> When a dm-delay devie is being suspended, the .presuspend() operation is
> >
> > s/devie/device/
> >
> >> first executed (delay_presuspend()) to immediately issue all the BIOs
> >> present in the delayed list of the device and also sets the device
> >> may_delay boolean to false. At the same time, any new BIO is issued to
> >> the device will not be delayed and immediately issued with delay_bio()
> >> returning DM_MAPIO_REMAPPED. This creates a situation where potentially
> >> 2 different contexts may be issuing write BIOs to the same zone of a
> >> zone device without respecting the issuing order from the user, that is,
> >> a newly issued write BIO may be issued before other write BIOs for the
> >> same target zone that are in the device delayed list. If such situation
> >> occurs, write BIOs may be failed by the underlying zoned device due to
> >> an unaligned write error.
> >>
> >> Prevent this situation from happening by always handling newly issued
> >> write BIOs using the delayed list of BIOs, even when the device is being
> >> suspended. This is done by forcing the use of the worker kthread for
> >> zoned devices, and by modifying flush_worker_fn() to always flush all
> >> delayed BIOs if the device may_delay boolean is false.
> >
> > Is that scenario specific to dm-delay? I think the same applies to
> > any other target that supports passing on bios to zoned devices.
>
> Don't think so. The delayed BIO list is a dm-delay thing only. dm-linear,
> dm-error or dm-crypt do not delay BIOs like dm-delay so we do not have BIOs
> being issued in the pre-suspend context.
>
> >
> >> spin_lock(&dc->delayed_bios_lock);
> >> if (unlikely(!dc->may_delay)) {
> >> - spin_unlock(&dc->delayed_bios_lock);
> >> - return DM_MAPIO_REMAPPED;
> >> + /*
> >> + * Issue the BIO immediately if the device is not zoned. FOr a
> >> + * zoned device, preserver the ordering of write operations by
> >> + * using the delay list.
> >> + */
> >> + if (!bdev_is_zoned(c->dev->bdev) || c != &dc->write) {
> >> + spin_unlock(&dc->delayed_bios_lock);
> >> + return DM_MAPIO_REMAPPED;
> >
> > Don't we only need to do that if anything is queued up due to a
> > suspension? Then again having a different patch might be premature
> > optimization, it's not like anyone cares about dm-delay performance
> > almost by definition :)
>
> True. I can add a list_empty check here. Will send a v2 with that.
I may be pointing out the obvious, but we can't just check if
dc->delayed_bios is empty, since we pull the bios off the list before we
submit them. We can only skip adding the bios if the list is empty and
flush_delay_bios() isn't currently processing any bios.
-Ben
>
>
> --
> Damien Le Moal
> Western Digital Research
^ permalink raw reply [flat|nested] 6+ messages in thread* Re: [PATCH] dm-delay: Prevent zoned write reordering on suspend
2025-04-10 18:10 ` Benjamin Marzinski
@ 2025-04-10 23:56 ` Damien Le Moal
0 siblings, 0 replies; 6+ messages in thread
From: Damien Le Moal @ 2025-04-10 23:56 UTC (permalink / raw)
To: Benjamin Marzinski
Cc: Christoph Hellwig, dm-devel, Mike Snitzer, Mikulas Patocka
On 4/11/25 03:10, Benjamin Marzinski wrote:
>>> Don't we only need to do that if anything is queued up due to a
>>> suspension? Then again having a different patch might be premature
>>> optimization, it's not like anyone cares about dm-delay performance
>>> almost by definition :)
>>
>> True. I can add a list_empty check here. Will send a v2 with that.
>
> I may be pointing out the obvious, but we can't just check if
> dc->delayed_bios is empty, since we pull the bios off the list before we
> submit them. We can only skip adding the bios if the list is empty and
> flush_delay_bios() isn't currently processing any bios.
Hehe, obvious when you hear it :) Good point !
I will not try to micro optimize this.
--
Damien Le Moal
Western Digital Research
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2025-04-10 23:56 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-04-10 4:33 [PATCH] dm-delay: Prevent zoned write reordering on suspend Damien Le Moal
2025-04-10 4:43 ` Damien Le Moal
2025-04-10 7:54 ` Christoph Hellwig
2025-04-10 8:02 ` Damien Le Moal
2025-04-10 18:10 ` Benjamin Marzinski
2025-04-10 23:56 ` Damien Le Moal
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.