All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] dm-delay: Prevent zoned write reordering on suspend
@ 2025-04-10  4:33 Damien Le Moal
  2025-04-10  4:43 ` Damien Le Moal
  2025-04-10  7:54 ` Christoph Hellwig
  0 siblings, 2 replies; 6+ messages in thread
From: Damien Le Moal @ 2025-04-10  4:33 UTC (permalink / raw)
  To: dm-devel, Mike Snitzer, Mikulas Patocka
  Cc: Christoph Hellwig, Benjamin Marzinski

When a dm-delay devie is being suspended, the .presuspend() operation is
first executed (delay_presuspend()) to immediately issue all the BIOs
present in the delayed list of the device and also sets the device
may_delay boolean to false. At the same time, any new BIO is issued to
the device will not be delayed and immediately issued with delay_bio()
returning DM_MAPIO_REMAPPED. This creates a situation where potentially
2 different contexts may be issuing write BIOs to the same zone of a
zone device without respecting the issuing order from the user, that is,
a newly issued write BIO may be issued before other write BIOs for the
same target zone that are in the device delayed list. If such situation
occurs, write BIOs may be failed by the underlying zoned device due to
an unaligned write error.

Prevent this situation from happening by always handling newly issued
write BIOs using the delayed list of BIOs, even when the device is being
suspended. This is done by forcing the use of the worker kthread for
zoned devices, and by modifying flush_worker_fn() to always flush all
delayed BIOs if the device may_delay boolean is false.

Fixes: d43929ef65a6 ("dm-delay: support zoned devices")
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
---
 drivers/md/dm-delay.c | 29 +++++++++++++++++++++--------
 1 file changed, 21 insertions(+), 8 deletions(-)

diff --git a/drivers/md/dm-delay.c b/drivers/md/dm-delay.c
index d4cf0ac2a7aa..c665b2ab1115 100644
--- a/drivers/md/dm-delay.c
+++ b/drivers/md/dm-delay.c
@@ -128,7 +128,7 @@ static int flush_worker_fn(void *data)
 	struct delay_c *dc = data;
 
 	while (!kthread_should_stop()) {
-		flush_delayed_bios(dc, false);
+		flush_delayed_bios(dc, !dc->may_delay);
 		spin_lock(&dc->delayed_bios_lock);
 		if (unlikely(list_empty(&dc->delayed_bios))) {
 			set_current_state(TASK_INTERRUPTIBLE);
@@ -213,6 +213,7 @@ static int delay_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 	struct delay_c *dc;
 	int ret;
 	unsigned int max_delay;
+	bool is_zoned = false;
 
 	if (argc != 3 && argc != 6 && argc != 9) {
 		ti->error = "Requires exactly 3, 6 or 9 arguments";
@@ -236,6 +237,7 @@ static int delay_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 	if (ret)
 		goto bad;
 	max_delay = dc->read.delay;
+	is_zoned = bdev_is_zoned(dc->read.dev->bdev);
 
 	if (argc == 3) {
 		ret = delay_class_ctr(ti, &dc->write, argv);
@@ -251,6 +253,7 @@ static int delay_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 	if (ret)
 		goto bad;
 	max_delay = max(max_delay, dc->write.delay);
+	is_zoned = is_zoned || bdev_is_zoned(dc->write.dev->bdev);
 
 	if (argc == 6) {
 		ret = delay_class_ctr(ti, &dc->flush, argv + 3);
@@ -263,13 +266,16 @@ static int delay_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 	if (ret)
 		goto bad;
 	max_delay = max(max_delay, dc->flush.delay);
+	is_zoned = is_zoned || bdev_is_zoned(dc->flush.dev->bdev);
 
 out:
-	if (max_delay < 50) {
-		/*
-		 * In case of small requested delays, use kthread instead of
-		 * timers and workqueue to achieve better latency.
-		 */
+	/*
+	 * In case of small requested delays, use kthread instead of timers and
+	 * workqueue to achieve better latency. Also use a kthread for a zoned
+	 * device so that we can preserve the order of write operations during
+	 * suspend.
+	 */
+	if (max_delay < 50 || is_zoned) {
 		dc->worker = kthread_run(&flush_worker_fn, dc, "dm-delay-flush-worker");
 		if (IS_ERR(dc->worker)) {
 			ret = PTR_ERR(dc->worker);
@@ -313,8 +319,15 @@ static int delay_bio(struct delay_c *dc, struct delay_class *c, struct bio *bio)
 
 	spin_lock(&dc->delayed_bios_lock);
 	if (unlikely(!dc->may_delay)) {
-		spin_unlock(&dc->delayed_bios_lock);
-		return DM_MAPIO_REMAPPED;
+		/*
+		 * Issue the BIO immediately if the device is not zoned. FOr a
+		 * zoned device, preserver the ordering of write operations by
+		 * using the delay list.
+		 */
+		if (!bdev_is_zoned(c->dev->bdev) || c != &dc->write) {
+			spin_unlock(&dc->delayed_bios_lock);
+			return DM_MAPIO_REMAPPED;
+		}
 	}
 	c->ops++;
 	list_add_tail(&delayed->list, &dc->delayed_bios);
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2025-04-10 23:56 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-04-10  4:33 [PATCH] dm-delay: Prevent zoned write reordering on suspend Damien Le Moal
2025-04-10  4:43 ` Damien Le Moal
2025-04-10  7:54 ` Christoph Hellwig
2025-04-10  8:02   ` Damien Le Moal
2025-04-10 18:10     ` Benjamin Marzinski
2025-04-10 23:56       ` Damien Le Moal

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.