Distributed Replicated Block Device (DRBD) development
From: Philipp Reisner <philipp.reisner@linbit.com>
To: Dongsheng Yang <dongsheng.yang@linux.dev>
Cc: Philipp Reisner <philipp.reisner@linbit.com>, drbd-dev@lists.linbit.com
Subject: [PATCH] drbd: make drbd_adm_detach() interruptible
Date: Wed,  3 Jul 2024 16:31:35 +0200
Message-ID: <20240703143135.330462-1-philipp.reisner@linbit.com>
In-Reply-To: <d16555b2-a777-e6ed-83f3-fc93a7a12607@linux.dev>

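If a backing device suddenly stops delivering I/O completions and the
user reacts by issuing `drbdsetup detach`, the operation hangs when it
tries to write the internal meta-data.

The user should have used `drbdsetup --force detach` instead, but by
then it is too late: there is no way to interrupt the hanging
`drbdsetup detach`.

Improve the situation by making detach operations interruptible. When a
signal interrupts the wait for the state change to complete, set the new
INTERRUPT_DETACH flag on every device in D_DETACHING and wake up
device->misc_wait, so that the pending meta-data wait returns.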
---
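Note for readers (not part of the patch): the extended wait condition
can be pictured with a small userspace analogy. This is only a sketch
under stated assumptions, using POSIX threads instead of kernel wait
queues. struct md_waiter, wait_done_or_detached() and
request_interrupt() are hypothetical names that do not exist in DRBD;
the condition variable merely stands in for device->misc_wait.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Why the wait ended. Names are illustrative, not DRBD's. */
enum wait_result { WAIT_DONE, WAIT_FORCED, WAIT_INTERRUPTED, WAIT_TIMED_OUT };

/* Hypothetical stand-in for the flags and wait queue of a device. */
struct md_waiter {
	pthread_mutex_t lock;
	pthread_cond_t wake;		/* plays the role of misc_wait */
	bool done;			/* meta-data I/O completed */
	bool force_detach;		/* analogue of FORCE_DETACH */
	bool interrupt_detach;		/* analogue of INTERRUPT_DETACH */
};

static void md_waiter_init(struct md_waiter *w)
{
	pthread_mutex_init(&w->lock, NULL);
	pthread_cond_init(&w->wake, NULL);
	w->done = w->force_detach = w->interrupt_detach = false;
}

/* Sleep until one of the three conditions holds or the timeout expires. */
static enum wait_result wait_done_or_detached(struct md_waiter *w, long timeout_ms)
{
	struct timespec deadline;
	enum wait_result res = WAIT_TIMED_OUT;

	assert(timeout_ms >= 0);
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&w->lock);
	while (!w->done && !w->force_detach && !w->interrupt_detach) {
		/* Re-check the flags after every wake-up, spurious or not. */
		if (pthread_cond_timedwait(&w->wake, &w->lock, &deadline) == ETIMEDOUT)
			break;
	}
	if (w->done)
		res = WAIT_DONE;
	else if (w->force_detach)
		res = WAIT_FORCED;
	else if (w->interrupt_detach)
		res = WAIT_INTERRUPTED;
	pthread_mutex_unlock(&w->lock);
	return res;
}

/* Set the flag first, then wake all sleepers so they see it. */
static void request_interrupt(struct md_waiter *w)
{
	pthread_mutex_lock(&w->lock);
	w->interrupt_detach = true;
	pthread_cond_broadcast(&w->wake);
	pthread_mutex_unlock(&w->lock);
}
```

As in the patch, the flag is set before the wake-up is issued, so a
woken waiter always finds the condition true when it re-evaluates it.
Build with -pthread where the toolchain requires it.
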
 drbd/drbd_actlog.c |  5 ++++-
 drbd/drbd_int.h    |  1 +
 drbd/drbd_state.c  | 29 +++++++++++++++++++++++++++--
 3 files changed, 32 insertions(+), 3 deletions(-)

diff --git a/drbd/drbd_actlog.c b/drbd/drbd_actlog.c
index bc09dee2f..d6ba168ac 100644
--- a/drbd/drbd_actlog.c
+++ b/drbd/drbd_actlog.c
@@ -74,7 +74,10 @@ void wait_until_done_or_force_detached(struct drbd_device *device, struct drbd_b
 		dt = MAX_SCHEDULE_TIMEOUT;
 
 	dt = wait_event_timeout(device->misc_wait,
-			*done || test_bit(FORCE_DETACH, &device->flags), dt);
+			*done ||
+			test_bit(FORCE_DETACH, &device->flags) ||
+			test_bit(INTERRUPT_DETACH, &device->flags),
+			dt);
 	if (dt == 0) {
 		drbd_err(device, "meta-data IO operation timed out\n");
 		drbd_handle_io_error(device, DRBD_FORCE_DETACH);
diff --git a/drbd/drbd_int.h b/drbd/drbd_int.h
index 0ebd79091..8ea752edd 100644
--- a/drbd/drbd_int.h
+++ b/drbd/drbd_int.h
@@ -521,6 +521,7 @@ enum device_flag {
 	MD_NO_FUA,		/* meta data device does not support barriers,
 				   so don't even try */
 	FORCE_DETACH,		/* Force-detach from local disk, aborting any pending local IO */
+	INTERRUPT_DETACH,	/* Interrupt an ongoing detach operation */
 	NEW_CUR_UUID,		/* Create new current UUID when thawing IO or issuing local IO */
 	__NEW_CUR_UUID,		/* Set NEW_CUR_UUID as soon as state change visible */
 	WRITING_NEW_CUR_UUID,	/* Set while the new current ID gets generated. */
diff --git a/drbd/drbd_state.c b/drbd/drbd_state.c
index be1de8f06..643b2f385 100644
--- a/drbd/drbd_state.c
+++ b/drbd/drbd_state.c
@@ -924,14 +924,39 @@ void state_change_lock(struct drbd_resource *resource, unsigned long *irq_flags,
 	resource->state_change_flags = flags;
 }
 
+/* Interrupt writing meta-data */
+static void interrupt_detach(struct drbd_resource *resource, struct completion *done)
+{
+	struct drbd_device *device;
+	int vnr;
+
+	idr_for_each_entry(&resource->devices, device, vnr) {
+		if (device->disk_state[NOW] == D_DETACHING) {
+			set_bit(INTERRUPT_DETACH, &device->flags);
+			wake_up_all(&device->misc_wait);
+		}
+	}
+
+	wait_for_completion(done);
+
+	idr_for_each_entry(&resource->devices, device, vnr) {
+		if (test_bit(INTERRUPT_DETACH, &device->flags))
+			clear_bit(INTERRUPT_DETACH, &device->flags);
+	}
+}
+
 static void __state_change_unlock(struct drbd_resource *resource, unsigned long *irq_flags, struct completion *done)
 {
 	enum chg_state_flags flags = resource->state_change_flags;
 
 	resource->state_change_flags = 0;
 	write_unlock_irqrestore(&resource->state_rwlock, *irq_flags);
-	if (done && expect(resource, current != resource->worker.task))
-		wait_for_completion(done);
+	if (done && expect(resource, current != resource->worker.task)) {
+		int err = wait_for_completion_interruptible(done);
+
+		if (err == -ERESTARTSYS)
+			interrupt_detach(resource, done);
+	}
 	if ((flags & CS_SERIALIZE) && !(flags & (CS_ALREADY_SERIALIZED | CS_PREPARE)))
 		up(&resource->state_sem);
 }
-- 
2.45.2


