From: "zhengbing.huang" <zhengbing.huang@easystack.cn>
To: drbd-dev@lists.linbit.com
Subject: [PATCH] drbd: Fix IO block after network failure
Date: Wed, 19 Feb 2025 11:05:06 +0800
Message-ID: <20250219030506.1389085-1-zhengbing.huang@easystack.cn>
In a network failure test, some I/O never finishes.
The oldest_request has the following status information:
master: pending|postponed local: in-AL|completed|ok net[1]: queued|done : C|barr
This req still has RQ_NET_QUEUED set, so its reference count
cannot drop to zero and the req cannot complete.
Commit 8962f7c03c1
("drbd: exclude requests that are not yet queued from "seen_dagtag_sector"")
modified the __next_request_for_connection() function,
which leaves the sender thread unable to clean up all
pending reqs after a network failure.
The race occurs as follows, where T is a thread submitting a req
and S is the sender thread:
S: process_one_request() handles r0
S: network failure: drbd_send_dblock(r0) fails, then __req_mod(r0, SEND_FAILED...) is called
S: mod_rq_state() clears RQ_NET_QUEUED on r0; r0 still has RQ_NET_PENDING
T: r1 arrives in drbd_send_and_submit(), is added to the transfer_log, and gets RQ_NET_QUEUED set
S: drbd_sender() handles the network failure: change_cstate(C_NETWORK_FAILURE)
When the sender thread is stopping, it wants to
clean up all currently unprocessed requests (by calling __req_mod(req, SEND_CANCELED...)),
but it cannot find r1, because in __next_request_for_connection()
r0 always satisfies the first if condition and the function returns NULL.
static struct drbd_request *__next_request_for_connection(...)
{
	...
	if (unlikely(s & RQ_NET_PENDING && !(s & (RQ_NET_QUEUED|RQ_NET_SENT))))
		return NULL;
	...
}
As a result, r1 can never complete because it still has RQ_NET_QUEUED set.
So, in the sender's cleanup path,
find every req with RQ_NET_QUEUED and clean it up.
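For illustration only (not part of the patch), the difference between the two
lookups can be modeled in a small standalone C sketch. The flag values,
struct toy_request and all toy_* helpers are hypothetical. The first helper
models only the early-return check quoted above followed by a search for
RQ_NET_QUEUED, not the full __next_request_for_connection().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical bit values for illustration; the real definitions live in DRBD. */
#define RQ_NET_PENDING (1u << 0)
#define RQ_NET_QUEUED  (1u << 1)
#define RQ_NET_SENT    (1u << 2)

/* Toy stand-in for a request on the transfer log (assumption, not DRBD's struct). */
struct toy_request {
	const char *name;
	unsigned int net_state;	/* stands in for req->net_rq_state[peer_node_id] */
};

/*
 * Models the early return: a pending request that is neither queued nor sent
 * ends the walk, so queued requests behind it are never reached.
 */
static const struct toy_request *
toy_next_for_connection(const struct toy_request *tl, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		unsigned int s = tl[i].net_state;

		if ((s & RQ_NET_PENDING) && !(s & (RQ_NET_QUEUED | RQ_NET_SENT)))
			return NULL;
		if (s & RQ_NET_QUEUED)
			return &tl[i];
	}
	return NULL;
}

/* Models the cleanup lookup: return the first request that is still queued. */
static const struct toy_request *
toy_next_for_cleanup(const struct toy_request *tl, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (tl[i].net_state & RQ_NET_QUEUED)
			return &tl[i];
	}
	return NULL;
}

/* Replays the transfer log state after the race described above. */
static int toy_demo(void)
{
	const struct toy_request tl[] = {
		{ "r0", RQ_NET_PENDING },	/* SEND_FAILED cleared RQ_NET_QUEUED */
		{ "r1", RQ_NET_QUEUED },	/* queued by the submitter afterwards */
	};
	const struct toy_request *found;

	/* The normal lookup stops at r0 and never sees r1. */
	assert(toy_next_for_connection(tl, 2) == NULL);

	/* The cleanup lookup skips r0 and reaches r1. */
	found = toy_next_for_cleanup(tl, 2);
	assert(found != NULL && strcmp(found->name, "r1") == 0);
	return 0;
}
```

With r0 holding only RQ_NET_PENDING and r1 holding RQ_NET_QUEUED, the first
lookup gives up at r0, while the cleanup lookup still reaches r1. Calling
toy_demo() from a main() exercises both cases.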
Signed-off-by: zhengbing.huang <zhengbing.huang@easystack.cn>
---
drbd/drbd_sender.c | 22 ++++++++++++++++++++--
1 file changed, 20 insertions(+), 2 deletions(-)
diff --git a/drbd/drbd_sender.c b/drbd/drbd_sender.c
index 80badc606..e6fc751c7 100644
--- a/drbd/drbd_sender.c
+++ b/drbd/drbd_sender.c
@@ -3251,6 +3251,24 @@ static struct drbd_request *tl_next_request_for_connection(struct drbd_connectio
 	return connection->todo.req;
 }
 
+static struct drbd_request *tl_next_request_for_cleanup(struct drbd_connection *connection)
+{
+	struct drbd_request *req;
+	struct drbd_request *found_req = NULL;
+
+	list_for_each_entry_rcu(req, &connection->resource->transfer_log, tl_requests) {
+		unsigned s = req->net_rq_state[connection->peer_node_id];
+
+		if (s & RQ_NET_QUEUED) {
+			found_req = req;
+			break;
+		}
+	}
+
+	connection->todo.req = found_req;
+	return connection->todo.req;
+}
+
 static void maybe_send_state_afer_ahead(struct drbd_connection *connection)
 {
 	struct drbd_peer_device *peer_device;
@@ -3644,7 +3662,7 @@ int drbd_sender(struct drbd_thread *thi)
 	/* cleanup all currently unprocessed requests */
 	if (!connection->todo.req) {
 		rcu_read_lock();
-		tl_next_request_for_connection(connection);
+		tl_next_request_for_cleanup(connection);
 		rcu_read_unlock();
 	}
 	while (connection->todo.req) {
@@ -3660,7 +3678,7 @@ int drbd_sender(struct drbd_thread *thi)
 			complete_master_bio(device, &m);
 
 		rcu_read_lock();
-		tl_next_request_for_connection(connection);
+		tl_next_request_for_cleanup(connection);
 		rcu_read_unlock();
 	}
 
--
2.43.0