From: James Simmons <jsimmons@infradead.org>
To: Andreas Dilger <adilger@whamcloud.com>,
Oleg Drokin <green@whamcloud.com>, NeilBrown <neilb@suse.de>
Cc: Vitaly Fertman <vitaly.fertman@hpe.com>,
Vitaly Fertman <c17818@cray.com>,
Lustre Development List <lustre-devel@lists.lustre.org>
Subject: [lustre-devel] [PATCH 13/24] lustre: ptlrpc: two replay lock threads
Date: Tue, 21 Sep 2021 22:19:50 -0400 [thread overview]
Message-ID: <1632277201-6920-14-git-send-email-jsimmons@infradead.org> (raw)
In-Reply-To: <1632277201-6920-1-git-send-email-jsimmons@infradead.org>
From: Vitaly Fertman <c17818@cray.com>
conflict to each other what leads to:
ASSERTION( atomic_read(&imp->imp_replay_inflight) == 1 )
replay_lock_interpret() does ptlrpc_connect_import() on error, and one
thread will appear starting with connect reply interpret.
replay_lock_interpret() also wakes up ldlm_lock_replay_thread() which
does ptlrpc_import_recovery_state_machine().
It may happen that both threads will get to ldlm_replay_locks() on the
next round at the same time, both increment imp_replay_inflight and
the second one will assert.
The problem appeared in LU-13600 which added ldlm_lock_replay_thread()
with the ptlrpc_import_recovery_state_machine() call.
HPE-bug-id: LUS-10147
WC-bug-id: https://jira.whamcloud.com/browse/LU-14847
Lustre-commit: d7d7eb50c8f5fd3fc ("LU-14847 ptlrpc: two replay lock threads")
Fixes: 8cc7f22847 ("lustre: ptlrpc: limit rate of lock replays")
Signed-off-by: Vitaly Fertman <vitaly.fertman@hpe.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-on: https://es-gerrit.dev.cray.com/158931
Reviewed-on: https://review.whamcloud.com/44294
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
fs/lustre/ldlm/ldlm_request.c | 10 +++++++---
fs/lustre/obdclass/obd_config.c | 4 ++--
2 files changed, 9 insertions(+), 5 deletions(-)
diff --git a/fs/lustre/ldlm/ldlm_request.c b/fs/lustre/ldlm/ldlm_request.c
index 7718e07..746c45b 100644
--- a/fs/lustre/ldlm/ldlm_request.c
+++ b/fs/lustre/ldlm/ldlm_request.c
@@ -2253,7 +2253,8 @@ int __ldlm_replay_locks(struct obd_import *imp, bool rate_limit)
struct ldlm_lock *lock;
int rc = 0;
- LASSERT(atomic_read(&imp->imp_replay_inflight) == 1);
+ while (atomic_read(&imp->imp_replay_inflight) != 1)
+ cond_resched();
/* don't replay locks if import failed recovery */
if (imp->imp_vbr_failed)
@@ -2311,9 +2312,12 @@ int ldlm_replay_locks(struct obd_import *imp)
struct task_struct *task;
int rc = 0;
- class_import_get(imp);
/* ensure this doesn't fall to 0 before all have been queued */
- atomic_inc(&imp->imp_replay_inflight);
+ if (atomic_inc_return(&imp->imp_replay_inflight) > 1) {
+ atomic_dec(&imp->imp_replay_inflight);
+ return 0;
+ }
+ class_import_get(imp);
task = kthread_run(ldlm_lock_replay_thread, imp, "ldlm_lock_replay");
if (IS_ERR(task)) {
diff --git a/fs/lustre/obdclass/obd_config.c b/fs/lustre/obdclass/obd_config.c
index 3a0dbd5..cb70ed5 100644
--- a/fs/lustre/obdclass/obd_config.c
+++ b/fs/lustre/obdclass/obd_config.c
@@ -519,8 +519,8 @@ struct obd_device *class_incref(struct obd_device *obd,
{
lu_ref_add_atomic(&obd->obd_reference, scope, source);
atomic_inc(&obd->obd_refcount);
- CDEBUG(D_INFO, "incref %s (%p) now %d\n", obd->obd_name, obd,
- atomic_read(&obd->obd_refcount));
+ CDEBUG(D_INFO, "incref %s (%p) now %d - %s\n", obd->obd_name, obd,
+ atomic_read(&obd->obd_refcount), scope);
return obd;
}
--
1.8.3.1
_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org
next prev parent reply other threads:[~2021-09-22 2:21 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-09-22 2:19 [lustre-devel] [PATCH 00/24] lustre: Update to OpenSFS Sept 21, 2021 James Simmons
2021-09-22 2:19 ` [lustre-devel] [PATCH 01/24] lnet: Lock primary NID logic James Simmons
2021-09-22 2:19 ` [lustre-devel] [PATCH 02/24] lustre: quota: enforce block quota for chgrp James Simmons
2021-09-22 2:19 ` [lustre-devel] [PATCH 03/24] lnet: introduce struct lnet_nid James Simmons
2021-09-22 2:19 ` [lustre-devel] [PATCH 04/24] lnet: add string formating/parsing for IPv6 nids James Simmons
2021-09-22 2:19 ` [lustre-devel] [PATCH 05/24] lnet: change lpni_nid in lnet_peer_ni to lnet_nid James Simmons
2021-09-22 2:19 ` [lustre-devel] [PATCH 06/24] lnet: change lp_primary_nid to struct lnet_nid James Simmons
2021-09-22 2:19 ` [lustre-devel] [PATCH 07/24] lnet: change lp_disc_*_nid " James Simmons
2021-09-22 2:19 ` [lustre-devel] [PATCH 08/24] lnet: socklnd: factor out key calculation for ksnd_peers James Simmons
2021-09-22 2:19 ` [lustre-devel] [PATCH 09/24] lnet: introduce lnet_processid for ksock_peer_ni James Simmons
2021-09-22 2:19 ` [lustre-devel] [PATCH 10/24] lnet: enhance connect/accept to support large addr James Simmons
2021-09-22 2:19 ` [lustre-devel] [PATCH 11/24] lnet: change lr_nid to struct lnet_nid James Simmons
2021-09-22 2:19 ` [lustre-devel] [PATCH 12/24] lnet: extend rspt_next_hop_nid in lnet_rsp_tracker James Simmons
2021-09-22 2:19 ` James Simmons [this message]
2021-09-22 2:19 ` [lustre-devel] [PATCH 14/24] lustre: llite: Always do lookup on ENOENT in open James Simmons
2021-09-22 2:19 ` [lustre-devel] [PATCH 15/24] lustre: llite: Remove inode locking in ll_fsync James Simmons
2021-09-22 2:19 ` [lustre-devel] [PATCH 16/24] lnet: socklnd: fix link state detection James Simmons
2021-09-22 2:19 ` [lustre-devel] [PATCH 17/24] lustre: llite: check read only mount for setquota James Simmons
2021-09-22 2:19 ` [lustre-devel] [PATCH 18/24] lustre: llite: don't touch vma after filemap_fault James Simmons
2021-09-22 2:19 ` [lustre-devel] [PATCH 19/24] lnet: Check for -ESHUTDOWN in lnet_parse James Simmons
2021-09-22 2:19 ` [lustre-devel] [PATCH 20/24] lustre: obdclass: EAGAIN after rhashtable_walk_next() James Simmons
2021-09-22 2:19 ` [lustre-devel] [PATCH 21/24] lustre: sec: filename encryption James Simmons
2021-09-22 2:19 ` [lustre-devel] [PATCH 22/24] lustre: uapi: fixup UAPI headers for native Linux client James Simmons
2021-09-22 2:20 ` [lustre-devel] [PATCH 23/24] lustre: ptlrpc: separate out server code for wiretest James Simmons
2021-09-22 2:20 ` [lustre-devel] [PATCH 24/24] lustre: pcc: VM_WRITE should not trigger layout write James Simmons
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1632277201-6920-14-git-send-email-jsimmons@infradead.org \
--to=jsimmons@infradead.org \
--cc=adilger@whamcloud.com \
--cc=c17818@cray.com \
--cc=green@whamcloud.com \
--cc=lustre-devel@lists.lustre.org \
--cc=neilb@suse.de \
--cc=vitaly.fertman@hpe.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).