From: James Simmons <jsimmons@infradead.org>
To: lustre-devel@lists.lustre.org
Subject: [lustre-devel] [PATCH 35/45] lustre: osc: Do not wait for grants for too long
Date: Mon, 25 May 2020 18:08:12 -0400 [thread overview]
Message-ID: <1590444502-20533-36-git-send-email-jsimmons@infradead.org> (raw)
In-Reply-To: <1590444502-20533-1-git-send-email-jsimmons@infradead.org>
From: Oleg Drokin <green@whamcloud.com>
obd_timeout is way too long considering we are holding a lock
that might be contended. If OST is slow to respond, we might
get evicted, so limit us to a half of the shortest possible
max wait a server might have before switching to synchronous IO.
WC-bug-id: https://jira.whamcloud.com/browse/LU-13131
Lustre-commit: 1eee11c75ca13 ("LU-13131 osc: Do not wait for grants for too long")
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/38283
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
fs/lustre/include/lustre_dlm.h | 2 ++
fs/lustre/ldlm/ldlm_request.c | 1 +
fs/lustre/osc/osc_cache.c | 13 ++++++++++++-
3 files changed, 15 insertions(+), 1 deletion(-)
diff --git a/fs/lustre/include/lustre_dlm.h b/fs/lustre/include/lustre_dlm.h
index f67b612..174b314 100644
--- a/fs/lustre/include/lustre_dlm.h
+++ b/fs/lustre/include/lustre_dlm.h
@@ -1320,6 +1320,8 @@ int ldlm_cli_cancel_list(struct list_head *head, int count,
enum ldlm_cancel_flags flags);
/** @} ldlm_cli_api */
+extern unsigned int ldlm_enqueue_min;
+
int ldlm_inodebits_drop(struct ldlm_lock *lock, u64 to_drop);
int ldlm_cli_inodebits_convert(struct ldlm_lock *lock,
enum ldlm_cancel_flags cancel_flags);
diff --git a/fs/lustre/ldlm/ldlm_request.c b/fs/lustre/ldlm/ldlm_request.c
index 5f06def..12ee241 100644
--- a/fs/lustre/ldlm/ldlm_request.c
+++ b/fs/lustre/ldlm/ldlm_request.c
@@ -69,6 +69,7 @@
unsigned int ldlm_enqueue_min = OBD_TIMEOUT_DEFAULT;
module_param(ldlm_enqueue_min, uint, 0644);
MODULE_PARM_DESC(ldlm_enqueue_min, "lock enqueue timeout minimum");
+EXPORT_SYMBOL(ldlm_enqueue_min);
/* in client side, whether the cached locks will be canceled before replay */
unsigned int ldlm_cancel_unused_locks_before_replay = 1;
diff --git a/fs/lustre/osc/osc_cache.c b/fs/lustre/osc/osc_cache.c
index 9e28ff6..c7f1502 100644
--- a/fs/lustre/osc/osc_cache.c
+++ b/fs/lustre/osc/osc_cache.c
@@ -39,6 +39,7 @@
#define DEBUG_SUBSYSTEM S_OSC
#include <lustre_osc.h>
+#include <lustre_dlm.h>
#include "osc_internal.h"
@@ -1630,10 +1631,20 @@ static int osc_enter_cache(const struct lu_env *env, struct client_obd *cli,
{
struct osc_object *osc = oap->oap_obj;
struct lov_oinfo *loi = osc->oo_oinfo;
- unsigned long timeout = (AT_OFF ? obd_timeout : at_max) * HZ;
int rc = -EDQUOT;
int remain;
bool entered = false;
+ /* We cannot wait for a long time here since we are holding ldlm lock
+ * across the actual IO. If no requests complete fast (e.g. due to
+ * overloaded OST that takes a long time to process everything, we'd
+ * get evicted if we wait for a normal obd_timeout or some such.
+ * So we try to wait half the time it would take the client to be
+ * evicted by server which is half obd_timeout when AT is off
+ * or at least ldlm_enqueue_min with AT on.
+ * See LU-13131
+ */
+ unsigned long timeout = (AT_OFF ? obd_timeout / 2 :
+ ldlm_enqueue_min / 2) * HZ;
OSC_DUMP_GRANT(D_CACHE, cli, "need:%d\n", bytes);
--
1.8.3.1
next prev parent reply other threads:[~2020-05-25 22:08 UTC|newest]
Thread overview: 54+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-05-25 22:07 [lustre-devel] [PATCH 00/45] lustre: merged OpenSFS client patches from April 30 to today James Simmons
2020-05-25 22:07 ` [lustre-devel] [PATCH 01/45] lustre: fid: revert seq_client_rpc patch James Simmons
2020-05-25 22:07 ` [lustre-devel] [PATCH 02/45] lustre: fld: convert cache_flush file to LPROC_SEQ_FOPS James Simmons
2020-05-25 22:07 ` [lustre-devel] [PATCH 03/45] lustre: cleanups and bug fixes James Simmons
2020-05-25 22:07 ` [lustre-devel] [PATCH 04/45] lnet: merge lnet_md_alloc into lnet_md_build James Simmons
2020-05-25 22:07 ` [lustre-devel] [PATCH 05/45] lnet: always put a page list into struct lnet_libmd James Simmons
2020-05-25 22:07 ` [lustre-devel] [PATCH 06/45] lnet: discard kvec option from lnet_libmd James Simmons
2020-05-25 22:07 ` [lustre-devel] [PATCH 07/45] lnet: remove msg_iov from lnet_msg James Simmons
2020-05-25 22:07 ` [lustre-devel] [PATCH 08/45] lnet: o2iblnd: discard kiblnd_setup_rd_iov James Simmons
2020-05-25 22:07 ` [lustre-devel] [PATCH 09/45] lustre: ptlrpc: return proper write count from ping_store James Simmons
2020-05-25 22:07 ` [lustre-devel] [PATCH 10/45] lustre: sec: check permissions for changelogs access James Simmons
2020-05-25 22:07 ` [lustre-devel] [PATCH 11/45] lustre: uapi: add OBD_CONNECT2_FIDMAP James Simmons
2020-05-25 22:07 ` [lustre-devel] [PATCH 12/45] lustre: lov: lov_io_sub_init()) ASSERTION James Simmons
2020-05-25 22:07 ` [lustre-devel] [PATCH 13/45] lnet: Introduce constant for the lolnd NID James Simmons
2020-05-25 22:07 ` [lustre-devel] [PATCH 14/45] lustre: Remove inappropriate uses of BIT() macro James Simmons
2020-05-25 22:07 ` [lustre-devel] [PATCH 15/45] lustre: mgc: protect from NULL exp in mgc_enqueue() James Simmons
2020-05-25 22:07 ` [lustre-devel] [PATCH 16/45] lustre: llite: do not flush COW pages from mapping James Simmons
2020-05-25 22:07 ` [lustre-devel] [PATCH 17/45] lustre: quota: quota pools for OSTs James Simmons
2020-05-25 22:07 ` [lustre-devel] [PATCH 18/45] lnet: libcfs: use BIT() macro where appropriate James Simmons
2020-05-25 22:07 ` [lustre-devel] [PATCH 19/45] lustre: llite: clean up pcc_layout_wait() James Simmons
2020-05-25 22:07 ` [lustre-devel] [PATCH 20/45] lustre: misc: declare static chars as const where possible James Simmons
2020-05-25 22:07 ` [lustre-devel] [PATCH 21/45] lustre: llite: fix to make jobstats work for async ra James Simmons
2020-05-25 22:07 ` [lustre-devel] [PATCH 22/45] lustre: llite: verify truncated xattr is handled James Simmons
2020-05-25 22:08 ` [lustre-devel] [PATCH 23/45] lustre: obd: fix printing of client connection UUID James Simmons
2020-05-25 22:08 ` [lustre-devel] [PATCH 24/45] lnet: Add MD options for response tracking James Simmons
2020-05-25 22:08 ` [lustre-devel] [PATCH 25/45] lustre: Send file creation time to clients James Simmons
2020-05-25 22:08 ` [lustre-devel] [PATCH 26/45] lnet: stop using struct timeval James Simmons
2020-05-25 22:08 ` [lustre-devel] [PATCH 27/45] lustre: ptlrpc: connect to MDT stucks James Simmons
2020-05-25 22:08 ` [lustre-devel] [PATCH 28/45] lnet: restrict gateway selection James Simmons
2020-05-25 22:08 ` [lustre-devel] [PATCH 29/45] lustre: llite: restore ll_dcompare() James Simmons
2020-05-25 22:08 ` [lustre-devel] [PATCH 30/45] lustre: fallocate: Implement fallocate preallocate operation James Simmons
2020-05-25 22:08 ` [lustre-devel] [PATCH 31/45] lustre: llite: fix possible divide zero in ll_use_fast_io() James Simmons
2020-05-25 22:08 ` [lustre-devel] [PATCH 32/45] lustre: llog: allow delete of zero size llog James Simmons
2020-05-25 22:08 ` [lustre-devel] [PATCH 33/45] lustre: ldlm: use proper units for timeouts James Simmons
2020-05-25 22:08 ` [lustre-devel] [PATCH 34/45] lustre: dne: support directory restripe James Simmons
2020-05-25 22:08 ` James Simmons [this message]
2020-05-25 22:08 ` [lustre-devel] [PATCH 36/45] lnet: use kmem_cache_zalloc as appropriate James Simmons
2020-05-25 22:08 ` [lustre-devel] [PATCH 37/45] lustre: osc: Ensure immediate departure of sync write pages James Simmons
2020-05-25 22:08 ` [lustre-devel] [PATCH 38/45] lnet: remove lnet_extract_iov() James Simmons
2020-05-25 22:08 ` [lustre-devel] [PATCH 39/45] lnet: simplify ksock_tx James Simmons
2020-05-25 22:08 ` [lustre-devel] [PATCH 40/45] lnet: socklnd: discard tx_iov James Simmons
2020-05-25 22:08 ` [lustre-devel] [PATCH 41/45] lustre: lmv: do not print MDTs that are inactive James Simmons
2020-05-25 22:08 ` [lustre-devel] [PATCH 42/45] lnet: use the same src nid for discovery James Simmons
2020-05-25 22:08 ` [lustre-devel] [PATCH 43/45] lustre: llite: check if page truncated in ll_write_begin() James Simmons
2020-05-25 22:08 ` [lustre-devel] [PATCH 44/45] lustre: dne: improve temp file name check James Simmons
2020-05-25 22:08 ` [lustre-devel] [PATCH 45/45] lustre: all: Cleanup LASSERTF uses missing newlines James Simmons
2020-05-29 6:29 ` [lustre-devel] [PATCH 00/45] lustre: merged OpenSFS client patches from April 30 to today NeilBrown
2020-06-01 22:52 ` James Simmons
2020-06-23 4:10 ` NeilBrown
2020-06-23 7:57 ` Degremont, Aurelien
2020-06-24 0:52 ` NeilBrown
2020-07-03 6:37 ` NeilBrown
2020-06-24 14:34 ` James Simmons
2020-06-25 1:46 ` NeilBrown
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1590444502-20533-36-git-send-email-jsimmons@infradead.org \
--to=jsimmons@infradead.org \
--cc=lustre-devel@lists.lustre.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).