From: James Simmons <jsimmons@infradead.org>
To: Andreas Dilger <adilger@whamcloud.com>,
Oleg Drokin <green@whamcloud.com>, NeilBrown <neilb@suse.de>
Cc: Chris Horn <chris.horn@hpe.com>,
Lustre Development List <lustre-devel@lists.lustre.org>
Subject: [lustre-devel] [PATCH 04/42] lnet: Drop LNet message if deadline exceeded
Date: Mon, 23 Jan 2023 18:00:17 -0500
Message-ID: <1674514855-15399-5-git-send-email-jsimmons@infradead.org>
In-Reply-To: <1674514855-15399-1-git-send-email-jsimmons@infradead.org>
From: Chris Horn <chris.horn@hpe.com>
The LNet message deadline is set when a message is committed for
sending. A message can be queued while waiting for send credit(s)
after it has been committed. Thus, it is possible for a message
deadline to be exceeded while on the queue. We should check for this
when posting messages to the LND layer.
HPE-bug-id: LUS-11333
WC-bug-id: https://jira.whamcloud.com/browse/LU-16303
Lustre-commit: 52db11cdceef0851b ("LU-16303 lnet: Drop LNet message if deadline exceeded")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49078
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
net/lnet/lnet/lib-move.c | 57 +++++++++++++++++++++++++++-------------
net/lnet/lnet/lib-msg.c | 2 +-
2 files changed, 40 insertions(+), 19 deletions(-)
diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c
index 225accaf5d08..f602492ee75f 100644
--- a/net/lnet/lnet/lib-move.c
+++ b/net/lnet/lnet/lib-move.c
@@ -572,41 +572,52 @@ lnet_ni_eager_recv(struct lnet_ni *ni, struct lnet_msg *msg)
return rc;
}
-/* returns true if this message should be dropped */
-static bool
+/* Returns:
+ * -ETIMEDOUT if the message deadline has been exceeded
+ * -EHOSTUNREACH if the peer is down
+ * 0 if this message should not be dropped
+ */
+static int
lnet_check_message_drop(struct lnet_ni *ni, struct lnet_peer_ni *lpni,
struct lnet_msg *msg)
{
+ /* Drop message if we've exceeded the message deadline */
+ if (ktime_after(ktime_get(), msg->msg_deadline))
+ return -ETIMEDOUT;
+
if (msg->msg_target.pid & LNET_PID_USERFLAG)
- return false;
+ return 0;
if (!lnet_peer_aliveness_enabled(lpni))
- return false;
+ return 0;
/* If we're resending a message, let's attempt to send it even if
* the peer is down to fulfill our resend quota on the message
*/
if (msg->msg_retry_count > 0)
- return false;
+ return 0;
- /* try and send recovery messages irregardless */
+ /* try and send recovery messages regardless */
if (msg->msg_recovery)
- return false;
+ return 0;
/* always send any responses */
if (lnet_msg_is_response(msg))
- return false;
+ return 0;
/* always send non-routed messages */
if (!msg->msg_routing)
- return false;
+ return 0;
/* assume peer_ni is alive as long as we're within the configured
* peer timeout
*/
- return ktime_get_seconds() >=
- (lpni->lpni_last_alive +
- lpni->lpni_net->net_tunables.lct_peer_timeout);
+ if (ktime_get_seconds() >=
+ (lpni->lpni_last_alive +
+ lpni->lpni_net->net_tunables.lct_peer_timeout))
+ return -EHOSTUNREACH;
+
+ return 0;
}
/**
@@ -628,6 +639,7 @@ lnet_post_send_locked(struct lnet_msg *msg, int do_send)
struct lnet_ni *ni = msg->msg_txni;
int cpt = msg->msg_tx_cpt;
struct lnet_tx_queue *tq = ni->ni_tx_queues[cpt];
+ int rc;
/* non-lnet_send() callers have checked before */
LASSERT(!do_send || msg->msg_tx_delayed);
@@ -639,7 +651,8 @@ lnet_post_send_locked(struct lnet_msg *msg, int do_send)
LASSERT(!nid_same(&lp->lpni_nid, &the_lnet.ln_loni->ni_nid));
/* NB 'lp' is always the next hop */
- if (lnet_check_message_drop(ni, lp, msg)) {
+ rc = lnet_check_message_drop(ni, lp, msg);
+ if (rc) {
the_lnet.ln_counters[cpt]->lct_common.lcc_drop_count++;
the_lnet.ln_counters[cpt]->lct_common.lcc_drop_length +=
msg->msg_len;
@@ -653,14 +666,22 @@ lnet_post_send_locked(struct lnet_msg *msg, int do_send)
msg->msg_type,
LNET_STATS_TYPE_DROP);
- CNETERR("Dropping message for %s: peer not alive\n",
- libcfs_idstr(&msg->msg_target));
- msg->msg_health_status = LNET_MSG_STATUS_REMOTE_DROPPED;
+ if (rc == -EHOSTUNREACH) {
+ CNETERR("Dropping message for %s: peer not alive\n",
+ libcfs_idstr(&msg->msg_target));
+ msg->msg_health_status = LNET_MSG_STATUS_REMOTE_DROPPED;
+ } else {
+ CNETERR("Dropping message for %s: exceeded message deadline\n",
+ libcfs_idstr(&msg->msg_target));
+ msg->msg_health_status =
+ LNET_MSG_STATUS_NETWORK_TIMEOUT;
+ }
+
if (do_send)
- lnet_finalize(msg, -EHOSTUNREACH);
+ lnet_finalize(msg, rc);
lnet_net_lock(cpt);
- return -EHOSTUNREACH;
+ return rc;
}
if (msg->msg_md &&
diff --git a/net/lnet/lnet/lib-msg.c b/net/lnet/lnet/lib-msg.c
index 898d8670aedf..82d117dc6b61 100644
--- a/net/lnet/lnet/lib-msg.c
+++ b/net/lnet/lnet/lib-msg.c
@@ -779,7 +779,7 @@ lnet_health_check(struct lnet_msg *msg)
lo = true;
if (hstatus != LNET_MSG_STATUS_OK &&
- ktime_compare(ktime_get(), msg->msg_deadline) >= 0)
+ ktime_after(ktime_get(), msg->msg_deadline))
return -1;
/* always prefer txni/txpeer if they message is committed for both
--
2.27.0