From: Chuck Lever <chuck.lever@oracle.com>
To: NeilBrown <neilb@suse.de>
Cc: linux-nfs@vger.kernel.org,
Olga Kornievskaia <okorniev@redhat.com>,
Dai Ngo <Dai.Ngo@oracle.com>, Tom Talpey <tom@talpey.com>,
Jeff Layton <jlayton@kernel.org>
Subject: Re: [PATCH 5/6] nfsd: add support for freeing unused session-DRC slots
Date: Tue, 21 Jan 2025 11:24:38 -0500 [thread overview]
Message-ID: <8e0279ae-e65b-408a-ab30-7665c2714885@oracle.com> (raw)
In-Reply-To: <173742700974.22054.17564503623784796029@noble.neil.brown.name>
On 1/20/25 9:36 PM, NeilBrown wrote:
> On Sun, 19 Jan 2025, Chuck Lever wrote:
>> On 12/11/24 4:47 PM, NeilBrown wrote:
>>> Reducing the number of slots in the session slot table requires
>>> confirmation from the client. This patch adds reduce_session_slots()
>>> which starts the process of getting confirmation, but never calls it.
>>> That will come in a later patch.
>>>
>>> Before we can free a slot we need to confirm that the client won't try
>>> to use it again. This involves returning a lower cr_maxrequests in a
>>> SEQUENCE reply and then seeing a ca_maxrequests on the same slot which
>>> is not larger than we limit we are trying to impose. So for each slot
>>> we need to remember that we have sent a reduced cr_maxrequests.
>>>
>>> To achieve this we introduce a concept of request "generations". Each
>>> time we decide to reduce cr_maxrequests we increment the generation
>>> number, and record this when we return the lower cr_maxrequests to the
>>> client. When a slot with the current generation reports a low
>>> ca_maxrequests, we commit to that level and free extra slots.
>>>
>>> We use an 16 bit generation number (64 seems wasteful) and if it cycles
>>> we iterate all slots and reset the generation number to avoid false matches.
>>>
>>> When we free a slot we store the seqid in the slot pointer so that it can
>>> be restored when we reactivate the slot. The RFC can be read as
>>> suggesting that the slot number could restart from one after a slot is
>>> retired and reactivated, but also suggests that retiring slots is not
>>> required. So when we reactive a slot we accept with the next seqid in
>>> sequence, or 1.
>>>
>>> When decoding sa_highest_slotid into maxslots we need to add 1 - this
>>> matches how it is encoded for the reply.
>>>
>>> se_dead is moved in struct nfsd4_session to remove a hole.
>>>
>>> Reviewed-by: Jeff Layton <jlayton@kernel.org>
>>> Signed-off-by: NeilBrown <neilb@suse.de>
>>> ---
>>> fs/nfsd/nfs4state.c | 94 ++++++++++++++++++++++++++++++++++++++++-----
>>> fs/nfsd/nfs4xdr.c | 5 ++-
>>> fs/nfsd/state.h | 6 ++-
>>> fs/nfsd/xdr4.h | 2 -
>>> 4 files changed, 92 insertions(+), 15 deletions(-)
>>>
>>> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
>>> index fd9473d487f3..a2d1f97b8a0e 100644
>>> --- a/fs/nfsd/nfs4state.c
>>> +++ b/fs/nfsd/nfs4state.c
>>> @@ -1910,17 +1910,69 @@ gen_sessionid(struct nfsd4_session *ses)
>>> #define NFSD_MIN_HDR_SEQ_SZ (24 + 12 + 44)
>>>
>>> static void
>>> -free_session_slots(struct nfsd4_session *ses)
>>> +free_session_slots(struct nfsd4_session *ses, int from)
>>> {
>>> int i;
>>>
>>> - for (i = 0; i < ses->se_fchannel.maxreqs; i++) {
>>> + if (from >= ses->se_fchannel.maxreqs)
>>> + return;
>>> +
>>> + for (i = from; i < ses->se_fchannel.maxreqs; i++) {
>>> struct nfsd4_slot *slot = xa_load(&ses->se_slots, i);
>>>
>>> - xa_erase(&ses->se_slots, i);
>>> + /*
>>> + * Save the seqid in case we reactivate this slot.
>>> + * This will never require a memory allocation so GFP
>>> + * flag is irrelevant
>>> + */
>>> + xa_store(&ses->se_slots, i, xa_mk_value(slot->sl_seqid), 0);
>>> free_svc_cred(&slot->sl_cred);
>>> kfree(slot);
>>> }
>>> + ses->se_fchannel.maxreqs = from;
>>> + if (ses->se_target_maxslots > from)
>>> + ses->se_target_maxslots = from;
>>> +}
>>> +
>>> +/**
>>> + * reduce_session_slots - reduce the target max-slots of a session if possible
>>> + * @ses: The session to affect
>>> + * @dec: how much to decrease the target by
>>> + *
>>> + * This interface can be used by a shrinker to reduce the target max-slots
>>> + * for a session so that some slots can eventually be freed.
>>> + * It uses spin_trylock() as it may be called in a context where another
>>> + * spinlock is held that has a dependency on client_lock. As shrinkers are
>>> + * best-effort, skiping a session is client_lock is already held has no
>>> + * great coast
>>> + *
>>> + * Return value:
>>> + * The number of slots that the target was reduced by.
>>> + */
>>> +static int __maybe_unused
>>> +reduce_session_slots(struct nfsd4_session *ses, int dec)
>>> +{
>>> + struct nfsd_net *nn = net_generic(ses->se_client->net,
>>> + nfsd_net_id);
>>> + int ret = 0;
>>> +
>>> + if (ses->se_target_maxslots <= 1)
>>> + return ret;
>>> + if (!spin_trylock(&nn->client_lock))
>>> + return ret;
>>> + ret = min(dec, ses->se_target_maxslots-1);
>>> + ses->se_target_maxslots -= ret;
>>> + ses->se_slot_gen += 1;
>>> + if (ses->se_slot_gen == 0) {
>>> + int i;
>>> + ses->se_slot_gen = 1;
>>> + for (i = 0; i < ses->se_fchannel.maxreqs; i++) {
>>> + struct nfsd4_slot *slot = xa_load(&ses->se_slots, i);
>>> + slot->sl_generation = 0;
>>> + }
>>> + }
>>> + spin_unlock(&nn->client_lock);
>>> + return ret;
>>> }
>>>
>>> /*
>>> @@ -1968,6 +2020,7 @@ static struct nfsd4_session *alloc_session(struct nfsd4_channel_attrs *fattrs,
>>> }
>>> fattrs->maxreqs = i;
>>> memcpy(&new->se_fchannel, fattrs, sizeof(struct nfsd4_channel_attrs));
>>> + new->se_target_maxslots = i;
>>> new->se_cb_slot_avail = ~0U;
>>> new->se_cb_highest_slot = min(battrs->maxreqs - 1,
>>> NFSD_BC_SLOT_TABLE_SIZE - 1);
>>> @@ -2081,7 +2134,7 @@ static void nfsd4_del_conns(struct nfsd4_session *s)
>>>
>>> static void __free_session(struct nfsd4_session *ses)
>>> {
>>> - free_session_slots(ses);
>>> + free_session_slots(ses, 0);
>>> xa_destroy(&ses->se_slots);
>>> kfree(ses);
>>> }
>>> @@ -2684,6 +2737,9 @@ static int client_info_show(struct seq_file *m, void *v)
>>> seq_printf(m, "session slots:");
>>> list_for_each_entry(ses, &clp->cl_sessions, se_perclnt)
>>> seq_printf(m, " %u", ses->se_fchannel.maxreqs);
>>> + seq_printf(m, "\nsession target slots:");
>>> + list_for_each_entry(ses, &clp->cl_sessions, se_perclnt)
>>> + seq_printf(m, " %u", ses->se_target_maxslots);
>>> spin_unlock(&clp->cl_lock);
>>> seq_puts(m, "\n");
>>>
>>> @@ -3674,10 +3730,10 @@ nfsd4_exchange_id_release(union nfsd4_op_u *u)
>>> kfree(exid->server_impl_name);
>>> }
>>>
>>> -static __be32 check_slot_seqid(u32 seqid, u32 slot_seqid, bool slot_inuse)
>>> +static __be32 check_slot_seqid(u32 seqid, u32 slot_seqid, u8 flags)
>>> {
>>> /* The slot is in use, and no response has been sent. */
>>> - if (slot_inuse) {
>>> + if (flags & NFSD4_SLOT_INUSE) {
>>> if (seqid == slot_seqid)
>>> return nfserr_jukebox;
>>> else
>>> @@ -3686,6 +3742,8 @@ static __be32 check_slot_seqid(u32 seqid, u32 slot_seqid, bool slot_inuse)
>>> /* Note unsigned 32-bit arithmetic handles wraparound: */
>>> if (likely(seqid == slot_seqid + 1))
>>> return nfs_ok;
>>> + if ((flags & NFSD4_SLOT_REUSED) && seqid == 1)
>>> + return nfs_ok;
>>> if (seqid == slot_seqid)
>>> return nfserr_replay_cache;
>>> return nfserr_seq_misordered;
>>> @@ -4236,8 +4294,7 @@ nfsd4_sequence(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>>> dprintk("%s: slotid %d\n", __func__, seq->slotid);
>>>
>>> trace_nfsd_slot_seqid_sequence(clp, seq, slot);
>>> - status = check_slot_seqid(seq->seqid, slot->sl_seqid,
>>> - slot->sl_flags & NFSD4_SLOT_INUSE);
>>> + status = check_slot_seqid(seq->seqid, slot->sl_seqid, slot->sl_flags);
>>> if (status == nfserr_replay_cache) {
>>> status = nfserr_seq_misordered;
>>> if (!(slot->sl_flags & NFSD4_SLOT_INITIALIZED))
>>> @@ -4262,6 +4319,12 @@ nfsd4_sequence(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>>> if (status)
>>> goto out_put_session;
>>>
>>> + if (session->se_target_maxslots < session->se_fchannel.maxreqs &&
>>> + slot->sl_generation == session->se_slot_gen &&
>>> + seq->maxslots <= session->se_target_maxslots)
>>> + /* Client acknowledged our reduce maxreqs */
>>> + free_session_slots(session, session->se_target_maxslots);
>>> +
>>> buflen = (seq->cachethis) ?
>>> session->se_fchannel.maxresp_cached :
>>> session->se_fchannel.maxresp_sz;
>>> @@ -4272,9 +4335,11 @@ nfsd4_sequence(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>>> svc_reserve(rqstp, buflen);
>>>
>>> status = nfs_ok;
>>> - /* Success! bump slot seqid */
>>> + /* Success! accept new slot seqid */
>>> slot->sl_seqid = seq->seqid;
>>> + slot->sl_flags &= ~NFSD4_SLOT_REUSED;
>>> slot->sl_flags |= NFSD4_SLOT_INUSE;
>>> + slot->sl_generation = session->se_slot_gen;
>>> if (seq->cachethis)
>>> slot->sl_flags |= NFSD4_SLOT_CACHETHIS;
>>> else
>>> @@ -4291,9 +4356,11 @@ nfsd4_sequence(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>>> * the client might use.
>>> */
>>> if (seq->slotid == session->se_fchannel.maxreqs - 1 &&
>>> + session->se_target_maxslots >= session->se_fchannel.maxreqs &&
>>> session->se_fchannel.maxreqs < NFSD_MAX_SLOTS_PER_SESSION) {
>>> int s = session->se_fchannel.maxreqs;
>>> int cnt = DIV_ROUND_UP(s, 5);
>>> + void *prev_slot;
>>>
>>> do {
>>> /*
>>> @@ -4303,18 +4370,25 @@ nfsd4_sequence(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>>> */
>>> slot = kzalloc(slot_bytes(&session->se_fchannel),
>>> GFP_NOWAIT);
>>> + prev_slot = xa_load(&session->se_slots, s);
>>> + if (xa_is_value(prev_slot) && slot) {
>>> + slot->sl_seqid = xa_to_value(prev_slot);
>>> + slot->sl_flags |= NFSD4_SLOT_REUSED;
>>> + }
>>> if (slot &&
>>> !xa_is_err(xa_store(&session->se_slots, s, slot,
>>> GFP_NOWAIT))) {
>>> s += 1;
>>> session->se_fchannel.maxreqs = s;
>>> + session->se_target_maxslots = s;
>>> } else {
>>> kfree(slot);
>>> slot = NULL;
>>> }
>>> } while (slot && --cnt > 0);
>>> }
>>> - seq->maxslots = session->se_fchannel.maxreqs;
>>> + seq->maxslots = max(session->se_target_maxslots, seq->maxslots);
>>> + seq->target_maxslots = session->se_target_maxslots;
>>>
>>> out:
>>> switch (clp->cl_cb_state) {
>>> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
>>> index 53fac037611c..4dcb03cd9292 100644
>>> --- a/fs/nfsd/nfs4xdr.c
>>> +++ b/fs/nfsd/nfs4xdr.c
>>> @@ -1884,7 +1884,8 @@ nfsd4_decode_sequence(struct nfsd4_compoundargs *argp,
>>> return nfserr_bad_xdr;
>>> seq->seqid = be32_to_cpup(p++);
>>> seq->slotid = be32_to_cpup(p++);
>>> - seq->maxslots = be32_to_cpup(p++);
>>> + /* sa_highest_slotid counts from 0 but maxslots counts from 1 ... */
>>> + seq->maxslots = be32_to_cpup(p++) + 1;
>>> seq->cachethis = be32_to_cpup(p);
>>>
>>> seq->status_flags = 0;
>>> @@ -4968,7 +4969,7 @@ nfsd4_encode_sequence(struct nfsd4_compoundres *resp, __be32 nfserr,
>>> if (nfserr != nfs_ok)
>>> return nfserr;
>>> /* sr_target_highest_slotid */
>>> - nfserr = nfsd4_encode_slotid4(xdr, seq->maxslots - 1);
>>> + nfserr = nfsd4_encode_slotid4(xdr, seq->target_maxslots - 1);
>>> if (nfserr != nfs_ok)
>>> return nfserr;
>>> /* sr_status_flags */
>>> diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
>>> index aad547d3ad8b..4251ff3c5ad1 100644
>>> --- a/fs/nfsd/state.h
>>> +++ b/fs/nfsd/state.h
>>> @@ -245,10 +245,12 @@ struct nfsd4_slot {
>>> struct svc_cred sl_cred;
>>> u32 sl_datalen;
>>> u16 sl_opcnt;
>>> + u16 sl_generation;
>>> #define NFSD4_SLOT_INUSE (1 << 0)
>>> #define NFSD4_SLOT_CACHETHIS (1 << 1)
>>> #define NFSD4_SLOT_INITIALIZED (1 << 2)
>>> #define NFSD4_SLOT_CACHED (1 << 3)
>>> +#define NFSD4_SLOT_REUSED (1 << 4)
>>> u8 sl_flags;
>>> char sl_data[];
>>> };
>>> @@ -321,7 +323,6 @@ struct nfsd4_session {
>>> u32 se_cb_slot_avail; /* bitmap of available slots */
>>> u32 se_cb_highest_slot; /* highest slot client wants */
>>> u32 se_cb_prog;
>>> - bool se_dead;
>>> struct list_head se_hash; /* hash by sessionid */
>>> struct list_head se_perclnt;
>>> struct nfs4_client *se_client;
>>> @@ -331,6 +332,9 @@ struct nfsd4_session {
>>> struct list_head se_conns;
>>> u32 se_cb_seq_nr[NFSD_BC_SLOT_TABLE_SIZE];
>>> struct xarray se_slots; /* forward channel slots */
>>> + u16 se_slot_gen;
>>> + bool se_dead;
>>> + u32 se_target_maxslots;
>>> };
>>>
>>> /* formatted contents of nfs4_sessionid */
>>> diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h
>>> index 382cc1389396..c26ba86dbdfd 100644
>>> --- a/fs/nfsd/xdr4.h
>>> +++ b/fs/nfsd/xdr4.h
>>> @@ -576,9 +576,7 @@ struct nfsd4_sequence {
>>> u32 slotid; /* request/response */
>>> u32 maxslots; /* request/response */
>>> u32 cachethis; /* request */
>>> -#if 0
>>> u32 target_maxslots; /* response */
>>> -#endif /* not yet */
>>> u32 status_flags; /* response */
>>> };
>>>
>>
>> Hi Neil -
>>
>> I've found some misbehavior which I've bisected to this commit.
>>
>> If disconnect injection is set up to break the connection every 25,000
>> RPCs or so, xfstests running on an NFSv4.1 mount will eventually stall
>> after this commit is applied.
>>
>> Network capture shows that the server eventually starts returning
>> SEQ_MISORDERED because the client has forgotten an retired slot after a
>> disconnect, and tries to use sequence number 1 for that slot with a new
>> operation.
>>
>> I've narrowed the issue down to nfs41_is_outlier_target_slotid() on the
>> client. This function uses a bit of calculus to decide when to bump the
>> slot table's generation number. In the failing case, it never bumps the
>> generation, and that results in the client freeing slots it shouldn't.
>> The server's reported { highest, target_highest } slot numbers don't
>> appear to change correctly after the client has reconnected.
>>
>> If I revert this hunk from 5/6:
>>
>> @@ -4968,7 +4969,7 @@ nfsd4_encode_sequence(struct nfsd4_compoundres
>> *resp, __be32 nfserr,
>> if (nfserr != nfs_ok)
>> return nfserr;
>> /* sr_target_highest_slotid */
>> - nfserr = nfsd4_encode_slotid4(xdr, seq->maxslots - 1);
>> + nfserr = nfsd4_encode_slotid4(xdr, seq->target_maxslots - 1);
>> if (nfserr != nfs_ok)
>> return nfserr;
>> /* sr_status_flags */
>>
>> the reproducer above runs to completion in the expected amount of time.
>>
>>
>> The high order bit here is whether I should drop these patches for
>> v6.14, or whether you believe you can come up with a narrow solution
>> during the early part of v6.14-rc that I can include in an -rc update
>> for NFSD. I can't really tell if a significant amount of surgery will
>> be necessary.
>>
>> What do you think?
>
> I think I can fix it.
>
> It sounds like it might be a client bug, or it might be a difference in
> interpretation of the spec, or it might be an ambiguity in the spec.
Perhaps, but consider that the Linux NFS client interoperates
successfully with v6.13 NFSD, Solaris's NFSv4.1 implementation,
NetApp's implementation, and Hammerspace, and has done for many
years.
I agree that the spec language isn't particularly transparent, but
somehow all of these implementations have interpreted it the same
way.
> But I think I can fix it by setting NFSD_SLOT_REUSED in any slots
> beyond ->target_maxslots when I reduce target_maxslots. That makes it
> so that we accept either 1 or the next-in-sequence seqid.
> Doing that does open up a possible risk of a resend being accepted as a
> new request. As it is a new connection, resends are a real possibility.
> I'll have to ponder that a bit more and might need to be careful about
> retiring slots with a low seqid which have been used recently. Maybe I
> need to store the time when seqid 1 was used, and not return a slot
> until some minimum time after that timestamp.
I gathered additional information using a trace_printk() in the client's
decode_sequence() function.
In the working case, after reconnect, the server SEQUENCE response has
sr_highest_slotid = 18
sr_target_highest_slot = 18
IIUC 18 is the slot number of the highest slot currently in use.
In the stalling case, after reconnect, the server SEQUENCE response has
sr_highest_slotid = 23
sr_target_highest_slot = 63
The client heuristics decide to free slots 23 and higher, even though
one or more of those slots have been active and have sequence numbers
larger than 1.
I can provide more detailed reproducer instructions if you wish.
> I'll try to get you a patch before the end of next week.
Thanks!
--
Chuck Lever
next prev parent reply other threads:[~2025-01-21 16:25 UTC|newest]
Thread overview: 28+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-12-11 21:47 [PATCH 0/6 v5] nfsd: allocate/free session-based DRC slots on demand NeilBrown
2024-12-11 21:47 ` [PATCH 1/6] nfsd: use an xarray to store v4.1 session slots NeilBrown
2024-12-11 21:47 ` [PATCH 2/6] nfsd: remove artificial limits on the session-based DRC NeilBrown
2024-12-11 21:47 ` [PATCH 3/6] nfsd: add session slot count to /proc/fs/nfsd/clients/*/info NeilBrown
2024-12-11 21:47 ` [PATCH 4/6] nfsd: allocate new session-based DRC slots on demand NeilBrown
2024-12-11 21:47 ` [PATCH 5/6] nfsd: add support for freeing unused session-DRC slots NeilBrown
2025-01-19 2:01 ` Chuck Lever
2025-01-21 2:36 ` NeilBrown
2025-01-21 16:24 ` Chuck Lever [this message]
2025-01-27 4:08 ` NeilBrown
2025-01-27 13:57 ` Chuck Lever
2025-01-27 22:57 ` NeilBrown
2024-12-11 21:47 ` [PATCH 6/6] nfsd: add shrinker to reduce number of slots allocated per session NeilBrown
-- strict thread matches above, loose matches on Subject: below --
2024-12-08 22:43 [PATCH 0/6 v4] nfsd: allocate/free session-based DRC slots on demand NeilBrown
2024-12-08 22:43 ` [PATCH 5/6] nfsd: add support for freeing unused session-DRC slots NeilBrown
2024-12-06 0:43 [PATCH 0/6 v3] nfsd: allocate/free session-based DRC slots on demand NeilBrown
2024-12-06 0:43 ` [PATCH 5/6] nfsd: add support for freeing unused session-DRC slots NeilBrown
2024-12-06 5:30 ` Jeff Layton
2024-12-06 6:05 ` NeilBrown
2024-12-06 13:59 ` Jeff Layton
2024-11-19 0:41 [PATCH 0/6 RFC v2] nfsd: allocate/free session-based DRC slots on demand NeilBrown
2024-11-19 0:41 ` [PATCH 5/6] nfsd: add support for freeing unused session-DRC slots NeilBrown
2024-11-19 19:25 ` Chuck Lever
2024-11-19 22:35 ` NeilBrown
2024-11-20 1:27 ` Chuck Lever
2024-11-21 21:47 ` NeilBrown
2024-11-21 22:29 ` Chuck Lever III
2024-12-02 16:11 ` Chuck Lever III
2024-12-03 4:28 ` NeilBrown
2024-12-03 14:40 ` Chuck Lever III
2024-11-19 19:48 ` Jeff Layton
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=8e0279ae-e65b-408a-ab30-7665c2714885@oracle.com \
--to=chuck.lever@oracle.com \
--cc=Dai.Ngo@oracle.com \
--cc=jlayton@kernel.org \
--cc=linux-nfs@vger.kernel.org \
--cc=neilb@suse.de \
--cc=okorniev@redhat.com \
--cc=tom@talpey.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox