linux-nfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Chuck Lever <chuck.lever@oracle.com>
To: NeilBrown <neil@brown.name>
Cc: Chuck Lever <cel@kernel.org>, Jeff Layton <jlayton@kernel.org>,
	Olga Kornievskaia <okorniev@redhat.com>,
	Dai Ngo <dai.ngo@oracle.com>, Tom Talpey <tom@talpey.com>,
	linux-nfs@vger.kernel.org
Subject: Re: [PATCH v2 3/4] NFSD: Do not cache solo SEQUENCE operations
Date: Sat, 11 Oct 2025 11:30:34 -0400	[thread overview]
Message-ID: <1692dd6e-3b5c-4fde-bf21-24a5ca2db9df@oracle.com> (raw)
In-Reply-To: <176014414963.1793333.15458960142917474536@noble.neil.brown.name>

On 10/10/25 8:55 PM, NeilBrown wrote:
> On Sat, 11 Oct 2025, Chuck Lever wrote:
>> On 10/9/25 7:29 PM, NeilBrown wrote:
>>> On Thu, 09 Oct 2025, Chuck Lever wrote:
>>>> On 10/8/25 6:03 PM, NeilBrown wrote:
>>>>> On Thu, 09 Oct 2025, Chuck Lever wrote:
>>>>>> On 10/7/25 6:12 PM, NeilBrown wrote:
>>>>>>> On Wed, 08 Oct 2025, Chuck Lever wrote:
>>>>>>>> On 10/7/25 12:04 PM, Chuck Lever wrote:
>>>>>>>>>  RFC 8881 Section 2.10.6.1.3 says:
>>>>>>>>>
>>>>>>>>>> On a per-request basis, the requester can choose to direct the
>>>>>>>>>> replier to cache the reply to all operations after the first
>>>>>>>>>> operation (SEQUENCE or CB_SEQUENCE) via the sa_cachethis or
>>>>>>>>>> csa_cachethis fields of the arguments to SEQUENCE or CB_SEQUENCE.
>>>>>>>>> RFC 8881 Section 2.10.6.4 further says:
>>>>>>>>>
>>>>>>>>>> If sa_cachethis or csa_cachethis is TRUE, then the replier MUST
>>>>>>>>>> cache a reply except if an error is returned by the SEQUENCE or
>>>>>>>>>> CB_SEQUENCE operation (see Section 2.10.6.1.2).
>>>>>>>>> This suggests to me that the spec authors do not expect an NFSv4.1
>>>>>>>>> server implementation to ever cache the result of a SEQUENCE
>>>>>>>>> operation (except perhaps as part of a successful multi-operation
>>>>>>>>> COMPOUND).
>>>>>>>>>
>>>>>>>>> NFSD attempts to cache the result of solo SEQUENCE operations,
>>>>>>>>> however. This is because the protocol does not permit servers to
>>>>>>>>> respond to a SEQUENCE with NFS4ERR_RETRY_UNCACHED_REP. If the server
>>>>>>>>> always caches solo SEQUENCE operations, then it never has occasion
>>>>>>>>> to return that status code.
>>>>>>>>>
>>>>>>>>> However, clients use solo SEQUENCE to query the current status of a
>>>>>>>>> session slot. A cached reply will return stale information to the
>>>>>>>>> client, and could result in an infinite loop.
>>>>>>>>
>>>>>>>> The pynfs SEQ9f test is now failing with this change. This test:
>>>>>>>>
>>>>>>>> - Sends a CREATE_SESSION
>>>>>>>> - Sends a solo SEQUENCE with sa_cachethis set
>>>>>>>> - Sends the same operation without changing the slot sequence number
>>>>>>>>
>>>>>>>> The test expects the server's response to be NFS4_OK. NFSD now returns
>>>>>>>> NFS4ERR_SEQ_FALSE_RETRY instead.
>>>>>>>>
>>>>>>>> It's possible the test is wrong, but how should it be fixed?
>>>>>>>>
>>>>>>>> Is it compliant for an NFSv4.1 server to ignore sa_cachethis for a
>>>>>>>> COMPOUND containing a solo SEQUENCE?
>>>>>>>>
>>>>>>>> When reporting a retransmitted solo SEQUENCE, what is the correct status
>>>>>>>> code?
>>>>>>>
>>>>>>> Interesting question....
>>>>>>> To help with context: you wrote:
>>>>>>>
>>>>>>>    However, clients use solo SEQUENCE to query the current status of a
>>>>>>>    session slot.  A cached reply will return stale information to the
>>>>>>>    client, and could result in an infinite loop.
>>>>>>>
>>>>>>> Could you please expand on that?  What in the reply might be stale, and
>>>>>>> how might that result in an infinite loop?
>>>>>>>
>>>>>>> Could a reply to a replayed singleton SEQUENCE simple always return the
>>>>>>> current info, rather than cached info?
>>>>>>
>>>>>> If a cached reply is returned to the client, the slot sequence number
>>>>>> doesn't change, and neither do the SEQ4_STATUS flags.
>>>>>
>>>>> Why is that a problem?  And importantly: how can it result in an
>>>>> infinite loop?
>>>>
>>>> My remark was "could result in an infinite loop" -- subjunctive, meaning
>>>> we haven't seen that behavior in this specific case, but there is a
>>>> risk, if the client has a bug, for example. I can drop that remark, if
>>>> it vexes.
>>>
>>> I think dropping it would be best.
>>>
>>>>
>>>> The actual issue at hand is that the client can set the server up for
>>>> a memory overwrite. If CREATE_SESSION does not request a large enough
>>>> ca_maxresponsesize_cached, but the server expects to be able to cache
>>>> SEQUENCE operations anyway, a solo SEQUENCE will cause the server to
>>>> write into a cache that does not exist.
>>>
>>> OK, this looks like an important issue.  Adding that to the commit
>>> message would help.
>>>
>>>>
>>>> Either NFSD needs to unconditionally reserve the slot cache space for
>>>> solo SEQUENCE, if it intends to unconditionally cache those; or it
>>>> shouldn't cache them at all.
>>>>
>>>> The language of the spec suggests that the authors did not expect
>>>> servers to cache solo SEQUENCE operations. It states that sa_cachethis
>>>> refers to caching COMPOUND operations /after/ the SEQUENCE operation.
>>>>
>>>>
>>>>>> The only real recovery in this case is to destroy the session, which
>>>>>> will remove the cached reply.
>>>>>
>>>>> What "case" that needs to be recovered from?
>>>>
>>>> When the session slot has become unusable because server and client have
>>>> different ideas of what the slot sequence number is.
>>>>
>>>> The client and server might stop using a slot in that situation.
>>>> Destroying the session is the only way to get rid of the slot entirely.
>>>>
>>>> But IMHO this is a rat hole.
>>>
>>> OK, I think I understand now.
>>>
>>> According to RFC 5661
>>>
>>>   The value of the sa_sequenceid argument relative to the cached
>>>    sequence ID on the slot falls into one of three cases.
>>>
>>> of which the second case is
>>>
>>>    o  If sa_sequenceid and the cached sequence ID are the same, this is
>>>       a retry, and the server replies with what is recorded in the reply
>>>       cache.  The lease is possibly renewed as described below.
>>>
>>> So the server *must* cache the reply for the most recent SEQUENCE in
>>> each slot.
>>>
>>> The language in that section seems to suggest the "cached sequence ID"
>>> is stored in the slot's reply cache.
>>> Our struct nfsd4_slot stores the cached sequence ID not as part of the
>>> generic reply "sa_data[]" but as a dedicated field.  We could store the
>>> rest of the SEQUENCE reply in dedicated fields and leave sa_data to
>>> store the replies to ops 2 onwards.
>>>
>>> The discussion of ca_maxresponsesize_cached always takes about "if
>>> sa_cachethis is set" it seems to apply only to those ops where that
>>> is relevant - it isn't relevant for SEQUENCE.
>>>
>>> So I think I would be in favour of always having enough space to cache a
>>> full SEQUENCE reply.  The decision of whether it goes in sa_data[] or in
>>> dedicated fields depends on how clean the code looks.
>>
>> IIUC the protocol requirements are:
>>
>> + solo SEQUENCE that fails MUST NOT be cached. That is currently an NFSD
>> bug, because it unconditionally caches solo SEQUENCE. I have a patch to
>> be added to this series to address that.
> 
> Agreed.
> 
>>
>> + solo SEQUENCE when sa_cachethis is false does not need to be cached,
>> but the protocol does not allow REP_UNCACHED_RETRY. This is the only
>> reason to cache a solo SEQUENCE reply. Note that the cached response to
>> a solo SEQUENCE can be constructed from existing fields in the slot. But
>> I can't see such reconstruction being clean, and it is certainly a
>> heroic case (ie, not commonly used, if ever).
> 
> I disagree.  The text I quoted strongly implies that any SEQUENCE must be
> cached so that it can be replayed on a resend. 

First, RFC 8881 is now authoritative, not RFC 5661, so let's stick with
the former.

The text you quoted refers to the sa_sequenceid field. AFAIUI, that is
a field that could be or should be saved whether sa_cachethis is set or
not.

My question regards only what NFSD saves in the slot's sl_data, the part
of the actual RPC reply message that is saved by
nfsd4_store_cache_entry() when sa_cachethis is set.


> This is not presented as
> optional.
>    "the server replies with what is recorded in the reply cache".

Be careful here. There is no usage of BCP14 terms here such as MUST or
REQUIRED. Therefore the text here is not a normative mandate of any
kind; rather this text is only descriptive. We don't have to speculate
about whether this is or isn't optional; this language is clearly not
normative.


> The code currently always constructs a new reply using the fields of
> 'seq' - the request.
> This is probably wrong.
> When nfsd4_sequence() doesn't detect a resend, it updates 
>    ->maxslots
>    ->target_maxslots
>    ->status_flags
> 
> When a replay is detected, those fields aren't updated so the client
> just sees what it send.
> The nfsd4_slot doesn't have explicit room to store these.  Maybe it should.

A few paragraphs earlier in Section 18.46.3, we have:

> The sa_sequenceid field is the sequence number of the request for the
> reply cache entry (slot). The sr_slotid result MUST equal sa_slotid.
> The sr_sequenceid result MUST equal sa_sequenceid.

Here we have BCP14 terms that are normative requirements that the server
copies the client's values to the reply. I haven't yet found text that
suggests this needs to be different when returning a cached reply.

I'm open to suggestions, though.


> NeilBrown
> 
>>
>> nfsd4_alloc_slot() sets the slot replay cache size to /zero/ when the
>> client's ca_maxresponsesize_cached is smaller than ... I think just the
>> size of a solo SEQUENCE. I thought that was the bug, originally, but
>> then Tom suggested there's truly no reason to cache a solo SEQUENCE.
>>
>> So, we could go with:
>>
>> 0. Fix nfsd_is_solo_sequence (that's 2/4 in this series)
>> 1. Fix NFSD to not cache failed solo SEQUENCE
>> 2. Increase the minimum size of the slot's replay cache to fit a
>> solo SEQUENCE
>>
>>
>> -- 
>> Chuck Lever
>>
> 


-- 
Chuck Lever

  reply	other threads:[~2025-10-11 15:30 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-10-07 16:04 [PATCH v2 0/4] Fix unwanted memory overwrites Chuck Lever
2025-10-07 16:04 ` [PATCH v2 1/4] NFSD: Skip close replay processing if XDR encoding fails Chuck Lever
2025-10-07 16:04 ` [PATCH v2 2/4] NFSD: Fix the "is this a solo SEQUENCE" predicate Chuck Lever
2025-10-07 17:18   ` Jeff Layton
2025-10-07 16:04 ` [PATCH v2 3/4] NFSD: Do not cache solo SEQUENCE operations Chuck Lever
2025-10-07 17:19   ` Jeff Layton
2025-10-07 20:05   ` Chuck Lever
2025-10-07 22:12     ` NeilBrown
2025-10-08 13:04       ` Chuck Lever
2025-10-08 22:03         ` NeilBrown
2025-10-09 12:56           ` Chuck Lever
2025-10-09 23:29             ` NeilBrown
2025-10-10 13:03               ` Chuck Lever
2025-10-11  0:55                 ` NeilBrown
2025-10-11 15:30                   ` Chuck Lever [this message]
2025-10-07 22:21   ` Calum Mackay
2025-10-07 16:04 ` [PATCH v2 4/4] NFSD: Move nfsd4_cache_this() Chuck Lever
2025-10-07 17:20   ` Jeff Layton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1692dd6e-3b5c-4fde-bf21-24a5ca2db9df@oracle.com \
    --to=chuck.lever@oracle.com \
    --cc=cel@kernel.org \
    --cc=dai.ngo@oracle.com \
    --cc=jlayton@kernel.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=neil@brown.name \
    --cc=okorniev@redhat.com \
    --cc=tom@talpey.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).