From: Jeff Layton <jlayton@kernel.org>
To: Chuck Lever <cel@kernel.org>, linux-nfs@vger.kernel.org
Cc: Chuck Lever <chuck.lever@oracle.com>
Subject: Re: [PATCH RFC] NFSD: Fix checksum mismatches in the duplicate reply cache
Date: Fri, 10 Nov 2023 10:32:24 -0500 [thread overview]
Message-ID: <efdbd2d53c97d3482f8844ffc5cd9caee577a5f6.camel@kernel.org> (raw)
In-Reply-To: <169953717421.7448.7269783258225907202.stgit@bazille.1015granger.net>
On Thu, 2023-11-09 at 08:45 -0500, Chuck Lever wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
>
> nfsd_cache_csum() currently assumes that the server's RPC layer has
> been advancing rq_arg.head[0].iov_base as it decodes an incoming
> request, because that's the way it used to work. On entry, it
> expects that buf->head[0].iov_base points to the start of the NFS
> header, and excludes the already-decoded RPC header.
>
> Ever since the RPC layer was converted over to use xdr_stream (in
> v6.3), however, head[0].iov_base now points to the start of the RPC
> header during all processing. It no longer points at the NFS Call
> header when execution arrives at nfsd_cache_csum().
>
> In a retransmitted RPC the XID and the NFS header are supposed to
> be the same as the original message, but the contents of the
> retransmitted RPC header can be different. For example, for krb5,
> the GSS sequence number will be different between the two. Thus if
> the RPC header is included in the DRC checksum computation, the
> checksum of the retransmitted message might not match the checksum
> of the original message, even though the NFS part of these messages
> is identical.
>
> If the DRC fails to recognize a retransmit, it permits the server to
> execute that RPC Call again. That breaks retransmits of idempotent
> procedures such as RENAME or REMOVE -- the retransmitted RPC Call
> will get a different result.
>
> Note that I am not marking this commit with a Fixes: tag:
>
> o The RPC-related commits that broke the DRC are too numerous
> o The fix should not be automatically backported, as any backport
> of it needs to be thoroughly tested
>
> However, it might be appropriate to consider applying this fix to
> v6.3 and later kernels, if someone can make it fit cleanly and test
> it properly. This patch applies with fuzz to v6.6, but does not
> apply cleanly to earlier kernels.
>
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
> fs/nfsd/cache.h | 4 ++--
> fs/nfsd/nfscache.c | 55 +++++++++++++++++++++++++++++++++-------------------
> fs/nfsd/nfssvc.c | 10 +++++++++
> 3 files changed, 46 insertions(+), 23 deletions(-)
>
> This fix is kind of ugly. Looking for comments or suggestions for
> improvement. Two further clean-ups occurred to me:
>
> - Set up a parms struct in nfsd_dispatch() that is passed to
> nfsd_cache_lookup() and nfsd_cache_update() that carries all
> of this extraneous garbage
> - Move nfsd_cache_csum to net/sunrpc/xdr.c since it assumes
> details about the how the message appears in the xdr_buf
>
>
> diff --git a/fs/nfsd/cache.h b/fs/nfsd/cache.h
> index 929248c6ca84..4cbe0434cbb8 100644
> --- a/fs/nfsd/cache.h
> +++ b/fs/nfsd/cache.h
> @@ -84,8 +84,8 @@ int nfsd_net_reply_cache_init(struct nfsd_net *nn);
> void nfsd_net_reply_cache_destroy(struct nfsd_net *nn);
> int nfsd_reply_cache_init(struct nfsd_net *);
> void nfsd_reply_cache_shutdown(struct nfsd_net *);
> -int nfsd_cache_lookup(struct svc_rqst *rqstp,
> - struct nfsd_cacherep **cacherep);
> +int nfsd_cache_lookup(struct svc_rqst *rqstp, unsigned int start,
> + unsigned int len, struct nfsd_cacherep **cacherep);
> void nfsd_cache_update(struct svc_rqst *rqstp, struct nfsd_cacherep *rp,
> int cachetype, __be32 *statp);
> int nfsd_reply_cache_stats_show(struct seq_file *m, void *v);
> diff --git a/fs/nfsd/nfscache.c b/fs/nfsd/nfscache.c
> index fd56a52aa5fb..761896c44e77 100644
> --- a/fs/nfsd/nfscache.c
> +++ b/fs/nfsd/nfscache.c
> @@ -370,32 +370,44 @@ nfsd_reply_cache_scan(struct shrinker *shrink, struct shrink_control *sc)
> }
>
> /*
> - * Walk an xdr_buf and get a CRC for at most the first RC_CSUMLEN bytes
> + * Compute a weak checksum of the leading bytes of an NFS procedure
> + * call header. csum_partial() computes a TCP checksum as specified
> + * in RFC 793.
> + *
> + * To avoid assumptions about how the RPC message is laid out in
> + * @buf and what else it might contain (eg, a GSS MIC suffix), the
> + * caller passes the exact location and length of the NFS Call
> + * header.
> */
Maybe make this a kerneldoc comment?
> -static __wsum
> -nfsd_cache_csum(struct svc_rqst *rqstp)
> +static __wsum nfsd_cache_csum(struct xdr_buf *buf, unsigned int start,
> + unsigned int remaining)
> {
> + unsigned int base, len;
> + struct xdr_buf subbuf;
> + __wsum csum = 0;
> + void *p;
> int idx;
> - unsigned int base;
> - __wsum csum;
> - struct xdr_buf *buf = &rqstp->rq_arg;
> - const unsigned char *p = buf->head[0].iov_base;
> - size_t csum_len = min_t(size_t, buf->head[0].iov_len + buf->page_len,
> - RC_CSUMLEN);
> - size_t len = min(buf->head[0].iov_len, csum_len);
> +
> + if (remaining > RC_CSUMLEN)
> + remaining = RC_CSUMLEN;
> + if (xdr_buf_subsegment(buf, &subbuf, start, remaining))
> + return csum;
>
> /* rq_arg.head first */
> - csum = csum_partial(p, len, 0);
> - csum_len -= len;
> + if (subbuf.head[0].iov_len) {
> + len = min_t(unsigned int, subbuf.head[0].iov_len, remaining);
> + csum = csum_partial(subbuf.head[0].iov_base, len, csum);
> + remaining -= len;
> + }
>
> /* Continue into page array */
> - idx = buf->page_base / PAGE_SIZE;
> - base = buf->page_base & ~PAGE_MASK;
> - while (csum_len) {
> - p = page_address(buf->pages[idx]) + base;
> - len = min_t(size_t, PAGE_SIZE - base, csum_len);
> + idx = subbuf.page_base / PAGE_SIZE;
> + base = subbuf.page_base & ~PAGE_MASK;
> + while (remaining) {
> + p = page_address(subbuf.pages[idx]) + base;
> + len = min_t(unsigned int, PAGE_SIZE - base, remaining);
> csum = csum_partial(p, len, csum);
> - csum_len -= len;
> + remaining -= len;
> base = 0;
> ++idx;
> }
> @@ -466,6 +478,8 @@ nfsd_cache_insert(struct nfsd_drc_bucket *b, struct nfsd_cacherep *key,
> /**
> * nfsd_cache_lookup - Find an entry in the duplicate reply cache
> * @rqstp: Incoming Call to find
> + * @start: location in @rqstp->rq_arg of the NFS Call header
> + * @len: length of NFS Call header in bytes
> * @cacherep: OUT: DRC entry for this request
> *
> * Try to find an entry matching the current call in the cache. When none
> @@ -479,7 +493,8 @@ nfsd_cache_insert(struct nfsd_drc_bucket *b, struct nfsd_cacherep *key,
> * %RC_REPLY: Reply from cache
> * %RC_DROPIT: Do not process the request further
> */
> -int nfsd_cache_lookup(struct svc_rqst *rqstp, struct nfsd_cacherep **cacherep)
> +int nfsd_cache_lookup(struct svc_rqst *rqstp, unsigned int start,
> + unsigned int len, struct nfsd_cacherep **cacherep)
> {
> struct nfsd_net *nn;
> struct nfsd_cacherep *rp, *found;
> @@ -495,7 +510,7 @@ int nfsd_cache_lookup(struct svc_rqst *rqstp, struct nfsd_cacherep **cacherep)
> goto out;
> }
>
> - csum = nfsd_cache_csum(rqstp);
> + csum = nfsd_cache_csum(&rqstp->rq_arg, start, len);
>
> /*
> * Since the common case is a cache miss followed by an insert,
> diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
> index ef126969a7ce..1d6e33c6c4fe 100644
> --- a/fs/nfsd/nfssvc.c
> +++ b/fs/nfsd/nfssvc.c
> @@ -980,6 +980,7 @@ int nfsd_dispatch(struct svc_rqst *rqstp)
> const struct svc_procedure *proc = rqstp->rq_procinfo;
> __be32 *statp = rqstp->rq_accept_statp;
> struct nfsd_cacherep *rp;
> + unsigned int start, len;
>
> /*
> * Give the xdr decoder a chance to change this if it wants
> @@ -987,6 +988,13 @@ int nfsd_dispatch(struct svc_rqst *rqstp)
> */
> rqstp->rq_cachetype = proc->pc_cachetype;
>
> + /*
> + * ->pc_decode advances the argument stream past the NFS
> + * Call header, so grab the header's starting location and
> + * size now for the call to nfsd_cache_lookup().
> + */
> + start = xdr_stream_pos(&rqstp->rq_arg_stream);
> + len = xdr_stream_remaining(&rqstp->rq_arg_stream);
> if (!proc->pc_decode(rqstp, &rqstp->rq_arg_stream))
> goto out_decode_err;
>
> @@ -1000,7 +1008,7 @@ int nfsd_dispatch(struct svc_rqst *rqstp)
> smp_store_release(&rqstp->rq_status_counter, rqstp->rq_status_counter | 1);
>
> rp = NULL;
> - switch (nfsd_cache_lookup(rqstp, &rp)) {
> + switch (nfsd_cache_lookup(rqstp, start, len, &rp)) {
> case RC_DOIT:
> break;
> case RC_REPLY:
>
>
This looks pretty reasonable to me, even if it is a bit ugly. Since the
need is nfsd-specific, passing down these values is preferable to adding
them to struct svc_rqst.
--
Jeff Layton <jlayton@kernel.org>
prev parent reply other threads:[~2023-11-10 18:02 UTC|newest]
Thread overview: 2+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-11-09 13:45 [PATCH RFC] NFSD: Fix checksum mismatches in the duplicate reply cache Chuck Lever
2023-11-10 15:32 ` Jeff Layton [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=efdbd2d53c97d3482f8844ffc5cd9caee577a5f6.camel@kernel.org \
--to=jlayton@kernel.org \
--cc=cel@kernel.org \
--cc=chuck.lever@oracle.com \
--cc=linux-nfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox