Re: [PATCH RFC] NFSD: Fix checksum mismatches in the duplicate reply cache

Linux NFS development
 help / color / mirror / Atom feed

From: Jeff Layton <jlayton@kernel.org>
To: Chuck Lever <cel@kernel.org>, linux-nfs@vger.kernel.org
Cc: Chuck Lever <chuck.lever@oracle.com>
Subject: Re: [PATCH RFC] NFSD: Fix checksum mismatches in the duplicate reply cache
Date: Fri, 10 Nov 2023 10:32:24 -0500	[thread overview]
Message-ID: <efdbd2d53c97d3482f8844ffc5cd9caee577a5f6.camel@kernel.org> (raw)
In-Reply-To: <169953717421.7448.7269783258225907202.stgit@bazille.1015granger.net>

On Thu, 2023-11-09 at 08:45 -0500, Chuck Lever wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
> 
> nfsd_cache_csum() currently assumes that the server's RPC layer has
> been advancing rq_arg.head[0].iov_base as it decodes an incoming
> request, because that's the way it used to work. On entry, it
> expects that buf->head[0].iov_base points to the start of the NFS
> header, and excludes the already-decoded RPC header.
> 
> Ever since the RPC layer was converted over to use xdr_stream (in
> v6.3), however, head[0].iov_base now points to the start of the RPC
> header during all processing. It no longer points at the NFS Call
> header when execution arrives at nfsd_cache_csum().
> 
> In a retransmitted RPC the XID and the NFS header are supposed to
> be the same as the original message, but the contents of the
> retransmitted RPC header can be different. For example, for krb5,
> the GSS sequence number will be different between the two. Thus if
> the RPC header is included in the DRC checksum computation, the
> checksum of the retransmitted message might not match the checksum
> of the original message, even though the NFS part of these messages
> is identical.
> 
> If the DRC fails to recognize a retransmit, it permits the server to
> execute that RPC Call again. That breaks retransmits of idempotent
> procedures such as RENAME or REMOVE -- the retransmitted RPC Call
> will get a different result.
> 
> Note that I am not marking this commit with a Fixes: tag:
> 
>  o The RPC-related commits that broke the DRC are too numerous
>  o The fix should not be automatically backported, as any backport
>    of it needs to be thoroughly tested
> 
> However, it might be appropriate to consider applying this fix to
> v6.3 and later kernels, if someone can make it fit cleanly and test
> it properly. This patch applies with fuzz to v6.6, but does not
> apply cleanly to earlier kernels.
> 
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
>  fs/nfsd/cache.h    |    4 ++--
>  fs/nfsd/nfscache.c |   55 +++++++++++++++++++++++++++++++++-------------------
>  fs/nfsd/nfssvc.c   |   10 +++++++++
>  3 files changed, 46 insertions(+), 23 deletions(-)
> 
> This fix is kind of ugly. Looking for comments or suggestions for
> improvement. Two further clean-ups occurred to me:
> 
>  - Set up a parms struct in nfsd_dispatch() that is passed to
>    nfsd_cache_lookup() and nfsd_cache_update() that carries all
>    of this extraneous garbage
>  - Move nfsd_cache_csum to net/sunrpc/xdr.c since it assumes
>    details about the how the message appears in the xdr_buf
> 
> 
> diff --git a/fs/nfsd/cache.h b/fs/nfsd/cache.h
> index 929248c6ca84..4cbe0434cbb8 100644
> --- a/fs/nfsd/cache.h
> +++ b/fs/nfsd/cache.h
> @@ -84,8 +84,8 @@ int	nfsd_net_reply_cache_init(struct nfsd_net *nn);
>  void	nfsd_net_reply_cache_destroy(struct nfsd_net *nn);
>  int	nfsd_reply_cache_init(struct nfsd_net *);
>  void	nfsd_reply_cache_shutdown(struct nfsd_net *);
> -int	nfsd_cache_lookup(struct svc_rqst *rqstp,
> -			  struct nfsd_cacherep **cacherep);
> +int	nfsd_cache_lookup(struct svc_rqst *rqstp, unsigned int start,
> +			  unsigned int len, struct nfsd_cacherep **cacherep);
>  void	nfsd_cache_update(struct svc_rqst *rqstp, struct nfsd_cacherep *rp,
>  			  int cachetype, __be32 *statp);
>  int	nfsd_reply_cache_stats_show(struct seq_file *m, void *v);
> diff --git a/fs/nfsd/nfscache.c b/fs/nfsd/nfscache.c
> index fd56a52aa5fb..761896c44e77 100644
> --- a/fs/nfsd/nfscache.c
> +++ b/fs/nfsd/nfscache.c
> @@ -370,32 +370,44 @@ nfsd_reply_cache_scan(struct shrinker *shrink, struct shrink_control *sc)
>  }
>  
>  /*
> - * Walk an xdr_buf and get a CRC for at most the first RC_CSUMLEN bytes
> + * Compute a weak checksum of the leading bytes of an NFS procedure
> + * call header. csum_partial() computes a TCP checksum as specified
> + * in RFC 793.
> + *
> + * To avoid assumptions about how the RPC message is laid out in
> + * @buf and what else it might contain (eg, a GSS MIC suffix), the
> + * caller passes the exact location and length of the NFS Call
> + * header.
>   */

Maybe make this a kerneldoc comment?

> -static __wsum
> -nfsd_cache_csum(struct svc_rqst *rqstp)
> +static __wsum nfsd_cache_csum(struct xdr_buf *buf, unsigned int start,
> +			      unsigned int remaining)
>  {
> +	unsigned int base, len;
> +	struct xdr_buf subbuf;
> +	__wsum csum = 0;
> +	void *p;
>  	int idx;
> -	unsigned int base;
> -	__wsum csum;
> -	struct xdr_buf *buf = &rqstp->rq_arg;
> -	const unsigned char *p = buf->head[0].iov_base;
> -	size_t csum_len = min_t(size_t, buf->head[0].iov_len + buf->page_len,
> -				RC_CSUMLEN);
> -	size_t len = min(buf->head[0].iov_len, csum_len);
> +
> +	if (remaining > RC_CSUMLEN)
> +		remaining = RC_CSUMLEN;
> +	if (xdr_buf_subsegment(buf, &subbuf, start, remaining))
> +		return csum;
>  
>  	/* rq_arg.head first */
> -	csum = csum_partial(p, len, 0);
> -	csum_len -= len;
> +	if (subbuf.head[0].iov_len) {
> +		len = min_t(unsigned int, subbuf.head[0].iov_len, remaining);
> +		csum = csum_partial(subbuf.head[0].iov_base, len, csum);
> +		remaining -= len;
> +	}
>  
>  	/* Continue into page array */
> -	idx = buf->page_base / PAGE_SIZE;
> -	base = buf->page_base & ~PAGE_MASK;
> -	while (csum_len) {
> -		p = page_address(buf->pages[idx]) + base;
> -		len = min_t(size_t, PAGE_SIZE - base, csum_len);
> +	idx = subbuf.page_base / PAGE_SIZE;
> +	base = subbuf.page_base & ~PAGE_MASK;
> +	while (remaining) {
> +		p = page_address(subbuf.pages[idx]) + base;
> +		len = min_t(unsigned int, PAGE_SIZE - base, remaining);
>  		csum = csum_partial(p, len, csum);
> -		csum_len -= len;
> +		remaining -= len;
>  		base = 0;
>  		++idx;
>  	}
> @@ -466,6 +478,8 @@ nfsd_cache_insert(struct nfsd_drc_bucket *b, struct nfsd_cacherep *key,
>  /**
>   * nfsd_cache_lookup - Find an entry in the duplicate reply cache
>   * @rqstp: Incoming Call to find
> + * @start: location in @rqstp->rq_arg of the NFS Call header
> + * @len: length of NFS Call header in bytes
>   * @cacherep: OUT: DRC entry for this request
>   *
>   * Try to find an entry matching the current call in the cache. When none
> @@ -479,7 +493,8 @@ nfsd_cache_insert(struct nfsd_drc_bucket *b, struct nfsd_cacherep *key,
>   *   %RC_REPLY: Reply from cache
>   *   %RC_DROPIT: Do not process the request further
>   */
> -int nfsd_cache_lookup(struct svc_rqst *rqstp, struct nfsd_cacherep **cacherep)
> +int nfsd_cache_lookup(struct svc_rqst *rqstp, unsigned int start,
> +		      unsigned int len, struct nfsd_cacherep **cacherep)
>  {
>  	struct nfsd_net		*nn;
>  	struct nfsd_cacherep	*rp, *found;
> @@ -495,7 +510,7 @@ int nfsd_cache_lookup(struct svc_rqst *rqstp, struct nfsd_cacherep **cacherep)
>  		goto out;
>  	}
>  
> -	csum = nfsd_cache_csum(rqstp);
> +	csum = nfsd_cache_csum(&rqstp->rq_arg, start, len);
>  
>  	/*
>  	 * Since the common case is a cache miss followed by an insert,
> diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
> index ef126969a7ce..1d6e33c6c4fe 100644
> --- a/fs/nfsd/nfssvc.c
> +++ b/fs/nfsd/nfssvc.c
> @@ -980,6 +980,7 @@ int nfsd_dispatch(struct svc_rqst *rqstp)
>  	const struct svc_procedure *proc = rqstp->rq_procinfo;
>  	__be32 *statp = rqstp->rq_accept_statp;
>  	struct nfsd_cacherep *rp;
> +	unsigned int start, len;
>  
>  	/*
>  	 * Give the xdr decoder a chance to change this if it wants
> @@ -987,6 +988,13 @@ int nfsd_dispatch(struct svc_rqst *rqstp)
>  	 */
>  	rqstp->rq_cachetype = proc->pc_cachetype;
>  
> +	/*
> +	 * ->pc_decode advances the argument stream past the NFS
> +	 * Call header, so grab the header's starting location and
> +	 * size now for the call to nfsd_cache_lookup().
> +	 */
> +	start = xdr_stream_pos(&rqstp->rq_arg_stream);
> +	len = xdr_stream_remaining(&rqstp->rq_arg_stream);
>  	if (!proc->pc_decode(rqstp, &rqstp->rq_arg_stream))
>  		goto out_decode_err;
>  
> @@ -1000,7 +1008,7 @@ int nfsd_dispatch(struct svc_rqst *rqstp)
>  	smp_store_release(&rqstp->rq_status_counter, rqstp->rq_status_counter | 1);
>  
>  	rp = NULL;
> -	switch (nfsd_cache_lookup(rqstp, &rp)) {
> +	switch (nfsd_cache_lookup(rqstp, start, len, &rp)) {
>  	case RC_DOIT:
>  		break;
>  	case RC_REPLY:
> 
> 

This looks pretty reasonable to me, even if it is a bit ugly. Since the
need is nfsd-specific, passing down these values is preferable to adding
them to struct svc_rqst.
-- 
Jeff Layton <jlayton@kernel.org>

     prev parent reply	other threads:[~2023-11-10 18:02 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-11-09 13:45 [PATCH RFC] NFSD: Fix checksum mismatches in the duplicate reply cache Chuck Lever
2023-11-10 15:32 ` Jeff Layton [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=efdbd2d53c97d3482f8844ffc5cd9caee577a5f6.camel@kernel.org \
    --to=jlayton@kernel.org \
    --cc=cel@kernel.org \
    --cc=chuck.lever@oracle.com \
    --cc=linux-nfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox