public inbox for linux-nfs@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/2] Fix XDR encoding near page boundaries
@ 2024-12-23 18:07 cel
  2024-12-23 18:07 ` [PATCH v2 1/2] NFSD: Encode COMPOUND operation status on " cel
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: cel @ 2024-12-23 18:07 UTC (permalink / raw)
  To: Neil Brown, Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey
  Cc: linux-nfs, Rick Macklem, Chuck Lever

From: Chuck Lever <chuck.lever@oracle.com>

Build out the patch series to address the longstanding bug pointed
out by J David and Rick Macklem.

At least during NFSv4 COMPOUND encoding, using
write_bytes_to_xdr_buf() seems less brittle than saving a pointer
into the XDR encoding buffer.

I have one more patch to add (not yet included) that addresses the
issue in the NFSv4 READ and READ_PLUS encoders.

Changes since RFC:
- Document the guarantees around pointer returned by xdr_reserve_space()
- Use write_bytes_to_xdr_buf() instead

Chuck Lever (2):
  NFSD: Encode COMPOUND operation status on page boundaries
  SUNRPC: Document validity guarantees of the pointer returned by
    reserve_space

 fs/nfsd/nfs4xdr.c | 20 ++++++++++----------
 net/sunrpc/xdr.c  |  3 +++
 2 files changed, 13 insertions(+), 10 deletions(-)

-- 
2.47.0


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v2 1/2] NFSD: Encode COMPOUND operation status on page boundaries
  2024-12-23 18:07 [PATCH v2 0/2] Fix XDR encoding near page boundaries cel
@ 2024-12-23 18:07 ` cel
  2024-12-23 23:46   ` Rick Macklem
  2024-12-24 23:19   ` NeilBrown
  2024-12-23 18:07 ` [PATCH v2 2/2] SUNRPC: Document validity guarantees of the pointer returned by reserve_space cel
  2024-12-23 23:22 ` [PATCH v2 0/2] Fix XDR encoding near page boundaries Rick Macklem
  2 siblings, 2 replies; 9+ messages in thread
From: cel @ 2024-12-23 18:07 UTC (permalink / raw)
  To: Neil Brown, Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey
  Cc: linux-nfs, Rick Macklem, Chuck Lever, J David

From: Chuck Lever <chuck.lever@oracle.com>

J. David reports an odd corruption of a READDIR reply sent to a
FreeBSD client.

xdr_reserve_space() has to do a special trick when the @nbytes value
requests more space than there is in the current page of the XDR
buffer.

In that case, xdr_reserve_space() returns a pointer to the start of
the next page, and then the next call to xdr_reserve_space() invokes
__xdr_commit_encode() to copy enough of the data item back into the
previous page to make that data item contiguous across the page
boundary.

But we need to be careful in the case where buffer space is reserved
early for a data item that will be inserted into the buffer later.

One such caller, nfsd4_encode_operation(), reserves 8 bytes in the
encoding buffer for each COMPOUND operation. However, a READDIR
result can sometimes encode file names so that there are only 4
bytes left at the end of the current XDR buffer page (though plenty
of pages are left to handle the remaining encoding tasks).

If a COMPOUND operation follows the READDIR result (say, a GETATTR),
then nfsd4_encode_operation() will reserve 8 bytes for the op number
(9) and the op status (usually NFS4_OK). In this weird case,
xdr_reserve_space() returns a pointer to byte zero of the next buffer
page, as it assumes the data item will be copied back into place (in
the previous page) on the next call to xdr_reserve_space().

nfsd4_encode_operation() writes the op num into the buffer, then
saves the next 4-byte location for the op's status code. The next
xdr_reserve_space() call is part of GETATTR encoding, so the op num
gets copied back into the previous page, but the saved location for
the op status continues to point to the wrong spot in the current
XDR buffer page because __xdr_commit_encode() moved that data item.

After GETATTR encoding is complete, nfsd4_encode_operation() writes
the op status over the first XDR data item in the GETATTR result.
The NFS4_OK status code (0) makes it look like there are zero items
in the GETATTR's attribute bitmask.

The patch description of commit 2825a7f90753 ("nfsd4: allow encoding
across page boundaries") [2014] remarks that NFSD "can't handle a
new operation starting close to the end of a page." This bug appears
to be one reason for that remark.

Reported-by: J David <j.david.lists@gmail.com>
Closes: https://lore.kernel.org/linux-nfs/3998d739-c042-46b4-8166-dbd6c5f0e804@oracle.com/T/#t
X-Cc: stable@vger.kernel.org
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfsd/nfs4xdr.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 53fac037611c..15cd716e9d91 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -5760,15 +5760,14 @@ nfsd4_encode_operation(struct nfsd4_compoundres *resp, struct nfsd4_op *op)
 	struct nfs4_stateowner *so = resp->cstate.replay_owner;
 	struct svc_rqst *rqstp = resp->rqstp;
 	const struct nfsd4_operation *opdesc = op->opdesc;
-	int post_err_offset;
+	unsigned int op_status_offset;
 	nfsd4_enc encoder;
-	__be32 *p;
 
-	p = xdr_reserve_space(xdr, 8);
-	if (!p)
+	if (xdr_stream_encode_u32(xdr, op->opnum) != XDR_UNIT)
+		goto release;
+	op_status_offset = xdr_stream_pos(xdr);
+	if (!xdr_reserve_space(xdr, 4))
 		goto release;
-	*p++ = cpu_to_be32(op->opnum);
-	post_err_offset = xdr->buf->len;
 
 	if (op->opnum == OP_ILLEGAL)
 		goto status;
@@ -5809,20 +5808,21 @@ nfsd4_encode_operation(struct nfsd4_compoundres *resp, struct nfsd4_op *op)
 		 * bug if we had to do this on a non-idempotent op:
 		 */
 		warn_on_nonidempotent_op(op);
-		xdr_truncate_encode(xdr, post_err_offset);
+		xdr_truncate_encode(xdr, op_status_offset + XDR_UNIT);
 	}
 	if (so) {
-		int len = xdr->buf->len - post_err_offset;
+		int len = xdr->buf->len - (op_status_offset + XDR_UNIT);
 
 		so->so_replay.rp_status = op->status;
 		so->so_replay.rp_buflen = len;
-		read_bytes_from_xdr_buf(xdr->buf, post_err_offset,
+		read_bytes_from_xdr_buf(xdr->buf, op_status_offset + XDR_UNIT,
 						so->so_replay.rp_buf, len);
 	}
 status:
 	op->status = nfsd4_map_status(op->status,
 				      resp->cstate.minorversion);
-	*p = op->status;
+	write_bytes_to_xdr_buf(xdr->buf, op_status_offset,
+			       &op->status, XDR_UNIT);
 release:
 	if (opdesc && opdesc->op_release)
 		opdesc->op_release(&op->u);
-- 
2.47.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v2 2/2] SUNRPC: Document validity guarantees of the pointer returned by reserve_space
  2024-12-23 18:07 [PATCH v2 0/2] Fix XDR encoding near page boundaries cel
  2024-12-23 18:07 ` [PATCH v2 1/2] NFSD: Encode COMPOUND operation status on " cel
@ 2024-12-23 18:07 ` cel
  2024-12-23 23:22 ` [PATCH v2 0/2] Fix XDR encoding near page boundaries Rick Macklem
  2 siblings, 0 replies; 9+ messages in thread
From: cel @ 2024-12-23 18:07 UTC (permalink / raw)
  To: Neil Brown, Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey
  Cc: linux-nfs, Rick Macklem, Chuck Lever

From: Chuck Lever <chuck.lever@oracle.com>

A subtlety of this API is that if the @nbytes region traverses a
page boundary, the next __xdr_commit_encode will shift the data item
in the XDR encode buffer. This makes the returned pointer point to
something else, leading to unexpected behavior.

There are a few cases where the caller saves the returned pointer
and then later uses it to insert a computed value into an earlier
part of the stream. This can be safe only if either:

 - the data item is guaranteed to be in the XDR buffer's head, and
   thus is not ever going to be near a page boundary, or
 - the data item is no larger than 4 octets, since XDR alignment
   rules require all data items to start on 4-octet boundaries

But that safety is only an artifact of the current implementation.
It would be less brittle if these "safe" uses were eventually
replaced.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 net/sunrpc/xdr.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c
index 62e07c330a66..f198bb043e2f 100644
--- a/net/sunrpc/xdr.c
+++ b/net/sunrpc/xdr.c
@@ -1097,6 +1097,9 @@ static noinline __be32 *xdr_get_next_encode_buffer(struct xdr_stream *xdr,
  * Checks that we have enough buffer space to encode 'nbytes' more
  * bytes of data. If so, update the total xdr_buf length, and
  * adjust the length of the current kvec.
+ *
+ * The returned pointer is valid only until the next call to
+ * xdr_reserve_space() or xdr_commit_encode() on this stream.
  */
 __be32 * xdr_reserve_space(struct xdr_stream *xdr, size_t nbytes)
 {
-- 
2.47.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 0/2] Fix XDR encoding near page boundaries
  2024-12-23 18:07 [PATCH v2 0/2] Fix XDR encoding near page boundaries cel
  2024-12-23 18:07 ` [PATCH v2 1/2] NFSD: Encode COMPOUND operation status on " cel
  2024-12-23 18:07 ` [PATCH v2 2/2] SUNRPC: Document validity guarantees of the pointer returned by reserve_space cel
@ 2024-12-23 23:22 ` Rick Macklem
  2024-12-24 14:16   ` Chuck Lever
  2 siblings, 1 reply; 9+ messages in thread
From: Rick Macklem @ 2024-12-23 23:22 UTC (permalink / raw)
  To: cel
  Cc: Neil Brown, Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey,
	linux-nfs, Chuck Lever

On Mon, Dec 23, 2024 at 10:07 AM <cel@kernel.org> wrote:
>
> From: Chuck Lever <chuck.lever@oracle.com>
>
> Build out the patch series to address the longstanding bug pointed
> out by J David and Rick Macklem.
>
> At least during NFSv4 COMPOUND encoding, using
> write_bytes_to_xdr_buf() seems less brittle than saving a pointer
> into the XDR encoding buffer.
>
> I have one more patch to add (not yet included) that addresses the
> issue in the NFSv4 READ and READ_PLUS encoders.
It also looks like there is a similar situation in nfsd4_encode_fattr4().
It uses attrlen_p (only a 4byte xdr_reserve_space(), so safe for now.

You might just regret choosing to not wire down the "safe to use
xdr_reserve_space() for 4 bytes" semantic, but it is obviously up to you.

rick

>
> Changes since RFC:
> - Document the guarantees around pointer returned by xdr_reserve_space()
> - Use write_bytes_to_xdr_buf() instead
>
> Chuck Lever (2):
>   NFSD: Encode COMPOUND operation status on page boundaries
>   SUNRPC: Document validity guarantees of the pointer returned by
>     reserve_space
>
>  fs/nfsd/nfs4xdr.c | 20 ++++++++++----------
>  net/sunrpc/xdr.c  |  3 +++
>  2 files changed, 13 insertions(+), 10 deletions(-)
>
> --
> 2.47.0
>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 1/2] NFSD: Encode COMPOUND operation status on page boundaries
  2024-12-23 18:07 ` [PATCH v2 1/2] NFSD: Encode COMPOUND operation status on " cel
@ 2024-12-23 23:46   ` Rick Macklem
  2024-12-24 23:19   ` NeilBrown
  1 sibling, 0 replies; 9+ messages in thread
From: Rick Macklem @ 2024-12-23 23:46 UTC (permalink / raw)
  To: cel
  Cc: Neil Brown, Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey,
	linux-nfs, Chuck Lever, J David

On Mon, Dec 23, 2024 at 10:07 AM <cel@kernel.org> wrote:
>
> From: Chuck Lever <chuck.lever@oracle.com>
>
> J. David reports an odd corruption of a READDIR reply sent to a
> FreeBSD client.
>
> xdr_reserve_space() has to do a special trick when the @nbytes value
> requests more space than there is in the current page of the XDR
> buffer.
>
> In that case, xdr_reserve_space() returns a pointer to the start of
> the next page, and then the next call to xdr_reserve_space() invokes
> __xdr_commit_encode() to copy enough of the data item back into the
> previous page to make that data item contiguous across the page
> boundary.
>
> But we need to be careful in the case where buffer space is reserved
> early for a data item that will be inserted into the buffer later.
>
> One such caller, nfsd4_encode_operation(), reserves 8 bytes in the
> encoding buffer for each COMPOUND operation. However, a READDIR
> result can sometimes encode file names so that there are only 4
> bytes left at the end of the current XDR buffer page (though plenty
> of pages are left to handle the remaining encoding tasks).
>
> If a COMPOUND operation follows the READDIR result (say, a GETATTR),
> then nfsd4_encode_operation() will reserve 8 bytes for the op number
> (9) and the op status (usually NFS4_OK). In this weird case,
> xdr_reserve_space() returns a pointer to byte zero of the next buffer
> page, as it assumes the data item will be copied back into place (in
> the previous page) on the next call to xdr_reserve_space().
>
> nfsd4_encode_operation() writes the op num into the buffer, then
> saves the next 4-byte location for the op's status code. The next
> xdr_reserve_space() call is part of GETATTR encoding, so the op num
> gets copied back into the previous page, but the saved location for
> the op status continues to point to the wrong spot in the current
> XDR buffer page because __xdr_commit_encode() moved that data item.
>
> After GETATTR encoding is complete, nfsd4_encode_operation() writes
> the op status over the first XDR data item in the GETATTR result.
> The NFS4_OK status code (0) makes it look like there are zero items
> in the GETATTR's attribute bitmask.
>
> The patch description of commit 2825a7f90753 ("nfsd4: allow encoding
> across page boundaries") [2014] remarks that NFSD "can't handle a
> new operation starting close to the end of a page." This bug appears
> to be one reason for that remark.
>
> Reported-by: J David <j.david.lists@gmail.com>
> Closes: https://lore.kernel.org/linux-nfs/3998d739-c042-46b4-8166-dbd6c5f0e804@oracle.com/T/#t
> X-Cc: stable@vger.kernel.org
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
>  fs/nfsd/nfs4xdr.c | 20 ++++++++++----------
>  1 file changed, 10 insertions(+), 10 deletions(-)
>
> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> index 53fac037611c..15cd716e9d91 100644
> --- a/fs/nfsd/nfs4xdr.c
> +++ b/fs/nfsd/nfs4xdr.c
> @@ -5760,15 +5760,14 @@ nfsd4_encode_operation(struct nfsd4_compoundres *resp, struct nfsd4_op *op)
>         struct nfs4_stateowner *so = resp->cstate.replay_owner;
>         struct svc_rqst *rqstp = resp->rqstp;
>         const struct nfsd4_operation *opdesc = op->opdesc;
> -       int post_err_offset;
> +       unsigned int op_status_offset;
>         nfsd4_enc encoder;
> -       __be32 *p;
>
> -       p = xdr_reserve_space(xdr, 8);
> -       if (!p)
> +       if (xdr_stream_encode_u32(xdr, op->opnum) != XDR_UNIT)
> +               goto release;
> +       op_status_offset = xdr_stream_pos(xdr);
> +       if (!xdr_reserve_space(xdr, 4))
Maybe XDR_UNIT instead of 4 for consistency, but I don't care which you choose.

>                 goto release;
> -       *p++ = cpu_to_be32(op->opnum);
> -       post_err_offset = xdr->buf->len;
>
>         if (op->opnum == OP_ILLEGAL)
>                 goto status;
> @@ -5809,20 +5808,21 @@ nfsd4_encode_operation(struct nfsd4_compoundres *resp, struct nfsd4_op *op)
>                  * bug if we had to do this on a non-idempotent op:
>                  */
>                 warn_on_nonidempotent_op(op);
> -               xdr_truncate_encode(xdr, post_err_offset);
> +               xdr_truncate_encode(xdr, op_status_offset + XDR_UNIT);
>         }
>         if (so) {
> -               int len = xdr->buf->len - post_err_offset;
> +               int len = xdr->buf->len - (op_status_offset + XDR_UNIT);
>
>                 so->so_replay.rp_status = op->status;
>                 so->so_replay.rp_buflen = len;
> -               read_bytes_from_xdr_buf(xdr->buf, post_err_offset,
> +               read_bytes_from_xdr_buf(xdr->buf, op_status_offset + XDR_UNIT,
>                                                 so->so_replay.rp_buf, len);
>         }
>  status:
>         op->status = nfsd4_map_status(op->status,
>                                       resp->cstate.minorversion);
> -       *p = op->status;
> +       write_bytes_to_xdr_buf(xdr->buf, op_status_offset,
> +                              &op->status, XDR_UNIT);
>  release:
>         if (opdesc && opdesc->op_release)
>                 opdesc->op_release(&op->u);
> --
> 2.47.0
>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 0/2] Fix XDR encoding near page boundaries
  2024-12-23 23:22 ` [PATCH v2 0/2] Fix XDR encoding near page boundaries Rick Macklem
@ 2024-12-24 14:16   ` Chuck Lever
  2024-12-24 15:50     ` Rick Macklem
  0 siblings, 1 reply; 9+ messages in thread
From: Chuck Lever @ 2024-12-24 14:16 UTC (permalink / raw)
  To: Rick Macklem
  Cc: Neil Brown, Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey,
	linux-nfs

On 12/23/24 6:22 PM, Rick Macklem wrote:
> On Mon, Dec 23, 2024 at 10:07 AM <cel@kernel.org> wrote:
>>
>> From: Chuck Lever <chuck.lever@oracle.com>
>>
>> Build out the patch series to address the longstanding bug pointed
>> out by J David and Rick Macklem.
>>
>> At least during NFSv4 COMPOUND encoding, using
>> write_bytes_to_xdr_buf() seems less brittle than saving a pointer
>> into the XDR encoding buffer.
>>
>> I have one more patch to add (not yet included) that addresses the
>> issue in the NFSv4 READ and READ_PLUS encoders.
> It also looks like there is a similar situation in nfsd4_encode_fattr4().
> It uses attrlen_p (only a 4byte xdr_reserve_space(), so safe for now.
> 
> You might just regret choosing to not wire down the "safe to use
> xdr_reserve_space() for 4 bytes" semantic, but it is obviously up to you.

I'm 100% behind the idea of making it hard or impossible to code
things the wrong way.

But I'm not sure what you mean by "wire down". I can't think of a way
to enforce the "only 4 octets or less" rule -- just having a helper
function with that name documents it but doesn't enforce it.

My plan is to replace the obviously unsafe call sites immediately,
document the desired long-term semantics, then replace the other
"safe for now" call sites over time.

I've found a few other potentially unsafe server-side callers:
* gss_wrap_req_integ
* gss_wrap_req_priv
* unx_marshal

I consider these "safe for now" because the reserved space is guaranteed
to be in the XDR buffer's head iovec, far away from page boundaries.

I'm hoping that no-one introduces new unsafe call sites in the meantime.
We should be able to catch that in review.

That also buys some time to come up with something better.


> rick
> 
>>
>> Changes since RFC:
>> - Document the guarantees around pointer returned by xdr_reserve_space()
>> - Use write_bytes_to_xdr_buf() instead
>>
>> Chuck Lever (2):
>>    NFSD: Encode COMPOUND operation status on page boundaries
>>    SUNRPC: Document validity guarantees of the pointer returned by
>>      reserve_space
>>
>>   fs/nfsd/nfs4xdr.c | 20 ++++++++++----------
>>   net/sunrpc/xdr.c  |  3 +++
>>   2 files changed, 13 insertions(+), 10 deletions(-)
>>
>> --
>> 2.47.0
>>


-- 
Chuck Lever

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 0/2] Fix XDR encoding near page boundaries
  2024-12-24 14:16   ` Chuck Lever
@ 2024-12-24 15:50     ` Rick Macklem
  0 siblings, 0 replies; 9+ messages in thread
From: Rick Macklem @ 2024-12-24 15:50 UTC (permalink / raw)
  To: Chuck Lever
  Cc: Neil Brown, Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey,
	linux-nfs

On Tue, Dec 24, 2024 at 6:16 AM Chuck Lever <chuck.lever@oracle.com> wrote:
>
> On 12/23/24 6:22 PM, Rick Macklem wrote:
> > On Mon, Dec 23, 2024 at 10:07 AM <cel@kernel.org> wrote:
> >>
> >> From: Chuck Lever <chuck.lever@oracle.com>
> >>
> >> Build out the patch series to address the longstanding bug pointed
> >> out by J David and Rick Macklem.
> >>
> >> At least during NFSv4 COMPOUND encoding, using
> >> write_bytes_to_xdr_buf() seems less brittle than saving a pointer
> >> into the XDR encoding buffer.
> >>
> >> I have one more patch to add (not yet included) that addresses the
> >> issue in the NFSv4 READ and READ_PLUS encoders.
> > It also looks like there is a similar situation in nfsd4_encode_fattr4().
> > It uses attrlen_p (only a 4byte xdr_reserve_space(), so safe for now.
> >
> > You might just regret choosing to not wire down the "safe to use
> > xdr_reserve_space() for 4 bytes" semantic, but it is obviously up to you.
>
> I'm 100% behind the idea of making it hard or impossible to code
> things the wrong way.
Righto, sounds good to me.

>
> But I'm not sure what you mean by "wire down". I can't think of a way
> to enforce the "only 4 octets or less" rule -- just having a helper
> function with that name documents it but doesn't enforce it.
Agreed. By "wired down" I was simply thinking of defining it as
required behaviour. Documented by a comment update or similar.

I was being a little "tongue in cheek" when I made the comment,
referring to all the work involved.

Good luck with it and have a good holiday, rick

>
> My plan is to replace the obviously unsafe call sites immediately,
> document the desired long-term semantics, then replace the other
> "safe for now" call sites over time.
>
> I've found a few other potentially unsafe server-side callers:
> * gss_wrap_req_integ
> * gss_wrap_req_priv
> * unx_marshal
>
> I consider these "safe for now" because the reserved space is guaranteed
> to be in the XDR buffer's head iovec, far away from page boundaries.
>
> I'm hoping that no-one introduces new unsafe call sites in the meantime.
> We should be able to catch that in review.
>
> That also buys some time to come up with something better.
>
>
> > rick
> >
> >>
> >> Changes since RFC:
> >> - Document the guarantees around pointer returned by xdr_reserve_space()
> >> - Use write_bytes_to_xdr_buf() instead
> >>
> >> Chuck Lever (2):
> >>    NFSD: Encode COMPOUND operation status on page boundaries
> >>    SUNRPC: Document validity guarantees of the pointer returned by
> >>      reserve_space
> >>
> >>   fs/nfsd/nfs4xdr.c | 20 ++++++++++----------
> >>   net/sunrpc/xdr.c  |  3 +++
> >>   2 files changed, 13 insertions(+), 10 deletions(-)
> >>
> >> --
> >> 2.47.0
> >>
>
>
> --
> Chuck Lever

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 1/2] NFSD: Encode COMPOUND operation status on page boundaries
  2024-12-23 18:07 ` [PATCH v2 1/2] NFSD: Encode COMPOUND operation status on " cel
  2024-12-23 23:46   ` Rick Macklem
@ 2024-12-24 23:19   ` NeilBrown
  2024-12-25 15:53     ` Chuck Lever
  1 sibling, 1 reply; 9+ messages in thread
From: NeilBrown @ 2024-12-24 23:19 UTC (permalink / raw)
  To: cel
  Cc: Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey, linux-nfs,
	Rick Macklem, Chuck Lever, J David

On Tue, 24 Dec 2024, cel@kernel.org wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
> 
> J. David reports an odd corruption of a READDIR reply sent to a
> FreeBSD client.
> 
> xdr_reserve_space() has to do a special trick when the @nbytes value
> requests more space than there is in the current page of the XDR
> buffer.
> 
> In that case, xdr_reserve_space() returns a pointer to the start of
> the next page, and then the next call to xdr_reserve_space() invokes
> __xdr_commit_encode() to copy enough of the data item back into the
> previous page to make that data item contiguous across the page
> boundary.
> 
> But we need to be careful in the case where buffer space is reserved
> early for a data item that will be inserted into the buffer later.
> 
> One such caller, nfsd4_encode_operation(), reserves 8 bytes in the
> encoding buffer for each COMPOUND operation. However, a READDIR
> result can sometimes encode file names so that there are only 4
> bytes left at the end of the current XDR buffer page (though plenty
> of pages are left to handle the remaining encoding tasks).
> 
> If a COMPOUND operation follows the READDIR result (say, a GETATTR),
> then nfsd4_encode_operation() will reserve 8 bytes for the op number
> (9) and the op status (usually NFS4_OK). In this weird case,
> xdr_reserve_space() returns a pointer to byte zero of the next buffer
> page, as it assumes the data item will be copied back into place (in
> the previous page) on the next call to xdr_reserve_space().
> 
> nfsd4_encode_operation() writes the op num into the buffer, then
> saves the next 4-byte location for the op's status code. The next
> xdr_reserve_space() call is part of GETATTR encoding, so the op num
> gets copied back into the previous page, but the saved location for
> the op status continues to point to the wrong spot in the current
> XDR buffer page because __xdr_commit_encode() moved that data item.
> 
> After GETATTR encoding is complete, nfsd4_encode_operation() writes
> the op status over the first XDR data item in the GETATTR result.
> The NFS4_OK status code (0) makes it look like there are zero items
> in the GETATTR's attribute bitmask.
> 
> The patch description of commit 2825a7f90753 ("nfsd4: allow encoding
> across page boundaries") [2014] remarks that NFSD "can't handle a
> new operation starting close to the end of a page." This bug appears
> to be one reason for that remark.
> 
> Reported-by: J David <j.david.lists@gmail.com>
> Closes: https://lore.kernel.org/linux-nfs/3998d739-c042-46b4-8166-dbd6c5f0e804@oracle.com/T/#t
> X-Cc: stable@vger.kernel.org
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
>  fs/nfsd/nfs4xdr.c | 20 ++++++++++----------
>  1 file changed, 10 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> index 53fac037611c..15cd716e9d91 100644
> --- a/fs/nfsd/nfs4xdr.c
> +++ b/fs/nfsd/nfs4xdr.c
> @@ -5760,15 +5760,14 @@ nfsd4_encode_operation(struct nfsd4_compoundres *resp, struct nfsd4_op *op)
>  	struct nfs4_stateowner *so = resp->cstate.replay_owner;
>  	struct svc_rqst *rqstp = resp->rqstp;
>  	const struct nfsd4_operation *opdesc = op->opdesc;
> -	int post_err_offset;
> +	unsigned int op_status_offset;

As most uses of "op_status_offset" add XDR_UNIT I'd be incline to keep
the "post" offset.

   unsigned int op_status_offset, post_status_offset;

>  	nfsd4_enc encoder;
> -	__be32 *p;
>  
> -	p = xdr_reserve_space(xdr, 8);
> -	if (!p)
> +	if (xdr_stream_encode_u32(xdr, op->opnum) != XDR_UNIT)
> +		goto release;
> +	op_status_offset = xdr_stream_pos(xdr);
> +	if (!xdr_reserve_space(xdr, 4))

The underlying message of this bug seems to be that xdr_reserve_space()
is a low-level interface that probably shouldn't be used outside of xdr
code.
So I wonder if we could use
    op_status_offset = xdr_stream_pos(xdr);
    if (xdr_stream_encode_u32(xdr, NFS4ERR_SERVERFAULT) < 0) //will be over-written
           goto release;
    post_status_offset = xdr_stream_pos(xdr);

instead??

But these are minor thoughts - only use them if you like them.
Generally this is a definite improvement.

Reviewed-by: NeilBrown <neilb@suse.de>

Thanks,
NeilBrown

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 1/2] NFSD: Encode COMPOUND operation status on page boundaries
  2024-12-24 23:19   ` NeilBrown
@ 2024-12-25 15:53     ` Chuck Lever
  0 siblings, 0 replies; 9+ messages in thread
From: Chuck Lever @ 2024-12-25 15:53 UTC (permalink / raw)
  To: NeilBrown, cel
  Cc: Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey, linux-nfs,
	Rick Macklem, J David

On 12/24/24 6:19 PM, NeilBrown wrote:
> On Tue, 24 Dec 2024, cel@kernel.org wrote:
>> From: Chuck Lever <chuck.lever@oracle.com>
>>
>> J. David reports an odd corruption of a READDIR reply sent to a
>> FreeBSD client.
>>
>> xdr_reserve_space() has to do a special trick when the @nbytes value
>> requests more space than there is in the current page of the XDR
>> buffer.
>>
>> In that case, xdr_reserve_space() returns a pointer to the start of
>> the next page, and then the next call to xdr_reserve_space() invokes
>> __xdr_commit_encode() to copy enough of the data item back into the
>> previous page to make that data item contiguous across the page
>> boundary.
>>
>> But we need to be careful in the case where buffer space is reserved
>> early for a data item that will be inserted into the buffer later.
>>
>> One such caller, nfsd4_encode_operation(), reserves 8 bytes in the
>> encoding buffer for each COMPOUND operation. However, a READDIR
>> result can sometimes encode file names so that there are only 4
>> bytes left at the end of the current XDR buffer page (though plenty
>> of pages are left to handle the remaining encoding tasks).
>>
>> If a COMPOUND operation follows the READDIR result (say, a GETATTR),
>> then nfsd4_encode_operation() will reserve 8 bytes for the op number
>> (9) and the op status (usually NFS4_OK). In this weird case,
>> xdr_reserve_space() returns a pointer to byte zero of the next buffer
>> page, as it assumes the data item will be copied back into place (in
>> the previous page) on the next call to xdr_reserve_space().
>>
>> nfsd4_encode_operation() writes the op num into the buffer, then
>> saves the next 4-byte location for the op's status code. The next
>> xdr_reserve_space() call is part of GETATTR encoding, so the op num
>> gets copied back into the previous page, but the saved location for
>> the op status continues to point to the wrong spot in the current
>> XDR buffer page because __xdr_commit_encode() moved that data item.
>>
>> After GETATTR encoding is complete, nfsd4_encode_operation() writes
>> the op status over the first XDR data item in the GETATTR result.
>> The NFS4_OK status code (0) makes it look like there are zero items
>> in the GETATTR's attribute bitmask.
>>
>> The patch description of commit 2825a7f90753 ("nfsd4: allow encoding
>> across page boundaries") [2014] remarks that NFSD "can't handle a
>> new operation starting close to the end of a page." This bug appears
>> to be one reason for that remark.
>>
>> Reported-by: J David <j.david.lists@gmail.com>
>> Closes: https://lore.kernel.org/linux-nfs/3998d739-c042-46b4-8166-dbd6c5f0e804@oracle.com/T/#t
>> X-Cc: stable@vger.kernel.org
>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>> ---
>>   fs/nfsd/nfs4xdr.c | 20 ++++++++++----------
>>   1 file changed, 10 insertions(+), 10 deletions(-)
>>
>> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
>> index 53fac037611c..15cd716e9d91 100644
>> --- a/fs/nfsd/nfs4xdr.c
>> +++ b/fs/nfsd/nfs4xdr.c
>> @@ -5760,15 +5760,14 @@ nfsd4_encode_operation(struct nfsd4_compoundres *resp, struct nfsd4_op *op)
>>   	struct nfs4_stateowner *so = resp->cstate.replay_owner;
>>   	struct svc_rqst *rqstp = resp->rqstp;
>>   	const struct nfsd4_operation *opdesc = op->opdesc;
>> -	int post_err_offset;
>> +	unsigned int op_status_offset;
> 
> As most uses of "op_status_offset" add XDR_UNIT I'd be incline to keep
> the "post" offset.
> 
>     unsigned int op_status_offset, post_status_offset;

My thinking is that the "op_status_offset" value will be used on every
call of this function, but the "post_status_offset" cases are actually
exceedingly rare. There's no good reason to maintain that value
separately during every call.


>>   	nfsd4_enc encoder;
>> -	__be32 *p;
>>   
>> -	p = xdr_reserve_space(xdr, 8);
>> -	if (!p)
>> +	if (xdr_stream_encode_u32(xdr, op->opnum) != XDR_UNIT)
>> +		goto release;
>> +	op_status_offset = xdr_stream_pos(xdr);
>> +	if (!xdr_reserve_space(xdr, 4))
> 
> The underlying message of this bug seems to be that xdr_reserve_space()
> is a low-level interface that probably shouldn't be used outside of xdr
> code.
> So I wonder if we could use
>      op_status_offset = xdr_stream_pos(xdr);
>      if (xdr_stream_encode_u32(xdr, NFS4ERR_SERVERFAULT) < 0) //will be over-written
>             goto release;
>      post_status_offset = xdr_stream_pos(xdr);
> 
> instead??
> 
> But these are minor thoughts - only use them if you like them.
> Generally this is a definite improvement.
> 
> Reviewed-by: NeilBrown <neilb@suse.de>

Thanks!


-- 
Chuck Lever

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2024-12-25 15:54 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-12-23 18:07 [PATCH v2 0/2] Fix XDR encoding near page boundaries cel
2024-12-23 18:07 ` [PATCH v2 1/2] NFSD: Encode COMPOUND operation status on " cel
2024-12-23 23:46   ` Rick Macklem
2024-12-24 23:19   ` NeilBrown
2024-12-25 15:53     ` Chuck Lever
2024-12-23 18:07 ` [PATCH v2 2/2] SUNRPC: Document validity guarantees of the pointer returned by reserve_space cel
2024-12-23 23:22 ` [PATCH v2 0/2] Fix XDR encoding near page boundaries Rick Macklem
2024-12-24 14:16   ` Chuck Lever
2024-12-24 15:50     ` Rick Macklem

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox