* [PATCH v3 0/6] Fix XDR encoding near page boundaries
@ 2024-12-26 16:28 cel
2024-12-26 16:28 ` [PATCH v3 1/6] NFSD: Encode COMPOUND operation status on " cel
` (5 more replies)
0 siblings, 6 replies; 11+ messages in thread
From: cel @ 2024-12-26 16:28 UTC (permalink / raw)
To: Neil Brown, Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey
Cc: linux-nfs, Rick Macklem, j.david.lists, Chuck Lever
From: Chuck Lever <chuck.lever@oracle.com>
Refresh the patch series to address the longstanding bug pointed out
by J David and Rick Macklem.
I believe we have identified and addressed this issue in all of
the NFSv4 COMPOUND operation encoders on the server side. Only the
GSS integrity and privacy encoders are still vulnerable but "safe
for now". Barring further review comments, this series is code-
complete.
Neil suggests xdr_reserve_space() should not ever be open-coded in
NFSv4 code. That seems difficult to enforce: nfsd4_encode_operation()
is certainly an XDR encode function; it lives in fs/nfsd/nfs4xdr.c,
for instance. So xdr_reserve_space() seems like a reasonable thing
to see in that function. I'm not sure exactly where to draw that
line.
Changes since v2:
- Address same issue in NFSv4 READ/READ_PLUS and fattr4 encoders
Chuck Lever (6):
NFSD: Encode COMPOUND operation status on page boundaries
NFSD: Insulate nfsd4_encode_read() from page boundaries in the encode
buffer
NFSD: Insulate nfsd4_encode_read_plus() from page boundaries in the
encode buffer
NFSD: Insulate nfsd4_encode_read_plus_data() from page boundaries in
the encode buffer
NFSD: Insulate nfsd4_encode_fattr4() from page boundaries in the
encode buffer
SUNRPC: Document validity guarantees of the pointer returned by
reserve_space
fs/nfsd/nfs4xdr.c | 109 ++++++++++++++++++++++++++--------------------
net/sunrpc/xdr.c | 3 ++
2 files changed, 65 insertions(+), 47 deletions(-)
--
2.47.0
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH v3 1/6] NFSD: Encode COMPOUND operation status on page boundaries
2024-12-26 16:28 [PATCH v3 0/6] Fix XDR encoding near page boundaries cel
@ 2024-12-26 16:28 ` cel
2024-12-26 17:17 ` Cedric Blancher
2024-12-26 16:28 ` [PATCH v3 2/6] NFSD: Insulate nfsd4_encode_read() from page boundaries in the encode buffer cel
` (4 subsequent siblings)
5 siblings, 1 reply; 11+ messages in thread
From: cel @ 2024-12-26 16:28 UTC (permalink / raw)
To: Neil Brown, Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey
Cc: linux-nfs, Rick Macklem, j.david.lists, Chuck Lever, Rick Macklem
From: Chuck Lever <chuck.lever@oracle.com>
J. David reports an odd corruption of a READDIR reply sent to a
FreeBSD client.
xdr_reserve_space() has to do a special trick when the @nbytes value
requests more space than there is in the current page of the XDR
buffer.
In that case, xdr_reserve_space() returns a pointer to the start of
the next page, and then the next call to xdr_reserve_space() invokes
__xdr_commit_encode() to copy enough of the data item back into the
previous page to make that data item contiguous across the page
boundary.
But we need to be careful in the case where buffer space is reserved
early for a data item whose value will be inserted into the buffer
later.
One such caller, nfsd4_encode_operation(), reserves 8 bytes in the
encoding buffer for each COMPOUND operation. However, a READDIR
result can sometimes encode file names so that there are only 4
bytes left at the end of the current XDR buffer page (though plenty
of pages are left to handle the remaining encoding tasks).
If a COMPOUND operation follows the READDIR result (say, a GETATTR),
then nfsd4_encode_operation() will reserve 8 bytes for the op number
(9) and the op status (usually NFS4_OK). In this weird case,
xdr_reserve_space() returns a pointer to byte zero of the next buffer
page, as it assumes the data item will be copied back into place (in
the previous page) on the next call to xdr_reserve_space().
nfsd4_encode_operation() writes the op num into the buffer, then
saves the next 4-byte location for the op's status code. The next
xdr_reserve_space() call is part of GETATTR encoding, so the op num
gets copied back into the previous page, but the saved location for
the op status continues to point to the wrong spot in the current
XDR buffer page because __xdr_commit_encode() moved that data item.
After GETATTR encoding is complete, nfsd4_encode_operation() writes
the op status over the first XDR data item in the GETATTR result.
The NFS4_OK status code (0) makes it look like there are zero items
in the GETATTR's attribute bitmask.
The patch description of commit 2825a7f90753 ("nfsd4: allow encoding
across page boundaries") [2014] remarks that NFSD "can't handle a
new operation starting close to the end of a page." This bug appears
to be one reason for that remark.
Reported-by: J David <j.david.lists@gmail.com>
Closes: https://lore.kernel.org/linux-nfs/3998d739-c042-46b4-8166-dbd6c5f0e804@oracle.com/T/#t
Tested-by: Rick Macklem <rmacklem@uoguelph.ca>
Reviewed-by: NeilBrown <neilb@suse.de>
X-Cc: stable@vger.kernel.org
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
fs/nfsd/nfs4xdr.c | 20 ++++++++++----------
1 file changed, 10 insertions(+), 10 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 53fac037611c..efcb132c19d4 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -5760,15 +5760,14 @@ nfsd4_encode_operation(struct nfsd4_compoundres *resp, struct nfsd4_op *op)
struct nfs4_stateowner *so = resp->cstate.replay_owner;
struct svc_rqst *rqstp = resp->rqstp;
const struct nfsd4_operation *opdesc = op->opdesc;
- int post_err_offset;
+ unsigned int op_status_offset;
nfsd4_enc encoder;
- __be32 *p;
- p = xdr_reserve_space(xdr, 8);
- if (!p)
+ if (xdr_stream_encode_u32(xdr, op->opnum) != XDR_UNIT)
+ goto release;
+ op_status_offset = xdr_stream_pos(xdr);
+ if (!xdr_reserve_space(xdr, XDR_UNIT))
goto release;
- *p++ = cpu_to_be32(op->opnum);
- post_err_offset = xdr->buf->len;
if (op->opnum == OP_ILLEGAL)
goto status;
@@ -5809,20 +5808,21 @@ nfsd4_encode_operation(struct nfsd4_compoundres *resp, struct nfsd4_op *op)
* bug if we had to do this on a non-idempotent op:
*/
warn_on_nonidempotent_op(op);
- xdr_truncate_encode(xdr, post_err_offset);
+ xdr_truncate_encode(xdr, op_status_offset + XDR_UNIT);
}
if (so) {
- int len = xdr->buf->len - post_err_offset;
+ int len = xdr->buf->len - (op_status_offset + XDR_UNIT);
so->so_replay.rp_status = op->status;
so->so_replay.rp_buflen = len;
- read_bytes_from_xdr_buf(xdr->buf, post_err_offset,
+ read_bytes_from_xdr_buf(xdr->buf, op_status_offset + XDR_UNIT,
so->so_replay.rp_buf, len);
}
status:
op->status = nfsd4_map_status(op->status,
resp->cstate.minorversion);
- *p = op->status;
+ write_bytes_to_xdr_buf(xdr->buf, op_status_offset,
+ &op->status, XDR_UNIT);
release:
if (opdesc && opdesc->op_release)
opdesc->op_release(&op->u);
--
2.47.0
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH v3 2/6] NFSD: Insulate nfsd4_encode_read() from page boundaries in the encode buffer
2024-12-26 16:28 [PATCH v3 0/6] Fix XDR encoding near page boundaries cel
2024-12-26 16:28 ` [PATCH v3 1/6] NFSD: Encode COMPOUND operation status on " cel
@ 2024-12-26 16:28 ` cel
2024-12-26 16:28 ` [PATCH v3 3/6] NFSD: Insulate nfsd4_encode_read_plus() " cel
` (3 subsequent siblings)
5 siblings, 0 replies; 11+ messages in thread
From: cel @ 2024-12-26 16:28 UTC (permalink / raw)
To: Neil Brown, Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey
Cc: linux-nfs, Rick Macklem, j.david.lists, Chuck Lever
From: Chuck Lever <chuck.lever@oracle.com>
Commit 28d5bc468efe ("NFSD: Optimize nfsd4_encode_readv()") replaced
the use of write_bytes_to_xdr_buf() because it's expensive and the
data items to be encoded are already properly aligned.
However, the current code will corrupt the encoded data if the XDR
data items that are reserved early and then poked into the XDR
buffer later happen to fall on a page boundary in the XDR encoding
buffer.
__xdr_commit_encode can shift encoded data items in the encoding
buffer so that pointers returned from xdr_reserve_space() no longer
address the same part of the encoding stream.
This isn't an issue for splice reads because the reserved encode
buffer areas must fall in the XDR buffers header for the splice to
work without error. For vectored reads, however, there is a
possibility of send buffer corruption in rare cases.
Fixes: 28d5bc468efe ("NFSD: Optimize nfsd4_encode_readv()")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
fs/nfsd/nfs4xdr.c | 34 +++++++++++++++++++---------------
1 file changed, 19 insertions(+), 15 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index efcb132c19d4..094806fe1a32 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -4316,6 +4316,15 @@ static __be32 nfsd4_encode_splice_read(
int status, space_left;
__be32 nfserr;
+ /*
+ * Splice read doesn't work if encoding has already wandered
+ * into the XDR buf's page array.
+ */
+ if (unlikely(xdr->buf->page_len)) {
+ WARN_ON_ONCE(1);
+ return nfserr_serverfault;
+ }
+
/*
* Make sure there is room at the end of buf->head for
* svcxdr_encode_opaque_pages() to create a tail buffer
@@ -4398,25 +4407,23 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr,
struct nfsd4_compoundargs *argp = resp->rqstp->rq_argp;
struct nfsd4_read *read = &u->read;
struct xdr_stream *xdr = resp->xdr;
- int starting_len = xdr->buf->len;
bool splice_ok = argp->splice_ok;
+ unsigned int eof_offset;
unsigned long maxcount;
+ __be32 wire_data[2];
struct file *file;
- __be32 *p;
if (nfserr)
return nfserr;
+
+ eof_offset = xdr_stream_pos(xdr);
file = read->rd_nf->nf_file;
- p = xdr_reserve_space(xdr, 8); /* eof flag and byte count */
- if (!p) {
+ /* Reserve space for the eof flag and byte count */
+ if (unlikely(!xdr_reserve_space(xdr, XDR_UNIT * 2))) {
WARN_ON_ONCE(splice_ok);
return nfserr_resource;
}
- if (resp->xdr->buf->page_len && splice_ok) {
- WARN_ON_ONCE(1);
- return nfserr_serverfault;
- }
xdr_commit_encode(xdr);
maxcount = min_t(unsigned long, read->rd_length,
@@ -4427,12 +4434,13 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr,
else
nfserr = nfsd4_encode_readv(resp, read, file, maxcount);
if (nfserr) {
- xdr_truncate_encode(xdr, starting_len);
+ xdr_truncate_encode(xdr, eof_offset);
return nfserr;
}
- p = xdr_encode_bool(p, read->rd_eof);
- *p = cpu_to_be32(read->rd_length);
+ wire_data[0] = read->rd_eof ? xdr_one : xdr_zero;
+ wire_data[1] = cpu_to_be32(read->rd_length);
+ write_bytes_to_xdr_buf(xdr->buf, eof_offset, &wire_data, XDR_UNIT * 2);
return nfs_ok;
}
@@ -5303,10 +5311,6 @@ nfsd4_encode_read_plus_data(struct nfsd4_compoundres *resp,
p = xdr_reserve_space(xdr, 4 + 8 + 4);
if (!p)
return nfserr_io;
- if (resp->xdr->buf->page_len && splice_ok) {
- WARN_ON_ONCE(splice_ok);
- return nfserr_serverfault;
- }
maxcount = min_t(unsigned long, read->rd_length,
(xdr->buf->buflen - xdr->buf->len));
--
2.47.0
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH v3 3/6] NFSD: Insulate nfsd4_encode_read_plus() from page boundaries in the encode buffer
2024-12-26 16:28 [PATCH v3 0/6] Fix XDR encoding near page boundaries cel
2024-12-26 16:28 ` [PATCH v3 1/6] NFSD: Encode COMPOUND operation status on " cel
2024-12-26 16:28 ` [PATCH v3 2/6] NFSD: Insulate nfsd4_encode_read() from page boundaries in the encode buffer cel
@ 2024-12-26 16:28 ` cel
2024-12-26 16:28 ` [PATCH v3 4/6] NFSD: Insulate nfsd4_encode_read_plus_data() " cel
` (2 subsequent siblings)
5 siblings, 0 replies; 11+ messages in thread
From: cel @ 2024-12-26 16:28 UTC (permalink / raw)
To: Neil Brown, Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey
Cc: linux-nfs, Rick Macklem, j.david.lists, Chuck Lever
From: Chuck Lever <chuck.lever@oracle.com>
Commit eeadcb757945 ("NFSD: Simplify READ_PLUS") replaced the use of
write_bytes_to_xdr_buf(), copying what was in nfsd4_encode_read()
at the time.
However, the current code will corrupt the encoded data if the XDR
data items that are reserved early and then poked into the XDR
buffer later happen to fall on a page boundary in the XDR encoding
buffer.
__xdr_commit_encode can shift encoded data items in the encoding
buffer so that pointers returned from xdr_reserve_space() no longer
address the same part of the encoding stream.
Fixes: eeadcb757945 ("NFSD: Simplify READ_PLUS")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
fs/nfsd/nfs4xdr.c | 18 ++++++++++--------
1 file changed, 10 insertions(+), 8 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 094806fe1a32..00e2f4fc4e19 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -5336,16 +5336,17 @@ nfsd4_encode_read_plus(struct nfsd4_compoundres *resp, __be32 nfserr,
struct nfsd4_read *read = &u->read;
struct file *file = read->rd_nf->nf_file;
struct xdr_stream *xdr = resp->xdr;
- int starting_len = xdr->buf->len;
+ unsigned int eof_offset;
+ __be32 wire_data[2];
u32 segments = 0;
- __be32 *p;
if (nfserr)
return nfserr;
- /* eof flag, segment count */
- p = xdr_reserve_space(xdr, 4 + 4);
- if (!p)
+ eof_offset = xdr_stream_pos(xdr);
+
+ /* Reserve space for the eof flag and segment count */
+ if (unlikely(!xdr_reserve_space(xdr, XDR_UNIT * 2)))
return nfserr_io;
xdr_commit_encode(xdr);
@@ -5355,15 +5356,16 @@ nfsd4_encode_read_plus(struct nfsd4_compoundres *resp, __be32 nfserr,
nfserr = nfsd4_encode_read_plus_data(resp, read);
if (nfserr) {
- xdr_truncate_encode(xdr, starting_len);
+ xdr_truncate_encode(xdr, eof_offset);
return nfserr;
}
segments++;
out:
- p = xdr_encode_bool(p, read->rd_eof);
- *p = cpu_to_be32(segments);
+ wire_data[0] = read->rd_eof ? xdr_one : xdr_zero;
+ wire_data[1] = cpu_to_be32(segments);
+ write_bytes_to_xdr_buf(xdr->buf, eof_offset, &wire_data, XDR_UNIT * 2);
return nfserr;
}
--
2.47.0
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH v3 4/6] NFSD: Insulate nfsd4_encode_read_plus_data() from page boundaries in the encode buffer
2024-12-26 16:28 [PATCH v3 0/6] Fix XDR encoding near page boundaries cel
` (2 preceding siblings ...)
2024-12-26 16:28 ` [PATCH v3 3/6] NFSD: Insulate nfsd4_encode_read_plus() " cel
@ 2024-12-26 16:28 ` cel
2024-12-26 16:28 ` [PATCH v3 5/6] NFSD: Insulate nfsd4_encode_fattr4() " cel
2024-12-26 16:28 ` [PATCH v3 6/6] SUNRPC: Document validity guarantees of the pointer returned by reserve_space cel
5 siblings, 0 replies; 11+ messages in thread
From: cel @ 2024-12-26 16:28 UTC (permalink / raw)
To: Neil Brown, Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey
Cc: linux-nfs, Rick Macklem, j.david.lists, Chuck Lever
From: Chuck Lever <chuck.lever@oracle.com>
Commit eeadcb757945 ("NFSD: Simplify READ_PLUS") replaced the use of
write_bytes_to_xdr_buf(), copying what was in nfsd4_encode_read()
at the time.
However, the current code will corrupt the encoded data if the XDR
data items that are reserved early and then poked into the XDR
buffer later happen to fall on a page boundary in the XDR encoding
buffer.
__xdr_commit_encode can shift encoded data items in the encoding
buffer so that pointers returned from xdr_reserve_space() no longer
address the same part of the encoding stream.
Fixes: eeadcb757945 ("NFSD: Simplify READ_PLUS")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
fs/nfsd/nfs4xdr.c | 25 +++++++++++++++++--------
1 file changed, 17 insertions(+), 8 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 00e2f4fc4e19..b770225d63dc 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -5304,14 +5304,21 @@ nfsd4_encode_read_plus_data(struct nfsd4_compoundres *resp,
struct file *file = read->rd_nf->nf_file;
struct xdr_stream *xdr = resp->xdr;
bool splice_ok = argp->splice_ok;
+ unsigned int offset_offset;
+ __be32 nfserr, wire_count;
unsigned long maxcount;
- __be32 nfserr, *p;
+ __be64 wire_offset;
- /* Content type, offset, byte count */
- p = xdr_reserve_space(xdr, 4 + 8 + 4);
- if (!p)
+ if (xdr_stream_encode_u32(xdr, NFS4_CONTENT_DATA) != XDR_UNIT)
return nfserr_io;
+ offset_offset = xdr_stream_pos(xdr);
+
+ /* Reserve space for the byte offset and count */
+ if (unlikely(!xdr_reserve_space(xdr, XDR_UNIT * 3)))
+ return nfserr_io;
+ xdr_commit_encode(xdr);
+
maxcount = min_t(unsigned long, read->rd_length,
(xdr->buf->buflen - xdr->buf->len));
@@ -5322,10 +5329,12 @@ nfsd4_encode_read_plus_data(struct nfsd4_compoundres *resp,
if (nfserr)
return nfserr;
- *p++ = cpu_to_be32(NFS4_CONTENT_DATA);
- p = xdr_encode_hyper(p, read->rd_offset);
- *p = cpu_to_be32(read->rd_length);
-
+ wire_offset = cpu_to_be64(read->rd_offset);
+ write_bytes_to_xdr_buf(xdr->buf, offset_offset, &wire_offset,
+ XDR_UNIT * 2);
+ wire_count = cpu_to_be32(read->rd_length);
+ write_bytes_to_xdr_buf(xdr->buf, offset_offset + XDR_UNIT * 2,
+ &wire_count, XDR_UNIT);
return nfs_ok;
}
--
2.47.0
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH v3 5/6] NFSD: Insulate nfsd4_encode_fattr4() from page boundaries in the encode buffer
2024-12-26 16:28 [PATCH v3 0/6] Fix XDR encoding near page boundaries cel
` (3 preceding siblings ...)
2024-12-26 16:28 ` [PATCH v3 4/6] NFSD: Insulate nfsd4_encode_read_plus_data() " cel
@ 2024-12-26 16:28 ` cel
2024-12-26 16:28 ` [PATCH v3 6/6] SUNRPC: Document validity guarantees of the pointer returned by reserve_space cel
5 siblings, 0 replies; 11+ messages in thread
From: cel @ 2024-12-26 16:28 UTC (permalink / raw)
To: Neil Brown, Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey
Cc: linux-nfs, Rick Macklem, j.david.lists, Chuck Lever
From: Chuck Lever <chuck.lever@oracle.com>
Commit ab04de60ae1c ("NFSD: Optimize nfsd4_encode_fattr()") replaced
the use of write_bytes_to_xdr_buf() because it's expensive and the
data items to be encoded are already properly aligned.
However, there's no guarantee that the pointer returned from
xdr_reserve_space() will still point to the correct reserved space
in the encode buffer after one or more intervening calls to
xdr_reserve_space(). It just happens to work with the current
implementation of xdr_reserve_space().
This commit effectively reverts the optimization.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
fs/nfsd/nfs4xdr.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index b770225d63dc..d4ee633b01e3 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -3506,8 +3506,8 @@ nfsd4_encode_fattr4(struct svc_rqst *rqstp, struct xdr_stream *xdr,
struct nfsd4_fattr_args args;
struct svc_fh *tempfh = NULL;
int starting_len = xdr->buf->len;
- __be32 *attrlen_p, status;
- int attrlen_offset;
+ unsigned int attrlen_offset;
+ __be32 attrlen, status;
u32 attrmask[3];
int err;
struct nfsd4_compoundres *resp = rqstp->rq_resp;
@@ -3627,9 +3627,8 @@ nfsd4_encode_fattr4(struct svc_rqst *rqstp, struct xdr_stream *xdr,
goto out;
/* attr_vals */
- attrlen_offset = xdr->buf->len;
- attrlen_p = xdr_reserve_space(xdr, XDR_UNIT);
- if (!attrlen_p)
+ attrlen_offset = xdr_stream_pos(xdr);
+ if (unlikely(!xdr_reserve_space(xdr, XDR_UNIT)))
goto out_resource;
bitmap_from_arr32(attr_bitmap, attrmask,
ARRAY_SIZE(nfsd4_enc_fattr4_encode_ops));
@@ -3639,7 +3638,8 @@ nfsd4_encode_fattr4(struct svc_rqst *rqstp, struct xdr_stream *xdr,
if (status != nfs_ok)
goto out;
}
- *attrlen_p = cpu_to_be32(xdr->buf->len - attrlen_offset - XDR_UNIT);
+ attrlen = cpu_to_be32(xdr_stream_pos(xdr) - attrlen_offset - XDR_UNIT);
+ write_bytes_to_xdr_buf(xdr->buf, attrlen_offset, &attrlen, XDR_UNIT);
status = nfs_ok;
out:
--
2.47.0
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH v3 6/6] SUNRPC: Document validity guarantees of the pointer returned by reserve_space
2024-12-26 16:28 [PATCH v3 0/6] Fix XDR encoding near page boundaries cel
` (4 preceding siblings ...)
2024-12-26 16:28 ` [PATCH v3 5/6] NFSD: Insulate nfsd4_encode_fattr4() " cel
@ 2024-12-26 16:28 ` cel
2024-12-26 22:36 ` NeilBrown
5 siblings, 1 reply; 11+ messages in thread
From: cel @ 2024-12-26 16:28 UTC (permalink / raw)
To: Neil Brown, Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey
Cc: linux-nfs, Rick Macklem, j.david.lists, Chuck Lever
From: Chuck Lever <chuck.lever@oracle.com>
A subtlety of this API is that if the @nbytes region traverses a
page boundary, the next __xdr_commit_encode will shift the data item
in the XDR encode buffer. This makes the returned pointer point to
something else, leading to unexpected behavior.
There are a few cases where the caller saves the returned pointer
and then later uses it to insert a computed value into an earlier
part of the stream. This can be safe only if either:
- the data item is guaranteed to be in the XDR buffer's head, and
thus is not ever going to be near a page boundary, or
- the data item is no larger than 4 octets, since XDR alignment
rules require all data items to start on 4-octet boundaries
But that safety is only an artifact of the current implementation.
It would be less brittle if these "safe" uses were eventually
replaced.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
net/sunrpc/xdr.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c
index 62e07c330a66..f198bb043e2f 100644
--- a/net/sunrpc/xdr.c
+++ b/net/sunrpc/xdr.c
@@ -1097,6 +1097,9 @@ static noinline __be32 *xdr_get_next_encode_buffer(struct xdr_stream *xdr,
* Checks that we have enough buffer space to encode 'nbytes' more
* bytes of data. If so, update the total xdr_buf length, and
* adjust the length of the current kvec.
+ *
+ * The returned pointer is valid only until the next call to
+ * xdr_reserve_space() or xdr_commit_encode() on this stream.
*/
__be32 * xdr_reserve_space(struct xdr_stream *xdr, size_t nbytes)
{
--
2.47.0
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH v3 1/6] NFSD: Encode COMPOUND operation status on page boundaries
2024-12-26 16:28 ` [PATCH v3 1/6] NFSD: Encode COMPOUND operation status on " cel
@ 2024-12-26 17:17 ` Cedric Blancher
2024-12-27 2:06 ` Chuck Lever
0 siblings, 1 reply; 11+ messages in thread
From: Cedric Blancher @ 2024-12-26 17:17 UTC (permalink / raw)
To: linux-nfs
On Thu, 26 Dec 2024 at 17:29, <cel@kernel.org> wrote:
>
> From: Chuck Lever <chuck.lever@oracle.com>
>
> J. David reports an odd corruption of a READDIR reply sent to a
> FreeBSD client.
>
> xdr_reserve_space() has to do a special trick when the @nbytes value
> requests more space than there is in the current page of the XDR
> buffer.
>
> In that case, xdr_reserve_space() returns a pointer to the start of
> the next page, and then the next call to xdr_reserve_space() invokes
> __xdr_commit_encode() to copy enough of the data item back into the
> previous page to make that data item contiguous across the page
> boundary.
>
> But we need to be careful in the case where buffer space is reserved
> early for a data item whose value will be inserted into the buffer
> later.
>
> One such caller, nfsd4_encode_operation(), reserves 8 bytes in the
> encoding buffer for each COMPOUND operation. However, a READDIR
> result can sometimes encode file names so that there are only 4
> bytes left at the end of the current XDR buffer page (though plenty
> of pages are left to handle the remaining encoding tasks).
>
> If a COMPOUND operation follows the READDIR result (say, a GETATTR),
> then nfsd4_encode_operation() will reserve 8 bytes for the op number
> (9) and the op status (usually NFS4_OK). In this weird case,
> xdr_reserve_space() returns a pointer to byte zero of the next buffer
> page, as it assumes the data item will be copied back into place (in
> the previous page) on the next call to xdr_reserve_space().
>
> nfsd4_encode_operation() writes the op num into the buffer, then
> saves the next 4-byte location for the op's status code. The next
> xdr_reserve_space() call is part of GETATTR encoding, so the op num
> gets copied back into the previous page, but the saved location for
> the op status continues to point to the wrong spot in the current
> XDR buffer page because __xdr_commit_encode() moved that data item.
>
> After GETATTR encoding is complete, nfsd4_encode_operation() writes
> the op status over the first XDR data item in the GETATTR result.
> The NFS4_OK status code (0) makes it look like there are zero items
> in the GETATTR's attribute bitmask.
>
> The patch description of commit 2825a7f90753 ("nfsd4: allow encoding
> across page boundaries") [2014] remarks that NFSD "can't handle a
> new operation starting close to the end of a page." This bug appears
> to be one reason for that remark.
>
> Reported-by: J David <j.david.lists@gmail.com>
> Closes: https://lore.kernel.org/linux-nfs/3998d739-c042-46b4-8166-dbd6c5f0e804@oracle.com/T/#t
> Tested-by: Rick Macklem <rmacklem@uoguelph.ca>
> Reviewed-by: NeilBrown <neilb@suse.de>
> X-Cc: stable@vger.kernel.org
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
We would appreciate it if this patch series (esp the "insulate"
patches) could be backported to (at least) the 6.6 LTS branch, as this
kind of data corruption is haunting NFSv4.x clients since years#
Ced
--
Cedric Blancher <cedric.blancher@gmail.com>
[https://plus.google.com/u/0/+CedricBlancher/]
Institute Pasteur
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v3 6/6] SUNRPC: Document validity guarantees of the pointer returned by reserve_space
2024-12-26 16:28 ` [PATCH v3 6/6] SUNRPC: Document validity guarantees of the pointer returned by reserve_space cel
@ 2024-12-26 22:36 ` NeilBrown
2024-12-27 0:52 ` Chuck Lever
0 siblings, 1 reply; 11+ messages in thread
From: NeilBrown @ 2024-12-26 22:36 UTC (permalink / raw)
To: cel
Cc: Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey, linux-nfs,
Rick Macklem, j.david.lists, Chuck Lever
On Fri, 27 Dec 2024, cel@kernel.org wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
>
> A subtlety of this API is that if the @nbytes region traverses a
> page boundary, the next __xdr_commit_encode will shift the data item
> in the XDR encode buffer. This makes the returned pointer point to
> something else, leading to unexpected behavior.
>
> There are a few cases where the caller saves the returned pointer
> and then later uses it to insert a computed value into an earlier
> part of the stream. This can be safe only if either:
>
> - the data item is guaranteed to be in the XDR buffer's head, and
> thus is not ever going to be near a page boundary, or
> - the data item is no larger than 4 octets, since XDR alignment
> rules require all data items to start on 4-octet boundaries
>
> But that safety is only an artifact of the current implementation.
> It would be less brittle if these "safe" uses were eventually
> replaced.
>
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
> net/sunrpc/xdr.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c
> index 62e07c330a66..f198bb043e2f 100644
> --- a/net/sunrpc/xdr.c
> +++ b/net/sunrpc/xdr.c
> @@ -1097,6 +1097,9 @@ static noinline __be32 *xdr_get_next_encode_buffer(struct xdr_stream *xdr,
> * Checks that we have enough buffer space to encode 'nbytes' more
> * bytes of data. If so, update the total xdr_buf length, and
> * adjust the length of the current kvec.
> + *
> + * The returned pointer is valid only until the next call to
> + * xdr_reserve_space() or xdr_commit_encode() on this stream.
There are still several places where the return value of
xdr_reserver_space is stored for longer than advised here.
- there are several in nfs/callback_xdr.c
e.g. encode_compound_hdr
- there is attrlen_p in nfsd4_encode_fattr4
- maxcount_p in nfsd4_encode_readlink
- flavors_p in nfsd4_do_encode_secinfo
- rqstp->rq_accept_statp which is initialied in svcxdr_set_accept_stat
and updated in many places.
- segcount in rpcrdma_encode_write_list
- segcount in rpcrdma_encode_reply_chunk
These are all safe because of the "4 octets" rule. Should we include
that in the documenting comment, or would you rather all of these
be change to use an offset rather than a pointer?
Thanks,
NeilBrown
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v3 6/6] SUNRPC: Document validity guarantees of the pointer returned by reserve_space
2024-12-26 22:36 ` NeilBrown
@ 2024-12-27 0:52 ` Chuck Lever
0 siblings, 0 replies; 11+ messages in thread
From: Chuck Lever @ 2024-12-27 0:52 UTC (permalink / raw)
To: NeilBrown, cel
Cc: Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey, linux-nfs,
Rick Macklem, j.david.lists
On 12/26/24 5:36 PM, NeilBrown wrote:
> On Fri, 27 Dec 2024, cel@kernel.org wrote:
>> From: Chuck Lever <chuck.lever@oracle.com>
>>
>> A subtlety of this API is that if the @nbytes region traverses a
>> page boundary, the next __xdr_commit_encode will shift the data item
>> in the XDR encode buffer. This makes the returned pointer point to
>> something else, leading to unexpected behavior.
>>
>> There are a few cases where the caller saves the returned pointer
>> and then later uses it to insert a computed value into an earlier
>> part of the stream. This can be safe only if either:
>>
>> - the data item is guaranteed to be in the XDR buffer's head, and
>> thus is not ever going to be near a page boundary, or
>> - the data item is no larger than 4 octets, since XDR alignment
>> rules require all data items to start on 4-octet boundaries
>>
>> But that safety is only an artifact of the current implementation.
>> It would be less brittle if these "safe" uses were eventually
>> replaced.
>>
>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>> ---
>> net/sunrpc/xdr.c | 3 +++
>> 1 file changed, 3 insertions(+)
>>
>> diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c
>> index 62e07c330a66..f198bb043e2f 100644
>> --- a/net/sunrpc/xdr.c
>> +++ b/net/sunrpc/xdr.c
>> @@ -1097,6 +1097,9 @@ static noinline __be32 *xdr_get_next_encode_buffer(struct xdr_stream *xdr,
>> * Checks that we have enough buffer space to encode 'nbytes' more
>> * bytes of data. If so, update the total xdr_buf length, and
>> * adjust the length of the current kvec.
>> + *
>> + * The returned pointer is valid only until the next call to
>> + * xdr_reserve_space() or xdr_commit_encode() on this stream.
>
> There are still several places where the return value of
> xdr_reserver_space is stored for longer than advised here.
>
> - there are several in nfs/callback_xdr.c
> e.g. encode_compound_hdr
> - there is attrlen_p in nfsd4_encode_fattr4
> - maxcount_p in nfsd4_encode_readlink
> - flavors_p in nfsd4_do_encode_secinfo
> - rqstp->rq_accept_statp which is initialied in svcxdr_set_accept_stat
> and updated in many places.
> - segcount in rpcrdma_encode_write_list
> - segcount in rpcrdma_encode_reply_chunk
>
> These are all safe because of the "4 octets" rule. Should we include
> that in the documenting comment, or would you rather all of these
> be change to use an offset rather than a pointer?
That's getting to be a tall list. I will add another sentence to
xdr_reserve_space's kdoc comment for now, something like:
"The current implementation guarantees that space reserved for a
four-byte data item remains valid until the stream is destroyed."
IMO encode_readlink and encode_secinfo need to be updated sooner
rather than later.
--
Chuck Lever
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v3 1/6] NFSD: Encode COMPOUND operation status on page boundaries
2024-12-26 17:17 ` Cedric Blancher
@ 2024-12-27 2:06 ` Chuck Lever
0 siblings, 0 replies; 11+ messages in thread
From: Chuck Lever @ 2024-12-27 2:06 UTC (permalink / raw)
To: Cedric Blancher; +Cc: linux-nfs
On 12/26/24 12:17 PM, Cedric Blancher wrote:
> On Thu, 26 Dec 2024 at 17:29, <cel@kernel.org> wrote:
>>
>> From: Chuck Lever <chuck.lever@oracle.com>
>>
>> J. David reports an odd corruption of a READDIR reply sent to a
>> FreeBSD client.
>>
>> xdr_reserve_space() has to do a special trick when the @nbytes value
>> requests more space than there is in the current page of the XDR
>> buffer.
>>
>> In that case, xdr_reserve_space() returns a pointer to the start of
>> the next page, and then the next call to xdr_reserve_space() invokes
>> __xdr_commit_encode() to copy enough of the data item back into the
>> previous page to make that data item contiguous across the page
>> boundary.
>>
>> But we need to be careful in the case where buffer space is reserved
>> early for a data item whose value will be inserted into the buffer
>> later.
>>
>> One such caller, nfsd4_encode_operation(), reserves 8 bytes in the
>> encoding buffer for each COMPOUND operation. However, a READDIR
>> result can sometimes encode file names so that there are only 4
>> bytes left at the end of the current XDR buffer page (though plenty
>> of pages are left to handle the remaining encoding tasks).
>>
>> If a COMPOUND operation follows the READDIR result (say, a GETATTR),
>> then nfsd4_encode_operation() will reserve 8 bytes for the op number
>> (9) and the op status (usually NFS4_OK). In this weird case,
>> xdr_reserve_space() returns a pointer to byte zero of the next buffer
>> page, as it assumes the data item will be copied back into place (in
>> the previous page) on the next call to xdr_reserve_space().
>>
>> nfsd4_encode_operation() writes the op num into the buffer, then
>> saves the next 4-byte location for the op's status code. The next
>> xdr_reserve_space() call is part of GETATTR encoding, so the op num
>> gets copied back into the previous page, but the saved location for
>> the op status continues to point to the wrong spot in the current
>> XDR buffer page because __xdr_commit_encode() moved that data item.
>>
>> After GETATTR encoding is complete, nfsd4_encode_operation() writes
>> the op status over the first XDR data item in the GETATTR result.
>> The NFS4_OK status code (0) makes it look like there are zero items
>> in the GETATTR's attribute bitmask.
>>
>> The patch description of commit 2825a7f90753 ("nfsd4: allow encoding
>> across page boundaries") [2014] remarks that NFSD "can't handle a
>> new operation starting close to the end of a page." This bug appears
>> to be one reason for that remark.
>>
>> Reported-by: J David <j.david.lists@gmail.com>
>> Closes: https://lore.kernel.org/linux-nfs/3998d739-c042-46b4-8166-dbd6c5f0e804@oracle.com/T/#t
>> Tested-by: Rick Macklem <rmacklem@uoguelph.ca>
>> Reviewed-by: NeilBrown <neilb@suse.de>
>> X-Cc: stable@vger.kernel.org
>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>> ---
>
> We would appreciate it if this patch series (esp the "insulate"
> patches) could be backported to (at least) the 6.6 LTS branch, as this
> kind of data corruption is haunting NFSv4.x clients since years#
Note the "Cc: stable" tag. Fwiw, it's my intention to see that this fix
gets into stable kernels.
I expect this patch to apply cleanly and automatically to LTS 6.6.
Earlier LTS kernels might be more challenging, but I'm happy to manually
backport to those if that's needed.
--
Chuck Lever
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2024-12-27 2:06 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-12-26 16:28 [PATCH v3 0/6] Fix XDR encoding near page boundaries cel
2024-12-26 16:28 ` [PATCH v3 1/6] NFSD: Encode COMPOUND operation status on " cel
2024-12-26 17:17 ` Cedric Blancher
2024-12-27 2:06 ` Chuck Lever
2024-12-26 16:28 ` [PATCH v3 2/6] NFSD: Insulate nfsd4_encode_read() from page boundaries in the encode buffer cel
2024-12-26 16:28 ` [PATCH v3 3/6] NFSD: Insulate nfsd4_encode_read_plus() " cel
2024-12-26 16:28 ` [PATCH v3 4/6] NFSD: Insulate nfsd4_encode_read_plus_data() " cel
2024-12-26 16:28 ` [PATCH v3 5/6] NFSD: Insulate nfsd4_encode_fattr4() " cel
2024-12-26 16:28 ` [PATCH v3 6/6] SUNRPC: Document validity guarantees of the pointer returned by reserve_space cel
2024-12-26 22:36 ` NeilBrown
2024-12-27 0:52 ` Chuck Lever
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox