* [PATCH 2/2] NFSD: Fix last write offset handling in layoutcommit
2025-07-04 11:49 [PATCH 0/2] NFSD: Fix last write offset handling in layoutcommit Sergey Bashirov
@ 2025-07-04 11:49 ` Sergey Bashirov
2025-07-04 16:29 ` Chuck Lever
2025-07-10 7:27 ` Christoph Hellwig
0 siblings, 2 replies; 10+ messages in thread
From: Sergey Bashirov @ 2025-07-04 11:49 UTC (permalink / raw)
To: Chuck Lever, Christoph Hellwig, Jeff Layton, NeilBrown,
Olga Kornievskaia, Dai Ngo, Tom Talpey
Cc: linux-nfs, linux-kernel, Konstantin Evtushenko, Sergey Bashirov
The data type of loca_last_write_offset is newoffset4 and is switched
on a boolean value, no_newoffset, that indicates if a previous write
occurred or not. If no_newoffset is FALSE, an offset is not given.
This means that client does not try to update the file size. Thus,
server should not try to calculate new file size and check if it fits
into the seg range.
Co-developed-by: Konstantin Evtushenko <koevtushenko@yandex.com>
Signed-off-by: Konstantin Evtushenko <koevtushenko@yandex.com>
Signed-off-by: Sergey Bashirov <sergeybashirov@gmail.com>
---
fs/nfsd/blocklayout.c | 2 +-
fs/nfsd/nfs4proc.c | 16 ++++++++--------
2 files changed, 9 insertions(+), 9 deletions(-)
diff --git a/fs/nfsd/blocklayout.c b/fs/nfsd/blocklayout.c
index 19078a043e85..ee6544bdc045 100644
--- a/fs/nfsd/blocklayout.c
+++ b/fs/nfsd/blocklayout.c
@@ -118,7 +118,7 @@ nfsd4_block_commit_blocks(struct inode *inode, struct nfsd4_layoutcommit *lcp,
struct iomap *iomaps, int nr_iomaps)
{
struct timespec64 mtime = inode_get_mtime(inode);
- loff_t new_size = lcp->lc_last_wr + 1;
+ loff_t new_size = (lcp->lc_newoffset) ? lcp->lc_last_wr + 1 : 0;
struct iattr iattr = { .ia_valid = 0 };
int error;
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 37bdb937a0ae..ff38be803d8b 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -2482,7 +2482,7 @@ nfsd4_layoutcommit(struct svc_rqst *rqstp,
const struct nfsd4_layout_seg *seg = &lcp->lc_seg;
struct svc_fh *current_fh = &cstate->current_fh;
const struct nfsd4_layout_ops *ops;
- loff_t new_size = lcp->lc_last_wr + 1;
+ loff_t new_size = (lcp->lc_newoffset) ? lcp->lc_last_wr + 1 : 0;
struct inode *inode;
struct nfs4_layout_stateid *ls;
__be32 nfserr;
@@ -2498,13 +2498,13 @@ nfsd4_layoutcommit(struct svc_rqst *rqstp,
goto out;
inode = d_inode(current_fh->fh_dentry);
- nfserr = nfserr_inval;
- if (new_size <= seg->offset)
- goto out;
- if (new_size > seg->offset + seg->length)
- goto out;
- if (!lcp->lc_newoffset && new_size > i_size_read(inode))
- goto out;
+ if (new_size) {
+ nfserr = nfserr_inval;
+ if (new_size <= seg->offset)
+ goto out;
+ if (new_size > seg->offset + seg->length)
+ goto out;
+ }
nfserr = nfsd4_preprocess_layout_stateid(rqstp, cstate, &lcp->lc_sid,
false, lcp->lc_layout_type,
--
2.43.0
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 2/2] NFSD: Fix last write offset handling in layoutcommit
2025-07-04 11:49 ` [PATCH 2/2] " Sergey Bashirov
@ 2025-07-04 16:29 ` Chuck Lever
2025-07-05 6:16 ` Sergey Bashirov
2025-07-10 7:27 ` Christoph Hellwig
1 sibling, 1 reply; 10+ messages in thread
From: Chuck Lever @ 2025-07-04 16:29 UTC (permalink / raw)
To: Sergey Bashirov, Christoph Hellwig, Jeff Layton, NeilBrown,
Olga Kornievskaia, Dai Ngo, Tom Talpey
Cc: linux-nfs, linux-kernel, Konstantin Evtushenko
Hi Sergey, Konstantin -
On 7/4/25 7:49 AM, Sergey Bashirov wrote:
> The data type of loca_last_write_offset is newoffset4 and is switched
> on a boolean value, no_newoffset, that indicates if a previous write
> occurred or not. If no_newoffset is FALSE, an offset is not given.
> This means that client does not try to update the file size. Thus,
> server should not try to calculate new file size and check if it fits
> into the seg range.
The patch description should describe the impact of the current
incorrect logic -- does it result in file corruption, failed tests, etc?
That way support engineers at distributions can more easily find this
patch if a customer runs across bad behavior.
Also, let's reference RFC 8881 Section 12.5.4.2, where the properly
compliant behavior is specified.
Fixes: 9cf514ccfacb ("nfsd: implement pNFS operations")
> Co-developed-by: Konstantin Evtushenko <koevtushenko@yandex.com>
> Signed-off-by: Konstantin Evtushenko <koevtushenko@yandex.com>
> Signed-off-by: Sergey Bashirov <sergeybashirov@gmail.com>
> ---
> fs/nfsd/blocklayout.c | 2 +-
> fs/nfsd/nfs4proc.c | 16 ++++++++--------
> 2 files changed, 9 insertions(+), 9 deletions(-)
>
> diff --git a/fs/nfsd/blocklayout.c b/fs/nfsd/blocklayout.c
> index 19078a043e85..ee6544bdc045 100644
> --- a/fs/nfsd/blocklayout.c
> +++ b/fs/nfsd/blocklayout.c
> @@ -118,7 +118,7 @@ nfsd4_block_commit_blocks(struct inode *inode, struct nfsd4_layoutcommit *lcp,
> struct iomap *iomaps, int nr_iomaps)
> {
> struct timespec64 mtime = inode_get_mtime(inode);
> - loff_t new_size = lcp->lc_last_wr + 1;
> + loff_t new_size = (lcp->lc_newoffset) ? lcp->lc_last_wr + 1 : 0;
> struct iattr iattr = { .ia_valid = 0 };
> int error;
See below for an alternative.
> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> index 37bdb937a0ae..ff38be803d8b 100644
> --- a/fs/nfsd/nfs4proc.c
> +++ b/fs/nfsd/nfs4proc.c
> @@ -2482,7 +2482,7 @@ nfsd4_layoutcommit(struct svc_rqst *rqstp,
> const struct nfsd4_layout_seg *seg = &lcp->lc_seg;
> struct svc_fh *current_fh = &cstate->current_fh;
> const struct nfsd4_layout_ops *ops;
> - loff_t new_size = lcp->lc_last_wr + 1;
> + loff_t new_size = (lcp->lc_newoffset) ? lcp->lc_last_wr + 1 : 0;
> struct inode *inode;
> struct nfs4_layout_stateid *ls;
> __be32 nfserr;
> @@ -2498,13 +2498,13 @@ nfsd4_layoutcommit(struct svc_rqst *rqstp,
> goto out;
> inode = d_inode(current_fh->fh_dentry);
>
How about instead, drop the new_size initializer above, and do this:
lcp->lc_size_chg = false;
if (lcp->lc_newoffset) {
loff_t new_size = lcp->lc_last_wr + 1;
nfserr = nfserr_inval;
if (new_size <= seg->offset)
goto out;
if (new_size > seg->offset + seg->length)
goto out;
if (new_size > i_size_read(inode)) {
lcp->lc_size_chg = true;
lcp->lc_newsize = new_size;
}
}
> - nfserr = nfserr_inval;
> - if (new_size <= seg->offset)
> - goto out;
> - if (new_size > seg->offset + seg->length)
> - goto out;
> - if (!lcp->lc_newoffset && new_size > i_size_read(inode))
> - goto out;
> + if (new_size) {
> + nfserr = nfserr_inval;
> + if (new_size <= seg->offset)
> + goto out;
> + if (new_size > seg->offset + seg->length)
> + goto out;
> + }
>
> nfserr = nfsd4_preprocess_layout_stateid(rqstp, cstate, &lcp->lc_sid,
> false, lcp->lc_layout_type,
And lastly:
- if (new_size > i_size_read(inode)) {
- lcp->lc_size_chg = true;
- lcp->lc_newsize = new_size;
- } else {
- lcp->lc_size_chg = false;
- }
Also, I notice that nfsd4_decode_layoutcommit() has:
if (xdr_stream_decode_bool(argp->xdr, &lcp->lc_reclaim) < 0)
return nfserr_bad_xdr;
but:
if (xdr_stream_decode_u32(argp->xdr, &lcp->lc_newoffset) < 0)
return nfserr_bad_xdr;
The no_newoffset field should be decoded with xdr_stream_decode_bool too
(though the end result is the same). For just this nit, please make a
separate patch. Thanks!
--
Chuck Lever
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 2/2] NFSD: Fix last write offset handling in layoutcommit
2025-07-04 16:29 ` Chuck Lever
@ 2025-07-05 6:16 ` Sergey Bashirov
2025-07-06 15:01 ` Chuck Lever
0 siblings, 1 reply; 10+ messages in thread
From: Sergey Bashirov @ 2025-07-05 6:16 UTC (permalink / raw)
To: Chuck Lever, Christoph Hellwig, Jeff Layton, NeilBrown,
Olga Kornievskaia, Dai Ngo, Tom Talpey
Cc: linux-nfs, linux-kernel, Konstantin Evtushenko, Sergey Bashirov
Hi Chuck,
Thanks for your kind guidance, it really helps to understand the
process of code contribution better! I'm on vacation next week,
will rework and resubmit updated patches when I get back.
--
Sergey Bashirov
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 2/2] NFSD: Fix last write offset handling in layoutcommit
2025-07-05 6:16 ` Sergey Bashirov
@ 2025-07-06 15:01 ` Chuck Lever
0 siblings, 0 replies; 10+ messages in thread
From: Chuck Lever @ 2025-07-06 15:01 UTC (permalink / raw)
To: Sergey Bashirov, Christoph Hellwig, Jeff Layton, NeilBrown,
Olga Kornievskaia, Dai Ngo, Tom Talpey
Cc: linux-nfs, linux-kernel, Konstantin Evtushenko
On 7/5/25 2:16 AM, Sergey Bashirov wrote:
> Hi Chuck,
>
> Thanks for your kind guidance, it really helps to understand the
> process of code contribution better! I'm on vacation next week,
> will rework and resubmit updated patches when I get back.
Dai, Jeff, and I appreciate your help with NFSD's pNFS implementation!
--
Chuck Lever
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 2/2] NFSD: Fix last write offset handling in layoutcommit
2025-07-04 11:49 ` [PATCH 2/2] " Sergey Bashirov
2025-07-04 16:29 ` Chuck Lever
@ 2025-07-10 7:27 ` Christoph Hellwig
1 sibling, 0 replies; 10+ messages in thread
From: Christoph Hellwig @ 2025-07-10 7:27 UTC (permalink / raw)
To: Sergey Bashirov
Cc: Chuck Lever, Christoph Hellwig, Jeff Layton, NeilBrown,
Olga Kornievskaia, Dai Ngo, Tom Talpey, linux-nfs, linux-kernel,
Konstantin Evtushenko
On Fri, Jul 04, 2025 at 02:49:05PM +0300, Sergey Bashirov wrote:
> The data type of loca_last_write_offset is newoffset4 and is switched
> on a boolean value, no_newoffset, that indicates if a previous write
> occurred or not. If no_newoffset is FALSE, an offset is not given.
> This means that client does not try to update the file size. Thus,
> server should not try to calculate new file size and check if it fits
> into the seg range.
>
> Co-developed-by: Konstantin Evtushenko <koevtushenko@yandex.com>
> Signed-off-by: Konstantin Evtushenko <koevtushenko@yandex.com>
> Signed-off-by: Sergey Bashirov <sergeybashirov@gmail.com>
> ---
> fs/nfsd/blocklayout.c | 2 +-
> fs/nfsd/nfs4proc.c | 16 ++++++++--------
> 2 files changed, 9 insertions(+), 9 deletions(-)
>
> diff --git a/fs/nfsd/blocklayout.c b/fs/nfsd/blocklayout.c
> index 19078a043e85..ee6544bdc045 100644
> --- a/fs/nfsd/blocklayout.c
> +++ b/fs/nfsd/blocklayout.c
> @@ -118,7 +118,7 @@ nfsd4_block_commit_blocks(struct inode *inode, struct nfsd4_layoutcommit *lcp,
> struct iomap *iomaps, int nr_iomaps)
> {
> struct timespec64 mtime = inode_get_mtime(inode);
> - loff_t new_size = lcp->lc_last_wr + 1;
> + loff_t new_size = (lcp->lc_newoffset) ? lcp->lc_last_wr + 1 : 0;
> struct iattr iattr = { .ia_valid = 0 };
> int error;
Please guard the entire new_size check below instead, i.e.
if (lcp->lc_newoffset) {
loff_t new_size = lcp->lc_last_wr + 1;
if (new_size > i_size_read(inode)) {
iattr.ia_valid |= ATTR_SIZE;
iattr.ia_size = new_size;
}
}
> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> index 37bdb937a0ae..ff38be803d8b 100644
> --- a/fs/nfsd/nfs4proc.c
> +++ b/fs/nfsd/nfs4proc.c
> @@ -2482,7 +2482,7 @@ nfsd4_layoutcommit(struct svc_rqst *rqstp,
> const struct nfsd4_layout_seg *seg = &lcp->lc_seg;
> struct svc_fh *current_fh = &cstate->current_fh;
> const struct nfsd4_layout_ops *ops;
> - loff_t new_size = lcp->lc_last_wr + 1;
> + loff_t new_size = (lcp->lc_newoffset) ? lcp->lc_last_wr + 1 : 0;
Same here.
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 0/2] NFSD: Rebase dropped patches for block/scsi layout
@ 2025-07-21 18:40 Sergey Bashirov
2025-07-21 18:40 ` [PATCH 1/2] NFSD: Implement large extent array support in pNFS Sergey Bashirov
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: Sergey Bashirov @ 2025-07-21 18:40 UTC (permalink / raw)
To: Chuck Lever
Cc: Jeff Layton, Christoph Hellwig, NeilBrown, Olga Kornievskaia,
Dai Ngo, Tom Talpey, linux-nfs, linux-kernel, Sergey Bashirov
This series resubmits two patches that were dropped due to conflicts
after the deviceid4 handling rework. The first patch is rebased. Only
the decoding of bex.vol_id, which has the deviceid4 type, is updated.
The second patch doesn't change, it just depends on the first one.
Signed-off-by: Sergey Bashirov <sergeybashirov@gmail.com>
---
Sergey Bashirov (2):
NFSD: Implement large extent array support in pNFS
NFSD: Fix last write offset handling in layoutcommit
fs/nfsd/blocklayout.c | 25 ++++++------
fs/nfsd/blocklayoutxdr.c | 83 +++++++++++++++++++++++++++-------------
fs/nfsd/blocklayoutxdr.h | 4 +-
fs/nfsd/nfs4proc.c | 32 ++++++++--------
fs/nfsd/nfs4xdr.c | 11 +++---
fs/nfsd/pnfs.h | 1 +
fs/nfsd/xdr4.h | 3 +-
7 files changed, 95 insertions(+), 64 deletions(-)
--
2.43.0
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 1/2] NFSD: Implement large extent array support in pNFS
2025-07-21 18:40 [PATCH 0/2] NFSD: Rebase dropped patches for block/scsi layout Sergey Bashirov
@ 2025-07-21 18:40 ` Sergey Bashirov
2025-07-21 18:40 ` [PATCH 2/2] NFSD: Fix last write offset handling in layoutcommit Sergey Bashirov
2025-07-21 19:00 ` [PATCH 0/2] NFSD: Rebase dropped patches for block/scsi layout Chuck Lever
2 siblings, 0 replies; 10+ messages in thread
From: Sergey Bashirov @ 2025-07-21 18:40 UTC (permalink / raw)
To: Chuck Lever
Cc: Jeff Layton, Christoph Hellwig, NeilBrown, Olga Kornievskaia,
Dai Ngo, Tom Talpey, linux-nfs, linux-kernel, Sergey Bashirov,
Konstantin Evtushenko
When pNFS client in the block or scsi layout mode sends layoutcommit
to MDS, a variable length array of modified extents is supplied within
the request. This patch allows the server to accept such extent arrays
if they do not fit within single memory page.
The issue can be reproduced when writing to a 1GB file using FIO with
O_DIRECT, 4K block and large I/O depth without preallocation of the
file. In this case, the server returns NFSERR_BADXDR to the client.
Co-developed-by: Konstantin Evtushenko <koevtushenko@yandex.com>
Signed-off-by: Konstantin Evtushenko <koevtushenko@yandex.com>
Signed-off-by: Sergey Bashirov <sergeybashirov@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
fs/nfsd/blocklayout.c | 20 ++++++----
fs/nfsd/blocklayoutxdr.c | 83 +++++++++++++++++++++++++++-------------
fs/nfsd/blocklayoutxdr.h | 4 +-
fs/nfsd/nfs4proc.c | 2 +-
fs/nfsd/nfs4xdr.c | 11 +++---
fs/nfsd/pnfs.h | 1 +
fs/nfsd/xdr4.h | 3 +-
7 files changed, 78 insertions(+), 46 deletions(-)
diff --git a/fs/nfsd/blocklayout.c b/fs/nfsd/blocklayout.c
index 19078a043e85c..4c936132eb440 100644
--- a/fs/nfsd/blocklayout.c
+++ b/fs/nfsd/blocklayout.c
@@ -173,16 +173,18 @@ nfsd4_block_proc_getdeviceinfo(struct super_block *sb,
}
static __be32
-nfsd4_block_proc_layoutcommit(struct inode *inode,
+nfsd4_block_proc_layoutcommit(struct inode *inode, struct svc_rqst *rqstp,
struct nfsd4_layoutcommit *lcp)
{
struct iomap *iomaps;
int nr_iomaps;
__be32 nfserr;
- nfserr = nfsd4_block_decode_layoutupdate(lcp->lc_up_layout,
- lcp->lc_up_len, &iomaps, &nr_iomaps,
- i_blocksize(inode));
+ rqstp->rq_arg = lcp->lc_up_layout;
+ svcxdr_init_decode(rqstp);
+
+ nfserr = nfsd4_block_decode_layoutupdate(&rqstp->rq_arg_stream,
+ &iomaps, &nr_iomaps, i_blocksize(inode));
if (nfserr != nfs_ok)
return nfserr;
@@ -313,16 +315,18 @@ nfsd4_scsi_proc_getdeviceinfo(struct super_block *sb,
return nfserrno(nfsd4_block_get_device_info_scsi(sb, clp, gdp));
}
static __be32
-nfsd4_scsi_proc_layoutcommit(struct inode *inode,
+nfsd4_scsi_proc_layoutcommit(struct inode *inode, struct svc_rqst *rqstp,
struct nfsd4_layoutcommit *lcp)
{
struct iomap *iomaps;
int nr_iomaps;
__be32 nfserr;
- nfserr = nfsd4_scsi_decode_layoutupdate(lcp->lc_up_layout,
- lcp->lc_up_len, &iomaps, &nr_iomaps,
- i_blocksize(inode));
+ rqstp->rq_arg = lcp->lc_up_layout;
+ svcxdr_init_decode(rqstp);
+
+ nfserr = nfsd4_scsi_decode_layoutupdate(&rqstp->rq_arg_stream,
+ &iomaps, &nr_iomaps, i_blocksize(inode));
if (nfserr != nfs_ok)
return nfserr;
diff --git a/fs/nfsd/blocklayoutxdr.c b/fs/nfsd/blocklayoutxdr.c
index 18de37ff28916..e50afe3407371 100644
--- a/fs/nfsd/blocklayoutxdr.c
+++ b/fs/nfsd/blocklayoutxdr.c
@@ -113,8 +113,7 @@ nfsd4_block_encode_getdeviceinfo(struct xdr_stream *xdr,
/**
* nfsd4_block_decode_layoutupdate - decode the block layout extent array
- * @p: pointer to the xdr data
- * @len: number of bytes to decode
+ * @xdr: subbuf set to the encoded array
* @iomapp: pointer to store the decoded extent array
* @nr_iomapsp: pointer to store the number of extents
* @block_size: alignment of extent offset and length
@@ -127,25 +126,24 @@ nfsd4_block_encode_getdeviceinfo(struct xdr_stream *xdr,
*
* Return values:
* %nfs_ok: Successful decoding, @iomapp and @nr_iomapsp are valid
- * %nfserr_bad_xdr: The encoded array in @p is invalid
+ * %nfserr_bad_xdr: The encoded array in @xdr is invalid
* %nfserr_inval: An unaligned extent found
* %nfserr_delay: Failed to allocate memory for @iomapp
*/
__be32
-nfsd4_block_decode_layoutupdate(__be32 *p, u32 len, struct iomap **iomapp,
+nfsd4_block_decode_layoutupdate(struct xdr_stream *xdr, struct iomap **iomapp,
int *nr_iomapsp, u32 block_size)
{
struct iomap *iomaps;
- u32 nr_iomaps, i;
+ u32 nr_iomaps, expected, len, i;
+ __be32 nfserr;
- if (len < sizeof(u32))
- return nfserr_bad_xdr;
- len -= sizeof(u32);
- if (len % PNFS_BLOCK_EXTENT_SIZE)
+ if (xdr_stream_decode_u32(xdr, &nr_iomaps))
return nfserr_bad_xdr;
- nr_iomaps = be32_to_cpup(p++);
- if (nr_iomaps != len / PNFS_BLOCK_EXTENT_SIZE)
+ len = sizeof(__be32) + xdr_stream_remaining(xdr);
+ expected = sizeof(__be32) + nr_iomaps * PNFS_BLOCK_EXTENT_SIZE;
+ if (len != expected)
return nfserr_bad_xdr;
iomaps = kcalloc(nr_iomaps, sizeof(*iomaps), GFP_KERNEL);
@@ -155,21 +153,44 @@ nfsd4_block_decode_layoutupdate(__be32 *p, u32 len, struct iomap **iomapp,
for (i = 0; i < nr_iomaps; i++) {
struct pnfs_block_extent bex;
- p = svcxdr_decode_deviceid4(p, &bex.vol_id);
- p = xdr_decode_hyper(p, &bex.foff);
+ if (nfsd4_decode_deviceid4(xdr, &bex.vol_id)) {
+ nfserr = nfserr_bad_xdr;
+ goto fail;
+ }
+
+ if (xdr_stream_decode_u64(xdr, &bex.foff)) {
+ nfserr = nfserr_bad_xdr;
+ goto fail;
+ }
if (bex.foff & (block_size - 1)) {
+ nfserr = nfserr_inval;
+ goto fail;
+ }
+
+ if (xdr_stream_decode_u64(xdr, &bex.len)) {
+ nfserr = nfserr_bad_xdr;
goto fail;
}
- p = xdr_decode_hyper(p, &bex.len);
if (bex.len & (block_size - 1)) {
+ nfserr = nfserr_inval;
+ goto fail;
+ }
+
+ if (xdr_stream_decode_u64(xdr, &bex.soff)) {
+ nfserr = nfserr_bad_xdr;
goto fail;
}
- p = xdr_decode_hyper(p, &bex.soff);
if (bex.soff & (block_size - 1)) {
+ nfserr = nfserr_inval;
+ goto fail;
+ }
+
+ if (xdr_stream_decode_u32(xdr, &bex.es)) {
+ nfserr = nfserr_bad_xdr;
goto fail;
}
- bex.es = be32_to_cpup(p++);
if (bex.es != PNFS_BLOCK_READWRITE_DATA) {
+ nfserr = nfserr_inval;
goto fail;
}
@@ -182,13 +203,12 @@ nfsd4_block_decode_layoutupdate(__be32 *p, u32 len, struct iomap **iomapp,
return nfs_ok;
fail:
kfree(iomaps);
- return nfserr_inval;
+ return nfserr;
}
/**
* nfsd4_scsi_decode_layoutupdate - decode the scsi layout extent array
- * @p: pointer to the xdr data
- * @len: number of bytes to decode
+ * @xdr: subbuf set to the encoded array
* @iomapp: pointer to store the decoded extent array
* @nr_iomapsp: pointer to store the number of extents
* @block_size: alignment of extent offset and length
@@ -200,21 +220,22 @@ nfsd4_block_decode_layoutupdate(__be32 *p, u32 len, struct iomap **iomapp,
*
* Return values:
* %nfs_ok: Successful decoding, @iomapp and @nr_iomapsp are valid
- * %nfserr_bad_xdr: The encoded array in @p is invalid
+ * %nfserr_bad_xdr: The encoded array in @xdr is invalid
* %nfserr_inval: An unaligned extent found
* %nfserr_delay: Failed to allocate memory for @iomapp
*/
__be32
-nfsd4_scsi_decode_layoutupdate(__be32 *p, u32 len, struct iomap **iomapp,
+nfsd4_scsi_decode_layoutupdate(struct xdr_stream *xdr, struct iomap **iomapp,
int *nr_iomapsp, u32 block_size)
{
struct iomap *iomaps;
- u32 nr_iomaps, expected, i;
+ u32 nr_iomaps, expected, len, i;
+ __be32 nfserr;
- if (len < sizeof(u32))
+ if (xdr_stream_decode_u32(xdr, &nr_iomaps))
return nfserr_bad_xdr;
- nr_iomaps = be32_to_cpup(p++);
+ len = sizeof(__be32) + xdr_stream_remaining(xdr);
expected = sizeof(__be32) + nr_iomaps * PNFS_SCSI_RANGE_SIZE;
if (len != expected)
return nfserr_bad_xdr;
@@ -226,14 +247,22 @@ nfsd4_scsi_decode_layoutupdate(__be32 *p, u32 len, struct iomap **iomapp,
for (i = 0; i < nr_iomaps; i++) {
u64 val;
- p = xdr_decode_hyper(p, &val);
+ if (xdr_stream_decode_u64(xdr, &val)) {
+ nfserr = nfserr_bad_xdr;
+ goto fail;
+ }
if (val & (block_size - 1)) {
+ nfserr = nfserr_inval;
goto fail;
}
iomaps[i].offset = val;
- p = xdr_decode_hyper(p, &val);
+ if (xdr_stream_decode_u64(xdr, &val)) {
+ nfserr = nfserr_bad_xdr;
+ goto fail;
+ }
if (val & (block_size - 1)) {
+ nfserr = nfserr_inval;
goto fail;
}
iomaps[i].length = val;
@@ -244,5 +273,5 @@ nfsd4_scsi_decode_layoutupdate(__be32 *p, u32 len, struct iomap **iomapp,
return nfs_ok;
fail:
kfree(iomaps);
- return nfserr_inval;
+ return nfserr;
}
diff --git a/fs/nfsd/blocklayoutxdr.h b/fs/nfsd/blocklayoutxdr.h
index 15b3569f3d9ad..7d25ef689671f 100644
--- a/fs/nfsd/blocklayoutxdr.h
+++ b/fs/nfsd/blocklayoutxdr.h
@@ -54,9 +54,9 @@ __be32 nfsd4_block_encode_getdeviceinfo(struct xdr_stream *xdr,
const struct nfsd4_getdeviceinfo *gdp);
__be32 nfsd4_block_encode_layoutget(struct xdr_stream *xdr,
const struct nfsd4_layoutget *lgp);
-__be32 nfsd4_block_decode_layoutupdate(__be32 *p, u32 len,
+__be32 nfsd4_block_decode_layoutupdate(struct xdr_stream *xdr,
struct iomap **iomapp, int *nr_iomapsp, u32 block_size);
-__be32 nfsd4_scsi_decode_layoutupdate(__be32 *p, u32 len,
+__be32 nfsd4_scsi_decode_layoutupdate(struct xdr_stream *xdr,
struct iomap **iomapp, int *nr_iomapsp, u32 block_size);
#endif /* _NFSD_BLOCKLAYOUTXDR_H */
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 04b8856d06157..656b2e7d88407 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -2520,7 +2520,7 @@ nfsd4_layoutcommit(struct svc_rqst *rqstp,
lcp->lc_size_chg = false;
}
- nfserr = ops->proc_layoutcommit(inode, lcp);
+ nfserr = ops->proc_layoutcommit(inode, rqstp, lcp);
nfs4_put_stid(&ls->ls_stid);
out:
return nfserr;
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index cbbb61fcdd499..8b68f74a8cf08 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -592,6 +592,8 @@ static __be32
nfsd4_decode_layoutupdate4(struct nfsd4_compoundargs *argp,
struct nfsd4_layoutcommit *lcp)
{
+ u32 len;
+
if (xdr_stream_decode_u32(argp->xdr, &lcp->lc_layout_type) < 0)
return nfserr_bad_xdr;
if (lcp->lc_layout_type < LAYOUT_NFSV4_1_FILES)
@@ -599,13 +601,10 @@ nfsd4_decode_layoutupdate4(struct nfsd4_compoundargs *argp,
if (lcp->lc_layout_type >= LAYOUT_TYPE_MAX)
return nfserr_bad_xdr;
- if (xdr_stream_decode_u32(argp->xdr, &lcp->lc_up_len) < 0)
+ if (xdr_stream_decode_u32(argp->xdr, &len) < 0)
+ return nfserr_bad_xdr;
+ if (!xdr_stream_subsegment(argp->xdr, &lcp->lc_up_layout, len))
return nfserr_bad_xdr;
- if (lcp->lc_up_len > 0) {
- lcp->lc_up_layout = xdr_inline_decode(argp->xdr, lcp->lc_up_len);
- if (!lcp->lc_up_layout)
- return nfserr_bad_xdr;
- }
return nfs_ok;
}
diff --git a/fs/nfsd/pnfs.h b/fs/nfsd/pnfs.h
index 925817f669176..dfd411d1f363f 100644
--- a/fs/nfsd/pnfs.h
+++ b/fs/nfsd/pnfs.h
@@ -35,6 +35,7 @@ struct nfsd4_layout_ops {
const struct nfsd4_layoutget *lgp);
__be32 (*proc_layoutcommit)(struct inode *inode,
+ struct svc_rqst *rqstp,
struct nfsd4_layoutcommit *lcp);
void (*fence_client)(struct nfs4_layout_stateid *ls,
diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h
index e65b552bf5f5a..d4b48602b2b0c 100644
--- a/fs/nfsd/xdr4.h
+++ b/fs/nfsd/xdr4.h
@@ -664,8 +664,7 @@ struct nfsd4_layoutcommit {
u64 lc_last_wr; /* request */
struct timespec64 lc_mtime; /* request */
u32 lc_layout_type; /* request */
- u32 lc_up_len; /* layout length */
- void *lc_up_layout; /* decoded by callback */
+ struct xdr_buf lc_up_layout; /* decoded by callback */
bool lc_size_chg; /* response */
u64 lc_newsize; /* response */
};
--
2.43.0
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 2/2] NFSD: Fix last write offset handling in layoutcommit
2025-07-21 18:40 [PATCH 0/2] NFSD: Rebase dropped patches for block/scsi layout Sergey Bashirov
2025-07-21 18:40 ` [PATCH 1/2] NFSD: Implement large extent array support in pNFS Sergey Bashirov
@ 2025-07-21 18:40 ` Sergey Bashirov
2025-07-21 18:56 ` Jeff Layton
2025-07-21 19:00 ` [PATCH 0/2] NFSD: Rebase dropped patches for block/scsi layout Chuck Lever
2 siblings, 1 reply; 10+ messages in thread
From: Sergey Bashirov @ 2025-07-21 18:40 UTC (permalink / raw)
To: Chuck Lever
Cc: Jeff Layton, Christoph Hellwig, NeilBrown, Olga Kornievskaia,
Dai Ngo, Tom Talpey, linux-nfs, linux-kernel, Sergey Bashirov,
stable, Konstantin Evtushenko
The data type of loca_last_write_offset is newoffset4 and is switched
on a boolean value, no_newoffset, that indicates if a previous write
occurred or not. If no_newoffset is FALSE, an offset is not given.
This means that client does not try to update the file size. Thus,
server should not try to calculate new file size and check if it fits
into the segment range. See RFC 8881, section 12.5.4.2.
Sometimes the current incorrect logic may cause clients to hang when
trying to sync an inode. If layoutcommit fails, the client marks the
inode as dirty again.
Fixes: 9cf514ccfacb ("nfsd: implement pNFS operations")
Cc: stable@vger.kernel.org
Co-developed-by: Konstantin Evtushenko <koevtushenko@yandex.com>
Signed-off-by: Konstantin Evtushenko <koevtushenko@yandex.com>
Signed-off-by: Sergey Bashirov <sergeybashirov@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
fs/nfsd/blocklayout.c | 5 ++---
fs/nfsd/nfs4proc.c | 30 +++++++++++++++---------------
2 files changed, 17 insertions(+), 18 deletions(-)
diff --git a/fs/nfsd/blocklayout.c b/fs/nfsd/blocklayout.c
index 4c936132eb440..0822d8a119c6f 100644
--- a/fs/nfsd/blocklayout.c
+++ b/fs/nfsd/blocklayout.c
@@ -118,7 +118,6 @@ nfsd4_block_commit_blocks(struct inode *inode, struct nfsd4_layoutcommit *lcp,
struct iomap *iomaps, int nr_iomaps)
{
struct timespec64 mtime = inode_get_mtime(inode);
- loff_t new_size = lcp->lc_last_wr + 1;
struct iattr iattr = { .ia_valid = 0 };
int error;
@@ -128,9 +127,9 @@ nfsd4_block_commit_blocks(struct inode *inode, struct nfsd4_layoutcommit *lcp,
iattr.ia_valid |= ATTR_ATIME | ATTR_CTIME | ATTR_MTIME;
iattr.ia_atime = iattr.ia_ctime = iattr.ia_mtime = lcp->lc_mtime;
- if (new_size > i_size_read(inode)) {
+ if (lcp->lc_size_chg) {
iattr.ia_valid |= ATTR_SIZE;
- iattr.ia_size = new_size;
+ iattr.ia_size = lcp->lc_newsize;
}
error = inode->i_sb->s_export_op->commit_blocks(inode, iomaps,
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 656b2e7d88407..7043fc475458d 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -2475,7 +2475,6 @@ nfsd4_layoutcommit(struct svc_rqst *rqstp,
const struct nfsd4_layout_seg *seg = &lcp->lc_seg;
struct svc_fh *current_fh = &cstate->current_fh;
const struct nfsd4_layout_ops *ops;
- loff_t new_size = lcp->lc_last_wr + 1;
struct inode *inode;
struct nfs4_layout_stateid *ls;
__be32 nfserr;
@@ -2491,13 +2490,21 @@ nfsd4_layoutcommit(struct svc_rqst *rqstp,
goto out;
inode = d_inode(current_fh->fh_dentry);
- nfserr = nfserr_inval;
- if (new_size <= seg->offset)
- goto out;
- if (new_size > seg->offset + seg->length)
- goto out;
- if (!lcp->lc_newoffset && new_size > i_size_read(inode))
- goto out;
+ lcp->lc_size_chg = false;
+ if (lcp->lc_newoffset) {
+ loff_t new_size = lcp->lc_last_wr + 1;
+
+ nfserr = nfserr_inval;
+ if (new_size <= seg->offset)
+ goto out;
+ if (new_size > seg->offset + seg->length)
+ goto out;
+
+ if (new_size > i_size_read(inode)) {
+ lcp->lc_size_chg = true;
+ lcp->lc_newsize = new_size;
+ }
+ }
nfserr = nfsd4_preprocess_layout_stateid(rqstp, cstate, &lcp->lc_sid,
false, lcp->lc_layout_type,
@@ -2513,13 +2520,6 @@ nfsd4_layoutcommit(struct svc_rqst *rqstp,
/* LAYOUTCOMMIT does not require any serialization */
mutex_unlock(&ls->ls_mutex);
- if (new_size > i_size_read(inode)) {
- lcp->lc_size_chg = true;
- lcp->lc_newsize = new_size;
- } else {
- lcp->lc_size_chg = false;
- }
-
nfserr = ops->proc_layoutcommit(inode, rqstp, lcp);
nfs4_put_stid(&ls->ls_stid);
out:
--
2.43.0
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 2/2] NFSD: Fix last write offset handling in layoutcommit
2025-07-21 18:40 ` [PATCH 2/2] NFSD: Fix last write offset handling in layoutcommit Sergey Bashirov
@ 2025-07-21 18:56 ` Jeff Layton
0 siblings, 0 replies; 10+ messages in thread
From: Jeff Layton @ 2025-07-21 18:56 UTC (permalink / raw)
To: Sergey Bashirov, Chuck Lever
Cc: Christoph Hellwig, NeilBrown, Olga Kornievskaia, Dai Ngo,
Tom Talpey, linux-nfs, linux-kernel, stable,
Konstantin Evtushenko
On Mon, 2025-07-21 at 21:40 +0300, Sergey Bashirov wrote:
> The data type of loca_last_write_offset is newoffset4 and is switched
> on a boolean value, no_newoffset, that indicates if a previous write
> occurred or not. If no_newoffset is FALSE, an offset is not given.
Gah, what a confusing label in the spec. I went down the rabbit hole of
trying to understand why "no_newoffset" being TRUE means that you do
get an offset. The "no_" in this case is just a prefix for "union
newoffset", and not a negation of the flag.
> This means that client does not try to update the file size. Thus,
> server should not try to calculate new file size and check if it fits
> into the segment range. See RFC 8881, section 12.5.4.2.
>
> Sometimes the current incorrect logic may cause clients to hang when
> trying to sync an inode. If layoutcommit fails, the client marks the
> inode as dirty again.
>
> Fixes: 9cf514ccfacb ("nfsd: implement pNFS operations")
> Cc: stable@vger.kernel.org
> Co-developed-by: Konstantin Evtushenko <koevtushenko@yandex.com>
> Signed-off-by: Konstantin Evtushenko <koevtushenko@yandex.com>
> Signed-off-by: Sergey Bashirov <sergeybashirov@gmail.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> ---
> fs/nfsd/blocklayout.c | 5 ++---
> fs/nfsd/nfs4proc.c | 30 +++++++++++++++---------------
> 2 files changed, 17 insertions(+), 18 deletions(-)
>
> diff --git a/fs/nfsd/blocklayout.c b/fs/nfsd/blocklayout.c
> index 4c936132eb440..0822d8a119c6f 100644
> --- a/fs/nfsd/blocklayout.c
> +++ b/fs/nfsd/blocklayout.c
> @@ -118,7 +118,6 @@ nfsd4_block_commit_blocks(struct inode *inode, struct nfsd4_layoutcommit *lcp,
> struct iomap *iomaps, int nr_iomaps)
> {
> struct timespec64 mtime = inode_get_mtime(inode);
> - loff_t new_size = lcp->lc_last_wr + 1;
> struct iattr iattr = { .ia_valid = 0 };
> int error;
>
> @@ -128,9 +127,9 @@ nfsd4_block_commit_blocks(struct inode *inode, struct nfsd4_layoutcommit *lcp,
> iattr.ia_valid |= ATTR_ATIME | ATTR_CTIME | ATTR_MTIME;
> iattr.ia_atime = iattr.ia_ctime = iattr.ia_mtime = lcp->lc_mtime;
>
> - if (new_size > i_size_read(inode)) {
> + if (lcp->lc_size_chg) {
> iattr.ia_valid |= ATTR_SIZE;
> - iattr.ia_size = new_size;
> + iattr.ia_size = lcp->lc_newsize;
> }
>
> error = inode->i_sb->s_export_op->commit_blocks(inode, iomaps,
> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> index 656b2e7d88407..7043fc475458d 100644
> --- a/fs/nfsd/nfs4proc.c
> +++ b/fs/nfsd/nfs4proc.c
> @@ -2475,7 +2475,6 @@ nfsd4_layoutcommit(struct svc_rqst *rqstp,
> const struct nfsd4_layout_seg *seg = &lcp->lc_seg;
> struct svc_fh *current_fh = &cstate->current_fh;
> const struct nfsd4_layout_ops *ops;
> - loff_t new_size = lcp->lc_last_wr + 1;
> struct inode *inode;
> struct nfs4_layout_stateid *ls;
> __be32 nfserr;
> @@ -2491,13 +2490,21 @@ nfsd4_layoutcommit(struct svc_rqst *rqstp,
> goto out;
> inode = d_inode(current_fh->fh_dentry);
>
> - nfserr = nfserr_inval;
> - if (new_size <= seg->offset)
> - goto out;
> - if (new_size > seg->offset + seg->length)
> - goto out;
> - if (!lcp->lc_newoffset && new_size > i_size_read(inode))
> - goto out;
> + lcp->lc_size_chg = false;
> + if (lcp->lc_newoffset) {
> + loff_t new_size = lcp->lc_last_wr + 1;
> +
> + nfserr = nfserr_inval;
> + if (new_size <= seg->offset)
> + goto out;
> + if (new_size > seg->offset + seg->length)
> + goto out;
> +
> + if (new_size > i_size_read(inode)) {
> + lcp->lc_size_chg = true;
> + lcp->lc_newsize = new_size;
> + }
> + }
>
> nfserr = nfsd4_preprocess_layout_stateid(rqstp, cstate, &lcp->lc_sid,
> false, lcp->lc_layout_type,
> @@ -2513,13 +2520,6 @@ nfsd4_layoutcommit(struct svc_rqst *rqstp,
> /* LAYOUTCOMMIT does not require any serialization */
> mutex_unlock(&ls->ls_mutex);
>
> - if (new_size > i_size_read(inode)) {
> - lcp->lc_size_chg = true;
> - lcp->lc_newsize = new_size;
> - } else {
> - lcp->lc_size_chg = false;
> - }
> -
> nfserr = ops->proc_layoutcommit(inode, rqstp, lcp);
> nfs4_put_stid(&ls->ls_stid);
> out:
Code looks correct to me though.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 0/2] NFSD: Rebase dropped patches for block/scsi layout
2025-07-21 18:40 [PATCH 0/2] NFSD: Rebase dropped patches for block/scsi layout Sergey Bashirov
2025-07-21 18:40 ` [PATCH 1/2] NFSD: Implement large extent array support in pNFS Sergey Bashirov
2025-07-21 18:40 ` [PATCH 2/2] NFSD: Fix last write offset handling in layoutcommit Sergey Bashirov
@ 2025-07-21 19:00 ` Chuck Lever
2 siblings, 0 replies; 10+ messages in thread
From: Chuck Lever @ 2025-07-21 19:00 UTC (permalink / raw)
To: Sergey Bashirov
Cc: Chuck Lever, Jeff Layton, Christoph Hellwig, NeilBrown,
Olga Kornievskaia, Dai Ngo, Tom Talpey, linux-nfs, linux-kernel
From: Chuck Lever <chuck.lever@oracle.com>
On Mon, 21 Jul 2025 21:40:54 +0300, Sergey Bashirov wrote:
> This series resubmits two patches that were dropped due to conflicts
> after the deviceid4 handling rework. The first patch is rebased. Only
> the decoding of bex.vol_id, which has the deviceid4 type, is updated.
> The second patch doesn't change, it just depends on the first one.
>
>
Applied to nfsd-testing, thanks!
[1/2] NFSD: Implement large extent array support in pNFS
commit: 0c8cc94a6dc2b200617094c0016cd46f01c12490
[2/2] NFSD: Fix last write offset handling in layoutcommit
commit: f91f48cedc8ef26d55a548e4e3879070d06906f6
--
Chuck Lever
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2025-07-21 19:00 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-07-21 18:40 [PATCH 0/2] NFSD: Rebase dropped patches for block/scsi layout Sergey Bashirov
2025-07-21 18:40 ` [PATCH 1/2] NFSD: Implement large extent array support in pNFS Sergey Bashirov
2025-07-21 18:40 ` [PATCH 2/2] NFSD: Fix last write offset handling in layoutcommit Sergey Bashirov
2025-07-21 18:56 ` Jeff Layton
2025-07-21 19:00 ` [PATCH 0/2] NFSD: Rebase dropped patches for block/scsi layout Chuck Lever
-- strict thread matches above, loose matches on Subject: below --
2025-07-04 11:49 [PATCH 0/2] NFSD: Fix last write offset handling in layoutcommit Sergey Bashirov
2025-07-04 11:49 ` [PATCH 2/2] " Sergey Bashirov
2025-07-04 16:29 ` Chuck Lever
2025-07-05 6:16 ` Sergey Bashirov
2025-07-06 15:01 ` Chuck Lever
2025-07-10 7:27 ` Christoph Hellwig
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).