linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] netfs: Fix unbuffered write error handling
@ 2025-08-14 21:45 David Howells
  2025-08-15 13:57 ` Christian Brauner
  2025-08-15 16:09 ` Paulo Alcantara
  0 siblings, 2 replies; 3+ messages in thread
From: David Howells @ 2025-08-14 21:45 UTC (permalink / raw)
  To: Christian Brauner, Paulo Alcantara, Steve French
  Cc: dhowells, Xiaoli Feng, Shyam Prasad N, netfs, linux-cifs,
	linux-fsdevel, stable, linux-kernel

If all the subrequests in an unbuffered write stream fail, the subrequest
collector doesn't update the stream->transferred value and it retains its
initial LONG_MAX value.  Unfortunately, if all active streams fail, then we
take the smallest value of { LONG_MAX, LONG_MAX, ... } as the value to set
in wreq->transferred - which is then returned from ->write_iter().

LONG_MAX was chosen as the initial value so that all the streams can be
quickly assessed by taking the smallest value of all stream->transferred -
but this only works if we've set any of them.

Fix this by adding a flag to indicate whether the value in
stream->transferred is valid and checking that when we integrate the
values.  stream->transferred can then be initialised to zero.

This was found by running the generic/750 xfstest against cifs with
cache=none.  It splices data to the target file.  Once (if) it has used up
all the available scratch space, the writes start failing with ENOSPC.
This causes ->write_iter() to fail.  However, it was returning
wreq->transferred, i.e. LONG_MAX, rather than an error (because it thought
the amount transferred was non-zero) and iter_file_splice_write() would
then try to clean up that amount of pipe bufferage - leading to an oops
when it overran.  The kernel log showed:

    CIFS: VFS: Send error in write = -28

followed by:

    BUG: kernel NULL pointer dereference, address: 0000000000000008

with:

    RIP: 0010:iter_file_splice_write+0x3a4/0x520
    do_splice+0x197/0x4e0

or:

    RIP: 0010:pipe_buf_release (include/linux/pipe_fs_i.h:282)
    iter_file_splice_write (fs/splice.c:755)

Also put a warning check into splice to announce if ->write_iter() returned
that it had written more than it was asked to.

Fixes: 288ace2f57c9 ("netfs: New writeback implementation")
Reported-by: Xiaoli Feng <fengxiaoli0714@gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220445
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Paulo Alcantara <pc@manguebit.org>
cc: Steve French <sfrench@samba.org>
cc: Shyam Prasad N <sprasad@microsoft.com>
cc: netfs@lists.linux.dev
cc: linux-cifs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
cc: stable@vger.kernel.org
---
 fs/netfs/read_collect.c  |    4 +++-
 fs/netfs/write_collect.c |   10 ++++++++--
 fs/netfs/write_issue.c   |    4 ++--
 fs/splice.c              |    3 +++
 include/linux/netfs.h    |    1 +
 5 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/fs/netfs/read_collect.c b/fs/netfs/read_collect.c
index 3e804da1e1eb..a95e7aadafd0 100644
--- a/fs/netfs/read_collect.c
+++ b/fs/netfs/read_collect.c
@@ -281,8 +281,10 @@ static void netfs_collect_read_results(struct netfs_io_request *rreq)
 		} else if (test_bit(NETFS_RREQ_SHORT_TRANSFER, &rreq->flags)) {
 			notes |= MADE_PROGRESS;
 		} else {
-			if (!stream->failed)
+			if (!stream->failed) {
 				stream->transferred += transferred;
+				stream->transferred_valid = true;
+			}
 			if (front->transferred < front->len)
 				set_bit(NETFS_RREQ_SHORT_TRANSFER, &rreq->flags);
 			notes |= MADE_PROGRESS;
diff --git a/fs/netfs/write_collect.c b/fs/netfs/write_collect.c
index 0f3a36852a4d..cbf3d9194c7b 100644
--- a/fs/netfs/write_collect.c
+++ b/fs/netfs/write_collect.c
@@ -254,6 +254,7 @@ static void netfs_collect_write_results(struct netfs_io_request *wreq)
 			if (front->start + front->transferred > stream->collected_to) {
 				stream->collected_to = front->start + front->transferred;
 				stream->transferred = stream->collected_to - wreq->start;
+				stream->transferred_valid = true;
 				notes |= MADE_PROGRESS;
 			}
 			if (test_bit(NETFS_SREQ_FAILED, &front->flags)) {
@@ -356,6 +357,7 @@ bool netfs_write_collection(struct netfs_io_request *wreq)
 {
 	struct netfs_inode *ictx = netfs_inode(wreq->inode);
 	size_t transferred;
+	bool transferred_valid = false;
 	int s;
 
 	_enter("R=%x", wreq->debug_id);
@@ -376,12 +378,16 @@ bool netfs_write_collection(struct netfs_io_request *wreq)
 			continue;
 		if (!list_empty(&stream->subrequests))
 			return false;
-		if (stream->transferred < transferred)
+		if (stream->transferred_valid &&
+		    stream->transferred < transferred) {
 			transferred = stream->transferred;
+			transferred_valid = true;
+		}
 	}
 
 	/* Okay, declare that all I/O is complete. */
-	wreq->transferred = transferred;
+	if (transferred_valid)
+		wreq->transferred = transferred;
 	trace_netfs_rreq(wreq, netfs_rreq_trace_write_done);
 
 	if (wreq->io_streams[1].active &&
diff --git a/fs/netfs/write_issue.c b/fs/netfs/write_issue.c
index 50bee2c4130d..0584cba1a043 100644
--- a/fs/netfs/write_issue.c
+++ b/fs/netfs/write_issue.c
@@ -118,12 +118,12 @@ struct netfs_io_request *netfs_create_write_req(struct address_space *mapping,
 	wreq->io_streams[0].prepare_write	= ictx->ops->prepare_write;
 	wreq->io_streams[0].issue_write		= ictx->ops->issue_write;
 	wreq->io_streams[0].collected_to	= start;
-	wreq->io_streams[0].transferred		= LONG_MAX;
+	wreq->io_streams[0].transferred		= 0;
 
 	wreq->io_streams[1].stream_nr		= 1;
 	wreq->io_streams[1].source		= NETFS_WRITE_TO_CACHE;
 	wreq->io_streams[1].collected_to	= start;
-	wreq->io_streams[1].transferred		= LONG_MAX;
+	wreq->io_streams[1].transferred		= 0;
 	if (fscache_resources_valid(&wreq->cache_resources)) {
 		wreq->io_streams[1].avail	= true;
 		wreq->io_streams[1].active	= true;
diff --git a/fs/splice.c b/fs/splice.c
index 4d6df083e0c0..f5094b6d00a0 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -739,6 +739,9 @@ iter_file_splice_write(struct pipe_inode_info *pipe, struct file *out,
 		sd.pos = kiocb.ki_pos;
 		if (ret <= 0)
 			break;
+		WARN_ONCE(ret > sd.total_len - left,
+			  "Splice Exceeded! ret=%zd tot=%zu left=%zu\n",
+			  ret, sd.total_len, left);
 
 		sd.num_spliced += ret;
 		sd.total_len -= ret;
diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index 185bd8196503..98c96d649bf9 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -150,6 +150,7 @@ struct netfs_io_stream {
 	bool			active;		/* T if stream is active */
 	bool			need_retry;	/* T if this stream needs retrying */
 	bool			failed;		/* T if this stream failed */
+	bool			transferred_valid; /* T is ->transferred is valid */
 };
 
 /*


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH] netfs: Fix unbuffered write error handling
  2025-08-14 21:45 [PATCH] netfs: Fix unbuffered write error handling David Howells
@ 2025-08-15 13:57 ` Christian Brauner
  2025-08-15 16:09 ` Paulo Alcantara
  1 sibling, 0 replies; 3+ messages in thread
From: Christian Brauner @ 2025-08-15 13:57 UTC (permalink / raw)
  To: David Howells
  Cc: Christian Brauner, Xiaoli Feng, Shyam Prasad N, netfs, linux-cifs,
	linux-fsdevel, stable, linux-kernel, Paulo Alcantara,
	Steve French

On Thu, 14 Aug 2025 22:45:50 +0100, David Howells wrote:
> If all the subrequests in an unbuffered write stream fail, the subrequest
> collector doesn't update the stream->transferred value and it retains its
> initial LONG_MAX value.  Unfortunately, if all active streams fail, then we
> take the smallest value of { LONG_MAX, LONG_MAX, ... } as the value to set
> in wreq->transferred - which is then returned from ->write_iter().
> 
> LONG_MAX was chosen as the initial value so that all the streams can be
> quickly assessed by taking the smallest value of all stream->transferred -
> but this only works if we've set any of them.
> 
> [...]

Applied to the vfs.fixes branch of the vfs/vfs.git tree.
Patches in the vfs.fixes branch should appear in linux-next soon.

Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.

It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.

Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs.fixes

[1/1] netfs: Fix unbuffered write error handling
      https://git.kernel.org/vfs/vfs/c/a3de58b12ce0

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] netfs: Fix unbuffered write error handling
  2025-08-14 21:45 [PATCH] netfs: Fix unbuffered write error handling David Howells
  2025-08-15 13:57 ` Christian Brauner
@ 2025-08-15 16:09 ` Paulo Alcantara
  1 sibling, 0 replies; 3+ messages in thread
From: Paulo Alcantara @ 2025-08-15 16:09 UTC (permalink / raw)
  To: David Howells, Christian Brauner, Steve French
  Cc: dhowells, Xiaoli Feng, Shyam Prasad N, netfs, linux-cifs,
	linux-fsdevel, stable, linux-kernel

David Howells <dhowells@redhat.com> writes:

> If all the subrequests in an unbuffered write stream fail, the subrequest
> collector doesn't update the stream->transferred value and it retains its
> initial LONG_MAX value.  Unfortunately, if all active streams fail, then we
> take the smallest value of { LONG_MAX, LONG_MAX, ... } as the value to set
> in wreq->transferred - which is then returned from ->write_iter().
>
> LONG_MAX was chosen as the initial value so that all the streams can be
> quickly assessed by taking the smallest value of all stream->transferred -
> but this only works if we've set any of them.
>
> Fix this by adding a flag to indicate whether the value in
> stream->transferred is valid and checking that when we integrate the
> values.  stream->transferred can then be initialised to zero.
>
> This was found by running the generic/750 xfstest against cifs with
> cache=none.  It splices data to the target file.  Once (if) it has used up
> all the available scratch space, the writes start failing with ENOSPC.
> This causes ->write_iter() to fail.  However, it was returning
> wreq->transferred, i.e. LONG_MAX, rather than an error (because it thought
> the amount transferred was non-zero) and iter_file_splice_write() would
> then try to clean up that amount of pipe bufferage - leading to an oops
> when it overran.  The kernel log showed:
>
>     CIFS: VFS: Send error in write = -28
>
> followed by:
>
>     BUG: kernel NULL pointer dereference, address: 0000000000000008
>
> with:
>
>     RIP: 0010:iter_file_splice_write+0x3a4/0x520
>     do_splice+0x197/0x4e0
>
> or:
>
>     RIP: 0010:pipe_buf_release (include/linux/pipe_fs_i.h:282)
>     iter_file_splice_write (fs/splice.c:755)
>
> Also put a warning check into splice to announce if ->write_iter() returned
> that it had written more than it was asked to.
>
> Fixes: 288ace2f57c9 ("netfs: New writeback implementation")
> Reported-by: Xiaoli Feng <fengxiaoli0714@gmail.com>
> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220445
> Signed-off-by: David Howells <dhowells@redhat.com>
> cc: Paulo Alcantara <pc@manguebit.org>
> cc: Steve French <sfrench@samba.org>
> cc: Shyam Prasad N <sprasad@microsoft.com>
> cc: netfs@lists.linux.dev
> cc: linux-cifs@vger.kernel.org
> cc: linux-fsdevel@vger.kernel.org
> cc: stable@vger.kernel.org
> ---
>  fs/netfs/read_collect.c  |    4 +++-
>  fs/netfs/write_collect.c |   10 ++++++++--
>  fs/netfs/write_issue.c   |    4 ++--
>  fs/splice.c              |    3 +++
>  include/linux/netfs.h    |    1 +
>  5 files changed, 17 insertions(+), 5 deletions(-)

Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2025-08-15 16:09 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-14 21:45 [PATCH] netfs: Fix unbuffered write error handling David Howells
2025-08-15 13:57 ` Christian Brauner
2025-08-15 16:09 ` Paulo Alcantara

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).