netfs.lists.linux.dev archive mirror
* -ENODATA from read syscall on 9p
@ 2025-10-08 17:54 Kent Overstreet
From: Kent Overstreet @ 2025-10-08 17:54 UTC (permalink / raw)
  To: v9fs, netfs, linux-fsdevel
  Cc: David Howells, Eric Van Hensbergen, Latchesar Ionkov,
	Dominique Martinet

So I recently rebased my xfstests branch and started seeing some rather
strange test failures:

00891     +cat: /ktest-out/xfstests/generic/036.dmesg: No data available

No idea why a userspace update would expose this; it's a kernel bug, and in
the main netfs/9p read path, no less. Upon further investigation, cat is
indeed receiving -ENODATA from a read syscall.

No, read(2) is not allowed to return -ENODATA...
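For reference, the contract a tool like cat relies on: read(2) returns a positive byte count (possibly short), 0 at end of file, or -1 with errno set. Below is a minimal sketch of a cat-style read loop, assuming plain POSIX read(2); read_all() is a hypothetical helper, not code taken from cat or the kernel. A loop like this has no reason to treat ENODATA specially, so it reports it like any other hard error, which glibc prints as "No data available".

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read up to len bytes from fd, looping over short
 * reads the way cat-style tools do. Returns the number of bytes read
 * (fewer than len means end of file was reached) or -errno on failure.
 */
static ssize_t read_all(int fd, char *buf, size_t len)
{
	size_t total = 0;

	while (total < len) {
		ssize_t n = read(fd, buf + total, len - total);

		if (n > 0) {		/* data, possibly a short read */
			total += (size_t)n;
			continue;
		}
		if (n == 0)		/* 0 is the only way read(2) reports EOF */
			break;
		if (errno == EINTR)	/* interrupted before any data: retry */
			continue;
		return -errno;		/* hard error, e.g. -ENODATA */
	}
	assert(total <= len);
	return (ssize_t)total;
}
```

Note that a short count never ends the loop on its own; only a 0 return or an error does.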

Digging further, the error turns out to be generated in
netfs_read_subreq_terminated():

netfs_wait_for_in_progress (rreq=rreq@entry=0xffff8881032f3980, collector=0xffffffff81542240 <netfs_read_collection>) at /home/kent/linux/fs/netfs/misc.c:468
468             BUG_ON(ret == -ENODATA);
(gdb) bt
#0  netfs_wait_for_in_progress (rreq=rreq@entry=0xffff8881032f3980, collector=0xffffffff81542240 <netfs_read_collection>) at /home/kent/linux/fs/netfs/misc.c:468
#1  0xffffffff815412f5 in netfs_wait_for_read (rreq=rreq@entry=0xffff8881032f3980) at /home/kent/linux/fs/netfs/misc.c:492
#2  0xffffffff8153ca41 in netfs_unbuffered_read (rreq=0xffff8881032f3980, sync=<optimized out>) at /home/kent/linux/fs/netfs/direct_read.c:153
#3  netfs_unbuffered_read_iter_locked (iocb=iocb@entry=0xffffc90003a17e98, iter=iter@entry=0xffffc90003a17e70) at /home/kent/linux/fs/netfs/direct_read.c:234
#4  0xffffffff8153cb25 in netfs_unbuffered_read_iter (iocb=0xffffc90003a17e98, iter=0xffffc90003a17e70) at /home/kent/linux/fs/netfs/direct_read.c:272
#5  0xffffffff81498fb0 in new_sync_read (filp=0xffff88811234c180, buf=0x0, len=2147479552, ppos=0xffffc90003a17f00) at /home/kent/linux/fs/read_write.c:491
#6  vfs_read (file=file@entry=0xffff88811234c180, buf=buf@entry=0x7f0ab1767000 <error: Cannot access memory at address 0x7f0ab1767000>, count=count@entry=262144, pos=pos@entry=0xffffc90003a17f00) at /home/kent/linux/fs/read_write.c:572
#7  0xffffffff81499a2a in ksys_read (fd=<optimized out>, buf=0x7f0ab1767000 <error: Cannot access memory at address 0x7f0ab1767000>, count=262144) at /home/kent/linux/fs/read_write.c:717
#8  0xffffffff81bce8cc in do_syscall_x64 (regs=0xffffc90003a17f58, nr=<optimized out>) at /home/kent/linux/arch/x86/entry/syscall_64.c:63
#9  do_syscall_64 (regs=0xffffc90003a17f58, nr=<optimized out>) at /home/kent/linux/arch/x86/entry/syscall_64.c:94
#10 0xffffffff810000b0 in entry_SYSCALL_64 () at /home/kent/linux/arch/x86/entry/entry_64.S:121

void netfs_read_subreq_terminated(struct netfs_io_subrequest *subreq)
{
	struct netfs_io_request *rreq = subreq->rreq;

	switch (subreq->source) {
	case NETFS_READ_FROM_CACHE:
		netfs_stat(&netfs_n_rh_read_done);
		break;
	case NETFS_DOWNLOAD_FROM_SERVER:
		netfs_stat(&netfs_n_rh_download_done);
		break;
	default:
		break;
	}

	/* Deal with retry requests, short reads and errors.  If we retry
	 * but don't make progress, we abandon the attempt.
	 */
	if (!subreq->error && subreq->transferred < subreq->len) {
		if (test_bit(NETFS_SREQ_HIT_EOF, &subreq->flags)) {
			trace_netfs_sreq(subreq, netfs_sreq_trace_hit_eof);
		} else if (test_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags)) {
			trace_netfs_sreq(subreq, netfs_sreq_trace_need_clear);
		} else if (test_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags)) {
			trace_netfs_sreq(subreq, netfs_sreq_trace_need_retry);
		} else if (test_bit(NETFS_SREQ_MADE_PROGRESS, &subreq->flags)) {
			__set_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags);
			trace_netfs_sreq(subreq, netfs_sreq_trace_partial_read);
		} else {
			BUG();		/* <- ??? */
			__set_bit(NETFS_SREQ_FAILED, &subreq->flags);
			subreq->error = -ENODATA;
			trace_netfs_sreq(subreq, netfs_sreq_trace_short);
		}
	}

	if (unlikely(subreq->error < 0)) {
		trace_netfs_failure(rreq, subreq, subreq->error, netfs_fail_read);
		if (subreq->source == NETFS_READ_FROM_CACHE) {
			netfs_stat(&netfs_n_rh_read_failed);
			__set_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags);
		} else {
			netfs_stat(&netfs_n_rh_download_failed);
			__set_bit(NETFS_SREQ_FAILED, &subreq->flags);
		}
		trace_netfs_rreq(rreq, netfs_rreq_trace_set_pause);
		set_bit(NETFS_RREQ_PAUSE, &rreq->flags);
	}

	trace_netfs_sreq(subreq, netfs_sreq_trace_terminated);
	netfs_subreq_clear_in_progress(subreq);
	netfs_put_subrequest(subreq, netfs_sreq_trace_put_terminated);
}

So, the underlying transport doesn't appear to be making forward
progress; IOW, this would appear to be a 9p bug. And then netfs, instead
of a WARN() or anything else to let people know that there's a bug and
where to look for it, returns a nonstandard error code to userspace. Fun.
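To make the failing branch concrete, here is a userspace model of the short-read checks in the netfs_read_subreq_terminated() excerpt above, assuming subreq->error is 0 (the only case in which those checks run). The SREQ_* constants and classify_short_read() are hypothetical stand-ins, not kernel definitions. A subrequest that transfers nothing, without the transport setting an EOF, clear-tail or retry flag, falls through to -ENODATA.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the NETFS_SREQ_* bits tested in the excerpt. */
enum {
	SREQ_HIT_EOF       = 1u << 0,
	SREQ_CLEAR_TAIL    = 1u << 1,
	SREQ_NEED_RETRY    = 1u << 2,
	SREQ_MADE_PROGRESS = 1u << 3,
};

/*
 * Model of the short-read if/else chain, assuming no error was set.
 * Returns 0 when the short read is accepted (possibly after marking it
 * for retry) or -ENODATA when none of the flags explain the missing bytes.
 */
static int classify_short_read(size_t transferred, size_t len, unsigned *flags)
{
	assert(flags != NULL);
	if (transferred >= len)
		return 0;			/* not a short read at all */
	if (*flags & (SREQ_HIT_EOF | SREQ_CLEAR_TAIL | SREQ_NEED_RETRY))
		return 0;			/* transport explained it */
	if (*flags & SREQ_MADE_PROGRESS) {
		*flags |= SREQ_NEED_RETRY;	/* partial read: try again */
		return 0;
	}
	return -ENODATA;			/* no progress, no EOF: fail */
}
```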

Of course, this being a read, short reads are expected. Which raises
another question: why does netfs decide for itself that this particular
short read is unexpected, instead of leaving the i_size checks to the
underlying filesystem?
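One reading of "the i_size checks": clamp each read against the file size the client currently believes in, so that a read at or past EOF returns 0 without asking the server, and a read straddling EOF only requests bytes that should exist. A minimal sketch, assuming 64-bit offsets; expected_read_len() is a hypothetical name, not a netfs or 9p function.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: how many bytes a read of len bytes at pos should
 * return, given the inode size the client currently believes.
 * A result of 0 means the read starts at or beyond EOF.
 */
static uint64_t expected_read_len(uint64_t pos, uint64_t len, uint64_t i_size)
{
	uint64_t avail;

	if (pos >= i_size)
		return 0;			/* read starts at or past EOF */
	avail = i_size - pos;
	assert(avail > 0);
	return len < avail ? len : avail;	/* clamp a read straddling EOF */
}
```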

* Re: -ENODATA from read syscall on 9p
@ 2025-10-11 14:35 ` Tingmao Wang
From: Tingmao Wang @ 2025-10-11 14:35 UTC (permalink / raw)
  To: Kent Overstreet, Dominique Martinet
  Cc: v9fs, netfs, linux-fsdevel, David Howells, Eric Van Hensbergen,
	Latchesar Ionkov

On 10/8/25 18:54, Kent Overstreet wrote:
> So I recently rebased my xfstests branch and started seeing some rather
> strange test failures:
> 
> 00891     +cat: /ktest-out/xfstests/generic/036.dmesg: No data available
> 
> No idea why a userspace update would expose this; it's a kernel bug, and in
> the main netfs/9p read path, no less. Upon further investigation, cat is
> indeed receiving -ENODATA from a read syscall.
> [...]

Hi Kent,

I'm not a 9pfs maintainer, but I think I have encountered this in the
past, though I didn't think too much of it.  Which kernel version are you
testing on?  A while ago I sent a patch to fix a stale metadata issue on
uncached 9pfs, and one of the symptoms was -ENODATA from a read:
https://lore.kernel.org/all/cover.1743956147.git.m@maowtm.org/

Basically, if some other process has a 9pfs file open, and the file
shrinks on the server side, the inode's i_size is not updated when another
process tries to read it, and the result is -ENODATA (instead of reporting
a normal EOF).
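A worked model of that scenario: the client clamps the request to its cached i_size, then the server returns only what the file currently holds. This is a sketch under simplifying assumptions (a single subrequest, no retry path, no caching); bytes_in_range() and model_read() are hypothetical. With a stale i_size of 8192 and a server-side size of 4096, a read at offset 4096 expects 4096 bytes, receives none, and comes back as -ENODATA rather than EOF.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Bytes that exist at pos in a file of the given size, capped at len. */
static uint64_t bytes_in_range(uint64_t pos, uint64_t len, uint64_t size)
{
	if (pos >= size)
		return 0;
	return len < size - pos ? len : size - pos;
}

/*
 * Hypothetical model of one uncached read. Returns bytes read (0 == EOF)
 * or -ENODATA when the server returns nothing for a range the client,
 * going by its cached i_size, expected to exist.
 */
static int64_t model_read(uint64_t cached_isize, uint64_t server_size,
			  uint64_t pos, uint64_t len)
{
	uint64_t want = bytes_in_range(pos, len, cached_isize);
	uint64_t got = bytes_in_range(pos, want, server_size);

	assert(got <= want);
	if (want == 0)
		return 0;		/* client already knows this is EOF */
	if (got == 0)
		return -ENODATA;	/* no progress and no EOF flag set */
	return (int64_t)got;		/* data, possibly a short read */
}
```

With an up-to-date i_size, model_read(4096, 4096, 4096, 4096) returns 0, i.e. a normal EOF.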

Does this sound like it could be happening in your situation?  This patch
series should land in 6.18, so if this was not reproduced on -next it
might be worth a try?

I hope this information is helpful :)

Tingmao

* Re: -ENODATA from read syscall on 9p
@ 2025-10-11 19:40   ` Dominique Martinet
From: Dominique Martinet @ 2025-10-11 19:40 UTC (permalink / raw)
  To: Kent Overstreet, Tingmao Wang
  Cc: v9fs, netfs, linux-fsdevel, David Howells, Eric Van Hensbergen,
	Latchesar Ionkov

Tingmao Wang wrote on Sat, Oct 11, 2025 at 03:35:00PM +0100:
> I'm not a 9pfs maintainer, but I think I have encountered this in the
> past, though I didn't think too much of it.  Which kernel version are you
> testing on?  A while ago I sent a patch to fix a stale metadata issue on
> uncached 9pfs, and one of the symptoms was -ENODATA from a read:
> https://lore.kernel.org/all/cover.1743956147.git.m@maowtm.org/
> 
> Basically, if some other process has a 9pfs file open, and the file
> shrinks on the server side, the inode's i_size is not updated when another
> process tries to read it, and the result is -ENODATA (instead of reporting
> a normal EOF).
> 
> Does this sound like it could be happening in your situation?  This patch
> series should land in 6.18, so if this was not reproduced on -next it
> might be worth a try?

It got merged yesterday.

With that said, I'm also curious whether that's the reason 9p reads
stopped progressing. Even with this patch, though, I think there'd be a
window for files to shrink while the read is in progress, so netfs needs
to return a short read in that case anyway: if the file really is being
modified under us, it's possible to hit end of file early.
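A minimal sketch of that behaviour, assuming the read path knows how many bytes were transferred and carries a single error code: report whatever arrived as a short read, and map the "no data" case to EOF instead of propagating -ENODATA. finish_read() and this mapping are hypothetical, not what the netfs code quoted earlier in the thread does.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/*
 * Hypothetical completion policy for a read that may race with the file
 * shrinking on the server.
 */
static ssize_t finish_read(size_t transferred, int error)
{
	assert(error <= 0);
	if (transferred > 0)
		return (ssize_t)transferred;	/* short read, not an error */
	if (error == -ENODATA)
		return 0;			/* file shrank under us: EOF */
	return error;				/* real failure, or 0 at EOF */
}
```

Returning the partial count when some data did arrive follows the usual read(2) convention for a transfer that fails after making progress.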

OTOH, I don't think that's what's happening here, as Kent is likely
just running xfstests on its own in a directory...
You say these errors just started happening recently?
How recently are we talking?
I doubt it's been months, but the only recent changes I see in this area
are the netfs i_size updating patches from early July. If it's more
recent than that, it's something else; I didn't see anything obvious, so
having a rough range to look at would be welcome for closer inspection.

-- 
Dominique Martinet | Asmadeus
