* -ENODATA from read syscall on 9p
@ 2025-10-08 17:54 Kent Overstreet
2025-10-11 14:35 ` Tingmao Wang
From: Kent Overstreet @ 2025-10-08 17:54 UTC (permalink / raw)
To: v9fs, netfs, linux-fsdevel
Cc: David Howells, Eric Van Hensbergen, Latchesar Ionkov,
Dominique Martinet
So I recently rebased my xfstests branch and started seeing some rather
strange test failures:
00891 +cat: /ktest-out/xfstests/generic/036.dmesg: No data available
No idea why a userspace update would expose this; it's a kernel bug, in
the main netfs/9p read path no less. Upon further investigation, cat is
indeed receiving -ENODATA from a read syscall.
No, read(2) is not allowed to return -ENODATA...
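For context, a reader like cat(1) just loops on read(2) until it returns 0 (EOF) or fails with errno set. ENODATA is not among the errors listed in the read(2) man page, and strerror(ENODATA) is the "No data available" text seen above. A minimal sketch of such a loop follows; read_until_eof() is a hypothetical helper for illustration, not code taken from coreutils.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: read fd to EOF the way cat(1) does, discarding the
 * data. Returns the total number of bytes read, or -errno on failure.
 */
static long read_until_eof(int fd)
{
	char buf[4096];
	long total = 0;

	for (;;) {
		ssize_t n = read(fd, buf, sizeof(buf));

		if (n == 0)
			return total;		/* normal EOF */
		if (n < 0) {
			int err = errno;	/* save before stdio can clobber it */

			if (err == EINTR)
				continue;	/* interrupted by a signal: retry */
			if (err == ENODATA)	/* not a documented read(2) error */
				fprintf(stderr, "read: unexpected ENODATA: %s\n",
					strerror(err));
			return -err;
		}
		total += n;			/* short reads are simply accumulated */
	}
}
```

Only a 0 return ends the loop normally, which is why a filesystem is expected to report EOF as a 0-byte read rather than as an error.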
Digging further, the error turns out to be generated in
netfs_read_subreq_terminated():
netfs_wait_for_in_progress (rreq=rreq@entry=0xffff8881032f3980, collector=0xffffffff81542240 <netfs_read_collection>) at /home/kent/linux/fs/netfs/misc.c:468
468 BUG_ON(ret == -ENODATA);
(gdb) bt
#0 netfs_wait_for_in_progress (rreq=rreq@entry=0xffff8881032f3980, collector=0xffffffff81542240 <netfs_read_collection>) at /home/kent/linux/fs/netfs/misc.c:468
#1 0xffffffff815412f5 in netfs_wait_for_read (rreq=rreq@entry=0xffff8881032f3980) at /home/kent/linux/fs/netfs/misc.c:492
#2 0xffffffff8153ca41 in netfs_unbuffered_read (rreq=0xffff8881032f3980, sync=<optimized out>) at /home/kent/linux/fs/netfs/direct_read.c:153
#3 netfs_unbuffered_read_iter_locked (iocb=iocb@entry=0xffffc90003a17e98, iter=iter@entry=0xffffc90003a17e70) at /home/kent/linux/fs/netfs/direct_read.c:234
#4 0xffffffff8153cb25 in netfs_unbuffered_read_iter (iocb=0xffffc90003a17e98, iter=0xffffc90003a17e70) at /home/kent/linux/fs/netfs/direct_read.c:272
#5 0xffffffff81498fb0 in new_sync_read (filp=0xffff88811234c180, buf=0x0, len=2147479552, ppos=0xffffc90003a17f00) at /home/kent/linux/fs/read_write.c:491
#6 vfs_read (file=file@entry=0xffff88811234c180, buf=buf@entry=0x7f0ab1767000 <error: Cannot access memory at address 0x7f0ab1767000>, count=count@entry=262144, pos=pos@entry=0xffffc90003a17f00) at /home/kent/linux/fs/read_write.c:572
#7 0xffffffff81499a2a in ksys_read (fd=<optimized out>, buf=0x7f0ab1767000 <error: Cannot access memory at address 0x7f0ab1767000>, count=262144) at /home/kent/linux/fs/read_write.c:717
#8 0xffffffff81bce8cc in do_syscall_x64 (regs=0xffffc90003a17f58, nr=<optimized out>) at /home/kent/linux/arch/x86/entry/syscall_64.c:63
#9 do_syscall_64 (regs=0xffffc90003a17f58, nr=<optimized out>) at /home/kent/linux/arch/x86/entry/syscall_64.c:94
#10 0xffffffff810000b0 in entry_SYSCALL_64 () at /home/kent/linux/arch/x86/entry/entry_64.S:121
void netfs_read_subreq_terminated(struct netfs_io_subrequest *subreq)
{
	struct netfs_io_request *rreq = subreq->rreq;

	switch (subreq->source) {
	case NETFS_READ_FROM_CACHE:
		netfs_stat(&netfs_n_rh_read_done);
		break;
	case NETFS_DOWNLOAD_FROM_SERVER:
		netfs_stat(&netfs_n_rh_download_done);
		break;
	default:
		break;
	}

	/* Deal with retry requests, short reads and errors. If we retry
	 * but don't make progress, we abandon the attempt.
	 */
	if (!subreq->error && subreq->transferred < subreq->len) {
		if (test_bit(NETFS_SREQ_HIT_EOF, &subreq->flags)) {
			trace_netfs_sreq(subreq, netfs_sreq_trace_hit_eof);
		} else if (test_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags)) {
			trace_netfs_sreq(subreq, netfs_sreq_trace_need_clear);
		} else if (test_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags)) {
			trace_netfs_sreq(subreq, netfs_sreq_trace_need_retry);
		} else if (test_bit(NETFS_SREQ_MADE_PROGRESS, &subreq->flags)) {
			__set_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags);
			trace_netfs_sreq(subreq, netfs_sreq_trace_partial_read);
		} else {
			BUG();	/* <- ??? */
			__set_bit(NETFS_SREQ_FAILED, &subreq->flags);
			subreq->error = -ENODATA;
			trace_netfs_sreq(subreq, netfs_sreq_trace_short);
		}
	}

	if (unlikely(subreq->error < 0)) {
		trace_netfs_failure(rreq, subreq, subreq->error, netfs_fail_read);
		if (subreq->source == NETFS_READ_FROM_CACHE) {
			netfs_stat(&netfs_n_rh_read_failed);
			__set_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags);
		} else {
			netfs_stat(&netfs_n_rh_download_failed);
			__set_bit(NETFS_SREQ_FAILED, &subreq->flags);
		}
		trace_netfs_rreq(rreq, netfs_rreq_trace_set_pause);
		set_bit(NETFS_RREQ_PAUSE, &rreq->flags);
	}

	trace_netfs_sreq(subreq, netfs_sreq_trace_terminated);
	netfs_subreq_clear_in_progress(subreq);
	netfs_put_subrequest(subreq, netfs_sreq_trace_put_terminated);
}
So, the underlying transport doesn't appear to be making forward
progress (IOW, this would appear to be a 9p bug), and then netfs,
instead of issuing a WARN() or doing anything else to let people know
that there's a bug and where to look for it, returns a nonstandard
error code to userspace. Fun.
Of course, this being a read, short reads are expected, which also
makes me wonder why netfs decides that this particular short read is
unexpected instead of leaving the i_size checks to the underlying
filesystem.
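A userspace model of the short-read branch of netfs_read_subreq_terminated() quoted above, with a switch for the alternative raised here (treat a zero-progress short read as EOF and leave i_size policy to the filesystem). The SREQ_* flags, struct subreq_model, classify_short_read() and the eof_on_no_progress switch are all hypothetical stand-ins; this sketches the decision logic only, not netfs itself.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the NETFS_SREQ_* flag bits. */
enum {
	SREQ_HIT_EOF       = 1u << 0,
	SREQ_CLEAR_TAIL    = 1u << 1,
	SREQ_NEED_RETRY    = 1u << 2,
	SREQ_MADE_PROGRESS = 1u << 3,
	SREQ_FAILED        = 1u << 4,
};

struct subreq_model {
	size_t len;		/* bytes requested */
	size_t transferred;	/* bytes actually returned */
	unsigned int flags;
	int error;		/* 0 or -errno */
};

/*
 * With eof_on_no_progress == false this follows the quoted branch and
 * fails the subrequest with -ENODATA; with true it reports EOF instead.
 */
static void classify_short_read(struct subreq_model *s, bool eof_on_no_progress)
{
	if (s->error || s->transferred >= s->len)
		return;				/* not a short read */

	if (s->flags & (SREQ_HIT_EOF | SREQ_CLEAR_TAIL | SREQ_NEED_RETRY))
		return;				/* already accounted for */

	if (s->flags & SREQ_MADE_PROGRESS) {
		s->flags |= SREQ_NEED_RETRY;	/* partial read: retry the rest */
	} else if (eof_on_no_progress) {
		s->flags |= SREQ_HIT_EOF;	/* alternative: short read/EOF */
	} else {
		s->flags |= SREQ_FAILED;	/* quoted behaviour */
		s->error = -ENODATA;
	}
}
```

The two policies differ only in the final branch: the quoted code fails the subrequest, while the alternative would surface a short read to userspace.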
* Re: -ENODATA from read syscall on 9p
From: Tingmao Wang @ 2025-10-11 14:35 UTC (permalink / raw)
To: Kent Overstreet, Dominique Martinet
Cc: v9fs, netfs, linux-fsdevel, David Howells, Eric Van Hensbergen,
Latchesar Ionkov
On 10/8/25 18:54, Kent Overstreet wrote:
> So I recently rebased my xfstests branch and started seeing quite the
> strange test failures:
>
> 00891 +cat: /ktest-out/xfstests/generic/036.dmesg: No data available
>
> No idea why a userspace update would expose this, it's a kernel bug - in
> the main netfs/9p read path, no less. Upon further investigation, cat is
> indeed receiving -ENODATA from a read syscall.
> [...]
Hi Kent,
I'm not a 9pfs maintainer, but I think I have encountered this in the
past, though I didn't think too much of it. Which kernel version are you
testing on? A while ago I sent a patch to fix a stale metadata issue on
uncached 9pfs, and one of the symptoms was -ENODATA from a read:
https://lore.kernel.org/all/cover.1743956147.git.m@maowtm.org/
Basically, if some other process has a 9pfs file open, and the file
shrinks on the server side, the inode's i_size is not updated when another
process tries to read it, and the result is -ENODATA (instead of reporting
a normal EOF).
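A userspace model of that scenario. It assumes (this is not taken from the 9p or netfs source) that the client first clamps the read to its cached i_size and then flags EOF only when pos plus the bytes returned reaches that cached size. With a stale, larger cached i_size, a read starting beyond the server's real end of file gets 0 bytes back with no EOF flag and no progress, which is the branch that ends in -ENODATA. model_read() and its parameters are hypothetical.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical model of one unbuffered read.
 *   cached_isize: the client's (possibly stale) idea of the file size
 *   server_size:  the file's real size on the server
 * Returns bytes read, 0 for EOF, or -ENODATA for the "no progress" case.
 */
static long model_read(size_t pos, size_t count,
		       size_t cached_isize, size_t server_size)
{
	size_t want, got;

	if (pos >= cached_isize)
		return 0;			/* assumed client-side EOF check */

	want = count < cached_isize - pos ? count : cached_isize - pos;
	got = pos >= server_size ? 0 : server_size - pos;
	if (got > want)
		got = want;

	/* Assumption: EOF is flagged only against the cached i_size. */
	if (pos + got >= cached_isize)
		return (long)got;		/* hit EOF: short read is fine */
	if (got == 0)
		return -ENODATA;		/* no progress and no EOF flag */
	return (long)got;			/* partial read (netfs would retry) */
}
```

With a fresh i_size the same read position simply returns 0 (EOF); only the mismatch between the cached and real sizes produces the error.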
Does this sound like it could be what's happening in your situation? This
patch series should land in 6.18, so if you haven't tried reproducing this
on -next, it might be worth a try.
I hope this information is helpful :)
Tingmao
* Re: -ENODATA from read syscall on 9p
From: Dominique Martinet @ 2025-10-11 19:40 UTC (permalink / raw)
To: Kent Overstreet, Tingmao Wang
Cc: v9fs, netfs, linux-fsdevel, David Howells, Eric Van Hensbergen,
Latchesar Ionkov
Tingmao Wang wrote on Sat, Oct 11, 2025 at 03:35:00PM +0100:
> Not a 9pfs maintainer here, but I think I have encountered this in the
> past but I didn't think too much of it. Which kernel version are you
> testing on? A while ago I sent a patch to fix some stale metadata
> issue on uncached 9pfs, and one of the symptom was -ENODATA from a read:
> https://lore.kernel.org/all/cover.1743956147.git.m@maowtm.org/
>
> Basically, if some other process has a 9pfs file open, and the file
> shrinks on the server side, the inode's i_size is not updated when another
> process tries to read it, and the result is -ENODATA (instead of reporting
> a normal EOF).
>
> Does this sound like it could be happening in your situation? This patch
> series should land in 6.18, so if this was not reproduced on -next it
> might be worth a try?
It got merged yesterday.
With that said, I'm also curious whether that's the reason 9p reads
stopped progressing. But even with this patch, I think there'd be a
window for files to shrink while the read is happening, so netfs needs
to return a short read anyway: if the file really is being modified
under us, it's possible to hit end of file early.
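A sketch of the behaviour argued for here, assuming (hypothetically) that a large read is split into fixed-size chunks and that the file may shrink between chunks. The sizes[] array stands in for the server-side file size observed by each chunk, and read_chunked() is a hypothetical name. Once a chunk comes back empty, the loop stops and returns what it has, so a shrinking file yields a short read (or 0 for EOF) instead of an error.

```c
#include <stddef.h>

/*
 * Hypothetical model: sizes[i] is the server-side file size seen when
 * chunk i is issued, so a shrinking file is modelled by decreasing values.
 * Returns the number of bytes read; 0 means EOF, never an error.
 */
static size_t read_chunked(size_t pos, size_t count, size_t chunk,
			   const size_t *sizes, size_t nsizes)
{
	size_t total = 0;

	for (size_t i = 0; i < nsizes && total < count; i++) {
		size_t cur = pos + total;
		size_t want = count - total < chunk ? count - total : chunk;
		size_t avail = sizes[i] > cur ? sizes[i] - cur : 0;
		size_t got = avail < want ? avail : want;

		if (got == 0)
			break;		/* hit EOF early: return a short read */
		total += got;
	}
	return total;
}
```

The point of the design is that the caller never needs to know whether the file shrank before or during the read: both cases collapse into an ordinary short read.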
OTOH, I don't think that's what's happening here, as Kent is likely
just running xfstests on its own in a directory...
You say these errors just started happening recently?
How recently are we talking?
I doubt it's been months, but the only recent changes I see in this area
would be the netfs i_size updating patches from early July. If it's more
recent than that, there's something else; I didn't see anything obvious,
so having a rough range to look at would be welcome for closer inspection.
--
Dominique Martinet | Asmadeus