* Re: strace log before the fix, with fsync fix and with fclose fix.
From: Andrii Nakryiko @ 2025-10-20 20:12 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Xing Guo, Andrii Nakryiko, Alexei Starovoitov, bpf,
open list:KERNEL SELFTEST FRAMEWORK, Jiri Olsa, sveiss,
Linux-Fsdevel
+linux-fsdevel
On Mon, Oct 20, 2025 at 9:28 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Mon, Oct 20, 2025 at 1:59 AM Xing Guo <higuoxing@gmail.com> wrote:
> >
> > Test with fsync:
>
> I doubt people will be reading this giant log.
> Please bisect it instead.
> Since it's not reproducible when /tmp is backed by tmpfs
> it's probably some change in vfs or in the file system that
> your laptop is using for /tmp.
> It changes a user visible behavior of the file system and
> needs to be investigated, since it may affect more code than
> just this selftest.
The dmesg output was certainly too much, but I filtered all of that out.
Here are the relevant pieces of the strace log.
BEFORE (FAILING)
================
openat(AT_FDCWD, "/tmp/bpf_arg_parsing_test.Pf280c", O_RDWR|O_CREAT|O_EXCL, 0600) = 4
fcntl(4, F_GETFL) = 0x8002 (flags O_RDWR|O_LARGEFILE)
fstat(4, {st_mode=S_IFREG|0600, st_size=0, ...}) = 0
write(4, "# comment\n test_with_spaces "..., 175) = 175
openat(AT_FDCWD, "/tmp/bpf_arg_parsing_test.Pf280c", O_RDONLY) = 5
fstat(5, {st_mode=S_IFREG|0600, st_size=0, ...}) = 0
read(5, "", 8192) = 0
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -- THIS IS BAD, NO CONTENTS
close(5) = 0
close(4) = 0
unlink("/tmp/bpf_arg_parsing_test.Pf280c") = 0
WITH SYNC
=========
openat(AT_FDCWD, "/tmp/bpf_arg_parsing_test.UK5nUq", O_RDWR|O_CREAT|O_EXCL, 0600) = 4
fcntl(4, F_GETFL) = 0x8002 (flags O_RDWR|O_LARGEFILE)
fstat(4, {st_mode=S_IFREG|0600, st_size=0, ...}) = 0
write(4, "# comment\n test_with_spaces "..., 175) = 175
fsync(4) = 0
openat(AT_FDCWD, "/tmp/bpf_arg_parsing_test.UK5nUq", O_RDONLY) = 5
fstat(5, {st_mode=S_IFREG|0600, st_size=175, ...}) = 0
read(5, "# comment\n test_with_spaces "..., 8192) = 175
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -- GOOD,
because fsync(4) before second openat()
read(5, "", 8192) = 0
close(5) = 0
close(4) = 0
unlink("/tmp/bpf_arg_parsing_test.UK5nUq") = 0
WITH CLOSE
==========
openat(AT_FDCWD, "/tmp/bpf_arg_parsing_test.WavYEa", O_RDWR|O_CREAT|O_EXCL, 0600) = 4
fcntl(4, F_GETFL) = 0x8002 (flags O_RDWR|O_LARGEFILE)
fstat(4, {st_mode=S_IFREG|0600, st_size=0, ...}) = 0
write(4, "# comment\n test_with_spaces "..., 175) = 175
close(4) = 0
openat(AT_FDCWD, "/tmp/bpf_arg_parsing_test.WavYEa", O_RDONLY) = 4
fstat(4, {st_mode=S_IFREG|0600, st_size=175, ...}) = 0
read(4, "# comment\n test_with_spaces "..., 8192) = 175
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -- GOOD,
because close(4) before second openat()
read(4, "", 8192) = 0
close(4) = 0
unlink("/tmp/bpf_arg_parsing_test.WavYEa") = 0
As the traces above show, the kernel does see the write(4, <175 bytes of
content>) in all three cases (so libc's fflush(fp) works as expected), but
without either fsync(4) or close(4), the kernel doesn't return those 175
bytes when we open() the same file a second time (returning FD 5 this
time). In the failing case even fstat(5) reports st_size=0.
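The syscall sequence in the three traces can be sketched as a small standalone C helper. This is only an illustration, not the selftest's code: the function name, the enum, and the /tmp path template are all hypothetical, and it uses raw write() rather than the stdio calls the test goes through.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical names: which step, if any, happens between the write and
 * the second open. Mirrors the BEFORE / WITH SYNC / WITH CLOSE traces. */
enum flush_mode { FLUSH_NONE, FLUSH_FSYNC, FLUSH_CLOSE };

/*
 * Write `payload` through one fd, then read the file back through a
 * second fd opened by path. Returns the number of bytes the second fd
 * sees, or -1 if any setup step fails.
 */
ssize_t read_back_via_second_fd(const char *payload, enum flush_mode mode)
{
	char path[] = "/tmp/two_fd_repro.XXXXXX"; /* assumed location */
	char buf[8192];
	size_t len = strlen(payload);
	ssize_t got = -1;
	int wfd, rfd;

	/* The payload must fit into a single read() of the buffer. */
	assert(len <= sizeof(buf));

	wfd = mkstemp(path);
	if (wfd < 0)
		return -1;

	if (write(wfd, payload, len) != (ssize_t)len)
		goto out;
	if (mode == FLUSH_FSYNC && fsync(wfd) != 0)
		goto out;
	if (mode == FLUSH_CLOSE) {
		close(wfd);
		wfd = -1;
	}

	/* Second, independent fd against the same file. */
	rfd = open(path, O_RDONLY);
	if (rfd >= 0) {
		got = read(rfd, buf, sizeof(buf));
		close(rfd);
	}
out:
	if (wfd >= 0)
		close(wfd);
	unlink(path);
	return got;
}
```

On a file system that behaves the way the test originally assumed, all three modes should return the payload length. The BEFORE trace corresponds to FLUSH_NONE returning 0.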
Is that reasonable behavior for the kernel? I don't know; it would be
good for FS folks to double-check and confirm. The complication here is
that we have two FDs open against the same underlying file (so my
assumption is that the kernel should share the underlying page cache
data), and the documentation I've found isn't particularly clear on the
guarantees in that case.
write()'s man page states:
> POSIX requires that a read(2) which can be proved to occur after a
> write() has returned returns the new data. Note that not all file
> systems are POSIX conforming.
(but this doesn't clarify whether the guarantee applies only within the
same *FD*)
POSIX itself says:
> Writes can be serialized with respect to other reads and writes.
> If a read() of file data can be proven (by any means) to occur after a
> write() of the data, it must reflect that write(), even if the calls
> are made by different processes. A similar requirement applies to
> multiple write operations to the same file position. This is needed to
> guarantee the propagation of data from write() calls to subsequent
> read() calls. This requirement is particularly significant for
> networked file systems, where some caching schemes violate these
> semantics.
But again, no mention of multiple FDs opened against the same underlying file.
So unclear, which is why it would be nice for FS folks to double
check. It's certainly a change in behavior, it used to work reliably
before. [0] is the source code of the test (and note that we now added
fsync(), without it the test is now broken).
[0] https://github.com/torvalds/linux/blob/master/tools/testing/selftests/bpf/prog_tests/arg_parsing.c
* Re: strace log before the fix, with fsync fix and with fclose fix.
From: Dominique Martinet @ 2025-10-21 22:14 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: Alexei Starovoitov, Xing Guo, Andrii Nakryiko, Alexei Starovoitov,
bpf, open list:KERNEL SELFTEST FRAMEWORK, Jiri Olsa, sveiss,
Linux-Fsdevel
Andrii Nakryiko wrote on Mon, Oct 20, 2025 at 01:12:16PM -0700:
> So unclear, which is why it would be nice for FS folks to double
> check. It's certainly a change in behavior, it used to work reliably
> before. [0] is the source code of the test (and note that we now added
> fsync(), without it the test is now broken).
It's a 9p bug, sorry.
Tentative fix:
https://lkml.kernel.org/r/20251022-mmap-regression-v1-1-980365ee524e@codewreck.org
Other thread with repro:
https://lkml.kernel.org/r/CAHzjS_u_SYdt5=2gYO_dxzMKXzGMt-TfdE_ueowg-Hq5tRCAiw@mail.gmail.com
I'll send the fix to Linus once someone can confirm that it works for this
use case as well (and try to improve our testing a bit... maybe just run
the bpf test suite for starters).
--
Dominique Martinet | Asmadeus