* [PATCH 0/1] fs/9p: Do not open remote file with APPEND mode when writeback cache is used
@ 2025-11-02 20:24 Tingmao Wang
2025-11-02 20:24 ` [PATCH 1/1] " Tingmao Wang
0 siblings, 1 reply; 8+ messages in thread
From: Tingmao Wang @ 2025-11-02 20:24 UTC (permalink / raw)
To: Dominique Martinet, Christian Schoenebeck, Eric Van Hensbergen,
Latchesar Ionkov
Cc: Tingmao Wang, v9fs
Hi,
Earlier I noticed that when using cache=mmap mode for my QEMU VM's rootfs,
fish seems to sometimes report corrupted history file. It turns out that
for some reason, when fish writes the file, a bunch of NUL bytes are being
added to the end.
Some further investigation lead me to conclude that the problem is with
how O_APPEND is handled in v9fs when a caching mode with writeback is used
(e.g. cache=loose or cache=mmap). Basically, the file is opened with
O_APPEND on the server side as well, in which case writes works fine in
uncached mode, but when the page cache is involved, Linux writes whole
pages (with position for that write pointing to the start of the page),
causing previously written content to be written again (since when the
file is opened with O_APPEND by QEMU, all writes goes to the end
regardless of offset).
Pasted at the end is a program to reproduce the problem. It will:
1. Open a file with O_APPEND
2. Write "Hello\n", sync, then write "Goodbye\n"
3. At this point the file is corrupted. It will use drop_caches to make
the problem immediately observable in the guest when it tries to read
the data back, even though this is not required - the file on the host
contains duplicate content as soon as a problematic write is issued.
Here is what happens:
root@6-18-0-rc3-next-20251031-dev-dirty ~# linux/reproducer-write-then-read /tmp/9p/a
Try reading /tmp/9p/a from the host now.
Press Enter to continue...
We can inspect the content from the host at this point:
> hexdump -C /tmp/linux-test/a
00000000 48 65 6c 6c 6f 0a 48 65 6c 6c 6f 0a 47 6f 6f 64 |Hello.Hello.Good|
00000010 62 79 65 0a |bye.|
00000014
The program will also detect this.
Here is the same setup but with some debug logs (I added logging to dump
the content being sent to the host)
openat(AT_FDCWD, "/tmp/9p/a", O_WRONLY|O_CREAT|O_APPEND, 0644
[ 10.207738][ T197] 9pnet: -- v9fs_vfs_lookup (197): dir: ffff888103228000 dentry: (a) ffff88810042ebc0 flags: 0
...
) = 3
write(3, "Hello\n", 6
[ 10.211944][ T197] 9pnet: -- v9fs_file_write_iter (197): fid 2
[ 10.212057][ T197] 9pnet: -- v9fs_file_write_iter (197): (cached)
) = 6
sync(
[ 10.212550][ T61] 9pnet: -- v9fs_fid_find_inode (61): inode: ffff88810a128000
[ 10.212794][ T61] 9pnet: (00000061) >>> TWRITE fid 2 offset 0 count 6 (/6)
[ 10.212932][ T61] content to be written: 00000000: 48 65 6c 6c 6f 0a Hello.
[ 10.213224][ T61] 9pnet: (00000061) >>> size=29 type: 118 tag: 0
[ 10.213447][ T61] 9pnet: (00000061) <<< size=11 type: 119 tag: 0
[ 10.213565][ T61] 9pnet: (00000061) <<< RWRITE count 6
[ 10.213751][ T61] 9pnet: -- v9fs_write_inode_dotl (61): v9fs_write_inode_dotl: inode ffff88810a128000
) = 0
write(3, "Goodbye\n", 8
[ 10.270821][ T197] 9pnet: -- v9fs_file_write_iter (197): fid 2
[ 10.270920][ T197] 9pnet: -- v9fs_file_write_iter (197): (cached)
) = 8
fsync(3
[ 10.271346][ T197] 9pnet: -- v9fs_fid_find_inode (197): inode: ffff88810a128000
[ 10.271501][ T197] 9pnet: (00000197) >>> TWRITE fid 2 offset 0 count 14 (/14)
[ 10.271610][ T197] content to be written: 00000000: 48 65 6c 6c 6f 0a 47 6f 6f 64 62 79 65 0a Hello.Goodbye.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This causes "Hello\n" to be written again due to the file being opened
in append mode on the host.
[ 10.271772][ T197] 9pnet: (00000197) >>> size=37 type: 118 tag: 0
[ 10.271933][ T197] 9pnet: (00000197) <<< size=11 type: 119 tag: 0
[ 10.272030][ T197] 9pnet: (00000197) <<< RWRITE count 14
[ 10.272205][ T197] 9pnet: -- v9fs_file_fsync_dotl (197): filp ffff88810952b800 datasync 0
[ 10.272325][ T197] 9pnet: (00000197) >>> TFSYNC fid 2 datasync:0
[ 10.272494][ T197] 9pnet: (00000197) >>> size=15 type: 50 tag: 0
[ 10.272640][ T197] 9pnet: (00000197) <<< size=7 type: 51 tag: 0
[ 10.272733][ T197] 9pnet: (00000197) <<< RFSYNC fid 2
) = 0
close(3
[ 10.273010][ T197] 9pnet: -- v9fs_dir_release (197): inode: ffff88810a128000 filp: ffff88810952b800 fid: 2
...
My understanding is that when we get to v9fs' write_iter, iocb->ki_pos
will, for an O_APPEND file, always point at the end (c.f.
generic_write_checks_count), so technically, except to mitigate guest vs
host write race conditions, we never needed to open the file as O_APPEND
on the server side in the first place, as we always send the correct
offset.
This can also lead to unexpected file lengthing if an fstat is issued
after a write, since we will (first flush the dirty pages, then) refresh
the i_size from the server. This is ultimately the cause of the NUL
bytes. This case can be tested via the reproducer by setting
DO_FSTAT_AFTER_WRITE=1.
I haven't tested what happens in cached mode if you have two fds open to
the same file, one with O_APPEND and one without - not sure yet whether
the folio writeback will use the "correct" fid... but I somewhat suspect
the behaviour even for the non-append fd will not be correct if the
O_APPEND one is used?
The patch that follows is an attempt at fixing this. Technically opening
the file with O_APPEND on the server side should be fine for uncached
mode, so I've preserved that, but I'm not 100% sure if there are any
problematic edge cases.
I did test, by having two cats pointing to the same file, one with ">" and
one with ">>", that the "two fd" situation is correctly handled when this
patch is applied.
Testing was done mainly on cache=none, cache=loose and cache=mmap modes,
but I also ran this reproducer and some previous ones over all caching bit
combinations from 0 to 15. I also confirmed that the problem with
fish_history no longer reproduces.
reproducer-write-then-read.c:
#include <unistd.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
int main(int argc, const char **argv)
{
int fd1, fd2;
char buf[1024];
ssize_t size;
struct stat statbuf;
size_t i;
int ret;
bool corrupted = false;
size_t expected_total_size = 0;
bool do_fstat_after_write = false;
const char *expected_final_content = "Hello\nGoodbye\n";
if (argc != 2) {
fprintf(stderr, "Usage: %s <file>\n", argv[0]);
return 1;
}
if (getenv("DO_FSTAT_AFTER_WRITE") != NULL) {
do_fstat_after_write = true;
}
unlink(argv[1]);
fd1 = open(argv[1], O_WRONLY | O_APPEND | O_CREAT, 0644);
if (fd1 < 0) {
perror("open fd1");
return 1;
}
size = sprintf(buf, "Hello\n");
errno = 0;
if (write(fd1, buf, size) != size) {
perror("write");
close(fd1);
return 1;
}
expected_total_size += size;
if (do_fstat_after_write) {
ret = fstat(fd1, &statbuf);
if (ret < 0) {
perror("fstat");
close(fd1);
return 1;
}
if (statbuf.st_size != expected_total_size) {
fprintf(stderr, "File size returned by fstat mismatch after writes (1)\n");
corrupted = true;
}
}
sync();
size = sprintf(buf, "Goodbye\n");
errno = 0;
if (write(fd1, buf, size) != size) {
perror("write");
close(fd1);
return 1;
}
expected_total_size += size;
if (do_fstat_after_write) {
ret = fstat(fd1, &statbuf);
if (ret < 0) {
perror("fstat");
close(fd1);
return 1;
}
if (statbuf.st_size != expected_total_size) {
fprintf(stderr, "File size returned by fstat mismatch after writes (2)\n");
corrupted = true;
}
size = sprintf(buf, "Final goodbye\n");
errno = 0;
if (write(fd1, buf, size) != size) {
perror("write");
close(fd1);
return 1;
}
expected_total_size += size;
expected_final_content = "Hello\nGoodbye\nFinal goodbye\n";
}
fsync(fd1);
close(fd1);
sync();
printf("Try reading %s from the host now.\nPress Enter to continue...\n", argv[1]);
getchar();
fd1 = open("/proc/sys/vm/drop_caches", O_WRONLY);
if (fd1 < 0) {
perror("open drop_caches");
return 1;
}
errno = 0;
if (write(fd1, "1\n", 2) != 2) {
close(fd1);
perror("write");
return 1;
}
close(fd1);
fd2 = open(argv[1], O_RDONLY);
if (fd2 < 0) {
perror("open fd2");
return 1;
}
if ((size = read(fd2, buf, sizeof(buf))) < 0) {
close(fd2);
perror("read");
return 1;
}
if (size != expected_total_size) {
fprintf(stderr, "File size mismatch after reopen\n");
corrupted = true;
}
close(fd2);
close(fd1);
fprintf(stdout, "File content:\n");
for (i = 0; i < size; i++) {
if (buf[i] == '\0') {
corrupted = true;
printf("\\0");
} else {
printf("%c", buf[i]);
}
}
if (memcmp(buf, expected_final_content, expected_total_size) != 0) {
fprintf(stdout, "\nFile content mismatch\n");
corrupted = true;
}
if (corrupted) {
fprintf(stdout, "\nFile corrupted\n");
return 1;
}
return 0;
}
Tingmao Wang (1):
fs/9p: Do not open remote file with APPEND mode when writeback cache
is used
fs/9p/vfs_file.c | 13 +++++++++++--
fs/9p/vfs_inode.c | 7 ++++++-
fs/9p/vfs_inode_dotl.c | 6 ++++++
3 files changed, 23 insertions(+), 3 deletions(-)
base-commit: 98bd8b16ae57e8f25c95d496fcde3dfdd8223d41
--
2.51.2
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 1/1] fs/9p: Do not open remote file with APPEND mode when writeback cache is used
2025-11-02 20:24 [PATCH 0/1] fs/9p: Do not open remote file with APPEND mode when writeback cache is used Tingmao Wang
@ 2025-11-02 20:24 ` Tingmao Wang
2025-11-02 23:07 ` Dominique Martinet
0 siblings, 1 reply; 8+ messages in thread
From: Tingmao Wang @ 2025-11-02 20:24 UTC (permalink / raw)
To: Dominique Martinet, Christian Schoenebeck, Eric Van Hensbergen,
Latchesar Ionkov
Cc: Tingmao Wang, v9fs
When page cache is used, writebacks are done on a page granularity, and it
is expected that the underlying filesystem (such as v9fs) should respect
the write position. However, currently v9fs will passthrough O_APPEND to
the server even on cached mode. This causes data corruption if a sync or
fstat gets between two writes to the same file.
This patch removes the APPEND flag from the open request we send to the
server when writeback caching is involved. I believe keeping server-side
APPEND is probably fine for uncached mode (even if two fds are opened, one
without O_APPEND and one with it, this should still be fine since they
would use separate fid for the writes).
Signed-off-by: Tingmao Wang <m@maowtm.org>
---
I haven't done a bisect to figure out if this regression was introduced by
a change or was this behaviour always present - 6.14 has the same problem
and 6.13 did not compile for me due to some problem with bool / true /
false being a keyword in c23, and it could not compile with c17 either.
fs/9p/vfs_file.c | 13 +++++++++++--
fs/9p/vfs_inode.c | 7 ++++++-
fs/9p/vfs_inode_dotl.c | 6 ++++++
3 files changed, 23 insertions(+), 3 deletions(-)
diff --git a/fs/9p/vfs_file.c b/fs/9p/vfs_file.c
index 612a230bc012..ff95dfb30583 100644
--- a/fs/9p/vfs_file.c
+++ b/fs/9p/vfs_file.c
@@ -43,14 +43,18 @@ int v9fs_file_open(struct inode *inode, struct file *file)
struct v9fs_session_info *v9ses;
struct p9_fid *fid;
int omode;
+ int o_append;
p9_debug(P9_DEBUG_VFS, "inode: %p file: %p\n", inode, file);
v9ses = v9fs_inode2v9ses(inode);
- if (v9fs_proto_dotl(v9ses))
+ if (v9fs_proto_dotl(v9ses)) {
omode = v9fs_open_to_dotl_flags(file->f_flags);
- else
+ o_append = P9_DOTL_APPEND;
+ } else {
omode = v9fs_uflags2omode(file->f_flags,
v9fs_proto_dotu(v9ses));
+ o_append = P9_OAPPEND;
+ }
fid = file->private_data;
if (!fid) {
fid = v9fs_fid_clone(file_dentry(file));
@@ -61,6 +65,11 @@ int v9fs_file_open(struct inode *inode, struct file *file)
int writeback_omode = (omode & ~P9_OWRITE) | P9_ORDWR;
p9_debug(P9_DEBUG_CACHE, "write-only file with writeback enabled, try opening O_RDWR\n");
+ if (omode & o_append) {
+ writeback_omode &= ~o_append;
+ p9_debug(P9_DEBUG_CACHE, "removing append from open mode in writeback cache mode\n");
+ }
+
err = p9_client_open(fid, writeback_omode);
if (err < 0) {
p9_debug(P9_DEBUG_CACHE, "could not open O_RDWR, disabling caches\n");
diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
index 8666c9c62258..b68d76eb54a6 100644
--- a/fs/9p/vfs_inode.c
+++ b/fs/9p/vfs_inode.c
@@ -789,6 +789,12 @@ v9fs_vfs_atomic_open(struct inode *dir, struct dentry *dentry,
p9_omode = (p9_omode & ~P9_OWRITE) | P9_ORDWR;
p9_debug(P9_DEBUG_CACHE,
"write-only file with writeback enabled, creating w/ O_RDWR\n");
+ if (p9_omode & P9_OAPPEND) {
+ p9_omode &= ~P9_OAPPEND;
+ p9_debug(
+ P9_DEBUG_CACHE,
+ "removing append from open mode in writeback cache mode\n");
+ }
}
fid = v9fs_create(v9ses, dir, dentry, NULL, perm, p9_omode);
if (IS_ERR(fid))
@@ -1393,4 +1399,3 @@ static const struct inode_operations v9fs_symlink_inode_operations = {
.getattr = v9fs_vfs_getattr,
.setattr = v9fs_vfs_setattr,
};
-
diff --git a/fs/9p/vfs_inode_dotl.c b/fs/9p/vfs_inode_dotl.c
index 1661a25f2772..43580905b092 100644
--- a/fs/9p/vfs_inode_dotl.c
+++ b/fs/9p/vfs_inode_dotl.c
@@ -285,6 +285,12 @@ v9fs_vfs_atomic_open_dotl(struct inode *dir, struct dentry *dentry,
p9_omode = (p9_omode & ~P9_OWRITE) | P9_ORDWR;
p9_debug(P9_DEBUG_CACHE,
"write-only file with writeback enabled, creating w/ O_RDWR\n");
+ if (p9_omode & P9_DOTL_APPEND) {
+ p9_omode &= ~P9_DOTL_APPEND;
+ p9_debug(
+ P9_DEBUG_CACHE,
+ "removing append from open mode in writeback cache mode\n");
+ }
}
err = p9_client_create_dotl(ofid, name, p9_omode, mode, gid, &qid);
if (err < 0) {
--
2.51.2
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH 1/1] fs/9p: Do not open remote file with APPEND mode when writeback cache is used
2025-11-02 20:24 ` [PATCH 1/1] " Tingmao Wang
@ 2025-11-02 23:07 ` Dominique Martinet
2025-11-02 23:56 ` [PATCH v2] fs/9p: Don't " Tingmao Wang
2025-11-02 23:58 ` [PATCH 1/1] fs/9p: Do not " Tingmao Wang
0 siblings, 2 replies; 8+ messages in thread
From: Dominique Martinet @ 2025-11-02 23:07 UTC (permalink / raw)
To: Tingmao Wang
Cc: Christian Schoenebeck, Eric Van Hensbergen, Latchesar Ionkov,
v9fs
Tingmao Wang wrote on Sun, Nov 02, 2025 at 08:24:39PM +0000:
> When page cache is used, writebacks are done on a page granularity, and it
> is expected that the underlying filesystem (such as v9fs) should respect
> the write position. However, currently v9fs will passthrough O_APPEND to
> the server even on cached mode. This causes data corruption if a sync or
> fstat gets between two writes to the same file.
Ugh.. I'd never expect O_APPEND to have any effect on the server tbh,
9p writes are "pwrite"-like, the offset is always specified -- there is
no plain write(fid, data) call -- so O_APPEND should have nothing to do
with the remote :/
(well, man pwrite(2) does say this... So that explains the behavior if
qemu respects the flag:
BUGS
POSIX requires that opening a file with the O_APPEND flag should have
no effect on the location at which pwrite() writes data. However, on
Linux, if a file is opened with O_APPEND, pwrite() appends data to the
end of the file, regardless of the value of offset.
)
I guess I can see this being useful if multiple clients are involved in
cacheless mode, but I agree any kind of data cache should strip this
flag.
(I think this didn't cause problem in other cache modes because writes
through the page cache happen on a writeback fid that's opened with
fixed (no O_APPEND) flags?...)
> diff --git a/fs/9p/vfs_file.c b/fs/9p/vfs_file.c
> index 612a230bc012..ff95dfb30583 100644
> --- a/fs/9p/vfs_file.c
> +++ b/fs/9p/vfs_file.c
> @@ -43,14 +43,18 @@ int v9fs_file_open(struct inode *inode, struct file *file)
> struct v9fs_session_info *v9ses;
> struct p9_fid *fid;
> int omode;
> + int o_append;
>
> p9_debug(P9_DEBUG_VFS, "inode: %p file: %p\n", inode, file);
> v9ses = v9fs_inode2v9ses(inode);
> - if (v9fs_proto_dotl(v9ses))
> + if (v9fs_proto_dotl(v9ses)) {
> omode = v9fs_open_to_dotl_flags(file->f_flags);
> - else
> + o_append = P9_DOTL_APPEND;
> + } else {
> omode = v9fs_uflags2omode(file->f_flags,
> v9fs_proto_dotu(v9ses));
> + o_append = P9_OAPPEND;
> + }
> fid = file->private_data;
> if (!fid) {
> fid = v9fs_fid_clone(file_dentry(file));
> @@ -61,6 +65,11 @@ int v9fs_file_open(struct inode *inode, struct file *file)
> int writeback_omode = (omode & ~P9_OWRITE) | P9_ORDWR;
>
> p9_debug(P9_DEBUG_CACHE, "write-only file with writeback enabled, try opening O_RDWR\n");
> + if (omode & o_append) {
> + writeback_omode &= ~o_append;
> + p9_debug(P9_DEBUG_CACHE, "removing append from open mode in writeback cache mode\n");
> + }
I wouldn't bother with the debug message, just clear it like P9_OWRITE:
int writebeack_omode = (omode & ~(P9_OWRITE|o_append)) | P9_ORDWR;
(same for other hunks)
--
Dominique Martinet | Asmadeus
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH v2] fs/9p: Don't open remote file with APPEND mode when writeback cache is used
2025-11-02 23:07 ` Dominique Martinet
@ 2025-11-02 23:56 ` Tingmao Wang
2025-11-03 7:34 ` Dominique Martinet
2025-11-10 14:22 ` Christian Schoenebeck
2025-11-02 23:58 ` [PATCH 1/1] fs/9p: Do not " Tingmao Wang
1 sibling, 2 replies; 8+ messages in thread
From: Tingmao Wang @ 2025-11-02 23:56 UTC (permalink / raw)
To: Dominique Martinet
Cc: Tingmao Wang, Christian Schoenebeck, Eric Van Hensbergen,
Latchesar Ionkov, v9fs
When page cache is used, writebacks are done on a page granularity, and it
is expected that the underlying filesystem (such as v9fs) should respect
the write position. However, currently v9fs will passthrough O_APPEND to
the server even on cached mode. This causes data corruption if a sync or
fstat gets between two writes to the same file.
This patch removes the APPEND flag from the open request we send to the
server when writeback caching is involved. I believe keeping server-side
APPEND is probably fine for uncached mode (even if two fds are opened, one
without O_APPEND and one with it, this should still be fine since they
would use separate fid for the writes).
Signed-off-by: Tingmao Wang <m@maowtm.org>
---
I haven't done a bisect to figure out if this regression was introduced by
a change or was this behaviour always present - 6.14 has the same problem
and 6.13 did not compile for me due to some problem with bool / true /
false being a keyword in c23, and it could not compile with c17 either.
fs/9p/vfs_file.c | 11 ++++++++---
fs/9p/vfs_inode.c | 3 +--
fs/9p/vfs_inode_dotl.c | 2 +-
3 files changed, 10 insertions(+), 6 deletions(-)
diff --git a/fs/9p/vfs_file.c b/fs/9p/vfs_file.c
index 612a230bc012..6f3880208587 100644
--- a/fs/9p/vfs_file.c
+++ b/fs/9p/vfs_file.c
@@ -43,14 +43,18 @@ int v9fs_file_open(struct inode *inode, struct file *file)
struct v9fs_session_info *v9ses;
struct p9_fid *fid;
int omode;
+ int o_append;
p9_debug(P9_DEBUG_VFS, "inode: %p file: %p\n", inode, file);
v9ses = v9fs_inode2v9ses(inode);
- if (v9fs_proto_dotl(v9ses))
+ if (v9fs_proto_dotl(v9ses)) {
omode = v9fs_open_to_dotl_flags(file->f_flags);
- else
+ o_append = P9_DOTL_APPEND;
+ } else {
omode = v9fs_uflags2omode(file->f_flags,
v9fs_proto_dotu(v9ses));
+ o_append = P9_OAPPEND;
+ }
fid = file->private_data;
if (!fid) {
fid = v9fs_fid_clone(file_dentry(file));
@@ -58,9 +62,10 @@ int v9fs_file_open(struct inode *inode, struct file *file)
return PTR_ERR(fid);
if ((v9ses->cache & CACHE_WRITEBACK) && (omode & P9_OWRITE)) {
- int writeback_omode = (omode & ~P9_OWRITE) | P9_ORDWR;
+ int writeback_omode = (omode & ~(P9_OWRITE | o_append)) | P9_ORDWR;
p9_debug(P9_DEBUG_CACHE, "write-only file with writeback enabled, try opening O_RDWR\n");
+
err = p9_client_open(fid, writeback_omode);
if (err < 0) {
p9_debug(P9_DEBUG_CACHE, "could not open O_RDWR, disabling caches\n");
diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
index 8666c9c62258..97abe65bf7c1 100644
--- a/fs/9p/vfs_inode.c
+++ b/fs/9p/vfs_inode.c
@@ -786,7 +786,7 @@ v9fs_vfs_atomic_open(struct inode *dir, struct dentry *dentry,
p9_omode = v9fs_uflags2omode(flags, v9fs_proto_dotu(v9ses));
if ((v9ses->cache & CACHE_WRITEBACK) && (p9_omode & P9_OWRITE)) {
- p9_omode = (p9_omode & ~P9_OWRITE) | P9_ORDWR;
+ p9_omode = (p9_omode & ~(P9_OWRITE | P9_OAPPEND)) | P9_ORDWR;
p9_debug(P9_DEBUG_CACHE,
"write-only file with writeback enabled, creating w/ O_RDWR\n");
}
@@ -1393,4 +1393,3 @@ static const struct inode_operations v9fs_symlink_inode_operations = {
.getattr = v9fs_vfs_getattr,
.setattr = v9fs_vfs_setattr,
};
-
diff --git a/fs/9p/vfs_inode_dotl.c b/fs/9p/vfs_inode_dotl.c
index 1661a25f2772..643e759eacb2 100644
--- a/fs/9p/vfs_inode_dotl.c
+++ b/fs/9p/vfs_inode_dotl.c
@@ -282,7 +282,7 @@ v9fs_vfs_atomic_open_dotl(struct inode *dir, struct dentry *dentry,
}
if ((v9ses->cache & CACHE_WRITEBACK) && (p9_omode & P9_OWRITE)) {
- p9_omode = (p9_omode & ~P9_OWRITE) | P9_ORDWR;
+ p9_omode = (p9_omode & ~(P9_OWRITE | P9_DOTL_APPEND)) | P9_ORDWR;
p9_debug(P9_DEBUG_CACHE,
"write-only file with writeback enabled, creating w/ O_RDWR\n");
}
--
2.51.2
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH 1/1] fs/9p: Do not open remote file with APPEND mode when writeback cache is used
2025-11-02 23:07 ` Dominique Martinet
2025-11-02 23:56 ` [PATCH v2] fs/9p: Don't " Tingmao Wang
@ 2025-11-02 23:58 ` Tingmao Wang
1 sibling, 0 replies; 8+ messages in thread
From: Tingmao Wang @ 2025-11-02 23:58 UTC (permalink / raw)
To: Dominique Martinet
Cc: Christian Schoenebeck, Eric Van Hensbergen, Latchesar Ionkov,
v9fs
On 11/2/25 23:07, Dominique Martinet wrote:
> Tingmao Wang wrote on Sun, Nov 02, 2025 at 08:24:39PM +0000:
>> When page cache is used, writebacks are done on a page granularity, and it
>> is expected that the underlying filesystem (such as v9fs) should respect
>> the write position. However, currently v9fs will passthrough O_APPEND to
>> the server even on cached mode. This causes data corruption if a sync or
>> fstat gets between two writes to the same file.
>
> Ugh.. I'd never expect O_APPEND to have any effect on the server tbh,
> 9p writes are "pwrite"-like, the offset is always specified -- there is
> no plain write(fid, data) call -- so O_APPEND should have nothing to do
> with the remote :/
>
> (well, man pwrite(2) does say this... So that explains the behavior if
> qemu respects the flag:
> BUGS
> POSIX requires that opening a file with the O_APPEND flag should have
> no effect on the location at which pwrite() writes data. However, on
> Linux, if a file is opened with O_APPEND, pwrite() appends data to the
> end of the file, regardless of the value of offset.
> )
>
> I guess I can see this being useful if multiple clients are involved in
> cacheless mode, but I agree any kind of data cache should strip this
> flag.
> (I think this didn't cause problem in other cache modes because writes
> through the page cache happen on a writeback fid that's opened with
> fixed (no O_APPEND) flags?...)
I don't really know enough about 9p and netfs and which fid a writeback
would end up using, but I've seen that it happens in mmap and loose cache
mode. For cache=loose it's harder to notice since the client has the file
data still cached and thus read() would behave correctly (and in loose
mode we don't refresh the i_size either so we don't even see the extra NUL
bytes that happens when in cache=mmap mode), however the file is still
being written wrong by the server, and would be visible after e.g. a VM
restart.
>
>
>> diff --git a/fs/9p/vfs_file.c b/fs/9p/vfs_file.c
>> index 612a230bc012..ff95dfb30583 100644
>> --- a/fs/9p/vfs_file.c
>> +++ b/fs/9p/vfs_file.c
>> @@ -43,14 +43,18 @@ int v9fs_file_open(struct inode *inode, struct file *file)
>> struct v9fs_session_info *v9ses;
>> struct p9_fid *fid;
>> int omode;
>> + int o_append;
>>
>> p9_debug(P9_DEBUG_VFS, "inode: %p file: %p\n", inode, file);
>> v9ses = v9fs_inode2v9ses(inode);
>> - if (v9fs_proto_dotl(v9ses))
>> + if (v9fs_proto_dotl(v9ses)) {
>> omode = v9fs_open_to_dotl_flags(file->f_flags);
>> - else
>> + o_append = P9_DOTL_APPEND;
>> + } else {
>> omode = v9fs_uflags2omode(file->f_flags,
>> v9fs_proto_dotu(v9ses));
>> + o_append = P9_OAPPEND;
>> + }
>> fid = file->private_data;
>> if (!fid) {
>> fid = v9fs_fid_clone(file_dentry(file));
>> @@ -61,6 +65,11 @@ int v9fs_file_open(struct inode *inode, struct file *file)
>> int writeback_omode = (omode & ~P9_OWRITE) | P9_ORDWR;
>>
>> p9_debug(P9_DEBUG_CACHE, "write-only file with writeback enabled, try opening O_RDWR\n");
>> + if (omode & o_append) {
>> + writeback_omode &= ~o_append;
>> + p9_debug(P9_DEBUG_CACHE, "removing append from open mode in writeback cache mode\n");
>> + }
>
> I wouldn't bother with the debug message, just clear it like P9_OWRITE:
> int writebeack_omode = (omode & ~(P9_OWRITE|o_append)) | P9_ORDWR;
>
> (same for other hunks)
Patch resent as a reply. Feel free to add any additional tweaks /
comments you prefer :)
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2] fs/9p: Don't open remote file with APPEND mode when writeback cache is used
2025-11-02 23:56 ` [PATCH v2] fs/9p: Don't " Tingmao Wang
@ 2025-11-03 7:34 ` Dominique Martinet
2025-11-10 13:25 ` Christian Schoenebeck
2025-11-10 14:22 ` Christian Schoenebeck
1 sibling, 1 reply; 8+ messages in thread
From: Dominique Martinet @ 2025-11-03 7:34 UTC (permalink / raw)
To: Tingmao Wang
Cc: Christian Schoenebeck, Eric Van Hensbergen, Latchesar Ionkov,
v9fs
Thanks for the v2
Tingmao Wang wrote on Sun, Nov 02, 2025 at 11:56:30PM +0000:
> I haven't done a bisect to figure out if this regression was introduced by
> a change or was this behaviour always present - 6.14 has the same problem
> and 6.13 did not compile for me due to some problem with bool / true /
> false being a keyword in c23, and it could not compile with c17 either.
Ok so you got me at the "it happens in cache=loose too", so I went ahead
with bisect... I had the same problem with newer gcc being c23 and older
kernels being silly, you'd think `KCFLAGS=-std=gnu11` is enough but some
arch makefiles ignore that so I went in heavy patching
arch/x86/boot/Makefile and arch/x86/boot/compressed/Makefile . . .
... Anyway, it certainly didn't break recently, but it's not ages ago
either: the repro you made (thank you!!) starts failing in 6.4 cache
rework: the first bad commit is actually 21e26d5e54ab ("fs/9p: Fix bit
operation logic error"), but that's just a fixup of the previous commit
so 4eb3117888a9 ("fs/9p: Rework cache modes and add new options to
Documentation") is the cluprit... and... I'm not quite sure I get why
without digging deeper...
(But at least that made me realize the commit just before, 1543b4c5071c
("fs/9p: remove writeback fid and fix per-file modes"), removed the
writeback fid I was talking about... But that commit was working (with a
sane ~P9_OWRITE mask), so it's not directly that removal that broke
things, but something else in the rework)
Anyway, I think this commit is a strict improvement over the current
situation, I'll just slap a Fixes tag and push to -next for now, and if
time allows I'll try to have a closer look...
Thanks,
--
Dominique
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2] fs/9p: Don't open remote file with APPEND mode when writeback cache is used
2025-11-03 7:34 ` Dominique Martinet
@ 2025-11-10 13:25 ` Christian Schoenebeck
0 siblings, 0 replies; 8+ messages in thread
From: Christian Schoenebeck @ 2025-11-10 13:25 UTC (permalink / raw)
To: Tingmao Wang, Dominique Martinet
Cc: Eric Van Hensbergen, Latchesar Ionkov, v9fs
On Monday, November 3, 2025 8:34:19 AM CET Dominique Martinet wrote:
> Thanks for the v2
>
> Tingmao Wang wrote on Sun, Nov 02, 2025 at 11:56:30PM +0000:
> > I haven't done a bisect to figure out if this regression was introduced by
> > a change or was this behaviour always present - 6.14 has the same problem
> > and 6.13 did not compile for me due to some problem with bool / true /
> > false being a keyword in c23, and it could not compile with c17 either.
>
> Ok so you got me at the "it happens in cache=loose too", so I went ahead
> with bisect... I had the same problem with newer gcc being c23 and older
> kernels being silly, you'd think `KCFLAGS=-std=gnu11` is enough but some
> arch makefiles ignore that so I went in heavy patching
> arch/x86/boot/Makefile and arch/x86/boot/compressed/Makefile . . .
> ... Anyway, it certainly didn't break recently, but it's not ages ago
> either: the repro you made (thank you!!) starts failing in 6.4 cache
> rework: the first bad commit is actually 21e26d5e54ab ("fs/9p: Fix bit
> operation logic error"), but that's just a fixup of the previous commit
> so 4eb3117888a9 ("fs/9p: Rework cache modes and add new options to
> Documentation") is the cluprit... and... I'm not quite sure I get why
> without digging deeper...
> (But at least that made me realize the commit just before, 1543b4c5071c
> ("fs/9p: remove writeback fid and fix per-file modes"), removed the
> writeback fid I was talking about... But that commit was working (with a
> sane ~P9_OWRITE mask), so it's not directly that removal that broke
> things, but something else in the rework)
Haven't tested, but to me it looks like 1543b4c5071c ("fs/9p: remove writeback
fid and fix per-file modes") was the causing change. Up to that commit the
open flags were hard coded to exactly O_RDWR for all writeback fids:
struct p9_fid *v9fs_writeback_fid(struct dentry *dentry)
{
...
/*
* writeback fid will only be used to write back the
* dirty pages. We always request for the open fid in read-write
* mode so that a partial page write which result in page
* read can work.
*/
err = p9_client_open(fid, O_RDWR);
...
}
/Christian
> Anyway, I think this commit is a strict improvement over the current
> situation, I'll just slap a Fixes tag and push to -next for now, and if
> time allows I'll try to have a closer look...
>
> Thanks,
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2] fs/9p: Don't open remote file with APPEND mode when writeback cache is used
2025-11-02 23:56 ` [PATCH v2] fs/9p: Don't " Tingmao Wang
2025-11-03 7:34 ` Dominique Martinet
@ 2025-11-10 14:22 ` Christian Schoenebeck
1 sibling, 0 replies; 8+ messages in thread
From: Christian Schoenebeck @ 2025-11-10 14:22 UTC (permalink / raw)
To: Dominique Martinet, Tingmao Wang
Cc: Tingmao Wang, Eric Van Hensbergen, Latchesar Ionkov, v9fs
On Monday, November 3, 2025 12:56:30 AM CET Tingmao Wang wrote:
> When page cache is used, writebacks are done on a page granularity, and it
> is expected that the underlying filesystem (such as v9fs) should respect
> the write position. However, currently v9fs will passthrough O_APPEND to
> the server even on cached mode. This causes data corruption if a sync or
> fstat gets between two writes to the same file.
>
> This patch removes the APPEND flag from the open request we send to the
> server when writeback caching is involved. I believe keeping server-side
> APPEND is probably fine for uncached mode (even if two fds are opened, one
> without O_APPEND and one with it, this should still be fine since they
> would use separate fid for the writes).
>
> Signed-off-by: Tingmao Wang <m@maowtm.org>
> ---
>
> I haven't done a bisect to figure out if this regression was introduced by
> a change or was this behaviour always present - 6.14 has the same problem
> and 6.13 did not compile for me due to some problem with bool / true /
> false being a keyword in c23, and it could not compile with c17 either.
>
> fs/9p/vfs_file.c | 11 ++++++++---
> fs/9p/vfs_inode.c | 3 +--
> fs/9p/vfs_inode_dotl.c | 2 +-
> 3 files changed, 10 insertions(+), 6 deletions(-)
>
> diff --git a/fs/9p/vfs_file.c b/fs/9p/vfs_file.c
> index 612a230bc012..6f3880208587 100644
> --- a/fs/9p/vfs_file.c
> +++ b/fs/9p/vfs_file.c
> @@ -43,14 +43,18 @@ int v9fs_file_open(struct inode *inode, struct file
> *file) struct v9fs_session_info *v9ses;
> struct p9_fid *fid;
> int omode;
> + int o_append;
>
> p9_debug(P9_DEBUG_VFS, "inode: %p file: %p\n", inode, file);
> v9ses = v9fs_inode2v9ses(inode);
> - if (v9fs_proto_dotl(v9ses))
> + if (v9fs_proto_dotl(v9ses)) {
> omode = v9fs_open_to_dotl_flags(file->f_flags);
> - else
> + o_append = P9_DOTL_APPEND;
> + } else {
> omode = v9fs_uflags2omode(file->f_flags,
> v9fs_proto_dotu(v9ses));
> + o_append = P9_OAPPEND;
> + }
> fid = file->private_data;
> if (!fid) {
> fid = v9fs_fid_clone(file_dentry(file));
> @@ -58,9 +62,10 @@ int v9fs_file_open(struct inode *inode, struct file
> *file) return PTR_ERR(fid);
>
> if ((v9ses->cache & CACHE_WRITEBACK) && (omode & P9_OWRITE)) {
> - int writeback_omode = (omode & ~P9_OWRITE) | P9_ORDWR;
> + int writeback_omode = (omode & ~(P9_OWRITE | o_append)) | P9_ORDWR;
I guess changes in this function could be simplified by either pulling
O_APPEND at the very beginning right after v9ses = v9fs_inode2v9ses(inode); or
by simply pulling both P9_OAPPEND and P9_DOTL_APPEND here, but it's doing the
job so:
Reviewed-by: Christian Schoenebeck <linux_oss@crudebyte.com>
/Christian
>
> p9_debug(P9_DEBUG_CACHE, "write-only file with writeback enabled, try
> opening O_RDWR\n"); +
> err = p9_client_open(fid, writeback_omode);
> if (err < 0) {
> p9_debug(P9_DEBUG_CACHE, "could not open O_RDWR, disabling caches\n");
> diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
> index 8666c9c62258..97abe65bf7c1 100644
> --- a/fs/9p/vfs_inode.c
> +++ b/fs/9p/vfs_inode.c
> @@ -786,7 +786,7 @@ v9fs_vfs_atomic_open(struct inode *dir, struct dentry
> *dentry, p9_omode = v9fs_uflags2omode(flags, v9fs_proto_dotu(v9ses));
>
> if ((v9ses->cache & CACHE_WRITEBACK) && (p9_omode & P9_OWRITE)) {
> - p9_omode = (p9_omode & ~P9_OWRITE) | P9_ORDWR;
> + p9_omode = (p9_omode & ~(P9_OWRITE | P9_OAPPEND)) | P9_ORDWR;
> p9_debug(P9_DEBUG_CACHE,
> "write-only file with writeback enabled, creating w/ O_RDWR\n");
> }
> @@ -1393,4 +1393,3 @@ static const struct inode_operations
> v9fs_symlink_inode_operations = { .getattr = v9fs_vfs_getattr,
> .setattr = v9fs_vfs_setattr,
> };
> -
> diff --git a/fs/9p/vfs_inode_dotl.c b/fs/9p/vfs_inode_dotl.c
> index 1661a25f2772..643e759eacb2 100644
> --- a/fs/9p/vfs_inode_dotl.c
> +++ b/fs/9p/vfs_inode_dotl.c
> @@ -282,7 +282,7 @@ v9fs_vfs_atomic_open_dotl(struct inode *dir, struct
> dentry *dentry, }
>
> if ((v9ses->cache & CACHE_WRITEBACK) && (p9_omode & P9_OWRITE)) {
> - p9_omode = (p9_omode & ~P9_OWRITE) | P9_ORDWR;
> + p9_omode = (p9_omode & ~(P9_OWRITE | P9_DOTL_APPEND)) | P9_ORDWR;
> p9_debug(P9_DEBUG_CACHE,
> "write-only file with writeback enabled, creating w/ O_RDWR\n");
> }
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2025-11-10 14:22 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-11-02 20:24 [PATCH 0/1] fs/9p: Do not open remote file with APPEND mode when writeback cache is used Tingmao Wang
2025-11-02 20:24 ` [PATCH 1/1] " Tingmao Wang
2025-11-02 23:07 ` Dominique Martinet
2025-11-02 23:56 ` [PATCH v2] fs/9p: Don't " Tingmao Wang
2025-11-03 7:34 ` Dominique Martinet
2025-11-10 13:25 ` Christian Schoenebeck
2025-11-10 14:22 ` Christian Schoenebeck
2025-11-02 23:58 ` [PATCH 1/1] fs/9p: Do not " Tingmao Wang
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox