* [Virtio-fs] [PATCH v4 0/5] virtiofs: propagate sync() to file server
@ 2021-05-20 15:46 ` Greg Kurz
0 siblings, 0 replies; 83+ messages in thread
From: Greg Kurz @ 2021-05-20 15:46 UTC (permalink / raw)
To: Miklos Szeredi
Cc: linux-kernel, Max Reitz, virtio-fs, linux-fsdevel, virtualization,
Vivek Goyal
This was a single patch until v3. Some preliminary cleanups were
introduced for submounts in this v4.
This can be tested with a custom virtiofsd implementing FUSE_SYNCFS, here:
https://gitlab.com/gkurz/qemu/-/tree/fuse-sync
v4: - submount fixes
    - set nodeid of the superblock in the request (Miklos)
v3: - just keep a 64-bit padding field in the arg struct (Vivek)
v2: - clarify compatibility with older servers in changelog (Vivek)
    - ignore the wait == 0 case (Miklos)
    - 64-bit aligned argument structure (Vivek, Miklos)

Greg Kurz (5):
  fuse: Fix leak in fuse_dentry_automount() error path
  fuse: Call vfs_get_tree() for submounts
  fuse: Make fuse_fill_super_submount() static
  virtiofs: Skip submounts in sget_fc()
  virtiofs: propagate sync() to file server

 fs/fuse/dir.c             | 45 +++++---------------
 fs/fuse/fuse_i.h          | 12 +++---
 fs/fuse/inode.c           | 87 ++++++++++++++++++++++++++++++++++++++-
 fs/fuse/virtio_fs.c       |  9 ++++
 include/uapi/linux/fuse.h | 10 ++++-
 5 files changed, 120 insertions(+), 43 deletions(-)
--
2.26.3
^ permalink raw reply [flat|nested] 83+ messages in thread
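To exercise the new request from inside a guest, it should be enough to call syncfs(2) (or sync(2)) on a virtiofs mount while running the patched kernel and the custom virtiofsd from the branch above. The following is a minimal user-space sketch, not part of the series: sync_mount() is a hypothetical helper name, and the assumption that syncfs() on the mount reaches the server as FUSE_SYNCFS is taken from the cover letter rather than verified here.

```c
#define _GNU_SOURCE		/* syncfs() is a GNU/Linux extension */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Flush the filesystem that contains @path.
 * Returns 0 on success or a negative errno value.
 * sync_mount() is a hypothetical helper, not an existing API.
 */
static int sync_mount(const char *path)
{
	int fd, ret = 0;

	assert(path != NULL);

	/* Any open file descriptor on the filesystem is sufficient. */
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -errno;

	/* On a patched virtiofs mount this is expected to reach the server. */
	if (syncfs(fd) < 0)
		ret = -errno;

	close(fd);
	return ret;
}
```

Calling sync_mount() on the virtiofs mount point in the guest and watching the server's debug output for the new request would be one way to check the round trip.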
* [Virtio-fs] [PATCH v4 1/5] fuse: Fix leak in fuse_dentry_automount() error path
2021-05-20 15:46 ` Greg Kurz
(?)
@ 2021-05-20 15:46 ` Greg Kurz
-1 siblings, 0 replies; 83+ messages in thread
From: Greg Kurz @ 2021-05-20 15:46 UTC (permalink / raw)
To: Miklos Szeredi
Cc: linux-kernel, Max Reitz, virtio-fs, linux-fsdevel, virtualization,
Vivek Goyal
Some rollback was forgotten during the addition of crossmounts.
Fixes: bf109c64040f ("fuse: implement crossmounts")
Cc: mreitz@redhat.com
Signed-off-by: Greg Kurz <groug@kaod.org>
---
 fs/fuse/dir.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index 1b6c001a7dd1..fb2af70596c3 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -339,8 +339,11 @@ static struct vfsmount *fuse_dentry_automount(struct path *path)
 
 	/* Initialize superblock, making @mp_fi its root */
 	err = fuse_fill_super_submount(sb, mp_fi);
-	if (err)
+	if (err) {
+		fuse_conn_put(fc);
+		kfree(fm);
 		goto out_put_sb;
+	}
 
 	sb->s_flags |= SB_ACTIVE;
 	fsc->root = dget(sb->s_root);
--
2.26.3
^ permalink raw reply related [flat|nested] 83+ messages in thread
* Re: [Virtio-fs] [PATCH v4 1/5] fuse: Fix leak in fuse_dentry_automount() error path
2021-05-20 15:46 ` Greg Kurz
(?)
@ 2021-05-20 19:45 ` Al Viro
-1 siblings, 0 replies; 83+ messages in thread
From: Al Viro @ 2021-05-20 19:45 UTC (permalink / raw)
To: Greg Kurz
Cc: Miklos Szeredi, linux-kernel, virtualization, virtio-fs,
linux-fsdevel, Max Reitz, Vivek Goyal
On Thu, May 20, 2021 at 05:46:50PM +0200, Greg Kurz wrote:
> Some rollback was forgotten during the addition of crossmounts.

Have you actually tested that? Because I strongly suspect that
by that point the ownership of fc and fm is with sb and those
should be taken care of by deactivate_locked_super().
^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: [Virtio-fs] [PATCH v4 1/5] fuse: Fix leak in fuse_dentry_automount() error path
2021-05-20 19:45 ` Al Viro
@ 2021-05-21 7:54 ` Miklos Szeredi
-1 siblings, 0 replies; 83+ messages in thread
From: Miklos Szeredi @ 2021-05-21 7:54 UTC (permalink / raw)
To: Al Viro
Cc: linux-kernel, virtio-fs-list, Max Reitz, linux-fsdevel,
virtualization, Vivek Goyal
On Thu, 20 May 2021 at 21:45, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Thu, May 20, 2021 at 05:46:50PM +0200, Greg Kurz wrote:
> > Some rollback was forgotten during the addition of crossmounts.
>
> Have you actually tested that? Because I strongly suspect that
> by that point the ownership of fc and fm is with sb and those
> should be taken care of by deactivate_locked_super().

Not quite. Patch looks correct because destruction of fm is done in
fuse_put_super(), which only gets called if the sb initialization gets
as far as setting up sb->s_root, which only happens after the
successful fuse_fill_super_submount() call in this case.

Doing the destruction from the various ->kill_sb() instances instead
of from ->put_super() would also fix this, but I'm not quite sure that
that would be any cleaner.

Thanks,
Miklos
^ permalink raw reply [flat|nested] 83+ messages in thread
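The point about ->put_super() can be illustrated with a small user-space model. This is only a sketch under stated assumptions: every mock_* name below is hypothetical, and mock_shutdown_super() mimics just the one behaviour described in the message above, namely that the put_super callback runs only when the superblock already has a root.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct super_block and its operations. */
struct mock_sb;

struct mock_sb_ops {
	void (*put_super)(struct mock_sb *sb);
};

struct mock_sb {
	void *s_root;			/* set once fill_super has succeeded */
	const struct mock_sb_ops *s_op;
	int put_super_calls;		/* how often ->put_super() ran */
};

/* Stands in for fuse_put_super(): this is where fm would be destroyed. */
static void mock_put_super(struct mock_sb *sb)
{
	sb->put_super_calls++;
}

static const struct mock_sb_ops mock_ops = {
	.put_super = mock_put_super,
};

/* Models the teardown rule: no root, no ->put_super() call. */
static void mock_shutdown_super(struct mock_sb *sb)
{
	assert(sb->s_op != NULL);
	if (sb->s_root) {
		if (sb->s_op->put_super)
			sb->s_op->put_super(sb);
		sb->s_root = NULL;
	}
}

/* Tear down a superblock whose fill_super either succeeded or failed. */
static int mock_teardown(bool fill_super_succeeded)
{
	static int dummy_root;
	struct mock_sb sb = {
		.s_root = fill_super_succeeded ? &dummy_root : NULL,
		.s_op = &mock_ops,
		.put_super_calls = 0,
	};

	mock_shutdown_super(&sb);
	return sb.put_super_calls;
}
```

In this model a failed fill_super leaves s_root unset, so the callback that would destroy fm never runs, which is why the error path has to release fm itself.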
* Re: [Virtio-fs] [PATCH v4 1/5] fuse: Fix leak in fuse_dentry_automount() error path
2021-05-21 7:54 ` Miklos Szeredi
(?)
@ 2021-05-21 8:15 ` Greg Kurz
-1 siblings, 0 replies; 83+ messages in thread
From: Greg Kurz @ 2021-05-21 8:15 UTC (permalink / raw)
To: Miklos Szeredi
Cc: Stefan Hajnoczi, linux-kernel, virtualization, virtio-fs-list,
Al Viro, linux-fsdevel, Max Reitz, Vivek Goyal
On Fri, 21 May 2021 09:54:19 +0200
Miklos Szeredi <miklos@szeredi.hu> wrote:

> On Thu, 20 May 2021 at 21:45, Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > On Thu, May 20, 2021 at 05:46:50PM +0200, Greg Kurz wrote:
> > > Some rollback was forgotten during the addition of crossmounts.
> >
> > Have you actually tested that? Because I strongly suspect that
> > by that point the ownership of fc and fm is with sb and those
> > should be taken care of by deactivate_locked_super().
>
> Not quite. Patch looks correct because destruction of fm is done in
> fuse_put_super(), which only gets called if the sb initialization gets
> as far as setting up sb->s_root, which only happens after the
> successful fuse_fill_super_submount() call in this case.
>
> Doing the destruction from the various ->kill_sb() instances instead
> of from ->put_super() would also fix this, but I'm not quite sure that
> that would be any cleaner.
>

As I said in the answer I've just posted, a failure in
fuse_fill_super_submount() causes an actual crash because
fuse_mount_remove() logically assumes fm to already be in
fc->mounts, which isn't the case at this point.

In the root mount case, this is handled by taking back
ownership of fm, i.e. doing the rollback *and* clearing
sb->s_fs_info. It seems that the same should be done
for submounts.

> Thanks,
> Miklos
^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: [Virtio-fs] [PATCH v4 1/5] fuse: Fix leak in fuse_dentry_automount() error path
2021-05-21 8:15 ` Greg Kurz
@ 2021-05-21 8:23 ` Miklos Szeredi
-1 siblings, 0 replies; 83+ messages in thread
From: Miklos Szeredi @ 2021-05-21 8:23 UTC (permalink / raw)
To: Greg Kurz
Cc: linux-kernel, virtualization, virtio-fs-list, Al Viro,
linux-fsdevel, Max Reitz, Vivek Goyal
On Fri, 21 May 2021 at 10:15, Greg Kurz <groug@kaod.org> wrote:
>
> On Fri, 21 May 2021 09:54:19 +0200
> Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> > On Thu, 20 May 2021 at 21:45, Al Viro <viro@zeniv.linux.org.uk> wrote:
> > >
> > > On Thu, May 20, 2021 at 05:46:50PM +0200, Greg Kurz wrote:
> > > > Some rollback was forgotten during the addition of crossmounts.
> > >
> > > Have you actually tested that? Because I strongly suspect that
> > > by that point the ownership of fc and fm is with sb and those
> > > should be taken care of by deactivate_locked_super().
> >
> > Not quite. Patch looks correct because destruction of fm is done in
> > fuse_put_super(), which only gets called if the sb initialization gets
> > as far as setting up sb->s_root, which only happens after the
> > successful fuse_fill_super_submount() call in this case.
> >
> > Doing the destruction from the various ->kill_sb() instances instead
> > of from ->put_super() would also fix this, but I'm not quite sure that
> > that would be any cleaner.
> >
>
> As saying in the answer I've just posted, a failure in
> fuse_fill_super_submount() causes an actual crash because
> fuse_mount_remove() logically assumes fm to already be in
> fc->mounts, which isn't the case at this point.
>
> In the root mount case, this is handled by taking back
> the ownership on fm, i.e. do the rollback *and* clear
> sb->s_fs_info. It seems that the same should be done
> for submounts.

Agreed. Thanks for verifying.

Miklos
^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: [Virtio-fs] [PATCH v4 1/5] fuse: Fix leak in fuse_dentry_automount() error path
2021-05-20 19:45 ` Al Viro
(?)
@ 2021-05-21 8:08 ` Greg Kurz
-1 siblings, 0 replies; 83+ messages in thread
From: Greg Kurz @ 2021-05-21 8:08 UTC (permalink / raw)
To: Al Viro
Cc: Miklos Szeredi, linux-kernel, virtualization, virtio-fs,
linux-fsdevel, Max Reitz, Vivek Goyal
On Thu, 20 May 2021 19:45:13 +0000
Al Viro <viro@zeniv.linux.org.uk> wrote:

> On Thu, May 20, 2021 at 05:46:50PM +0200, Greg Kurz wrote:
> > Some rollback was forgotten during the addition of crossmounts.
>
> Have you actually tested that? Because I strongly suspect that
> by that point the ownership of fc and fm is with sb and those
> should be taken care of by deactivate_locked_super().

My bad, I didn't test but now I did and the issue is actually worse
than just a memory leak. This error path crashes upstream without
this patch:

[ 26.206673] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 26.209560] #PF: supervisor read access in kernel mode
[ 26.211699] #PF: error_code(0x0000) - not-present page
[ 26.214574] PGD 0 P4D 0
[ 26.216016] Oops: 0000 [#1] SMP PTI
[ 26.217451] CPU: 0 PID: 3380 Comm: ls Kdump: loaded Not tainted 5.13.0-virtio-fs-sync+ #30
[ 26.220839] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 26.226362] RIP: 0010:__list_del_entry_valid+0x25/0x90
[ 26.228449] Code: c3 0f 1f 40 00 48 8b 17 4c 8b 47 08 48 b8 00 01 00 00 00 00 ad de 48 39 c2 74 26 48 b8 22 01 00 00 00 00 ad de 49 39 c0 74 2b <49> 8b 30 48 39 fe 75 3a 48 8b 52 08 48 39 f2 75 48 b8 01 00 00 00
[ 26.234256] RSP: 0018:ffffaa37006cbb18 EFLAGS: 00010217
[ 26.235473] RAX: dead000000000122 RBX: ffff8f6844098200 RCX: 0000000000000000
[ 26.236922] RDX: 0000000000000000 RSI: ffffffff99264e92 RDI: ffff8f6844098210
[ 26.238401] RBP: ffff8f68420b3c00 R08: 0000000000000000 R09: 000000000000002a
[ 26.239852] R10: 0000000000000000 R11: ffff8f6840402480 R12: ffff8f6844098210
[ 26.241160] R13: ffff8f68420b3da8 R14: ffff8f6844098200 R15: 0000000000000000
[ 26.242398] FS: 00007f547b93f200(0000) GS:ffff8f687bc00000(0000) knlGS:0000000000000000
[ 26.243698] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 26.244693] CR2: 0000000000000000 CR3: 0000000104e50000 CR4: 00000000000006f0
[ 26.245936] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 26.246961] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 26.247938] Call Trace:
[ 26.248300] fuse_mount_remove+0x2c/0x70 [fuse]
[ 26.248892] virtio_kill_sb+0x22/0x160 [virtiofs]
[ 26.249487] deactivate_locked_super+0x36/0xa0
[ 26.250077] fuse_dentry_automount+0x178/0x1a0 [fuse]

The crash happens because we're assuming fm was already added to
fc->mounts...

bool fuse_mount_remove(struct fuse_mount *fm)
{
	struct fuse_conn *fc = fm->fc;
	bool last = false;

	down_write(&fc->killsb);
	list_del_init(&fm->fc_entry); <=== HERE
	if (list_empty(&fc->mounts))
		last = true;
	up_write(&fc->killsb);

	return last;
}

but fm is added to fc->mounts much later, after the superblock is
fully configured.

Looking again at what is done for the root mount in
virtio_fs_get_tree(), I now realize sb->s_fs_info is used as a flag
to decide whether fuse_mount_remove() should be called:

static int virtio_fs_get_tree(struct fs_context *fsc)
{
...
	if (!sb->s_root) {
		err = virtio_fs_fill_super(sb, fsc);
		if (err) {
			fuse_conn_put(fc);
			kfree(fm);
CLEARED HERE =>		sb->s_fs_info = NULL;
			deactivate_locked_super(sb);
			return err;
		}

		sb->s_flags |= SB_ACTIVE;
	}
...
}

static void virtio_kill_sb(struct super_block *sb)
{
	struct fuse_mount *fm = get_fuse_mount_super(sb); I.E. sb->s_fs_info
	bool last;

	/* If mount failed, we can still be called without any fc */
	if (fm) {
TESTED HERE ^^
		last = fuse_mount_remove(fm);
		if (last)
			virtio_fs_conn_destroy(fm);
	}
	kill_anon_super(sb);
}

The natural fix is to do the same in the automount case: take back
ownership of fm by clearing sb->s_fs_info, which in turn means the
error path has to free it.
^ permalink raw reply [flat|nested] 83+ messages in thread
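The ownership hand-back described in the message above can be sketched as a small user-space model. Everything named demo_* is hypothetical and only mirrors the shape of the quoted kernel code: demo_kill_sb() plays the role of virtio_kill_sb(), the remove_calls counter stands in for fuse_mount_remove(), and the error path either clears s_fs_info first (the proposed behaviour) or leaves it set (the crashing behaviour).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct fuse_mount and struct super_block. */
struct demo_mount {
	bool on_conn_list;	/* would be true once added to fc->mounts */
};

struct demo_sb {
	struct demo_mount *s_fs_info;	/* non-NULL while sb owns the mount */
	int remove_calls;		/* counts modelled fuse_mount_remove() */
};

/* Models virtio_kill_sb(): only touch fm if the superblock still owns it. */
static void demo_kill_sb(struct demo_sb *sb)
{
	struct demo_mount *fm = sb->s_fs_info;

	if (fm)
		sb->remove_calls++;	/* unsafe if fm was never on the list */
}

/*
 * Models a failed fill_super followed by superblock teardown.
 * Returns how many times the modelled fuse_mount_remove() was reached.
 */
static int demo_error_path(bool clear_s_fs_info)
{
	struct demo_mount *fm = calloc(1, sizeof(*fm));
	struct demo_sb sb = { .s_fs_info = fm, .remove_calls = 0 };

	assert(fm != NULL);

	if (clear_s_fs_info) {
		/* Take ownership back: free fm and hide it from ->kill_sb(). */
		free(fm);
		sb.s_fs_info = NULL;
	}

	demo_kill_sb(&sb);	/* deactivate_locked_super() ends up here */

	if (!clear_s_fs_info)
		free(fm);	/* keep the model itself leak-free */

	return sb.remove_calls;
}
```

With the flag cleared, the teardown callback never reaches the list removal for a mount that was not yet on the connection's list, which is the behaviour the root mount path already has according to the quoted virtio_fs_get_tree() excerpt.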
* Re: [PATCH v4 1/5] fuse: Fix leak in fuse_dentry_automount() error path
@ 2021-05-21  8:08 ` Greg Kurz
0 siblings, 0 replies; 83+ messages in thread
From: Greg Kurz @ 2021-05-21 8:08 UTC (permalink / raw)
To: Al Viro
Cc: Miklos Szeredi, linux-kernel, virtualization, virtio-fs,
Stefan Hajnoczi, linux-fsdevel, Max Reitz, Vivek Goyal

On Thu, 20 May 2021 19:45:13 +0000
Al Viro <viro@zeniv.linux.org.uk> wrote:

> On Thu, May 20, 2021 at 05:46:50PM +0200, Greg Kurz wrote:
> > Some rollback was forgotten during the addition of crossmounts.
>
> Have you actually tested that? Because I strongly suspect that
> by that point the ownership of fc and fm is with sb and those
> should be taken care of by deactivate_locked_super().

My bad, I didn't test but now I did and the issue is actually worse
than just a memory leak. This error path crashes upstream without
this patch:

[   26.206673] BUG: kernel NULL pointer dereference, address: 0000000000000000
[   26.209560] #PF: supervisor read access in kernel mode
[   26.211699] #PF: error_code(0x0000) - not-present page
[   26.214574] PGD 0 P4D 0
[   26.216016] Oops: 0000 [#1] SMP PTI
[   26.217451] CPU: 0 PID: 3380 Comm: ls Kdump: loaded Not tainted 5.13.0-virtio-fs-sync+ #30
[   26.220839] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[   26.226362] RIP: 0010:__list_del_entry_valid+0x25/0x90
[   26.228449] Code: c3 0f 1f 40 00 48 8b 17 4c 8b 47 08 48 b8 00 01 00 00 00 00 ad de 48 39 c2 74 26 48 b8 22 01 00 00 00 00 ad de 49 39 c0 74 2b <49> 8b 30 48 39 fe 75 3a 48 8b 52 08 48 39 f2 75 48 b8 01 00 00 00
[   26.234256] RSP: 0018:ffffaa37006cbb18 EFLAGS: 00010217
[   26.235473] RAX: dead000000000122 RBX: ffff8f6844098200 RCX: 0000000000000000
[   26.236922] RDX: 0000000000000000 RSI: ffffffff99264e92 RDI: ffff8f6844098210
[   26.238401] RBP: ffff8f68420b3c00 R08: 0000000000000000 R09: 000000000000002a
[   26.239852] R10: 0000000000000000 R11: ffff8f6840402480 R12: ffff8f6844098210
[   26.241160] R13: ffff8f68420b3da8 R14: ffff8f6844098200 R15: 0000000000000000
[   26.242398] FS:  00007f547b93f200(0000) GS:ffff8f687bc00000(0000) knlGS:0000000000000000
[   26.243698] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   26.244693] CR2: 0000000000000000 CR3: 0000000104e50000 CR4: 00000000000006f0
[   26.245936] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   26.246961] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   26.247938] Call Trace:
[   26.248300]  fuse_mount_remove+0x2c/0x70 [fuse]
[   26.248892]  virtio_kill_sb+0x22/0x160 [virtiofs]
[   26.249487]  deactivate_locked_super+0x36/0xa0
[   26.250077]  fuse_dentry_automount+0x178/0x1a0 [fuse]

The crash happens because we're assuming fm was already added to
fc->mounts...

bool fuse_mount_remove(struct fuse_mount *fm)
{
	struct fuse_conn *fc = fm->fc;
	bool last = false;

	down_write(&fc->killsb);
	list_del_init(&fm->fc_entry);  <=== HERE
	if (list_empty(&fc->mounts))
		last = true;
	up_write(&fc->killsb);

	return last;
}

but fm is added to fc->mounts much later after the superblock is
fully configured.

Looking again at what is done for the root mount in
virtio_fs_get_tree(), I now realize sb->s_fs_info is used as a flag
to decide whether fuse_mount_remove() should be called:

static int virtio_fs_get_tree(struct fs_context *fsc)
{
...
	if (!sb->s_root) {
		err = virtio_fs_fill_super(sb, fsc);
		if (err) {
			fuse_conn_put(fc);
			kfree(fm);
CLEARED HERE =>		sb->s_fs_info = NULL;
			deactivate_locked_super(sb);
			return err;
		}

		sb->s_flags |= SB_ACTIVE;
	}
...
}

static void virtio_kill_sb(struct super_block *sb)
{
	struct fuse_mount *fm = get_fuse_mount_super(sb);  I.E. sb->s_fs_info
	bool last;

	/* If mount failed, we can still be called without any fc */
	if (fm) {
TESTED HERE ^^
		last = fuse_mount_remove(fm);
		if (last)
			virtio_fs_conn_destroy(fm);
	}
	kill_anon_super(sb);
}

The natural fix is to do the same in the automount case : take back
the ownership on fm by clearing sb->s_fs_info, which thus implies
to do the freeing.
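The ownership rule described in this message can be modelled in user space. The sketch below is only an illustration under stated assumptions: struct model_sb, struct model_mount and every model_* function are hypothetical stand-ins, not the kernel's structures or helpers. It shows why the error path has to clear the s_fs_info pointer before teardown when fm was never linked on the connection's list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical user-space stand-ins; these are NOT the kernel structures. */
struct model_mount {
	bool on_conn_list;	/* models "fm was added to fc->mounts" */
};

struct model_sb {
	void *s_fs_info;	/* owner pointer, doubles as the "has fm" flag */
};

/*
 * Models the shape of virtio_kill_sb(): fm is only touched when the
 * superblock still owns it. Returns true if the list removal step ran.
 */
static bool model_kill_sb(struct model_sb *sb)
{
	struct model_mount *fm = sb->s_fs_info;

	if (!fm)
		return false;

	/* In the kernel this is where fuse_mount_remove(fm) would run. */
	assert(fm->on_conn_list);
	fm->on_conn_list = false;
	return true;
}

/*
 * Models a failed superblock setup: fm was allocated and attached to the
 * sb but never linked on the connection list. The caller takes ownership
 * back (free + clear s_fs_info) before tearing the superblock down.
 * Returns true if teardown tried to remove fm from the list.
 */
static bool model_failed_mount(void)
{
	struct model_sb sb = { .s_fs_info = NULL };
	struct model_mount *fm = calloc(1, sizeof(*fm));

	if (!fm)
		return false;
	sb.s_fs_info = fm;

	/* ... superblock setup fails here ... */
	free(fm);
	sb.s_fs_info = NULL;	/* skip this and model_kill_sb() touches freed memory */

	return model_kill_sb(&sb);
}
```

Calling model_failed_mount() returns false, meaning teardown never attempted the list removal; a mount that completed setup would instead go through the removal branch of model_kill_sb().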
* [Virtio-fs] [PATCH v4 2/5] fuse: Call vfs_get_tree() for submounts
2021-05-20 15:46 ` Greg Kurz
(?)
@ 2021-05-20 15:46 ` Greg Kurz
-1 siblings, 0 replies; 83+ messages in thread
From: Greg Kurz @ 2021-05-20 15:46 UTC (permalink / raw)
To: Miklos Szeredi
Cc: linux-kernel, Max Reitz, virtio-fs, linux-fsdevel, virtualization,
Vivek Goyal

We don't set the SB_BORN flag on submounts superblocks. This is wrong
as these superblocks are then considered as partially constructed or
dying in the rest of the code and can break some assumptions.

One such case is when you have a virtiofs filesystem and you try to
mount it again : virtio_fs_get_tree() tries to obtain a superblock
with sget_fc(). The matching criteria in virtio_fs_test_super() is
the pointer of the underlying virtiofs device, which is shared by
the root mount and its submounts. This means that any submount can
be picked up instead of the root mount. This is itself a bug :
submounts should be ignored in this case. But, most importantly, it
then triggers an infinite loop in sget_fc() because it fails to grab
the superblock (very easy to reproduce).

The only viable solution is to set SB_BORN at some point. This
must be done with vfs_get_tree() because setting SB_BORN requires
special care, i.e. a memory barrier for super_cache_count() which
can check SB_BORN without taking any lock.

This requires to split out some code from fuse_dentry_automount()
to a new dedicated fuse_get_tree_submount().

The fs_private field of the filesystem context isn't used with
submounts : hijack it to pass the FUSE inode of the mount point
down to fuse_get_tree_submount().

Finally, adapt virtiofs to use this.

Signed-off-by: Greg Kurz <groug@kaod.org>
---
 fs/fuse/dir.c       | 48 +++++++++++----------------------------------
 fs/fuse/fuse_i.h    |  6 ++++++
 fs/fuse/inode.c     | 43 ++++++++++++++++++++++++++++++++++++++++
 fs/fuse/virtio_fs.c |  3 +++
 4 files changed, 63 insertions(+), 37 deletions(-)

diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index fb2af70596c3..4c8dafe4f69e 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -309,12 +309,9 @@ static int fuse_dentry_delete(const struct dentry *dentry)
 static struct vfsmount *fuse_dentry_automount(struct path *path)
 {
 	struct fs_context *fsc;
-	struct fuse_mount *parent_fm = get_fuse_mount_super(path->mnt->mnt_sb);
-	struct fuse_conn *fc = parent_fm->fc;
 	struct fuse_mount *fm;
 	struct vfsmount *mnt;
 	struct fuse_inode *mp_fi = get_fuse_inode(d_inode(path->dentry));
-	struct super_block *sb;
 	int err;
 
 	fsc = fs_context_for_submount(path->mnt->mnt_sb->s_type, path->dentry);
@@ -323,36 +320,19 @@ static struct vfsmount *fuse_dentry_automount(struct path *path)
 		goto out;
 	}
 
-	err = -ENOMEM;
-	fm = kzalloc(sizeof(struct fuse_mount), GFP_KERNEL);
-	if (!fm)
+	/*
+	 * Hijack fsc->fs_private to pass the mount point inode to
+	 * fuse_get_tree_submount(). It *must* be NULLified afterwards
+	 * to avoid the inode pointer to be passed to kfree() when
+	 * the context gets freed.
+	 */
+	fsc->fs_private = mp_fi;
+	err = vfs_get_tree(fsc);
+	fsc->fs_private = NULL;
+	if (err)
 		goto out_put_fsc;
 
-	fsc->s_fs_info = fm;
-	sb = sget_fc(fsc, NULL, set_anon_super_fc);
-	if (IS_ERR(sb)) {
-		err = PTR_ERR(sb);
-		kfree(fm);
-		goto out_put_fsc;
-	}
-	fm->fc = fuse_conn_get(fc);
-
-	/* Initialize superblock, making @mp_fi its root */
-	err = fuse_fill_super_submount(sb, mp_fi);
-	if (err) {
-		fuse_conn_put(fc);
-		kfree(fm);
-		goto out_put_sb;
-	}
-
-	sb->s_flags |= SB_ACTIVE;
-	fsc->root = dget(sb->s_root);
-	/* We are done configuring the superblock, so unlock it */
-	up_write(&sb->s_umount);
-
-	down_write(&fc->killsb);
-	list_add_tail(&fm->fc_entry, &fc->mounts);
-	up_write(&fc->killsb);
+	fm = get_fuse_mount_super(fsc->root->d_sb);
 
 	/* Create the submount */
 	mnt = vfs_create_mount(fsc);
@@ -364,12 +344,6 @@ static struct vfsmount *fuse_dentry_automount(struct path *path)
 	put_fs_context(fsc);
 	return mnt;
 
-out_put_sb:
-	/*
-	 * Only jump here when fsc->root is NULL and sb is still locked
-	 * (otherwise put_fs_context() will put the superblock)
-	 */
-	deactivate_locked_super(sb);
 out_put_fsc:
 	put_fs_context(fsc);
 out:
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 7e463e220053..d7fcf59a6a0e 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -1090,6 +1090,12 @@ int fuse_fill_super_common(struct super_block *sb, struct fuse_fs_context *ctx);
 int fuse_fill_super_submount(struct super_block *sb,
 			     struct fuse_inode *parent_fi);
 
+/*
+ * Get the mountable root for the submount
+ * @fsc: superblock configuration context
+ */
+int fuse_get_tree_submount(struct fs_context *fsc);
+
 /*
  * Remove the mount from the connection
  *
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 393e36b74dc4..74e5205f203c 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -1313,6 +1313,49 @@ int fuse_fill_super_submount(struct super_block *sb,
 	return 0;
 }
 
+/* Filesystem context private data holds the FUSE inode of the mount point */
+int fuse_get_tree_submount(struct fs_context *fsc)
+{
+	struct fuse_mount *fm;
+	struct fuse_inode *mp_fi = fsc->fs_private;
+	struct fuse_conn *fc = get_fuse_conn(&mp_fi->inode);
+	struct super_block *sb;
+	int err;
+
+	fm = kzalloc(sizeof(struct fuse_mount), GFP_KERNEL);
+	if (!fm)
+		return -ENOMEM;
+
+	fsc->s_fs_info = fm;
+	sb = sget_fc(fsc, NULL, set_anon_super_fc);
+	if (IS_ERR(sb)) {
+		kfree(fm);
+		return PTR_ERR(sb);
+	}
+	fm->fc = fuse_conn_get(fc);
+
+	/* Initialize superblock, making @mp_fi its root */
+	err = fuse_fill_super_submount(sb, mp_fi);
+	if (err) {
+		fuse_conn_put(fc);
+		deactivate_locked_super(sb);
+		kfree(fm);
+		return err;
+	}
+
+	sb->s_flags |= SB_ACTIVE;
+	fsc->root = dget(sb->s_root);
+	/* We are done configuring the superblock, so unlock it */
+	up_write(&sb->s_umount);
+
+	down_write(&fc->killsb);
+	list_add_tail(&fm->fc_entry, &fc->mounts);
+	up_write(&fc->killsb);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(fuse_get_tree_submount);
+
 int fuse_fill_super_common(struct super_block *sb, struct fuse_fs_context *ctx)
 {
 	struct fuse_dev *fud = NULL;
diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
index bcb8a02e2d8b..e12e5190352c 100644
--- a/fs/fuse/virtio_fs.c
+++ b/fs/fuse/virtio_fs.c
@@ -1420,6 +1420,9 @@ static int virtio_fs_get_tree(struct fs_context *fsc)
 	unsigned int virtqueue_size;
 	int err = -EIO;
 
+	if (fsc->purpose == FS_CONTEXT_FOR_SUBMOUNT)
+		return fuse_get_tree_submount(fsc);
+
 	/* This gets a reference on virtio_fs object. This ptr gets installed
 	 * in fc->iq->priv. Once fuse_conn is going away, it calls ->put()
 	 * to drop the reference to this object.
-- 
2.26.3
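The "hijack fs_private, then NULLify it" step in this patch is a borrowed-pointer pattern: the context's private slot is lent to the callee for one call and reset before the context's own cleanup can free it. A minimal user-space sketch follows, assuming a context whose teardown frees its private pointer unconditionally, as the patch comment describes. struct model_ctx and all model_* names are hypothetical and only stand in for the kernel objects.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a filesystem context; not the kernel struct. */
struct model_ctx {
	void *fs_private;	/* freed unconditionally when the context dies */
};

static int model_free_calls;	/* counts non-NULL pointers handed to free() */

/* Models context teardown: whatever sits in fs_private gets freed. */
static void model_put_ctx(struct model_ctx *ctx)
{
	if (ctx->fs_private)
		model_free_calls++;
	free(ctx->fs_private);	/* free(NULL) is a no-op */
	ctx->fs_private = NULL;
}

/* Hypothetical callee: reads the borrowed pointer out of the context. */
static int model_get_tree(struct model_ctx *ctx)
{
	int *borrowed = ctx->fs_private;

	return borrowed ? *borrowed : -1;
}

/*
 * Lend a pointer the context does not own for the duration of one call,
 * then take it back so that teardown never frees it.
 */
static int model_automount(struct model_ctx *ctx, int *mount_point)
{
	int ret;

	ctx->fs_private = mount_point;	/* lend */
	ret = model_get_tree(ctx);
	ctx->fs_private = NULL;		/* take back before ctx is freed */
	return ret;
}
```

If the reset in model_automount() were dropped, model_put_ctx() would hand a pointer it never allocated to free(), which is the situation the comment in the patch warns about.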
* Re: [Virtio-fs] [PATCH v4 2/5] fuse: Call vfs_get_tree() for submounts
2021-05-20 15:46 ` Greg Kurz
@ 2021-05-21  8:19 ` Miklos Szeredi
-1 siblings, 0 replies; 83+ messages in thread
From: Miklos Szeredi @ 2021-05-21 8:19 UTC (permalink / raw)
To: Greg Kurz
Cc: linux-kernel, Max Reitz, virtio-fs, linux-fsdevel, virtualization,
Vivek Goyal

On Thu, May 20, 2021 at 05:46:51PM +0200, Greg Kurz wrote:
> We don't set the SB_BORN flag on submounts superblocks. This is wrong
> as these superblocks are then considered as partially constructed or
> dying in the rest of the code and can break some assumptions.
>
> One such case is when you have a virtiofs filesystem and you try to
> mount it again : virtio_fs_get_tree() tries to obtain a superblock
> with sget_fc(). The matching criteria in virtio_fs_test_super() is
> the pointer of the underlying virtiofs device, which is shared by
> the root mount and its submounts. This means that any submount can
> be picked up instead of the root mount. This is itself a bug :
> submounts should be ignored in this case. But, most importantly, it
> then triggers an infinite loop in sget_fc() because it fails to grab
> the superblock (very easy to reproduce).
>
> The only viable solution is to set SB_BORN at some point. This
> must be done with vfs_get_tree() because setting SB_BORN requires
> special care, i.e. a memory barrier for super_cache_count() which
> can check SB_BORN without taking any lock.

Looks correct, but...

as an easily backportable and verifiable bugfix I'd still go with the
simple two liner:

--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -351,6 +351,9 @@ static struct vfsmount *fuse_dentry_automount(struct path *path)
 	list_add_tail(&fm->fc_entry, &fc->mounts);
 	up_write(&fc->killsb);
 
+	smp_wmb();
+	sb->s_flags |= SB_BORN;
+
 	/* Create the submount */
 	mnt = vfs_create_mount(fsc);
 	if (IS_ERR(mnt)) {

And have this patch be the cleanup.

Also we need Fixes: and a Cc: stable@... tags on that one.

Thanks,
Miklos
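The reason a barrier has to precede the SB_BORN store can be illustrated in user space with C11 atomics: the writer finishes initializing the object, issues a release fence, and only then publishes the flag, while a lockless reader checks the flag and issues an acquire fence before touching the data. This is a rough analogue and not the kernel code: struct model_sb, MODEL_SB_BORN and the model_* functions are hypothetical names, and the C11 fences are only approximate counterparts of smp_wmb() and of the read-side barrier a lockless SB_BORN check needs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MODEL_SB_BORN (1u << 29)	/* model flag bit, chosen for illustration */

/* Hypothetical stand-in for a superblock; not the kernel struct. */
struct model_sb {
	int payload;		/* stands for fields set up before the sb is "born" */
	atomic_uint s_flags;
};

/* Writer: finish initialization, then publish the flag. */
static void model_publish(struct model_sb *sb, int payload)
{
	sb->payload = payload;
	/* Rough analogue of smp_wmb(): order the payload store before the flag. */
	atomic_thread_fence(memory_order_release);
	atomic_fetch_or_explicit(&sb->s_flags, MODEL_SB_BORN,
				 memory_order_relaxed);
}

/* Lockless reader: only trust the payload once the flag is observed. */
static bool model_try_read(struct model_sb *sb, int *out)
{
	unsigned int flags = atomic_load_explicit(&sb->s_flags,
						  memory_order_relaxed);

	if (!(flags & MODEL_SB_BORN))
		return false;	/* not born yet: treat as partially constructed */

	/* Pairs with the release fence in model_publish(). */
	atomic_thread_fence(memory_order_acquire);
	*out = sb->payload;
	return true;
}
```

A reader that runs before model_publish() gets false and leaves the object alone; one that observes the flag is guaranteed to see the payload written before it.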
* Re: [Virtio-fs] [PATCH v4 2/5] fuse: Call vfs_get_tree() for submounts
2021-05-21  8:19 ` Miklos Szeredi
(?)
@ 2021-05-21  8:28 ` Greg Kurz
-1 siblings, 0 replies; 83+ messages in thread
From: Greg Kurz @ 2021-05-21 8:28 UTC (permalink / raw)
To: Miklos Szeredi
Cc: linux-kernel, Max Reitz, virtio-fs, linux-fsdevel, virtualization,
Vivek Goyal

On Fri, 21 May 2021 10:19:48 +0200
Miklos Szeredi <miklos@szeredi.hu> wrote:

> On Thu, May 20, 2021 at 05:46:51PM +0200, Greg Kurz wrote:
> > We don't set the SB_BORN flag on submounts superblocks. This is wrong
> > as these superblocks are then considered as partially constructed or
> > dying in the rest of the code and can break some assumptions.
> >
> > One such case is when you have a virtiofs filesystem and you try to
> > mount it again : virtio_fs_get_tree() tries to obtain a superblock
> > with sget_fc(). The matching criteria in virtio_fs_test_super() is
> > the pointer of the underlying virtiofs device, which is shared by
> > the root mount and its submounts. This means that any submount can
> > be picked up instead of the root mount. This is itself a bug :
> > submounts should be ignored in this case. But, most importantly, it
> > then triggers an infinite loop in sget_fc() because it fails to grab
> > the superblock (very easy to reproduce).
> >
> > The only viable solution is to set SB_BORN at some point. This
> > must be done with vfs_get_tree() because setting SB_BORN requires
> > special care, i.e. a memory barrier for super_cache_count() which
> > can check SB_BORN without taking any lock.
>
> Looks correct, but...
>
> as an easily backportable and verifiable bugfix I'd still go with the
> simple two liner:
>
> --- a/fs/fuse/dir.c
> +++ b/fs/fuse/dir.c
> @@ -351,6 +351,9 @@ static struct vfsmount *fuse_dentry_automount(struct path *path)
>  	list_add_tail(&fm->fc_entry, &fc->mounts);
>  	up_write(&fc->killsb);
>
> +	smp_wmb();
> +	sb->s_flags |= SB_BORN;
> +

plus the mandatory comment one must put to justify the need for a
memory barrier.

>  	/* Create the submount */
>  	mnt = vfs_create_mount(fsc);
>  	if (IS_ERR(mnt)) {
>
> And have this patch be the cleanup.
>

Fair enough.

> Also we need Fixes: and a Cc: stable@... tags on that one.
>

Oops, I'll add these in the next round.

> Thanks,
> Miklos
* Re: [PATCH v4 2/5] fuse: Call vfs_get_tree() for submounts @ 2021-05-21 8:28 ` Greg Kurz 0 siblings, 0 replies; 83+ messages in thread From: Greg Kurz @ 2021-05-21 8:28 UTC (permalink / raw) To: Miklos Szeredi Cc: virtualization, linux-fsdevel, linux-kernel, virtio-fs, Stefan Hajnoczi, Max Reitz, Vivek Goyal On Fri, 21 May 2021 10:19:48 +0200 Miklos Szeredi <miklos@szeredi.hu> wrote: > On Thu, May 20, 2021 at 05:46:51PM +0200, Greg Kurz wrote: > > We don't set the SB_BORN flag on submounts superblocks. This is wrong > > as these superblocks are then considered as partially constructed or > > dying in the rest of the code and can break some assumptions. > > > > One such case is when you have a virtiofs filesystem and you try to > > mount it again : virtio_fs_get_tree() tries to obtain a superblock > > with sget_fc(). The matching criteria in virtio_fs_test_super() is > > the pointer of the underlying virtiofs device, which is shared by > > the root mount and its submounts. This means that any submount can > > be picked up instead of the root mount. This is itself a bug : > > submounts should be ignored in this case. But, most importantly, it > > then triggers an infinite loop in sget_fc() because it fails to grab > > the superblock (very easy to reproduce). > > > > The only viable solution is to set SB_BORN at some point. This > > must be done with vfs_get_tree() because setting SB_BORN requires > > special care, i.e. a memory barrier for super_cache_count() which > > can check SB_BORN without taking any lock. > > Looks correct, but... > > as an easily backportable and verifiable bugfix I'd still go with the > simple two liner: > > --- a/fs/fuse/dir.c > +++ b/fs/fuse/dir.c > @@ -351,6 +351,9 @@ static struct vfsmount *fuse_dentry_automount(struct path *path) > list_add_tail(&fm->fc_entry, &fc->mounts); > up_write(&fc->killsb); > > + smp_wmb(); > + sb->s_flags |= SB_BORN; > + plus the mandatory comment one must put to justify the need for a memory barrier. 
> /* Create the submount */ > mnt = vfs_create_mount(fsc); > if (IS_ERR(mnt)) { > > And have this patch be the cleanup. > Fair enough. > Also we need Fixes: and a Cc: stable@... tags on that one. > Oops, I'll add these in the next round. > Thanks, > Miklos ^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: [PATCH v4 2/5] fuse: Call vfs_get_tree() for submounts
@ 2021-05-21 8:28 ` Greg Kurz
0 siblings, 0 replies; 83+ messages in thread
From: Greg Kurz @ 2021-05-21 8:28 UTC (permalink / raw)
To: Miklos Szeredi
Cc: linux-kernel, Max Reitz, virtio-fs, Stefan Hajnoczi, linux-fsdevel,
virtualization, Vivek Goyal

On Fri, 21 May 2021 10:19:48 +0200
Miklos Szeredi <miklos@szeredi.hu> wrote:

> On Thu, May 20, 2021 at 05:46:51PM +0200, Greg Kurz wrote:
> > We don't set the SB_BORN flag on submounts superblocks. This is wrong
> > as these superblocks are then considered as partially constructed or
> > dying in the rest of the code and can break some assumptions.
> >
> > One such case is when you have a virtiofs filesystem and you try to
> > mount it again : virtio_fs_get_tree() tries to obtain a superblock
> > with sget_fc(). The matching criteria in virtio_fs_test_super() is
> > the pointer of the underlying virtiofs device, which is shared by
> > the root mount and its submounts. This means that any submount can
> > be picked up instead of the root mount. This is itself a bug :
> > submounts should be ignored in this case. But, most importantly, it
> > then triggers an infinite loop in sget_fc() because it fails to grab
> > the superblock (very easy to reproduce).
> >
> > The only viable solution is to set SB_BORN at some point. This
> > must be done with vfs_get_tree() because setting SB_BORN requires
> > special care, i.e. a memory barrier for super_cache_count() which
> > can check SB_BORN without taking any lock.
>
> Looks correct, but...
>
> as an easily backportable and verifiable bugfix I'd still go with the
> simple two liner:
>
> --- a/fs/fuse/dir.c
> +++ b/fs/fuse/dir.c
> @@ -351,6 +351,9 @@ static struct vfsmount *fuse_dentry_automount(struct path *path)
> 	list_add_tail(&fm->fc_entry, &fc->mounts);
> 	up_write(&fc->killsb);
>
> +	smp_wmb();
> +	sb->s_flags |= SB_BORN;
> +

plus the mandatory comment one must put to justify the need for a memory
barrier.

> 	/* Create the submount */
> 	mnt = vfs_create_mount(fsc);
> 	if (IS_ERR(mnt)) {
>
> And have this patch be the cleanup.
>

Fair enough.

> Also we need Fixes: and a Cc: stable@... tags on that one.
>

Oops, I'll add these in the next round.

> Thanks,
> Miklos

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply [flat|nested] 83+ messages in thread
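The ordering requirement discussed in this message (finish setting up the superblock, issue a write barrier, and only then set SB_BORN so that a lockless reader never sees a half-built object) can be sketched in userspace C11. This is only an analogy, not kernel code: `demo_super`, `DEMO_SB_BORN`, `demo_mark_born()` and `demo_read_if_born()` are hypothetical names, and the C11 release/acquire fences are assumed here as a rough stand-in for `smp_wmb()` and its reader-side counterpart.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag bit, standing in for SB_BORN in this demo only. */
#define DEMO_SB_BORN (1u << 29)

/* Hypothetical miniature of a superblock: some state plus a flags word. */
struct demo_super {
	int payload;        /* stands in for the fully initialised contents */
	atomic_uint flags;  /* stands in for sb->s_flags */
};

/* Writer: complete the initialisation, then publish the "born" flag. */
static void demo_mark_born(struct demo_super *sb, int payload)
{
	sb->payload = payload;
	/*
	 * Userspace analogue of the write barrier: the store to payload
	 * above is ordered before the flag update below.
	 */
	atomic_thread_fence(memory_order_release);
	atomic_fetch_or_explicit(&sb->flags, DEMO_SB_BORN,
				 memory_order_relaxed);
}

/* Lockless reader: only look at the contents once the flag is visible. */
static bool demo_read_if_born(struct demo_super *sb, int *out)
{
	unsigned int f = atomic_load_explicit(&sb->flags,
					      memory_order_relaxed);

	if (!(f & DEMO_SB_BORN))
		return false;   /* treated as partially constructed */
	/* Pairs with the release fence in demo_mark_born(). */
	atomic_thread_fence(memory_order_acquire);
	*out = sb->payload;
	return true;
}
```

A caller would zero-initialise the structure (for example with `atomic_init(&sb.flags, 0)`), call `demo_mark_born()` from the constructing thread, and poll `demo_read_if_born()` from any other thread without taking a lock.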
* Re: [PATCH v4 2/5] fuse: Call vfs_get_tree() for submounts
2021-05-20 15:46 ` Greg Kurz
(?)
(?)
@ 2021-05-22 17:50 ` kernel test robot
-1 siblings, 0 replies; 83+ messages in thread
From: kernel test robot @ 2021-05-22 17:50 UTC (permalink / raw)
To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 5387 bytes --]

Hi Greg,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on fuse/for-next]
[also build test WARNING on linux/master linus/master v5.13-rc2 next-20210521]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url: https://github.com/0day-ci/linux/commits/Greg-Kurz/virtiofs-propagate-sync-to-file-server/20210522-210652
base: https://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse.git for-next
config: nds32-randconfig-r011-20210522 (attached as .config)
compiler: nds32le-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/ee3cc45c5a2311efc82021bd5463271507bef828
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Greg-Kurz/virtiofs-propagate-sync-to-file-server/20210522-210652
        git checkout ee3cc45c5a2311efc82021bd5463271507bef828
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=nds32

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   fs/fuse/dir.c: In function 'fuse_dentry_automount':
>> fs/fuse/dir.c:312:21: warning: variable 'fm' set but not used [-Wunused-but-set-variable]
     312 |  struct fuse_mount *fm;
         |                     ^~

vim +/fm +312 fs/fuse/dir.c

8fab010644363f Miklos Szeredi 2018-08-15  303
bf109c64040f5b Max Reitz      2020-04-21  304  /*
bf109c64040f5b Max Reitz      2020-04-21  305   * Create a fuse_mount object with a new superblock (with path->dentry
bf109c64040f5b Max Reitz      2020-04-21  306   * as the root), and return that mount so it can be auto-mounted on
bf109c64040f5b Max Reitz      2020-04-21  307   * @path.
bf109c64040f5b Max Reitz      2020-04-21  308   */
bf109c64040f5b Max Reitz      2020-04-21  309  static struct vfsmount *fuse_dentry_automount(struct path *path)
bf109c64040f5b Max Reitz      2020-04-21  310  {
bf109c64040f5b Max Reitz      2020-04-21  311  	struct fs_context *fsc;
bf109c64040f5b Max Reitz      2020-04-21 @312  	struct fuse_mount *fm;
bf109c64040f5b Max Reitz      2020-04-21  313  	struct vfsmount *mnt;
bf109c64040f5b Max Reitz      2020-04-21  314  	struct fuse_inode *mp_fi = get_fuse_inode(d_inode(path->dentry));
bf109c64040f5b Max Reitz      2020-04-21  315  	int err;
bf109c64040f5b Max Reitz      2020-04-21  316
bf109c64040f5b Max Reitz      2020-04-21  317  	fsc = fs_context_for_submount(path->mnt->mnt_sb->s_type, path->dentry);
bf109c64040f5b Max Reitz      2020-04-21  318  	if (IS_ERR(fsc)) {
bf109c64040f5b Max Reitz      2020-04-21  319  		err = PTR_ERR(fsc);
bf109c64040f5b Max Reitz      2020-04-21  320  		goto out;
bf109c64040f5b Max Reitz      2020-04-21  321  	}
bf109c64040f5b Max Reitz      2020-04-21  322
ee3cc45c5a2311 Greg Kurz      2021-05-20  323  	/*
ee3cc45c5a2311 Greg Kurz      2021-05-20  324  	 * Hijack fsc->fs_private to pass the mount point inode to
ee3cc45c5a2311 Greg Kurz      2021-05-20  325  	 * fuse_get_tree_submount(). It *must* be NULLified afterwards
ee3cc45c5a2311 Greg Kurz      2021-05-20  326  	 * to avoid the inode pointer to be passed to kfree() when
ee3cc45c5a2311 Greg Kurz      2021-05-20  327  	 * the context gets freed.
ee3cc45c5a2311 Greg Kurz      2021-05-20  328  	 */
ee3cc45c5a2311 Greg Kurz      2021-05-20  329  	fsc->fs_private = mp_fi;
ee3cc45c5a2311 Greg Kurz      2021-05-20  330  	err = vfs_get_tree(fsc);
ee3cc45c5a2311 Greg Kurz      2021-05-20  331  	fsc->fs_private = NULL;
ee3cc45c5a2311 Greg Kurz      2021-05-20  332  	if (err)
bf109c64040f5b Max Reitz      2020-04-21  333  		goto out_put_fsc;
bf109c64040f5b Max Reitz      2020-04-21  334
ee3cc45c5a2311 Greg Kurz      2021-05-20  335  	fm = get_fuse_mount_super(fsc->root->d_sb);
bf109c64040f5b Max Reitz      2020-04-21  336
bf109c64040f5b Max Reitz      2020-04-21  337  	/* Create the submount */
bf109c64040f5b Max Reitz      2020-04-21  338  	mnt = vfs_create_mount(fsc);
bf109c64040f5b Max Reitz      2020-04-21  339  	if (IS_ERR(mnt)) {
bf109c64040f5b Max Reitz      2020-04-21  340  		err = PTR_ERR(mnt);
bf109c64040f5b Max Reitz      2020-04-21  341  		goto out_put_fsc;
bf109c64040f5b Max Reitz      2020-04-21  342  	}
bf109c64040f5b Max Reitz      2020-04-21  343  	mntget(mnt);
bf109c64040f5b Max Reitz      2020-04-21  344  	put_fs_context(fsc);
bf109c64040f5b Max Reitz      2020-04-21  345  	return mnt;
bf109c64040f5b Max Reitz      2020-04-21  346
bf109c64040f5b Max Reitz      2020-04-21  347  out_put_fsc:
bf109c64040f5b Max Reitz      2020-04-21  348  	put_fs_context(fsc);
bf109c64040f5b Max Reitz      2020-04-21  349  out:
bf109c64040f5b Max Reitz      2020-04-21  350  	return ERR_PTR(err);
bf109c64040f5b Max Reitz      2020-04-21  351  }
bf109c64040f5b Max Reitz      2020-04-21  352

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 21559 bytes --]

^ permalink raw reply [flat|nested] 83+ messages in thread
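The warning in this report follows a common pattern: a local is still assigned after a refactoring, but nothing reads it any more. A minimal userspace sketch of that pattern, and of the usual remedy of dropping the local, might look like the following. The types and function names (`demo_mount`, `demo_ctx`, `demo_automount_warns()`, `demo_automount_clean()`) are hypothetical and only mimic the shape of the reported code; the actual fix chosen for fs/fuse/dir.c is not shown in this message.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures named in the report. */
struct demo_mount { int id; };
struct demo_ctx { struct demo_mount *root; };

#ifdef DEMO_SHOW_WARNING
/*
 * Compiled only with -DDEMO_SHOW_WARNING. With GCC and -Wall this is
 * expected to raise -Wunused-but-set-variable: 'm' is written but never read.
 */
static int demo_automount_warns(struct demo_ctx *ctx)
{
	struct demo_mount *m;

	if (!ctx->root)
		return -1;
	m = ctx->root;  /* set, never used afterwards */
	return 0;
}
#endif

/* Same behaviour with the dead local removed, so nothing is left to warn about. */
static int demo_automount_clean(struct demo_ctx *ctx)
{
	if (!ctx->root)
		return -1;
	return 0;
}
```

Building the file once normally and once with `-DDEMO_SHOW_WARNING -Wall` shows the difference between the two variants.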
* Re: [PATCH v4 2/5] fuse: Call vfs_get_tree() for submounts
2021-05-20 15:46 ` Greg Kurz
(?)
(?)
@ 2021-05-22 20:12 ` kernel test robot
-1 siblings, 0 replies; 83+ messages in thread
From: kernel test robot @ 2021-05-22 20:12 UTC (permalink / raw)
To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 5729 bytes --]

Hi Greg,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on fuse/for-next]
[also build test WARNING on linux/master linus/master v5.13-rc2 next-20210521]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url: https://github.com/0day-ci/linux/commits/Greg-Kurz/virtiofs-propagate-sync-to-file-server/20210522-210652
base: https://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse.git for-next
config: sparc64-randconfig-p002-20210522 (attached as .config)
compiler: sparc64-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/ee3cc45c5a2311efc82021bd5463271507bef828
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Greg-Kurz/virtiofs-propagate-sync-to-file-server/20210522-210652
        git checkout ee3cc45c5a2311efc82021bd5463271507bef828
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=sparc64

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   fs/fuse/dir.c: In function 'fuse_dentry_automount':
>> fs/fuse/dir.c:312:21: warning: variable 'fm' set but not used [-Wunused-but-set-variable]
     312 |  struct fuse_mount *fm;
         |                     ^~

Kconfig warnings: (for reference only)
   WARNING: unmet direct dependencies detected for FRAME_POINTER
   Depends on DEBUG_KERNEL && (M68K || UML || SUPERH) || ARCH_WANT_FRAME_POINTERS || MCOUNT
   Selected by
   - LOCKDEP && DEBUG_KERNEL && LOCK_DEBUGGING_SUPPORT && !MIPS && !PPC && !ARM && !S390 && !MICROBLAZE && !ARC && !X86

vim +/fm +312 fs/fuse/dir.c

8fab010644363f Miklos Szeredi 2018-08-15  303
bf109c64040f5b Max Reitz      2020-04-21  304  /*
bf109c64040f5b Max Reitz      2020-04-21  305   * Create a fuse_mount object with a new superblock (with path->dentry
bf109c64040f5b Max Reitz      2020-04-21  306   * as the root), and return that mount so it can be auto-mounted on
bf109c64040f5b Max Reitz      2020-04-21  307   * @path.
bf109c64040f5b Max Reitz      2020-04-21  308   */
bf109c64040f5b Max Reitz      2020-04-21  309  static struct vfsmount *fuse_dentry_automount(struct path *path)
bf109c64040f5b Max Reitz      2020-04-21  310  {
bf109c64040f5b Max Reitz      2020-04-21  311  	struct fs_context *fsc;
bf109c64040f5b Max Reitz      2020-04-21 @312  	struct fuse_mount *fm;
bf109c64040f5b Max Reitz      2020-04-21  313  	struct vfsmount *mnt;
bf109c64040f5b Max Reitz      2020-04-21  314  	struct fuse_inode *mp_fi = get_fuse_inode(d_inode(path->dentry));
bf109c64040f5b Max Reitz      2020-04-21  315  	int err;
bf109c64040f5b Max Reitz      2020-04-21  316
bf109c64040f5b Max Reitz      2020-04-21  317  	fsc = fs_context_for_submount(path->mnt->mnt_sb->s_type, path->dentry);
bf109c64040f5b Max Reitz      2020-04-21  318  	if (IS_ERR(fsc)) {
bf109c64040f5b Max Reitz      2020-04-21  319  		err = PTR_ERR(fsc);
bf109c64040f5b Max Reitz      2020-04-21  320  		goto out;
bf109c64040f5b Max Reitz      2020-04-21  321  	}
bf109c64040f5b Max Reitz      2020-04-21  322
ee3cc45c5a2311 Greg Kurz      2021-05-20  323  	/*
ee3cc45c5a2311 Greg Kurz      2021-05-20  324  	 * Hijack fsc->fs_private to pass the mount point inode to
ee3cc45c5a2311 Greg Kurz      2021-05-20  325  	 * fuse_get_tree_submount(). It *must* be NULLified afterwards
ee3cc45c5a2311 Greg Kurz      2021-05-20  326  	 * to avoid the inode pointer to be passed to kfree() when
ee3cc45c5a2311 Greg Kurz      2021-05-20  327  	 * the context gets freed.
ee3cc45c5a2311 Greg Kurz      2021-05-20  328  	 */
ee3cc45c5a2311 Greg Kurz      2021-05-20  329  	fsc->fs_private = mp_fi;
ee3cc45c5a2311 Greg Kurz      2021-05-20  330  	err = vfs_get_tree(fsc);
ee3cc45c5a2311 Greg Kurz      2021-05-20  331  	fsc->fs_private = NULL;
ee3cc45c5a2311 Greg Kurz      2021-05-20  332  	if (err)
bf109c64040f5b Max Reitz      2020-04-21  333  		goto out_put_fsc;
bf109c64040f5b Max Reitz      2020-04-21  334
ee3cc45c5a2311 Greg Kurz      2021-05-20  335  	fm = get_fuse_mount_super(fsc->root->d_sb);
bf109c64040f5b Max Reitz      2020-04-21  336
bf109c64040f5b Max Reitz      2020-04-21  337  	/* Create the submount */
bf109c64040f5b Max Reitz      2020-04-21  338  	mnt = vfs_create_mount(fsc);
bf109c64040f5b Max Reitz      2020-04-21  339  	if (IS_ERR(mnt)) {
bf109c64040f5b Max Reitz      2020-04-21  340  		err = PTR_ERR(mnt);
bf109c64040f5b Max Reitz      2020-04-21  341  		goto out_put_fsc;
bf109c64040f5b Max Reitz      2020-04-21  342  	}
bf109c64040f5b Max Reitz      2020-04-21  343  	mntget(mnt);
bf109c64040f5b Max Reitz      2020-04-21  344  	put_fs_context(fsc);
bf109c64040f5b Max Reitz      2020-04-21  345  	return mnt;
bf109c64040f5b Max Reitz      2020-04-21  346
bf109c64040f5b Max Reitz      2020-04-21  347  out_put_fsc:
bf109c64040f5b Max Reitz      2020-04-21  348  	put_fs_context(fsc);
bf109c64040f5b Max Reitz      2020-04-21  349  out:
bf109c64040f5b Max Reitz      2020-04-21  350  	return ERR_PTR(err);
bf109c64040f5b Max Reitz      2020-04-21  351  }
bf109c64040f5b Max Reitz      2020-04-21  352

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 31248 bytes --]

^ permalink raw reply [flat|nested] 83+ messages in thread
* [Virtio-fs] [PATCH v4 3/5] fuse: Make fuse_fill_super_submount() static
2021-05-20 15:46 ` Greg Kurz
(?)
@ 2021-05-20 15:46 ` Greg Kurz
-1 siblings, 0 replies; 83+ messages in thread
From: Greg Kurz @ 2021-05-20 15:46 UTC (permalink / raw)
To: Miklos Szeredi
Cc: linux-kernel, Max Reitz, virtio-fs, linux-fsdevel, virtualization,
Vivek Goyal

This function used to be called from fuse_dentry_automount(). This code
was moved to fuse_get_tree_submount() in the same file since then. It
is unlikely there will ever be another user. No need to be extern in
this case.

Signed-off-by: Greg Kurz <groug@kaod.org>
---
 fs/fuse/fuse_i.h | 9 ---------
 fs/fuse/inode.c  | 4 ++--
 2 files changed, 2 insertions(+), 11 deletions(-)

diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index d7fcf59a6a0e..e2f5c8617e0d 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -1081,15 +1081,6 @@ void fuse_send_init(struct fuse_mount *fm);
  */
 int fuse_fill_super_common(struct super_block *sb, struct fuse_fs_context *ctx);
 
-/*
- * Fill in superblock for submounts
- * @sb: partially-initialized superblock to fill in
- * @parent_fi: The fuse_inode of the parent filesystem where this submount is
- * mounted
- */
-int fuse_fill_super_submount(struct super_block *sb,
-			     struct fuse_inode *parent_fi);
-
 /*
  * Get the mountable root for the submount
  * @fsc: superblock configuration context
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 74e5205f203c..123b53d1c3c6 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -1275,8 +1275,8 @@ static void fuse_sb_defaults(struct super_block *sb)
 	sb->s_xattr = fuse_no_acl_xattr_handlers;
 }
 
-int fuse_fill_super_submount(struct super_block *sb,
-			     struct fuse_inode *parent_fi)
+static int fuse_fill_super_submount(struct super_block *sb,
+				    struct fuse_inode *parent_fi)
 {
 	struct fuse_mount *fm = get_fuse_mount_super(sb);
 	struct super_block *parent_sb = parent_fi->inode.i_sb;
-- 
2.26.3

^ permalink raw reply related [flat|nested] 83+ messages in thread
* [Virtio-fs] [PATCH v4 4/5] virtiofs: Skip submounts in sget_fc()
2021-05-20 15:46 ` Greg Kurz
(?)
@ 2021-05-20 15:46 ` Greg Kurz
-1 siblings, 0 replies; 83+ messages in thread
From: Greg Kurz @ 2021-05-20 15:46 UTC (permalink / raw)
To: Miklos Szeredi
Cc: linux-kernel, Max Reitz, virtio-fs, linux-fsdevel, virtualization,
Vivek Goyal

All submounts share the same virtio-fs device instance as the root
mount. If the same virtiofs filesystem is mounted again, sget_fc()
is likely to pick up any of these submounts and reuse it instead of
the root mount.

On the server side:

# mkdir ${some_dir}
# mkdir ${some_dir}/mnt1
# mount -t tmpfs none ${some_dir}/mnt1
# touch ${some_dir}/mnt1/THIS_IS_MNT1
# mkdir ${some_dir}/mnt2
# mount -t tmpfs none ${some_dir}/mnt2
# touch ${some_dir}/mnt2/THIS_IS_MNT2

On the client side:

# mkdir /mnt/virtiofs1
# mount -t virtiofs myfs /mnt/virtiofs1
# ls /mnt/virtiofs1
mnt1 mnt2
# grep virtiofs /proc/mounts
myfs /mnt/virtiofs1 virtiofs rw,seclabel,relatime 0 0
none on /mnt/mnt1 type virtiofs (rw,relatime,seclabel)
none on /mnt/mnt2 type virtiofs (rw,relatime,seclabel)

And now remount it again:

# mount -t virtiofs myfs /mnt/virtiofs2
# grep virtiofs /proc/mounts
myfs /mnt/virtiofs1 virtiofs rw,seclabel,relatime 0 0
none on /mnt/mnt1 type virtiofs (rw,relatime,seclabel)
none on /mnt/mnt2 type virtiofs (rw,relatime,seclabel)
myfs /mnt/virtiofs2 virtiofs rw,seclabel,relatime 0 0
# ls /mnt/virtiofs2
THIS_IS_MNT2

Submount mnt2 was picked-up instead of the root mount.

Just skip submounts in virtio_fs_test_super().

Signed-off-by: Greg Kurz <groug@kaod.org>
---
 fs/fuse/virtio_fs.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
index e12e5190352c..8962cd033016 100644
--- a/fs/fuse/virtio_fs.c
+++ b/fs/fuse/virtio_fs.c
@@ -1408,6 +1408,11 @@ static int virtio_fs_test_super(struct super_block *sb,
 	struct fuse_mount *fsc_fm = fsc->s_fs_info;
 	struct fuse_mount *sb_fm = get_fuse_mount_super(sb);
 
+
+	/* Skip submounts */
+	if (!list_is_first(&sb_fm->fc_entry, &sb_fm->fc->mounts))
+		return 0;
+
 	return fsc_fm->fc->iq.priv == sb_fm->fc->iq.priv;
 }
 
-- 
2.26.3

^ permalink raw reply related [flat|nested] 83+ messages in thread
* Re: [Virtio-fs] [PATCH v4 4/5] virtiofs: Skip submounts in sget_fc()
2021-05-20 15:46 ` Greg Kurz
@ 2021-05-21 8:26 ` Miklos Szeredi
-1 siblings, 0 replies; 83+ messages in thread
From: Miklos Szeredi @ 2021-05-21 8:26 UTC (permalink / raw)
To: Greg Kurz
Cc: linux-kernel, Max Reitz, virtio-fs-list, linux-fsdevel,
virtualization, Vivek Goyal

On Thu, 20 May 2021 at 17:47, Greg Kurz <groug@kaod.org> wrote:
>
> All submounts share the same virtio-fs device instance as the root
> mount. If the same virtiofs filesystem is mounted again, sget_fc()
> is likely to pick up any of these submounts and reuse it instead of
> the root mount.
>
> On the server side:
>
> # mkdir ${some_dir}
> # mkdir ${some_dir}/mnt1
> # mount -t tmpfs none ${some_dir}/mnt1
> # touch ${some_dir}/mnt1/THIS_IS_MNT1
> # mkdir ${some_dir}/mnt2
> # mount -t tmpfs none ${some_dir}/mnt2
> # touch ${some_dir}/mnt2/THIS_IS_MNT2
>
> On the client side:
>
> # mkdir /mnt/virtiofs1
> # mount -t virtiofs myfs /mnt/virtiofs1
> # ls /mnt/virtiofs1
> mnt1 mnt2
> # grep virtiofs /proc/mounts
> myfs /mnt/virtiofs1 virtiofs rw,seclabel,relatime 0 0
> none on /mnt/mnt1 type virtiofs (rw,relatime,seclabel)
> none on /mnt/mnt2 type virtiofs (rw,relatime,seclabel)
>
> And now remount it again:
>
> # mount -t virtiofs myfs /mnt/virtiofs2
> # grep virtiofs /proc/mounts
> myfs /mnt/virtiofs1 virtiofs rw,seclabel,relatime 0 0
> none on /mnt/mnt1 type virtiofs (rw,relatime,seclabel)
> none on /mnt/mnt2 type virtiofs (rw,relatime,seclabel)
> myfs /mnt/virtiofs2 virtiofs rw,seclabel,relatime 0 0
> # ls /mnt/virtiofs2
> THIS_IS_MNT2
>
> Submount mnt2 was picked-up instead of the root mount.

Why is this a problem?

Thanks,
Miklos

^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: [Virtio-fs] [PATCH v4 4/5] virtiofs: Skip submounts in sget_fc()
2021-05-21 8:26 ` Miklos Szeredi
(?)
@ 2021-05-21 8:39 ` Greg Kurz
-1 siblings, 0 replies; 83+ messages in thread
From: Greg Kurz @ 2021-05-21 8:39 UTC (permalink / raw)
To: Miklos Szeredi
Cc: linux-kernel, Max Reitz, virtio-fs-list, linux-fsdevel,
virtualization, Vivek Goyal

On Fri, 21 May 2021 10:26:27 +0200
Miklos Szeredi <miklos@szeredi.hu> wrote:

> On Thu, 20 May 2021 at 17:47, Greg Kurz <groug@kaod.org> wrote:
> >
> > All submounts share the same virtio-fs device instance as the root
> > mount. If the same virtiofs filesystem is mounted again, sget_fc()
> > is likely to pick up any of these submounts and reuse it instead of
> > the root mount.
> >
> > On the server side:
> >
> > # mkdir ${some_dir}
> > # mkdir ${some_dir}/mnt1
> > # mount -t tmpfs none ${some_dir}/mnt1
> > # touch ${some_dir}/mnt1/THIS_IS_MNT1
> > # mkdir ${some_dir}/mnt2
> > # mount -t tmpfs none ${some_dir}/mnt2
> > # touch ${some_dir}/mnt2/THIS_IS_MNT2
> >
> > On the client side:
> >
> > # mkdir /mnt/virtiofs1
> > # mount -t virtiofs myfs /mnt/virtiofs1
> > # ls /mnt/virtiofs1
> > mnt1 mnt2
> > # grep virtiofs /proc/mounts
> > myfs /mnt/virtiofs1 virtiofs rw,seclabel,relatime 0 0
> > none on /mnt/mnt1 type virtiofs (rw,relatime,seclabel)
> > none on /mnt/mnt2 type virtiofs (rw,relatime,seclabel)
> >
> > And now remount it again:
> >
> > # mount -t virtiofs myfs /mnt/virtiofs2
> > # grep virtiofs /proc/mounts
> > myfs /mnt/virtiofs1 virtiofs rw,seclabel,relatime 0 0
> > none on /mnt/mnt1 type virtiofs (rw,relatime,seclabel)
> > none on /mnt/mnt2 type virtiofs (rw,relatime,seclabel)
> > myfs /mnt/virtiofs2 virtiofs rw,seclabel,relatime 0 0
> > # ls /mnt/virtiofs2
> > THIS_IS_MNT2
> >
> > Submount mnt2 was picked-up instead of the root mount.
>
> Why is this a problem?
>

It seems very weird to mount the same filesystem again
and to end up in one of its submounts. We should have:

# ls /mnt/virtiofs2
mnt1 mnt2

> Thanks,
> Miklos

^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: [Virtio-fs] [PATCH v4 4/5] virtiofs: Skip submounts in sget_fc()
2021-05-21 8:39 ` Greg Kurz
@ 2021-05-21 8:50 ` Miklos Szeredi
-1 siblings, 0 replies; 83+ messages in thread
From: Miklos Szeredi @ 2021-05-21 8:50 UTC (permalink / raw)
To: Greg Kurz
Cc: linux-kernel, Max Reitz, virtio-fs-list, linux-fsdevel,
virtualization, Vivek Goyal

On Fri, 21 May 2021 at 10:39, Greg Kurz <groug@kaod.org> wrote:
>
> On Fri, 21 May 2021 10:26:27 +0200
> Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> > On Thu, 20 May 2021 at 17:47, Greg Kurz <groug@kaod.org> wrote:
> > >
> > > All submounts share the same virtio-fs device instance as the root
> > > mount. If the same virtiofs filesystem is mounted again, sget_fc()
> > > is likely to pick up any of these submounts and reuse it instead of
> > > the root mount.
> > >
> > > On the server side:
> > >
> > > # mkdir ${some_dir}
> > > # mkdir ${some_dir}/mnt1
> > > # mount -t tmpfs none ${some_dir}/mnt1
> > > # touch ${some_dir}/mnt1/THIS_IS_MNT1
> > > # mkdir ${some_dir}/mnt2
> > > # mount -t tmpfs none ${some_dir}/mnt2
> > > # touch ${some_dir}/mnt2/THIS_IS_MNT2
> > >
> > > On the client side:
> > >
> > > # mkdir /mnt/virtiofs1
> > > # mount -t virtiofs myfs /mnt/virtiofs1
> > > # ls /mnt/virtiofs1
> > > mnt1 mnt2
> > > # grep virtiofs /proc/mounts
> > > myfs /mnt/virtiofs1 virtiofs rw,seclabel,relatime 0 0
> > > none on /mnt/mnt1 type virtiofs (rw,relatime,seclabel)
> > > none on /mnt/mnt2 type virtiofs (rw,relatime,seclabel)
> > >
> > > And now remount it again:
> > >
> > > # mount -t virtiofs myfs /mnt/virtiofs2
> > > # grep virtiofs /proc/mounts
> > > myfs /mnt/virtiofs1 virtiofs rw,seclabel,relatime 0 0
> > > none on /mnt/mnt1 type virtiofs (rw,relatime,seclabel)
> > > none on /mnt/mnt2 type virtiofs (rw,relatime,seclabel)
> > > myfs /mnt/virtiofs2 virtiofs rw,seclabel,relatime 0 0
> > > # ls /mnt/virtiofs2
> > > THIS_IS_MNT2
> > >
> > > Submount mnt2 was picked-up instead of the root mount.
> > >
> > Why is this a problem?
> >
>
> It seems very weird to mount the same filesystem again
> and to end up in one of its submounts. We should have:
>
> # ls /mnt/virtiofs2
> mnt1 mnt2

Okay, sorry, I understand the problem. The solution is wrong,
however: the position of the submount on that list is no indication
that it's the right one (it's possible that the root sb will go away
and only a sub-sb will remain).

Even just setting a flag in the root, indicating that it's the root
isn't fully going to solve the problem.

Here's issue in full:

case 1: no connection for "myfs" exists
- need to create fuse_conn, sb

case 2: connection for "myfs" exists but only sb for submount
- only create sb for root, reuse fuse_conn

case 3: connection for "myfs" as well as root sb exists
- reuse sb

I'll think about how to fix this properly, it's probably going to be
rather more involved...

Thanks,
Miklos

^ permalink raw reply [flat|nested] 83+ messages in thread
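[Editor's sketch] The three cases listed in the message above can be modelled as a small decision function. This is only a user-space illustration, not kernel code: the enum, the function name classify_mount() and its two boolean inputs are hypothetical and do not appear in the patch series or in fs/fuse.

```c
#include <stdbool.h>

/*
 * Hypothetical user-space model of the three mount cases.
 * None of these names exist in the kernel.
 */
enum mount_action {
    CREATE_CONN_AND_SB,   /* case 1: no connection for the tag exists   */
    CREATE_ROOT_SB_ONLY,  /* case 2: connection exists, only sub-sbs    */
    REUSE_ROOT_SB,        /* case 3: connection and root sb both exist  */
};

/*
 * conn_exists:    a connection for the tag (e.g. "myfs") is already set up.
 * root_sb_exists: a superblock for the root of that filesystem still exists.
 */
static enum mount_action classify_mount(bool conn_exists, bool root_sb_exists)
{
    if (!conn_exists)
        return CREATE_CONN_AND_SB;
    if (!root_sb_exists)
        return CREATE_ROOT_SB_ONLY;
    return REUSE_ROOT_SB;
}
```

The point of the model is that the existence of the connection and the existence of the root superblock are independent facts, which is why a single "found a matching sb" test cannot tell case 2 from case 3.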
* Re: [Virtio-fs] [PATCH v4 4/5] virtiofs: Skip submounts in sget_fc()
2021-05-21 8:50 ` Miklos Szeredi
(?)
@ 2021-05-21 10:06 ` Greg Kurz
-1 siblings, 0 replies; 83+ messages in thread
From: Greg Kurz @ 2021-05-21 10:06 UTC (permalink / raw)
To: Miklos Szeredi
Cc: linux-kernel, Max Reitz, virtio-fs-list, linux-fsdevel,
virtualization, Vivek Goyal

On Fri, 21 May 2021 10:50:34 +0200
Miklos Szeredi <miklos@szeredi.hu> wrote:

> On Fri, 21 May 2021 at 10:39, Greg Kurz <groug@kaod.org> wrote:
> >
> > On Fri, 21 May 2021 10:26:27 +0200
> > Miklos Szeredi <miklos@szeredi.hu> wrote:
> >
> > > On Thu, 20 May 2021 at 17:47, Greg Kurz <groug@kaod.org> wrote:
> > > >
> > > > All submounts share the same virtio-fs device instance as the root
> > > > mount. If the same virtiofs filesystem is mounted again, sget_fc()
> > > > is likely to pick up any of these submounts and reuse it instead of
> > > > the root mount.
> > > >
> > > > On the server side:
> > > >
> > > > # mkdir ${some_dir}
> > > > # mkdir ${some_dir}/mnt1
> > > > # mount -t tmpfs none ${some_dir}/mnt1
> > > > # touch ${some_dir}/mnt1/THIS_IS_MNT1
> > > > # mkdir ${some_dir}/mnt2
> > > > # mount -t tmpfs none ${some_dir}/mnt2
> > > > # touch ${some_dir}/mnt2/THIS_IS_MNT2
> > > >
> > > > On the client side:
> > > >
> > > > # mkdir /mnt/virtiofs1
> > > > # mount -t virtiofs myfs /mnt/virtiofs1
> > > > # ls /mnt/virtiofs1
> > > > mnt1 mnt2
> > > > # grep virtiofs /proc/mounts
> > > > myfs /mnt/virtiofs1 virtiofs rw,seclabel,relatime 0 0
> > > > none on /mnt/mnt1 type virtiofs (rw,relatime,seclabel)
> > > > none on /mnt/mnt2 type virtiofs (rw,relatime,seclabel)
> > > >
> > > > And now remount it again:
> > > >
> > > > # mount -t virtiofs myfs /mnt/virtiofs2
> > > > # grep virtiofs /proc/mounts
> > > > myfs /mnt/virtiofs1 virtiofs rw,seclabel,relatime 0 0
> > > > none on /mnt/mnt1 type virtiofs (rw,relatime,seclabel)
> > > > none on /mnt/mnt2 type virtiofs (rw,relatime,seclabel)
> > > > myfs /mnt/virtiofs2 virtiofs rw,seclabel,relatime 0 0
> > > > # ls /mnt/virtiofs2
> > > > THIS_IS_MNT2
> > > >
> > > > Submount mnt2 was picked-up instead of the root mount.
> > > >
> > > Why is this a problem?
> > >
> >
> > It seems very weird to mount the same filesystem again
> > and to end up in one of its submounts. We should have:
> >
> > # ls /mnt/virtiofs2
> > mnt1 mnt2
>
> Okay, sorry, I understand the problem. The solution is wrong,
> however: the position of the submount on that list is no indication
> that it's the right one (it's possible that the root sb will go away
> and only a sub-sb will remain).
>

Ah... I had myself convinced this could not happen, i.e. you can't
unmount a parent sb with a sub-sb still mounted.

How can this happen ?

> Even just setting a flag in the root, indicating that it's the root
> isn't fully going to solve the problem.
>
> Here's issue in full:
>
> case 1: no connection for "myfs" exists
> - need to create fuse_conn, sb
>
> case 2: connection for "myfs" exists but only sb for submount

How would we know this sb isn't a root sb ?

> - only create sb for root, reuse fuse_conn
>
> case 3: connection for "myfs" as well as root sb exists
> - reuse sb
>
> I'll think about how to fix this properly, it's probably going to be
> rather more involved...
>

Sure. BTW I'm wondering why we never reuse sbs for submounts ?

> Thanks,
> Miklos

^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: [Virtio-fs] [PATCH v4 4/5] virtiofs: Skip submounts in sget_fc()
2021-05-21 10:06 ` Greg Kurz
@ 2021-05-21 12:37 ` Miklos Szeredi
-1 siblings, 0 replies; 83+ messages in thread
From: Miklos Szeredi @ 2021-05-21 12:37 UTC (permalink / raw)
To: Greg Kurz
Cc: linux-kernel, Max Reitz, virtio-fs-list, linux-fsdevel,
virtualization, Vivek Goyal

On Fri, 21 May 2021 at 12:06, Greg Kurz <groug@kaod.org> wrote:
>
> On Fri, 21 May 2021 10:50:34 +0200
> Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> > On Fri, 21 May 2021 at 10:39, Greg Kurz <groug@kaod.org> wrote:
> > >
> > > On Fri, 21 May 2021 10:26:27 +0200
> > > Miklos Szeredi <miklos@szeredi.hu> wrote:
> > >
> > > > On Thu, 20 May 2021 at 17:47, Greg Kurz <groug@kaod.org> wrote:
> > > > >
> > > > > All submounts share the same virtio-fs device instance as the root
> > > > > mount. If the same virtiofs filesystem is mounted again, sget_fc()
> > > > > is likely to pick up any of these submounts and reuse it instead of
> > > > > the root mount.
> > > > >
> > > > > On the server side:
> > > > >
> > > > > # mkdir ${some_dir}
> > > > > # mkdir ${some_dir}/mnt1
> > > > > # mount -t tmpfs none ${some_dir}/mnt1
> > > > > # touch ${some_dir}/mnt1/THIS_IS_MNT1
> > > > > # mkdir ${some_dir}/mnt2
> > > > > # mount -t tmpfs none ${some_dir}/mnt2
> > > > > # touch ${some_dir}/mnt2/THIS_IS_MNT2
> > > > >
> > > > > On the client side:
> > > > >
> > > > > # mkdir /mnt/virtiofs1
> > > > > # mount -t virtiofs myfs /mnt/virtiofs1
> > > > > # ls /mnt/virtiofs1
> > > > > mnt1 mnt2
> > > > > # grep virtiofs /proc/mounts
> > > > > myfs /mnt/virtiofs1 virtiofs rw,seclabel,relatime 0 0
> > > > > none on /mnt/mnt1 type virtiofs (rw,relatime,seclabel)
> > > > > none on /mnt/mnt2 type virtiofs (rw,relatime,seclabel)
> > > > >
> > > > > And now remount it again:
> > > > >
> > > > > # mount -t virtiofs myfs /mnt/virtiofs2
> > > > > # grep virtiofs /proc/mounts
> > > > > myfs /mnt/virtiofs1 virtiofs rw,seclabel,relatime 0 0
> > > > > none on /mnt/mnt1 type virtiofs (rw,relatime,seclabel)
> > > > > none on /mnt/mnt2 type virtiofs (rw,relatime,seclabel)
> > > > > myfs /mnt/virtiofs2 virtiofs rw,seclabel,relatime 0 0
> > > > > # ls /mnt/virtiofs2
> > > > > THIS_IS_MNT2
> > > > >
> > > > > Submount mnt2 was picked-up instead of the root mount.
> > > > >
> > > > Why is this a problem?
> > > >
> > >
> > > It seems very weird to mount the same filesystem again
> > > and to end up in one of its submounts. We should have:
> > >
> > > # ls /mnt/virtiofs2
> > > mnt1 mnt2
> >
> > Okay, sorry, I understand the problem. The solution is wrong,
> > however: the position of the submount on that list is no indication
> > that it's the right one (it's possible that the root sb will go away
> > and only a sub-sb will remain).
> >
>
> Ah... I had myself convinced this could not happen, i.e. you can't
> unmount a parent sb with a sub-sb still mounted.

No, but it's possible for sub-sb to continue existing after it's no
longer a submount of original mount.

>
> How can this happen ?

E.g. move the submount out of the way, then unmount the parent, or
detach submount (umount -l) while keeping something open in there and
umount the parent.

> > Even just setting a flag in the root, indicating that it's the root
> > isn't fully going to solve the problem.
> >
> > Here's issue in full:
> >
> > case 1: no connection for "myfs" exists
> > - need to create fuse_conn, sb
> >
> > case 2: connection for "myfs" exists but only sb for submount
>
> How would we know this sb isn't a root sb ?
>
> > - only create sb for root, reuse fuse_conn
> >
> > case 3: connection for "myfs" as well as root sb exists
> > - reuse sb
> >
> > I'll think about how to fix this properly, it's probably going to be
> > rather more involved...
> >
>
> Sure. BTW I'm wondering why we never reuse sbs for submounts ?

Right, same general issue.

An sb can be identified by its root nodeid, so I guess the proper fix
to make the root nodeid be the key for virtio_fs_test_super().

Thanks,
Miklos

^ permalink raw reply [flat|nested] 83+ messages in thread
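[Editor's sketch] The idea of keying the superblock lookup on the root nodeid can be illustrated with a small user-space model. It is not the actual virtio_fs_test_super() and makes no claim about how the eventual kernel fix looks: struct model_sb, model_test_super() and model_find_sb() are hypothetical names, and the list walk only stands in for what sget_fc() does with its test callback. The one value taken from the FUSE protocol is the root node ID, which is 1 (FUSE_ROOT_ID).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The FUSE protocol defines FUSE_ROOT_ID as 1; mirrored here for the model. */
#define MODEL_ROOT_NODEID 1

/* Hypothetical stand-in for a superblock: only the fields used as the key. */
struct model_sb {
    const char *tag;       /* virtiofs tag, e.g. "myfs"                   */
    uint64_t root_nodeid;  /* node ID of the inode at this sb's root      */
};

/* Match only if both the tag and the root nodeid agree. */
static int model_test_super(const struct model_sb *sb,
                            const char *tag, uint64_t root_nodeid)
{
    return strcmp(sb->tag, tag) == 0 && sb->root_nodeid == root_nodeid;
}

/* Walk the existing superblocks and return the first match, or NULL. */
static const struct model_sb *model_find_sb(const struct model_sb *sbs,
                                            size_t n, const char *tag,
                                            uint64_t root_nodeid)
{
    for (size_t i = 0; i < n; i++)
        if (model_test_super(&sbs[i], tag, root_nodeid))
            return &sbs[i];
    return NULL;
}
```

With such a key, a second mount of "myfs" looks up (tag, MODEL_ROOT_NODEID) and can only match the root superblock, regardless of where submount superblocks sit in the list. If only submount superblocks remain, the lookup returns NULL, which corresponds to case 2 in the thread. The same lookup with a submount's nodeid would also let submount superblocks be reused.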
* Re: [Virtio-fs] [PATCH v4 4/5] virtiofs: Skip submounts in sget_fc()
2021-05-21 12:37 ` Miklos Szeredi
(?)
@ 2021-05-21 13:36 ` Greg Kurz
-1 siblings, 0 replies; 83+ messages in thread
From: Greg Kurz @ 2021-05-21 13:36 UTC (permalink / raw)
To: Miklos Szeredi
Cc: linux-kernel, Max Reitz, virtio-fs-list, linux-fsdevel,
virtualization, Vivek Goyal

On Fri, 21 May 2021 14:37:25 +0200
Miklos Szeredi <miklos@szeredi.hu> wrote:

> On Fri, 21 May 2021 at 12:06, Greg Kurz <groug@kaod.org> wrote:
> >
> > On Fri, 21 May 2021 10:50:34 +0200
> > Miklos Szeredi <miklos@szeredi.hu> wrote:
> >
> > > On Fri, 21 May 2021 at 10:39, Greg Kurz <groug@kaod.org> wrote:
> > > >
> > > > On Fri, 21 May 2021 10:26:27 +0200
> > > > Miklos Szeredi <miklos@szeredi.hu> wrote:
> > > >
> > > > > On Thu, 20 May 2021 at 17:47, Greg Kurz <groug@kaod.org> wrote:
> > > > > >
> > > > > > All submounts share the same virtio-fs device instance as the root
> > > > > > mount. If the same virtiofs filesystem is mounted again, sget_fc()
> > > > > > is likely to pick up any of these submounts and reuse it instead of
> > > > > > the root mount.
> > > > > >
> > > > > > On the server side:
> > > > > >
> > > > > > # mkdir ${some_dir}
> > > > > > # mkdir ${some_dir}/mnt1
> > > > > > # mount -t tmpfs none ${some_dir}/mnt1
> > > > > > # touch ${some_dir}/mnt1/THIS_IS_MNT1
> > > > > > # mkdir ${some_dir}/mnt2
> > > > > > # mount -t tmpfs none ${some_dir}/mnt2
> > > > > > # touch ${some_dir}/mnt2/THIS_IS_MNT2
> > > > > >
> > > > > > On the client side:
> > > > > >
> > > > > > # mkdir /mnt/virtiofs1
> > > > > > # mount -t virtiofs myfs /mnt/virtiofs1
> > > > > > # ls /mnt/virtiofs1
> > > > > > mnt1 mnt2
> > > > > > # grep virtiofs /proc/mounts
> > > > > > myfs /mnt/virtiofs1 virtiofs rw,seclabel,relatime 0 0
> > > > > > none on /mnt/mnt1 type virtiofs (rw,relatime,seclabel)
> > > > > > none on /mnt/mnt2 type virtiofs (rw,relatime,seclabel)
> > > > > >
> > > > > > And now remount it again:
> > > > > >
> > > > > > # mount -t virtiofs myfs /mnt/virtiofs2
> > > > > > # grep virtiofs /proc/mounts
> > > > > > myfs /mnt/virtiofs1 virtiofs rw,seclabel,relatime 0 0
> > > > > > none on /mnt/mnt1 type virtiofs (rw,relatime,seclabel)
> > > > > > none on /mnt/mnt2 type virtiofs (rw,relatime,seclabel)
> > > > > > myfs /mnt/virtiofs2 virtiofs rw,seclabel,relatime 0 0
> > > > > > # ls /mnt/virtiofs2
> > > > > > THIS_IS_MNT2
> > > > > >
> > > > > > Submount mnt2 was picked-up instead of the root mount.
> > > > > >
> > > > > Why is this a problem?
> > > > >
> > > >
> > > > It seems very weird to mount the same filesystem again
> > > > and to end up in one of its submounts. We should have:
> > > >
> > > > # ls /mnt/virtiofs2
> > > > mnt1 mnt2
> > >
> > > Okay, sorry, I understand the problem. The solution is wrong,
> > > however: the position of the submount on that list is no indication
> > > that it's the right one (it's possible that the root sb will go away
> > > and only a sub-sb will remain).
> > >
> >
> > Ah... I had myself convinced this could not happen, i.e. you can't
> > unmount a parent sb with a sub-sb still mounted.
>
> No, but it's possible for sub-sb to continue existing after it's no
> longer a submount of original mount.
>
> >
> > How can this happen ?
>
> E.g. move the submount out of the way, then unmount the parent, or
> detach submount (umount -l) while keeping something open in there and
> umount the parent.
>

Ok, I get it now. Thanks for the clarification.

> > > Even just setting a flag in the root, indicating that it's the root
> > > isn't fully going to solve the problem.
> > >
> > > Here's issue in full:
> > >
> > > case 1: no connection for "myfs" exists
> > > - need to create fuse_conn, sb
> > >
> > > case 2: connection for "myfs" exists but only sb for submount
> >
> > How would we know this sb isn't a root sb ?
> >
> > > - only create sb for root, reuse fuse_conn
> > >
> > > case 3: connection for "myfs" as well as root sb exists
> > > - reuse sb
> > >
> > > I'll think about how to fix this properly, it's probably going to be
> > > rather more involved...
> > >
> >
> > Sure. BTW I'm wondering why we never reuse sbs for submounts ?
>
> Right, same general issue.
>
> An sb can be identified by its root nodeid, so I guess the proper fix
> to make the root nodeid be the key for virtio_fs_test_super().
>

Cool, I was thinking about doing this exactly. :)

> Thanks,
> Miklos

^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: [PATCH v4 4/5] virtiofs: Skip submounts in sget_fc()
@ 2021-05-21 13:36 ` Greg Kurz
From: Greg Kurz @ 2021-05-21 13:36 UTC
To: Miklos Szeredi
Cc: virtualization, linux-fsdevel, linux-kernel, virtio-fs-list,
Stefan Hajnoczi, Max Reitz, Vivek Goyal

On Fri, 21 May 2021 14:37:25 +0200
Miklos Szeredi <miklos@szeredi.hu> wrote:

> On Fri, 21 May 2021 at 12:06, Greg Kurz <groug@kaod.org> wrote:
> >
> > On Fri, 21 May 2021 10:50:34 +0200
> > Miklos Szeredi <miklos@szeredi.hu> wrote:
> >
> > > On Fri, 21 May 2021 at 10:39, Greg Kurz <groug@kaod.org> wrote:
> > > >
> > > > On Fri, 21 May 2021 10:26:27 +0200
> > > > Miklos Szeredi <miklos@szeredi.hu> wrote:
> > > >
> > > > > On Thu, 20 May 2021 at 17:47, Greg Kurz <groug@kaod.org> wrote:
> > > > > >
> > > > > > All submounts share the same virtio-fs device instance as the root
> > > > > > mount. If the same virtiofs filesystem is mounted again, sget_fc()
> > > > > > is likely to pick up any of these submounts and reuse it instead of
> > > > > > the root mount.
> > > > > >
> > > > > > On the server side:
> > > > > >
> > > > > > # mkdir ${some_dir}
> > > > > > # mkdir ${some_dir}/mnt1
> > > > > > # mount -t tmpfs none ${some_dir}/mnt1
> > > > > > # touch ${some_dir}/mnt1/THIS_IS_MNT1
> > > > > > # mkdir ${some_dir}/mnt2
> > > > > > # mount -t tmpfs none ${some_dir}/mnt2
> > > > > > # touch ${some_dir}/mnt2/THIS_IS_MNT2
> > > > > >
> > > > > > On the client side:
> > > > > >
> > > > > > # mkdir /mnt/virtiofs1
> > > > > > # mount -t virtiofs myfs /mnt/virtiofs1
> > > > > > # ls /mnt/virtiofs1
> > > > > > mnt1 mnt2
> > > > > > # grep virtiofs /proc/mounts
> > > > > > myfs /mnt/virtiofs1 virtiofs rw,seclabel,relatime 0 0
> > > > > > none on /mnt/mnt1 type virtiofs (rw,relatime,seclabel)
> > > > > > none on /mnt/mnt2 type virtiofs (rw,relatime,seclabel)
> > > > > >
> > > > > > And now remount it again:
> > > > > >
> > > > > > # mount -t virtiofs myfs /mnt/virtiofs2
> > > > > > # grep virtiofs /proc/mounts
> > > > > > myfs /mnt/virtiofs1 virtiofs rw,seclabel,relatime 0 0
> > > > > > none on /mnt/mnt1 type virtiofs (rw,relatime,seclabel)
> > > > > > none on /mnt/mnt2 type virtiofs (rw,relatime,seclabel)
> > > > > > myfs /mnt/virtiofs2 virtiofs rw,seclabel,relatime 0 0
> > > > > > # ls /mnt/virtiofs2
> > > > > > THIS_IS_MNT2
> > > > > >
> > > > > > Submount mnt2 was picked-up instead of the root mount.
> > > > >
> > > > > Why is this a problem?
> > > > >
> > > >
> > > > It seems very weird to mount the same filesystem again
> > > > and to end up in one of its submounts. We should have:
> > > >
> > > > # ls /mnt/virtiofs2
> > > > mnt1 mnt2
> > >
> > > Okay, sorry, I understand the problem. The solution is wrong,
> > > however: the position of the submount on that list is no indication
> > > that it's the right one (it's possible that the root sb will go away
> > > and only a sub-sb will remain).
> > >
> >
> > Ah... I had myself convinced this could not happen, i.e. you can't
> > unmount a parent sb with a sub-sb still mounted.
>
> No, but it's possible for sub-sb to continue existing after it's no
> longer a submount of original mount.
> >
> > How can this happen ?
>
> E.g. move the submount out of the way, then unmount the parent, or
> detach submount (umount -l) while keeping something open in there and
> umount the parent.
>

Ok, I get it now. Thanks for the clarification.

> > > Even just setting a flag in the root, indicating that it's the root
> > > isn't fully going to solve the problem.
> > >
> > > Here's issue in full:
> > >
> > > case 1: no connection for "myfs" exists
> > > - need to create fuse_conn, sb
> > >
> > > case 2: connection for "myfs" exists but only sb for submount
> >
> > How would we know this sb isn't a root sb ?
> >
> > > - only create sb for root, reuse fuse_conn
> > >
> > > case 3: connection for "myfs" as well as root sb exists
> > > - reuse sb
> > >
> > > I'll think about how to fix this properly, it's probably going to be
> > > rather more involved...
> > >
> >
> > Sure. BTW I'm wondering why we never reuse sbs for submounts ?
>
> Right, same general issue.
>
> An sb can be identified by its root nodeid, so I guess the proper fix
> to make the root nodeid be the key for virtio_fs_test_super().
>

Cool, I was thinking about doing this exactly. :)

> Thanks,
> Miklos
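The idea that closes this exchange, keying the superblock lookup on the root nodeid, can be illustrated with a small userspace model. This is only a sketch under stated assumptions: struct model_sb, model_test_super() and model_find_sb() are hypothetical names invented here, not kernel structures and not the eventual virtio_fs_test_super() change. It merely shows how the three cases listed in the thread could fall out of a lookup that compares both the device instance and the root nodeid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for a superblock; not the kernel's. */
struct model_sb {
	const void *fs;        /* identifies the virtio-fs device instance */
	uint64_t root_nodeid;  /* nodeid of this superblock's root inode */
};

/* Match only when both the device instance and the root nodeid agree. */
static bool model_test_super(const struct model_sb *sb, const void *fs,
			     uint64_t root_nodeid)
{
	assert(sb != NULL);
	return sb->fs == fs && sb->root_nodeid == root_nodeid;
}

/*
 * Return the first matching superblock, or NULL when none exists and a
 * new one would have to be created (cases 1 and 2 in the thread).
 */
static const struct model_sb *model_find_sb(const struct model_sb *list,
					    size_t n, const void *fs,
					    uint64_t root_nodeid)
{
	for (size_t i = 0; i < n; i++)
		if (model_test_super(&list[i], fs, root_nodeid))
			return &list[i];
	return NULL;
}
```

With such a key, a second mount of "myfs" would look for nodeid 1 (FUSE_ROOT_ID). A list that only holds submount superblocks yields no match, so a fresh root sb would be created while the connection is reused (case 2); if the root sb is present it is found regardless of its position in the list (case 3).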
* [Virtio-fs] [PATCH v4 5/5] virtiofs: propagate sync() to file server
@ 2021-05-20 15:46 ` Greg Kurz
From: Greg Kurz @ 2021-05-20 15:46 UTC
To: Miklos Szeredi
Cc: linux-kernel, Max Reitz, virtio-fs, linux-fsdevel, virtualization,
Robert Krawitz, Vivek Goyal

Even if POSIX doesn't mandate it, linux users legitimately expect
sync() to flush all data and metadata to physical storage when it
is located on the same system. This isn't happening with virtiofs
though : sync() inside the guest returns right away even though
data still needs to be flushed from the host page cache.

This is easily demonstrated by doing the following in the guest:

$ dd if=/dev/zero of=/mnt/foo bs=1M count=5K ; strace -T -e sync sync
5120+0 records in
5120+0 records out
5368709120 bytes (5.4 GB, 5.0 GiB) copied, 5.22224 s, 1.0 GB/s
sync() = 0 <0.024068>
+++ exited with 0 +++

and start the following in the host when the 'dd' command completes
in the guest:

$ strace -T -e fsync /usr/bin/sync virtiofs/foo
fsync(3) = 0 <10.371640>
+++ exited with 0 +++

There are no good reasons not to honor the expected behavior of
sync() actually : it gives an unrealistic impression that virtiofs
is super fast and that data has safely landed on HW, which isn't
the case obviously.

Implement a ->sync_fs() superblock operation that sends a new
FUSE_SYNCFS request type for this purpose. Provision a 64-bit
placeholder for possible future extensions. Since the file
server cannot handle the wait == 0 case, we skip it to avoid a
gratuitous roundtrip. Note that this is per-superblock : a
FUSE_SYNCFS is send for the root mount and for each submount.

Like with FUSE_FSYNC and FUSE_FSYNCDIR, lack of support for
FUSE_SYNCFS in the file server is treated as permanent success.
This ensures compatibility with older file servers : the client
will get the current behavior of sync() not being propagated to
the file server.

Note that such an operation allows the file server to DoS sync().
Since a typical FUSE file server is an untrusted piece of software
running in userspace, this is disabled by default. Only enable it
with virtiofs for now since virtiofsd is supposedly trusted by the
guest kernel.

Reported-by: Robert Krawitz <rlk@redhat.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
---
 fs/fuse/fuse_i.h          |  3 +++
 fs/fuse/inode.c           | 40 +++++++++++++++++++++++++++++++++++++++
 fs/fuse/virtio_fs.c       |  1 +
 include/uapi/linux/fuse.h | 10 +++++++++-
 4 files changed, 53 insertions(+), 1 deletion(-)

diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index e2f5c8617e0d..01d9283261af 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -761,6 +761,9 @@ struct fuse_conn {
 	/* Auto-mount submounts announced by the server */
 	unsigned int auto_submounts:1;
 
+	/* Propagate syncfs() to server */
+	unsigned int sync_fs:1;
+
 	/** The number of requests waiting for completion */
 	atomic_t num_waiting;
 
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 123b53d1c3c6..96b00253f766 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -506,6 +506,45 @@ static int fuse_statfs(struct dentry *dentry, struct kstatfs *buf)
 	return err;
 }
 
+static int fuse_sync_fs(struct super_block *sb, int wait)
+{
+	struct fuse_mount *fm = get_fuse_mount_super(sb);
+	struct fuse_conn *fc = fm->fc;
+	struct fuse_syncfs_in inarg;
+	FUSE_ARGS(args);
+	int err;
+
+	/*
+	 * Userspace cannot handle the wait == 0 case. Avoid a
+	 * gratuitous roundtrip.
+	 */
+	if (!wait)
+		return 0;
+
+	/* The filesystem is being unmounted. Nothing to do. */
+	if (!sb->s_root)
+		return 0;
+
+	if (!fc->sync_fs)
+		return 0;
+
+	memset(&inarg, 0, sizeof(inarg));
+	args.in_numargs = 1;
+	args.in_args[0].size = sizeof(inarg);
+	args.in_args[0].value = &inarg;
+	args.opcode = FUSE_SYNCFS;
+	args.nodeid = get_node_id(sb->s_root->d_inode);
+	args.out_numargs = 0;
+
+	err = fuse_simple_request(fm, &args);
+	if (err == -ENOSYS) {
+		fc->sync_fs = 0;
+		err = 0;
+	}
+
+	return err;
+}
+
 enum {
 	OPT_SOURCE,
 	OPT_SUBTYPE,
@@ -909,6 +948,7 @@ static const struct super_operations fuse_super_operations = {
 	.put_super = fuse_put_super,
 	.umount_begin = fuse_umount_begin,
 	.statfs = fuse_statfs,
+	.sync_fs = fuse_sync_fs,
 	.show_options = fuse_show_options,
 };
 
diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
index 8962cd033016..f649a47efb68 100644
--- a/fs/fuse/virtio_fs.c
+++ b/fs/fuse/virtio_fs.c
@@ -1455,6 +1455,7 @@ static int virtio_fs_get_tree(struct fs_context *fsc)
 	fc->release = fuse_free_conn;
 	fc->delete_stale = true;
 	fc->auto_submounts = true;
+	fc->sync_fs = true;
 
 	/* Tell FUSE to split requests that exceed the virtqueue's size */
 	fc->max_pages_limit = min_t(unsigned int, fc->max_pages_limit,
diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
index 271ae90a9bb7..36ed092227fa 100644
--- a/include/uapi/linux/fuse.h
+++ b/include/uapi/linux/fuse.h
@@ -181,6 +181,9 @@
  * - add FUSE_OPEN_KILL_SUIDGID
  * - extend fuse_setxattr_in, add FUSE_SETXATTR_EXT
  * - add FUSE_SETXATTR_ACL_KILL_SGID
+ *
+ * 7.34
+ * - add FUSE_SYNCFS
  */
 
 #ifndef _LINUX_FUSE_H
@@ -216,7 +219,7 @@
 #define FUSE_KERNEL_VERSION 7
 
 /** Minor version number of this interface */
-#define FUSE_KERNEL_MINOR_VERSION 33
+#define FUSE_KERNEL_MINOR_VERSION 34
 
 /** The node ID of the root inode */
 #define FUSE_ROOT_ID 1
@@ -509,6 +512,7 @@ enum fuse_opcode {
 	FUSE_COPY_FILE_RANGE = 47,
 	FUSE_SETUPMAPPING = 48,
 	FUSE_REMOVEMAPPING = 49,
+	FUSE_SYNCFS = 50,
 
 	/* CUSE specific operations */
 	CUSE_INIT = 4096,
@@ -971,4 +975,8 @@ struct fuse_removemapping_one {
 #define FUSE_REMOVEMAPPING_MAX_ENTRY \
 		(PAGE_SIZE / sizeof(struct fuse_removemapping_one))
 
+struct fuse_syncfs_in {
+	uint64_t padding;
+};
+
 #endif /* _LINUX_FUSE_H */
-- 
2.26.3
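The control flow that the commit message describes for fuse_sync_fs() can be exercised outside the kernel with a small model. This is a sketch, not the kernel code: struct model_conn, model_send_syncfs() and model_sync_fs() are hypothetical stand-ins invented for illustration, the s_root check is left out, and the server's answer is faked with a plain integer. It illustrates two points from the patch: the wait == 0 case never reaches the server, and an -ENOSYS reply turns the feature off for good while still reporting success.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the connection state; not struct fuse_conn. */
struct model_conn {
	bool sync_fs;       /* mirrors the fc->sync_fs bit from the patch */
	int requests_sent;  /* instrumentation for this model only */
	int server_reply;   /* faked server answer: 0 or a negative errno */
};

/* Pretend to send FUSE_SYNCFS and return whatever the fake server says. */
static int model_send_syncfs(struct model_conn *fc)
{
	fc->requests_sent++;
	return fc->server_reply;
}

/* Same decision sequence as the patch, minus the unmount check. */
static int model_sync_fs(struct model_conn *fc, int wait)
{
	int err;

	assert(fc != NULL);

	/* The server cannot handle wait == 0, so skip the roundtrip. */
	if (!wait)
		return 0;

	/* Disabled by default, or disabled after an earlier -ENOSYS. */
	if (!fc->sync_fs)
		return 0;

	err = model_send_syncfs(fc);
	if (err == -ENOSYS) {
		/* Older server: remember that and report success. */
		fc->sync_fs = false;
		err = 0;
	}

	return err;
}
```

Other errors from the server (for instance -EIO) are passed back to the caller unchanged and leave the feature enabled, which matches the err handling in the patch.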
* Re: [Virtio-fs] [PATCH v4 5/5] virtiofs: propagate sync() to file server
@ 2021-05-21 10:08 ` Greg Kurz
From: Greg Kurz @ 2021-05-21 10:08 UTC
To: Miklos Szeredi
Cc: linux-kernel, Max Reitz, virtio-fs, linux-fsdevel, virtualization,
Robert Krawitz, Vivek Goyal

On Thu, 20 May 2021 17:46:54 +0200
Greg Kurz <groug@kaod.org> wrote:

> Even if POSIX doesn't mandate it, linux users legitimately expect
> sync() to flush all data and metadata to physical storage when it
> is located on the same system. This isn't happening with virtiofs
> though : sync() inside the guest returns right away even though
> data still needs to be flushed from the host page cache.
>
> This is easily demonstrated by doing the following in the guest:
>
> $ dd if=/dev/zero of=/mnt/foo bs=1M count=5K ; strace -T -e sync sync
> 5120+0 records in
> 5120+0 records out
> 5368709120 bytes (5.4 GB, 5.0 GiB) copied, 5.22224 s, 1.0 GB/s
> sync() = 0 <0.024068>
> +++ exited with 0 +++
>
> and start the following in the host when the 'dd' command completes
> in the guest:
>
> $ strace -T -e fsync /usr/bin/sync virtiofs/foo
> fsync(3) = 0 <10.371640>
> +++ exited with 0 +++
>
> There are no good reasons not to honor the expected behavior of
> sync() actually : it gives an unrealistic impression that virtiofs
> is super fast and that data has safely landed on HW, which isn't
> the case obviously.
>
> Implement a ->sync_fs() superblock operation that sends a new
> FUSE_SYNCFS request type for this purpose. Provision a 64-bit
> placeholder for possible future extensions. Since the file
> server cannot handle the wait == 0 case, we skip it to avoid a
> gratuitous roundtrip. Note that this is per-superblock : a
> FUSE_SYNCFS is send for the root mount and for each submount.
>

s/send/sent

Miklos,

Great thanks for the quick feedback on these patches ! :)

Apart from the fact that nothing is sent for submounts as long
as we don't set SB_BORN on them, this patch doesn't really
depend on the previous ones. If it looks good to you, maybe
you can just merge it and I'll re-post the fixes separately ?

Cheers,

--
Greg

> Like with FUSE_FSYNC and FUSE_FSYNCDIR, lack of support for
> FUSE_SYNCFS in the file server is treated as permanent success.
> This ensures compatibility with older file servers : the client
> will get the current behavior of sync() not being propagated to
> the file server.
>
> Note that such an operation allows the file server to DoS sync().
> Since a typical FUSE file server is an untrusted piece of software
> running in userspace, this is disabled by default. Only enable it
> with virtiofs for now since virtiofsd is supposedly trusted by the
> guest kernel.
>
> Reported-by: Robert Krawitz <rlk@redhat.com>
> Signed-off-by: Greg Kurz <groug@kaod.org>
> ---
>  fs/fuse/fuse_i.h          |  3 +++
>  fs/fuse/inode.c           | 40 +++++++++++++++++++++++++++++++++++++++
>  fs/fuse/virtio_fs.c       |  1 +
>  include/uapi/linux/fuse.h | 10 +++++++++-
>  4 files changed, 53 insertions(+), 1 deletion(-)
>
> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> index e2f5c8617e0d..01d9283261af 100644
> --- a/fs/fuse/fuse_i.h
> +++ b/fs/fuse/fuse_i.h
> @@ -761,6 +761,9 @@ struct fuse_conn {
>  	/* Auto-mount submounts announced by the server */
>  	unsigned int auto_submounts:1;
>
> +	/* Propagate syncfs() to server */
> +	unsigned int sync_fs:1;
> +
>  	/** The number of requests waiting for completion */
>  	atomic_t num_waiting;
>
> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
> index 123b53d1c3c6..96b00253f766 100644
> --- a/fs/fuse/inode.c
> +++ b/fs/fuse/inode.c
> @@ -506,6 +506,45 @@ static int fuse_statfs(struct dentry *dentry, struct kstatfs *buf)
>  	return err;
>  }
>
> +static int fuse_sync_fs(struct super_block *sb, int wait)
> +{
> +	struct fuse_mount *fm = get_fuse_mount_super(sb);
> +	struct fuse_conn *fc = fm->fc;
> +	struct fuse_syncfs_in inarg;
> +	FUSE_ARGS(args);
> +	int err;
> +
> +	/*
> +	 * Userspace cannot handle the wait == 0 case. Avoid a
> +	 * gratuitous roundtrip.
> +	 */
> +	if (!wait)
> +		return 0;
> +
> +	/* The filesystem is being unmounted. Nothing to do. */
> +	if (!sb->s_root)
> +		return 0;
> +
> +	if (!fc->sync_fs)
> +		return 0;
> +
> +	memset(&inarg, 0, sizeof(inarg));
> +	args.in_numargs = 1;
> +	args.in_args[0].size = sizeof(inarg);
> +	args.in_args[0].value = &inarg;
> +	args.opcode = FUSE_SYNCFS;
> +	args.nodeid = get_node_id(sb->s_root->d_inode);
> +	args.out_numargs = 0;
> +
> +	err = fuse_simple_request(fm, &args);
> +	if (err == -ENOSYS) {
> +		fc->sync_fs = 0;
> +		err = 0;
> +	}
> +
> +	return err;
> +}
> +
>  enum {
>  	OPT_SOURCE,
>  	OPT_SUBTYPE,
> @@ -909,6 +948,7 @@ static const struct super_operations fuse_super_operations = {
>  	.put_super = fuse_put_super,
>  	.umount_begin = fuse_umount_begin,
>  	.statfs = fuse_statfs,
> +	.sync_fs = fuse_sync_fs,
>  	.show_options = fuse_show_options,
>  };
>
> diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
> index 8962cd033016..f649a47efb68 100644
> --- a/fs/fuse/virtio_fs.c
> +++ b/fs/fuse/virtio_fs.c
> @@ -1455,6 +1455,7 @@ static int virtio_fs_get_tree(struct fs_context *fsc)
>  	fc->release = fuse_free_conn;
>  	fc->delete_stale = true;
>  	fc->auto_submounts = true;
> +	fc->sync_fs = true;
>
>  	/* Tell FUSE to split requests that exceed the virtqueue's size */
>  	fc->max_pages_limit = min_t(unsigned int, fc->max_pages_limit,
> diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
> index 271ae90a9bb7..36ed092227fa 100644
> --- a/include/uapi/linux/fuse.h
> +++ b/include/uapi/linux/fuse.h
> @@ -181,6 +181,9 @@
>   * - add FUSE_OPEN_KILL_SUIDGID
>   * - extend fuse_setxattr_in, add FUSE_SETXATTR_EXT
>   * - add FUSE_SETXATTR_ACL_KILL_SGID
> + *
> + * 7.34
> + * - add FUSE_SYNCFS
>   */
>
>  #ifndef _LINUX_FUSE_H
> @@ -216,7 +219,7 @@
>  #define FUSE_KERNEL_VERSION 7
>
>  /** Minor version number of this interface */
> -#define FUSE_KERNEL_MINOR_VERSION 33
> +#define FUSE_KERNEL_MINOR_VERSION 34
>
>  /** The node ID of the root inode */
>  #define FUSE_ROOT_ID 1
> @@ -509,6 +512,7 @@ enum fuse_opcode {
>  	FUSE_COPY_FILE_RANGE = 47,
>  	FUSE_SETUPMAPPING = 48,
>  	FUSE_REMOVEMAPPING = 49,
> +	FUSE_SYNCFS = 50,
>
>  	/* CUSE specific operations */
>  	CUSE_INIT = 4096,
> @@ -971,4 +975,8 @@ struct fuse_removemapping_one {
>  #define FUSE_REMOVEMAPPING_MAX_ENTRY \
>  		(PAGE_SIZE / sizeof(struct fuse_removemapping_one))
>
> +struct fuse_syncfs_in {
> +	uint64_t padding;
> +};
> +
>  #endif /* _LINUX_FUSE_H */
* Re: [PATCH v4 5/5] virtiofs: propagate sync() to file server
@ 2021-05-21 10:08 ` Greg Kurz
0 siblings, 0 replies; 83+ messages in thread
From: Greg Kurz @ 2021-05-21 10:08 UTC (permalink / raw)
To: Miklos Szeredi
Cc: virtualization, linux-fsdevel, linux-kernel, virtio-fs,
Stefan Hajnoczi, Max Reitz, Vivek Goyal, Robert Krawitz
On Thu, 20 May 2021 17:46:54 +0200
Greg Kurz <groug@kaod.org> wrote:
> Even if POSIX doesn't mandate it, linux users legitimately expect
> sync() to flush all data and metadata to physical storage when it
> is located on the same system. This isn't happening with virtiofs
> though : sync() inside the guest returns right away even though
> data still needs to be flushed from the host page cache.
>
> This is easily demonstrated by doing the following in the guest:
>
> $ dd if=/dev/zero of=/mnt/foo bs=1M count=5K ; strace -T -e sync sync
> 5120+0 records in
> 5120+0 records out
> 5368709120 bytes (5.4 GB, 5.0 GiB) copied, 5.22224 s, 1.0 GB/s
> sync() = 0 <0.024068>
> +++ exited with 0 +++
>
> and start the following in the host when the 'dd' command completes
> in the guest:
>
> $ strace -T -e fsync /usr/bin/sync virtiofs/foo
> fsync(3) = 0 <10.371640>
> +++ exited with 0 +++
>
> There are no good reasons not to honor the expected behavior of
> sync() actually : it gives an unrealistic impression that virtiofs
> is super fast and that data has safely landed on HW, which isn't
> the case obviously.
>
> Implement a ->sync_fs() superblock operation that sends a new
> FUSE_SYNCFS request type for this purpose. Provision a 64-bit
> placeholder for possible future extensions. Since the file
> server cannot handle the wait == 0 case, we skip it to avoid a
> gratuitous roundtrip. Note that this is per-superblock : a
> FUSE_SYNCFS is send for the root mount and for each submount.
>
s/send/sent
Miklos,
Great thanks for the quick feedback on these patches ! :)
Apart from the fact that nothing is sent for submounts as long
as we don't set SB_BORN on them, this patch doesn't really
depend on the previous ones.
If it looks good to you, maybe you can just merge it and
I'll re-post the fixes separately ?
Cheers,
--
Greg
> Like with FUSE_FSYNC and FUSE_FSYNCDIR, lack of support for
> FUSE_SYNCFS in the file server is treated as permanent success.
> This ensures compatibility with older file servers : the client
> will get the current behavior of sync() not being propagated to
> the file server.
>
> Note that such an operation allows the file server to DoS sync().
> Since a typical FUSE file server is an untrusted piece of software
> running in userspace, this is disabled by default. Only enable it
> with virtiofs for now since virtiofsd is supposedly trusted by the
> guest kernel.
>
> Reported-by: Robert Krawitz <rlk@redhat.com>
> Signed-off-by: Greg Kurz <groug@kaod.org>
> ---
>  fs/fuse/fuse_i.h          |  3 +++
>  fs/fuse/inode.c           | 40 +++++++++++++++++++++++++++++++++++++++
>  fs/fuse/virtio_fs.c       |  1 +
>  include/uapi/linux/fuse.h | 10 +++++++++-
>  4 files changed, 53 insertions(+), 1 deletion(-)
>
> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> index e2f5c8617e0d..01d9283261af 100644
> --- a/fs/fuse/fuse_i.h
> +++ b/fs/fuse/fuse_i.h
> @@ -761,6 +761,9 @@ struct fuse_conn {
>  	/* Auto-mount submounts announced by the server */
>  	unsigned int auto_submounts:1;
>
> +	/* Propagate syncfs() to server */
> +	unsigned int sync_fs:1;
> +
>  	/** The number of requests waiting for completion */
>  	atomic_t num_waiting;
>
> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
> index 123b53d1c3c6..96b00253f766 100644
> --- a/fs/fuse/inode.c
> +++ b/fs/fuse/inode.c
> @@ -506,6 +506,45 @@ static int fuse_statfs(struct dentry *dentry, struct kstatfs *buf)
>  	return err;
>  }
>
> +static int fuse_sync_fs(struct super_block *sb, int wait)
> +{
> +	struct fuse_mount *fm = get_fuse_mount_super(sb);
> +	struct fuse_conn *fc = fm->fc;
> +	struct fuse_syncfs_in inarg;
> +	FUSE_ARGS(args);
> +	int err;
> +
> +	/*
> +	 * Userspace cannot handle the wait == 0 case. Avoid a
> +	 * gratuitous roundtrip.
> +	 */
> +	if (!wait)
> +		return 0;
> +
> +	/* The filesystem is being unmounted. Nothing to do. */
> +	if (!sb->s_root)
> +		return 0;
> +
> +	if (!fc->sync_fs)
> +		return 0;
> +
> +	memset(&inarg, 0, sizeof(inarg));
> +	args.in_numargs = 1;
> +	args.in_args[0].size = sizeof(inarg);
> +	args.in_args[0].value = &inarg;
> +	args.opcode = FUSE_SYNCFS;
> +	args.nodeid = get_node_id(sb->s_root->d_inode);
> +	args.out_numargs = 0;
> +
> +	err = fuse_simple_request(fm, &args);
> +	if (err == -ENOSYS) {
> +		fc->sync_fs = 0;
> +		err = 0;
> +	}
> +
> +	return err;
> +}
> +
>  enum {
>  	OPT_SOURCE,
>  	OPT_SUBTYPE,
> @@ -909,6 +948,7 @@ static const struct super_operations fuse_super_operations = {
>  	.put_super = fuse_put_super,
>  	.umount_begin = fuse_umount_begin,
>  	.statfs = fuse_statfs,
> +	.sync_fs = fuse_sync_fs,
>  	.show_options = fuse_show_options,
>  };
>
> diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
> index 8962cd033016..f649a47efb68 100644
> --- a/fs/fuse/virtio_fs.c
> +++ b/fs/fuse/virtio_fs.c
> @@ -1455,6 +1455,7 @@ static int virtio_fs_get_tree(struct fs_context *fsc)
>  	fc->release = fuse_free_conn;
>  	fc->delete_stale = true;
>  	fc->auto_submounts = true;
> +	fc->sync_fs = true;
>
>  	/* Tell FUSE to split requests that exceed the virtqueue's size */
>  	fc->max_pages_limit = min_t(unsigned int, fc->max_pages_limit,
> diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
> index 271ae90a9bb7..36ed092227fa 100644
> --- a/include/uapi/linux/fuse.h
> +++ b/include/uapi/linux/fuse.h
> @@ -181,6 +181,9 @@
>   * - add FUSE_OPEN_KILL_SUIDGID
>   * - extend fuse_setxattr_in, add FUSE_SETXATTR_EXT
>   * - add FUSE_SETXATTR_ACL_KILL_SGID
> + *
> + * 7.34
> + * - add FUSE_SYNCFS
>   */
>
>  #ifndef _LINUX_FUSE_H
> @@ -216,7 +219,7 @@
>  #define FUSE_KERNEL_VERSION 7
>
>  /** Minor version number of this interface */
> -#define FUSE_KERNEL_MINOR_VERSION 33
> +#define FUSE_KERNEL_MINOR_VERSION 34
>
>  /** The node ID of the root inode */
>  #define FUSE_ROOT_ID 1
> @@ -509,6 +512,7 @@ enum fuse_opcode {
>  	FUSE_COPY_FILE_RANGE = 47,
>  	FUSE_SETUPMAPPING = 48,
>  	FUSE_REMOVEMAPPING = 49,
> +	FUSE_SYNCFS = 50,
>
>  	/* CUSE specific operations */
>  	CUSE_INIT = 4096,
> @@ -971,4 +975,8 @@ struct fuse_removemapping_one {
>  #define FUSE_REMOVEMAPPING_MAX_ENTRY \
>  	(PAGE_SIZE / sizeof(struct fuse_removemapping_one))
>
> +struct fuse_syncfs_in {
> +	uint64_t padding;
> +};
> +
>  #endif /* _LINUX_FUSE_H */
^ permalink raw reply [flat|nested] 83+ messages in thread
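The "permanent success" handling of -ENOSYS in the quoted fuse_sync_fs() can be modelled in plain userspace C. This is only a sketch under stated assumptions, not the kernel code: struct conn_model, struct syncfs_in_model, model_sync_fs() and the two fake servers are hypothetical names introduced for illustration, and the callback stands in for fuse_simple_request().

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct fuse_syncfs_in: one 64-bit placeholder. */
struct syncfs_in_model {
	uint64_t padding;
};

/* Hypothetical stand-in for the relevant part of struct fuse_conn. */
struct conn_model {
	unsigned int sync_fs:1;	/* propagate syncfs() to the server? */
	unsigned int requests;	/* number of requests actually sent */
};

/* Transport callback: returns 0 or a negative errno, like a FUSE reply. */
typedef int (*send_fn)(const struct syncfs_in_model *in);

static int model_sync_fs(struct conn_model *c, int wait, send_fn send)
{
	struct syncfs_in_model inarg;
	int err;

	assert(c != NULL && send != NULL);

	/* wait == 0 is skipped to avoid a gratuitous roundtrip. */
	if (!wait)
		return 0;

	/* Disabled by default, or latched off after an earlier -ENOSYS. */
	if (!c->sync_fs)
		return 0;

	memset(&inarg, 0, sizeof(inarg));
	c->requests++;
	err = send(&inarg);
	if (err == -ENOSYS) {
		/* Older server: remember it and report success from now on. */
		c->sync_fs = 0;
		err = 0;
	}
	return err;
}

/* Fake server that predates the new request type. */
static int old_server(const struct syncfs_in_model *in)
{
	(void)in;
	return -ENOSYS;
}

/* Fake server that accepts the request if the placeholder is zeroed. */
static int new_server(const struct syncfs_in_model *in)
{
	return in->padding == 0 ? 0 : -EINVAL;
}
```

With old_server, the first call with wait != 0 sends one request, clears the flag and returns 0; later calls return 0 without sending anything, which matches the "current behavior" described for older file servers.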
* Re: [Virtio-fs] [PATCH v4 5/5] virtiofs: propagate sync() to file server
2021-05-21 10:08 ` Greg Kurz
@ 2021-05-21 12:51 ` Miklos Szeredi
-1 siblings, 0 replies; 83+ messages in thread
From: Miklos Szeredi @ 2021-05-21 12:51 UTC (permalink / raw)
To: Greg Kurz
Cc: linux-kernel, Max Reitz, virtio-fs-list, linux-fsdevel,
virtualization, Robert Krawitz, Vivek Goyal
On Fri, 21 May 2021 at 12:09, Greg Kurz <groug@kaod.org> wrote:
> If it looks good to you, maybe you can just merge it and
> I'll re-post the fixes separately ?
Looks good, applied.
Thanks,
Miklos
^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: [Virtio-fs] [PATCH v4 5/5] virtiofs: propagate sync() to file server
2021-05-20 15:46 ` Greg Kurz
@ 2021-08-15 14:14 ` Amir Goldstein
-1 siblings, 0 replies; 83+ messages in thread
From: Amir Goldstein @ 2021-08-15 14:14 UTC (permalink / raw)
To: Greg Kurz
Cc: Miklos Szeredi, linux-kernel, virtualization, virtio-fs-list,
linux-fsdevel, Max Reitz, Robert Krawitz, Vivek Goyal
Hi Greg,
Sorry for the late reply, I have some questions about this change...
On Fri, May 21, 2021 at 9:12 AM Greg Kurz <groug@kaod.org> wrote:
>
> Even if POSIX doesn't mandate it, linux users legitimately expect
> sync() to flush all data and metadata to physical storage when it
> is located on the same system. This isn't happening with virtiofs
> though : sync() inside the guest returns right away even though
> data still needs to be flushed from the host page cache.
>
> This is easily demonstrated by doing the following in the guest:
>
> $ dd if=/dev/zero of=/mnt/foo bs=1M count=5K ; strace -T -e sync sync
> 5120+0 records in
> 5120+0 records out
> 5368709120 bytes (5.4 GB, 5.0 GiB) copied, 5.22224 s, 1.0 GB/s
> sync() = 0 <0.024068>
> +++ exited with 0 +++
>
> and start the following in the host when the 'dd' command completes
> in the guest:
>
> $ strace -T -e fsync /usr/bin/sync virtiofs/foo
> fsync(3) = 0 <10.371640>
> +++ exited with 0 +++
>
> There are no good reasons not to honor the expected behavior of
> sync() actually : it gives an unrealistic impression that virtiofs
> is super fast and that data has safely landed on HW, which isn't
> the case obviously.
>
> Implement a ->sync_fs() superblock operation that sends a new
> FUSE_SYNCFS request type for this purpose. Provision a 64-bit
> placeholder for possible future extensions. Since the file
> server cannot handle the wait == 0 case, we skip it to avoid a
> gratuitous roundtrip. Note that this is per-superblock : a
> FUSE_SYNCFS is send for the root mount and for each submount.
>
> Like with FUSE_FSYNC and FUSE_FSYNCDIR, lack of support for
> FUSE_SYNCFS in the file server is treated as permanent success.
> This ensures compatibility with older file servers : the client
> will get the current behavior of sync() not being propagated to
> the file server.
I wonder - even if the server does not support SYNCFS or if the kernel
does not trust the server with SYNCFS, fuse_sync_fs() can wait
until all pending requests up to this call have been completed, either
before or after submitting the SYNCFS request. No?
Does virtiofsd track all requests prior to SYNCFS request to make
sure that they were executed on the host filesystem before calling
syncfs() on the host filesystem?
I am not familiar enough with FUSE internals so there may already
be a mechanism to track/wait for all pending requests?
>
> Note that such an operation allows the file server to DoS sync().
> Since a typical FUSE file server is an untrusted piece of software
> running in userspace, this is disabled by default. Only enable it
> with virtiofs for now since virtiofsd is supposedly trusted by the
> guest kernel.
Isn't there already a similar risk of DoS to sync() from the ability of any
untrusted (or malfunctioning) server to block writes?
Thanks,
Amir.
^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: [Virtio-fs] [PATCH v4 5/5] virtiofs: propagate sync() to file server
2021-08-15 14:14 ` Amir Goldstein
(?)
@ 2021-08-16 15:29 ` Vivek Goyal
-1 siblings, 0 replies; 83+ messages in thread
From: Vivek Goyal @ 2021-08-16 15:29 UTC (permalink / raw)
To: Amir Goldstein
Cc: Miklos Szeredi, linux-kernel, virtualization, virtio-fs-list,
linux-fsdevel, Max Reitz, Robert Krawitz
On Sun, Aug 15, 2021 at 05:14:06PM +0300, Amir Goldstein wrote:
> Hi Greg,
>
> Sorry for the late reply, I have some questions about this change...
>
> On Fri, May 21, 2021 at 9:12 AM Greg Kurz <groug@kaod.org> wrote:
> >
> > Even if POSIX doesn't mandate it, linux users legitimately expect
> > sync() to flush all data and metadata to physical storage when it
> > is located on the same system. This isn't happening with virtiofs
> > though : sync() inside the guest returns right away even though
> > data still needs to be flushed from the host page cache.
> >
> > This is easily demonstrated by doing the following in the guest:
> >
> > $ dd if=/dev/zero of=/mnt/foo bs=1M count=5K ; strace -T -e sync sync
> > 5120+0 records in
> > 5120+0 records out
> > 5368709120 bytes (5.4 GB, 5.0 GiB) copied, 5.22224 s, 1.0 GB/s
> > sync() = 0 <0.024068>
> > +++ exited with 0 +++
> >
> > and start the following in the host when the 'dd' command completes
> > in the guest:
> >
> > $ strace -T -e fsync /usr/bin/sync virtiofs/foo
> > fsync(3) = 0 <10.371640>
> > +++ exited with 0 +++
> >
> > There are no good reasons not to honor the expected behavior of
> > sync() actually : it gives an unrealistic impression that virtiofs
> > is super fast and that data has safely landed on HW, which isn't
> > the case obviously.
> >
> > Implement a ->sync_fs() superblock operation that sends a new
> > FUSE_SYNCFS request type for this purpose. Provision a 64-bit
> > placeholder for possible future extensions. Since the file
> > server cannot handle the wait == 0 case, we skip it to avoid a
> > gratuitous roundtrip. Note that this is per-superblock : a
> > FUSE_SYNCFS is send for the root mount and for each submount.
> >
> > Like with FUSE_FSYNC and FUSE_FSYNCDIR, lack of support for
> > FUSE_SYNCFS in the file server is treated as permanent success.
> > This ensures compatibility with older file servers : the client
> > will get the current behavior of sync() not being propagated to
> > the file server.
>
> I wonder - even if the server does not support SYNCFS or if the kernel
> does not trust the server with SYNCFS, fuse_sync_fs() can wait
> until all pending requests up to this call have been completed, either
> before or after submitting the SYNCFS request. No?
>
> Does virtiofsd track all requests prior to SYNCFS request to make
> sure that they were executed on the host filesystem before calling
> syncfs() on the host filesystem?
Hi Amir,
I don't think virtiofsd has any such notion. I would think that the
client should make sure all pending writes have completed and then
send the SYNCFS request.
Looking at sync_filesystem(), I am assuming the vfs will take care
of flushing out all dirty pages and then call ->sync_fs.
Having said that, I think fuse queues the writeback request internally
and signals completion of writeback to mm (end_page_writeback()). And
that's why fuse_fsync() has the notion of waiting for all pending writes
to finish on an inode (fuse_sync_writes()).
So I think you have raised a good point. That is, if there are pending
writes at the time of syncfs(), we don't seem to have a notion of
first waiting for all these writes to finish before we send the
FUSE_SYNCFS request to the server.
In case of virtiofs, we could probably move away from the notion
of ending writeback immediately. IIUC, this was needed for regular
fuse where we wanted to make sure a rogue/malfunctioning fuse server
could not affect processes on the system which are not dealing with
fuse. But in case of virtiofs, the guest is trusting the file server.
I had tried to get rid of this for virtiofs but ran into some other
issues which I could not resolve easily at the time, and then I got
distracted by other things.
Anyway, irrespective of that, we probably need a way to flush
out all pending writes with fuse and then send FUSE_SYNCFS. (And also
make sure writes coming after the call to fuse_sync_fs() continue to
be queued and we don't livelock.)
BTW, in the context of virtiofs, this is probably a problem only with
mmap()ed writes. Otherwise cache=auto and cache=none are basically
writethrough caches. So a write is sent to the server immediately. So
there is nothing to be written back when syncfs() comes along. But
mmap()ed writes are different and even with cache=auto there can be
dirty pages. (cache=none does not support mmap() at all).
>
> I am not familiar enough with FUSE internals so there may already
> be a mechanism to track/wait for all pending requests?
fuse_sync_writes() does it for an inode. I am not aware of anything
which can do it for the whole filesystem (all the inodes).
>
> >
> > Note that such an operation allows the file server to DoS sync().
> > Since a typical FUSE file server is an untrusted piece of software
> > running in userspace, this is disabled by default. Only enable it
> > with virtiofs for now since virtiofsd is supposedly trusted by the
> > guest kernel.
>
> Isn't there already a similar risk of DoS to sync() from the ability of any
> untrusted (or malfunctioning) server to block writes?
I think fuse has some safeguards for this. Fuse signals completion
of writeback immediately so that vfs/mm/fs does not block trying to
write back, and if the server is not finishing WRITEs fast enough,
then there will be enough dirty pages in the bdi that it will create
back pressure and block the process dirtying pages.
Thanks
Vivek
^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: [Virtio-fs] [PATCH v4 5/5] virtiofs: propagate sync() to file server
  2021-08-16 15:29 ` Vivek Goyal
@ 2021-08-16 18:57 ` Amir Goldstein
  -1 siblings, 0 replies; 83+ messages in thread
From: Amir Goldstein @ 2021-08-16 18:57 UTC (permalink / raw)
To: Vivek Goyal
Cc: Miklos Szeredi, linux-kernel, virtualization, virtio-fs-list,
linux-fsdevel, Max Reitz, Robert Krawitz
On Mon, Aug 16, 2021 at 6:29 PM Vivek Goyal <vgoyal@redhat.com> wrote:
>
> On Sun, Aug 15, 2021 at 05:14:06PM +0300, Amir Goldstein wrote:
> > Hi Greg,
> >
> > Sorry for the late reply, I have some questions about this change...
> >
> > On Fri, May 21, 2021 at 9:12 AM Greg Kurz <groug@kaod.org> wrote:
> > >
> > > Even if POSIX doesn't mandate it, linux users legitimately expect
> > > sync() to flush all data and metadata to physical storage when it
> > > is located on the same system. This isn't happening with virtiofs
> > > though : sync() inside the guest returns right away even though
> > > data still needs to be flushed from the host page cache.
> > >
> > > This is easily demonstrated by doing the following in the guest:
> > >
> > > $ dd if=/dev/zero of=/mnt/foo bs=1M count=5K ; strace -T -e sync sync
> > > 5120+0 records in
> > > 5120+0 records out
> > > 5368709120 bytes (5.4 GB, 5.0 GiB) copied, 5.22224 s, 1.0 GB/s
> > > sync() = 0 <0.024068>
> > > +++ exited with 0 +++
> > >
> > > and start the following in the host when the 'dd' command completes
> > > in the guest:
> > >
> > > $ strace -T -e fsync /usr/bin/sync virtiofs/foo
> > > fsync(3) = 0 <10.371640>
> > > +++ exited with 0 +++
> > >
> > > There are no good reasons not to honor the expected behavior of
> > > sync() actually : it gives an unrealistic impression that virtiofs
> > > is super fast and that data has safely landed on HW, which isn't
> > > the case obviously.
> > >
> > > Implement a ->sync_fs() superblock operation that sends a new
> > > FUSE_SYNCFS request type for this purpose. Provision a 64-bit
> > > placeholder for possible future extensions. Since the file
> > > server cannot handle the wait == 0 case, we skip it to avoid a
> > > gratuitous roundtrip. Note that this is per-superblock : a
> > > FUSE_SYNCFS is sent for the root mount and for each submount.
> > >
> > > Like with FUSE_FSYNC and FUSE_FSYNCDIR, lack of support for
> > > FUSE_SYNCFS in the file server is treated as permanent success.
> > > This ensures compatibility with older file servers : the client
> > > will get the current behavior of sync() not being propagated to
> > > the file server.
> >
> > I wonder - even if the server does not support SYNCFS or if the kernel
> > does not trust the server with SYNCFS, fuse_sync_fs() can wait
> > until all pending requests up to this call have been completed, either
> > before or after submitting the SYNCFS request. No?
> >
>
> > Does virtiofsd track all requests prior to SYNCFS request to make
> > sure that they were executed on the host filesystem before calling
> > syncfs() on the host filesystem?
>
> Hi Amir,
>
> I don't think virtiofsd has any such notion. I would think, that
> client should make sure all pending writes have completed and
> then send SYNCFS request.
>
> Looking at the sync_filesystem(), I am assuming vfs will take care
> of flushing out all dirty pages and then call ->sync_fs.
>
> Having said that, I think fuse queues the writeback request internally
> and signals completion of writeback to mm(end_page_writeback()). And
> that's why fuse_fsync() has notion of waiting for all pending
> writes to finish on an inode (fuse_sync_writes()).
>
> So I think you have raised a good point. That is if there are pending
> writes at the time of syncfs(), we don't seem to have a notion of
> first waiting for all these writes to finish before we send
> FUSE_SYNCFS request to server.

Maybe, but I was not referring to inode writeback requests.
I had assumed that those were handled correctly.
I was referring to pending metadata requests.

->sync_fs() in local fs also takes care of flushing metadata
(e.g. journal). I assumed that virtiofsd implements the FUSE_SYNCFS
request by calling syncfs() on the host fs, but if it does that then
there is no guarantee that all metadata requests have reached the
host fs from virtiofs unless the client or the server takes care of
waiting for all pending metadata requests before issuing FUSE_SYNCFS.

But maybe I am missing something.

It might be worth mentioning that I did not find any sync_fs()
commands that request to flush metadata caches on the server in
NFS or SMB protocols either.

Thanks,
Amir.
^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: [Virtio-fs] [PATCH v4 5/5] virtiofs: propagate sync() to file server
  2021-08-16 18:57 ` Amir Goldstein
  (?)
@ 2021-08-16 19:11 ` Vivek Goyal
  -1 siblings, 0 replies; 83+ messages in thread
From: Vivek Goyal @ 2021-08-16 19:11 UTC (permalink / raw)
To: Amir Goldstein
Cc: Miklos Szeredi, linux-kernel, virtualization, virtio-fs-list,
linux-fsdevel, Max Reitz, Robert Krawitz
On Mon, Aug 16, 2021 at 09:57:08PM +0300, Amir Goldstein wrote:
> On Mon, Aug 16, 2021 at 6:29 PM Vivek Goyal <vgoyal@redhat.com> wrote:
> >
> > On Sun, Aug 15, 2021 at 05:14:06PM +0300, Amir Goldstein wrote:
> > > Hi Greg,
> > >
> > > Sorry for the late reply, I have some questions about this change...
> > >
> > > On Fri, May 21, 2021 at 9:12 AM Greg Kurz <groug@kaod.org> wrote:
> > > >
> > > > Even if POSIX doesn't mandate it, linux users legitimately expect
> > > > sync() to flush all data and metadata to physical storage when it
> > > > is located on the same system. This isn't happening with virtiofs
> > > > though : sync() inside the guest returns right away even though
> > > > data still needs to be flushed from the host page cache.
> > > >
> > > > This is easily demonstrated by doing the following in the guest:
> > > >
> > > > $ dd if=/dev/zero of=/mnt/foo bs=1M count=5K ; strace -T -e sync sync
> > > > 5120+0 records in
> > > > 5120+0 records out
> > > > 5368709120 bytes (5.4 GB, 5.0 GiB) copied, 5.22224 s, 1.0 GB/s
> > > > sync() = 0 <0.024068>
> > > > +++ exited with 0 +++
> > > >
> > > > and start the following in the host when the 'dd' command completes
> > > > in the guest:
> > > >
> > > > $ strace -T -e fsync /usr/bin/sync virtiofs/foo
> > > > fsync(3) = 0 <10.371640>
> > > > +++ exited with 0 +++
> > > >
> > > > There are no good reasons not to honor the expected behavior of
> > > > sync() actually : it gives an unrealistic impression that virtiofs
> > > > is super fast and that data has safely landed on HW, which isn't
> > > > the case obviously.
> > > >
> > > > Implement a ->sync_fs() superblock operation that sends a new
> > > > FUSE_SYNCFS request type for this purpose. Provision a 64-bit
> > > > placeholder for possible future extensions. Since the file
> > > > server cannot handle the wait == 0 case, we skip it to avoid a
> > > > gratuitous roundtrip. Note that this is per-superblock : a
> > > > FUSE_SYNCFS is sent for the root mount and for each submount.
> > > >
> > > > Like with FUSE_FSYNC and FUSE_FSYNCDIR, lack of support for
> > > > FUSE_SYNCFS in the file server is treated as permanent success.
> > > > This ensures compatibility with older file servers : the client
> > > > will get the current behavior of sync() not being propagated to
> > > > the file server.
> > >
> > > I wonder - even if the server does not support SYNCFS or if the kernel
> > > does not trust the server with SYNCFS, fuse_sync_fs() can wait
> > > until all pending requests up to this call have been completed, either
> > > before or after submitting the SYNCFS request. No?
> > >
> >
> > > Does virtiofsd track all requests prior to SYNCFS request to make
> > > sure that they were executed on the host filesystem before calling
> > > syncfs() on the host filesystem?
> >
> > Hi Amir,
> >
> > I don't think virtiofsd has any such notion. I would think, that
> > client should make sure all pending writes have completed and
> > then send SYNCFS request.
> >
> > Looking at the sync_filesystem(), I am assuming vfs will take care
> > of flushing out all dirty pages and then call ->sync_fs.
> >
> > Having said that, I think fuse queues the writeback request internally
> > and signals completion of writeback to mm(end_page_writeback()). And
> > that's why fuse_fsync() has notion of waiting for all pending
> > writes to finish on an inode (fuse_sync_writes()).
> >
> > So I think you have raised a good point. That is if there are pending
> > writes at the time of syncfs(), we don't seem to have a notion of
> > first waiting for all these writes to finish before we send
> > FUSE_SYNCFS request to server.
>
> Maybe, but I was not referring to inode writeback requests.
> I had assumed that those were handled correctly.
> I was referring to pending metadata requests.
>
> ->sync_fs() in local fs also takes care of flushing metadata
> (e.g. journal). I assumed that virtiofsd implements the FUSE_SYNCFS
> request by calling syncfs() on the host fs,

Yes, virtiofsd calls syncfs() on the host fs.

> but if it does that then
> there is no guarantee that all metadata requests have reached the
> host fs from virtiofs unless the client or the server takes care of
> waiting for all pending metadata requests before issuing FUSE_SYNCFS.

We don't have any journal in virtiofs. In fact we don't seem to
cache any metadata, except probably in the "-o writeback" case,
where we can trust local time stamps.

If "-o writeback" is not enabled, I am not sure what metadata
we would be caching that we would need to worry about. Do you have
something specific in mind? (At least from the virtiofs point of view,
I can't think of any metadata we are caching that we need
to worry about.)

Thanks
Vivek

>
> But maybe I am missing something.
>
> It might be worth mentioning that I did not find any sync_fs()
> commands that request to flush metadata caches on the server in
> NFS or SMB protocols either.
>
> Thanks,
> Amir.
>
^ permalink raw reply [flat|nested] 83+ messages in thread
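As a rough illustration of the server side confirmed in the message above
(virtiofsd calling syncfs() on the host fs), a handler could boil down to
something like the following. sync_host_fs() is a hypothetical helper written
for this sketch and is not taken from virtiofsd; the only thing it relies on
is the Linux syncfs(2) call, which flushes the filesystem containing the
given file descriptor.

```c
#define _GNU_SOURCE /* for syncfs() */
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: flush the host filesystem that contains the
 * directory `path`. Returns 0 on success or a negative errno value.
 */
int sync_host_fs(const char *path)
{
    int ret = 0;
    int fd = open(path, O_RDONLY | O_DIRECTORY);

    if (fd < 0)
        return -errno;

    /* syncfs() only returns once the data and metadata are written out. */
    if (syncfs(fd) < 0)
        ret = -errno;

    close(fd);
    return ret;
}
```

Note that a call like this only covers what has already reached the host
filesystem, which is why the ordering of still-pending client requests
matters in the discussion above.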
* Re: [Virtio-fs] [PATCH v4 5/5] virtiofs: propagate sync() to file server
2021-08-16 19:11 ` Vivek Goyal
@ 2021-08-16 19:46 ` Amir Goldstein
-1 siblings, 0 replies; 83+ messages in thread
From: Amir Goldstein @ 2021-08-16 19:46 UTC (permalink / raw)
To: Vivek Goyal
Cc: Miklos Szeredi, linux-kernel, virtualization, virtio-fs-list,
linux-fsdevel, Max Reitz, Robert Krawitz

On Mon, Aug 16, 2021 at 10:11 PM Vivek Goyal <vgoyal@redhat.com> wrote:
>
> On Mon, Aug 16, 2021 at 09:57:08PM +0300, Amir Goldstein wrote:
> > On Mon, Aug 16, 2021 at 6:29 PM Vivek Goyal <vgoyal@redhat.com> wrote:
> > >
> > > On Sun, Aug 15, 2021 at 05:14:06PM +0300, Amir Goldstein wrote:
> > > > Hi Greg,
> > > >
> > > > Sorry for the late reply, I have some questions about this change...
> > > >
> > > > On Fri, May 21, 2021 at 9:12 AM Greg Kurz <groug@kaod.org> wrote:
> > > > >
> > > > > Even if POSIX doesn't mandate it, linux users legitimately expect
> > > > > sync() to flush all data and metadata to physical storage when it
> > > > > is located on the same system. This isn't happening with virtiofs
> > > > > though : sync() inside the guest returns right away even though
> > > > > data still needs to be flushed from the host page cache.
> > > > >
> > > > > This is easily demonstrated by doing the following in the guest:
> > > > >
> > > > > $ dd if=/dev/zero of=/mnt/foo bs=1M count=5K ; strace -T -e sync sync
> > > > > 5120+0 records in
> > > > > 5120+0 records out
> > > > > 5368709120 bytes (5.4 GB, 5.0 GiB) copied, 5.22224 s, 1.0 GB/s
> > > > > sync() = 0 <0.024068>
> > > > > +++ exited with 0 +++
> > > > >
> > > > > and start the following in the host when the 'dd' command completes
> > > > > in the guest:
> > > > >
> > > > > $ strace -T -e fsync /usr/bin/sync virtiofs/foo
> > > > > fsync(3) = 0 <10.371640>
> > > > > +++ exited with 0 +++
> > > > >
> > > > > There are no good reasons not to honor the expected behavior of
> > > > > sync() actually : it gives an unrealistic impression that virtiofs
> > > > > is super fast and that data has safely landed on HW, which isn't
> > > > > the case obviously.
> > > > >
> > > > > Implement a ->sync_fs() superblock operation that sends a new
> > > > > FUSE_SYNCFS request type for this purpose. Provision a 64-bit
> > > > > placeholder for possible future extensions. Since the file
> > > > > server cannot handle the wait == 0 case, we skip it to avoid a
> > > > > gratuitous roundtrip. Note that this is per-superblock : a
> > > > > FUSE_SYNCFS is sent for the root mount and for each submount.
> > > > >
> > > > > Like with FUSE_FSYNC and FUSE_FSYNCDIR, lack of support for
> > > > > FUSE_SYNCFS in the file server is treated as permanent success.
> > > > > This ensures compatibility with older file servers : the client
> > > > > will get the current behavior of sync() not being propagated to
> > > > > the file server.
> > > >
> > > > I wonder - even if the server does not support SYNCFS or if the kernel
> > > > does not trust the server with SYNCFS, fuse_sync_fs() can wait
> > > > until all pending requests up to this call have been completed, either
> > > > before or after submitting the SYNCFS request. No?
> > > >
> > >
> > > > Does virtiofsd track all requests prior to SYNCFS request to make
> > > > sure that they were executed on the host filesystem before calling
> > > > syncfs() on the host filesystem?
> > >
> > > Hi Amir,
> > >
> > > I don't think virtiofsd has any such notion. I would think, that
> > > client should make sure all pending writes have completed and
> > > then send SYNCFS request.
> > >
> > > Looking at the sync_filesystem(), I am assuming vfs will take care
> > > of flushing out all dirty pages and then call ->sync_fs.
> > >
> > > Having said that, I think fuse queues the writeback request internally
> > > and signals completion of writeback to mm(end_page_writeback()). And
> > > that's why fuse_fsync() has notion of waiting for all pending
> > > writes to finish on an inode (fuse_sync_writes()).
> > >
> > > So I think you have raised a good point. That is if there are pending
> > > writes at the time of syncfs(), we don't seem to have a notion of
> > > first waiting for all these writes to finish before we send
> > > FUSE_SYNCFS request to server.
> >
> > Maybe, but I was not referring to inode writeback requests.
> > I had assumed that those were handled correctly.
> > I was referring to pending metadata requests.
> >
> > ->sync_fs() in local fs also takes care of flushing metadata
> > (e.g. journal). I assumed that virtiofsd implements FUSE_SYNCFS
> > request by calling syncfs() on host fs,
>
> Yes virtiofsd calls syncfs() on host fs.
>
> > but if it does that then
> > there is no guarantee that all metadata requests have reached the
> > host fs from virtiofs unless client or server take care of waiting
> > for all pending metadata requests before issuing FUSE_SYNCFS.
>
> We don't have any journal in virtiofs. In fact we don't seem to
> cache any metadata. Except probably the case when "-o writeback"
> where we can trust local time stamps.
>
> If "-o writeback" is not enabled, I am not sure what metadata
> we will be caching that we will need to worry about. Do you have
> something specific in mind? (At least from virtiofs point of view,
> I can't seem to think what metadata we are caching which we need
> to worry about).

No, I don't see a problem.
I guess I was confused by the semantics.
Thanks for clarifying.

Amir.

^ permalink raw reply [flat|nested] 83+ messages in thread
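The compatibility rule quoted above from the changelog (skip the wait == 0 case, and treat a server that lacks FUSE_SYNCFS as permanent success) can be sketched as a small userspace model. This is only an illustration, not the kernel code: struct model_conn, model_sync_fs(), send_syncfs_fn and the two fake servers are hypothetical names, and the sketch assumes that an unsupported request comes back as -ENOSYS.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for per-connection state; not the kernel struct. */
struct model_conn {
	bool sync_fs;      /* server is still assumed to support SYNCFS */
	int requests_sent; /* SYNCFS requests that reached the "server" */
};

/* Hypothetical transport: returns 0 or a negative errno from the server. */
typedef int (*send_syncfs_fn)(struct model_conn *conn);

static int model_sync_fs(struct model_conn *conn, int wait, send_syncfs_fn send)
{
	int err;

	if (!wait)          /* wait == 0: skipped, no roundtrip */
		return 0;
	if (!conn->sync_fs) /* server already reported no support */
		return 0;

	conn->requests_sent++;
	err = send(conn);
	if (err == -ENOSYS) {
		/* Remember the answer and report success from now on. */
		conn->sync_fs = false;
		err = 0;
	}
	return err;
}

/* Two fake servers: one without SYNCFS support, one with it. */
static int old_server(struct model_conn *conn) { (void)conn; return -ENOSYS; }
static int new_server(struct model_conn *conn) { (void)conn; return 0; }
```

With old_server, only the first waiting call costs a roundtrip; later calls return 0 without sending anything, which matches the "current behavior" the changelog promises for older file servers.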
* Re: [Virtio-fs] [PATCH v4 5/5] virtiofs: propagate sync() to file server
2021-08-16 15:29 ` Vivek Goyal
@ 2021-08-28 15:21 ` Miklos Szeredi
-1 siblings, 0 replies; 83+ messages in thread
From: Miklos Szeredi @ 2021-08-28 15:21 UTC (permalink / raw)
To: Vivek Goyal
Cc: Amir Goldstein, linux-kernel, virtio-fs-list, Max Reitz,
linux-fsdevel, virtualization, Robert Krawitz

On Mon, Aug 16, 2021 at 11:29:02AM -0400, Vivek Goyal wrote:
> On Sun, Aug 15, 2021 at 05:14:06PM +0300, Amir Goldstein wrote:

> > I wonder - even if the server does not support SYNCFS or if the kernel
> > does not trust the server with SYNCFS, fuse_sync_fs() can wait
> > until all pending requests up to this call have been completed, either
> > before or after submitting the SYNCFS request. No?
> >
>
> > Does virtiofsd track all requests prior to SYNCFS request to make
> > sure that they were executed on the host filesystem before calling
> > syncfs() on the host filesystem?
>
> Hi Amir,
>
> I don't think virtiofsd has any such notion. I would think, that
> client should make sure all pending writes have completed and
> then send SYNCFS request.
>
> Looking at the sync_filesystem(), I am assuming vfs will take care
> of flushing out all dirty pages and then call ->sync_fs.
>
> Having said that, I think fuse queues the writeback request internally
> and signals completion of writeback to mm(end_page_writeback()). And
> that's why fuse_fsync() has notion of waiting for all pending
> writes to finish on an inode (fuse_sync_writes()).
>
> So I think you have raised a good point. That is if there are pending
> writes at the time of syncfs(), we don't seem to have a notion of
> first waiting for all these writes to finish before we send
> FUSE_SYNCFS request to server.

So here is a proposed patch for fixing this. Works by counting write requests
initiated up till the syncfs call. Since more than one syncfs can be in
progress counts are kept in "buckets" in order to wait for the correct write
requests in each instance.

I tried to make this lightweight, but the cacheline bounce due to the counter is
still there, unfortunately. fc->num_waiting also causes cacheline bounce, so I'm
not going to optimize this (percpu counter?) until that one is also optimized.

Not yet tested, and I'm not sure how to test this.

Comments?

Thanks,
Miklos


diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 97f860cfc195..8d1d6e895534 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -389,6 +389,7 @@ struct fuse_writepage_args {
 	struct list_head queue_entry;
 	struct fuse_writepage_args *next;
 	struct inode *inode;
+	struct fuse_sync_bucket *bucket;
 };
 
 static struct fuse_writepage_args *fuse_find_writeback(struct fuse_inode *fi,
@@ -1608,6 +1609,9 @@ static void fuse_writepage_free(struct fuse_writepage_args *wpa)
 	struct fuse_args_pages *ap = &wpa->ia.ap;
 	int i;
 
+	if (wpa->bucket && atomic_dec_and_test(&wpa->bucket->num_writepages))
+		wake_up(&wpa->bucket->waitq);
+
 	for (i = 0; i < ap->num_pages; i++)
 		__free_page(ap->pages[i]);
 
@@ -1871,6 +1875,19 @@ static struct fuse_writepage_args *fuse_writepage_args_alloc(void)
 
 }
 
+static void fuse_writepage_add_to_bucket(struct fuse_conn *fc,
+					 struct fuse_writepage_args *wpa)
+{
+	if (!fc->sync_fs)
+		return;
+
+	rcu_read_lock();
+	do {
+		wpa->bucket = rcu_dereference(fc->curr_bucket);
+	} while (unlikely(!atomic_inc_not_zero(&wpa->bucket->num_writepages)));
+	rcu_read_unlock();
+}
+
 static int fuse_writepage_locked(struct page *page)
 {
 	struct address_space *mapping = page->mapping;
@@ -1898,6 +1915,7 @@ static int fuse_writepage_locked(struct page *page)
 	if (!wpa->ia.ff)
 		goto err_nofile;
 
+	fuse_writepage_add_to_bucket(fc, wpa);
 	fuse_write_args_fill(&wpa->ia, wpa->ia.ff, page_offset(page), 0);
 
 	copy_highpage(tmp_page, page);
@@ -2148,6 +2166,8 @@ static int fuse_writepages_fill(struct page *page,
 			__free_page(tmp_page);
 			goto out_unlock;
 		}
+		fuse_writepage_add_to_bucket(fc, wpa);
+
 		data->max_pages = 1;
 
 		ap = &wpa->ia.ap;
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 07829ce78695..ee638e227bb3 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -515,6 +515,14 @@ struct fuse_fs_context {
 	void **fudptr;
 };
 
+struct fuse_sync_bucket {
+	atomic_t num_writepages;
+	union {
+		wait_queue_head_t waitq;
+		struct rcu_head rcu;
+	};
+};
+
 /**
  * A Fuse connection.
  *
@@ -807,6 +815,9 @@ struct fuse_conn {
 
 	/** List of filesystems using this connection */
 	struct list_head mounts;
+
+	/* New writepages go into this bucket */
+	struct fuse_sync_bucket *curr_bucket;
 };
 
 /*
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index b9beb39a4a18..524b2d128985 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -506,10 +506,24 @@ static int fuse_statfs(struct dentry *dentry, struct kstatfs *buf)
 	return err;
 }
 
+static struct fuse_sync_bucket *fuse_sync_bucket_alloc(void)
+{
+	struct fuse_sync_bucket *bucket;
+
+	bucket = kzalloc(sizeof(*bucket), GFP_KERNEL | __GFP_NOFAIL);
+	if (bucket) {
+		init_waitqueue_head(&bucket->waitq);
+		/* Initial active count */
+		atomic_set(&bucket->num_writepages, 1);
+	}
+	return bucket;
+}
+
 static int fuse_sync_fs(struct super_block *sb, int wait)
 {
 	struct fuse_mount *fm = get_fuse_mount_super(sb);
 	struct fuse_conn *fc = fm->fc;
+	struct fuse_sync_bucket *bucket, *new_bucket;
 	struct fuse_syncfs_in inarg;
 	FUSE_ARGS(args);
 	int err;
@@ -528,6 +542,31 @@ static int fuse_sync_fs(struct super_block *sb, int wait)
 	if (!fc->sync_fs)
 		return 0;
 
+	new_bucket = fuse_sync_bucket_alloc();
+	spin_lock(&fc->lock);
+	bucket = fc->curr_bucket;
+	if (atomic_read(&bucket->num_writepages) != 0) {
+		/* One more for count completion of old bucket */
+		atomic_inc(&new_bucket->num_writepages);
+		rcu_assign_pointer(fc->curr_bucket, new_bucket);
+		/* Drop initially added active count */
+		atomic_dec(&bucket->num_writepages);
+		spin_unlock(&fc->lock);
+
+		wait_event(bucket->waitq, atomic_read(&bucket->num_writepages) == 0);
+		/*
+		 * Drop count on new bucket, possibly resulting in a completion
+		 * if more than one syncfs is going on
+		 */
+		if (atomic_dec_and_test(&new_bucket->num_writepages))
+			wake_up(&new_bucket->waitq);
+		kfree_rcu(bucket, rcu);
+	} else {
+		spin_unlock(&fc->lock);
+		/* Free unused */
+		kfree(new_bucket);
+	}
+
 	memset(&inarg, 0, sizeof(inarg));
 	args.in_numargs = 1;
 	args.in_args[0].size = sizeof(inarg);
@@ -770,6 +809,7 @@ void fuse_conn_put(struct fuse_conn *fc)
 			fiq->ops->release(fiq);
 		put_pid_ns(fc->pid_ns);
 		put_user_ns(fc->user_ns);
+		kfree_rcu(fc->curr_bucket, rcu);
 		fc->release(fc);
 	}
 }
@@ -1418,6 +1458,7 @@ int fuse_fill_super_common(struct super_block *sb, struct fuse_fs_context *ctx)
 	if (sb->s_flags & SB_MANDLOCK)
 		goto err;
 
+	fc->curr_bucket = fuse_sync_bucket_alloc();
 	fuse_sb_defaults(sb);
 
 	if (ctx->is_bdev) {

^ permalink raw reply related [flat|nested] 83+ messages in thread
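The bucket idea in the patch above can be sketched as a single-threaded userspace model. This is only a rough illustration under stated assumptions: C11 atomics stand in for the kernel's atomic_t, and RCU, fc->lock, the wait queue and the extra count that chains concurrent syncfs callers are all left out. Every name here (model_bucket, model_conn, write_start(), write_end(), sync_begin(), bucket_drained()) is hypothetical; sync_begin() returns the old bucket instead of sleeping on it, so the caller can check when it drains.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of a sync bucket: just the counter. */
struct model_bucket { atomic_int count; };
struct model_conn { struct model_bucket *curr; };

static struct model_bucket *bucket_alloc(void)
{
	struct model_bucket *b = malloc(sizeof(*b));

	if (!b)
		abort();
	atomic_init(&b->count, 1); /* initial active count */
	return b;
}

/* Increment unless the counter is already zero (bucket retired). */
static bool inc_not_zero(atomic_int *v)
{
	int old = atomic_load(v);

	while (old != 0) {
		if (atomic_compare_exchange_weak(v, &old, old + 1))
			return true;
	}
	return false;
}

/* A new write joins whichever bucket is current. */
static struct model_bucket *write_start(struct model_conn *c)
{
	struct model_bucket *b;

	do {
		b = c->curr;
	} while (!inc_not_zero(&b->count));
	return b;
}

/* Returns true when this was the last count on the bucket. */
static bool write_end(struct model_bucket *b)
{
	return atomic_fetch_sub(&b->count, 1) == 1;
}

/* Install a fresh bucket, then drop the old one's initial count. */
static struct model_bucket *sync_begin(struct model_conn *c)
{
	struct model_bucket *old = c->curr;

	c->curr = bucket_alloc();
	atomic_fetch_sub(&old->count, 1);
	return old;
}

static bool bucket_drained(struct model_bucket *b)
{
	return atomic_load(&b->count) == 0;
}
```

In this model, writes started before sync_begin() hold counts on the old bucket, writes started afterwards land in the new one, and the old bucket reaches zero exactly when the earlier writes have all called write_end(). That is the condition the patch sleeps on before sending FUSE_SYNCFS.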
* Re: [Virtio-fs] [PATCH v4 5/5] virtiofs: propagate sync() to file server
2021-08-28 15:21 ` Miklos Szeredi
(?)
@ 2021-08-30 17:01 ` Vivek Goyal
-1 siblings, 0 replies; 83+ messages in thread
From: Vivek Goyal @ 2021-08-30 17:01 UTC (permalink / raw)
To: Miklos Szeredi
Cc: Amir Goldstein, linux-kernel, virtio-fs-list, Max Reitz,
linux-fsdevel, virtualization, Robert Krawitz

On Sat, Aug 28, 2021 at 05:21:39PM +0200, Miklos Szeredi wrote:
> On Mon, Aug 16, 2021 at 11:29:02AM -0400, Vivek Goyal wrote:
> > On Sun, Aug 15, 2021 at 05:14:06PM +0300, Amir Goldstein wrote:
>
> > > I wonder - even if the server does not support SYNCFS or if the kernel
> > > does not trust the server with SYNCFS, fuse_sync_fs() can wait
> > > until all pending requests up to this call have been completed, either
> > > before or after submitting the SYNCFS request. No?
> > >
> >
> > > Does virtiofsd track all requests prior to SYNCFS request to make
> > > sure that they were executed on the host filesystem before calling
> > > syncfs() on the host filesystem?
> >
> > Hi Amir,
> >
> > I don't think virtiofsd has any such notion. I would think, that
> > client should make sure all pending writes have completed and
> > then send SYNCFS request.
> >
> > Looking at the sync_filesystem(), I am assuming vfs will take care
> > of flushing out all dirty pages and then call ->sync_fs.
> >
> > Having said that, I think fuse queues the writeback request internally
> > and signals completion of writeback to mm(end_page_writeback()). And
> > that's why fuse_fsync() has notion of waiting for all pending
> > writes to finish on an inode (fuse_sync_writes()).
> >
> > So I think you have raised a good point. That is if there are pending
> > writes at the time of syncfs(), we don't seem to have a notion of
> > first waiting for all these writes to finish before we send
> > FUSE_SYNCFS request to server.
>
> So here is a proposed patch for fixing this. Works by counting write requests
> initiated up till the syncfs call. Since more than one syncfs can be in
> progress counts are kept in "buckets" in order to wait for the correct write
> requests in each instance.
>
> I tried to make this lightweight, but the cacheline bounce due to the counter is
> still there, unfortunately. fc->num_waiting also causes cacheline bounce, so I'm
> not going to optimize this (percpu counter?) until that one is also optimized.
>
> Not yet tested, and I'm not sure how to test this.
>
> Comments?
>
> Thanks,
> Miklos
>
>
> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> index 97f860cfc195..8d1d6e895534 100644
> --- a/fs/fuse/file.c
> +++ b/fs/fuse/file.c
> @@ -389,6 +389,7 @@ struct fuse_writepage_args {
>  	struct list_head queue_entry;
>  	struct fuse_writepage_args *next;
>  	struct inode *inode;
> +	struct fuse_sync_bucket *bucket;
>  };
>
>  static struct fuse_writepage_args *fuse_find_writeback(struct fuse_inode *fi,
> @@ -1608,6 +1609,9 @@ static void fuse_writepage_free(struct fuse_writepage_args *wpa)
>  	struct fuse_args_pages *ap = &wpa->ia.ap;
>  	int i;
>
> +	if (wpa->bucket && atomic_dec_and_test(&wpa->bucket->num_writepages))

Hi Miklos,

Wondering why this wpa->bucket check is there. Isn't every wpa associated
with a bucket? So when do we run into a situation where wpa->bucket == NULL?

> +		wake_up(&wpa->bucket->waitq);
> +
>  	for (i = 0; i < ap->num_pages; i++)
>  		__free_page(ap->pages[i]);
>
> @@ -1871,6 +1875,19 @@ static struct fuse_writepage_args *fuse_writepage_args_alloc(void)
>
>  }
>
> +static void fuse_writepage_add_to_bucket(struct fuse_conn *fc,
> +					 struct fuse_writepage_args *wpa)
> +{
> +	if (!fc->sync_fs)
> +		return;
> +
> +	rcu_read_lock();
> +	do {
> +		wpa->bucket = rcu_dereference(fc->curr_bucket);
> +	} while (unlikely(!atomic_inc_not_zero(&wpa->bucket->num_writepages)));

So this loop is there because fuse_sync_fs() might be replacing
fc->curr_bucket. And we are fetching this pointer under rcu. So it is
possible that fuse_sync_fs() dropped its reference and that led to
->num_writepages 0 and we don't want to use this bucket.

What if fuse_sync_fs() dropped its reference but still there is another
wpa in progress and hence ->num_writepages is not zero. We still don't
want to use this bucket for new wpa, right?

> +	rcu_read_unlock();
> +}
> +
>  static int fuse_writepage_locked(struct page *page)
>  {
>  	struct address_space *mapping = page->mapping;
> @@ -1898,6 +1915,7 @@ static int fuse_writepage_locked(struct page *page)
>  	if (!wpa->ia.ff)
>  		goto err_nofile;
>
> +	fuse_writepage_add_to_bucket(fc, wpa);
>  	fuse_write_args_fill(&wpa->ia, wpa->ia.ff, page_offset(page), 0);
>
>  	copy_highpage(tmp_page, page);
> @@ -2148,6 +2166,8 @@ static int fuse_writepages_fill(struct page *page,
>  			__free_page(tmp_page);
>  			goto out_unlock;
>  		}
> +		fuse_writepage_add_to_bucket(fc, wpa);
> +
>  		data->max_pages = 1;
>
>  		ap = &wpa->ia.ap;
> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> index 07829ce78695..ee638e227bb3 100644
> --- a/fs/fuse/fuse_i.h
> +++ b/fs/fuse/fuse_i.h
> @@ -515,6 +515,14 @@ struct fuse_fs_context {
>  	void **fudptr;
>  };
>
> +struct fuse_sync_bucket {
> +	atomic_t num_writepages;
> +	union {
> +		wait_queue_head_t waitq;
> +		struct rcu_head rcu;
> +	};
> +};
> +
>  /**
>   * A Fuse connection.
>   *
> @@ -807,6 +815,9 @@ struct fuse_conn {
>
>  	/** List of filesystems using this connection */
>  	struct list_head mounts;
> +
> +	/* New writepages go into this bucket */
> +	struct fuse_sync_bucket *curr_bucket;
>  };
>
>  /*
> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
> index b9beb39a4a18..524b2d128985 100644
> --- a/fs/fuse/inode.c
> +++ b/fs/fuse/inode.c
> @@ -506,10 +506,24 @@ static int fuse_statfs(struct dentry *dentry, struct kstatfs *buf)
>  	return err;
>  }
>
> +static struct fuse_sync_bucket *fuse_sync_bucket_alloc(void)
> +{
> +	struct fuse_sync_bucket *bucket;
> +
> +	bucket = kzalloc(sizeof(*bucket), GFP_KERNEL | __GFP_NOFAIL);
> +	if (bucket) {
> +		init_waitqueue_head(&bucket->waitq);
> +		/* Initial active count */
> +		atomic_set(&bucket->num_writepages, 1);
> +	}
> +	return bucket;
> +}
> +
>  static int fuse_sync_fs(struct super_block *sb, int wait)
>  {
>  	struct fuse_mount *fm = get_fuse_mount_super(sb);
>  	struct fuse_conn *fc = fm->fc;
> +	struct fuse_sync_bucket *bucket, *new_bucket;
>  	struct fuse_syncfs_in inarg;
>  	FUSE_ARGS(args);
>  	int err;
> @@ -528,6 +542,31 @@ static int fuse_sync_fs(struct super_block *sb, int wait)
>  	if (!fc->sync_fs)
>  		return 0;
>
> +	new_bucket = fuse_sync_bucket_alloc();
> +	spin_lock(&fc->lock);
> +	bucket = fc->curr_bucket;
> +	if (atomic_read(&bucket->num_writepages) != 0) {
> +		/* One more for count completion of old bucket */
> +		atomic_inc(&new_bucket->num_writepages);
> +		rcu_assign_pointer(fc->curr_bucket, new_bucket);
> +		/* Drop initially added active count */
> +		atomic_dec(&bucket->num_writepages);
> +		spin_unlock(&fc->lock);
> +
> +		wait_event(bucket->waitq, atomic_read(&bucket->num_writepages) == 0);
> +		/*
> +		 * Drop count on new bucket, possibly resulting in a completion
> +		 * if more than one syncfs is going on
> +		 */
> +		if (atomic_dec_and_test(&new_bucket->num_writepages))
> +			wake_up(&new_bucket->waitq);
> +		kfree_rcu(bucket, rcu);
> +	} else {
> +		spin_unlock(&fc->lock);
> +		/* Free unused */
> +		kfree(new_bucket);

When can we run into the situation where fc->curr_bucket has
num_writepages == 0? When we install a bucket it has count 1. And the only
time it can go to 0 is when we have dropped the initial reference. And
the initial reference can be dropped only after removing the bucket from
fc->curr_bucket.

IOW, we don't drop the initial reference on a bucket if it is in
fc->curr_bucket. And that means anything installed in fc->curr_bucket should
never have a reference count of 0. What am I missing?

Thanks
Vivek

> +	}
> +
>  	memset(&inarg, 0, sizeof(inarg));
>  	args.in_numargs = 1;
>  	args.in_args[0].size = sizeof(inarg);
> @@ -770,6 +809,7 @@ void fuse_conn_put(struct fuse_conn *fc)
>  			fiq->ops->release(fiq);
>  		put_pid_ns(fc->pid_ns);
>  		put_user_ns(fc->user_ns);
> +		kfree_rcu(fc->curr_bucket, rcu);
>  		fc->release(fc);
>  	}
>  }
> @@ -1418,6 +1458,7 @@ int fuse_fill_super_common(struct super_block *sb, struct fuse_fs_context *ctx)
>  	if (sb->s_flags & SB_MANDLOCK)
>  		goto err;
>
> +	fc->curr_bucket = fuse_sync_bucket_alloc();
>  	fuse_sb_defaults(sb);
>
>  	if (ctx->is_bdev) {
>

^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: [PATCH v4 5/5] virtiofs: propagate sync() to file server
@ 2021-08-30 17:01 ` Vivek Goyal
0 siblings, 0 replies; 83+ messages in thread
From: Vivek Goyal @ 2021-08-30 17:01 UTC (permalink / raw)
To: Miklos Szeredi
Cc: Amir Goldstein, Greg Kurz, virtualization, linux-fsdevel,
linux-kernel, virtio-fs-list, Stefan Hajnoczi, Max Reitz,
Robert Krawitz

On Sat, Aug 28, 2021 at 05:21:39PM +0200, Miklos Szeredi wrote:
> On Mon, Aug 16, 2021 at 11:29:02AM -0400, Vivek Goyal wrote:
> > On Sun, Aug 15, 2021 at 05:14:06PM +0300, Amir Goldstein wrote:
>
> > > I wonder - even if the server does not support SYNCFS or if the kernel
> > > does not trust the server with SYNCFS, fuse_sync_fs() can wait
> > > until all pending requests up to this call have been completed, either
> > > before or after submitting the SYNCFS request. No?
> >
> > >
> > > Does virtiofsd track all requests prior to SYNCFS request to make
> > > sure that they were executed on the host filesystem before calling
> > > syncfs() on the host filesystem?
> >
> > Hi Amir,
> >
> > I don't think virtiofsd has any such notion. I would think, that
> > client should make sure all pending writes have completed and
> > then send SYNCFS request.
> >
> > Looking at the sync_filesystem(), I am assuming vfs will take care
> > of flushing out all dirty pages and then call ->sync_fs.
> >
> > Having said that, I think fuse queues the writeback request internally
> > and signals completion of writeback to mm(end_page_writeback()). And
> > that's why fuse_fsync() has notion of waiting for all pending
> > writes to finish on an inode (fuse_sync_writes()).
> >
> > So I think you have raised a good point. That is if there are pending
> > writes at the time of syncfs(), we don't seem to have a notion of
> > first waiting for all these writes to finish before we send
> > FUSE_SYNCFS request to server.
>
> So here is a proposed patch for fixing this.  Works by counting write requests
> initiated up till the syncfs call.  Since more than one syncfs can be in
> progress counts are kept in "buckets" in order to wait for the correct write
> requests in each instance.
>
> I tried to make this lightweight, but the cacheline bounce due to the counter is
> still there, unfortunately.  fc->num_waiting also causes cacheline bounce, so I'm
> not going to optimize this (percpu counter?) until that one is also optimized.
>
> Not yet tested, and I'm not sure how to test this.
>
> Comments?
>
> Thanks,
> Miklos
>
>
> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> index 97f860cfc195..8d1d6e895534 100644
> --- a/fs/fuse/file.c
> +++ b/fs/fuse/file.c
> @@ -389,6 +389,7 @@ struct fuse_writepage_args {
>  	struct list_head queue_entry;
>  	struct fuse_writepage_args *next;
>  	struct inode *inode;
> +	struct fuse_sync_bucket *bucket;
>  };
>
>  static struct fuse_writepage_args *fuse_find_writeback(struct fuse_inode *fi,
> @@ -1608,6 +1609,9 @@ static void fuse_writepage_free(struct fuse_writepage_args *wpa)
>  	struct fuse_args_pages *ap = &wpa->ia.ap;
>  	int i;
>
> +	if (wpa->bucket && atomic_dec_and_test(&wpa->bucket->num_writepages))

Hi Miklos,

Wondering why this wpa->bucket check is there. Isn't every wpa associated
with a bucket? So when do we run into a situation where wpa->bucket == NULL?

> +		wake_up(&wpa->bucket->waitq);
> +
>  	for (i = 0; i < ap->num_pages; i++)
>  		__free_page(ap->pages[i]);
>
> @@ -1871,6 +1875,19 @@ static struct fuse_writepage_args *fuse_writepage_args_alloc(void)
>
>  }
>
> +static void fuse_writepage_add_to_bucket(struct fuse_conn *fc,
> +					 struct fuse_writepage_args *wpa)
> +{
> +	if (!fc->sync_fs)
> +		return;
> +
> +	rcu_read_lock();
> +	do {
> +		wpa->bucket = rcu_dereference(fc->curr_bucket);
> +	} while (unlikely(!atomic_inc_not_zero(&wpa->bucket->num_writepages)));

So this loop is there because fuse_sync_fs() might be replacing
fc->curr_bucket. And we are fetching this pointer under rcu. So it is
possible that fuse_sync_fs() dropped its reference and that led to
->num_writepages 0 and we don't want to use this bucket.

What if fuse_sync_fs() dropped its reference but there is still another
wpa in progress and hence ->num_writepages is not zero? We still don't
want to use this bucket for a new wpa, right?

> +	rcu_read_unlock();
> +}
> +
>  static int fuse_writepage_locked(struct page *page)
>  {
>  	struct address_space *mapping = page->mapping;
> @@ -1898,6 +1915,7 @@ static int fuse_writepage_locked(struct page *page)
>  	if (!wpa->ia.ff)
>  		goto err_nofile;
>
> +	fuse_writepage_add_to_bucket(fc, wpa);
>  	fuse_write_args_fill(&wpa->ia, wpa->ia.ff, page_offset(page), 0);
>
>  	copy_highpage(tmp_page, page);
> @@ -2148,6 +2166,8 @@ static int fuse_writepages_fill(struct page *page,
>  			__free_page(tmp_page);
>  			goto out_unlock;
>  		}
> +		fuse_writepage_add_to_bucket(fc, wpa);
> +
>  		data->max_pages = 1;
>
>  		ap = &wpa->ia.ap;
> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> index 07829ce78695..ee638e227bb3 100644
> --- a/fs/fuse/fuse_i.h
> +++ b/fs/fuse/fuse_i.h
> @@ -515,6 +515,14 @@ struct fuse_fs_context {
>  	void **fudptr;
>  };
>
> +struct fuse_sync_bucket {
> +	atomic_t num_writepages;
> +	union {
> +		wait_queue_head_t waitq;
> +		struct rcu_head rcu;
> +	};
> +};
> +
>  /**
>   * A Fuse connection.
>   *
> @@ -807,6 +815,9 @@ struct fuse_conn {
>
>  	/** List of filesystems using this connection */
>  	struct list_head mounts;
> +
> +	/* New writepages go into this bucket */
> +	struct fuse_sync_bucket *curr_bucket;
>  };
>
>  /*
> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
> index b9beb39a4a18..524b2d128985 100644
> --- a/fs/fuse/inode.c
> +++ b/fs/fuse/inode.c
> @@ -506,10 +506,24 @@ static int fuse_statfs(struct dentry *dentry, struct kstatfs *buf)
>  	return err;
>  }
>
> +static struct fuse_sync_bucket *fuse_sync_bucket_alloc(void)
> +{
> +	struct fuse_sync_bucket *bucket;
> +
> +	bucket = kzalloc(sizeof(*bucket), GFP_KERNEL | __GFP_NOFAIL);
> +	if (bucket) {
> +		init_waitqueue_head(&bucket->waitq);
> +		/* Initial active count */
> +		atomic_set(&bucket->num_writepages, 1);
> +	}
> +	return bucket;
> +}
> +
>  static int fuse_sync_fs(struct super_block *sb, int wait)
>  {
>  	struct fuse_mount *fm = get_fuse_mount_super(sb);
>  	struct fuse_conn *fc = fm->fc;
> +	struct fuse_sync_bucket *bucket, *new_bucket;
>  	struct fuse_syncfs_in inarg;
>  	FUSE_ARGS(args);
>  	int err;
> @@ -528,6 +542,31 @@ static int fuse_sync_fs(struct super_block *sb, int wait)
>  	if (!fc->sync_fs)
>  		return 0;
>
> +	new_bucket = fuse_sync_bucket_alloc();
> +	spin_lock(&fc->lock);
> +	bucket = fc->curr_bucket;
> +	if (atomic_read(&bucket->num_writepages) != 0) {
> +		/* One more for count completion of old bucket */
> +		atomic_inc(&new_bucket->num_writepages);
> +		rcu_assign_pointer(fc->curr_bucket, new_bucket);
> +		/* Drop initially added active count */
> +		atomic_dec(&bucket->num_writepages);
> +		spin_unlock(&fc->lock);
> +
> +		wait_event(bucket->waitq, atomic_read(&bucket->num_writepages) == 0);
> +		/*
> +		 * Drop count on new bucket, possibly resulting in a completion
> +		 * if more than one syncfs is going on
> +		 */
> +		if (atomic_dec_and_test(&new_bucket->num_writepages))
> +			wake_up(&new_bucket->waitq);
> +		kfree_rcu(bucket, rcu);
> +	} else {
> +		spin_unlock(&fc->lock);
> +		/* Free unused */
> +		kfree(new_bucket);

When can we run into the situation where fc->curr_bucket has num_writepages
== 0? When we install a bucket it has count 1. And the only time it can go
to 0 is when we have dropped the initial reference. And the initial reference
can be dropped only after removing the bucket from fc->curr_bucket.

IOW, we don't drop the initial reference on a bucket if it is in
fc->curr_bucket. And that means anything installed in fc->curr_bucket should
never have a reference count of 0. What am I missing?

Thanks
Vivek

> +	}
> +
>  	memset(&inarg, 0, sizeof(inarg));
>  	args.in_numargs = 1;
>  	args.in_args[0].size = sizeof(inarg);
> @@ -770,6 +809,7 @@ void fuse_conn_put(struct fuse_conn *fc)
>  			fiq->ops->release(fiq);
>  		put_pid_ns(fc->pid_ns);
>  		put_user_ns(fc->user_ns);
> +		kfree_rcu(fc->curr_bucket, rcu);
>  		fc->release(fc);
>  	}
>  }
> @@ -1418,6 +1458,7 @@ int fuse_fill_super_common(struct super_block *sb, struct fuse_fs_context *ctx)
>  	if (sb->s_flags & SB_MANDLOCK)
>  		goto err;
>
> +	fc->curr_bucket = fuse_sync_bucket_alloc();
>  	fuse_sb_defaults(sb);
>
>  	if (ctx->is_bdev) {
>
^ permalink raw reply [flat|nested] 83+ messages in thread
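To make the counting argument in the message above easier to follow, here is a minimal userspace sketch in C of the bucket scheme. It is an illustrative model under stated assumptions, not the kernel code: the names struct bucket, curr, write_start(), write_end(), sync_swap() and demo() are hypothetical, and the model leaves out the waitqueue, RCU, fc->lock and the extra count that the patch takes on the new bucket to chain concurrent syncfs calls. It only shows that a bucket starts with an initial count of 1, that writes take a count with an inc-not-zero operation, and that the installed bucket never reads 0.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Userspace stand-in for struct fuse_sync_bucket: no waitqueue, no RCU. */
struct bucket {
	atomic_int count;		/* models num_writepages */
};

static struct bucket *curr;		/* models fc->curr_bucket */

static struct bucket *bucket_alloc(void)
{
	struct bucket *b = malloc(sizeof(*b));

	assert(b != NULL);
	atomic_init(&b->count, 1);	/* initial active count */
	return b;
}

/* Models atomic_inc_not_zero(): take a count unless it is already 0. */
static bool inc_not_zero(atomic_int *v)
{
	int old = atomic_load(v);

	while (old != 0) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(v, &old, old + 1))
			return true;
	}
	return false;
}

/* A new write joins the current bucket; NULL if that bucket was dead. */
static struct bucket *write_start(void)
{
	struct bucket *b = curr;

	return inc_not_zero(&b->count) ? b : NULL;
}

/* Write completion: returns true when this dropped the last count. */
static bool write_end(struct bucket *b)
{
	return atomic_fetch_sub(&b->count, 1) == 1;
}

/* Swap step of syncfs: install a fresh bucket, drop the old initial count. */
static struct bucket *sync_swap(void)
{
	struct bucket *old = curr;

	curr = bucket_alloc();
	atomic_fetch_sub(&old->count, 1);
	return old;
}

/* Walk through one write that is in flight across one syncfs. */
static bool demo(void)
{
	bool ok = true;

	curr = bucket_alloc();				/* count 1 */
	struct bucket *w1 = write_start();		/* count 2 */
	struct bucket *old = sync_swap();		/* old: 1, new: 1 */
	struct bucket *w2 = write_start();		/* new: 2 */

	ok = ok && w1 == old && w2 == curr;
	ok = ok && atomic_load(&old->count) == 1;	/* w1 still pending */
	ok = ok && atomic_load(&curr->count) >= 1;	/* never 0 while installed */
	ok = ok && write_end(w1);			/* last count: syncfs may proceed */
	ok = ok && !write_end(w2);			/* initial count remains */

	free(old);
	free(curr);
	return ok;
}
```

Calling demo() from a main() should return true. In this model the only way a bucket reaches 0 is after sync_swap() has already replaced it, which is the point raised in the last question of the message.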
* Re: [Virtio-fs] [PATCH v4 5/5] virtiofs: propagate sync() to file server
2021-08-30 17:01 ` Vivek Goyal
@ 2021-08-30 17:36 ` Miklos Szeredi
-1 siblings, 0 replies; 83+ messages in thread
From: Miklos Szeredi @ 2021-08-30 17:36 UTC (permalink / raw)
To: Vivek Goyal
Cc: Amir Goldstein, linux-kernel, virtio-fs-list, Max Reitz,
linux-fsdevel, virtualization, Robert Krawitz

On Mon, 30 Aug 2021 at 19:01, Vivek Goyal <vgoyal@redhat.com> wrote:

> >  static struct fuse_writepage_args *fuse_find_writeback(struct fuse_inode *fi,
> > @@ -1608,6 +1609,9 @@ static void fuse_writepage_free(struct fuse_writepage_args *wpa)
> >  	struct fuse_args_pages *ap = &wpa->ia.ap;
> >  	int i;
> >
> > +	if (wpa->bucket && atomic_dec_and_test(&wpa->bucket->num_writepages))
>
> Hi Miklos,
>
> Wondering why this wpa->bucket check is there. Isn't every wpa associated
> with a bucket? So when do we run into a situation where wpa->bucket == NULL?

In case fc->sync_fs is false.

> > @@ -1871,6 +1875,19 @@ static struct fuse_writepage_args *fuse_writepage_args_alloc(void)
> >
> >  }
> >
> > +static void fuse_writepage_add_to_bucket(struct fuse_conn *fc,
> > +					 struct fuse_writepage_args *wpa)
> > +{
> > +	if (!fc->sync_fs)
> > +		return;
> > +
> > +	rcu_read_lock();
> > +	do {
> > +		wpa->bucket = rcu_dereference(fc->curr_bucket);
> > +	} while (unlikely(!atomic_inc_not_zero(&wpa->bucket->num_writepages)));
>
> So this loop is there because fuse_sync_fs() might be replacing
> fc->curr_bucket. And we are fetching this pointer under rcu. So it is
> possible that fuse_sync_fs() dropped its reference and that led to
> ->num_writepages 0 and we don't want to use this bucket.
>
> What if fuse_sync_fs() dropped its reference but there is still another
> wpa in progress and hence ->num_writepages is not zero? We still don't
> want to use this bucket for a new wpa, right?

It's an unlikely race, in which case the write will go into the old
bucket and will be waited for, but that definitely should not be a problem.

> > @@ -528,6 +542,31 @@ static int fuse_sync_fs(struct super_block *sb, int wait)
> >  	if (!fc->sync_fs)
> >  		return 0;
> >
> > +	new_bucket = fuse_sync_bucket_alloc();
> > +	spin_lock(&fc->lock);
> > +	bucket = fc->curr_bucket;
> > +	if (atomic_read(&bucket->num_writepages) != 0) {
> > +		/* One more for count completion of old bucket */
> > +		atomic_inc(&new_bucket->num_writepages);
> > +		rcu_assign_pointer(fc->curr_bucket, new_bucket);
> > +		/* Drop initially added active count */
> > +		atomic_dec(&bucket->num_writepages);
> > +		spin_unlock(&fc->lock);
> > +
> > +		wait_event(bucket->waitq, atomic_read(&bucket->num_writepages) == 0);
> > +		/*
> > +		 * Drop count on new bucket, possibly resulting in a completion
> > +		 * if more than one syncfs is going on
> > +		 */
> > +		if (atomic_dec_and_test(&new_bucket->num_writepages))
> > +			wake_up(&new_bucket->waitq);
> > +		kfree_rcu(bucket, rcu);
> > +	} else {
> > +		spin_unlock(&fc->lock);
> > +		/* Free unused */
> > +		kfree(new_bucket);
> When can we run into the situation where fc->curr_bucket has num_writepages
> == 0? When we install a bucket it has count 1. And the only time it can go
> to 0 is when we have dropped the initial reference. And the initial reference
> can be dropped only after removing the bucket from fc->curr_bucket.
>
> IOW, we don't drop the initial reference on a bucket if it is in
> fc->curr_bucket. And that means anything installed in fc->curr_bucket should
> never have a reference count of 0. What am I missing?

You are correct.  I fixed it by warning on zero count and checking for
count != 1.

I have other fixes as well, will send v2.

Thanks,
Miklos

^ permalink raw reply [flat|nested] 83+ messages in thread
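The fix described at the end of the reply above ("warning on zero count and checking for count != 1") can be illustrated with a small userspace sketch in C. This is a guess at the shape of the check, not the v2 patch itself, which is not shown in this thread: sync_has_pending_writes() and self_check() are hypothetical names, and fprintf() stands in for a kernel warning.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical helper, not taken from the patch: given the count read from
 * the currently installed bucket, decide whether syncfs has writes to wait
 * for before it sends the request to the server.
 */
static bool sync_has_pending_writes(int count)
{
	/* Stand-in for a kernel warning: an installed bucket keeps count >= 1. */
	if (count < 1)
		fprintf(stderr, "unexpected bucket count %d\n", count);

	/* count == 1 is only the initial reference: nothing is in flight. */
	return count != 1;
}

/* Small self-check of the two interesting cases. */
static void self_check(void)
{
	assert(!sync_has_pending_writes(1));	/* idle: skip the bucket swap */
	assert(sync_has_pending_writes(3));	/* two writes in flight */
}
```

The design point is that comparing against 0 can never detect an idle bucket, because the installed bucket always holds its initial count; comparing against 1 separates "no writes in flight" from "some writes to wait for", and a count below 1 indicates a bug rather than a normal state.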