qemu-devel.nongnu.org archive mirror
* [PATCH v5 0/3] virtiofsd: Add support for FUSE_SYNCFS request
@ 2022-02-14 13:58 Greg Kurz
  2022-02-14 13:58 ` [PATCH v5 1/3] virtiofsd: Add support for FUSE_SYNCFS request with announce_submounts Greg Kurz
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Greg Kurz @ 2022-02-14 13:58 UTC (permalink / raw)
  To: qemu-devel
  Cc: Sebastian Hasler, Dr. David Alan Gilbert, Greg Kurz, virtio-fs,
	Stefan Hajnoczi, Vivek Goyal

FUSE_SYNCFS allows the client to flush the host page cache.

v5: - split announce_submounts and !announce_submounts handling into separate
      patches for easier review (but they could be squashed together)
    - track submounts directly in lo_inode structure following the same
      logic as FUSE_ATTR_SUBMOUNT (i.e. directory with either st_dev or
      stx_mnt_id different from parent)

v4: - based on upstream linux FUSE_SYNCFS
    - added support for the '-o announce_submounts' case, i.e. client sends
      a FUSE_SYNCFS request for each submount (identified by its root inode)
    - adapted the case without '-o announce_submounts' so that syncfs() is
      no longer called with lo->mutex held

v3: - track submounts and do per-submount syncfs() (Vivek)
    - based on new version of FUSE_SYNCFS (still not upstream)
      https://listman.redhat.com/archives/virtio-fs/2021-May/msg00025.html

v2: - based on new version of FUSE_SYNCFS
      https://listman.redhat.com/archives/virtio-fs/2021-April/msg00166.html
    - propagate syncfs() errors to client (Vivek)

Greg Kurz (3):
  virtiofsd: Add support for FUSE_SYNCFS request with announce_submounts
  virtiofsd: Track submounts
  virtiofsd: Add support for FUSE_SYNCFS request without
    announce_submounts

 tools/virtiofsd/fuse_lowlevel.c       | 11 ++++
 tools/virtiofsd/fuse_lowlevel.h       | 13 ++++
 tools/virtiofsd/passthrough_ll.c      | 93 +++++++++++++++++++++++++--
 tools/virtiofsd/passthrough_seccomp.c |  1 +
 4 files changed, 114 insertions(+), 4 deletions(-)

-- 
2.34.1





* [PATCH v5 1/3] virtiofsd: Add support for FUSE_SYNCFS request with announce_submounts
  2022-02-14 13:58 [PATCH v5 0/3] virtiofsd: Add support for FUSE_SYNCFS request Greg Kurz
@ 2022-02-14 13:58 ` Greg Kurz
  2022-02-14 15:43   ` [Virtio-fs] " German Maglione
  2022-02-14 13:58 ` [PATCH v5 2/3] virtiofsd: Track submounts Greg Kurz
  2022-02-14 13:58 ` [PATCH v5 3/3] virtiofsd: Add support for FUSE_SYNCFS request without announce_submounts Greg Kurz
  2 siblings, 1 reply; 13+ messages in thread
From: Greg Kurz @ 2022-02-14 13:58 UTC (permalink / raw)
  To: qemu-devel
  Cc: Sebastian Hasler, Dr. David Alan Gilbert, Greg Kurz, virtio-fs,
	Stefan Hajnoczi, Vivek Goyal

Honor the expected behavior of syncfs() to synchronously flush all data
and metadata to disk on Linux systems.

If virtiofsd is started with '-o announce_submounts', the client is
expected to send a FUSE_SYNCFS request for each individual submount.
In this case, we just create a new file descriptor on the submount
inode with lo_inode_open(), call syncfs() on it and close it. The
intermediary file descriptor is needed because O_PATH descriptors
aren't backed by an actual open file and syncfs() would fail with
EBADF.

If virtiofsd is started without '-o announce_submounts' or if the
client doesn't have the FUSE_CAP_SUBMOUNTS capability, the client
only sends a single FUSE_SYNCFS request for the root inode. The
server would thus need to track submounts internally and call
syncfs() on each of them. This will be implemented later.

Note that syncfs() might suffer from a time penalty if the submounts
are being hammered by some unrelated workload on the host. The only
solution to prevent that is to avoid shared mounts.

Signed-off-by: Greg Kurz <groug@kaod.org>
---
 tools/virtiofsd/fuse_lowlevel.c       | 11 +++++++
 tools/virtiofsd/fuse_lowlevel.h       | 13 ++++++++
 tools/virtiofsd/passthrough_ll.c      | 45 +++++++++++++++++++++++++++
 tools/virtiofsd/passthrough_seccomp.c |  1 +
 4 files changed, 70 insertions(+)

diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
index e4679c73abc2..e02d8b25a5f6 100644
--- a/tools/virtiofsd/fuse_lowlevel.c
+++ b/tools/virtiofsd/fuse_lowlevel.c
@@ -1876,6 +1876,16 @@ static void do_lseek(fuse_req_t req, fuse_ino_t nodeid,
     }
 }
 
+static void do_syncfs(fuse_req_t req, fuse_ino_t nodeid,
+                      struct fuse_mbuf_iter *iter)
+{
+    if (req->se->op.syncfs) {
+        req->se->op.syncfs(req, nodeid);
+    } else {
+        fuse_reply_err(req, ENOSYS);
+    }
+}
+
 static void do_init(fuse_req_t req, fuse_ino_t nodeid,
                     struct fuse_mbuf_iter *iter)
 {
@@ -2280,6 +2290,7 @@ static struct {
     [FUSE_RENAME2] = { do_rename2, "RENAME2" },
     [FUSE_COPY_FILE_RANGE] = { do_copy_file_range, "COPY_FILE_RANGE" },
     [FUSE_LSEEK] = { do_lseek, "LSEEK" },
+    [FUSE_SYNCFS] = { do_syncfs, "SYNCFS" },
 };
 
 #define FUSE_MAXOP (sizeof(fuse_ll_ops) / sizeof(fuse_ll_ops[0]))
diff --git a/tools/virtiofsd/fuse_lowlevel.h b/tools/virtiofsd/fuse_lowlevel.h
index c55c0ca2fc1c..b889dae4de0e 100644
--- a/tools/virtiofsd/fuse_lowlevel.h
+++ b/tools/virtiofsd/fuse_lowlevel.h
@@ -1226,6 +1226,19 @@ struct fuse_lowlevel_ops {
      */
     void (*lseek)(fuse_req_t req, fuse_ino_t ino, off_t off, int whence,
                   struct fuse_file_info *fi);
+
+    /**
+     * Synchronize file system content
+     *
+     * If this request is answered with an error code of ENOSYS,
+     * this is treated as success and future calls to syncfs() will
+     * succeed automatically without being sent to the filesystem
+     * process.
+     *
+     * @param req request handle
+     * @param ino the inode number
+     */
+    void (*syncfs)(fuse_req_t req, fuse_ino_t ino);
 };
 
 /**
diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index b3d0674f6d2f..2bf5d40df531 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -3357,6 +3357,50 @@ static void lo_lseek(fuse_req_t req, fuse_ino_t ino, off_t off, int whence,
     }
 }
 
+static int lo_do_syncfs(struct lo_data *lo, struct lo_inode *inode)
+{
+    int fd, ret = 0;
+
+    fuse_log(FUSE_LOG_DEBUG, "lo_do_syncfs(ino=%" PRIu64 ")\n",
+             inode->fuse_ino);
+
+    fd = lo_inode_open(lo, inode, O_RDONLY);
+    if (fd < 0) {
+        return -fd;
+    }
+
+    if (syncfs(fd) < 0) {
+        ret = errno;
+    }
+
+    close(fd);
+    return ret;
+}
+
+static void lo_syncfs(fuse_req_t req, fuse_ino_t ino)
+{
+    struct lo_data *lo = lo_data(req);
+    int err;
+
+    if (lo->announce_submounts) {
+        struct lo_inode *inode;
+
+        inode = lo_inode(req, ino);
+        if (!inode) {
+            fuse_reply_err(req, EBADF);
+            return;
+        }
+
+        err = lo_do_syncfs(lo, inode);
+        lo_inode_put(lo, &inode);
+    } else {
+        /* Requires the server to track submounts. Not implemented yet */
+        err = ENOSYS;
+    }
+
+    fuse_reply_err(req, err);
+}
+
 static void lo_destroy(void *userdata)
 {
     struct lo_data *lo = (struct lo_data *)userdata;
@@ -3417,6 +3461,7 @@ static struct fuse_lowlevel_ops lo_oper = {
     .copy_file_range = lo_copy_file_range,
 #endif
     .lseek = lo_lseek,
+    .syncfs = lo_syncfs,
     .destroy = lo_destroy,
 };
 
diff --git a/tools/virtiofsd/passthrough_seccomp.c b/tools/virtiofsd/passthrough_seccomp.c
index a3ce9f898d2d..3e9d6181dc69 100644
--- a/tools/virtiofsd/passthrough_seccomp.c
+++ b/tools/virtiofsd/passthrough_seccomp.c
@@ -108,6 +108,7 @@ static const int syscall_allowlist[] = {
     SCMP_SYS(set_robust_list),
     SCMP_SYS(setxattr),
     SCMP_SYS(symlinkat),
+    SCMP_SYS(syncfs),
     SCMP_SYS(time), /* Rarely needed, except on static builds */
     SCMP_SYS(tgkill),
     SCMP_SYS(unlinkat),
-- 
2.34.1




* [PATCH v5 2/3] virtiofsd: Track submounts
  2022-02-14 13:58 [PATCH v5 0/3] virtiofsd: Add support for FUSE_SYNCFS request Greg Kurz
  2022-02-14 13:58 ` [PATCH v5 1/3] virtiofsd: Add support for FUSE_SYNCFS request with announce_submounts Greg Kurz
@ 2022-02-14 13:58 ` Greg Kurz
  2022-02-14 15:43   ` Greg Kurz
  2022-02-14 13:58 ` [PATCH v5 3/3] virtiofsd: Add support for FUSE_SYNCFS request without announce_submounts Greg Kurz
  2 siblings, 1 reply; 13+ messages in thread
From: Greg Kurz @ 2022-02-14 13:58 UTC (permalink / raw)
  To: qemu-devel
  Cc: Sebastian Hasler, Dr. David Alan Gilbert, Greg Kurz, virtio-fs,
	Stefan Hajnoczi, Vivek Goyal

If
Support for FUSE_SYNCFS requires the server to track submounts

that may exist under the shared directory. lo_do_lookup() already knows
how to detect them: it is a directory with a different device ID or
mount ID than its parent. Use the same logic and record the information
under the lo_inode structure.

Signed-off-by: Greg Kurz <groug@kaod.org>
---
 tools/virtiofsd/passthrough_ll.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 2bf5d40df531..e94c4e6f8635 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -118,6 +118,7 @@ struct lo_inode {
     GHashTable *posix_locks; /* protected by lo_inode->plock_mutex */
 
     mode_t filetype;
+    bool is_submount;
 };
 
 struct lo_cred {
@@ -1017,6 +1018,7 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
     struct lo_data *lo = lo_data(req);
     struct lo_inode *inode = NULL;
     struct lo_inode *dir = lo_inode(req, parent);
+    bool is_submount;
 
     if (inodep) {
         *inodep = NULL; /* in case there is an error */
@@ -1051,8 +1053,10 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
         goto out_err;
     }
 
-    if (S_ISDIR(e->attr.st_mode) && lo->announce_submounts &&
-        (e->attr.st_dev != dir->key.dev || mnt_id != dir->key.mnt_id)) {
+    is_submount = S_ISDIR(e->attr.st_mode) &&
+        (e->attr.st_dev != dir->key.dev || mnt_id != dir->key.mnt_id);
+
+    if (is_submount && lo->announce_submounts) {
         e->attr_flags |= FUSE_ATTR_SUBMOUNT;
     }
 
@@ -1079,6 +1083,7 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
         inode->key.ino = e->attr.st_ino;
         inode->key.dev = e->attr.st_dev;
         inode->key.mnt_id = mnt_id;
+        inode->is_submount = is_submount;
         if (lo->posix_lock) {
             pthread_mutex_init(&inode->plock_mutex, NULL);
             inode->posix_locks = g_hash_table_new_full(
@@ -1100,8 +1105,9 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
 
     lo_inode_put(lo, &dir);
 
-    fuse_log(FUSE_LOG_DEBUG, "  %lli/%s -> %lli\n", (unsigned long long)parent,
-             name, (unsigned long long)e->ino);
+    fuse_log(FUSE_LOG_DEBUG, "  %lli/%s -> %lli%s\n",
+             (unsigned long long) parent, name, (unsigned long long) e->ino,
+             is_submount ? " (submount)" : "");
 
     return 0;
 
-- 
2.34.1




* [PATCH v5 3/3] virtiofsd: Add support for FUSE_SYNCFS request without announce_submounts
  2022-02-14 13:58 [PATCH v5 0/3] virtiofsd: Add support for FUSE_SYNCFS request Greg Kurz
  2022-02-14 13:58 ` [PATCH v5 1/3] virtiofsd: Add support for FUSE_SYNCFS request with announce_submounts Greg Kurz
  2022-02-14 13:58 ` [PATCH v5 2/3] virtiofsd: Track submounts Greg Kurz
@ 2022-02-14 13:58 ` Greg Kurz
  2022-02-14 18:27   ` Vivek Goyal
  2 siblings, 1 reply; 13+ messages in thread
From: Greg Kurz @ 2022-02-14 13:58 UTC (permalink / raw)
  To: qemu-devel
  Cc: Sebastian Hasler, Dr. David Alan Gilbert, Greg Kurz, virtio-fs,
	Stefan Hajnoczi, Vivek Goyal

This adds the missing bits to support FUSE_SYNCFS in the case where
submounts aren't announced to the client.

Iterate over all inodes and call syncfs() on the ones marked as submounts.
Since syncfs() can block for an indefinite time, we cannot call it with
lo->mutex held as it would prevent the server from processing other
requests. The operation is thus broken down into two steps. First build
a list of submounts with lo->mutex held, then drop the mutex and finally
process the list. A reference is taken on the inodes to ensure they
don't go away while lo->mutex is dropped.

Signed-off-by: Greg Kurz <groug@kaod.org>
---
 tools/virtiofsd/passthrough_ll.c | 38 ++++++++++++++++++++++++++++++--
 1 file changed, 36 insertions(+), 2 deletions(-)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index e94c4e6f8635..7ce944bfe2a0 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -3400,8 +3400,42 @@ static void lo_syncfs(fuse_req_t req, fuse_ino_t ino)
         err = lo_do_syncfs(lo, inode);
         lo_inode_put(lo, &inode);
     } else {
-        /* Requires the server to track submounts. Not implemented yet */
-        err = ENOSYS;
+        g_autoptr(GSList) submount_list = NULL;
+        GSList *elem;
+        GHashTableIter iter;
+        gpointer key, value;
+
+        pthread_mutex_lock(&lo->mutex);
+
+        g_hash_table_iter_init(&iter, lo->inodes);
+        while (g_hash_table_iter_next(&iter, &key, &value)) {
+            struct lo_inode *inode = value;
+
+            if (inode->is_submount) {
+                g_atomic_int_inc(&inode->refcount);
+                submount_list = g_slist_prepend(submount_list, inode);
+            }
+        }
+
+        pthread_mutex_unlock(&lo->mutex);
+
+        /* The root inode is always present and not tracked in the hash table */
+        err = lo_do_syncfs(lo, &lo->root);
+
+        for (elem = submount_list; elem; elem = g_slist_next(elem)) {
+            struct lo_inode *inode = elem->data;
+            int r;
+
+            r = lo_do_syncfs(lo, inode);
+            if (r) {
+                /*
+                 * Try to sync as much as possible. Only one error can be
+                 * reported to the client though, arbitrarily the last one.
+                 */
+                err = r;
+            }
+            lo_inode_put(lo, &inode);
+        }
     }
 
     fuse_reply_err(req, err);
-- 
2.34.1




* Re: [PATCH v5 2/3] virtiofsd: Track submounts
  2022-02-14 13:58 ` [PATCH v5 2/3] virtiofsd: Track submounts Greg Kurz
@ 2022-02-14 15:43   ` Greg Kurz
  0 siblings, 0 replies; 13+ messages in thread
From: Greg Kurz @ 2022-02-14 15:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: virtio-fs, Sebastian Hasler, Stefan Hajnoczi,
	Dr. David Alan Gilbert, Vivek Goyal

On Mon, 14 Feb 2022 14:58:19 +0100
Greg Kurz <groug@kaod.org> wrote:

> If
> Support for FUSE_SYNCFS requires the server to track submounts
> 
> that may exist under the shared directory. lo_do_lookup() already knows
> how to detect them: it is a directory with a different device ID or
> mount ID than its parent. Use the same logic and record the information
> under the lo_inode structure.
> 
> Signed-off-by: Greg Kurz <groug@kaod.org>
> ---

Oops I obviously did something wrong with the changelog. It should read:

---
If the server doesn't announce submounts to the client, it needs to
track them internally to properly support FUSE_SYNCFS. lo_do_lookup()
already knows how to detect them: it is a directory with a different
device ID or mount ID than its parent. Use the same logic and record
the information under the lo_inode structure.
---

I can send an updated version if needed.

>  tools/virtiofsd/passthrough_ll.c | 14 ++++++++++----
>  1 file changed, 10 insertions(+), 4 deletions(-)
> 
> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> index 2bf5d40df531..e94c4e6f8635 100644
> --- a/tools/virtiofsd/passthrough_ll.c
> +++ b/tools/virtiofsd/passthrough_ll.c
> @@ -118,6 +118,7 @@ struct lo_inode {
>      GHashTable *posix_locks; /* protected by lo_inode->plock_mutex */
>  
>      mode_t filetype;
> +    bool is_submount;
>  };
>  
>  struct lo_cred {
> @@ -1017,6 +1018,7 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
>      struct lo_data *lo = lo_data(req);
>      struct lo_inode *inode = NULL;
>      struct lo_inode *dir = lo_inode(req, parent);
> +    bool is_submount;
>  
>      if (inodep) {
>          *inodep = NULL; /* in case there is an error */
> @@ -1051,8 +1053,10 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
>          goto out_err;
>      }
>  
> -    if (S_ISDIR(e->attr.st_mode) && lo->announce_submounts &&
> -        (e->attr.st_dev != dir->key.dev || mnt_id != dir->key.mnt_id)) {
> +    is_submount = S_ISDIR(e->attr.st_mode) &&
> +        (e->attr.st_dev != dir->key.dev || mnt_id != dir->key.mnt_id);
> +
> +    if (is_submount && lo->announce_submounts) {
>          e->attr_flags |= FUSE_ATTR_SUBMOUNT;
>      }
>  
> @@ -1079,6 +1083,7 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
>          inode->key.ino = e->attr.st_ino;
>          inode->key.dev = e->attr.st_dev;
>          inode->key.mnt_id = mnt_id;
> +        inode->is_submount = is_submount;
>          if (lo->posix_lock) {
>              pthread_mutex_init(&inode->plock_mutex, NULL);
>              inode->posix_locks = g_hash_table_new_full(
> @@ -1100,8 +1105,9 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
>  
>      lo_inode_put(lo, &dir);
>  
> -    fuse_log(FUSE_LOG_DEBUG, "  %lli/%s -> %lli\n", (unsigned long long)parent,
> -             name, (unsigned long long)e->ino);
> +    fuse_log(FUSE_LOG_DEBUG, "  %lli/%s -> %lli%s\n",
> +             (unsigned long long) parent, name, (unsigned long long) e->ino,
> +             is_submount ? " (submount)" : "");
>  
>      return 0;
>  




* Re: [Virtio-fs] [PATCH v5 1/3] virtiofsd: Add support for FUSE_SYNCFS request with announce_submounts
  2022-02-14 13:58 ` [PATCH v5 1/3] virtiofsd: Add support for FUSE_SYNCFS request with announce_submounts Greg Kurz
@ 2022-02-14 15:43   ` German Maglione
  2022-02-14 16:02     ` Greg Kurz
  0 siblings, 1 reply; 13+ messages in thread
From: German Maglione @ 2022-02-14 15:43 UTC (permalink / raw)
  To: Greg Kurz; +Cc: virtio-fs-list, qemu-devel, Vivek Goyal


On Mon, Feb 14, 2022 at 3:00 PM Greg Kurz <groug@kaod.org> wrote:

> [...]
> +
> +static void lo_syncfs(fuse_req_t req, fuse_ino_t ino)
> +{
> +    struct lo_data *lo = lo_data(req);
> +    int err;
> +
> +    if (lo->announce_submounts) {
> +        struct lo_inode *inode;
> +
> +        inode = lo_inode(req, ino);
> +        if (!inode) {
> +            fuse_reply_err(req, EBADF);
> +            return;
> +        }
> +
> +        err = lo_do_syncfs(lo, inode);
> +        lo_inode_put(lo, &inode);
> +    } else {
> +        /* Requires the server to track submounts. Not implemented yet */
> +        err = ENOSYS;
> +    }
>

In the current Rust version we don't check whether announce_submounts
is enabled; lo_syncfs does the same thing with and without it. So, even
if announce_submounts is not enabled, at least the root shared dir gets
synced.




-- 
German



* Re: [Virtio-fs] [PATCH v5 1/3] virtiofsd: Add support for FUSE_SYNCFS request with announce_submounts
  2022-02-14 15:43   ` [Virtio-fs] " German Maglione
@ 2022-02-14 16:02     ` Greg Kurz
  0 siblings, 0 replies; 13+ messages in thread
From: Greg Kurz @ 2022-02-14 16:02 UTC (permalink / raw)
  To: German Maglione; +Cc: virtio-fs-list, qemu-devel, Vivek Goyal

On Mon, 14 Feb 2022 16:43:15 +0100
German Maglione <gmaglione@redhat.com> wrote:

> On Mon, Feb 14, 2022 at 3:00 PM Greg Kurz <groug@kaod.org> wrote:
> 
> > [...]
> > +static void lo_syncfs(fuse_req_t req, fuse_ino_t ino)
> > +{
> > +    struct lo_data *lo = lo_data(req);
> > +    int err;
> > +
> > +    if (lo->announce_submounts) {
> > +        struct lo_inode *inode;
> > +
> > +        inode = lo_inode(req, ino);
> > +        if (!inode) {
> > +            fuse_reply_err(req, EBADF);
> > +            return;
> > +        }
> > +
> > +        err = lo_do_syncfs(lo, inode);
> > +        lo_inode_put(lo, &inode);
> > +    } else {
> > +        /* Requires the server to track submounts. Not implemented yet */
> > +        err = ENOSYS;
> > +    }
> >
> 
> In the current Rust version we don't check whether announce_submounts
> is enabled; lo_syncfs does the same thing with and without it. So, even
> if announce_submounts is not enabled, at least the root shared dir gets
> synced.
> 

I hesitated to do that, but since the case without announce_submounts
is handled in patch 3, I kept it simple. Anyway, I'm fine with posting
a new version that matches what the Rust implementation does.





* Re: [PATCH v5 3/3] virtiofsd: Add support for FUSE_SYNCFS request without announce_submounts
  2022-02-14 13:58 ` [PATCH v5 3/3] virtiofsd: Add support for FUSE_SYNCFS request without announce_submounts Greg Kurz
@ 2022-02-14 18:27   ` Vivek Goyal
  2022-02-14 18:56     ` Vivek Goyal
  2022-02-15  9:12     ` Greg Kurz
  0 siblings, 2 replies; 13+ messages in thread
From: Vivek Goyal @ 2022-02-14 18:27 UTC (permalink / raw)
  To: Greg Kurz
  Cc: virtio-fs, Sebastian Hasler, qemu-devel, Stefan Hajnoczi,
	Dr. David Alan Gilbert

On Mon, Feb 14, 2022 at 02:58:20PM +0100, Greg Kurz wrote:
> This adds the missing bits to support FUSE_SYNCFS in the case submounts
> aren't announced to the client.
> 
> Iterate over all inodes and call syncfs() on the ones marked as submounts.
> Since syncfs() can block for an indefinite time, we cannot call it with
> lo->mutex held as it would prevent the server to process other requests.
> This is thus broken down in two steps. First build a list of submounts
> with lo->mutex held, drop the mutex and finally process the list. A
> reference is taken on the inodes to ensure they don't go away when
> lo->mutex is dropped.
> 
> Signed-off-by: Greg Kurz <groug@kaod.org>
> ---
>  tools/virtiofsd/passthrough_ll.c | 38 ++++++++++++++++++++++++++++++--
>  1 file changed, 36 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> index e94c4e6f8635..7ce944bfe2a0 100644
> --- a/tools/virtiofsd/passthrough_ll.c
> +++ b/tools/virtiofsd/passthrough_ll.c
> @@ -3400,8 +3400,42 @@ static void lo_syncfs(fuse_req_t req, fuse_ino_t ino)
>          err = lo_do_syncfs(lo, inode);
>          lo_inode_put(lo, &inode);
>      } else {
> -        /* Requires the sever to track submounts. Not implemented yet */
> -        err = ENOSYS;
> +        g_autoptr(GSList) submount_list = NULL;
> +        GSList *elem;
> +        GHashTableIter iter;
> +        gpointer key, value;
> +
> +        pthread_mutex_lock(&lo->mutex);
> +
> +        g_hash_table_iter_init(&iter, lo->inodes);
> +        while (g_hash_table_iter_next(&iter, &key, &value)) {

Going through all the inodes sounds very inefficient. If there is a large
number of inodes (say 1 million or more), and frequent syncfs requests
are coming in, this can consume a lot of CPU cycles.

Given that the C virtiofsd is slowly going away, I don't want to be too
particular about it. But I would have thought to put the submount
inodes into another list or hash map (using the mount id as key) and just
traverse that list instead. Given that the number of submounts should
be small, it should be pretty quick to walk through that list.

> +            struct lo_inode *inode = value;
> +
> +            if (inode->is_submount) {
> +                g_atomic_int_inc(&inode->refcount);
> +                submount_list = g_slist_prepend(submount_list, inode);
> +            }
> +        }
> +
> +        pthread_mutex_unlock(&lo->mutex);
> +
> +        /* The root inode is always present and not tracked in the hash table */
> +        err = lo_do_syncfs(lo, &lo->root);
> +
> +        for (elem = submount_list; elem; elem = g_slist_next(elem)) {
> +            struct lo_inode *inode = elem->data;
> +            int r;
> +
> +            r = lo_do_syncfs(lo, inode);
> +            if (r) {
> +                /*
> +                 * Try to sync as much as possible. Only one error can be
> +                 * reported to the client though, arbitrarily the last one.
> +                 */
> +                err = r;
> +            }
> +            lo_inode_put(lo, &inode);
> +        }

One more minor nit: what happens if virtiofsd is processing the syncfs list
and then somebody hard reboots qemu and mounts virtiofs again? That
will trigger FUSE_INIT and will call lo_destroy() first.

fuse_lowlevel.c

fuse_session_process_buf_int()
{
            fuse_log(FUSE_LOG_DEBUG, "%s: reinit\n", __func__);
            se->got_destroy = 1;
            se->got_init = 0;
            if (se->op.destroy) {
                se->op.destroy(se->userdata);
            }
}

IIUC, there is no synchronization with this path. If we are running with
the thread pool enabled, it could very well happen that one thread is still
doing syncfs while another thread is executing do_init(). That sounds
like a little bit of a problem. It would be good if there were a way
to either abort syncfs() or have do_destroy() wait for all the previous
syncfs() calls to finish.

Greg, if you like, you could break down this work into two patch series.
The first patch series just issues syncfs() on the inode id sent with
FUSE_SYNCFS. That's an easy fix and can get merged now.

The second patch series would take care of the above issues and will be a
little bit more work.

Thanks
Vivek



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v5 3/3] virtiofsd: Add support for FUSE_SYNCFS request without announce_submounts
  2022-02-14 18:27   ` Vivek Goyal
@ 2022-02-14 18:56     ` Vivek Goyal
  2022-02-14 19:09       ` Vivek Goyal
  2022-02-15  9:12     ` Greg Kurz
  1 sibling, 1 reply; 13+ messages in thread
From: Vivek Goyal @ 2022-02-14 18:56 UTC (permalink / raw)
  To: Greg Kurz
  Cc: virtio-fs, Sebastian Hasler, qemu-devel, Stefan Hajnoczi,
	Dr. David Alan Gilbert

On Mon, Feb 14, 2022 at 01:27:22PM -0500, Vivek Goyal wrote:
> On Mon, Feb 14, 2022 at 02:58:20PM +0100, Greg Kurz wrote:
> > This adds the missing bits to support FUSE_SYNCFS in the case submounts
> > aren't announced to the client.
> > 
> > Iterate over all inodes and call syncfs() on the ones marked as submounts.
> > Since syncfs() can block for an indefinite time, we cannot call it with
> > lo->mutex held as it would prevent the server to process other requests.
> > This is thus broken down in two steps. First build a list of submounts
> > with lo->mutex held, drop the mutex and finally process the list. A
> > reference is taken on the inodes to ensure they don't go away when
> > lo->mutex is dropped.
> > 
> > Signed-off-by: Greg Kurz <groug@kaod.org>
> > ---
> >  tools/virtiofsd/passthrough_ll.c | 38 ++++++++++++++++++++++++++++++--
> >  1 file changed, 36 insertions(+), 2 deletions(-)
> > 
> > diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> > index e94c4e6f8635..7ce944bfe2a0 100644
> > --- a/tools/virtiofsd/passthrough_ll.c
> > +++ b/tools/virtiofsd/passthrough_ll.c
> > @@ -3400,8 +3400,42 @@ static void lo_syncfs(fuse_req_t req, fuse_ino_t ino)
> >          err = lo_do_syncfs(lo, inode);
> >          lo_inode_put(lo, &inode);
> >      } else {
> > -        /* Requires the sever to track submounts. Not implemented yet */
> > -        err = ENOSYS;
> > +        g_autoptr(GSList) submount_list = NULL;
> > +        GSList *elem;
> > +        GHashTableIter iter;
> > +        gpointer key, value;
> > +
> > +        pthread_mutex_lock(&lo->mutex);
> > +
> > +        g_hash_table_iter_init(&iter, lo->inodes);
> > +        while (g_hash_table_iter_next(&iter, &key, &value)) {
> 
> Going through all the inodes sounds very inefficient. If there are large
> number of inodes (say 1 million or more), and if frequent syncfs requests
> are coming this can consume lot of cpu cycles.
> 
> Given C virtiofsd is slowly going away, so I don't want to be too
> particular about it. But, I would have thought to put submount
> inodes into another list or hash map (using mount id as key) and just
> traverse through that list instead. Given number of submounts should
> be small, it should be pretty quick to walk through that list.
> 
> > +            struct lo_inode *inode = value;
> > +
> > +            if (inode->is_submount) {
> > +                g_atomic_int_inc(&inode->refcount);
> > +                submount_list = g_slist_prepend(submount_list, inode);
> > +            }
> > +        }
> > +
> > +        pthread_mutex_unlock(&lo->mutex);
> > +
> > +        /* The root inode is always present and not tracked in the hash table */
> > +        err = lo_do_syncfs(lo, &lo->root);
> > +
> > +        for (elem = submount_list; elem; elem = g_slist_next(elem)) {
> > +            struct lo_inode *inode = elem->data;
> > +            int r;
> > +
> > +            r = lo_do_syncfs(lo, inode);
> > +            if (r) {
> > +                /*
> > +                 * Try to sync as much as possible. Only one error can be
> > +                 * reported to the client though, arbitrarily the last one.
> > +                 */
> > +                err = r;
> > +            }
> > +            lo_inode_put(lo, &inode);
> > +        }
> 
> One more minor nit. What happens if virtiofsd is processing syncfs list
> and then somebody hard reboots qemu and mounts virtiofs again. That
> will trigger FUSE_INIT and will call lo_destroy() first.
> 
> fuse_lowlevel.c
> 
> fuse_session_process_buf_int()
> {
>             fuse_log(FUSE_LOG_DEBUG, "%s: reinit\n", __func__);
>             se->got_destroy = 1;
>             se->got_init = 0;
>             if (se->op.destroy) {
>                 se->op.destroy(se->userdata);
>             }
> }
> 
> IIUC, there is no synchronization with this path. If we are running with
> thread pool enabled, it could very well happen that one thread is still
> doing syncfs while other thread is executing do_init(). That sounds
> like little bit of a problem. It will be good if there is a way
> to either abort syncfs() or do_destroy() waits for all the previous
> syncfs() to finish.
> 
> Greg, if you like, you could break down this work in two patch series.
> First patch series just issues syncfs() on inode id sent with FUSE_SYNCFS.
> That's easy fix and can get merged now.

Actually I think even a single "syncfs" will have a synchronization issue
with do_init() upon hard reboot if we drop lo->mutex during syncfs().

Vivek

> 
> And second patch series take care of above issues and will be little bit
> more work.
> 
> Thanks
> Vivek



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v5 3/3] virtiofsd: Add support for FUSE_SYNCFS request without announce_submounts
  2022-02-14 18:56     ` Vivek Goyal
@ 2022-02-14 19:09       ` Vivek Goyal
  2022-02-15  9:18         ` Greg Kurz
  0 siblings, 1 reply; 13+ messages in thread
From: Vivek Goyal @ 2022-02-14 19:09 UTC (permalink / raw)
  To: Greg Kurz
  Cc: virtio-fs, Sebastian Hasler, qemu-devel, Stefan Hajnoczi,
	Dr. David Alan Gilbert

On Mon, Feb 14, 2022 at 01:56:08PM -0500, Vivek Goyal wrote:
> On Mon, Feb 14, 2022 at 01:27:22PM -0500, Vivek Goyal wrote:
> > On Mon, Feb 14, 2022 at 02:58:20PM +0100, Greg Kurz wrote:
> > > This adds the missing bits to support FUSE_SYNCFS in the case submounts
> > > aren't announced to the client.
> > > 
> > > Iterate over all inodes and call syncfs() on the ones marked as submounts.
> > > Since syncfs() can block for an indefinite time, we cannot call it with
> > > lo->mutex held as it would prevent the server to process other requests.
> > > This is thus broken down in two steps. First build a list of submounts
> > > with lo->mutex held, drop the mutex and finally process the list. A
> > > reference is taken on the inodes to ensure they don't go away when
> > > lo->mutex is dropped.
> > > 
> > > Signed-off-by: Greg Kurz <groug@kaod.org>
> > > ---
> > >  tools/virtiofsd/passthrough_ll.c | 38 ++++++++++++++++++++++++++++++--
> > >  1 file changed, 36 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> > > index e94c4e6f8635..7ce944bfe2a0 100644
> > > --- a/tools/virtiofsd/passthrough_ll.c
> > > +++ b/tools/virtiofsd/passthrough_ll.c
> > > @@ -3400,8 +3400,42 @@ static void lo_syncfs(fuse_req_t req, fuse_ino_t ino)
> > >          err = lo_do_syncfs(lo, inode);
> > >          lo_inode_put(lo, &inode);
> > >      } else {
> > > -        /* Requires the sever to track submounts. Not implemented yet */
> > > -        err = ENOSYS;
> > > +        g_autoptr(GSList) submount_list = NULL;
> > > +        GSList *elem;
> > > +        GHashTableIter iter;
> > > +        gpointer key, value;
> > > +
> > > +        pthread_mutex_lock(&lo->mutex);
> > > +
> > > +        g_hash_table_iter_init(&iter, lo->inodes);
> > > +        while (g_hash_table_iter_next(&iter, &key, &value)) {
> > 
> > Going through all the inodes sounds very inefficient. If there are large
> > number of inodes (say 1 million or more), and if frequent syncfs requests
> > are coming this can consume lot of cpu cycles.
> > 
> > Given C virtiofsd is slowly going away, so I don't want to be too
> > particular about it. But, I would have thought to put submount
> > inodes into another list or hash map (using mount id as key) and just
> > traverse through that list instead. Given number of submounts should
> > be small, it should be pretty quick to walk through that list.
> > 
> > > +            struct lo_inode *inode = value;
> > > +
> > > +            if (inode->is_submount) {
> > > +                g_atomic_int_inc(&inode->refcount);
> > > +                submount_list = g_slist_prepend(submount_list, inode);
> > > +            }
> > > +        }
> > > +
> > > +        pthread_mutex_unlock(&lo->mutex);
> > > +
> > > +        /* The root inode is always present and not tracked in the hash table */
> > > +        err = lo_do_syncfs(lo, &lo->root);
> > > +
> > > +        for (elem = submount_list; elem; elem = g_slist_next(elem)) {
> > > +            struct lo_inode *inode = elem->data;
> > > +            int r;
> > > +
> > > +            r = lo_do_syncfs(lo, inode);
> > > +            if (r) {
> > > +                /*
> > > +                 * Try to sync as much as possible. Only one error can be
> > > +                 * reported to the client though, arbitrarily the last one.
> > > +                 */
> > > +                err = r;
> > > +            }
> > > +            lo_inode_put(lo, &inode);
> > > +        }
> > 
> > One more minor nit. What happens if virtiofsd is processing syncfs list
> > and then somebody hard reboots qemu and mounts virtiofs again. That
> > will trigger FUSE_INIT and will call lo_destroy() first.
> > 
> > fuse_lowlevel.c
> > 
> > fuse_session_process_buf_int()
> > {
> >             fuse_log(FUSE_LOG_DEBUG, "%s: reinit\n", __func__);
> >             se->got_destroy = 1;
> >             se->got_init = 0;
> >             if (se->op.destroy) {
> >                 se->op.destroy(se->userdata);
> >             }
> > }
> > 
> > IIUC, there is no synchronization with this path. If we are running with
> > thread pool enabled, it could very well happen that one thread is still
> > doing syncfs while other thread is executing do_init(). That sounds
> > like little bit of a problem. It will be good if there is a way
> > to either abort syncfs() or do_destroy() waits for all the previous
> > syncfs() to finish.
> > 
> > Greg, if you like, you could break down this work in two patch series.
> > First patch series just issues syncfs() on inode id sent with FUSE_SYNCFS.
> > That's easy fix and can get merged now.
> 
> Actually I think even single "syncfs" will have synchronization issue
> with do_init() upon hard reboot if we drop lo->mutex during syncfs().

Actually, we have similar issues with ->fsync(). We take lo->mutex,
then take a reference on the inode and call fsync() on it. Now it is
possible that the guest hard reboots, triggers FUSE_INIT, and lo_destroy()
is called. It will take lo->mutex and drop its reference on the inode.

So it looks like, in an extreme case, a new connection can start looking
up inodes while we still have old inodes in the hash table, because
some thread is blocked doing an operation and has not dropped its
reference.

David, do I understand it right?

We probably need a notion of keeping track of the number of requests
which are in progress, and lo_destroy() should wait till the number of
requests in progress comes to zero. This would be the equivalent of the
queue draining operation in the virtiofs kernel driver.

Anyway, given that we already have the issue w.r.t. lo_destroy(), and the C
code is going away, I will be fine even if you don't fix the races with
FUSE_INIT.

Vivek
> 
> Vivek
> 
> > 
> > And second patch series take care of above issues and will be little bit
> > more work.
> > 
> > Thanks
> > Vivek



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v5 3/3] virtiofsd: Add support for FUSE_SYNCFS request without announce_submounts
  2022-02-14 18:27   ` Vivek Goyal
  2022-02-14 18:56     ` Vivek Goyal
@ 2022-02-15  9:12     ` Greg Kurz
  1 sibling, 0 replies; 13+ messages in thread
From: Greg Kurz @ 2022-02-15  9:12 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: virtio-fs, Sebastian Hasler, qemu-devel, Stefan Hajnoczi,
	Dr. David Alan Gilbert

On Mon, 14 Feb 2022 13:27:22 -0500
Vivek Goyal <vgoyal@redhat.com> wrote:

> On Mon, Feb 14, 2022 at 02:58:20PM +0100, Greg Kurz wrote:
> > This adds the missing bits to support FUSE_SYNCFS in the case submounts
> > aren't announced to the client.
> > 
> > Iterate over all inodes and call syncfs() on the ones marked as submounts.
> > Since syncfs() can block for an indefinite time, we cannot call it with
> > lo->mutex held as it would prevent the server to process other requests.
> > This is thus broken down in two steps. First build a list of submounts
> > with lo->mutex held, drop the mutex and finally process the list. A
> > reference is taken on the inodes to ensure they don't go away when
> > lo->mutex is dropped.
> > 
> > Signed-off-by: Greg Kurz <groug@kaod.org>
> > ---
> >  tools/virtiofsd/passthrough_ll.c | 38 ++++++++++++++++++++++++++++++--
> >  1 file changed, 36 insertions(+), 2 deletions(-)
> > 
> > diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> > index e94c4e6f8635..7ce944bfe2a0 100644
> > --- a/tools/virtiofsd/passthrough_ll.c
> > +++ b/tools/virtiofsd/passthrough_ll.c
> > @@ -3400,8 +3400,42 @@ static void lo_syncfs(fuse_req_t req, fuse_ino_t ino)
> >          err = lo_do_syncfs(lo, inode);
> >          lo_inode_put(lo, &inode);
> >      } else {
> > -        /* Requires the sever to track submounts. Not implemented yet */
> > -        err = ENOSYS;
> > +        g_autoptr(GSList) submount_list = NULL;
> > +        GSList *elem;
> > +        GHashTableIter iter;
> > +        gpointer key, value;
> > +
> > +        pthread_mutex_lock(&lo->mutex);
> > +
> > +        g_hash_table_iter_init(&iter, lo->inodes);
> > +        while (g_hash_table_iter_next(&iter, &key, &value)) {
> 
> Going through all the inodes sounds very inefficient. If there are large
> number of inodes (say 1 million or more), and if frequent syncfs requests
> are coming this can consume lot of cpu cycles.
> 

Yes.

> Given C virtiofsd is slowly going away, so I don't want to be too
> particular about it. But, I would have thought to put submount
> inodes into another list or hash map (using mount id as key) and just
> traverse through that list instead. Given number of submounts should
> be small, it should be pretty quick to walk through that list.
> 

I don't think this requires a hash, but yes, we could maintain a list
of these so that we don't have to build the list at syncfs() time.

> > +            struct lo_inode *inode = value;
> > +
> > +            if (inode->is_submount) {
> > +                g_atomic_int_inc(&inode->refcount);
> > +                submount_list = g_slist_prepend(submount_list, inode);
> > +            }
> > +        }
> > +
> > +        pthread_mutex_unlock(&lo->mutex);
> > +
> > +        /* The root inode is always present and not tracked in the hash table */
> > +        err = lo_do_syncfs(lo, &lo->root);
> > +
> > +        for (elem = submount_list; elem; elem = g_slist_next(elem)) {
> > +            struct lo_inode *inode = elem->data;
> > +            int r;
> > +
> > +            r = lo_do_syncfs(lo, inode);
> > +            if (r) {
> > +                /*
> > +                 * Try to sync as much as possible. Only one error can be
> > +                 * reported to the client though, arbitrarily the last one.
> > +                 */
> > +                err = r;
> > +            }
> > +            lo_inode_put(lo, &inode);
> > +        }
> 
> One more minor nit. What happens if virtiofsd is processing syncfs list
> and then somebody hard reboots qemu and mounts virtiofs again. That
> will trigger FUSE_INIT and will call lo_destroy() first.
> 
> fuse_lowlevel.c
> 
> fuse_session_process_buf_int()
> {
>             fuse_log(FUSE_LOG_DEBUG, "%s: reinit\n", __func__);
>             se->got_destroy = 1;
>             se->got_init = 0;
>             if (se->op.destroy) {
>                 se->op.destroy(se->userdata);
>             }
> }
> 
> IIUC, there is no synchronization with this path. If we are running with
> thread pool enabled, it could very well happen that one thread is still
> doing syncfs while other thread is executing do_init(). That sounds
> like little bit of a problem. It will be good if there is a way
> to either abort syncfs() or do_destroy() waits for all the previous
> syncfs() to finish.
> 

Ah, this is a problem indeed, but how does it differ from any other
outstanding request? It seems that do_destroy() should drain all
of them.

> Greg, if you like, you could break down this work in two patch series.
> First patch series just issues syncfs() on inode id sent with FUSE_SYNCFS.
> That's easy fix and can get merged now.
> 

Sure. I can quickly repost patch 1 so that it matches what the rust
implementation is doing, as suggested by German.

> And second patch series take care of above issues and will be little bit
> more work.
> 

This might take some more time as I really only have very few cycles to
work on this.

Cheers,

--
Greg

> Thanks
> Vivek
> 



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v5 3/3] virtiofsd: Add support for FUSE_SYNCFS request without announce_submounts
  2022-02-14 19:09       ` Vivek Goyal
@ 2022-02-15  9:18         ` Greg Kurz
  2022-02-15 17:27           ` Vivek Goyal
  0 siblings, 1 reply; 13+ messages in thread
From: Greg Kurz @ 2022-02-15  9:18 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: virtio-fs, Sebastian Hasler, qemu-devel, Stefan Hajnoczi,
	Dr. David Alan Gilbert

On Mon, 14 Feb 2022 14:09:47 -0500
Vivek Goyal <vgoyal@redhat.com> wrote:

> On Mon, Feb 14, 2022 at 01:56:08PM -0500, Vivek Goyal wrote:
> > On Mon, Feb 14, 2022 at 01:27:22PM -0500, Vivek Goyal wrote:
> > > On Mon, Feb 14, 2022 at 02:58:20PM +0100, Greg Kurz wrote:
> > > > This adds the missing bits to support FUSE_SYNCFS in the case submounts
> > > > aren't announced to the client.
> > > > 
> > > > Iterate over all inodes and call syncfs() on the ones marked as submounts.
> > > > Since syncfs() can block for an indefinite time, we cannot call it with
> > > > lo->mutex held as it would prevent the server to process other requests.
> > > > This is thus broken down in two steps. First build a list of submounts
> > > > with lo->mutex held, drop the mutex and finally process the list. A
> > > > reference is taken on the inodes to ensure they don't go away when
> > > > lo->mutex is dropped.
> > > > 
> > > > Signed-off-by: Greg Kurz <groug@kaod.org>
> > > > ---
> > > >  tools/virtiofsd/passthrough_ll.c | 38 ++++++++++++++++++++++++++++++--
> > > >  1 file changed, 36 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> > > > index e94c4e6f8635..7ce944bfe2a0 100644
> > > > --- a/tools/virtiofsd/passthrough_ll.c
> > > > +++ b/tools/virtiofsd/passthrough_ll.c
> > > > @@ -3400,8 +3400,42 @@ static void lo_syncfs(fuse_req_t req, fuse_ino_t ino)
> > > >          err = lo_do_syncfs(lo, inode);
> > > >          lo_inode_put(lo, &inode);
> > > >      } else {
> > > > -        /* Requires the sever to track submounts. Not implemented yet */
> > > > -        err = ENOSYS;
> > > > +        g_autoptr(GSList) submount_list = NULL;
> > > > +        GSList *elem;
> > > > +        GHashTableIter iter;
> > > > +        gpointer key, value;
> > > > +
> > > > +        pthread_mutex_lock(&lo->mutex);
> > > > +
> > > > +        g_hash_table_iter_init(&iter, lo->inodes);
> > > > +        while (g_hash_table_iter_next(&iter, &key, &value)) {
> > > 
> > > Going through all the inodes sounds very inefficient. If there are large
> > > number of inodes (say 1 million or more), and if frequent syncfs requests
> > > are coming this can consume lot of cpu cycles.
> > > 
> > > Given C virtiofsd is slowly going away, so I don't want to be too
> > > particular about it. But, I would have thought to put submount
> > > inodes into another list or hash map (using mount id as key) and just
> > > traverse through that list instead. Given number of submounts should
> > > be small, it should be pretty quick to walk through that list.
> > > 
> > > > +            struct lo_inode *inode = value;
> > > > +
> > > > +            if (inode->is_submount) {
> > > > +                g_atomic_int_inc(&inode->refcount);
> > > > +                submount_list = g_slist_prepend(submount_list, inode);
> > > > +            }
> > > > +        }
> > > > +
> > > > +        pthread_mutex_unlock(&lo->mutex);
> > > > +
> > > > +        /* The root inode is always present and not tracked in the hash table */
> > > > +        err = lo_do_syncfs(lo, &lo->root);
> > > > +
> > > > +        for (elem = submount_list; elem; elem = g_slist_next(elem)) {
> > > > +            struct lo_inode *inode = elem->data;
> > > > +            int r;
> > > > +
> > > > +            r = lo_do_syncfs(lo, inode);
> > > > +            if (r) {
> > > > +                /*
> > > > +                 * Try to sync as much as possible. Only one error can be
> > > > +                 * reported to the client though, arbitrarily the last one.
> > > > +                 */
> > > > +                err = r;
> > > > +            }
> > > > +            lo_inode_put(lo, &inode);
> > > > +        }
> > > 
> > > One more minor nit. What happens if virtiofsd is processing syncfs list
> > > and then somebody hard reboots qemu and mounts virtiofs again. That
> > > will trigger FUSE_INIT and will call lo_destroy() first.
> > > 
> > > fuse_lowlevel.c
> > > 
> > > fuse_session_process_buf_int()
> > > {
> > >             fuse_log(FUSE_LOG_DEBUG, "%s: reinit\n", __func__);
> > >             se->got_destroy = 1;
> > >             se->got_init = 0;
> > >             if (se->op.destroy) {
> > >                 se->op.destroy(se->userdata);
> > >             }
> > > }
> > > 
> > > IIUC, there is no synchronization with this path. If we are running with
> > > thread pool enabled, it could very well happen that one thread is still
> > > doing syncfs while other thread is executing do_init(). That sounds
> > > like little bit of a problem. It will be good if there is a way
> > > to either abort syncfs() or do_destroy() waits for all the previous
> > > syncfs() to finish.
> > > 
> > > Greg, if you like, you could break down this work in two patch series.
> > > First patch series just issues syncfs() on inode id sent with FUSE_SYNCFS.
> > > That's easy fix and can get merged now.
> > 
> > Actually I think even single "syncfs" will have synchronization issue
> > with do_init() upon hard reboot if we drop lo->mutex during syncfs().
> 
> Actually, we have similar issues with ->fsync(). We take lo->mutex,
> and then take a reference on inode. Call fsync() on this. Now it is
> possible that guest hard reboots, triggers, FUSE_INIT and lo_destroy()
> is called. It will take lo->mutex and drop its referene on inode.
> 
> So it looks like in extreme case a new connection can start looking
> up inodes which we still have old inodes in hash table because
> some thread is blocked doing operation and has not dropped its
> reference.
> 
> David, do I understand it right?
> 
> We probably need to have a notion of keeping track of number of requests
> which are in progress. And lo_destroy() should wait till number of
> requests in progress come to zero. This will be equivalent of draining
> the queues operation in virtiofs kernel driver.
> 
> Anyway, given we already have the issue w.r.t lo_destroy(), and C code
> is going away, I will be fine even if you don't fix races with FUSE_INIT.
> 
> Vivek

As you pointed out, this can affect other types of requests as well, so
this would probably deserve a more generic fix than just making it
work for syncfs(). That would most likely call for cycles that I don't
have. Thanks! ;-)

BTW, does the rust implementation have the same flaw?

> > 
> > Vivek
> > 
> > > 
> > > And second patch series take care of above issues and will be little bit
> > > more work.
> > > 
> > > Thanks
> > > Vivek
> 



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v5 3/3] virtiofsd: Add support for FUSE_SYNCFS request without announce_submounts
  2022-02-15  9:18         ` Greg Kurz
@ 2022-02-15 17:27           ` Vivek Goyal
  0 siblings, 0 replies; 13+ messages in thread
From: Vivek Goyal @ 2022-02-15 17:27 UTC (permalink / raw)
  To: Greg Kurz
  Cc: virtio-fs, Sebastian Hasler, qemu-devel, Stefan Hajnoczi,
	Dr. David Alan Gilbert

On Tue, Feb 15, 2022 at 10:18:03AM +0100, Greg Kurz wrote:
> On Mon, 14 Feb 2022 14:09:47 -0500
> Vivek Goyal <vgoyal@redhat.com> wrote:
> 
> > On Mon, Feb 14, 2022 at 01:56:08PM -0500, Vivek Goyal wrote:
> > > On Mon, Feb 14, 2022 at 01:27:22PM -0500, Vivek Goyal wrote:
> > > > On Mon, Feb 14, 2022 at 02:58:20PM +0100, Greg Kurz wrote:
> > > > > This adds the missing bits to support FUSE_SYNCFS in the case submounts
> > > > > aren't announced to the client.
> > > > > 
> > > > > Iterate over all inodes and call syncfs() on the ones marked as submounts.
> > > > > Since syncfs() can block for an indefinite time, we cannot call it with
> > > > > lo->mutex held as it would prevent the server to process other requests.
> > > > > This is thus broken down in two steps. First build a list of submounts
> > > > > with lo->mutex held, drop the mutex and finally process the list. A
> > > > > reference is taken on the inodes to ensure they don't go away when
> > > > > lo->mutex is dropped.
> > > > > 
> > > > > Signed-off-by: Greg Kurz <groug@kaod.org>
> > > > > ---
> > > > >  tools/virtiofsd/passthrough_ll.c | 38 ++++++++++++++++++++++++++++++--
> > > > >  1 file changed, 36 insertions(+), 2 deletions(-)
> > > > > 
> > > > > diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> > > > > index e94c4e6f8635..7ce944bfe2a0 100644
> > > > > --- a/tools/virtiofsd/passthrough_ll.c
> > > > > +++ b/tools/virtiofsd/passthrough_ll.c
> > > > > @@ -3400,8 +3400,42 @@ static void lo_syncfs(fuse_req_t req, fuse_ino_t ino)
> > > > >          err = lo_do_syncfs(lo, inode);
> > > > >          lo_inode_put(lo, &inode);
> > > > >      } else {
> > > > > -        /* Requires the sever to track submounts. Not implemented yet */
> > > > > -        err = ENOSYS;
> > > > > +        g_autoptr(GSList) submount_list = NULL;
> > > > > +        GSList *elem;
> > > > > +        GHashTableIter iter;
> > > > > +        gpointer key, value;
> > > > > +
> > > > > +        pthread_mutex_lock(&lo->mutex);
> > > > > +
> > > > > +        g_hash_table_iter_init(&iter, lo->inodes);
> > > > > +        while (g_hash_table_iter_next(&iter, &key, &value)) {
> > > > 
> > > > Going through all the inodes sounds very inefficient. If there are large
> > > > number of inodes (say 1 million or more), and if frequent syncfs requests
> > > > are coming this can consume lot of cpu cycles.
> > > > 
> > > > Given C virtiofsd is slowly going away, so I don't want to be too
> > > > particular about it. But, I would have thought to put submount
> > > > inodes into another list or hash map (using mount id as key) and just
> > > > traverse through that list instead. Given number of submounts should
> > > > be small, it should be pretty quick to walk through that list.
> > > > 
> > > > > +            struct lo_inode *inode = value;
> > > > > +
> > > > > +            if (inode->is_submount) {
> > > > > +                g_atomic_int_inc(&inode->refcount);
> > > > > +                submount_list = g_slist_prepend(submount_list, inode);
> > > > > +            }
> > > > > +        }
> > > > > +
> > > > > +        pthread_mutex_unlock(&lo->mutex);
> > > > > +
> > > > > +        /* The root inode is always present and not tracked in the hash table */
> > > > > +        err = lo_do_syncfs(lo, &lo->root);
> > > > > +
> > > > > +        for (elem = submount_list; elem; elem = g_slist_next(elem)) {
> > > > > +            struct lo_inode *inode = elem->data;
> > > > > +            int r;
> > > > > +
> > > > > +            r = lo_do_syncfs(lo, inode);
> > > > > +            if (r) {
> > > > > +                /*
> > > > > +                 * Try to sync as much as possible. Only one error can be
> > > > > +                 * reported to the client though, arbitrarily the last one.
> > > > > +                 */
> > > > > +                err = r;
> > > > > +            }
> > > > > +            lo_inode_put(lo, &inode);
> > > > > +        }
> > > > 
> > > > One more minor nit: what happens if virtiofsd is processing the
> > > > syncfs list and somebody hard reboots qemu and mounts virtiofs
> > > > again? That triggers FUSE_INIT, which calls lo_destroy() first.
> > > > 
> > > > fuse_lowlevel.c
> > > > 
> > > > fuse_session_process_buf_int()
> > > > {
> > > >             fuse_log(FUSE_LOG_DEBUG, "%s: reinit\n", __func__);
> > > >             se->got_destroy = 1;
> > > >             se->got_init = 0;
> > > >             if (se->op.destroy) {
> > > >                 se->op.destroy(se->userdata);
> > > >             }
> > > > }
> > > > 
> > > > IIUC, there is no synchronization with this path. If we are running
> > > > with the thread pool enabled, it could very well happen that one
> > > > thread is still doing syncfs while another thread is executing
> > > > do_init(). That sounds like a bit of a problem. It would be good to
> > > > have a way to either abort syncfs() or make do_destroy() wait for
> > > > all in-flight syncfs() calls to finish.
> > > > 
> > > > Greg, if you like, you could break this work down into two patch
> > > > series. The first series just issues syncfs() on the inode id sent
> > > > with FUSE_SYNCFS. That's an easy fix and can get merged now.
> > > 
> > > Actually, I think even a single syncfs will have a synchronization
> > > issue with do_init() upon hard reboot if we drop lo->mutex during
> > > syncfs().
> > 
> > Actually, we have similar issues with ->fsync(). We take lo->mutex,
> > then take a reference on the inode and call fsync() on it. Now it is
> > possible that the guest hard reboots, triggers FUSE_INIT, and
> > lo_destroy() is called. It will take lo->mutex and drop its reference
> > on the inode.
> > 
> > So it looks like, in the extreme case, a new connection can start
> > looking up inodes while old inodes are still in the hash table,
> > because some thread is blocked in an operation and has not dropped
> > its reference.
> > 
> > David, do I understand it right?
> > 
> > We probably need a notion of tracking the number of requests in
> > progress, and lo_destroy() should wait until that number drops to
> > zero. This would be the equivalent of the queue draining operation in
> > the virtiofs kernel driver.
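[The drain described above could be sketched with an in-flight counter
and a condition variable. Again, a rough illustration only: the
`lo_req_begin()`/`lo_req_end()`/`lo_drain_requests()` names are
hypothetical, and a real fix would have to hook into the request
dispatch path in fuse_lowlevel.c.]

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical in-flight request accounting; not actual virtiofsd API. */
static pthread_mutex_t inflight_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t inflight_zero = PTHREAD_COND_INITIALIZER;
static unsigned int inflight;

/* Every request handler would be bracketed by begin/end. */
static void lo_req_begin(void)
{
    pthread_mutex_lock(&inflight_lock);
    inflight++;
    pthread_mutex_unlock(&inflight_lock);
}

static void lo_req_end(void)
{
    pthread_mutex_lock(&inflight_lock);
    if (--inflight == 0) {
        pthread_cond_broadcast(&inflight_zero);
    }
    pthread_mutex_unlock(&inflight_lock);
}

/*
 * lo_destroy() would call this before tearing down the inode table,
 * mirroring the queue drain done by the virtiofs kernel driver.
 */
static void lo_drain_requests(void)
{
    pthread_mutex_lock(&inflight_lock);
    while (inflight != 0) {
        pthread_cond_wait(&inflight_zero, &inflight_lock);
    }
    pthread_mutex_unlock(&inflight_lock);
}
```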
> > 
> > Anyway, given we already have this issue w.r.t. lo_destroy() and the
> > C code is going away, I will be fine even if you don't fix the races
> > with FUSE_INIT.
> > 
> > Vivek
> 
> As you pointed out, this can affect other types of requests as well, so
> it would probably deserve a more generic fix than just making it work
> for syncfs(). This would most likely require cycles that I don't
> have. Thanks! ;-)
> 
> BTW, does the Rust implementation have the same flaw?

I don't think the Rust implementation drops any locks at all while
syncfs() is called, so the next FUSE_INIT might just serialize
completely and wait for syncfs() to finish first. But don't quote me on
this, because I don't understand the Rust virtiofsd locking well yet. It
is more of a guess.

Vivek

> 
> > > 
> > > Vivek
> > > 
> > > > 
> > > > And the second patch series takes care of the above issues and
> > > > will be a little bit more work.
> > > > 
> > > > Thanks
> > > > Vivek
> > 
> 




end of thread, other threads:[~2022-02-15 17:31 UTC | newest]

Thread overview: 13+ messages
2022-02-14 13:58 [PATCH v5 0/3] virtiofsd: Add support for FUSE_SYNCFS request Greg Kurz
2022-02-14 13:58 ` [PATCH v5 1/3] virtiofsd: Add support for FUSE_SYNCFS request with announce_submounts Greg Kurz
2022-02-14 15:43   ` [Virtio-fs] " German Maglione
2022-02-14 16:02     ` Greg Kurz
2022-02-14 13:58 ` [PATCH v5 2/3] virtiofsd: Track submounts Greg Kurz
2022-02-14 15:43   ` Greg Kurz
2022-02-14 13:58 ` [PATCH v5 3/3] virtiofsd: Add support for FUSE_SYNCFS request without announce_submounts Greg Kurz
2022-02-14 18:27   ` Vivek Goyal
2022-02-14 18:56     ` Vivek Goyal
2022-02-14 19:09       ` Vivek Goyal
2022-02-15  9:18         ` Greg Kurz
2022-02-15 17:27           ` Vivek Goyal
2022-02-15  9:12     ` Greg Kurz
