qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/6] net/tap: Fix QEMU frozen issue when the maximum number of file descriptors is very large
@ 2023-06-16 15:27 Bin Meng
  2023-06-16 15:27 ` [PATCH v2 1/6] tests/tcg/cris: Fix the coding style Bin Meng
                   ` (5 more replies)
  0 siblings, 6 replies; 14+ messages in thread
From: Bin Meng @ 2023-06-16 15:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: Zhangjin Wu, Claudio Imbrenda, David Hildenbrand,
	Edgar E. Iglesias, Jason Wang, Kevin Wolf, Marc-André Lureau,
	Markus Armbruster, Michael S. Tsirkin, Nikita Ivanov,
	Paolo Bonzini, Philippe Mathieu-Daudé, Thomas Huth,
	Xuzhou Cheng


Current codes using a brute-force traversal of all file descriptors
do not scale on a system where the maximum number of file descriptors
is set to a very large value (e.g.: in a Docker container of Manjaro
distribution it is set to 1073741816). QEMU just looks frozen during
start-up.

The close-on-exec flag (O_CLOEXEC) was introduced since Linux kernel
2.6.23, FreeBSD 8.3, OpenBSD 5.0, Solaris 11. While it's true QEMU
doesn't need to manually close the fds for child process as the proper
O_CLOEXEC flag should have been set properly on files with its own
codes, QEMU uses a huge number of 3rd party libraries and we don't
trust them to reliably be using O_CLOEXEC on everything they open.

Modern Linux and BSDs have the close_range() call we can use to do the
job, and on Linux we have one more way to walk through /proc/self/fd
to complete the task efficiently, which is what qemu_close_range()
does, a new API we add in utils/osdep.c.

Changes in v2:
- Change to use qemu_close_range() to close fds for child process efficiently
- v1 link: https://lore.kernel.org/qemu-devel/20230406112041.798585-1-bmeng@tinylab.org/

Bin Meng (4):
  tests/tcg/cris: Fix the coding style
  tests/tcg/cris: Correct the off-by-one error
  util/async-teardown: Fall back to close fds one by one
  utils/osdep: Introduce qemu_close_range()

Zhangjin Wu (2):
  util/async-teardown: Use qemu_close_range() to close fds
  net: tap: Use qemu_close_range() to close fds

 include/qemu/osdep.h                |  1 +
 net/tap.c                           | 23 ++++++------
 tests/tcg/cris/libc/check_openpf5.c | 57 ++++++++++++++---------------
 util/async-teardown.c               | 37 +------------------
 util/osdep.c                        | 47 ++++++++++++++++++++++++
 5 files changed, 87 insertions(+), 78 deletions(-)

-- 
2.34.1




^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v2 1/6] tests/tcg/cris: Fix the coding style
  2023-06-16 15:27 [PATCH v2 0/6] net/tap: Fix QEMU frozen issue when the maximum number of file descriptors is very large Bin Meng
@ 2023-06-16 15:27 ` Bin Meng
  2023-06-19  6:55   ` Richard Henderson
  2023-06-16 15:27 ` [PATCH v2 2/6] tests/tcg/cris: Correct the off-by-one error Bin Meng
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 14+ messages in thread
From: Bin Meng @ 2023-06-16 15:27 UTC (permalink / raw)
  To: qemu-devel; +Cc: Zhangjin Wu, Edgar E. Iglesias

The code style does not conform with QEMU's. Correct it so that the
upcoming commit does not trigger checkpatch warnings.

Signed-off-by: Bin Meng <bmeng@tinylab.org>
---

(no changes since v1)

 tests/tcg/cris/libc/check_openpf5.c | 57 ++++++++++++++---------------
 1 file changed, 27 insertions(+), 30 deletions(-)

diff --git a/tests/tcg/cris/libc/check_openpf5.c b/tests/tcg/cris/libc/check_openpf5.c
index 1f86ea283d..0037fbca4c 100644
--- a/tests/tcg/cris/libc/check_openpf5.c
+++ b/tests/tcg/cris/libc/check_openpf5.c
@@ -13,44 +13,41 @@
 #include <fcntl.h>
 #include <string.h>
 
-int main (int argc, char *argv[])
+int main(int argc, char *argv[])
 {
-  int i;
-  int filemax;
+    int i;
+    int filemax;
 
 #ifdef OPEN_MAX
-  filemax = OPEN_MAX;
+    filemax = OPEN_MAX;
 #else
-  filemax = sysconf (_SC_OPEN_MAX);
+    filemax = sysconf(_SC_OPEN_MAX);
 #endif
 
-  char *fn = malloc (strlen (argv[0]) + 2);
-  if (fn == NULL)
-    abort ();
-  strcpy (fn, "/");
-  strcat (fn, argv[0]);
+    char *fn = malloc(strlen(argv[0]) + 2);
+    if (fn == NULL) {
+        abort();
+    }
+    strcpy(fn, "/");
+    strcat(fn, argv[0]);
 
-  for (i = 0; i < filemax + 1; i++)
-    {
-      if (open (fn, O_RDONLY) < 0)
-	{
-	  /* Shouldn't happen too early.  */
-	  if (i < filemax - 3 - 1)
-	    {
-	      fprintf (stderr, "i: %d\n", i);
-	      abort ();
-	    }
-	  if (errno != EMFILE)
-	    {
-	      perror ("open");
-	      abort ();
-	    }
-	  goto ok;
-	}
+    for (i = 0; i < filemax + 1; i++) {
+        if (open(fn, O_RDONLY) < 0) {
+            /* Shouldn't happen too early.  */
+            if (i < filemax - 3 - 1) {
+                fprintf(stderr, "i: %d\n", i);
+                abort();
+            }
+            if (errno != EMFILE) {
+                perror("open");
+                abort();
+            }
+        goto ok;
+        }
     }
-  abort ();
+    abort();
 
 ok:
-  printf ("pass\n");
-  exit (0);
+    printf("pass\n");
+    exit(0);
 }
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v2 2/6] tests/tcg/cris: Correct the off-by-one error
  2023-06-16 15:27 [PATCH v2 0/6] net/tap: Fix QEMU frozen issue when the maximum number of file descriptors is very large Bin Meng
  2023-06-16 15:27 ` [PATCH v2 1/6] tests/tcg/cris: Fix the coding style Bin Meng
@ 2023-06-16 15:27 ` Bin Meng
  2023-06-19  6:55   ` Richard Henderson
  2023-06-16 15:27 ` [PATCH v2 3/6] util/async-teardown: Fall back to close fds one by one Bin Meng
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 14+ messages in thread
From: Bin Meng @ 2023-06-16 15:27 UTC (permalink / raw)
  To: qemu-devel; +Cc: Zhangjin Wu, Edgar E. Iglesias

sysconf(_SC_OPEN_MAX) returns the maximum number of files that
a process can have open at any time, which means the fd should
not be larger than or equal to the return value.

Signed-off-by: Bin Meng <bmeng@tinylab.org>
---

(no changes since v1)

 tests/tcg/cris/libc/check_openpf5.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tests/tcg/cris/libc/check_openpf5.c b/tests/tcg/cris/libc/check_openpf5.c
index 0037fbca4c..7f585c6d37 100644
--- a/tests/tcg/cris/libc/check_openpf5.c
+++ b/tests/tcg/cris/libc/check_openpf5.c
@@ -31,10 +31,10 @@ int main(int argc, char *argv[])
     strcpy(fn, "/");
     strcat(fn, argv[0]);
 
-    for (i = 0; i < filemax + 1; i++) {
+    for (i = 0; i < filemax; i++) {
         if (open(fn, O_RDONLY) < 0) {
             /* Shouldn't happen too early.  */
-            if (i < filemax - 3 - 1) {
+            if (i < filemax - 3) {
                 fprintf(stderr, "i: %d\n", i);
                 abort();
             }
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v2 3/6] util/async-teardown: Fall back to close fds one by one
  2023-06-16 15:27 [PATCH v2 0/6] net/tap: Fix QEMU frozen issue when the maximum number of file descriptors is very large Bin Meng
  2023-06-16 15:27 ` [PATCH v2 1/6] tests/tcg/cris: Fix the coding style Bin Meng
  2023-06-16 15:27 ` [PATCH v2 2/6] tests/tcg/cris: Correct the off-by-one error Bin Meng
@ 2023-06-16 15:27 ` Bin Meng
  2023-06-19  6:59   ` Richard Henderson
  2023-06-16 15:27 ` [PATCH v2 4/6] utils/osdep: Introduce qemu_close_range() Bin Meng
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 14+ messages in thread
From: Bin Meng @ 2023-06-16 15:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: Zhangjin Wu, Claudio Imbrenda, Markus Armbruster,
	Michael S. Tsirkin

When opening /proc/self/fd fails, current codes just return directly,
but we can fall back to close fds one by one.

Signed-off-by: Bin Meng <bmeng@tinylab.org>
---

(no changes since v1)

 util/async-teardown.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/util/async-teardown.c b/util/async-teardown.c
index 3ab19c8740..7e0177a8da 100644
--- a/util/async-teardown.c
+++ b/util/async-teardown.c
@@ -48,7 +48,11 @@ static void close_all_open_fd(void)
 
     dir = opendir("/proc/self/fd");
     if (!dir) {
-        /* If /proc is not mounted, there is nothing that can be done. */
+        /* If /proc is not mounted, close fds one by one. */
+        int open_max = sysconf(_SC_OPEN_MAX), i;
+        for (i = 0; i < open_max; i++) {
+                close(i);
+        }
         return;
     }
     /* Avoid closing the directory. */
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v2 4/6] utils/osdep: Introduce qemu_close_range()
  2023-06-16 15:27 [PATCH v2 0/6] net/tap: Fix QEMU frozen issue when the maximum number of file descriptors is very large Bin Meng
                   ` (2 preceding siblings ...)
  2023-06-16 15:27 ` [PATCH v2 3/6] util/async-teardown: Fall back to close fds one by one Bin Meng
@ 2023-06-16 15:27 ` Bin Meng
  2023-06-19  7:03   ` Richard Henderson
  2023-06-19  7:07   ` Richard Henderson
  2023-06-16 15:27 ` [PATCH v2 5/6] util/async-teardown: Use qemu_close_range() to close fds Bin Meng
  2023-06-16 15:27 ` [PATCH v2 6/6] net: tap: " Bin Meng
  5 siblings, 2 replies; 14+ messages in thread
From: Bin Meng @ 2023-06-16 15:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: Zhangjin Wu, David Hildenbrand, Kevin Wolf,
	Marc-André Lureau, Nikita Ivanov, Paolo Bonzini, Thomas Huth,
	Xuzhou Cheng

This introduces a new QEMU API qemu_close_range() that closes all
open file descriptors from first to last (included).

This API will try a more efficient call to close_range(), or walk
through of /proc/self/fd whenever these are possible, otherwise it
falls back to a plain close loop.

Co-developed-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Bin Meng <bmeng@tinylab.org>
---

(no changes since v1)

 include/qemu/osdep.h |  1 +
 util/osdep.c         | 47 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 48 insertions(+)

diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index cc61b00ba9..e22434ce10 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -560,6 +560,7 @@ int qemu_open_old(const char *name, int flags, ...);
 int qemu_open(const char *name, int flags, Error **errp);
 int qemu_create(const char *name, int flags, mode_t mode, Error **errp);
 int qemu_close(int fd);
+int qemu_close_range(unsigned int first, unsigned int last);
 int qemu_unlink(const char *name);
 #ifndef _WIN32
 int qemu_dup_flags(int fd, int flags);
diff --git a/util/osdep.c b/util/osdep.c
index e996c4744a..fd7dd2dbdf 100644
--- a/util/osdep.c
+++ b/util/osdep.c
@@ -411,6 +411,53 @@ int qemu_close(int fd)
     return close(fd);
 }
 
+int qemu_close_range(unsigned int first, unsigned int last)
+{
+    DIR *dir = NULL;
+
+#ifdef CONFIG_CLOSE_RANGE
+    int r = close_range(first, last, 0);
+    if (!r) {
+        /* Success, no need to try other ways. */
+        return 0;
+    }
+#endif
+
+#ifdef __linux__
+    dir = opendir("/proc/self/fd");
+#endif
+    if (!dir) {
+        /*
+         * If /proc is not mounted or /proc/self/fd is not supported,
+         * try close() from first to last.
+         */
+        for (int i = first; i <= last; i++) {
+            close(i);
+        }
+
+        return 0;
+    }
+
+#ifndef _WIN32
+    /* Avoid closing the directory */
+    int dfd = dirfd(dir);
+
+    for (struct dirent *de = readdir(dir); de; de = readdir(dir)) {
+        int fd = atoi(de->d_name);
+        if (fd < first || fd > last) {
+            /* Exclude the fds outside the target range */
+            continue;
+        }
+        if (fd != dfd) {
+            close(fd);
+        }
+    }
+    closedir(dir);
+#endif /* _WIN32 */
+
+    return 0;
+}
+
 /*
  * Delete a file from the filesystem, unless the filename is /dev/fdset/...
  *
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v2 5/6] util/async-teardown: Use qemu_close_range() to close fds
  2023-06-16 15:27 [PATCH v2 0/6] net/tap: Fix QEMU frozen issue when the maximum number of file descriptors is very large Bin Meng
                   ` (3 preceding siblings ...)
  2023-06-16 15:27 ` [PATCH v2 4/6] utils/osdep: Introduce qemu_close_range() Bin Meng
@ 2023-06-16 15:27 ` Bin Meng
  2023-06-19  7:05   ` Richard Henderson
  2023-06-16 15:27 ` [PATCH v2 6/6] net: tap: " Bin Meng
  5 siblings, 1 reply; 14+ messages in thread
From: Bin Meng @ 2023-06-16 15:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: Zhangjin Wu, Claudio Imbrenda, Michael S. Tsirkin,
	Philippe Mathieu-Daudé

From: Zhangjin Wu <falcon@tinylab.org>

Based on the old close_all_open_fd() of util/async-teardown.c, a new
generic qemu_close_range() has been added in osdep.c.

Now, let's switch over to use the generic qemu_close_range().

Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Bin Meng <bmeng@tinylab.org>
---

(no changes since v1)

 util/async-teardown.c | 41 +----------------------------------------
 1 file changed, 1 insertion(+), 40 deletions(-)

diff --git a/util/async-teardown.c b/util/async-teardown.c
index 7e0177a8da..200ec05101 100644
--- a/util/async-teardown.c
+++ b/util/async-teardown.c
@@ -29,44 +29,6 @@
 
 static pid_t the_ppid;
 
-/*
- * Close all open file descriptors.
- */
-static void close_all_open_fd(void)
-{
-    struct dirent *de;
-    int fd, dfd;
-    DIR *dir;
-
-#ifdef CONFIG_CLOSE_RANGE
-    int r = close_range(0, ~0U, 0);
-    if (!r) {
-        /* Success, no need to try other ways. */
-        return;
-    }
-#endif
-
-    dir = opendir("/proc/self/fd");
-    if (!dir) {
-        /* If /proc is not mounted, close fds one by one. */
-        int open_max = sysconf(_SC_OPEN_MAX), i;
-        for (i = 0; i < open_max; i++) {
-                close(i);
-        }
-        return;
-    }
-    /* Avoid closing the directory. */
-    dfd = dirfd(dir);
-
-    for (de = readdir(dir); de; de = readdir(dir)) {
-        fd = atoi(de->d_name);
-        if (fd != dfd) {
-            close(fd);
-        }
-    }
-    closedir(dir);
-}
-
 static void hup_handler(int signal)
 {
     /* Check every second if this process has been reparented. */
@@ -92,9 +54,8 @@ static int async_teardown_fn(void *arg)
     /*
      * Close all file descriptors that might have been inherited from the
      * main qemu process when doing clone, needed to make libvirt happy.
-     * Not using close_range for increased compatibility with older kernels.
      */
-    close_all_open_fd();
+    qemu_close_range(0, ~0U);
 
     /* Set up a handler for SIGHUP and unblock SIGHUP. */
     sigaction(SIGHUP, &sa, NULL);
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v2 6/6] net: tap: Use qemu_close_range() to close fds
  2023-06-16 15:27 [PATCH v2 0/6] net/tap: Fix QEMU frozen issue when the maximum number of file descriptors is very large Bin Meng
                   ` (4 preceding siblings ...)
  2023-06-16 15:27 ` [PATCH v2 5/6] util/async-teardown: Use qemu_close_range() to close fds Bin Meng
@ 2023-06-16 15:27 ` Bin Meng
  2023-06-19  7:08   ` Richard Henderson
  5 siblings, 1 reply; 14+ messages in thread
From: Bin Meng @ 2023-06-16 15:27 UTC (permalink / raw)
  To: qemu-devel; +Cc: Zhangjin Wu, Jason Wang

From: Zhangjin Wu <falcon@tinylab.org>

Current codes using a brute-force traversal of all file descriptors
do not scale on a system where the maximum number of file descriptors
is set to a very large value (e.g.: in a Docker container of Manjaro
distribution it is set to 1073741816). QEMU just looks frozen during
start-up.

The close-on-exec flag (O_CLOEXEC) was introduced since Linux kernel
2.6.23, FreeBSD 8.3, OpenBSD 5.0, Solaris 11. While it's true QEMU
doesn't need to manually close the fds for child process as the proper
O_CLOEXEC flag should have been set properly on files with its own
codes, QEMU uses a huge number of 3rd party libraries and we don't
trust them to reliably be using O_CLOEXEC on everything they open.

Modern Linux and BSDs have the close_range() call we can use to do the
job, and on Linux we have one more way to walk through /proc/self/fd
to complete the task efficiently, which is what qemu_close_range() does.

Reported-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Bin Meng <bmeng@tinylab.org>

---

Changes in v2:
- Change to use qemu_close_range() to close fds for child process efficiently
- v1 link: https://lore.kernel.org/qemu-devel/20230406112041.798585-1-bmeng@tinylab.org/

 net/tap.c | 23 +++++++++++------------
 1 file changed, 11 insertions(+), 12 deletions(-)

diff --git a/net/tap.c b/net/tap.c
index 1bf085d422..d482fabdff 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -446,13 +446,13 @@ static void launch_script(const char *setup_script, const char *ifname,
         return;
     }
     if (pid == 0) {
-        int open_max = sysconf(_SC_OPEN_MAX), i;
+        unsigned int last_fd = sysconf(_SC_OPEN_MAX) - 1;
+
+        /* skip stdin, stdout and stderr */
+        qemu_close_range(3, fd - 1);
+        /* skip the currently used fd */
+        qemu_close_range(fd + 1, last_fd);
 
-        for (i = 3; i < open_max; i++) {
-            if (i != fd) {
-                close(i);
-            }
-        }
         parg = args;
         *parg++ = (char *)setup_script;
         *parg++ = (char *)ifname;
@@ -536,16 +536,15 @@ static int net_bridge_run_helper(const char *helper, const char *bridge,
         return -1;
     }
     if (pid == 0) {
-        int open_max = sysconf(_SC_OPEN_MAX), i;
+        unsigned int last_fd = sysconf(_SC_OPEN_MAX) - 1, fd = sv[1];
         char *fd_buf = NULL;
         char *br_buf = NULL;
         char *helper_cmd = NULL;
 
-        for (i = 3; i < open_max; i++) {
-            if (i != sv[1]) {
-                close(i);
-            }
-        }
+        /* skip stdin, stdout and stderr */
+        qemu_close_range(3, fd - 1);
+        /* skip the currently used fd */
+        qemu_close_range(fd + 1, last_fd);
 
         fd_buf = g_strdup_printf("%s%d", "--fd=", sv[1]);
 
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/6] tests/tcg/cris: Fix the coding style
  2023-06-16 15:27 ` [PATCH v2 1/6] tests/tcg/cris: Fix the coding style Bin Meng
@ 2023-06-19  6:55   ` Richard Henderson
  0 siblings, 0 replies; 14+ messages in thread
From: Richard Henderson @ 2023-06-19  6:55 UTC (permalink / raw)
  To: Bin Meng, qemu-devel; +Cc: Zhangjin Wu, Edgar E. Iglesias

On 6/16/23 17:27, Bin Meng wrote:
> The code style does not conform with QEMU's. Correct it so that the
> upcoming commit does not trigger checkpatch warnings.
> 
> Signed-off-by: Bin Meng<bmeng@tinylab.org>
> ---
> 
> (no changes since v1)
> 
>   tests/tcg/cris/libc/check_openpf5.c | 57 ++++++++++++++---------------
>   1 file changed, 27 insertions(+), 30 deletions(-)

Acked-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 2/6] tests/tcg/cris: Correct the off-by-one error
  2023-06-16 15:27 ` [PATCH v2 2/6] tests/tcg/cris: Correct the off-by-one error Bin Meng
@ 2023-06-19  6:55   ` Richard Henderson
  0 siblings, 0 replies; 14+ messages in thread
From: Richard Henderson @ 2023-06-19  6:55 UTC (permalink / raw)
  To: Bin Meng, qemu-devel; +Cc: Zhangjin Wu, Edgar E. Iglesias

On 6/16/23 17:27, Bin Meng wrote:
> sysconf(_SC_OPEN_MAX) returns the maximum number of files that
> a process can have open at any time, which means the fd should
> not be larger than or equal to the return value.
> 
> Signed-off-by: Bin Meng<bmeng@tinylab.org>
> ---
> 
> (no changes since v1)
> 
>   tests/tcg/cris/libc/check_openpf5.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)

Acked-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 3/6] util/async-teardown: Fall back to close fds one by one
  2023-06-16 15:27 ` [PATCH v2 3/6] util/async-teardown: Fall back to close fds one by one Bin Meng
@ 2023-06-19  6:59   ` Richard Henderson
  0 siblings, 0 replies; 14+ messages in thread
From: Richard Henderson @ 2023-06-19  6:59 UTC (permalink / raw)
  To: Bin Meng, qemu-devel
  Cc: Zhangjin Wu, Claudio Imbrenda, Markus Armbruster,
	Michael S. Tsirkin

On 6/16/23 17:27, Bin Meng wrote:
> When opening /proc/self/fd fails, current codes just return directly,
> but we can fall back to close fds one by one.
> 
> Signed-off-by: Bin Meng <bmeng@tinylab.org>
> ---
> 
> (no changes since v1)
> 
>   util/async-teardown.c | 6 +++++-
>   1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/util/async-teardown.c b/util/async-teardown.c
> index 3ab19c8740..7e0177a8da 100644
> --- a/util/async-teardown.c
> +++ b/util/async-teardown.c
> @@ -48,7 +48,11 @@ static void close_all_open_fd(void)
>   
>       dir = opendir("/proc/self/fd");
>       if (!dir) {
> -        /* If /proc is not mounted, there is nothing that can be done. */
> +        /* If /proc is not mounted, close fds one by one. */
> +        int open_max = sysconf(_SC_OPEN_MAX), i;
> +        for (i = 0; i < open_max; i++) {
> +                close(i);
> +        }
>           return;
>       }
>       /* Avoid closing the directory. */

Do we really need to make the 1M close calls?
The process is on its way to exit anyway...


r~


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 4/6] utils/osdep: Introduce qemu_close_range()
  2023-06-16 15:27 ` [PATCH v2 4/6] utils/osdep: Introduce qemu_close_range() Bin Meng
@ 2023-06-19  7:03   ` Richard Henderson
  2023-06-19  7:07   ` Richard Henderson
  1 sibling, 0 replies; 14+ messages in thread
From: Richard Henderson @ 2023-06-19  7:03 UTC (permalink / raw)
  To: Bin Meng, qemu-devel
  Cc: Zhangjin Wu, David Hildenbrand, Kevin Wolf,
	Marc-André Lureau, Nikita Ivanov, Paolo Bonzini, Thomas Huth,
	Xuzhou Cheng

On 6/16/23 17:27, Bin Meng wrote:
> This introduces a new QEMU API qemu_close_range() that closes all
> open file descriptors from first to last (included).
> 
> This API will try a more efficient call to close_range(), or walk
> through of /proc/self/fd whenever these are possible, otherwise it
> falls back to a plain close loop.
> 
> Co-developed-by: Zhangjin Wu <falcon@tinylab.org>
> Signed-off-by: Bin Meng <bmeng@tinylab.org>
> ---
> 
> (no changes since v1)
> 
>   include/qemu/osdep.h |  1 +
>   util/osdep.c         | 47 ++++++++++++++++++++++++++++++++++++++++++++
>   2 files changed, 48 insertions(+)
> 
> diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
> index cc61b00ba9..e22434ce10 100644
> --- a/include/qemu/osdep.h
> +++ b/include/qemu/osdep.h
> @@ -560,6 +560,7 @@ int qemu_open_old(const char *name, int flags, ...);
>   int qemu_open(const char *name, int flags, Error **errp);
>   int qemu_create(const char *name, int flags, mode_t mode, Error **errp);
>   int qemu_close(int fd);
> +int qemu_close_range(unsigned int first, unsigned int last);
>   int qemu_unlink(const char *name);
>   #ifndef _WIN32
>   int qemu_dup_flags(int fd, int flags);
> diff --git a/util/osdep.c b/util/osdep.c
> index e996c4744a..fd7dd2dbdf 100644
> --- a/util/osdep.c
> +++ b/util/osdep.c
> @@ -411,6 +411,53 @@ int qemu_close(int fd)
>       return close(fd);
>   }
>   
> +int qemu_close_range(unsigned int first, unsigned int last)
> +{
> +    DIR *dir = NULL;
> +
> +#ifdef CONFIG_CLOSE_RANGE
> +    int r = close_range(first, last, 0);
> +    if (!r) {
> +        /* Success, no need to try other ways. */
> +        return 0;
> +    }
> +#endif
> +
> +#ifdef __linux__
> +    dir = opendir("/proc/self/fd");
> +#endif
> +    if (!dir) {
> +        /*
> +         * If /proc is not mounted or /proc/self/fd is not supported,
> +         * try close() from first to last.
> +         */
> +        for (int i = first; i <= last; i++) {
> +            close(i);
> +        }
> +
> +        return 0;
> +    }
> +
> +#ifndef _WIN32
> +    /* Avoid closing the directory */
> +    int dfd = dirfd(dir);
> +
> +    for (struct dirent *de = readdir(dir); de; de = readdir(dir)) {
> +        int fd = atoi(de->d_name);
> +        if (fd < first || fd > last) {
> +            /* Exclude the fds outside the target range */
> +            continue;
> +        }
> +        if (fd != dfd) {
> +            close(fd);
> +        }
> +    }
> +    closedir(dir);
> +#endif /* _WIN32 */

Poor ordering of ifdefs.

#ifdef __linux__
     DIR *dir = opendir("...");
     if (dir) {
         int dfd = ...
         loop
         closedir(dir);
         return;
     }
#endif

since the first ifdef is the only way dir can ever be non-null.

> +        for (int i = first; i <= last; i++) {

typeof(i) != typeof(first).


r~


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 5/6] util/async-teardown: Use qemu_close_range() to close fds
  2023-06-16 15:27 ` [PATCH v2 5/6] util/async-teardown: Use qemu_close_range() to close fds Bin Meng
@ 2023-06-19  7:05   ` Richard Henderson
  0 siblings, 0 replies; 14+ messages in thread
From: Richard Henderson @ 2023-06-19  7:05 UTC (permalink / raw)
  To: Bin Meng, qemu-devel
  Cc: Zhangjin Wu, Claudio Imbrenda, Michael S. Tsirkin,
	Philippe Mathieu-Daudé

On 6/16/23 17:27, Bin Meng wrote:
> +    qemu_close_range(0, ~0U);

This is clearly assuming that close_range exists.

Should qemu_close_range attempt to lower last if >= _SC_OPEN_MAX?
Or better to use that here.


r~


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 4/6] utils/osdep: Introduce qemu_close_range()
  2023-06-16 15:27 ` [PATCH v2 4/6] utils/osdep: Introduce qemu_close_range() Bin Meng
  2023-06-19  7:03   ` Richard Henderson
@ 2023-06-19  7:07   ` Richard Henderson
  1 sibling, 0 replies; 14+ messages in thread
From: Richard Henderson @ 2023-06-19  7:07 UTC (permalink / raw)
  To: Bin Meng, qemu-devel
  Cc: Zhangjin Wu, David Hildenbrand, Kevin Wolf,
	Marc-André Lureau, Nikita Ivanov, Paolo Bonzini, Thomas Huth,
	Xuzhou Cheng

On 6/16/23 17:27, Bin Meng wrote:
> +int qemu_close_range(unsigned int first, unsigned int last)
> +{
> +    DIR *dir = NULL;
> +
> +#ifdef CONFIG_CLOSE_RANGE
> +    int r = close_range(first, last, 0);
> +    if (!r) {
> +        /* Success, no need to try other ways. */
> +        return 0;
> +    }
> +#endif

What about first > last?

close_range will yield EINVAL, and your other fallbacks will burn lots of cpu.


r~


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 6/6] net: tap: Use qemu_close_range() to close fds
  2023-06-16 15:27 ` [PATCH v2 6/6] net: tap: " Bin Meng
@ 2023-06-19  7:08   ` Richard Henderson
  0 siblings, 0 replies; 14+ messages in thread
From: Richard Henderson @ 2023-06-19  7:08 UTC (permalink / raw)
  To: Bin Meng, qemu-devel; +Cc: Zhangjin Wu, Jason Wang

On 6/16/23 17:27, Bin Meng wrote:
> From: Zhangjin Wu<falcon@tinylab.org>
> 
> Current codes using a brute-force traversal of all file descriptors
> do not scale on a system where the maximum number of file descriptors
> is set to a very large value (e.g.: in a Docker container of Manjaro
> distribution it is set to 1073741816). QEMU just looks frozen during
> start-up.
> 
> The close-on-exec flag (O_CLOEXEC) was introduced since Linux kernel
> 2.6.23, FreeBSD 8.3, OpenBSD 5.0, Solaris 11. While it's true QEMU
> doesn't need to manually close the fds for child process as the proper
> O_CLOEXEC flag should have been set properly on files with its own
> codes, QEMU uses a huge number of 3rd party libraries and we don't
> trust them to reliably be using O_CLOEXEC on everything they open.
> 
> Modern Linux and BSDs have the close_range() call we can use to do the
> job, and on Linux we have one more way to walk through /proc/self/fd
> to complete the task efficiently, which is what qemu_close_range() does.
> 
> Reported-by: Zhangjin Wu<falcon@tinylab.org>
> Signed-off-by: Zhangjin Wu<falcon@tinylab.org>
> Signed-off-by: Bin Meng<bmeng@tinylab.org>
> 
> ---
> 
> Changes in v2:
> - Change to use qemu_close_range() to close fds for child process efficiently
> - v1 link:https://lore.kernel.org/qemu-devel/20230406112041.798585-1-bmeng@tinylab.org/
> 
>   net/tap.c | 23 +++++++++++------------
>   1 file changed, 11 insertions(+), 12 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2023-06-19  7:09 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-06-16 15:27 [PATCH v2 0/6] net/tap: Fix QEMU frozen issue when the maximum number of file descriptors is very large Bin Meng
2023-06-16 15:27 ` [PATCH v2 1/6] tests/tcg/cris: Fix the coding style Bin Meng
2023-06-19  6:55   ` Richard Henderson
2023-06-16 15:27 ` [PATCH v2 2/6] tests/tcg/cris: Correct the off-by-one error Bin Meng
2023-06-19  6:55   ` Richard Henderson
2023-06-16 15:27 ` [PATCH v2 3/6] util/async-teardown: Fall back to close fds one by one Bin Meng
2023-06-19  6:59   ` Richard Henderson
2023-06-16 15:27 ` [PATCH v2 4/6] utils/osdep: Introduce qemu_close_range() Bin Meng
2023-06-19  7:03   ` Richard Henderson
2023-06-19  7:07   ` Richard Henderson
2023-06-16 15:27 ` [PATCH v2 5/6] util/async-teardown: Use qemu_close_range() to close fds Bin Meng
2023-06-19  7:05   ` Richard Henderson
2023-06-16 15:27 ` [PATCH v2 6/6] net: tap: " Bin Meng
2023-06-19  7:08   ` Richard Henderson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).