* [PATCH v2 0/6] net/tap: Fix QEMU frozen issue when the maximum number of file descriptors is very large
@ 2023-06-16 15:27 Bin Meng
2023-06-16 15:27 ` [PATCH v2 1/6] tests/tcg/cris: Fix the coding style Bin Meng
` (5 more replies)
0 siblings, 6 replies; 14+ messages in thread
From: Bin Meng @ 2023-06-16 15:27 UTC (permalink / raw)
To: qemu-devel
Cc: Zhangjin Wu, Claudio Imbrenda, David Hildenbrand,
Edgar E. Iglesias, Jason Wang, Kevin Wolf, Marc-André Lureau,
Markus Armbruster, Michael S. Tsirkin, Nikita Ivanov,
Paolo Bonzini, Philippe Mathieu-Daudé, Thomas Huth,
Xuzhou Cheng
Current codes using a brute-force traversal of all file descriptors
do not scale on a system where the maximum number of file descriptors
is set to a very large value (e.g.: in a Docker container of Manjaro
distribution it is set to 1073741816). QEMU just looks frozen during
start-up.
The close-on-exec flag (O_CLOEXEC) was introduced since Linux kernel
2.6.23, FreeBSD 8.3, OpenBSD 5.0, Solaris 11. While it's true QEMU
doesn't need to manually close the fds for child process as the proper
O_CLOEXEC flag should have been set properly on files with its own
codes, QEMU uses a huge number of 3rd party libraries and we don't
trust them to reliably be using O_CLOEXEC on everything they open.
Modern Linux and BSDs have the close_range() call we can use to do the
job, and on Linux we have one more way to walk through /proc/self/fd
to complete the task efficiently, which is what qemu_close_range()
does, a new API we add in utils/osdep.c.
Changes in v2:
- Change to use qemu_close_range() to close fds for child process efficiently
- v1 link: https://lore.kernel.org/qemu-devel/20230406112041.798585-1-bmeng@tinylab.org/
Bin Meng (4):
tests/tcg/cris: Fix the coding style
tests/tcg/cris: Correct the off-by-one error
util/async-teardown: Fall back to close fds one by one
utils/osdep: Introduce qemu_close_range()
Zhangjin Wu (2):
util/async-teardown: Use qemu_close_range() to close fds
net: tap: Use qemu_close_range() to close fds
include/qemu/osdep.h | 1 +
net/tap.c | 23 ++++++------
tests/tcg/cris/libc/check_openpf5.c | 57 ++++++++++++++---------------
util/async-teardown.c | 37 +------------------
util/osdep.c | 47 ++++++++++++++++++++++++
5 files changed, 87 insertions(+), 78 deletions(-)
--
2.34.1
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH v2 1/6] tests/tcg/cris: Fix the coding style
2023-06-16 15:27 [PATCH v2 0/6] net/tap: Fix QEMU frozen issue when the maximum number of file descriptors is very large Bin Meng
@ 2023-06-16 15:27 ` Bin Meng
2023-06-19 6:55 ` Richard Henderson
2023-06-16 15:27 ` [PATCH v2 2/6] tests/tcg/cris: Correct the off-by-one error Bin Meng
` (4 subsequent siblings)
5 siblings, 1 reply; 14+ messages in thread
From: Bin Meng @ 2023-06-16 15:27 UTC (permalink / raw)
To: qemu-devel; +Cc: Zhangjin Wu, Edgar E. Iglesias
The code style does not conform with QEMU's. Correct it so that the
upcoming commit does not trigger checkpatch warnings.
Signed-off-by: Bin Meng <bmeng@tinylab.org>
---
(no changes since v1)
tests/tcg/cris/libc/check_openpf5.c | 57 ++++++++++++++---------------
1 file changed, 27 insertions(+), 30 deletions(-)
diff --git a/tests/tcg/cris/libc/check_openpf5.c b/tests/tcg/cris/libc/check_openpf5.c
index 1f86ea283d..0037fbca4c 100644
--- a/tests/tcg/cris/libc/check_openpf5.c
+++ b/tests/tcg/cris/libc/check_openpf5.c
@@ -13,44 +13,41 @@
#include <fcntl.h>
#include <string.h>
-int main (int argc, char *argv[])
+int main(int argc, char *argv[])
{
- int i;
- int filemax;
+ int i;
+ int filemax;
#ifdef OPEN_MAX
- filemax = OPEN_MAX;
+ filemax = OPEN_MAX;
#else
- filemax = sysconf (_SC_OPEN_MAX);
+ filemax = sysconf(_SC_OPEN_MAX);
#endif
- char *fn = malloc (strlen (argv[0]) + 2);
- if (fn == NULL)
- abort ();
- strcpy (fn, "/");
- strcat (fn, argv[0]);
+ char *fn = malloc(strlen(argv[0]) + 2);
+ if (fn == NULL) {
+ abort();
+ }
+ strcpy(fn, "/");
+ strcat(fn, argv[0]);
- for (i = 0; i < filemax + 1; i++)
- {
- if (open (fn, O_RDONLY) < 0)
- {
- /* Shouldn't happen too early. */
- if (i < filemax - 3 - 1)
- {
- fprintf (stderr, "i: %d\n", i);
- abort ();
- }
- if (errno != EMFILE)
- {
- perror ("open");
- abort ();
- }
- goto ok;
- }
+ for (i = 0; i < filemax + 1; i++) {
+ if (open(fn, O_RDONLY) < 0) {
+ /* Shouldn't happen too early. */
+ if (i < filemax - 3 - 1) {
+ fprintf(stderr, "i: %d\n", i);
+ abort();
+ }
+ if (errno != EMFILE) {
+ perror("open");
+ abort();
+ }
+ goto ok;
+ }
}
- abort ();
+ abort();
ok:
- printf ("pass\n");
- exit (0);
+ printf("pass\n");
+ exit(0);
}
--
2.34.1
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH v2 2/6] tests/tcg/cris: Correct the off-by-one error
2023-06-16 15:27 [PATCH v2 0/6] net/tap: Fix QEMU frozen issue when the maximum number of file descriptors is very large Bin Meng
2023-06-16 15:27 ` [PATCH v2 1/6] tests/tcg/cris: Fix the coding style Bin Meng
@ 2023-06-16 15:27 ` Bin Meng
2023-06-19 6:55 ` Richard Henderson
2023-06-16 15:27 ` [PATCH v2 3/6] util/async-teardown: Fall back to close fds one by one Bin Meng
` (3 subsequent siblings)
5 siblings, 1 reply; 14+ messages in thread
From: Bin Meng @ 2023-06-16 15:27 UTC (permalink / raw)
To: qemu-devel; +Cc: Zhangjin Wu, Edgar E. Iglesias
sysconf(_SC_OPEN_MAX) returns the maximum number of files that
a process can have open at any time, which means the fd should
not be larger than or equal to the return value.
Signed-off-by: Bin Meng <bmeng@tinylab.org>
---
(no changes since v1)
tests/tcg/cris/libc/check_openpf5.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tests/tcg/cris/libc/check_openpf5.c b/tests/tcg/cris/libc/check_openpf5.c
index 0037fbca4c..7f585c6d37 100644
--- a/tests/tcg/cris/libc/check_openpf5.c
+++ b/tests/tcg/cris/libc/check_openpf5.c
@@ -31,10 +31,10 @@ int main(int argc, char *argv[])
strcpy(fn, "/");
strcat(fn, argv[0]);
- for (i = 0; i < filemax + 1; i++) {
+ for (i = 0; i < filemax; i++) {
if (open(fn, O_RDONLY) < 0) {
/* Shouldn't happen too early. */
- if (i < filemax - 3 - 1) {
+ if (i < filemax - 3) {
fprintf(stderr, "i: %d\n", i);
abort();
}
--
2.34.1
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH v2 3/6] util/async-teardown: Fall back to close fds one by one
2023-06-16 15:27 [PATCH v2 0/6] net/tap: Fix QEMU frozen issue when the maximum number of file descriptors is very large Bin Meng
2023-06-16 15:27 ` [PATCH v2 1/6] tests/tcg/cris: Fix the coding style Bin Meng
2023-06-16 15:27 ` [PATCH v2 2/6] tests/tcg/cris: Correct the off-by-one error Bin Meng
@ 2023-06-16 15:27 ` Bin Meng
2023-06-19 6:59 ` Richard Henderson
2023-06-16 15:27 ` [PATCH v2 4/6] utils/osdep: Introduce qemu_close_range() Bin Meng
` (2 subsequent siblings)
5 siblings, 1 reply; 14+ messages in thread
From: Bin Meng @ 2023-06-16 15:27 UTC (permalink / raw)
To: qemu-devel
Cc: Zhangjin Wu, Claudio Imbrenda, Markus Armbruster,
Michael S. Tsirkin
When opening /proc/self/fd fails, current codes just return directly,
but we can fall back to close fds one by one.
Signed-off-by: Bin Meng <bmeng@tinylab.org>
---
(no changes since v1)
util/async-teardown.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/util/async-teardown.c b/util/async-teardown.c
index 3ab19c8740..7e0177a8da 100644
--- a/util/async-teardown.c
+++ b/util/async-teardown.c
@@ -48,7 +48,11 @@ static void close_all_open_fd(void)
dir = opendir("/proc/self/fd");
if (!dir) {
- /* If /proc is not mounted, there is nothing that can be done. */
+ /* If /proc is not mounted, close fds one by one. */
+ int open_max = sysconf(_SC_OPEN_MAX), i;
+ for (i = 0; i < open_max; i++) {
+ close(i);
+ }
return;
}
/* Avoid closing the directory. */
--
2.34.1
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH v2 4/6] utils/osdep: Introduce qemu_close_range()
2023-06-16 15:27 [PATCH v2 0/6] net/tap: Fix QEMU frozen issue when the maximum number of file descriptors is very large Bin Meng
` (2 preceding siblings ...)
2023-06-16 15:27 ` [PATCH v2 3/6] util/async-teardown: Fall back to close fds one by one Bin Meng
@ 2023-06-16 15:27 ` Bin Meng
2023-06-19 7:03 ` Richard Henderson
2023-06-19 7:07 ` Richard Henderson
2023-06-16 15:27 ` [PATCH v2 5/6] util/async-teardown: Use qemu_close_range() to close fds Bin Meng
2023-06-16 15:27 ` [PATCH v2 6/6] net: tap: " Bin Meng
5 siblings, 2 replies; 14+ messages in thread
From: Bin Meng @ 2023-06-16 15:27 UTC (permalink / raw)
To: qemu-devel
Cc: Zhangjin Wu, David Hildenbrand, Kevin Wolf,
Marc-André Lureau, Nikita Ivanov, Paolo Bonzini, Thomas Huth,
Xuzhou Cheng
This introduces a new QEMU API qemu_close_range() that closes all
open file descriptors from first to last (included).
This API will try a more efficient call to close_range(), or walk
through of /proc/self/fd whenever these are possible, otherwise it
falls back to a plain close loop.
Co-developed-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Bin Meng <bmeng@tinylab.org>
---
(no changes since v1)
include/qemu/osdep.h | 1 +
util/osdep.c | 47 ++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 48 insertions(+)
diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index cc61b00ba9..e22434ce10 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -560,6 +560,7 @@ int qemu_open_old(const char *name, int flags, ...);
int qemu_open(const char *name, int flags, Error **errp);
int qemu_create(const char *name, int flags, mode_t mode, Error **errp);
int qemu_close(int fd);
+int qemu_close_range(unsigned int first, unsigned int last);
int qemu_unlink(const char *name);
#ifndef _WIN32
int qemu_dup_flags(int fd, int flags);
diff --git a/util/osdep.c b/util/osdep.c
index e996c4744a..fd7dd2dbdf 100644
--- a/util/osdep.c
+++ b/util/osdep.c
@@ -411,6 +411,53 @@ int qemu_close(int fd)
return close(fd);
}
+int qemu_close_range(unsigned int first, unsigned int last)
+{
+ DIR *dir = NULL;
+
+#ifdef CONFIG_CLOSE_RANGE
+ int r = close_range(first, last, 0);
+ if (!r) {
+ /* Success, no need to try other ways. */
+ return 0;
+ }
+#endif
+
+#ifdef __linux__
+ dir = opendir("/proc/self/fd");
+#endif
+ if (!dir) {
+ /*
+ * If /proc is not mounted or /proc/self/fd is not supported,
+ * try close() from first to last.
+ */
+ for (int i = first; i <= last; i++) {
+ close(i);
+ }
+
+ return 0;
+ }
+
+#ifndef _WIN32
+ /* Avoid closing the directory */
+ int dfd = dirfd(dir);
+
+ for (struct dirent *de = readdir(dir); de; de = readdir(dir)) {
+ int fd = atoi(de->d_name);
+ if (fd < first || fd > last) {
+ /* Exclude the fds outside the target range */
+ continue;
+ }
+ if (fd != dfd) {
+ close(fd);
+ }
+ }
+ closedir(dir);
+#endif /* _WIN32 */
+
+ return 0;
+}
+
/*
* Delete a file from the filesystem, unless the filename is /dev/fdset/...
*
--
2.34.1
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH v2 5/6] util/async-teardown: Use qemu_close_range() to close fds
2023-06-16 15:27 [PATCH v2 0/6] net/tap: Fix QEMU frozen issue when the maximum number of file descriptors is very large Bin Meng
` (3 preceding siblings ...)
2023-06-16 15:27 ` [PATCH v2 4/6] utils/osdep: Introduce qemu_close_range() Bin Meng
@ 2023-06-16 15:27 ` Bin Meng
2023-06-19 7:05 ` Richard Henderson
2023-06-16 15:27 ` [PATCH v2 6/6] net: tap: " Bin Meng
5 siblings, 1 reply; 14+ messages in thread
From: Bin Meng @ 2023-06-16 15:27 UTC (permalink / raw)
To: qemu-devel
Cc: Zhangjin Wu, Claudio Imbrenda, Michael S. Tsirkin,
Philippe Mathieu-Daudé
From: Zhangjin Wu <falcon@tinylab.org>
Based on the old close_all_open_fd() of util/async-teardown.c, a new
generic qemu_close_range() has been added in osdep.c.
Now, let's switch over to use the generic qemu_close_range().
Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Bin Meng <bmeng@tinylab.org>
---
(no changes since v1)
util/async-teardown.c | 41 +----------------------------------------
1 file changed, 1 insertion(+), 40 deletions(-)
diff --git a/util/async-teardown.c b/util/async-teardown.c
index 7e0177a8da..200ec05101 100644
--- a/util/async-teardown.c
+++ b/util/async-teardown.c
@@ -29,44 +29,6 @@
static pid_t the_ppid;
-/*
- * Close all open file descriptors.
- */
-static void close_all_open_fd(void)
-{
- struct dirent *de;
- int fd, dfd;
- DIR *dir;
-
-#ifdef CONFIG_CLOSE_RANGE
- int r = close_range(0, ~0U, 0);
- if (!r) {
- /* Success, no need to try other ways. */
- return;
- }
-#endif
-
- dir = opendir("/proc/self/fd");
- if (!dir) {
- /* If /proc is not mounted, close fds one by one. */
- int open_max = sysconf(_SC_OPEN_MAX), i;
- for (i = 0; i < open_max; i++) {
- close(i);
- }
- return;
- }
- /* Avoid closing the directory. */
- dfd = dirfd(dir);
-
- for (de = readdir(dir); de; de = readdir(dir)) {
- fd = atoi(de->d_name);
- if (fd != dfd) {
- close(fd);
- }
- }
- closedir(dir);
-}
-
static void hup_handler(int signal)
{
/* Check every second if this process has been reparented. */
@@ -92,9 +54,8 @@ static int async_teardown_fn(void *arg)
/*
* Close all file descriptors that might have been inherited from the
* main qemu process when doing clone, needed to make libvirt happy.
- * Not using close_range for increased compatibility with older kernels.
*/
- close_all_open_fd();
+ qemu_close_range(0, ~0U);
/* Set up a handler for SIGHUP and unblock SIGHUP. */
sigaction(SIGHUP, &sa, NULL);
--
2.34.1
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH v2 6/6] net: tap: Use qemu_close_range() to close fds
2023-06-16 15:27 [PATCH v2 0/6] net/tap: Fix QEMU frozen issue when the maximum number of file descriptors is very large Bin Meng
` (4 preceding siblings ...)
2023-06-16 15:27 ` [PATCH v2 5/6] util/async-teardown: Use qemu_close_range() to close fds Bin Meng
@ 2023-06-16 15:27 ` Bin Meng
2023-06-19 7:08 ` Richard Henderson
5 siblings, 1 reply; 14+ messages in thread
From: Bin Meng @ 2023-06-16 15:27 UTC (permalink / raw)
To: qemu-devel; +Cc: Zhangjin Wu, Jason Wang
From: Zhangjin Wu <falcon@tinylab.org>
Current codes using a brute-force traversal of all file descriptors
do not scale on a system where the maximum number of file descriptors
is set to a very large value (e.g.: in a Docker container of Manjaro
distribution it is set to 1073741816). QEMU just looks frozen during
start-up.
The close-on-exec flag (O_CLOEXEC) was introduced since Linux kernel
2.6.23, FreeBSD 8.3, OpenBSD 5.0, Solaris 11. While it's true QEMU
doesn't need to manually close the fds for child process as the proper
O_CLOEXEC flag should have been set properly on files with its own
codes, QEMU uses a huge number of 3rd party libraries and we don't
trust them to reliably be using O_CLOEXEC on everything they open.
Modern Linux and BSDs have the close_range() call we can use to do the
job, and on Linux we have one more way to walk through /proc/self/fd
to complete the task efficiently, which is what qemu_close_range() does.
Reported-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Bin Meng <bmeng@tinylab.org>
---
Changes in v2:
- Change to use qemu_close_range() to close fds for child process efficiently
- v1 link: https://lore.kernel.org/qemu-devel/20230406112041.798585-1-bmeng@tinylab.org/
net/tap.c | 23 +++++++++++------------
1 file changed, 11 insertions(+), 12 deletions(-)
diff --git a/net/tap.c b/net/tap.c
index 1bf085d422..d482fabdff 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -446,13 +446,13 @@ static void launch_script(const char *setup_script, const char *ifname,
return;
}
if (pid == 0) {
- int open_max = sysconf(_SC_OPEN_MAX), i;
+ unsigned int last_fd = sysconf(_SC_OPEN_MAX) - 1;
+
+ /* skip stdin, stdout and stderr */
+ qemu_close_range(3, fd - 1);
+ /* skip the currently used fd */
+ qemu_close_range(fd + 1, last_fd);
- for (i = 3; i < open_max; i++) {
- if (i != fd) {
- close(i);
- }
- }
parg = args;
*parg++ = (char *)setup_script;
*parg++ = (char *)ifname;
@@ -536,16 +536,15 @@ static int net_bridge_run_helper(const char *helper, const char *bridge,
return -1;
}
if (pid == 0) {
- int open_max = sysconf(_SC_OPEN_MAX), i;
+ unsigned int last_fd = sysconf(_SC_OPEN_MAX) - 1, fd = sv[1];
char *fd_buf = NULL;
char *br_buf = NULL;
char *helper_cmd = NULL;
- for (i = 3; i < open_max; i++) {
- if (i != sv[1]) {
- close(i);
- }
- }
+ /* skip stdin, stdout and stderr */
+ qemu_close_range(3, fd - 1);
+ /* skip the currently used fd */
+ qemu_close_range(fd + 1, last_fd);
fd_buf = g_strdup_printf("%s%d", "--fd=", sv[1]);
--
2.34.1
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH v2 1/6] tests/tcg/cris: Fix the coding style
2023-06-16 15:27 ` [PATCH v2 1/6] tests/tcg/cris: Fix the coding style Bin Meng
@ 2023-06-19 6:55 ` Richard Henderson
0 siblings, 0 replies; 14+ messages in thread
From: Richard Henderson @ 2023-06-19 6:55 UTC (permalink / raw)
To: Bin Meng, qemu-devel; +Cc: Zhangjin Wu, Edgar E. Iglesias
On 6/16/23 17:27, Bin Meng wrote:
> The code style does not conform with QEMU's. Correct it so that the
> upcoming commit does not trigger checkpatch warnings.
>
> Signed-off-by: Bin Meng<bmeng@tinylab.org>
> ---
>
> (no changes since v1)
>
> tests/tcg/cris/libc/check_openpf5.c | 57 ++++++++++++++---------------
> 1 file changed, 27 insertions(+), 30 deletions(-)
Acked-by: Richard Henderson <richard.henderson@linaro.org>
r~
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 2/6] tests/tcg/cris: Correct the off-by-one error
2023-06-16 15:27 ` [PATCH v2 2/6] tests/tcg/cris: Correct the off-by-one error Bin Meng
@ 2023-06-19 6:55 ` Richard Henderson
0 siblings, 0 replies; 14+ messages in thread
From: Richard Henderson @ 2023-06-19 6:55 UTC (permalink / raw)
To: Bin Meng, qemu-devel; +Cc: Zhangjin Wu, Edgar E. Iglesias
On 6/16/23 17:27, Bin Meng wrote:
> sysconf(_SC_OPEN_MAX) returns the maximum number of files that
> a process can have open at any time, which means the fd should
> not be larger than or equal to the return value.
>
> Signed-off-by: Bin Meng<bmeng@tinylab.org>
> ---
>
> (no changes since v1)
>
> tests/tcg/cris/libc/check_openpf5.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
Acked-by: Richard Henderson <richard.henderson@linaro.org>
r~
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 3/6] util/async-teardown: Fall back to close fds one by one
2023-06-16 15:27 ` [PATCH v2 3/6] util/async-teardown: Fall back to close fds one by one Bin Meng
@ 2023-06-19 6:59 ` Richard Henderson
0 siblings, 0 replies; 14+ messages in thread
From: Richard Henderson @ 2023-06-19 6:59 UTC (permalink / raw)
To: Bin Meng, qemu-devel
Cc: Zhangjin Wu, Claudio Imbrenda, Markus Armbruster,
Michael S. Tsirkin
On 6/16/23 17:27, Bin Meng wrote:
> When opening /proc/self/fd fails, current codes just return directly,
> but we can fall back to close fds one by one.
>
> Signed-off-by: Bin Meng <bmeng@tinylab.org>
> ---
>
> (no changes since v1)
>
> util/async-teardown.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/util/async-teardown.c b/util/async-teardown.c
> index 3ab19c8740..7e0177a8da 100644
> --- a/util/async-teardown.c
> +++ b/util/async-teardown.c
> @@ -48,7 +48,11 @@ static void close_all_open_fd(void)
>
> dir = opendir("/proc/self/fd");
> if (!dir) {
> - /* If /proc is not mounted, there is nothing that can be done. */
> + /* If /proc is not mounted, close fds one by one. */
> + int open_max = sysconf(_SC_OPEN_MAX), i;
> + for (i = 0; i < open_max; i++) {
> + close(i);
> + }
> return;
> }
> /* Avoid closing the directory. */
Do we really need to make the 1M close calls?
The process is on its way to exit anyway...
r~
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 4/6] utils/osdep: Introduce qemu_close_range()
2023-06-16 15:27 ` [PATCH v2 4/6] utils/osdep: Introduce qemu_close_range() Bin Meng
@ 2023-06-19 7:03 ` Richard Henderson
2023-06-19 7:07 ` Richard Henderson
1 sibling, 0 replies; 14+ messages in thread
From: Richard Henderson @ 2023-06-19 7:03 UTC (permalink / raw)
To: Bin Meng, qemu-devel
Cc: Zhangjin Wu, David Hildenbrand, Kevin Wolf,
Marc-André Lureau, Nikita Ivanov, Paolo Bonzini, Thomas Huth,
Xuzhou Cheng
On 6/16/23 17:27, Bin Meng wrote:
> This introduces a new QEMU API qemu_close_range() that closes all
> open file descriptors from first to last (included).
>
> This API will try a more efficient call to close_range(), or walk
> through of /proc/self/fd whenever these are possible, otherwise it
> falls back to a plain close loop.
>
> Co-developed-by: Zhangjin Wu <falcon@tinylab.org>
> Signed-off-by: Bin Meng <bmeng@tinylab.org>
> ---
>
> (no changes since v1)
>
> include/qemu/osdep.h | 1 +
> util/osdep.c | 47 ++++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 48 insertions(+)
>
> diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
> index cc61b00ba9..e22434ce10 100644
> --- a/include/qemu/osdep.h
> +++ b/include/qemu/osdep.h
> @@ -560,6 +560,7 @@ int qemu_open_old(const char *name, int flags, ...);
> int qemu_open(const char *name, int flags, Error **errp);
> int qemu_create(const char *name, int flags, mode_t mode, Error **errp);
> int qemu_close(int fd);
> +int qemu_close_range(unsigned int first, unsigned int last);
> int qemu_unlink(const char *name);
> #ifndef _WIN32
> int qemu_dup_flags(int fd, int flags);
> diff --git a/util/osdep.c b/util/osdep.c
> index e996c4744a..fd7dd2dbdf 100644
> --- a/util/osdep.c
> +++ b/util/osdep.c
> @@ -411,6 +411,53 @@ int qemu_close(int fd)
> return close(fd);
> }
>
> +int qemu_close_range(unsigned int first, unsigned int last)
> +{
> + DIR *dir = NULL;
> +
> +#ifdef CONFIG_CLOSE_RANGE
> + int r = close_range(first, last, 0);
> + if (!r) {
> + /* Success, no need to try other ways. */
> + return 0;
> + }
> +#endif
> +
> +#ifdef __linux__
> + dir = opendir("/proc/self/fd");
> +#endif
> + if (!dir) {
> + /*
> + * If /proc is not mounted or /proc/self/fd is not supported,
> + * try close() from first to last.
> + */
> + for (int i = first; i <= last; i++) {
> + close(i);
> + }
> +
> + return 0;
> + }
> +
> +#ifndef _WIN32
> + /* Avoid closing the directory */
> + int dfd = dirfd(dir);
> +
> + for (struct dirent *de = readdir(dir); de; de = readdir(dir)) {
> + int fd = atoi(de->d_name);
> + if (fd < first || fd > last) {
> + /* Exclude the fds outside the target range */
> + continue;
> + }
> + if (fd != dfd) {
> + close(fd);
> + }
> + }
> + closedir(dir);
> +#endif /* _WIN32 */
Poor ordering of ifdefs.
#ifdef __linux__
DIR *dir = opendir("...");
if (dir) {
int dfd = ...
loop
closedir(dir);
return;
}
#endif
since the first ifdef is the only way dir can ever be non-null.
> + for (int i = first; i <= last; i++) {
typeof(i) != typeof(first).
r~
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 5/6] util/async-teardown: Use qemu_close_range() to close fds
2023-06-16 15:27 ` [PATCH v2 5/6] util/async-teardown: Use qemu_close_range() to close fds Bin Meng
@ 2023-06-19 7:05 ` Richard Henderson
0 siblings, 0 replies; 14+ messages in thread
From: Richard Henderson @ 2023-06-19 7:05 UTC (permalink / raw)
To: Bin Meng, qemu-devel
Cc: Zhangjin Wu, Claudio Imbrenda, Michael S. Tsirkin,
Philippe Mathieu-Daudé
On 6/16/23 17:27, Bin Meng wrote:
> + qemu_close_range(0, ~0U);
This is clearly assuming that close_range exists.
Should qemu_close_range attempt to lower last if >= _SC_OPEN_MAX?
Or better to use that here.
r~
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 4/6] utils/osdep: Introduce qemu_close_range()
2023-06-16 15:27 ` [PATCH v2 4/6] utils/osdep: Introduce qemu_close_range() Bin Meng
2023-06-19 7:03 ` Richard Henderson
@ 2023-06-19 7:07 ` Richard Henderson
1 sibling, 0 replies; 14+ messages in thread
From: Richard Henderson @ 2023-06-19 7:07 UTC (permalink / raw)
To: Bin Meng, qemu-devel
Cc: Zhangjin Wu, David Hildenbrand, Kevin Wolf,
Marc-André Lureau, Nikita Ivanov, Paolo Bonzini, Thomas Huth,
Xuzhou Cheng
On 6/16/23 17:27, Bin Meng wrote:
> +int qemu_close_range(unsigned int first, unsigned int last)
> +{
> + DIR *dir = NULL;
> +
> +#ifdef CONFIG_CLOSE_RANGE
> + int r = close_range(first, last, 0);
> + if (!r) {
> + /* Success, no need to try other ways. */
> + return 0;
> + }
> +#endif
What about first > last?
close_range will yield EINVAL, and your other fallbacks will burn lots of cpu.
r~
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 6/6] net: tap: Use qemu_close_range() to close fds
2023-06-16 15:27 ` [PATCH v2 6/6] net: tap: " Bin Meng
@ 2023-06-19 7:08 ` Richard Henderson
0 siblings, 0 replies; 14+ messages in thread
From: Richard Henderson @ 2023-06-19 7:08 UTC (permalink / raw)
To: Bin Meng, qemu-devel; +Cc: Zhangjin Wu, Jason Wang
On 6/16/23 17:27, Bin Meng wrote:
> From: Zhangjin Wu<falcon@tinylab.org>
>
> Current codes using a brute-force traversal of all file descriptors
> do not scale on a system where the maximum number of file descriptors
> is set to a very large value (e.g.: in a Docker container of Manjaro
> distribution it is set to 1073741816). QEMU just looks frozen during
> start-up.
>
> The close-on-exec flag (O_CLOEXEC) was introduced since Linux kernel
> 2.6.23, FreeBSD 8.3, OpenBSD 5.0, Solaris 11. While it's true QEMU
> doesn't need to manually close the fds for child process as the proper
> O_CLOEXEC flag should have been set properly on files with its own
> codes, QEMU uses a huge number of 3rd party libraries and we don't
> trust them to reliably be using O_CLOEXEC on everything they open.
>
> Modern Linux and BSDs have the close_range() call we can use to do the
> job, and on Linux we have one more way to walk through /proc/self/fd
> to complete the task efficiently, which is what qemu_close_range() does.
>
> Reported-by: Zhangjin Wu<falcon@tinylab.org>
> Signed-off-by: Zhangjin Wu<falcon@tinylab.org>
> Signed-off-by: Bin Meng<bmeng@tinylab.org>
>
> ---
>
> Changes in v2:
> - Change to use qemu_close_range() to close fds for child process efficiently
> - v1 link:https://lore.kernel.org/qemu-devel/20230406112041.798585-1-bmeng@tinylab.org/
>
> net/tap.c | 23 +++++++++++------------
> 1 file changed, 11 insertions(+), 12 deletions(-)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2023-06-19 7:09 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-06-16 15:27 [PATCH v2 0/6] net/tap: Fix QEMU frozen issue when the maximum number of file descriptors is very large Bin Meng
2023-06-16 15:27 ` [PATCH v2 1/6] tests/tcg/cris: Fix the coding style Bin Meng
2023-06-19 6:55 ` Richard Henderson
2023-06-16 15:27 ` [PATCH v2 2/6] tests/tcg/cris: Correct the off-by-one error Bin Meng
2023-06-19 6:55 ` Richard Henderson
2023-06-16 15:27 ` [PATCH v2 3/6] util/async-teardown: Fall back to close fds one by one Bin Meng
2023-06-19 6:59 ` Richard Henderson
2023-06-16 15:27 ` [PATCH v2 4/6] utils/osdep: Introduce qemu_close_range() Bin Meng
2023-06-19 7:03 ` Richard Henderson
2023-06-19 7:07 ` Richard Henderson
2023-06-16 15:27 ` [PATCH v2 5/6] util/async-teardown: Use qemu_close_range() to close fds Bin Meng
2023-06-19 7:05 ` Richard Henderson
2023-06-16 15:27 ` [PATCH v2 6/6] net: tap: " Bin Meng
2023-06-19 7:08 ` Richard Henderson
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).