* [PATCH v3 1/6] tests/tcg/cris: Fix the coding style
2023-06-17 5:36 [PATCH v3 0/6] net/tap: Fix QEMU frozen issue when the maximum number of file descriptors is very large Bin Meng
@ 2023-06-17 5:36 ` Bin Meng
2023-06-17 22:52 ` Philippe Mathieu-Daudé
2023-06-17 5:36 ` [PATCH v3 2/6] tests/tcg/cris: Correct the off-by-one error Bin Meng
` (6 subsequent siblings)
7 siblings, 1 reply; 19+ messages in thread
From: Bin Meng @ 2023-06-17 5:36 UTC (permalink / raw)
To: qemu-devel; +Cc: Zhangjin Wu, Edgar E. Iglesias
The code style does not conform with QEMU's. Correct it so that the
upcoming commit does not trigger checkpatch warnings.
Signed-off-by: Bin Meng <bmeng@tinylab.org>
---
(no changes since v2)
Changes in v2:
- new patch: "tests/tcg/cris: Fix the coding style"
tests/tcg/cris/libc/check_openpf5.c | 57 ++++++++++++++---------------
1 file changed, 27 insertions(+), 30 deletions(-)
diff --git a/tests/tcg/cris/libc/check_openpf5.c b/tests/tcg/cris/libc/check_openpf5.c
index 1f86ea283d..0037fbca4c 100644
--- a/tests/tcg/cris/libc/check_openpf5.c
+++ b/tests/tcg/cris/libc/check_openpf5.c
@@ -13,44 +13,41 @@
#include <fcntl.h>
#include <string.h>
-int main (int argc, char *argv[])
+int main(int argc, char *argv[])
{
- int i;
- int filemax;
+ int i;
+ int filemax;
#ifdef OPEN_MAX
- filemax = OPEN_MAX;
+ filemax = OPEN_MAX;
#else
- filemax = sysconf (_SC_OPEN_MAX);
+ filemax = sysconf(_SC_OPEN_MAX);
#endif
- char *fn = malloc (strlen (argv[0]) + 2);
- if (fn == NULL)
- abort ();
- strcpy (fn, "/");
- strcat (fn, argv[0]);
+ char *fn = malloc(strlen(argv[0]) + 2);
+ if (fn == NULL) {
+ abort();
+ }
+ strcpy(fn, "/");
+ strcat(fn, argv[0]);
- for (i = 0; i < filemax + 1; i++)
- {
- if (open (fn, O_RDONLY) < 0)
- {
- /* Shouldn't happen too early. */
- if (i < filemax - 3 - 1)
- {
- fprintf (stderr, "i: %d\n", i);
- abort ();
- }
- if (errno != EMFILE)
- {
- perror ("open");
- abort ();
- }
- goto ok;
- }
+ for (i = 0; i < filemax + 1; i++) {
+ if (open(fn, O_RDONLY) < 0) {
+ /* Shouldn't happen too early. */
+ if (i < filemax - 3 - 1) {
+ fprintf(stderr, "i: %d\n", i);
+ abort();
+ }
+ if (errno != EMFILE) {
+ perror("open");
+ abort();
+ }
+ goto ok;
+ }
}
- abort ();
+ abort();
ok:
- printf ("pass\n");
- exit (0);
+ printf("pass\n");
+ exit(0);
}
--
2.34.1
^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH v3 1/6] tests/tcg/cris: Fix the coding style
2023-06-17 5:36 ` [PATCH v3 1/6] tests/tcg/cris: Fix the coding style Bin Meng
@ 2023-06-17 22:52 ` Philippe Mathieu-Daudé
0 siblings, 0 replies; 19+ messages in thread
From: Philippe Mathieu-Daudé @ 2023-06-17 22:52 UTC (permalink / raw)
To: Bin Meng, qemu-devel; +Cc: Zhangjin Wu, Edgar E. Iglesias
On 17/6/23 07:36, Bin Meng wrote:
> The code style does not conform with QEMU's. Correct it so that the
> upcoming commit does not trigger checkpatch warnings.
>
> Signed-off-by: Bin Meng <bmeng@tinylab.org>
>
> ---
>
> (no changes since v2)
>
> Changes in v2:
> - new patch: "tests/tcg/cris: Fix the coding style"
>
> tests/tcg/cris/libc/check_openpf5.c | 57 ++++++++++++++---------------
> 1 file changed, 27 insertions(+), 30 deletions(-)
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH v3 2/6] tests/tcg/cris: Correct the off-by-one error
2023-06-17 5:36 [PATCH v3 0/6] net/tap: Fix QEMU frozen issue when the maximum number of file descriptors is very large Bin Meng
2023-06-17 5:36 ` [PATCH v3 1/6] tests/tcg/cris: Fix the coding style Bin Meng
@ 2023-06-17 5:36 ` Bin Meng
2023-06-17 22:52 ` Philippe Mathieu-Daudé
2023-06-17 5:36 ` [PATCH v3 3/6] util/async-teardown: Fall back to close fds one by one Bin Meng
` (5 subsequent siblings)
7 siblings, 1 reply; 19+ messages in thread
From: Bin Meng @ 2023-06-17 5:36 UTC (permalink / raw)
To: qemu-devel; +Cc: Zhangjin Wu, Edgar E. Iglesias
sysconf(_SC_OPEN_MAX) returns the maximum number of files that
a process can have open at any time, which means the fd should
not be larger than or equal to the return value.
Signed-off-by: Bin Meng <bmeng@tinylab.org>
---
(no changes since v2)
Changes in v2:
- new patch: "tests/tcg/cris: Correct the off-by-one error"
tests/tcg/cris/libc/check_openpf5.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tests/tcg/cris/libc/check_openpf5.c b/tests/tcg/cris/libc/check_openpf5.c
index 0037fbca4c..7f585c6d37 100644
--- a/tests/tcg/cris/libc/check_openpf5.c
+++ b/tests/tcg/cris/libc/check_openpf5.c
@@ -31,10 +31,10 @@ int main(int argc, char *argv[])
strcpy(fn, "/");
strcat(fn, argv[0]);
- for (i = 0; i < filemax + 1; i++) {
+ for (i = 0; i < filemax; i++) {
if (open(fn, O_RDONLY) < 0) {
/* Shouldn't happen too early. */
- if (i < filemax - 3 - 1) {
+ if (i < filemax - 3) {
fprintf(stderr, "i: %d\n", i);
abort();
}
--
2.34.1
^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH v3 2/6] tests/tcg/cris: Correct the off-by-one error
2023-06-17 5:36 ` [PATCH v3 2/6] tests/tcg/cris: Correct the off-by-one error Bin Meng
@ 2023-06-17 22:52 ` Philippe Mathieu-Daudé
0 siblings, 0 replies; 19+ messages in thread
From: Philippe Mathieu-Daudé @ 2023-06-17 22:52 UTC (permalink / raw)
To: Bin Meng, qemu-devel; +Cc: Zhangjin Wu, Edgar E. Iglesias
On 17/6/23 07:36, Bin Meng wrote:
> sysconf(_SC_OPEN_MAX) returns the maximum number of files that
> a process can have open at any time, which means the fd should
> not be larger than or equal to the return value.
>
> Signed-off-by: Bin Meng <bmeng@tinylab.org>
>
> ---
>
> (no changes since v2)
>
> Changes in v2:
> - new patch: "tests/tcg/cris: Correct the off-by-one error"
>
> tests/tcg/cris/libc/check_openpf5.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH v3 3/6] util/async-teardown: Fall back to close fds one by one
2023-06-17 5:36 [PATCH v3 0/6] net/tap: Fix QEMU frozen issue when the maximum number of file descriptors is very large Bin Meng
2023-06-17 5:36 ` [PATCH v3 1/6] tests/tcg/cris: Fix the coding style Bin Meng
2023-06-17 5:36 ` [PATCH v3 2/6] tests/tcg/cris: Correct the off-by-one error Bin Meng
@ 2023-06-17 5:36 ` Bin Meng
2023-06-19 9:18 ` Claudio Imbrenda
2023-06-17 5:36 ` [PATCH v3 4/6] util/osdep: Introduce qemu_close_range() Bin Meng
` (4 subsequent siblings)
7 siblings, 1 reply; 19+ messages in thread
From: Bin Meng @ 2023-06-17 5:36 UTC (permalink / raw)
To: qemu-devel
Cc: Zhangjin Wu, Claudio Imbrenda, Markus Armbruster,
Michael S. Tsirkin
When opening /proc/self/fd fails, current codes just return directly,
but we can fall back to close fds one by one.
Signed-off-by: Bin Meng <bmeng@tinylab.org>
---
(no changes since v2)
Changes in v2:
- new patch: "util/async-teardown: Fall back to close fds one by one"
util/async-teardown.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/util/async-teardown.c b/util/async-teardown.c
index 3ab19c8740..7e0177a8da 100644
--- a/util/async-teardown.c
+++ b/util/async-teardown.c
@@ -48,7 +48,11 @@ static void close_all_open_fd(void)
dir = opendir("/proc/self/fd");
if (!dir) {
- /* If /proc is not mounted, there is nothing that can be done. */
+ /* If /proc is not mounted, close fds one by one. */
+ int open_max = sysconf(_SC_OPEN_MAX), i;
+ for (i = 0; i < open_max; i++) {
+ close(i);
+ }
return;
}
/* Avoid closing the directory. */
--
2.34.1
^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH v3 3/6] util/async-teardown: Fall back to close fds one by one
2023-06-17 5:36 ` [PATCH v3 3/6] util/async-teardown: Fall back to close fds one by one Bin Meng
@ 2023-06-19 9:18 ` Claudio Imbrenda
2023-06-25 15:17 ` Bin Meng
0 siblings, 1 reply; 19+ messages in thread
From: Claudio Imbrenda @ 2023-06-19 9:18 UTC (permalink / raw)
To: Bin Meng; +Cc: qemu-devel, Zhangjin Wu, Markus Armbruster, Michael S. Tsirkin
On Sat, 17 Jun 2023 13:36:18 +0800
Bin Meng <bmeng@tinylab.org> wrote:
> When opening /proc/self/fd fails, current codes just return directly,
> but we can fall back to close fds one by one.
>
> Signed-off-by: Bin Meng <bmeng@tinylab.org>
>
> ---
>
> (no changes since v2)
>
> Changes in v2:
> - new patch: "util/async-teardown: Fall back to close fds one by one"
>
> util/async-teardown.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/util/async-teardown.c b/util/async-teardown.c
> index 3ab19c8740..7e0177a8da 100644
> --- a/util/async-teardown.c
> +++ b/util/async-teardown.c
> @@ -48,7 +48,11 @@ static void close_all_open_fd(void)
>
> dir = opendir("/proc/self/fd");
> if (!dir) {
> - /* If /proc is not mounted, there is nothing that can be done. */
> + /* If /proc is not mounted, close fds one by one. */
> + int open_max = sysconf(_SC_OPEN_MAX), i;
> + for (i = 0; i < open_max; i++) {
> + close(i);
> + }
> return;
> }
> /* Avoid closing the directory. */
a few patches later, you replace the whole close_all_open_fd() with a
generic version, I don't see a point in changing the code here.
this patch is useless, just drop it
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v3 3/6] util/async-teardown: Fall back to close fds one by one
2023-06-19 9:18 ` Claudio Imbrenda
@ 2023-06-25 15:17 ` Bin Meng
0 siblings, 0 replies; 19+ messages in thread
From: Bin Meng @ 2023-06-25 15:17 UTC (permalink / raw)
To: imbrenda; +Cc: qemu-devel, falcon, armbru, mst
On 2023/6/19 17:18:23, "Claudio Imbrenda" <imbrenda@linux.ibm.com> wrote:
>On Sat, 17 Jun 2023 13:36:18 +0800
>Bin Meng <bmeng@tinylab.org> wrote:
>
>> When opening /proc/self/fd fails, current codes just return directly,
>> but we can fall back to close fds one by one.
>>
>> Signed-off-by: Bin Meng <bmeng@tinylab.org>
>>
>> ---
>>
>> (no changes since v2)
>>
>> Changes in v2:
>> - new patch: "util/async-teardown: Fall back to close fds one by one"
>>
>> util/async-teardown.c | 6 +++++-
>> 1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/util/async-teardown.c b/util/async-teardown.c
>> index 3ab19c8740..7e0177a8da 100644
>> --- a/util/async-teardown.c
>> +++ b/util/async-teardown.c
>> @@ -48,7 +48,11 @@ static void close_all_open_fd(void)
>>
>> dir = opendir("/proc/self/fd");
>> if (!dir) {
>> - /* If /proc is not mounted, there is nothing that can be done. */
>> + /* If /proc is not mounted, close fds one by one. */
>> + int open_max = sysconf(_SC_OPEN_MAX), i;
>> + for (i = 0; i < open_max; i++) {
>> + close(i);
>> + }
>> return;
>> }
>> /* Avoid closing the directory. */
>
>a few patches later, you replace the whole close_all_open_fd() with a
>generic version, I don't see a point in changing the code here.
I meant to do a 100% replacement, and this patch added the missing loop.
>
>
>this patch is useless, just drop it
Regards,
Bin
^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH v3 4/6] util/osdep: Introduce qemu_close_range()
2023-06-17 5:36 [PATCH v3 0/6] net/tap: Fix QEMU frozen issue when the maximum number of file descriptors is very large Bin Meng
` (2 preceding siblings ...)
2023-06-17 5:36 ` [PATCH v3 3/6] util/async-teardown: Fall back to close fds one by one Bin Meng
@ 2023-06-17 5:36 ` Bin Meng
2023-06-19 9:18 ` Claudio Imbrenda
2023-06-19 9:22 ` Markus Armbruster
2023-06-17 5:36 ` [PATCH v3 5/6] util/async-teardown: Use qemu_close_range() to close fds Bin Meng
` (3 subsequent siblings)
7 siblings, 2 replies; 19+ messages in thread
From: Bin Meng @ 2023-06-17 5:36 UTC (permalink / raw)
To: qemu-devel
Cc: Zhangjin Wu, Alberto Faria, Kevin Wolf, Marc-André Lureau,
Nikita Ivanov, Paolo Bonzini, Thomas Huth, Xuzhou Cheng
This introduces a new QEMU API qemu_close_range() that closes all
open file descriptors from first to last (included).
This API will try a more efficient call to close_range(), or walk
through of /proc/self/fd whenever these are possible, otherwise it
falls back to a plain close loop.
Co-developed-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Bin Meng <bmeng@tinylab.org>
---
Changes in v3:
- fix win32 build failure
Changes in v2:
- new patch: "util/osdep: Introduce qemu_close_range()"
include/qemu/osdep.h | 1 +
util/osdep.c | 48 ++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 49 insertions(+)
diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index cc61b00ba9..e22434ce10 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -560,6 +560,7 @@ int qemu_open_old(const char *name, int flags, ...);
int qemu_open(const char *name, int flags, Error **errp);
int qemu_create(const char *name, int flags, mode_t mode, Error **errp);
int qemu_close(int fd);
+int qemu_close_range(unsigned int first, unsigned int last);
int qemu_unlink(const char *name);
#ifndef _WIN32
int qemu_dup_flags(int fd, int flags);
diff --git a/util/osdep.c b/util/osdep.c
index e996c4744a..91275e70f8 100644
--- a/util/osdep.c
+++ b/util/osdep.c
@@ -30,6 +30,7 @@
#include "qemu/mprotect.h"
#include "qemu/hw-version.h"
#include "monitor/monitor.h"
+#include <dirent.h>
static const char *hw_version = QEMU_HW_VERSION;
@@ -411,6 +412,53 @@ int qemu_close(int fd)
return close(fd);
}
+int qemu_close_range(unsigned int first, unsigned int last)
+{
+ DIR *dir = NULL;
+
+#ifdef CONFIG_CLOSE_RANGE
+ int r = close_range(first, last, 0);
+ if (!r) {
+ /* Success, no need to try other ways. */
+ return 0;
+ }
+#endif
+
+#ifdef __linux__
+ dir = opendir("/proc/self/fd");
+#endif
+ if (!dir) {
+ /*
+ * If /proc is not mounted or /proc/self/fd is not supported,
+ * try close() from first to last.
+ */
+ for (int i = first; i <= last; i++) {
+ close(i);
+ }
+
+ return 0;
+ }
+
+#ifndef _WIN32
+ /* Avoid closing the directory */
+ int dfd = dirfd(dir);
+
+ for (struct dirent *de = readdir(dir); de; de = readdir(dir)) {
+ int fd = atoi(de->d_name);
+ if (fd < first || fd > last) {
+ /* Exclude the fds outside the target range */
+ continue;
+ }
+ if (fd != dfd) {
+ close(fd);
+ }
+ }
+ closedir(dir);
+#endif /* _WIN32 */
+
+ return 0;
+}
+
/*
* Delete a file from the filesystem, unless the filename is /dev/fdset/...
*
--
2.34.1
^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH v3 4/6] util/osdep: Introduce qemu_close_range()
2023-06-17 5:36 ` [PATCH v3 4/6] util/osdep: Introduce qemu_close_range() Bin Meng
@ 2023-06-19 9:18 ` Claudio Imbrenda
2023-06-28 15:12 ` Bin Meng
2023-06-19 9:22 ` Markus Armbruster
1 sibling, 1 reply; 19+ messages in thread
From: Claudio Imbrenda @ 2023-06-19 9:18 UTC (permalink / raw)
To: Bin Meng
Cc: qemu-devel, Zhangjin Wu, Alberto Faria, Kevin Wolf,
Marc-André Lureau, Nikita Ivanov, Paolo Bonzini, Thomas Huth,
Xuzhou Cheng
On Sat, 17 Jun 2023 13:36:19 +0800
Bin Meng <bmeng@tinylab.org> wrote:
> This introduces a new QEMU API qemu_close_range() that closes all
> open file descriptors from first to last (included).
>
> This API will try a more efficient call to close_range(), or walk
> through of /proc/self/fd whenever these are possible, otherwise it
> falls back to a plain close loop.
>
> Co-developed-by: Zhangjin Wu <falcon@tinylab.org>
> Signed-off-by: Bin Meng <bmeng@tinylab.org>
>
> ---
>
> Changes in v3:
> - fix win32 build failure
>
> Changes in v2:
> - new patch: "util/osdep: Introduce qemu_close_range()"
>
> include/qemu/osdep.h | 1 +
> util/osdep.c | 48 ++++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 49 insertions(+)
>
> diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
> index cc61b00ba9..e22434ce10 100644
> --- a/include/qemu/osdep.h
> +++ b/include/qemu/osdep.h
> @@ -560,6 +560,7 @@ int qemu_open_old(const char *name, int flags, ...);
> int qemu_open(const char *name, int flags, Error **errp);
> int qemu_create(const char *name, int flags, mode_t mode, Error **errp);
> int qemu_close(int fd);
> +int qemu_close_range(unsigned int first, unsigned int last);
> int qemu_unlink(const char *name);
> #ifndef _WIN32
> int qemu_dup_flags(int fd, int flags);
> diff --git a/util/osdep.c b/util/osdep.c
> index e996c4744a..91275e70f8 100644
> --- a/util/osdep.c
> +++ b/util/osdep.c
> @@ -30,6 +30,7 @@
> #include "qemu/mprotect.h"
> #include "qemu/hw-version.h"
> #include "monitor/monitor.h"
> +#include <dirent.h>
>
> static const char *hw_version = QEMU_HW_VERSION;
>
> @@ -411,6 +412,53 @@ int qemu_close(int fd)
> return close(fd);
> }
>
> +int qemu_close_range(unsigned int first, unsigned int last)
> +{
> + DIR *dir = NULL;
> +
> +#ifdef CONFIG_CLOSE_RANGE
> + int r = close_range(first, last, 0);
> + if (!r) {
> + /* Success, no need to try other ways. */
> + return 0;
> + }
> +#endif
> +
> +#ifdef __linux__
> + dir = opendir("/proc/self/fd");
> +#endif
> + if (!dir) {
> + /*
> + * If /proc is not mounted or /proc/self/fd is not supported,
> + * try close() from first to last.
> + */
> + for (int i = first; i <= last; i++) {
> + close(i);
will this compile on windows?
> + }
> +
> + return 0;
> + }
> +
> +#ifndef _WIN32
> + /* Avoid closing the directory */
> + int dfd = dirfd(dir);
> +
> + for (struct dirent *de = readdir(dir); de; de = readdir(dir)) {
> + int fd = atoi(de->d_name);
> + if (fd < first || fd > last) {
> + /* Exclude the fds outside the target range */
> + continue;
> + }
> + if (fd != dfd) {
> + close(fd);
> + }
> + }
> + closedir(dir);
> +#endif /* _WIN32 */
> +
> + return 0;
> +}
> +
> /*
> * Delete a file from the filesystem, unless the filename is /dev/fdset/...
> *
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v3 4/6] util/osdep: Introduce qemu_close_range()
2023-06-19 9:18 ` Claudio Imbrenda
@ 2023-06-28 15:12 ` Bin Meng
0 siblings, 0 replies; 19+ messages in thread
From: Bin Meng @ 2023-06-28 15:12 UTC (permalink / raw)
To: Claudio Imbrenda
Cc: qemu-devel, falcon, afaria, kwolf, marcandre.lureau, nivanov,
pbonzini, thuth, xuzhou.cheng
On 2023/6/19 17:18:53, "Claudio Imbrenda" <imbrenda@linux.ibm.com>
wrote:
>On Sat, 17 Jun 2023 13:36:19 +0800
>Bin Meng <bmeng@tinylab.org> wrote:
>
>> This introduces a new QEMU API qemu_close_range() that closes all
>> open file descriptors from first to last (included).
>>
>> This API will try a more efficient call to close_range(), or walk
>> through of /proc/self/fd whenever these are possible, otherwise it
>> falls back to a plain close loop.
>>
>> Co-developed-by: Zhangjin Wu <falcon@tinylab.org>
>> Signed-off-by: Bin Meng <bmeng@tinylab.org>
>>
>> ---
>>
>> Changes in v3:
>> - fix win32 build failure
>>
>> Changes in v2:
>> - new patch: "util/osdep: Introduce qemu_close_range()"
>>
>> include/qemu/osdep.h | 1 +
>> util/osdep.c | 48 ++++++++++++++++++++++++++++++++++++++++++++
>> 2 files changed, 49 insertions(+)
>>
>> diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
>> index cc61b00ba9..e22434ce10 100644
>> --- a/include/qemu/osdep.h
>> +++ b/include/qemu/osdep.h
>> @@ -560,6 +560,7 @@ int qemu_open_old(const char *name, int flags, ...);
>> int qemu_open(const char *name, int flags, Error **errp);
>> int qemu_create(const char *name, int flags, mode_t mode, Error **errp);
>> int qemu_close(int fd);
>> +int qemu_close_range(unsigned int first, unsigned int last);
>> int qemu_unlink(const char *name);
>> #ifndef _WIN32
>> int qemu_dup_flags(int fd, int flags);
>> diff --git a/util/osdep.c b/util/osdep.c
>> index e996c4744a..91275e70f8 100644
>> --- a/util/osdep.c
>> +++ b/util/osdep.c
>> @@ -30,6 +30,7 @@
>> #include "qemu/mprotect.h"
>> #include "qemu/hw-version.h"
>> #include "monitor/monitor.h"
>> +#include <dirent.h>
>>
>> static const char *hw_version = QEMU_HW_VERSION;
>>
>> @@ -411,6 +412,53 @@ int qemu_close(int fd)
>> return close(fd);
>> }
>>
>> +int qemu_close_range(unsigned int first, unsigned int last)
>> +{
>> + DIR *dir = NULL;
>> +
>> +#ifdef CONFIG_CLOSE_RANGE
>> + int r = close_range(first, last, 0);
>> + if (!r) {
>> + /* Success, no need to try other ways. */
>> + return 0;
>> + }
>> +#endif
>> +
>> +#ifdef __linux__
>> + dir = opendir("/proc/self/fd");
>> +#endif
>> + if (!dir) {
>> + /*
>> + * If /proc is not mounted or /proc/self/fd is not supported,
>> + * try close() from first to last.
>> + */
>> + for (int i = first; i <= last; i++) {
>> + close(i);
>
>will this compile on windows?
Yes, it will.
Regards,
Bin
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v3 4/6] util/osdep: Introduce qemu_close_range()
2023-06-17 5:36 ` [PATCH v3 4/6] util/osdep: Introduce qemu_close_range() Bin Meng
2023-06-19 9:18 ` Claudio Imbrenda
@ 2023-06-19 9:22 ` Markus Armbruster
2023-06-28 15:40 ` Bin Meng
1 sibling, 1 reply; 19+ messages in thread
From: Markus Armbruster @ 2023-06-19 9:22 UTC (permalink / raw)
To: Bin Meng
Cc: qemu-devel, Zhangjin Wu, Alberto Faria, Kevin Wolf,
Marc-André Lureau, Nikita Ivanov, Paolo Bonzini, Thomas Huth,
Xuzhou Cheng
Bin Meng <bmeng@tinylab.org> writes:
> This introduces a new QEMU API qemu_close_range() that closes all
> open file descriptors from first to last (included).
>
> This API will try a more efficient call to close_range(), or walk
> through of /proc/self/fd whenever these are possible, otherwise it
> falls back to a plain close loop.
>
> Co-developed-by: Zhangjin Wu <falcon@tinylab.org>
> Signed-off-by: Bin Meng <bmeng@tinylab.org>
>
> ---
>
> Changes in v3:
> - fix win32 build failure
>
> Changes in v2:
> - new patch: "util/osdep: Introduce qemu_close_range()"
>
> include/qemu/osdep.h | 1 +
> util/osdep.c | 48 ++++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 49 insertions(+)
>
> diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
> index cc61b00ba9..e22434ce10 100644
> --- a/include/qemu/osdep.h
> +++ b/include/qemu/osdep.h
> @@ -560,6 +560,7 @@ int qemu_open_old(const char *name, int flags, ...);
> int qemu_open(const char *name, int flags, Error **errp);
> int qemu_create(const char *name, int flags, mode_t mode, Error **errp);
> int qemu_close(int fd);
> +int qemu_close_range(unsigned int first, unsigned int last);
> int qemu_unlink(const char *name);
> #ifndef _WIN32
> int qemu_dup_flags(int fd, int flags);
> diff --git a/util/osdep.c b/util/osdep.c
> index e996c4744a..91275e70f8 100644
> --- a/util/osdep.c
> +++ b/util/osdep.c
> @@ -30,6 +30,7 @@
> #include "qemu/mprotect.h"
> #include "qemu/hw-version.h"
> #include "monitor/monitor.h"
> +#include <dirent.h>
>
> static const char *hw_version = QEMU_HW_VERSION;
>
> @@ -411,6 +412,53 @@ int qemu_close(int fd)
> return close(fd);
> }
>
> +int qemu_close_range(unsigned int first, unsigned int last)
> +{
> + DIR *dir = NULL;
> +
> +#ifdef CONFIG_CLOSE_RANGE
> + int r = close_range(first, last, 0);
close_range(2) explains flag
CLOSE_RANGE_UNSHARE
Unshare the specified file descriptors from any other processes
before closing them, avoiding races with other threads sharing
the file descriptor table.
Can anybody explain the races this avoids?
> + if (!r) {
> + /* Success, no need to try other ways. */
> + return 0;
> + }
What are the failure modes of close_range() where the other ways are
worth trying?
> +#endif
> +
> +#ifdef __linux__
> + dir = opendir("/proc/self/fd");
> +#endif
> + if (!dir) {
> + /*
> + * If /proc is not mounted or /proc/self/fd is not supported,
> + * try close() from first to last.
> + */
> + for (int i = first; i <= last; i++) {
> + close(i);
> + }
> +
> + return 0;
> + }
> +
> +#ifndef _WIN32
> + /* Avoid closing the directory */
> + int dfd = dirfd(dir);
This directory contains "." "..", "0", "1", ...
> +
> + for (struct dirent *de = readdir(dir); de; de = readdir(dir)) {
> + int fd = atoi(de->d_name);
Maps "." and ".." to 0. Unclean.
Please use qemu_strtoi(de->d_name, NULL, 10, &fd), and skip entries
where it fails.
> + if (fd < first || fd > last) {
> + /* Exclude the fds outside the target range */
> + continue;
> + }
> + if (fd != dfd) {
> + close(fd);
> + }
> + }
> + closedir(dir);
> +#endif /* _WIN32 */
> +
> + return 0;
> +}
I'd prefer to order this from most to least preferred:
close_range()
iterate over /proc/self/fd
iterate from first to last
Unlike close_range(), qemu_close_range() returns 0 when last < first.
If we want to deviate from close_range(), we better document the
differences.
This is a generalized version of async-teardown.c's close_all_open_fd().
I'd mention this in the commit message. Suggestion, not demand.
> +
> /*
> * Delete a file from the filesystem, unless the filename is /dev/fdset/...
> *
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v3 4/6] util/osdep: Introduce qemu_close_range()
2023-06-19 9:22 ` Markus Armbruster
@ 2023-06-28 15:40 ` Bin Meng
0 siblings, 0 replies; 19+ messages in thread
From: Bin Meng @ 2023-06-28 15:40 UTC (permalink / raw)
To: Markus Armbruster
Cc: qemu-devel, falcon, afaria, kwolf, marcandre.lureau, nivanov,
pbonzini, thuth, xuzhou.cheng
On 2023/6/19 17:22:53, "Markus Armbruster" <armbru@redhat.com> wrote:
>Bin Meng <bmeng@tinylab.org> writes:
>
>> This introduces a new QEMU API qemu_close_range() that closes all
>> open file descriptors from first to last (included).
>>
>> This API will try a more efficient call to close_range(), or walk
>> through of /proc/self/fd whenever these are possible, otherwise it
>> falls back to a plain close loop.
>>
>> Co-developed-by: Zhangjin Wu <falcon@tinylab.org>
>> Signed-off-by: Bin Meng <bmeng@tinylab.org>
>>
>> ---
>>
>> Changes in v3:
>> - fix win32 build failure
>>
>> Changes in v2:
>> - new patch: "util/osdep: Introduce qemu_close_range()"
>>
>> include/qemu/osdep.h | 1 +
>> util/osdep.c | 48 ++++++++++++++++++++++++++++++++++++++++++++
>> 2 files changed, 49 insertions(+)
>>
>> diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
>> index cc61b00ba9..e22434ce10 100644
>> --- a/include/qemu/osdep.h
>> +++ b/include/qemu/osdep.h
>> @@ -560,6 +560,7 @@ int qemu_open_old(const char *name, int flags, ...);
>> int qemu_open(const char *name, int flags, Error **errp);
>> int qemu_create(const char *name, int flags, mode_t mode, Error **errp);
>> int qemu_close(int fd);
>> +int qemu_close_range(unsigned int first, unsigned int last);
>> int qemu_unlink(const char *name);
>> #ifndef _WIN32
>> int qemu_dup_flags(int fd, int flags);
>> diff --git a/util/osdep.c b/util/osdep.c
>> index e996c4744a..91275e70f8 100644
>> --- a/util/osdep.c
>> +++ b/util/osdep.c
>> @@ -30,6 +30,7 @@
>> #include "qemu/mprotect.h"
>> #include "qemu/hw-version.h"
>> #include "monitor/monitor.h"
>> +#include <dirent.h>
>>
>> static const char *hw_version = QEMU_HW_VERSION;
>>
>> @@ -411,6 +412,53 @@ int qemu_close(int fd)
>> return close(fd);
>> }
>>
>> +int qemu_close_range(unsigned int first, unsigned int last)
>> +{
>> + DIR *dir = NULL;
>> +
>> +#ifdef CONFIG_CLOSE_RANGE
>> + int r = close_range(first, last, 0);
>
>close_range(2) explains flag
>
> CLOSE_RANGE_UNSHARE
> Unshare the specified file descriptors from any other processes
> before closing them, avoiding races with other threads sharing
> the file descriptor table.
>
>Can anybody explain the races this avoids?
The kernel commit [1] which adds the close_range syscall says:
unshare(CLONE_FILES) should only be needed if the calling
task is multi-threaded and shares the file descriptor table with another
thread in which case two threads could race with one thread allocating
file
descriptors and the other one closing them via close_range().
>
>
>> + if (!r) {
>> + /* Success, no need to try other ways. */
>> + return 0;
>> + }
>
>What are the failure modes of close_range() where the other ways are
>worth trying?
Added first < last check in v4 so that the technically close_range()
should not fail.
>
>
>> +#endif
>> +
>> +#ifdef __linux__
>> + dir = opendir("/proc/self/fd");
>> +#endif
>> + if (!dir) {
>> + /*
>> + * If /proc is not mounted or /proc/self/fd is not supported,
>> + * try close() from first to last.
>> + */
>> + for (int i = first; i <= last; i++) {
>> + close(i);
>> + }
>> +
>> + return 0;
>> + }
>> +
>> +#ifndef _WIN32
>> + /* Avoid closing the directory */
>> + int dfd = dirfd(dir);
>
>This directory contains "." "..", "0", "1", ...
>
>> +
>> + for (struct dirent *de = readdir(dir); de; de = readdir(dir)) {
>> + int fd = atoi(de->d_name);
>
>Maps "." and ".." to 0. Unclean.
>
>Please use qemu_strtoi(de->d_name, NULL, 10, &fd), and skip entries
>where it fails.
Fixed in v4.
>
>
>> + if (fd < first || fd > last) {
>> + /* Exclude the fds outside the target range */
>> + continue;
>> + }
>> + if (fd != dfd) {
>> + close(fd);
>> + }
>> + }
>> + closedir(dir);
>> +#endif /* _WIN32 */
>> +
>> + return 0;
>> +}
>
>I'd prefer to order this from most to least preferred:
>
> close_range()
> iterate over /proc/self/fd
> iterate from first to last
>
>Unlike close_range(), qemu_close_range() returns 0 when last < first.
>If we want to deviate from close_range(), we better document the
>differences.
>
>This is a generalized version of async-teardown.c's close_all_open_fd().
>I'd mention this in the commit message. Suggestion, not demand.
[1]
https://github.com/torvalds/linux/commit/278a5fbaed89dacd04e9d052f4594ffd0e0585de
Regards,
Bin
^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH v3 5/6] util/async-teardown: Use qemu_close_range() to close fds
2023-06-17 5:36 [PATCH v3 0/6] net/tap: Fix QEMU frozen issue when the maximum number of file descriptors is very large Bin Meng
` (3 preceding siblings ...)
2023-06-17 5:36 ` [PATCH v3 4/6] util/osdep: Introduce qemu_close_range() Bin Meng
@ 2023-06-17 5:36 ` Bin Meng
2023-06-19 9:19 ` Claudio Imbrenda
2023-06-17 5:36 ` [PATCH v3 6/6] net: tap: " Bin Meng
` (2 subsequent siblings)
7 siblings, 1 reply; 19+ messages in thread
From: Bin Meng @ 2023-06-17 5:36 UTC (permalink / raw)
To: qemu-devel
Cc: Zhangjin Wu, Claudio Imbrenda, Juan Quintela, Michael S. Tsirkin,
Paolo Bonzini
From: Zhangjin Wu <falcon@tinylab.org>
Based on the old close_all_open_fd() of util/async-teardown.c, a new
generic qemu_close_range() has been added in osdep.c.
Now, let's switch over to use the generic qemu_close_range().
Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Bin Meng <bmeng@tinylab.org>
---
Changes in v3:
- limit the last_fd of qemu_close_range() to sysconf(_SC_OPEN_MAX)
Changes in v2:
- new patch: "util/async-teardown: Use qemu_close_range() to close fds"
util/async-teardown.c | 42 ++----------------------------------------
1 file changed, 2 insertions(+), 40 deletions(-)
diff --git a/util/async-teardown.c b/util/async-teardown.c
index 7e0177a8da..e102912f3f 100644
--- a/util/async-teardown.c
+++ b/util/async-teardown.c
@@ -29,44 +29,6 @@
static pid_t the_ppid;
-/*
- * Close all open file descriptors.
- */
-static void close_all_open_fd(void)
-{
- struct dirent *de;
- int fd, dfd;
- DIR *dir;
-
-#ifdef CONFIG_CLOSE_RANGE
- int r = close_range(0, ~0U, 0);
- if (!r) {
- /* Success, no need to try other ways. */
- return;
- }
-#endif
-
- dir = opendir("/proc/self/fd");
- if (!dir) {
- /* If /proc is not mounted, close fds one by one. */
- int open_max = sysconf(_SC_OPEN_MAX), i;
- for (i = 0; i < open_max; i++) {
- close(i);
- }
- return;
- }
- /* Avoid closing the directory. */
- dfd = dirfd(dir);
-
- for (de = readdir(dir); de; de = readdir(dir)) {
- fd = atoi(de->d_name);
- if (fd != dfd) {
- close(fd);
- }
- }
- closedir(dir);
-}
-
static void hup_handler(int signal)
{
/* Check every second if this process has been reparented. */
@@ -84,6 +46,7 @@ static int async_teardown_fn(void *arg)
struct sigaction sa = { .sa_handler = hup_handler };
sigset_t hup_signal;
char name[16];
+ int open_max = sysconf(_SC_OPEN_MAX);
/* Set a meaningful name for this process. */
snprintf(name, 16, "cleanup/%d", the_ppid);
@@ -92,9 +55,8 @@ static int async_teardown_fn(void *arg)
/*
* Close all file descriptors that might have been inherited from the
* main qemu process when doing clone, needed to make libvirt happy.
- * Not using close_range for increased compatibility with older kernels.
*/
- close_all_open_fd();
+ qemu_close_range(0, open_max - 1);
/* Set up a handler for SIGHUP and unblock SIGHUP. */
sigaction(SIGHUP, &sa, NULL);
--
2.34.1
^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH v3 5/6] util/async-teardown: Use qemu_close_range() to close fds
2023-06-17 5:36 ` [PATCH v3 5/6] util/async-teardown: Use qemu_close_range() to close fds Bin Meng
@ 2023-06-19 9:19 ` Claudio Imbrenda
0 siblings, 0 replies; 19+ messages in thread
From: Claudio Imbrenda @ 2023-06-19 9:19 UTC (permalink / raw)
To: Bin Meng
Cc: qemu-devel, Zhangjin Wu, Juan Quintela, Michael S. Tsirkin,
Paolo Bonzini
On Sat, 17 Jun 2023 13:36:20 +0800
Bin Meng <bmeng@tinylab.org> wrote:
> From: Zhangjin Wu <falcon@tinylab.org>
>
> Based on the old close_all_open_fd() of util/async-teardown.c, a new
> generic qemu_close_range() has been added in osdep.c.
>
> Now, let's switch over to use the generic qemu_close_range().
>
> Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
> Signed-off-by: Bin Meng <bmeng@tinylab.org>
>
> ---
>
> Changes in v3:
> - limit the last_fd of qemu_close_range() to sysconf(_SC_OPEN_MAX)
>
> Changes in v2:
> - new patch: "util/async-teardown: Use qemu_close_range() to close fds"
>
> util/async-teardown.c | 42 ++----------------------------------------
> 1 file changed, 2 insertions(+), 40 deletions(-)
>
> diff --git a/util/async-teardown.c b/util/async-teardown.c
> index 7e0177a8da..e102912f3f 100644
> --- a/util/async-teardown.c
> +++ b/util/async-teardown.c
> @@ -29,44 +29,6 @@
>
> static pid_t the_ppid;
>
> -/*
> - * Close all open file descriptors.
> - */
> -static void close_all_open_fd(void)
> -{
> - struct dirent *de;
> - int fd, dfd;
> - DIR *dir;
> -
> -#ifdef CONFIG_CLOSE_RANGE
> - int r = close_range(0, ~0U, 0);
> - if (!r) {
> - /* Success, no need to try other ways. */
> - return;
> - }
> -#endif
> -
> - dir = opendir("/proc/self/fd");
> - if (!dir) {
> - /* If /proc is not mounted, close fds one by one. */
> - int open_max = sysconf(_SC_OPEN_MAX), i;
> - for (i = 0; i < open_max; i++) {
> - close(i);
> - }
> - return;
> - }
> - /* Avoid closing the directory. */
> - dfd = dirfd(dir);
> -
> - for (de = readdir(dir); de; de = readdir(dir)) {
> - fd = atoi(de->d_name);
> - if (fd != dfd) {
> - close(fd);
> - }
> - }
> - closedir(dir);
> -}
> -
> static void hup_handler(int signal)
> {
> /* Check every second if this process has been reparented. */
> @@ -84,6 +46,7 @@ static int async_teardown_fn(void *arg)
> struct sigaction sa = { .sa_handler = hup_handler };
> sigset_t hup_signal;
> char name[16];
> + int open_max = sysconf(_SC_OPEN_MAX);
>
> /* Set a meaningful name for this process. */
> snprintf(name, 16, "cleanup/%d", the_ppid);
> @@ -92,9 +55,8 @@ static int async_teardown_fn(void *arg)
> /*
> * Close all file descriptors that might have been inherited from the
> * main qemu process when doing clone, needed to make libvirt happy.
> - * Not using close_range for increased compatibility with older kernels.
> */
> - close_all_open_fd();
> + qemu_close_range(0, open_max - 1);
I think it would look easier to read if you just put the sysconf() call
here and avoid an extra variable
>
> /* Set up a handler for SIGHUP and unblock SIGHUP. */
> sigaction(SIGHUP, &sa, NULL);
^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH v3 6/6] net: tap: Use qemu_close_range() to close fds
2023-06-17 5:36 [PATCH v3 0/6] net/tap: Fix QEMU frozen issue when the maximum number of file descriptors is very large Bin Meng
` (4 preceding siblings ...)
2023-06-17 5:36 ` [PATCH v3 5/6] util/async-teardown: Use qemu_close_range() to close fds Bin Meng
@ 2023-06-17 5:36 ` Bin Meng
2023-06-19 9:23 ` Claudio Imbrenda
2023-06-19 7:13 ` [PATCH v3 0/6] net/tap: Fix QEMU frozen issue when the maximum number of file descriptors is very large Richard Henderson
2023-06-28 17:13 ` Michael Tokarev
7 siblings, 1 reply; 19+ messages in thread
From: Bin Meng @ 2023-06-17 5:36 UTC (permalink / raw)
To: qemu-devel; +Cc: Zhangjin Wu, Jason Wang
From: Zhangjin Wu <falcon@tinylab.org>
Current codes using a brute-force traversal of all file descriptors
do not scale on a system where the maximum number of file descriptors
is set to a very large value (e.g.: in a Docker container of Manjaro
distribution it is set to 1073741816). QEMU just looks frozen during
start-up.
The close-on-exec flag (O_CLOEXEC) was introduced since Linux kernel
2.6.23, FreeBSD 8.3, OpenBSD 5.0, Solaris 11. While it's true QEMU
doesn't need to manually close the fds for child process as the proper
O_CLOEXEC flag should have been set properly on files with its own
codes, QEMU uses a huge number of 3rd party libraries and we don't
trust them to reliably be using O_CLOEXEC on everything they open.
Modern Linux and BSDs have the close_range() call we can use to do the
job, and on Linux we have one more way to walk through /proc/self/fd
to complete the task efficiently, which is what qemu_close_range() does.
Reported-by: Zhangjin Wu <falcon@tinylab.org>
Co-developed-by: Bin Meng <bmeng@tinylab.org>
Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Bin Meng <bmeng@tinylab.org>
---
(no changes since v2)
Changes in v2:
- Change to use qemu_close_range() to close fds for child process efficiently
- v1 link: https://lore.kernel.org/qemu-devel/20230406112041.798585-1-bmeng@tinylab.org/
net/tap.c | 23 +++++++++++------------
1 file changed, 11 insertions(+), 12 deletions(-)
diff --git a/net/tap.c b/net/tap.c
index 1bf085d422..d482fabdff 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -446,13 +446,13 @@ static void launch_script(const char *setup_script, const char *ifname,
return;
}
if (pid == 0) {
- int open_max = sysconf(_SC_OPEN_MAX), i;
+ unsigned int last_fd = sysconf(_SC_OPEN_MAX) - 1;
+
+ /* skip stdin, stdout and stderr */
+ qemu_close_range(3, fd - 1);
+ /* skip the currently used fd */
+ qemu_close_range(fd + 1, last_fd);
- for (i = 3; i < open_max; i++) {
- if (i != fd) {
- close(i);
- }
- }
parg = args;
*parg++ = (char *)setup_script;
*parg++ = (char *)ifname;
@@ -536,16 +536,15 @@ static int net_bridge_run_helper(const char *helper, const char *bridge,
return -1;
}
if (pid == 0) {
- int open_max = sysconf(_SC_OPEN_MAX), i;
+ unsigned int last_fd = sysconf(_SC_OPEN_MAX) - 1, fd = sv[1];
char *fd_buf = NULL;
char *br_buf = NULL;
char *helper_cmd = NULL;
- for (i = 3; i < open_max; i++) {
- if (i != sv[1]) {
- close(i);
- }
- }
+ /* skip stdin, stdout and stderr */
+ qemu_close_range(3, fd - 1);
+ /* skip the currently used fd */
+ qemu_close_range(fd + 1, last_fd);
fd_buf = g_strdup_printf("%s%d", "--fd=", sv[1]);
--
2.34.1
^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH v3 6/6] net: tap: Use qemu_close_range() to close fds
2023-06-17 5:36 ` [PATCH v3 6/6] net: tap: " Bin Meng
@ 2023-06-19 9:23 ` Claudio Imbrenda
0 siblings, 0 replies; 19+ messages in thread
From: Claudio Imbrenda @ 2023-06-19 9:23 UTC (permalink / raw)
To: Bin Meng; +Cc: qemu-devel, Zhangjin Wu, Jason Wang
On Sat, 17 Jun 2023 13:36:21 +0800
Bin Meng <bmeng@tinylab.org> wrote:
> From: Zhangjin Wu <falcon@tinylab.org>
>
> Current codes using a brute-force traversal of all file descriptors
> do not scale on a system where the maximum number of file descriptors
> is set to a very large value (e.g.: in a Docker container of Manjaro
> distribution it is set to 1073741816). QEMU just looks frozen during
> start-up.
>
> The close-on-exec flag (O_CLOEXEC) was introduced since Linux kernel
> 2.6.23, FreeBSD 8.3, OpenBSD 5.0, Solaris 11. While it's true QEMU
> doesn't need to manually close the fds for child process as the proper
> O_CLOEXEC flag should have been set properly on files with its own
> codes, QEMU uses a huge number of 3rd party libraries and we don't
> trust them to reliably be using O_CLOEXEC on everything they open.
>
> Modern Linux and BSDs have the close_range() call we can use to do the
> job, and on Linux we have one more way to walk through /proc/self/fd
> to complete the task efficiently, which is what qemu_close_range() does.
>
> Reported-by: Zhangjin Wu <falcon@tinylab.org>
> Co-developed-by: Bin Meng <bmeng@tinylab.org>
> Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
> Signed-off-by: Bin Meng <bmeng@tinylab.org>
>
> ---
>
> (no changes since v2)
>
> Changes in v2:
> - Change to use qemu_close_range() to close fds for child process efficiently
> - v1 link: https://lore.kernel.org/qemu-devel/20230406112041.798585-1-bmeng@tinylab.org/
>
> net/tap.c | 23 +++++++++++------------
> 1 file changed, 11 insertions(+), 12 deletions(-)
>
> diff --git a/net/tap.c b/net/tap.c
> index 1bf085d422..d482fabdff 100644
> --- a/net/tap.c
> +++ b/net/tap.c
> @@ -446,13 +446,13 @@ static void launch_script(const char *setup_script, const char *ifname,
> return;
> }
> if (pid == 0) {
> - int open_max = sysconf(_SC_OPEN_MAX), i;
> + unsigned int last_fd = sysconf(_SC_OPEN_MAX) - 1;
> +
> + /* skip stdin, stdout and stderr */
> + qemu_close_range(3, fd - 1);
> + /* skip the currently used fd */
> + qemu_close_range(fd + 1, last_fd);
>
> - for (i = 3; i < open_max; i++) {
> - if (i != fd) {
> - close(i);
> - }
> - }
> parg = args;
> *parg++ = (char *)setup_script;
> *parg++ = (char *)ifname;
> @@ -536,16 +536,15 @@ static int net_bridge_run_helper(const char *helper, const char *bridge,
> return -1;
> }
> if (pid == 0) {
> - int open_max = sysconf(_SC_OPEN_MAX), i;
> + unsigned int last_fd = sysconf(_SC_OPEN_MAX) - 1, fd = sv[1];
please put fd on its own line
> char *fd_buf = NULL;
> char *br_buf = NULL;
> char *helper_cmd = NULL;
>
> - for (i = 3; i < open_max; i++) {
> - if (i != sv[1]) {
> - close(i);
> - }
> - }
> + /* skip stdin, stdout and stderr */
> + qemu_close_range(3, fd - 1);
> + /* skip the currently used fd */
> + qemu_close_range(fd + 1, last_fd);
>
> fd_buf = g_strdup_printf("%s%d", "--fd=", sv[1]);
>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v3 0/6] net/tap: Fix QEMU frozen issue when the maximum number of file descriptors is very large
2023-06-17 5:36 [PATCH v3 0/6] net/tap: Fix QEMU frozen issue when the maximum number of file descriptors is very large Bin Meng
` (5 preceding siblings ...)
2023-06-17 5:36 ` [PATCH v3 6/6] net: tap: " Bin Meng
@ 2023-06-19 7:13 ` Richard Henderson
2023-06-28 17:13 ` Michael Tokarev
7 siblings, 0 replies; 19+ messages in thread
From: Richard Henderson @ 2023-06-19 7:13 UTC (permalink / raw)
To: Bin Meng, qemu-devel
Cc: Zhangjin Wu, Alberto Faria, Claudio Imbrenda, Edgar E. Iglesias,
Jason Wang, Juan Quintela, Kevin Wolf, Marc-André Lureau,
Markus Armbruster, Michael S. Tsirkin, Nikita Ivanov,
Paolo Bonzini, Thomas Huth, Xuzhou Cheng
On 6/17/23 07:36, Bin Meng wrote:
> Current codes using a brute-force traversal of all file descriptors
> do not scale on a system where the maximum number of file descriptors
> is set to a very large value (e.g.: in a Docker container of Manjaro
> distribution it is set to 1073741816). QEMU just looks frozen during
> start-up.
>
> The close-on-exec flag (O_CLOEXEC) was introduced since Linux kernel
> 2.6.23, FreeBSD 8.3, OpenBSD 5.0, Solaris 11. While it's true QEMU
> doesn't need to manually close the fds for child process as the proper
> O_CLOEXEC flag should have been set properly on files with its own
> codes, QEMU uses a huge number of 3rd party libraries and we don't
> trust them to reliably be using O_CLOEXEC on everything they open.
>
> Modern Linux and BSDs have the close_range() call we can use to do the
> job, and on Linux we have one more way to walk through /proc/self/fd
> to complete the task efficiently, which is what qemu_close_range()
> does, a new API we add in util/osdep.c.
>
> V1 link:https://lore.kernel.org/qemu-devel/20230406112041.798585-1-bmeng@tinylab.org/
>
> Changes in v3:
> - fix win32 build failure
> - limit the last_fd of qemu_close_range() to sysconf(_SC_OPEN_MAX)
Sorry, I didn't see this and sent some comments against v2.
Though using _SC_OPEN_MAX was one of them, so that's nice. :-)
r~
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v3 0/6] net/tap: Fix QEMU frozen issue when the maximum number of file descriptors is very large
2023-06-17 5:36 [PATCH v3 0/6] net/tap: Fix QEMU frozen issue when the maximum number of file descriptors is very large Bin Meng
` (6 preceding siblings ...)
2023-06-19 7:13 ` [PATCH v3 0/6] net/tap: Fix QEMU frozen issue when the maximum number of file descriptors is very large Richard Henderson
@ 2023-06-28 17:13 ` Michael Tokarev
7 siblings, 0 replies; 19+ messages in thread
From: Michael Tokarev @ 2023-06-28 17:13 UTC (permalink / raw)
To: Bin Meng, qemu-devel
Cc: Zhangjin Wu, Alberto Faria, Claudio Imbrenda, Edgar E. Iglesias,
Jason Wang, Juan Quintela, Kevin Wolf, Marc-André Lureau,
Markus Armbruster, Michael S. Tsirkin, Nikita Ivanov,
Paolo Bonzini, Thomas Huth, Xuzhou Cheng
17.06.2023 08:36, Bin Meng wrote:
>
> Current codes using a brute-force traversal of all file descriptors
> do not scale on a system where the maximum number of file descriptors
> is set to a very large value (e.g.: in a Docker container of Manjaro
> distribution it is set to 1073741816). QEMU just looks frozen during
> start-up.
What's the reason to close all these file descriptors in the first place?
No other software I know does this.
For some situations, such closing is actually bad, -- think, eg,
flock lockfile qemu-system-foo ... --
this one opens a file, locks it using fcntl/flock, and executes the
command, keeping the file descriptor open across exec, so the file
stays locked until the process terminates. This works and works well.
Qemu with its let's-close-everything approach breaks this.
Why? :)
/mjt
^ permalink raw reply [flat|nested] 19+ messages in thread