All of lore.kernel.org
 help / color / mirror / Atom feed
From: Markus Armbruster <armbru@redhat.com>
To: Bin Meng <bmeng@tinylab.org>
Cc: qemu-devel@nongnu.org,
	"Richard Henderson" <richard.henderson@linaro.org>,
	"Zhangjin Wu" <falcon@tinylab.org>,
	"Daniel P. Berrangé" <berrange@redhat.com>,
	"Kevin Wolf" <kwolf@redhat.com>,
	"Marc-André Lureau" <marcandre.lureau@redhat.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Thomas Huth" <thuth@redhat.com>,
	"Xuzhou Cheng" <xuzhou.cheng@windriver.com>
Subject: Re: [PATCH v4 4/6] util/osdep: Introduce qemu_close_range()
Date: Fri, 07 Jul 2023 16:40:38 +0200	[thread overview]
Message-ID: <877crbg4yh.fsf@pond.sub.org> (raw)
In-Reply-To: <20230628152726.110295-5-bmeng@tinylab.org> (Bin Meng's message of "Wed, 28 Jun 2023 23:27:24 +0800")

Bin Meng <bmeng@tinylab.org> writes:

> This introduces a new QEMU API qemu_close_range() that closes all
> open file descriptors from first to last (included).
>
> This API will try a more efficient call to close_range(), or walk
> through of /proc/self/fd whenever these are possible, otherwise it
> falls back to a plain close loop.
>
> Co-developed-by: Zhangjin Wu <falcon@tinylab.org>
> Signed-off-by: Bin Meng <bmeng@tinylab.org>
>
> ---
>
> Changes in v4:
> - add 'first > last' check logic
> - reorder the ifdefs logic
> - change i to unsigned int type
> - use qemu_strtoi() instead of atoi()
> - limit last upper value to sysconf(_SC_OPEN_MAX) - 1
>
> Changes in v3:
> - fix win32 build failure
>
> Changes in v2:
> - new patch: "util/osdep: Introduce qemu_close_range()"
>
>  include/qemu/osdep.h |  1 +
>  util/osdep.c         | 60 ++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 61 insertions(+)
>
> diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
> index cc61b00ba9..e22434ce10 100644
> --- a/include/qemu/osdep.h
> +++ b/include/qemu/osdep.h
> @@ -560,6 +560,7 @@ int qemu_open_old(const char *name, int flags, ...);
>  int qemu_open(const char *name, int flags, Error **errp);
>  int qemu_create(const char *name, int flags, mode_t mode, Error **errp);
>  int qemu_close(int fd);
> +int qemu_close_range(unsigned int first, unsigned int last);
>  int qemu_unlink(const char *name);
>  #ifndef _WIN32
>  int qemu_dup_flags(int fd, int flags);
> diff --git a/util/osdep.c b/util/osdep.c
> index e996c4744a..1d8c719b3f 100644
> --- a/util/osdep.c
> +++ b/util/osdep.c
> @@ -411,6 +411,66 @@ int qemu_close(int fd)
>      return close(fd);
>  }
>  
> +int qemu_close_range(unsigned int first, unsigned int last)
> +{
> +    if (first > last) {
> +        errno = EINVAL;
> +        return -1;
> +    }
> +
> +#ifndef _WIN32
> +    if (last >= sysconf(_SC_OPEN_MAX)) {
> +        last = sysconf(_SC_OPEN_MAX) - 1;
> +    }
> +#endif
> +
> +#ifdef CONFIG_CLOSE_RANGE
> +    int r = close_range(first, last, 0);

TOCTTOU if sysconf(_SC_OPEN_MAX) can change at run time.

Say the caller passes ~0U to @last, like the example program in
close_range()'s manual page does.

Since this is larger than sysconf(_SC_OPEN_MAX), we clamp it to
sysconf(_SC_OPEN_MAX).  If the actual value increases before we get to
call close_range(), we can fail to close all fds.

Can't happen if we simply drop the clamping.

> +    if (!r) {
> +        /* Success, no need to try other ways */
> +        return 0;
> +    }
> +#endif

What are the failure modes of close_range() where the other ways are
worth trying?  I asked this in review of v3, and you replied it should
only ever fail when first > last, which you catch before getting here in
v4.

Why not simply return r?

> +
> +#ifdef __linux__
> +    DIR *dir = opendir("/proc/self/fd");
> +    if (dir) {
> +        /* Avoid closing the directory */
> +        int dfd = dirfd(dir);
> +
> +        for (struct dirent *de = readdir(dir); de; de = readdir(dir)) {
> +            int fd, ret;
> +
> +            ret = qemu_strtoi(de->d_name, NULL, 10, &fd);
> +            if (ret) {
> +                /* skip "." and ".." */

Anything that isn't a decimal integer, actually.  Should be just "." and
".." in practice.

> +                continue;
> +            }
> +            if (fd < first || fd > last) {

Not clamping @last is just fine here.

> +                /* Exclude the fds outside the target range */

Suggest to put this comment right before the if.

> +                continue;
> +            }
> +            if (fd != dfd) {
> +                close(fd);
> +            }

Do we still need this now we skip "."?

> +        }
> +        closedir(dir);
> +
> +        return 0;
> +    }
> +#endif
> +
> +    /*
> +     * If /proc is not mounted or /proc/self/fd is not supported,
> +     * try close() from first to last.
> +     */
> +    for (unsigned int i = first; i <= last; i++) {

Here, we do need to stop at sysconf(_SC_OPEN_MAX) - 1.  Recommend to
move the clamping before this loop.

Still a TOCTTOU, but acceptable here, because this fallback is
fundamentally racy no matter what.

> +        close(i);
> +    }
> +
> +    return 0;
> +}
> +
>  /*
>   * Delete a file from the filesystem, unless the filename is /dev/fdset/...
>   *



  reply	other threads:[~2023-07-07 14:41 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-06-28 15:27 [PATCH v4 0/6] net/tap: Fix QEMU frozen issue when the maximum number of file descriptors is very large Bin Meng
2023-06-28 15:27 ` [PATCH v4 1/6] tests/tcg/cris: Fix the coding style Bin Meng
2023-06-28 15:27 ` [PATCH v4 2/6] tests/tcg/cris: Correct the off-by-one error Bin Meng
2023-06-28 15:27 ` [PATCH v4 3/6] util/async-teardown: Fall back to close fds one by one Bin Meng
2023-06-28 15:27 ` [PATCH v4 4/6] util/osdep: Introduce qemu_close_range() Bin Meng
2023-07-07 14:40   ` Markus Armbruster [this message]
2023-06-28 15:27 ` [PATCH v4 5/6] util/async-teardown: Use qemu_close_range() to close fds Bin Meng
2023-07-07 14:40   ` Markus Armbruster
2023-06-28 15:27 ` [PATCH v4 6/6] net: tap: " Bin Meng
2023-06-30  4:52   ` Jason Wang
2023-06-29  8:33 ` [PATCH v4 0/6] net/tap: Fix QEMU frozen issue when the maximum number of file descriptors is very large Michael Tokarev
2023-06-29  9:05   ` Daniel P. Berrangé
2023-07-09 15:47 ` Bin Meng
2023-07-10  3:05   ` Jason Wang
2023-07-10  6:07     ` Markus Armbruster
2023-07-11  2:40       ` Jason Wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=877crbg4yh.fsf@pond.sub.org \
    --to=armbru@redhat.com \
    --cc=berrange@redhat.com \
    --cc=bmeng@tinylab.org \
    --cc=falcon@tinylab.org \
    --cc=kwolf@redhat.com \
    --cc=marcandre.lureau@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=richard.henderson@linaro.org \
    --cc=thuth@redhat.com \
    --cc=xuzhou.cheng@windriver.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.