Re: [PATCH v2 2/5] 9pfs: fix qemu_mknodat(S_IFSOCK) on macOS

From: Christian Schoenebeck <qemu_oss@crudebyte.com>
To: qemu-devel@nongnu.org
Cc: Keno Fischer <keno@juliacomputing.com>,
	Michael Roitzsch <reactorcontrol@icloud.com>,
	Will Cohen <wwcohen@gmail.com>,  Greg Kurz <groug@kaod.org>,
	qemu-stable@nongnu.org, Akihiko Odaki <akihiko.odaki@gmail.com>
Subject: Re: [PATCH v2 2/5] 9pfs: fix qemu_mknodat(S_IFSOCK) on macOS
Date: Sun, 24 Apr 2022 20:45:21 +0200
Message-ID: <3849551.ofAv5PygDX@silver>
In-Reply-To: <eafd4bbf-dbff-323a-179f-8f29905701e1@gmail.com>

On Samstag, 23. April 2022 06:33:50 CEST Akihiko Odaki wrote:
> On 2022/04/22 23:06, Christian Schoenebeck wrote:
> > On Freitag, 22. April 2022 04:43:40 CEST Akihiko Odaki wrote:
> >> On 2022/04/22 0:07, Christian Schoenebeck wrote:
> >>> mknod() on macOS does not support creating sockets, so divert to
> >>> call sequence socket(), bind() and chmod() respectively if S_IFSOCK
> >>> was passed with mode argument.
> >>> 
> >>> Link: https://lore.kernel.org/qemu-devel/17933734.zYzKuhC07K@silver/
> >>> Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
> >>> Reviewed-by: Will Cohen <wwcohen@gmail.com>
> >>> ---
> >>> 
> >>>    hw/9pfs/9p-util-darwin.c | 27 ++++++++++++++++++++++++++-
> >>>    1 file changed, 26 insertions(+), 1 deletion(-)
> >>> 
> >>> diff --git a/hw/9pfs/9p-util-darwin.c b/hw/9pfs/9p-util-darwin.c
> >>> index e24d09763a..39308f2a45 100644
> >>> --- a/hw/9pfs/9p-util-darwin.c
> >>> +++ b/hw/9pfs/9p-util-darwin.c
> >>> @@ -74,6 +74,27 @@ int fsetxattrat_nofollow(int dirfd, const char *filename, const char *name,
> >>> 
> >>>     */
> >>>    
> >>>    #if defined CONFIG_PTHREAD_FCHDIR_NP
> >>> 
> >>> +static int create_socket_file_at_cwd(const char *filename, mode_t mode)
> >>> {
> >>> +    int fd, err;
> >>> +    struct sockaddr_un addr = {
> >>> +        .sun_family = AF_UNIX
> >>> +    };
> >>> +
> >>> +    fd = socket(PF_UNIX, SOCK_DGRAM, 0);
> >>> +    if (fd == -1) {
> >>> +        return fd;
> >>> +    }
> >>> +    snprintf(addr.sun_path, sizeof(addr.sun_path), "./%s", filename);
> >> 
> >> It would result in an incorrect path if the path does not fit in
> >> addr.sun_path. It should report an explicit error instead.
> > 
> > Looking at its header file, 'sun_path' is indeed defined on macOS with an
> > oddly small size of only 104 bytes. So yes, I should explicitly handle
> > that
> > error case.
> > 
> > I'll post a v3.
> > 
> >>> +    err = bind(fd, (struct sockaddr *) &addr, sizeof(addr));
> >>> +    if (err == -1) {
> >>> +        goto out;
> >> 
> >> You may close(fd) as soon as bind() returns (before checking the
> >> returned value) and eliminate goto.
> > 
> > Yeah, I thought about that alternative, but found it a bit ugly, and
> > probably also counter-productive in case this function might get extended
> > with more error paths in future. Not that I would insist on the current
> > solution though.
> 
> I'm happy with the explanation. Thanks.
> 
> >>> +    }
> >>> +    err = chmod(addr.sun_path, mode);
> >> 
> >> I'm not sure if it is fine to have a time window between bind() and
> >> chmod(). Do you have some rationale?
> > 
> > Good question. QEMU's 9p server is multi-threaded; all 9p requests come in
> > serialized and the 9p server controller portion (9p.c) is only running on
> > QEMU main thread, but the actual filesystem driver calls are then
> > dispatched to QEMU worker threads and therefore running concurrently at
> > this point:
> > 
> > https://wiki.qemu.org/Documentation/9p#Threads_and_Coroutines
> > 
> > Similar situation on Linux 9p client side: it handles access to a mounted
> > 9p filesystem concurrently, requests are then serialized by 9p driver on
> > Linux and sent over wire to 9p server (host).
> > 
> > So yes, there might be implications from that short time window. But could
> > that be exploited on macOS hosts in practice?
> > 
> > The socket file would have mode srwxr-xr-x for a short moment.
> > 
> > For security_model=mapped* this should not be a problem.
> > 
> > For security_model=none|passthrough, in theory, maybe? But how likely is
> > that? If you are using a Linux client for instance, trying to brute-force
> > opening the socket file, the client would send several 9p commands
> > (Twalk, Tgetattr, Topen, probably more). The time window of the two
> > commands above should be much smaller than that and I would expect one of
> > the 9p commands to error out in between.
> > 
> > What would be a viable approach to avoid this issue on macOS?
> 
> It is unlikely that a naive brute-force approach will succeed to
> exploit. The more concerning scenario is that the attacker uses the
> knowledge of the underlying implementation of macOS to cause resource
> contention to widen the window. Whether an exploitation is viable
> depends on how much time you spend digging XNU.
> 
> However, I'm also not sure if it really *has* a race condition. Looking
> at v9fs_co_mknod(), it sequentially calls s->ops->mknod() and
> s->ops->lstat(). It also results in an entity called "path name based
> fid" in the code, which inherently cannot identify a file when it is
> renamed or recreated.
> 
> If there is some rationale it is safe, it may also be applied to the
> sequence of bind() and chmod(). Can anyone explain the sequence of
> s->ops->mknod() and s->ops->lstat() or path name based fid in general?

You are talking about the 9p server's controller level. Unfortunately, I don't
see anything there that would prevent a concurrent open() during this
bind() ... chmod() time window.

Argument 'fidp' passed to function v9fs_co_mknod() refers to the directory in
which the new device file shall be created. So 'fidp' is not the device file
here, nor is 'fidp' modified by this function.

Function v9fs_co_mknod() is entered by the 9p server on the QEMU main thread.
At the beginning of the function it first acquires a read lock on a (per 9p
export) global coroutine mutex:

    v9fs_path_read_lock(s);

and holds this lock until returning from function v9fs_co_mknod(). But that's
just a read lock. Function v9fs_co_open() also acquires only a read lock, so
the two can run concurrently.

Then v9fs_co_run_in_worker({...}) is called to dispatch the entire code block
inside it (it is actually a macro; think of the block as an Obj-C "block") and
execute it on a QEMU worker thread. So an arbitrary background thread would
then call the fs driver functions:

    s->ops->mknod()
    v9fs_name_to_path()
    s->ops->lstat()

and then at the end of the code block the background thread would dispatch
back to the QEMU main thread. So when we reach:

    v9fs_path_unlock(s);

we are already back on the QEMU main thread, hence unlocking happens on the
main thread, and we finally leave function v9fs_co_mknod().

The important thing to understand is, while that

    v9fs_co_run_in_worker({...})

code block is executed on a QEMU worker thread, the QEMU main thread (9p
server controller portion, i.e. 9p.c) is *not* sleeping; rather, it continues
to process other client requests (if any) in the meantime. In other words,
v9fs_co_run_in_worker() behaves neither exactly like Apple's GCD
dispatch_async() nor like dispatch_sync(), as GCD is not coroutine based.

So the 9p server might pull a pending 'Topen' client request from the input
FIFO in the meantime and likewise dispatch that to a worker thread, etc. Hence
a concurrent open() might in theory be possible, but I find it quite unlikely
to succeed in practice, as the open() call on the guest is translated by the
Linux client into a bunch of synchronous 9p requests on the path passed to
open(), and a round trip for each 9p message takes something on the order of
~0.3 ms. That is huge compared to the time window I would expect between
bind() ... chmod().

Does this answer your questions?

> Regards,
> Akihiko Odaki
> 
> >>> +out:
> >>> +    close(fd);
> >>> +    return err;
> >>> +}
> >>> +
> >>> 
> >>>    int qemu_mknodat(int dirfd, const char *filename, mode_t mode, dev_t
> >>>    dev)
> >>>    {
> >>>    
> >>>        int preserved_errno, err;
> >>> 
> >>> @@ -93,7 +114,11 @@ int qemu_mknodat(int dirfd, const char *filename, mode_t mode, dev_t dev)
> >>> 
> >>>        if (pthread_fchdir_np(dirfd) < 0) {
> >>>        
> >>>            return -1;
> >>>        
> >>>        }
> >>> 
> >>> -    err = mknod(filename, mode, dev);
> >>> +    if (S_ISSOCK(mode)) {
> >>> +        err = create_socket_file_at_cwd(filename, mode);
> >>> +    } else {
> >>> +        err = mknod(filename, mode, dev);
> >>> +    }
> >>> 
> >>>        preserved_errno = errno;
> >>>        /* Stop using the thread-local cwd */
> >>>        pthread_fchdir_np(-1);
