* [RFC] UAF in acrn_irqfd_assign() and vfio_virqfd_enable()

From: Al Viro <viro@zeniv.linux.org.uk> @ 2024-06-10  5:12 UTC
To: Fei Li, Alex Williamson
Cc: kvm, linux-fsdevel

In acrn_irqfd_assign():

	irqfd = kzalloc(sizeof(*irqfd), GFP_KERNEL);
	...
	set it up
	...
	mutex_lock(&vm->irqfds_lock);
	list_for_each_entry(tmp, &vm->irqfds, list) {
		if (irqfd->eventfd != tmp->eventfd)
			continue;
		ret = -EBUSY;
		mutex_unlock(&vm->irqfds_lock);
		goto fail;
	}
	list_add_tail(&irqfd->list, &vm->irqfds);
	mutex_unlock(&vm->irqfds_lock);

Now irqfd is visible in vm->irqfds.

	/* Check the pending event in this stage */
	events = vfs_poll(f.file, &irqfd->pt);

	if (events & EPOLLIN)
		acrn_irqfd_inject(irqfd);

OTOH, in

	static int acrn_irqfd_deassign(struct acrn_vm *vm,
				       struct acrn_irqfd *args)
	{
		struct hsm_irqfd *irqfd, *tmp;
		struct eventfd_ctx *eventfd;

		eventfd = eventfd_ctx_fdget(args->fd);
		if (IS_ERR(eventfd))
			return PTR_ERR(eventfd);

		mutex_lock(&vm->irqfds_lock);
		list_for_each_entry_safe(irqfd, tmp, &vm->irqfds, list) {
			if (irqfd->eventfd == eventfd) {
				hsm_irqfd_shutdown(irqfd);

and

	static void hsm_irqfd_shutdown(struct hsm_irqfd *irqfd)
	{
		u64 cnt;

		lockdep_assert_held(&irqfd->vm->irqfds_lock);

		/* remove from wait queue */
		list_del_init(&irqfd->list);
		eventfd_ctx_remove_wait_queue(irqfd->eventfd, &irqfd->wait, &cnt);
		eventfd_ctx_put(irqfd->eventfd);
		kfree(irqfd);
	}

Both acrn_irqfd_assign() and acrn_irqfd_deassign() are callable via
ioctl(2), with no serialization whatsoever.  Suppose deassign hits
as soon as we'd inserted the damn thing into the list.  By the time
we call vfs_poll(), irqfd might have been freed.

The same can happen if hsm_irqfd_wakeup() gets called with EPOLLHUP
as a key (incidentally, it ought to do
	__poll_t poll_bits = key_to_poll(key);
instead of
	unsigned long poll_bits = (unsigned long)key;
and check for EPOLLIN and EPOLLHUP instead of POLLIN and POLLHUP).

AFAICS, that's a UAF...

We could move vfs_poll() under vm->irqfds_lock, but that smells
like asking for deadlocks ;-/

vfio_virqfd_enable() has the same problem, except that there we
definitely can't move vfs_poll() under the lock - it's a spinlock.

Could we move vfs_poll() + inject to _before_ making the thing
public?  We'd need to delay POLLHUP handling there, but then we
need it only until the moment we do inject anyway.  Something like
replacing
	if (!list_empty(&irqfd->list))
		hsm_irqfd_shutdown(irqfd);
in hsm_irqfd_shutdown_work() with
	if (!list_empty(&irqfd->list))
		hsm_irqfd_shutdown(irqfd);
	else
		irqfd->need_shutdown = true;
and doing
	if (unlikely(irqfd->need_shutdown))
		hsm_irqfd_shutdown(irqfd);
	else
		list_add_tail(&irqfd->list, &vm->irqfds);
when the sucker is made visible.

I'm *not* familiar with the area, though, so that might be unfeasible
for any number of reasons.

Suggestions?
* Re: [RFC] UAF in acrn_irqfd_assign() and vfio_virqfd_enable()

From: Al Viro <viro@zeniv.linux.org.uk> @ 2024-06-10 17:03 UTC
To: Fei Li, Alex Williamson
Cc: kvm, linux-fsdevel

On Mon, Jun 10, 2024 at 06:12:06AM +0100, Al Viro wrote:

> vfio_virqfd_enable() has the same problem, except that there we
> definitely can't move vfs_poll() under the lock - it's a spinlock.
>
> Could we move vfs_poll() + inject to _before_ making the thing
> public?  We'd need to delay POLLHUP handling there, but then we
> need it only until the moment we do inject anyway.  Something
> like replacing
> 	if (!list_empty(&irqfd->list))
> 		hsm_irqfd_shutdown(irqfd);
> in hsm_irqfd_shutdown_work() with
> 	if (!list_empty(&irqfd->list))
> 		hsm_irqfd_shutdown(irqfd);
> 	else
> 		irqfd->need_shutdown = true;
> and doing
> 	if (unlikely(irqfd->need_shutdown))
> 		hsm_irqfd_shutdown(irqfd);
> 	else
> 		list_add_tail(&irqfd->list, &vm->irqfds);
> when the sucker is made visible.
>
> I'm *not* familiar with the area, though, so that might be unfeasible
> for any number of reasons.

Hmm...  OK, so we rely upon EPOLLHUP being generated only upon the
final close of the eventfd file.  And vfio seems to have an exclusion
in all callers of vfio_virqfd_{en,dis}able(), which ought to be enough.

For drivers/virt/acrn/irqfd.c EPOLLHUP is not a problem for the same
reasons, but there's no exclusion between acrn_irqfd_assign() and
acrn_irqfd_deassign() calls.  So the scenario with explicit deassign
racing with assign and leading to vfs_poll(file, <freed memory>) is
possible.

And it looks like drivers/xen/privcmd.c:privcmd_irqfd_assign() has a
similar problem...

How about the following for the acrn side of things?  Does anybody see
a problem with that "do vfs_poll() before making the thing visible"
approach?

diff --git a/drivers/virt/acrn/irqfd.c b/drivers/virt/acrn/irqfd.c
index d4ad211dce7a..71c431506a9b 100644
--- a/drivers/virt/acrn/irqfd.c
+++ b/drivers/virt/acrn/irqfd.c
@@ -133,7 +133,7 @@ static int acrn_irqfd_assign(struct acrn_vm *vm, struct acrn_irqfd *args)
 	eventfd = eventfd_ctx_fileget(f.file);
 	if (IS_ERR(eventfd)) {
 		ret = PTR_ERR(eventfd);
-		goto fail;
+		goto out_file;
 	}
 
 	irqfd->eventfd = eventfd;
@@ -145,29 +145,26 @@ static int acrn_irqfd_assign(struct acrn_vm *vm, struct acrn_irqfd *args)
 	init_waitqueue_func_entry(&irqfd->wait, hsm_irqfd_wakeup);
 	init_poll_funcptr(&irqfd->pt, hsm_irqfd_poll_func);
 
+	/* Check the pending event in this stage */
+	events = vfs_poll(f.file, &irqfd->pt);
+
+	if (events & EPOLLIN)
+		acrn_irqfd_inject(irqfd);
+
 	mutex_lock(&vm->irqfds_lock);
 	list_for_each_entry(tmp, &vm->irqfds, list) {
 		if (irqfd->eventfd != tmp->eventfd)
 			continue;
-		ret = -EBUSY;
+		hsm_irqfd_shutdown(irqfd);
 		mutex_unlock(&vm->irqfds_lock);
-		goto fail;
+		irqfd = NULL;	// consumed by hsm_irqfd_shutdown()
+		ret = -EBUSY;
+		goto out_file;
 	}
 	list_add_tail(&irqfd->list, &vm->irqfds);
+	irqfd = NULL;	// not for us to free...
 	mutex_unlock(&vm->irqfds_lock);
-
-	/* Check the pending event in this stage */
-	events = vfs_poll(f.file, &irqfd->pt);
-
-	if (events & EPOLLIN)
-		acrn_irqfd_inject(irqfd);
-
-	fdput(f);
-	return 0;
-fail:
-	if (eventfd && !IS_ERR(eventfd))
-		eventfd_ctx_put(eventfd);
-
+out_file:
 	fdput(f);
 out:
 	kfree(irqfd);
* Re: [RFC] UAF in acrn_irqfd_assign() and vfio_virqfd_enable()

From: Alex Williamson @ 2024-06-10 20:09 UTC
To: Al Viro
Cc: Fei Li, kvm, linux-fsdevel

On Mon, 10 Jun 2024 06:12:06 +0100
Al Viro <viro@zeniv.linux.org.uk> wrote:

> In acrn_irqfd_assign():
> 	irqfd = kzalloc(sizeof(*irqfd), GFP_KERNEL);
> 	...
> 	set it up
> 	...
> 	mutex_lock(&vm->irqfds_lock);
> 	list_for_each_entry(tmp, &vm->irqfds, list) {
> 		if (irqfd->eventfd != tmp->eventfd)
> 			continue;
> 		ret = -EBUSY;
> 		mutex_unlock(&vm->irqfds_lock);
> 		goto fail;
> 	}
> 	list_add_tail(&irqfd->list, &vm->irqfds);
> 	mutex_unlock(&vm->irqfds_lock);
> Now irqfd is visible in vm->irqfds.
>
> 	/* Check the pending event in this stage */
> 	events = vfs_poll(f.file, &irqfd->pt);
>
> 	if (events & EPOLLIN)
> 		acrn_irqfd_inject(irqfd);
>
> OTOH, in
>
> static int acrn_irqfd_deassign(struct acrn_vm *vm,
> 			       struct acrn_irqfd *args)
> {
> 	struct hsm_irqfd *irqfd, *tmp;
> 	struct eventfd_ctx *eventfd;
>
> 	eventfd = eventfd_ctx_fdget(args->fd);
> 	if (IS_ERR(eventfd))
> 		return PTR_ERR(eventfd);
>
> 	mutex_lock(&vm->irqfds_lock);
> 	list_for_each_entry_safe(irqfd, tmp, &vm->irqfds, list) {
> 		if (irqfd->eventfd == eventfd) {
> 			hsm_irqfd_shutdown(irqfd);
>
> and
>
> static void hsm_irqfd_shutdown(struct hsm_irqfd *irqfd)
> {
> 	u64 cnt;
>
> 	lockdep_assert_held(&irqfd->vm->irqfds_lock);
>
> 	/* remove from wait queue */
> 	list_del_init(&irqfd->list);
> 	eventfd_ctx_remove_wait_queue(irqfd->eventfd, &irqfd->wait, &cnt);
> 	eventfd_ctx_put(irqfd->eventfd);
> 	kfree(irqfd);
> }
>
> Both acrn_irqfd_assign() and acrn_irqfd_deassign() are callable via
> ioctl(2), with no serialization whatsoever.  Suppose deassign hits
> as soon as we'd inserted the damn thing into the list.  By the time
> we call vfs_poll(), irqfd might have been freed.
>
> The same can happen if hsm_irqfd_wakeup() gets called with EPOLLHUP
> as a key (incidentally, it ought to do
> 	__poll_t poll_bits = key_to_poll(key);
> instead of
> 	unsigned long poll_bits = (unsigned long)key;
> and check for EPOLLIN and EPOLLHUP instead of POLLIN and POLLHUP).
>
> AFAICS, that's a UAF...
>
> We could move vfs_poll() under vm->irqfds_lock, but that smells
> like asking for deadlocks ;-/
>
> vfio_virqfd_enable() has the same problem, except that there we
> definitely can't move vfs_poll() under the lock - it's a spinlock.

vfio_virqfd_enable() and vfio_virqfd_disable() are serialized by their
callers; I don't see that they have a UAF problem.  Thanks,

Alex

> Could we move vfs_poll() + inject to _before_ making the thing
> public?  We'd need to delay POLLHUP handling there, but then we
> need it only until the moment we do inject anyway.  Something like
> replacing
> 	if (!list_empty(&irqfd->list))
> 		hsm_irqfd_shutdown(irqfd);
> in hsm_irqfd_shutdown_work() with
> 	if (!list_empty(&irqfd->list))
> 		hsm_irqfd_shutdown(irqfd);
> 	else
> 		irqfd->need_shutdown = true;
> and doing
> 	if (unlikely(irqfd->need_shutdown))
> 		hsm_irqfd_shutdown(irqfd);
> 	else
> 		list_add_tail(&irqfd->list, &vm->irqfds);
> when the sucker is made visible.
>
> I'm *not* familiar with the area, though, so that might be unfeasible
> for any number of reasons.
>
> Suggestions?
* Re: [RFC] UAF in acrn_irqfd_assign() and vfio_virqfd_enable()

From: Al Viro <viro@zeniv.linux.org.uk> @ 2024-06-10 20:53 UTC
To: Alex Williamson
Cc: Fei Li, kvm, linux-fsdevel

On Mon, Jun 10, 2024 at 02:09:06PM -0600, Alex Williamson wrote:

> > We could move vfs_poll() under vm->irqfds_lock, but that smells
> > like asking for deadlocks ;-/
> >
> > vfio_virqfd_enable() has the same problem, except that there we
> > definitely can't move vfs_poll() under the lock - it's a spinlock.
>
> vfio_virqfd_enable() and vfio_virqfd_disable() are serialized by their
> callers, I don't see that they have a UAF problem.  Thanks,
>
> Alex

Umm...  I agree that there's no UAF on the vfio side; the acrn and
xen/privcmd counterparts, OTOH, look like they do have that...

OK, so the memory safety in there depends upon
	* external exclusion wrt vfio_virqfd_disable() on caller-specific
locks (vfio_pci_core_device::ioeventfds_lock for vfio_pci_rdwr.c,
vfio_pci_core_device::igate for the rest?  What about the path via
vfio_pci_core_disable()?)
	* no EPOLLHUP on eventfd while the file is pinned.  That's what
	/*
	 * Do not drop the file until the irqfd is fully initialized,
	 * otherwise we might race against the EPOLLHUP.
	 */
in there (that "irqfd" is a typo for "kirqfd", right?) refers to.
* Re: [RFC] UAF in acrn_irqfd_assign() and vfio_virqfd_enable()

From: Alex Williamson @ 2024-06-11 23:04 UTC
To: Al Viro
Cc: Fei Li, kvm, linux-fsdevel

On Mon, 10 Jun 2024 21:53:05 +0100
Al Viro <viro@zeniv.linux.org.uk> wrote:

> On Mon, Jun 10, 2024 at 02:09:06PM -0600, Alex Williamson wrote:
> > > We could move vfs_poll() under vm->irqfds_lock, but that smells
> > > like asking for deadlocks ;-/
> > >
> > > vfio_virqfd_enable() has the same problem, except that there we
> > > definitely can't move vfs_poll() under the lock - it's a spinlock.
> >
> > vfio_virqfd_enable() and vfio_virqfd_disable() are serialized by their
> > callers, I don't see that they have a UAF problem.  Thanks,
> >
> > Alex
>
> Umm...  I agree that there's no UAF on the vfio side; the acrn and
> xen/privcmd counterparts, OTOH, look like they do have that...
>
> OK, so the memory safety in there depends upon
> 	* external exclusion wrt vfio_virqfd_disable() on caller-specific
> locks (vfio_pci_core_device::ioeventfds_lock for vfio_pci_rdwr.c,
> vfio_pci_core_device::igate for the rest?  What about the path via
> vfio_pci_core_disable()?)

This is only called when the device is closed, therefore there's no
userspace access to generate a race.

> 	* no EPOLLHUP on eventfd while the file is pinned.  That's what
> 	/*
> 	 * Do not drop the file until the irqfd is fully initialized,
> 	 * otherwise we might race against the EPOLLHUP.
> 	 */
> in there (that "irqfd" is a typo for "kirqfd", right?) refers to.

Sorry, I'm not fully grasping your comment.  "irqfd" is not a typo
here; "kirqfd" seems to be a Xen thing.  I believe the comment is
referring to holding a reference to the fd until everything is in
place to clean up correctly if the user process is killed mid-setup.
Thanks,

Alex
* Re: [RFC] UAF in acrn_irqfd_assign() and vfio_virqfd_enable()

From: Al Viro <viro@zeniv.linux.org.uk> @ 2024-06-12  2:16 UTC
To: Alex Williamson
Cc: Fei Li, kvm, linux-fsdevel

On Tue, Jun 11, 2024 at 05:04:38PM -0600, Alex Williamson wrote:

> > OK, so the memory safety in there depends upon
> > 	* external exclusion wrt vfio_virqfd_disable() on caller-specific
> > locks (vfio_pci_core_device::ioeventfds_lock for vfio_pci_rdwr.c,
> > vfio_pci_core_device::igate for the rest?  What about the path via
> > vfio_pci_core_disable()?)
>
> This is only called when the device is closed, therefore there's no
> userspace access to generate a race.

Umm...  Let's see if I got confused in RTFS:

1. Calls of vfio_pci_core_disable() come from assorted ->close_device()
instances and failure exits in ->open_device() ones.

2. ->open_device() is called by vfio_df_device_first_open() from
vfio_df_open().  That's done under device->dev_set->lock.

3. ->close_device() is called by vfio_df_device_last_close() from
vfio_df_close(), under the same lock.

4. vfio_df_open() comes from vfio_df_ioctl_bind_iommufd() or from
vfio_df_group_open().  vfio_df_close() is done by the failure exits in
those two, as well as by vfio_df_unbind_iommufd() and
vfio_df_group_close().

5. vfio_df_bind_iommufd() handles VFIO_DEVICE_BIND_IOMMUFD in
vfio_device_fops->unlocked_ioctl(); it only works for !df->group and
only once, unless I'm misreading vfio_df_open().  No other ioctls are
allowed until that's done, and vfio_df_unbind_iommufd() is done in
->release(), in case of !df->group.  vfio_df_close() is done there, in
case we had a successful BIND_IOMMUFD done at some point.  Multiple
files can be opened for the same device; once one of them has done
BIND_IOMMUFD, BIND_IOMMUFD on any of them will fail until the first
caller gets closed.  Once that happens, others can get BIND_IOMMUFD;
until then ioctls don't work for them at all (IOW, BIND_IOMMUFD grants
the ability to do ioctls only for the opened file it had been done on).

6. vfio_df_group_open() and vfio_df_close() are about the other way to
get files with such ->f_op - VFIO_GROUP_GET_DEVICE_FD handling in
vfio_group_fops.unlocked_ioctl().  That gets an anon-inode file with
vfio_device_fops and shoves it into the descriptor table.  Presumably
vfio_device_get_from_name() always returns a device with non-NULL
device->group (it had better, or things would get really confused).
vfio_df_group_open() is done first, with vfio_df_group_close() on the
failure exit *and* in ->release() of those suckers (again, assuming we
do get non-NULL ->group).

OK, that seems to be enough - anything done in ->ioctl() would be
completed between vfio_df_open() and vfio_df_close(), so we do have
the exclusion.  Is the above correct?

FWIW, this was a confusing bit:

	vfio_pci_core_disable(vdev);

	mutex_lock(&vdev->igate);
	if (vdev->err_trigger) {
		eventfd_ctx_put(vdev->err_trigger);
		vdev->err_trigger = NULL;
	}
	if (vdev->req_trigger) {
		eventfd_ctx_put(vdev->req_trigger);
		vdev->req_trigger = NULL;
	}
	mutex_unlock(&vdev->igate);

in vfio_pci_core_close_device().  Since we need an exclusion on
->igate there for something, and since it's one of the locks used to
serialize vfio_virqfd_enable()...

> > 	* no EPOLLHUP on eventfd while the file is pinned.  That's what
> > 	/*
> > 	 * Do not drop the file until the irqfd is fully initialized,
> > 	 * otherwise we might race against the EPOLLHUP.
> > 	 */
> > in there (that "irqfd" is a typo for "kirqfd", right?) refers to.
>
> Sorry, I'm not fully grasping your comment.  "irqfd" is not a typo
> here, "kirqfd" seems to be a Xen thing.  I believe the comment is
> referring to holding a reference to the fd until everything is in place
> to cleanup correctly if the user process is killed mid-setup.  Thanks,

*blink*

s/kirqfd/virqfd/, sorry.  In the comment earlier in the same function:

	 * virqfds can be released by closing the eventfd or directly
	   ^^^^^^^
	 * through ioctl.  These are both done through a workqueue, so
	 * we update the pointer to the virqfd under lock to avoid
	                             ^^^ ^^^^^^
	 * pushing multiple jobs to release the same virqfd.
	                            ^^^ ^^^^ ^^^^^^

In the second comment you have

	 * Do not drop the file until the irqfd is fully initialized,
	                              ^^^ ^^^^^
	 * otherwise we might race against the EPOLLHUP.

If that refers to the same object, the comment makes sense - once
you've called vfs_poll(), an EPOLLHUP wakeup would have your new
instance of struct virqfd freed, so accessing it (e.g. in
schedule_work(&virqfd->inject);) is only safe because EPOLLHUP won't
come until eventfd_release(), which will not happen as long as you
don't drop the file reference that sits in irqfd.file.  That's the
reason why the struct fd instance can't be released until you are done
with setting the struct virqfd instance up.  If that reading is what
you intended, "irqfd" in the second comment ought to be "virqfd", to
be consistent with the reference to the same thing in the earlier
comment.

If that's not what you meant...  what is that comment really about?
Killing the user process mid-setup won't actually do anything until
your thread is out of whatever syscall it had been in (ioctl(2),
usually); the dangerous scenario would be having another thread close
the same descriptor after you've done fdput().  The thing you need to
avoid is having all references to the eventfd file dropped.  For that,
the reference in the descriptor table must be gone.  Sure, killing the
process might do that - once all threads get to exit_files() and drop
their references to the descriptor table.  Then put_files_struct()
from the last of them will call close_files() and drop all file
references you had in the table.  But that can happen only when all
threads have gotten through the signal delivery.  Including the one
that was in the middle of vfio_virqfd_enable().  And _that_ won't
happen until return from that function.  So having the caller killed
mid-setup is not an issue.

Another thread sharing the same descriptor table and calling close(fd)
(or dup2(something, fd), or anything else that would close that
descriptor) would be.  _That_ is what is prevented by the
fdget()/fdput() pair - between those we are guaranteed that the file
reference will stay pinned.  If the descriptor table is shared,
fdget() will clone the reference and store it in irqfd.file, so the
file remains pinned no matter what happens to the descriptor table,
until fdput() drops the reference.  If the table is _not_ shared, the
reference in it won't go away until we are done, so we can borrow that
into irqfd.file (and do nothing on fdput()).

Anyway, I believe that what you have there is actually safe.  The
analysis could be less convoluted, but then I might've missed simpler
reasons why everything works.  It really needs comments in there - as
it is, two drivers have copied that scheme without bothering with
exclusion (commit f8941e6c4c71 "xen: privcmd: Add support for irqfd"
last year and commit aa3b483ff1d7 "virt: acrn: Introduce irqfd" three
years ago) with, AFAICT, real UAF in each ;-/
End of thread (newest message: 2024-06-12  2:16 UTC).
Thread overview: 6+ messages
[not found] <20240607015656.GX1629371@ZenIV>
[not found] ` <20240607015957.2372428-1-viro@zeniv.linux.org.uk>
[not found] ` <20240607015957.2372428-11-viro@zeniv.linux.org.uk>
[not found] ` <20240607-gelacht-enkel-06a7c9b31d4e@brauner>
[not found] ` <20240607161043.GZ1629371@ZenIV>
[not found] ` <20240607210814.GC1629371@ZenIV>
2024-06-10 5:12 ` [RFC] UAF in acrn_irqfd_assign() and vfio_virqfd_enable() Al Viro
2024-06-10 17:03 ` Al Viro
2024-06-10 20:09 ` Alex Williamson
2024-06-10 20:53 ` Al Viro
2024-06-11 23:04 ` Alex Williamson
2024-06-12 2:16 ` Al Viro