* vfio hang when unbinding after using qemu as user + vhost_net
From: Dominique Martinet @ 2016-02-05 12:36 UTC (permalink / raw)
To: kvm
Hi,
I've got a weird hang here when unbinding a device from vfio-pci.
(confirmed still hanging on 4.4.1, but originally stuck with at least
redhat's 3.10.0-327.4.4 which could contain anything so if you'd like me
to test a specific older version for regression please just give me a
tag)
Here's a reproducer for my hardware. I'm binding a mlx4 IB card
(vendor:device id are 15b3:1003, pci address is 0000:90:00.0) to a VM,
starting with vhost net on a freshly created tuntap:
------->8------
modprobe vhost_net
modprobe vfio-pci
# register the model, unbind the card from current driver, bind to
# vfio-pci and give device to qemu
echo "15b3 1003" > /sys/bus/pci/drivers/vfio-pci/new_id
echo 0000:90:00.0 > /sys/bus/pci/devices/0000:90:00.0/driver/unbind
echo 0000:90:00.0 > /sys/bus/pci/drivers/vfio-pci/bind
chown qemu: /dev/vfio/32
# create tuntap, bring it up and open vhost-net fd
ip tuntap add dev tap-hang-0 mode tap user qemu
ip link set tap-hang-0 mtu 9000 up
exec 10<>/dev/vhost-net
# run qemu. This inits the first devices and exits because of
# nonexistent device, so no guest OS is ever involved
su qemu -s/bin/sh -c '/usr/libexec/qemu-kvm --enable-kvm \
-m 16G -smp 24 -device vfio-pci,id=ib0,host=90:00.0 \
-netdev type=tap,id=guest0,ifname=tap-hang-0,script=no,downscript=no,vhost=on,vhostfd=10 \
-device virtio-net-pci,netdev=guest0,mac=52:54:00:ff:17:12 -device nonexistant'
echo 0000:90:00.0 > /sys/bus/pci/drivers/vfio-pci/unbind
------->8------
The last command here hangs, I get this in dmesg:
[ 163.458817] vfio-pci 0000:90:00.0: Relaying device request to user (#0)
[ 255.802784] vfio-pci 0000:90:00.0: Relaying device request to user (#10)
[ 355.805452] vfio-pci 0000:90:00.0: Relaying device request to user (#20)
[ 455.808150] vfio-pci 0000:90:00.0: Relaying device request to user (#30)
[ 555.810916] vfio-pci 0000:90:00.0: Relaying device request to user (#40)
[ 655.813904] vfio-pci 0000:90:00.0: Relaying device request to user (#50)
[ 755.816818] vfio-pci 0000:90:00.0: Relaying device request to user (#60)
On pressing ^C I get:
[ 205.793450] vfio-pci 0000:90:00.0: Device is currently in use, task "bash" (9719) blocked until device is released
Two extra observations:
- this does not happen if I do not add vhost-net (so adding a network
interface without vhost=on will not hang)
- the exact same script running qemu as root will not hang either
I ran qemu with strace to compare the output, and after uniformizing
pointers I do not notice any real difference (no EACCES or similar error
at least), so the difference is probably somewhere in kernel land.
From what I gather we're basically stuck in vfio_del_group_dev()
because of a ref leak or something...
Does that ring a bell for anyone?
Is there anywhere I could tune verbosity to help debug this?
Thanks,
--
Dominique Martinet
* Re: vfio hang when unbinding after using qemu as user + vhost_net
From: Alex Williamson @ 2016-02-05 13:59 UTC (permalink / raw)
To: Dominique Martinet; +Cc: kvm
On Fri, 5 Feb 2016 13:36:05 +0100
Dominique Martinet <asmadeus@codewreck.org> wrote:
> Hi,
>
> I've got a weird hang here when unbinding a device from vfio-pci.
> (confirmed still hanging on 4.4.1, but originally stuck with at least
> redhat's 3.10.0-327.4.4 which could contain anything so if you'd like
> me to test a specific older version for regression please just give
> me a tag)
>
>
> Here's a reproducer for my hardware. I'm binding a mlx4 IB card
> (vendor:device id are 15b3:1003, pci address is 0000:90:00.0) to a VM,
> starting with vhost net on a freshly created tuntap:
>
> ------->8------
> modprobe vhost_net
> modprobe vfio-pci
>
> # register the model, unbind the card from current driver, bind to
> # vfio-pci and give device to qemu
> echo "15b3 1003" > /sys/bus/pci/drivers/vfio-pci/new_id
> echo 0000:90:00.0 > /sys/bus/pci/devices/0000:90:00.0/driver/unbind
> echo 0000:90:00.0 > /sys/bus/pci/drivers/vfio-pci/bind
> chown qemu: /dev/vfio/32
>
> # create tuntap, bring it up and open vhost-net fd
> ip tuntap add dev tap-hang-0 mode tap user qemu
> ip link set tap-hang-0 mtu 9000 up
> exec 10<>/dev/vhost-net
>
> # run qemu. This inits the first devices and exits because of
> # nonexistent device, so no guest OS is ever involved
> su qemu -s/bin/sh -c '/usr/libexec/qemu-kvm --enable-kvm \
> -m 16G -smp 24 -device vfio-pci,id=ib0,host=90:00.0 \
> -netdev type=tap,id=guest0,ifname=tap-hang-0,script=no,downscript=no,vhost=on,vhostfd=10 \
> -device virtio-net-pci,netdev=guest0,mac=52:54:00:ff:17:12 -device nonexistant'
>
> echo 0000:90:00.0 > /sys/bus/pci/drivers/vfio-pci/unbind
> ------->8------
>
> The last command here hangs, I get this in dmesg:
> [ 163.458817] vfio-pci 0000:90:00.0: Relaying device request to user (#0)
> [ 255.802784] vfio-pci 0000:90:00.0: Relaying device request to user (#10)
> [ 355.805452] vfio-pci 0000:90:00.0: Relaying device request to user (#20)
> [ 455.808150] vfio-pci 0000:90:00.0: Relaying device request to user (#30)
> [ 555.810916] vfio-pci 0000:90:00.0: Relaying device request to user (#40)
> [ 655.813904] vfio-pci 0000:90:00.0: Relaying device request to user (#50)
> [ 755.816818] vfio-pci 0000:90:00.0: Relaying device request to user (#60)
>
> On pressing ^C I get:
> [ 205.793450] vfio-pci 0000:90:00.0: Device is currently in use, task "bash" (9719) blocked until device is released
>
>
> Two extra observations:
> - this does not happen if I do not add vhost-net (so adding a network
> interface without vhost=on will not hang)
> - the exact same script running qemu as root will not hang either
>
> I ran qemu with strace to compare the output, and after uniformizing
> pointers I do not notice any real difference (no EACCES or similar
> error at least), so the difference is probably somewhere in kernel
> land.
>
>
> From what I gather we're basically stuck in vfio_del_group_dev()
> because of a ref leak or something...
>
> Does that ring a bell for anyone?
> Is there anywhere I could tune verbosity to help debug this?
I just debugged your case earlier in the week and the bug is with the
test case. When vhost is used it takes a reference to the process mm
(qemu). That reference includes the mmap regions on the vfio device
file. vhost releases those references when the vhostfd file descriptor
is released. So in the scenario you have here, killing qemu doesn't
release the vhostfd file descriptor because it's still opened in the
script. The vfio device is not released because there's still a
reference to the mmap. You've essentially put yourself into a
deadlock. The solution is to close the vhostfd file descriptor in your
test script after launching qemu (exec 10<&-). Then qemu will hold the
last reference to vhostfd and killing qemu will release that file
descriptor and everything is released as intended. I don't believe
standard management tools like libvirt have this problem. Thanks,
Alex
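For reference, a minimal sketch of how closing a descriptor for one
command differs from closing it in the shell itself. A redirection
attached to a single command (echo, for instance) only affects that
command; exec with no command applies it to the shell. The sketch uses
a temporary file as a stand-in for /dev/vhost-net and fd 9 instead of
10, so that it also runs under shells that only accept single-digit
descriptor numbers. fd_is_open is a hypothetical helper, not part of
any tool mentioned in this thread.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if descriptor $1 is open in this shell.
# The subshell keeps a failed redirection from aborting the script.
fd_is_open() { ( : >&"$1" ) 2>/dev/null; }

tmp=$(mktemp)        # stand-in for /dev/vhost-net (assumption)
exec 9<>"$tmp"       # the shell now owns fd 9, like "exec 10<>/dev/vhost-net"

# Closes fd 9 only for this one echo; the shell keeps its own copy.
echo 9<&- >/dev/null
if fd_is_open 9; then after_echo=open; else after_echo=closed; fi

# Closes fd 9 in the shell itself.
exec 9<&-
if fd_is_open 9; then after_exec=open; else after_exec=closed; fi

rm -f "$tmp"
echo "after 'echo 9<&-': $after_echo"
echo "after 'exec 9<&-': $after_exec"
```

In the reproducer, the equivalent step would be "exec 10<&-" placed
before the final unbind.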
* Re: vfio hang when unbinding after using qemu as user + vhost_net
From: Dominique Martinet @ 2016-02-05 15:48 UTC (permalink / raw)
To: Alex Williamson; +Cc: kvm
Alex Williamson wrote on Fri, Feb 05, 2016:
> I just debugged your case earlier in the week and the bug is with the
> test case.
Thank you for the extra information and sorry for double work.
Getting technical information through support is hard...
> When vhost is used it takes a reference to the process mm (qemu).
> That reference includes the mmap regions on the vfio device file.
> vhost releases those references when the vhostfd file descriptor
> is released.
> So in the scenario you have here, killing qemu doesn't release the
> vhostfd file descriptor because it's still opened in the script. The
> vfio device is not released because there's still a reference to the
> mmap. You've essentially put yourself into a deadlock.
> The solution is to close the vhostfd file descriptor in your test
> script after launching qemu (exec 10<&-). Then qemu will hold the
> last reference to vhostfd and killing qemu will release that file
> descriptor and everything is released as intended.
doh, I was sure I also had the hang when giving /dev/vhost-net to qemu
and letting it open the fd so I wasn't looking at it at all, but you've
got it.
(I think I didn't have the hang as root because I didn't bother with
vhostfd then either, the devil is in the details...)
Closing the fd even after qemu has stopped frees the resource and lets
me unbind, so for now I will just make sure everything vhost-net-related
is closed before tearing down the vfio side.
Really not an obvious lock at first glance though, not sure how this
could be 'fixed' now you've explained it so I'll just let you guys
decide how to handle it. This has been very helpful.
> I don't believe standard management tools like libvirt have this
> problem.
I'm pretty sure this would have been pointed out ages ago if libvirt had
the problem :)
Thank you for your time,
--
Dominique Martinet
* Re: vfio hang when unbinding after using qemu as user + vhost_net
From: Alex Williamson @ 2016-02-05 16:31 UTC (permalink / raw)
To: Dominique Martinet; +Cc: kvm
On Fri, 5 Feb 2016 16:48:07 +0100
Dominique Martinet <asmadeus@codewreck.org> wrote:
> Alex Williamson wrote on Fri, Feb 05, 2016:
> > I just debugged your case earlier in the week and the bug is with
> > the test case.
>
> Thank you for the extra information and sorry for double work.
> Getting technical information through support is hard...
>
> > When vhost is used it takes a reference to the process mm (qemu).
> > That reference includes the mmap regions on the vfio device file.
> > vhost releases those references when the vhostfd file descriptor
> > is released.
> > So in the scenario you have here, killing qemu doesn't release the
> > vhostfd file descriptor because it's still opened in the script.
> > The vfio device is not released because there's still a reference
> > to the mmap. You've essentially put yourself into a deadlock.
> > The solution is to close the vhostfd file descriptor in your test
> > script after launching qemu (exec 10<&-). Then qemu will hold the
> > last reference to vhostfd and killing qemu will release that file
> > descriptor and everything is released as intended.
>
> doh, I was sure I also had the hang when giving /dev/vhost-net to qemu
> and letting it open the fd so I wasn't looking at it at all, but
> you've got it.
> (I think I didn't have the hang as root because I didn't bother with
> vhostfd then either, the devil is in the details...)
>
> Closing the fd even after qemu has stopped frees the resource and lets
> me unbind, so for now I will just make sure everything
> vhost-net-related is closed before tearing down the vfio side.
>
>
> Really not an obvious lock at first glance though, not sure how this
> could be 'fixed' now you've explained it so I'll just let you guys
> decide how to handle it. This has been very helpful.
>
>
> > I don't believe standard management tools like libvirt have this
> > problem.
>
> I'm pretty sure this would have been pointed out ages ago if libvirt
> had the problem :)
>
> Thank you for your time,
Yes, it's unfortunate that management tools need to be aware of these
sorts of semantics but it's somewhat fundamental in the mechanism of
providing access through file descriptors. When QEMU is killed,
there's no opportunity for cleanup, so all of the kernel interfaces
need to do this automatically, but the trigger for that is when the
release callbacks for the open files get called. Therefore the owner
of that file, the shell in the example case, needs to not only give
QEMU access, but let it hold the only outstanding references.
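
One way to let the launched process hold the only reference, sketched
under assumptions: attach the redirection to the command that starts it
instead of opening the descriptor in the shell with exec. The shell then
never keeps a copy of its own. As before, a temporary file stands in for
/dev/vhost-net, "sh -c" stands in for qemu, fd 9 replaces fd 10, and
fd_is_open is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if descriptor $1 is open in this shell.
fd_is_open() { ( : >&"$1" ) 2>/dev/null; }

tmp=$(mktemp)        # stand-in for /dev/vhost-net (assumption)

# The redirection belongs to this one command, so only the child
# process receives fd 9. "sh -c" plays the role of qemu here.
child_saw=$(sh -c 'if ( : >&9 ) 2>/dev/null; then echo open; else echo closed; fi' 9<>"$tmp")

# The launching shell never held fd 9, so nothing is left to close.
if fd_is_open 9; then parent_has=open; else parent_has=closed; fi

rm -f "$tmp"
echo "child saw fd 9:  $child_saw"
echo "parent has fd 9: $parent_has"
```

Applied to the reproducer, this would mean dropping the
"exec 10<>/dev/vhost-net" line and appending "10<>/dev/vhost-net" to the
"su qemu ..." command. That is an untested suggestion, not something
confirmed in this thread.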
Another unique feature of your test case is that the unbind is called
from the same shell that owns that same open vhost file descriptor. The
unbind of course blocks until the device is unused because there is no
opportunity for a -EBUSY return through that path in the kernel. If
the unbind is executed from a separate shell, then we don't have that
interaction, qemu can be killed, which doesn't immediately release the
vhostfd, but the shell is not blocked and can exit and everything
releases as expected again.
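
When it is not obvious which process still holds the descriptor, walking
/proc can show it. A rough sketch, assuming Linux with /proc mounted and
a readlink that supports -f; holders_of is a hypothetical helper, and
without root it only sees processes owned by the current user. The demo
part uses a temporary file and fd 9 rather than the real
/dev/vhost-net on fd 10.

```shell
#!/bin/sh
# Hypothetical helper: print the PIDs that have a descriptor open on path $1.
holders_of() {
    target=$1
    for link in /proc/[0-9]*/fd/*; do
        # Unreadable or vanished entries make readlink fail; skip them.
        [ "$(readlink "$link" 2>/dev/null)" = "$target" ] || continue
        pid=${link#/proc/}
        echo "${pid%%/*}"
    done | sort -un
}

# Demo: a temporary file stands in for /dev/vhost-net (assumption).
tmp=$(readlink -f "$(mktemp)")
exec 9<>"$tmp"                 # this shell now holds the file open
pids=$(holders_of "$tmp")      # should list at least this shell
exec 9<&-
rm -f "$tmp"
echo "holders: $pids"
```

For the case in this thread the call would be
"holders_of /dev/vhost-net", run as root from a second shell while the
unbind is blocked.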
Anyway, I appreciate your concise test case, I wasn't aware of that
interaction either, but it was an interesting problem to investigate.
Thanks,
Alex