From: Maxim Levitsky <mlevitsk@redhat.com>
To: qemu-devel@nongnu.org
Cc: "Fam Zheng" <fam@euphon.net>,
"Daniel P. Berrangé" <berrange@redhat.com>,
"Eduardo Habkost" <ehabkost@redhat.com>,
"Michael S. Tsirkin" <mst@redhat.com>,
"Maxim Levitsky" <mlevitsk@redhat.com>,
"Paolo Bonzini" <pbonzini@redhat.com>
Subject: [PATCH 0/4] RFC/WIP: Fix scsi devices plug/unplug races w.r.t virtio-scsi iothread
Date: Thu, 16 Apr 2020 23:36:20 +0300 [thread overview]
Message-ID: <20200416203624.32366-1-mlevitsk@redhat.com> (raw)
Hi!
This is a patch series that is a result of my discussion with Paulo on
how to correctly fix the root cause of the BZ #1812399.
The root cause of this bug is the fact that IO thread is running mostly
unlocked versus main thread on which device hotplug is done.
qdev_device_add first creates the device object, then places it on the bus,
and only then realizes it.
However some drivers and currently only virtio-scsi enumerate its child bus
devices on each request that is received from the guest and that can happen on the IO
thread.
Thus we have a window when new device is on the bus but not realized and can be accessed
by the virtio-scsi driver in that state.
Fix that by doing two things:
1. Add partial RCU protection to the list of a bus's child devices.
This allows the scsi IO thread to safely enumerate the child devices
while it races with the hotplug placing the device on the bus.
2. Make the virtio-scsi driver check .realized property of the scsi device
and avoid touching the device if it isn't
I don't think that this is very pretty way to solve this, we discussed this
with Paulo and it kind of looks like the lesser evil. I am open to your thoughts about this.
Note that this patch series doesn't pass some unit tests and in particular qtest 'drive_del-test'
I did some light debug of this test and I see that the reason for this is that now child device deletion
can be delayed due to RCU. This is also something I would like to discuss in this RFC.
Note also that I might have some code style errors and bugs in this since I haven't
tested the code in depth yet, because I am not yet sure that this is the right way
to fix that bug
Also note that in the particular bug report the issue wasn't a race but rather due
to combination of things, the .realize code in the middle managed to trigger IO on the virtqueue
which caused the virtio-scsi driver to access the half realized device. However
since this can happen as well with real IO thread, this patch series was done,
which fixes this as well.
Best regards,
Maxim Levitsky
Maxim Levitsky (4):
scsi/scsi_bus: switch search direction in scsi_device_find
device-core: use RCU for list of childs of a bus
device-core: use atomic_set on .realized property
virtio-scsi: don't touch scsi devices that are not yet realized
hw/core/bus.c | 43 ++++++++++++++++++++----------
hw/core/qdev.c | 48 ++++++++++++++++++++++------------
hw/scsi/scsi-bus.c | 27 ++++++++++++++++---
hw/scsi/virtio-scsi.c | 24 +++++++++++++++--
include/hw/qdev-core.h | 3 +++
include/hw/virtio/virtio-bus.h | 7 +++--
6 files changed, 114 insertions(+), 38 deletions(-)
--
2.17.2
next reply other threads:[~2020-04-16 20:37 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-04-16 20:36 Maxim Levitsky [this message]
2020-04-16 20:36 ` [PATCH 1/4] scsi/scsi_bus: switch search direction in scsi_device_find Maxim Levitsky
2020-04-16 20:36 ` [PATCH 2/4] device-core: use RCU for list of childs of a bus Maxim Levitsky
2020-05-04 10:41 ` Stefan Hajnoczi
2020-05-11 8:48 ` Maxim Levitsky
2020-04-16 20:36 ` [PATCH 3/4] device-core: use atomic_set on .realized property Maxim Levitsky
2020-05-04 10:45 ` Stefan Hajnoczi
2020-05-04 11:22 ` Paolo Bonzini
2020-05-04 11:36 ` Maxim Levitsky
2020-05-11 11:00 ` Maxim Levitsky
2020-05-11 11:11 ` Paolo Bonzini
2020-05-11 11:14 ` Maxim Levitsky
2020-04-16 20:36 ` [PATCH 4/4] virtio-scsi: don't touch scsi devices that are not yet realized Maxim Levitsky
2020-05-04 11:24 ` Paolo Bonzini
2020-05-11 11:21 ` Maxim Levitsky
2020-04-16 21:33 ` [PATCH 0/4] RFC/WIP: Fix scsi devices plug/unplug races w.r.t virtio-scsi iothread no-reply
2020-04-16 21:35 ` no-reply
2020-04-16 21:47 ` no-reply
2020-05-04 10:59 ` Stefan Hajnoczi
2020-05-04 11:38 ` Paolo Bonzini
2020-05-04 11:43 ` Maxim Levitsky
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200416203624.32366-1-mlevitsk@redhat.com \
--to=mlevitsk@redhat.com \
--cc=berrange@redhat.com \
--cc=ehabkost@redhat.com \
--cc=fam@euphon.net \
--cc=mst@redhat.com \
--cc=pbonzini@redhat.com \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).