From: Dongli Zhang <dongli.zhang@oracle.com>
To: qemu-devel@nongnu.org
Cc: berrange@redhat.com, ehabkost@redhat.com, mst@redhat.com,
joe.jin@oracle.com, armbru@redhat.com, dgilbert@redhat.com,
pbonzini@redhat.com, joao.m.martins@oracle.com
Subject: [PATCH RFC 0/2] Add debug interface to kick/call on purpose
Date: Thu, 14 Jan 2021 16:27:28 -0800
Message-ID: <20210115002730.1279-1-dongli.zhang@oracle.com>

The virtio device/driver (e.g., vhost-scsi), and indeed any device including
e1000e, may hang due to a lost IRQ or a lost doorbell-register kick, e.g.,

https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg01711.html

In the above link, virtio-net was in trouble because the 'kick' did not
take effect (it was missed).

This RFC adds a new debug interface 'DeviceEvent' to DeviceClass to help
determine whether the issue is due to a lost IRQ/kick. So far the new
interface handles only two events: 'call' and 'kick'. Any device (e.g.,
e1000e or vhost-scsi) may implement it (e.g., via eventfd, MSI-X or legacy
IRQ).

The 'call' event lets an admin deliberately inject an IRQ for a specific
device (e.g., vhost-scsi) from QEMU/host into the VM, while the 'kick' event
lets an admin deliberately kick the doorbell of a specific device on the
QEMU/host side.

This interface can also be used as a workaround if a call/kick is lost due
to an issue in the virtualization software (e.g., kernel or QEMU).

Below is from a live crash analysis. Initially, the 'kick' eventfd_ctx of
queue=3 has count=30. Suppose there is data in the vring avail ring while
nothing is available in the used ring. We suspect this is because vhost-scsi
was not notified by the VM. To narrow down and analyze the issue, we use
live crash to dump the current eventfd counter for queue=3.

crash> eventfd_ctx ffffa10392537ac0
struct eventfd_ctx {
  kref = {
    refcount = {
      refs = {
        counter = 4
      }
    }
  },
  wqh = {
    lock = {
      {
        rlock = {
          raw_lock = {
            {
              val = {
                counter = 0
              },
              {
                locked = 0 '\000',
                pending = 0 '\000'
              },
              {
                locked_pending = 0,
                tail = 0
              }
            }
          }
        }
      }
    },
    head = {
      next = 0xffffa104ae40d360,
      prev = 0xffffa104ae40d360
    }
  },
  count = 30, -----> eventfd is 30 !!!
  flags = 526336,
  id = 26
}
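
When comparing dumps taken before and after a manual kick, it can help to
extract the counter programmatically. A small sketch, assuming the crash
output has been saved as text; the function name is hypothetical and the
regular expression relies only on the "count = N," line shown above.

```python
import re

# Matches the eventfd_ctx 'count' member as printed by crash, e.g.
# "  count = 30, -----> eventfd is 30 !!!". The literal "count = " does not
# match the "counter = 4" line of the kref, so no extra anchoring is needed.
_COUNT_RE = re.compile(r"^\s*count = (\d+),", re.MULTILINE)


def eventfd_count(dump: str) -> int:
    """Return the 'count' field of one eventfd_ctx dump."""
    match = _COUNT_RE.search(dump)
    if match is None:
        raise ValueError("no 'count = N,' line found")
    return int(match.group(1))
```

With one dump saved per file, eventfd_count(after) - eventfd_count(before)
gives the number of signals that reached this eventfd_ctx in between.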

Now we deliberately kick the doorbell of vhost-scsi queue=3 with this
interface, for diagnostic purposes.
{ "execute": "x-debug-device-event", "arguments": { "dev": "/machine/peripheral/vscsi0", "event": "kick", "queue": 3 } }
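
The QMP command above can also be sent from a script. Below is a minimal
sketch of a QMP client over a UNIX socket, assuming QEMU was started with a
QMP monitor such as "-qmp unix:/tmp/qmp.sock,server=on,wait=off" (the socket
path is an assumption) and that these RFC patches are applied, since
x-debug-device-event is not an upstream QEMU command. It performs the
standard QMP handshake (read the greeting, send qmp_capabilities) and skips
asynchronous events while waiting for a reply. The helper names are
hypothetical.

```python
import json
import socket


def build_device_event(dev: str, event: str, queue: int) -> dict:
    """Build the x-debug-device-event command proposed by this RFC."""
    return {"execute": "x-debug-device-event",
            "arguments": {"dev": dev, "event": event, "queue": queue}}


class QMPClient:
    """Minimal line-oriented QMP client (sketch: no timeouts or reconnects)."""

    def __init__(self, path: str) -> None:
        self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        self.sock.connect(path)
        self.f = self.sock.makefile("rwb")
        greeting = self._recv()
        if "QMP" not in greeting:
            raise RuntimeError(f"unexpected greeting: {greeting}")
        self.execute({"execute": "qmp_capabilities"})  # leave negotiation mode

    def _recv(self) -> dict:
        line = self.f.readline()  # QMP messages are newline-terminated JSON
        if not line:
            raise ConnectionError("QMP socket closed")
        return json.loads(line)

    def execute(self, cmd: dict):
        self.f.write((json.dumps(cmd) + "\n").encode())
        self.f.flush()
        while True:
            msg = self._recv()
            if "event" in msg:  # asynchronous event, not our reply
                continue
            if "error" in msg:
                raise RuntimeError(msg["error"].get("desc", msg["error"]))
            return msg["return"]

    def close(self) -> None:
        self.f.close()
        self.sock.close()
```

Usage, under the same assumptions:

    qmp = QMPClient("/tmp/qmp.sock")
    qmp.execute(build_device_event("/machine/peripheral/vscsi0", "kick", 3))
    qmp.close()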

The counter increased to 31. If the hang is resolved by this manual kick,
it indicates that a software bug caused the original 'kick' to be lost.

crash> eventfd_ctx ffffa10392537ac0
struct eventfd_ctx {
  kref = {
    refcount = {
      refs = {
        counter = 4
      }
    }
  },
  wqh = {
    lock = {
      {
        rlock = {
          raw_lock = {
            {
              val = {
                counter = 0
              },
              {
                locked = 0 '\000',
                pending = 0 '\000'
              },
              {
                locked_pending = 0,
                tail = 0
              }
            }
          }
        }
      }
    },
    head = {
      next = 0xffffa104ae40d360,
      prev = 0xffffa104ae40d360
    }
  },
  count = 31, -----> eventfd incremented to 31 !!!
  flags = 526336,
  id = 26
}

Since this is an RFC, only the interface for vhost-scsi is implemented. I
will implement it for other types (e.g., eventfd or MSI-X) if the RFC is
considered reasonable.

Thank you very much!

Dongli Zhang

Thread overview: 7+ messages
2021-01-15 0:27 Dongli Zhang [this message]
2021-01-15 0:27 ` [PATCH RFC 1/2] qdev: add debug interface to kick/call eventfd Dongli Zhang
2021-01-19 22:20 ` Eric Blake
2021-01-15 0:27 ` [PATCH RFC 2/2] vhost-scsi: implement DeviceEvent Dongli Zhang
2021-01-15 10:27 ` [PATCH RFC 0/2] Add debug interface to kick/call on purpose Daniel P. Berrangé
2021-01-18 16:59 ` Dr. David Alan Gilbert
2021-01-19 22:11 ` Dongli Zhang