qemu-devel.nongnu.org archive mirror
* [PATCH RFC 0/2] Add debug interface to kick/call on purpose
@ 2021-01-15  0:27 Dongli Zhang
From: Dongli Zhang @ 2021-01-15  0:27 UTC
  To: qemu-devel
  Cc: berrange, ehabkost, mst, joe.jin, armbru, dgilbert, pbonzini,
	joao.m.martins

A virtio device/driver (e.g., vhost-scsi, and indeed any device including
e1000e) may hang due to the loss of an IRQ or the loss of a doorbell
register kick, e.g.,

https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg01711.html

In the thread linked above, virtio-net hung because the 'kick' did not
take effect (it was missed).

This RFC adds a new debug interface 'DeviceEvent' to DeviceClass to help
narrow down whether an issue is due to a lost irq/kick. So far the new
interface handles only two events: 'call' and 'kick'. Any device (e.g.,
e1000e or vhost-scsi) may implement the interface (e.g., via eventfd, MSI-X
or legacy IRQ).

The 'call' event lets the admin deliberately inject an IRQ for a specific
device (e.g., vhost-scsi) from QEMU/host into the VM, while the 'kick' event
lets the admin deliberately kick the doorbell for a specific device from the
QEMU/host side.

This interface can also be used as a workaround if a call/kick is lost due
to a bug in the virtualization software (e.g., kernel or QEMU).
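
For queues backed by an eventfd, a kick or call boils down to adding to a
64-bit counter. The following is a minimal sketch of that counter behaviour
on Linux, not code from this series: the helper names (make_eventfd, signal,
drain) are hypothetical, and the starting value 30 is only borrowed from the
crash dump further down.

```python
import ctypes
import os
import struct

# On Linux, EFD_NONBLOCK has the same value as O_NONBLOCK.
EFD_NONBLOCK = os.O_NONBLOCK


def make_eventfd(initval=0):
    """Create an eventfd through libc and return its file descriptor."""
    libc = ctypes.CDLL(None, use_errno=True)
    fd = libc.eventfd(ctypes.c_uint(initval), ctypes.c_int(EFD_NONBLOCK))
    if fd < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return fd


def signal(fd, n=1):
    """Add n to the eventfd counter with a single 8-byte write."""
    os.write(fd, struct.pack("=Q", n))


def drain(fd):
    """Read the counter; without EFD_SEMAPHORE this also resets it to 0."""
    return struct.unpack("=Q", os.read(fd, 8))[0]


fd = make_eventfd(30)  # hypothetical starting count
signal(fd)             # one manually injected event
count = drain(fd)
os.close(fd)
print(count)           # 31
```

Whether a given device path goes through an eventfd at all depends on the
device; MSI-X or legacy IRQ paths would need a different mechanism.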


Below is an example from live crash analysis. Initially, the 'kick'
eventfd_ctx for queue=3 has count=30. Suppose there are entries in the vring
avail ring but none show up in the used ring. We suspect this is because
vhost-scsi was not notified by the VM. To narrow down and analyze the issue,
we use live crash to dump the current eventfd counter for queue=3.

crash> eventfd_ctx ffffa10392537ac0
struct eventfd_ctx {
  kref = {
    refcount = {
      refs = {
        counter = 4
      }
    }
  }, 
  wqh = {
    lock = {
      {
        rlock = {
          raw_lock = {
            {
              val = {
                counter = 0
              }, 
              {
                locked = 0 '\000', 
                pending = 0 '\000'
              }, 
              {
                locked_pending = 0, 
                tail = 0
              }
            }
          }
        }
      }
    }, 
    head = {
      next = 0xffffa104ae40d360, 
      prev = 0xffffa104ae40d360
    }
  }, 
  count = 30,  -----> eventfd is 30 !!! 
  flags = 526336, 
  id = 26
}

Now we deliberately kick the doorbell for vhost-scsi queue=3 with this
interface for diagnostic purposes.

{ "execute": "x-debug-device-event", "arguments": { "dev": "/machine/peripheral/vscsi0", "event": "kick", "queue": 3 } }
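
For scripted use, the same command can be generated and sent from a small
client. This is a sketch under assumptions: build_device_event and
send_over_qmp are hypothetical helper names, the socket path is whatever the
VM was started with, and the command itself exists only with this RFC
applied. A QMP client has to send qmp_capabilities before any other command.

```python
import json
import socket


def build_device_event(dev, event, queue):
    """Return the JSON text for the RFC's x-debug-device-event command."""
    if event not in ("kick", "call"):
        raise ValueError("the RFC defines only 'kick' and 'call'")
    cmd = {
        "execute": "x-debug-device-event",
        "arguments": {"dev": dev, "event": event, "queue": queue},
    }
    return json.dumps(cmd)


def send_over_qmp(sock_path, line):
    """Send one command over a QMP unix socket (sock_path is hypothetical).

    Simplification: asynchronous QMP events arriving in between are not
    filtered out, so the returned object may be an event, not the reply.
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(sock_path)
        f = s.makefile("rw", encoding="utf-8", newline="\n")
        f.readline()                                   # server greeting
        f.write('{"execute": "qmp_capabilities"}\n')   # leave negotiation mode
        f.flush()
        f.readline()                                   # reply to capabilities
        f.write(line + "\n")
        f.flush()
        return json.loads(f.readline())


line = build_device_event("/machine/peripheral/vscsi0", "kick", 3)
print(line)
```

Only build_device_event runs here; send_over_qmp needs a live QEMU with a
QMP socket and is shown for completeness.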


The counter increased to 31. If this resolves the hang, it indicates that a
software bug caused the original 'kick' to be lost.

crash> eventfd_ctx ffffa10392537ac0
struct eventfd_ctx {
  kref = {
    refcount = {
      refs = {
        counter = 4
      }
    }
  },
  wqh = {
    lock = {
      {
        rlock = {
          raw_lock = {
            {
              val = {
                counter = 0
              },
              {
                locked = 0 '\000',
                pending = 0 '\000'
              },
              {
                locked_pending = 0,
                tail = 0
              }
            }
          }
        }
      }
    },
    head = {
      next = 0xffffa104ae40d360,
      prev = 0xffffa104ae40d360
    }
  },
  count = 31,  -----> eventfd incremented to 31 !!!
  flags = 526336,
  id = 26
}
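
As a side note on reading the dump, the flags field can be decoded into
eventfd creation flags. The sketch below assumes the dump comes from an
x86-64 Linux host, where EFD_NONBLOCK and EFD_CLOEXEC share the values of
O_NONBLOCK and O_CLOEXEC; decode_eventfd_flags is a hypothetical helper.

```python
# Linux x86-64 flag values (assumed architecture for this dump).
EFD_FLAGS = {
    "EFD_SEMAPHORE": 0o1,
    "EFD_NONBLOCK": 0o4000,
    "EFD_CLOEXEC": 0o2000000,
}


def decode_eventfd_flags(flags):
    """Return (names of known flags that are set, remaining unknown bits)."""
    names = [name for name, bit in EFD_FLAGS.items() if flags & bit]
    known = sum(EFD_FLAGS.values())
    return names, flags & ~known


names, leftover = decode_eventfd_flags(526336)
print(names, hex(leftover))  # ['EFD_NONBLOCK', 'EFD_CLOEXEC'] 0x0
```

Under that assumption, 526336 is EFD_NONBLOCK | EFD_CLOEXEC with
EFD_SEMAPHORE clear, so a read of this eventfd would return the whole count
and reset it rather than decrement it by one.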


Only the vhost-scsi interface is implemented since this is an RFC. I will
implement it for other types (e.g., eventfd or MSI-X) if the RFC is
reasonable.

Thank you very much!

Dongli Zhang



