devicetree.vger.kernel.org archive mirror
* [PATCH v9 0/2] Detect stalls on guest vCPUS
@ 2022-07-01 14:40 Sebastian Ene
  2022-07-01 14:40 ` [PATCH v9 1/2] dt-bindings: vcpu_stall_detector: Add qemu,vcpu-stall-detector compatible Sebastian Ene
  2022-07-01 14:40 ` [PATCH v9 2/2] misc: Add a mechanism to detect stalls on guest vCPUs Sebastian Ene
From: Sebastian Ene @ 2022-07-01 14:40 UTC
  To: Rob Herring, Greg Kroah-Hartman, Arnd Bergmann, Dragan Cvetic
  Cc: linux-kernel, devicetree, maz, will, vdonnefort, Guenter Roeck,
	Sebastian Ene

This series adds a mechanism to detect stalls on guest vCPUs by creating a
per-CPU hrtimer which periodically 'pets' the host backend driver.
With a conventional watchdog-core driver, userspace is responsible for
delivering the 'pet' events by writing to the corresponding /dev/watchdogN
node. In this case we require strong thread affinity so that lost time can
be accounted for on a per-vCPU basis.

This device driver acts as a soft lockup detector by relying on the host
backend driver to measure the elapsed time between successive 'pet' events.
If the elapsed time doesn't match an expected value, the backend driver
decides that the guest vCPU is locked up and resets the guest. The host
backend driver takes into account the time during which the guest is not
running. Communication with the backend driver is done through MMIO,
and the register layout of the virtual watchdog is described as part of
the backend driver changes.
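
For readers who want a concrete picture of the accounting described above,
here is a minimal user-space sketch. It is not the crosvm backend and not
the driver from this series: the names (struct vcpu_wdt, wdt_pet,
wdt_host_tick, NR_VCPUS) and the countdown-in-ticks model are assumptions
made for illustration only. A 'pet' reloads a per-vCPU countdown, and the
host only charges ticks during which that vCPU was actually running.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_VCPUS 4 /* hypothetical guest size, for illustration */

/* Hypothetical per-vCPU backend state; NOT the real register layout. */
struct vcpu_wdt {
	uint32_t load_ticks;    /* value reloaded on every pet */
	uint32_t current_ticks; /* ticks left before a stall is declared */
	bool enabled;
};

static struct vcpu_wdt wdt[NR_VCPUS];

/*
 * Guest side: in the real driver a per-CPU timer callback would do the
 * equivalent of this through an MMIO write. Here it is a plain store.
 */
static void wdt_pet(unsigned int cpu, uint32_t timeout_ticks)
{
	assert(cpu < NR_VCPUS && timeout_ticks > 0);
	wdt[cpu].load_ticks = timeout_ticks;
	wdt[cpu].current_ticks = timeout_ticks;
	wdt[cpu].enabled = true;
}

/*
 * Host side: called once per clock tick for each vCPU. Ticks during which
 * the vCPU was not running are not charged to the guest.
 * Returns true when the vCPU is considered stalled.
 */
static bool wdt_host_tick(unsigned int cpu, bool vcpu_was_running)
{
	assert(cpu < NR_VCPUS);

	struct vcpu_wdt *w = &wdt[cpu];

	if (!w->enabled || !vcpu_was_running)
		return false;
	if (w->current_ticks > 0)
		w->current_ticks--;
	return w->current_ticks == 0;
}
```

In this model a vCPU that keeps calling wdt_pet() never reaches zero, while
one that stops petting is flagged only after it has accumulated the full
timeout in ticks where it was running. Time spent descheduled by the host
does not count against it.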

The host backend driver is implemented as part of:
https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/3548817

Changelog v9:
 - make the driver depend on CONFIG_OF
 - remove the platform_(set|get)_drvdata calls and keep a per-cpu static
   variable `vm_stall_detect` as suggested by Guenter on the (v8) series
 - improve commit description and fix styling

Changelog v8:
 - fix the yamllint dtschema warning caused by the missing 'reg' property

Changelog v7:
 - fix the dtschema warnings for 'timeout-sec' property
 - rename vcpu_stall_detector.yaml to qemu,vcpu_stall_detector.yaml and
   place the file under misc
 - improve the Kconfig description for the driver by making it KVM
   specific

Changelog v6:
 - fix issues reported by lkp@intel robot:
     building for ARCH=h8300 incorrect type in assignment
     (different address spaces)

Sebastian Ene (2):
  dt-bindings: vcpu_stall_detector: Add qemu,vcpu-stall-detector
    compatible
  misc: Add a mechanism to detect stalls on guest vCPUs

 .../misc/qemu,vcpu-stall-detector.yaml        |  51 +++++
 drivers/misc/Kconfig                          |  13 ++
 drivers/misc/Makefile                         |   1 +
 drivers/misc/vcpu_stall_detector.c            | 212 ++++++++++++++++++
 4 files changed, 277 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/misc/qemu,vcpu-stall-detector.yaml
 create mode 100644 drivers/misc/vcpu_stall_detector.c

-- 
2.37.0.rc0.161.g10f37bed90-goog


Thread overview: 10+ messages
2022-07-01 14:40 [PATCH v9 0/2] Detect stalls on guest vCPUS Sebastian Ene
2022-07-01 14:40 ` [PATCH v9 1/2] dt-bindings: vcpu_stall_detector: Add qemu,vcpu-stall-detector compatible Sebastian Ene
2022-07-01 21:01   ` Rob Herring
2022-07-01 14:40 ` [PATCH v9 2/2] misc: Add a mechanism to detect stalls on guest vCPUs Sebastian Ene
2022-07-01 14:52   ` Guenter Roeck
2022-07-06 15:21   ` Will Deacon
2022-07-06 15:50     ` Greg Kroah-Hartman
2022-07-06 16:10       ` Will Deacon
2022-07-06 17:35         ` Greg Kroah-Hartman
2022-07-07 13:17     ` Sebastian Ene
