* [PATCH 0/4] hw: make TCO watchdog actually work by default for Q35
@ 2022-10-31 13:19 Daniel P. Berrangé
2022-10-31 13:19 ` [PATCH 1/4] hw/acpi: add trace events for TCO watchdog register access Daniel P. Berrangé
` (5 more replies)
0 siblings, 6 replies; 17+ messages in thread
From: Daniel P. Berrangé @ 2022-10-31 13:19 UTC (permalink / raw)
To: qemu-devel
Cc: Laurent Vivier, Richard Henderson, Ani Sinha, Igor Mammedov,
Michael S. Tsirkin, Thomas Huth, Paolo Bonzini, Eduardo Habkost,
Marcel Apfelbaum, Richard W.M. Jones, Daniel P. Berrangé
The TCO watchdog is unconditionally integrated into the Q35 machine
type by default, but at the same time is unconditionally disabled
from firing by a host config option that overrides guest OS attempts
to enable it. People have to know to set a magic -global to make
it non-broken
IOW we're exposing a broken watchdog by default to all Q35 machines,
but which to the guest OS & its apps looks fully functional :-(
This behaviour was set in response to feedback from Michael:
https://lists.gnu.org/archive/html/qemu-devel/2015-06/msg07128.html
"I think sample high is a safer default."
but as explained in the commit message in the last patch, I think the
watchdog defaults were already safe without that pin strap setting.
The guest OS needs to take explicit action to clear the guest visible
'no reboot' flag, and so we don't need a second guest hidden 'no reboot'
flag to override that choice IMHO. Am I missing something ?
NB, I'm toggling this for 7.2 machine type since that's the current
git latest machine. Since this has already been "broken" for 7 years
though, I am ambivalent about whether we try todo this for 7.2, vs
just wait until the 8.0 machine types arrive.
Daniel P. Berrangé (4):
hw/acpi: add trace events for TCO watchdog register access
hw/isa: add trace events for ICH9 LPC chip config access
hw/watchdog: add trace events for watchdog action handling
hw/isa: enable TCO watchdog reboot pin strap by default
hw/acpi/tco.c | 41 +++++++++++++++++++++++++++-------------
hw/acpi/trace-events | 2 ++
hw/i386/pc.c | 4 +++-
hw/isa/lpc_ich9.c | 5 ++++-
hw/isa/trace-events | 4 ++++
hw/watchdog/trace-events | 4 ++++
hw/watchdog/watchdog.c | 4 ++++
tests/qtest/tco-test.c | 2 +-
8 files changed, 50 insertions(+), 16 deletions(-)
--
2.37.3
^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH 1/4] hw/acpi: add trace events for TCO watchdog register access
2022-10-31 13:19 [PATCH 0/4] hw: make TCO watchdog actually work by default for Q35 Daniel P. Berrangé
@ 2022-10-31 13:19 ` Daniel P. Berrangé
2022-10-31 13:34 ` Richard W.M. Jones
2022-10-31 13:19 ` [PATCH 2/4] hw/isa: add trace events for ICH9 LPC chip config access Daniel P. Berrangé
` (4 subsequent siblings)
5 siblings, 1 reply; 17+ messages in thread
From: Daniel P. Berrangé @ 2022-10-31 13:19 UTC (permalink / raw)
To: qemu-devel
Cc: Laurent Vivier, Richard Henderson, Ani Sinha, Igor Mammedov,
Michael S. Tsirkin, Thomas Huth, Paolo Bonzini, Eduardo Habkost,
Marcel Apfelbaum, Richard W.M. Jones, Daniel P. Berrangé
These tracepoints aid in understanding and debugging the guest drivers
for the TCO watchdog.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
---
hw/acpi/tco.c | 41 ++++++++++++++++++++++++++++-------------
hw/acpi/trace-events | 2 ++
2 files changed, 30 insertions(+), 13 deletions(-)
diff --git a/hw/acpi/tco.c b/hw/acpi/tco.c
index 4783721e4e..9ebd3e5e64 100644
--- a/hw/acpi/tco.c
+++ b/hw/acpi/tco.c
@@ -86,6 +86,7 @@ static inline int can_start_tco_timer(TCOIORegs *tr)
static uint32_t tco_ioport_readw(TCOIORegs *tr, uint32_t addr)
{
uint16_t rld;
+ uint32_t ret = 0;
switch (addr) {
case TCO_RLD:
@@ -96,35 +97,49 @@ static uint32_t tco_ioport_readw(TCOIORegs *tr, uint32_t addr)
} else {
rld = tr->tco.rld;
}
- return rld;
+ ret = rld;
+ break;
case TCO_DAT_IN:
- return tr->tco.din;
+ ret = tr->tco.din;
+ break;
case TCO_DAT_OUT:
- return tr->tco.dout;
+ ret = tr->tco.dout;
+ break;
case TCO1_STS:
- return tr->tco.sts1;
+ ret = tr->tco.sts1;
+ break;
case TCO2_STS:
- return tr->tco.sts2;
+ ret = tr->tco.sts2;
+ break;
case TCO1_CNT:
- return tr->tco.cnt1;
+ ret = tr->tco.cnt1;
+ break;
case TCO2_CNT:
- return tr->tco.cnt2;
+ ret = tr->tco.cnt2;
+ break;
case TCO_MESSAGE1:
- return tr->tco.msg1;
+ ret = tr->tco.msg1;
+ break;
case TCO_MESSAGE2:
- return tr->tco.msg2;
+ ret = tr->tco.msg2;
+ break;
case TCO_WDCNT:
- return tr->tco.wdcnt;
+ ret = tr->tco.wdcnt;
+ break;
case TCO_TMR:
- return tr->tco.tmr;
+ ret = tr->tco.tmr;
+ break;
case SW_IRQ_GEN:
- return tr->sw_irq_gen;
+ ret = tr->sw_irq_gen;
+ break;
}
- return 0;
+ trace_tco_io_read(addr, ret);
+ return ret;
}
static void tco_ioport_writew(TCOIORegs *tr, uint32_t addr, uint32_t val)
{
+ trace_tco_io_write(addr, val);
switch (addr) {
case TCO_RLD:
tr->timeouts_no = 0;
diff --git a/hw/acpi/trace-events b/hw/acpi/trace-events
index eb60b04f9b..78e0a8670e 100644
--- a/hw/acpi/trace-events
+++ b/hw/acpi/trace-events
@@ -55,6 +55,8 @@ piix4_gpe_writeb(uint64_t addr, unsigned width, uint64_t val) "addr: 0x%" PRIx64
# tco.c
tco_timer_reload(int ticks, int msec) "ticks=%d (%d ms)"
tco_timer_expired(int timeouts_no, bool strap, bool no_reboot) "timeouts_no=%d no_reboot=%d/%d"
+tco_io_write(uint64_t addr, uint32_t val) "addr=0x%" PRIx64 " val=0x%" PRIx32
+tco_io_read(uint64_t addr, uint32_t val) "addr=0x%" PRIx64 " val=0x%" PRIx32
# erst.c
acpi_erst_reg_write(uint64_t addr, uint64_t val, unsigned size) "addr: 0x%04" PRIx64 " <== 0x%016" PRIx64 " (size: %u)"
--
2.37.3
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH 2/4] hw/isa: add trace events for ICH9 LPC chip config access
2022-10-31 13:19 [PATCH 0/4] hw: make TCO watchdog actually work by default for Q35 Daniel P. Berrangé
2022-10-31 13:19 ` [PATCH 1/4] hw/acpi: add trace events for TCO watchdog register access Daniel P. Berrangé
@ 2022-10-31 13:19 ` Daniel P. Berrangé
2022-10-31 13:36 ` Richard W.M. Jones
2022-10-31 13:19 ` [PATCH 3/4] hw/watchdog: add trace events for watchdog action handling Daniel P. Berrangé
` (3 subsequent siblings)
5 siblings, 1 reply; 17+ messages in thread
From: Daniel P. Berrangé @ 2022-10-31 13:19 UTC (permalink / raw)
To: qemu-devel
Cc: Laurent Vivier, Richard Henderson, Ani Sinha, Igor Mammedov,
Michael S. Tsirkin, Thomas Huth, Paolo Bonzini, Eduardo Habkost,
Marcel Apfelbaum, Richard W.M. Jones, Daniel P. Berrangé
These tracepoints aid in understanding and debugging the guest drivers
for the TCO watchdog.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
---
hw/isa/lpc_ich9.c | 3 +++
hw/isa/trace-events | 4 ++++
2 files changed, 7 insertions(+)
diff --git a/hw/isa/lpc_ich9.c b/hw/isa/lpc_ich9.c
index 4553b5925b..66062a344c 100644
--- a/hw/isa/lpc_ich9.c
+++ b/hw/isa/lpc_ich9.c
@@ -51,6 +51,7 @@
#include "hw/nvram/fw_cfg.h"
#include "qemu/cutils.h"
#include "hw/acpi/acpi_aml_interface.h"
+#include "trace.h"
/*****************************************************************************/
/* ICH9 LPC PCI to ISA bridge */
@@ -161,6 +162,7 @@ static void ich9_cc_write(void *opaque, hwaddr addr,
{
ICH9LPCState *lpc = (ICH9LPCState *)opaque;
+ trace_ich9_cc_write(addr, val, len);
ich9_cc_addr_len(&addr, &len);
memcpy(lpc->chip_config + addr, &val, len);
pci_bus_fire_intx_routing_notifier(pci_get_bus(&lpc->d));
@@ -176,6 +178,7 @@ static uint64_t ich9_cc_read(void *opaque, hwaddr addr,
uint32_t val = 0;
ich9_cc_addr_len(&addr, &len);
memcpy(&val, lpc->chip_config + addr, len);
+ trace_ich9_cc_read(addr, val, len);
return val;
}
diff --git a/hw/isa/trace-events b/hw/isa/trace-events
index b8f877e1ed..c4567a9b47 100644
--- a/hw/isa/trace-events
+++ b/hw/isa/trace-events
@@ -21,3 +21,7 @@ via_pm_io_read(uint32_t addr, uint32_t val, int len) "addr 0x%x val 0x%x len 0x%
via_pm_io_write(uint32_t addr, uint32_t val, int len) "addr 0x%x val 0x%x len 0x%x"
via_superio_read(uint8_t addr, uint8_t val) "addr 0x%x val 0x%x"
via_superio_write(uint8_t addr, uint32_t val) "addr 0x%x val 0x%x"
+
+# lpc_ich9.c
+ich9_cc_write(uint64_t addr, uint64_t val, unsigned len) "addr=0x%"PRIx64 " val=0x%"PRIx64 " len=%u"
+ich9_cc_read(uint64_t addr, uint64_t val, unsigned len) "addr=0x%"PRIx64 " val=0x%"PRIx64 " len=%u"
--
2.37.3
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH 3/4] hw/watchdog: add trace events for watchdog action handling
2022-10-31 13:19 [PATCH 0/4] hw: make TCO watchdog actually work by default for Q35 Daniel P. Berrangé
2022-10-31 13:19 ` [PATCH 1/4] hw/acpi: add trace events for TCO watchdog register access Daniel P. Berrangé
2022-10-31 13:19 ` [PATCH 2/4] hw/isa: add trace events for ICH9 LPC chip config access Daniel P. Berrangé
@ 2022-10-31 13:19 ` Daniel P. Berrangé
2022-10-31 13:36 ` Richard W.M. Jones
2022-10-31 15:46 ` Philippe Mathieu-Daudé
2022-10-31 13:19 ` [PATCH 4/4] hw/isa: enable TCO watchdog reboot pin strap by default Daniel P. Berrangé
` (2 subsequent siblings)
5 siblings, 2 replies; 17+ messages in thread
From: Daniel P. Berrangé @ 2022-10-31 13:19 UTC (permalink / raw)
To: qemu-devel
Cc: Laurent Vivier, Richard Henderson, Ani Sinha, Igor Mammedov,
Michael S. Tsirkin, Thomas Huth, Paolo Bonzini, Eduardo Habkost,
Marcel Apfelbaum, Richard W.M. Jones, Daniel P. Berrangé
The tracepoints aid in debugging the triggering of watchdog devices.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
---
hw/watchdog/trace-events | 4 ++++
hw/watchdog/watchdog.c | 4 ++++
2 files changed, 8 insertions(+)
diff --git a/hw/watchdog/trace-events b/hw/watchdog/trace-events
index 89ccbcfdfd..fc1d048702 100644
--- a/hw/watchdog/trace-events
+++ b/hw/watchdog/trace-events
@@ -16,3 +16,7 @@ spapr_watchdog_stop(uint64_t num, uint64_t ret) "num=%" PRIu64 " ret=%" PRId64
spapr_watchdog_query(uint64_t caps) "caps=0x%" PRIx64
spapr_watchdog_query_lpm(uint64_t caps) "caps=0x%" PRIx64
spapr_watchdog_expired(uint64_t num, unsigned action) "num=%" PRIu64 " action=%u"
+
+# watchdog.c
+watchdog_perform_action(unsigned int action) "action=%d"
+watchdog_set_action(unsigned int action) "action=%d"
diff --git a/hw/watchdog/watchdog.c b/hw/watchdog/watchdog.c
index 6c082a3263..955046161b 100644
--- a/hw/watchdog/watchdog.c
+++ b/hw/watchdog/watchdog.c
@@ -30,6 +30,7 @@
#include "sysemu/watchdog.h"
#include "hw/nmi.h"
#include "qemu/help_option.h"
+#include "trace.h"
static WatchdogAction watchdog_action = WATCHDOG_ACTION_RESET;
@@ -43,6 +44,8 @@ WatchdogAction get_watchdog_action(void)
*/
void watchdog_perform_action(void)
{
+ trace_watchdog_perform_action(watchdog_action);
+
switch (watchdog_action) {
case WATCHDOG_ACTION_RESET: /* same as 'system_reset' in monitor */
qapi_event_send_watchdog(WATCHDOG_ACTION_RESET);
@@ -89,4 +92,5 @@ void watchdog_perform_action(void)
void qmp_watchdog_set_action(WatchdogAction action, Error **errp)
{
watchdog_action = action;
+ trace_watchdog_set_action(watchdog_action);
}
--
2.37.3
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH 4/4] hw/isa: enable TCO watchdog reboot pin strap by default
2022-10-31 13:19 [PATCH 0/4] hw: make TCO watchdog actually work by default for Q35 Daniel P. Berrangé
` (2 preceding siblings ...)
2022-10-31 13:19 ` [PATCH 3/4] hw/watchdog: add trace events for watchdog action handling Daniel P. Berrangé
@ 2022-10-31 13:19 ` Daniel P. Berrangé
2022-10-31 13:40 ` Richard W.M. Jones
2022-10-31 13:50 ` [PATCH 0/4] hw: make TCO watchdog actually work by default for Q35 Daniel P. Berrangé
2022-11-10 16:30 ` Michael S. Tsirkin
5 siblings, 1 reply; 17+ messages in thread
From: Daniel P. Berrangé @ 2022-10-31 13:19 UTC (permalink / raw)
To: qemu-devel
Cc: Laurent Vivier, Richard Henderson, Ani Sinha, Igor Mammedov,
Michael S. Tsirkin, Thomas Huth, Paolo Bonzini, Eduardo Habkost,
Marcel Apfelbaum, Richard W.M. Jones, Daniel P. Berrangé
The TCO watchdog implementation default behaviour from POV of the
guest OS relies on the initial values for two I/O ports:
* TCO1_CNT == 0x0
Since bit 11 (TCO Timer Halt) is clear, the watchdog state
is considered to be initially running
* GCS == 0x20
Since bit 5 (No Reboot) is set, the watchdog will not trigger
when the timer expires
This is a safe default, because the No Reboot bit will prevent the
watchdog from triggering if the guest OS is unaware of its existance,
or is slow in configuring it. When a Linux guest initializes the TCO
watchdog, it will attempt to clear the "No Reboot" flag, and read the
value back. If the clear was honoured, the driver will treat this as
an indicator that the watchdog is functional and create the geust
watchdog device.
QEMU implements a second "no reboot" flag, however, via pin straps
which overrides the behaviour of the guest controlled "no reboot"
flag:
commit 5add35bec1e249bb5345a47008c8f298d4760be4
Author: Paulo Alcantara <pcacjr@gmail.com>
Date: Sun Jun 28 14:58:58 2015 -0300
ich9: implement strap SPKR pin logic
This second 'noreboot' pin was defaulted to high, which also inhibits
triggering of the requested watchdog actions, unless QEMU is launched
with the magic flag "-global ICH9-LPC.noreboot=false".
This is a bad default as we are exposing a watchdog to every guest OS
using the q35 machine type, but preventing it from actually doing what
it is designed to do. What is worse is that the guest OS and its apps
have no way to know that the watchdog is never going to fire, due to
this second 'noreboot' pin.
If a guest OS had no watchdog device at all, then apps whose operation
and/or data integrity relies on a watchdog can refuse to launch, and
alert the administrator of the problematic deployment. With Q35 machines
unconditionally exposing a watchdog though, apps will think their
deployment is correct but in fact have no protection at all.
This patch flips the default of the second 'no reboot' flag, so that
configured watchdog actions will be honoured out of the box for the
7.2 Q35 machine type onwards, if the guest enables use of the watchdog.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
---
hw/i386/pc.c | 4 +++-
hw/isa/lpc_ich9.c | 2 +-
tests/qtest/tco-test.c | 2 +-
3 files changed, 5 insertions(+), 3 deletions(-)
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 3e86083db3..e814f62fc6 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -108,7 +108,9 @@
{ "qemu64-" TYPE_X86_CPU, "model-id", "QEMU Virtual CPU version " v, },\
{ "athlon-" TYPE_X86_CPU, "model-id", "QEMU Virtual CPU version " v, },
-GlobalProperty pc_compat_7_1[] = {};
+GlobalProperty pc_compat_7_1[] = {
+ { "ICH9-LPC", "noreboot", "true" },
+};
const size_t pc_compat_7_1_len = G_N_ELEMENTS(pc_compat_7_1);
GlobalProperty pc_compat_7_0[] = {};
diff --git a/hw/isa/lpc_ich9.c b/hw/isa/lpc_ich9.c
index 66062a344c..f9ce2ee1dc 100644
--- a/hw/isa/lpc_ich9.c
+++ b/hw/isa/lpc_ich9.c
@@ -789,7 +789,7 @@ static const VMStateDescription vmstate_ich9_lpc = {
};
static Property ich9_lpc_properties[] = {
- DEFINE_PROP_BOOL("noreboot", ICH9LPCState, pin_strap.spkr_hi, true),
+ DEFINE_PROP_BOOL("noreboot", ICH9LPCState, pin_strap.spkr_hi, false),
DEFINE_PROP_BOOL("smm-compat", ICH9LPCState, pm.smm_compat, false),
DEFINE_PROP_BIT64("x-smi-broadcast", ICH9LPCState, smi_host_features,
ICH9_LPC_SMI_F_BROADCAST_BIT, true),
diff --git a/tests/qtest/tco-test.c b/tests/qtest/tco-test.c
index 254f735370..caabcac6e5 100644
--- a/tests/qtest/tco-test.c
+++ b/tests/qtest/tco-test.c
@@ -60,7 +60,7 @@ static void test_init(TestData *d)
QTestState *qs;
qs = qtest_initf("-machine q35 %s %s",
- d->noreboot ? "" : "-global ICH9-LPC.noreboot=false",
+ d->noreboot ? "-global ICH9-LPC.noreboot=true" : "",
!d->args ? "" : d->args);
qtest_irq_intercept_in(qs, "ioapic");
--
2.37.3
^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH 1/4] hw/acpi: add trace events for TCO watchdog register access
2022-10-31 13:19 ` [PATCH 1/4] hw/acpi: add trace events for TCO watchdog register access Daniel P. Berrangé
@ 2022-10-31 13:34 ` Richard W.M. Jones
0 siblings, 0 replies; 17+ messages in thread
From: Richard W.M. Jones @ 2022-10-31 13:34 UTC (permalink / raw)
To: Daniel P. Berrangé
Cc: qemu-devel, Laurent Vivier, Richard Henderson, Ani Sinha,
Igor Mammedov, Michael S. Tsirkin, Thomas Huth, Paolo Bonzini,
Eduardo Habkost, Marcel Apfelbaum
On Mon, Oct 31, 2022 at 01:19:31PM +0000, Daniel P. Berrangé wrote:
> These tracepoints aid in understanding and debugging the guest drivers
> for the TCO watchdog.
>
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> ---
> hw/acpi/tco.c | 41 ++++++++++++++++++++++++++++-------------
> hw/acpi/trace-events | 2 ++
> 2 files changed, 30 insertions(+), 13 deletions(-)
>
> diff --git a/hw/acpi/tco.c b/hw/acpi/tco.c
> index 4783721e4e..9ebd3e5e64 100644
> --- a/hw/acpi/tco.c
> +++ b/hw/acpi/tco.c
> @@ -86,6 +86,7 @@ static inline int can_start_tco_timer(TCOIORegs *tr)
> static uint32_t tco_ioport_readw(TCOIORegs *tr, uint32_t addr)
> {
> uint16_t rld;
> + uint32_t ret = 0;
>
> switch (addr) {
> case TCO_RLD:
> @@ -96,35 +97,49 @@ static uint32_t tco_ioport_readw(TCOIORegs *tr, uint32_t addr)
> } else {
> rld = tr->tco.rld;
> }
> - return rld;
> + ret = rld;
> + break;
> case TCO_DAT_IN:
> - return tr->tco.din;
> + ret = tr->tco.din;
> + break;
> case TCO_DAT_OUT:
> - return tr->tco.dout;
> + ret = tr->tco.dout;
> + break;
> case TCO1_STS:
> - return tr->tco.sts1;
> + ret = tr->tco.sts1;
> + break;
> case TCO2_STS:
> - return tr->tco.sts2;
> + ret = tr->tco.sts2;
> + break;
> case TCO1_CNT:
> - return tr->tco.cnt1;
> + ret = tr->tco.cnt1;
> + break;
> case TCO2_CNT:
> - return tr->tco.cnt2;
> + ret = tr->tco.cnt2;
> + break;
> case TCO_MESSAGE1:
> - return tr->tco.msg1;
> + ret = tr->tco.msg1;
> + break;
> case TCO_MESSAGE2:
> - return tr->tco.msg2;
> + ret = tr->tco.msg2;
> + break;
> case TCO_WDCNT:
> - return tr->tco.wdcnt;
> + ret = tr->tco.wdcnt;
> + break;
> case TCO_TMR:
> - return tr->tco.tmr;
> + ret = tr->tco.tmr;
> + break;
> case SW_IRQ_GEN:
> - return tr->sw_irq_gen;
> + ret = tr->sw_irq_gen;
> + break;
> }
> - return 0;
> + trace_tco_io_read(addr, ret);
> + return ret;
> }
>
> static void tco_ioport_writew(TCOIORegs *tr, uint32_t addr, uint32_t val)
> {
> + trace_tco_io_write(addr, val);
> switch (addr) {
> case TCO_RLD:
> tr->timeouts_no = 0;
> diff --git a/hw/acpi/trace-events b/hw/acpi/trace-events
> index eb60b04f9b..78e0a8670e 100644
> --- a/hw/acpi/trace-events
> +++ b/hw/acpi/trace-events
> @@ -55,6 +55,8 @@ piix4_gpe_writeb(uint64_t addr, unsigned width, uint64_t val) "addr: 0x%" PRIx64
> # tco.c
> tco_timer_reload(int ticks, int msec) "ticks=%d (%d ms)"
> tco_timer_expired(int timeouts_no, bool strap, bool no_reboot) "timeouts_no=%d no_reboot=%d/%d"
> +tco_io_write(uint64_t addr, uint32_t val) "addr=0x%" PRIx64 " val=0x%" PRIx32
> +tco_io_read(uint64_t addr, uint32_t val) "addr=0x%" PRIx64 " val=0x%" PRIx32
>
> # erst.c
> acpi_erst_reg_write(uint64_t addr, uint64_t val, unsigned size) "addr: 0x%04" PRIx64 " <== 0x%016" PRIx64 " (size: %u)"
Reviewed-by: Richard W.M. Jones <rjones@redhat.com>
Rich.
--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 2/4] hw/isa: add trace events for ICH9 LPC chip config access
2022-10-31 13:19 ` [PATCH 2/4] hw/isa: add trace events for ICH9 LPC chip config access Daniel P. Berrangé
@ 2022-10-31 13:36 ` Richard W.M. Jones
0 siblings, 0 replies; 17+ messages in thread
From: Richard W.M. Jones @ 2022-10-31 13:36 UTC (permalink / raw)
To: Daniel P. Berrangé
Cc: qemu-devel, Laurent Vivier, Richard Henderson, Ani Sinha,
Igor Mammedov, Michael S. Tsirkin, Thomas Huth, Paolo Bonzini,
Eduardo Habkost, Marcel Apfelbaum
On Mon, Oct 31, 2022 at 01:19:32PM +0000, Daniel P. Berrangé wrote:
> These tracepoints aid in understanding and debugging the guest drivers
> for the TCO watchdog.
>
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> ---
> hw/isa/lpc_ich9.c | 3 +++
> hw/isa/trace-events | 4 ++++
> 2 files changed, 7 insertions(+)
>
> diff --git a/hw/isa/lpc_ich9.c b/hw/isa/lpc_ich9.c
> index 4553b5925b..66062a344c 100644
> --- a/hw/isa/lpc_ich9.c
> +++ b/hw/isa/lpc_ich9.c
> @@ -51,6 +51,7 @@
> #include "hw/nvram/fw_cfg.h"
> #include "qemu/cutils.h"
> #include "hw/acpi/acpi_aml_interface.h"
> +#include "trace.h"
>
> /*****************************************************************************/
> /* ICH9 LPC PCI to ISA bridge */
> @@ -161,6 +162,7 @@ static void ich9_cc_write(void *opaque, hwaddr addr,
> {
> ICH9LPCState *lpc = (ICH9LPCState *)opaque;
>
> + trace_ich9_cc_write(addr, val, len);
> ich9_cc_addr_len(&addr, &len);
> memcpy(lpc->chip_config + addr, &val, len);
> pci_bus_fire_intx_routing_notifier(pci_get_bus(&lpc->d));
> @@ -176,6 +178,7 @@ static uint64_t ich9_cc_read(void *opaque, hwaddr addr,
> uint32_t val = 0;
> ich9_cc_addr_len(&addr, &len);
> memcpy(&val, lpc->chip_config + addr, len);
> + trace_ich9_cc_read(addr, val, len);
> return val;
> }
>
> diff --git a/hw/isa/trace-events b/hw/isa/trace-events
> index b8f877e1ed..c4567a9b47 100644
> --- a/hw/isa/trace-events
> +++ b/hw/isa/trace-events
> @@ -21,3 +21,7 @@ via_pm_io_read(uint32_t addr, uint32_t val, int len) "addr 0x%x val 0x%x len 0x%
> via_pm_io_write(uint32_t addr, uint32_t val, int len) "addr 0x%x val 0x%x len 0x%x"
> via_superio_read(uint8_t addr, uint8_t val) "addr 0x%x val 0x%x"
> via_superio_write(uint8_t addr, uint32_t val) "addr 0x%x val 0x%x"
> +
> +# lpc_ich9.c
> +ich9_cc_write(uint64_t addr, uint64_t val, unsigned len) "addr=0x%"PRIx64 " val=0x%"PRIx64 " len=%u"
> +ich9_cc_read(uint64_t addr, uint64_t val, unsigned len) "addr=0x%"PRIx64 " val=0x%"PRIx64 " len=%u"
Reviewed-by: Richard W.M. Jones <rjones@redhat.com>
I can't help thinking that the trace-events file ought to be generated
from the source code ...
Rich.
--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-top is 'top' for virtual machines. Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://people.redhat.com/~rjones/virt-top
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 3/4] hw/watchdog: add trace events for watchdog action handling
2022-10-31 13:19 ` [PATCH 3/4] hw/watchdog: add trace events for watchdog action handling Daniel P. Berrangé
@ 2022-10-31 13:36 ` Richard W.M. Jones
2022-10-31 15:46 ` Philippe Mathieu-Daudé
1 sibling, 0 replies; 17+ messages in thread
From: Richard W.M. Jones @ 2022-10-31 13:36 UTC (permalink / raw)
To: Daniel P. Berrangé
Cc: qemu-devel, Laurent Vivier, Richard Henderson, Ani Sinha,
Igor Mammedov, Michael S. Tsirkin, Thomas Huth, Paolo Bonzini,
Eduardo Habkost, Marcel Apfelbaum
On Mon, Oct 31, 2022 at 01:19:33PM +0000, Daniel P. Berrangé wrote:
> The tracepoints aid in debugging the triggering of watchdog devices.
>
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> ---
> hw/watchdog/trace-events | 4 ++++
> hw/watchdog/watchdog.c | 4 ++++
> 2 files changed, 8 insertions(+)
>
> diff --git a/hw/watchdog/trace-events b/hw/watchdog/trace-events
> index 89ccbcfdfd..fc1d048702 100644
> --- a/hw/watchdog/trace-events
> +++ b/hw/watchdog/trace-events
> @@ -16,3 +16,7 @@ spapr_watchdog_stop(uint64_t num, uint64_t ret) "num=%" PRIu64 " ret=%" PRId64
> spapr_watchdog_query(uint64_t caps) "caps=0x%" PRIx64
> spapr_watchdog_query_lpm(uint64_t caps) "caps=0x%" PRIx64
> spapr_watchdog_expired(uint64_t num, unsigned action) "num=%" PRIu64 " action=%u"
> +
> +# watchdog.c
> +watchdog_perform_action(unsigned int action) "action=%d"
> +watchdog_set_action(unsigned int action) "action=%d"
> diff --git a/hw/watchdog/watchdog.c b/hw/watchdog/watchdog.c
> index 6c082a3263..955046161b 100644
> --- a/hw/watchdog/watchdog.c
> +++ b/hw/watchdog/watchdog.c
> @@ -30,6 +30,7 @@
> #include "sysemu/watchdog.h"
> #include "hw/nmi.h"
> #include "qemu/help_option.h"
> +#include "trace.h"
>
> static WatchdogAction watchdog_action = WATCHDOG_ACTION_RESET;
>
> @@ -43,6 +44,8 @@ WatchdogAction get_watchdog_action(void)
> */
> void watchdog_perform_action(void)
> {
> + trace_watchdog_perform_action(watchdog_action);
> +
> switch (watchdog_action) {
> case WATCHDOG_ACTION_RESET: /* same as 'system_reset' in monitor */
> qapi_event_send_watchdog(WATCHDOG_ACTION_RESET);
> @@ -89,4 +92,5 @@ void watchdog_perform_action(void)
> void qmp_watchdog_set_action(WatchdogAction action, Error **errp)
> {
> watchdog_action = action;
> + trace_watchdog_set_action(watchdog_action);
> }
Reviewed-by: Richard W.M. Jones <rjones@redhat.com>
Rich.
--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 4/4] hw/isa: enable TCO watchdog reboot pin strap by default
2022-10-31 13:19 ` [PATCH 4/4] hw/isa: enable TCO watchdog reboot pin strap by default Daniel P. Berrangé
@ 2022-10-31 13:40 ` Richard W.M. Jones
0 siblings, 0 replies; 17+ messages in thread
From: Richard W.M. Jones @ 2022-10-31 13:40 UTC (permalink / raw)
To: Daniel P. Berrangé
Cc: qemu-devel, Laurent Vivier, Richard Henderson, Ani Sinha,
Igor Mammedov, Michael S. Tsirkin, Thomas Huth, Paolo Bonzini,
Eduardo Habkost, Marcel Apfelbaum
On Mon, Oct 31, 2022 at 01:19:34PM +0000, Daniel P. Berrangé wrote:
> The TCO watchdog implementation default behaviour from POV of the
> guest OS relies on the initial values for two I/O ports:
>
> * TCO1_CNT == 0x0
>
> Since bit 11 (TCO Timer Halt) is clear, the watchdog state
> is considered to be initially running
>
> * GCS == 0x20
>
> Since bit 5 (No Reboot) is set, the watchdog will not trigger
> when the timer expires
>
> This is a safe default, because the No Reboot bit will prevent the
> watchdog from triggering if the guest OS is unaware of its existance,
> or is slow in configuring it. When a Linux guest initializes the TCO
> watchdog, it will attempt to clear the "No Reboot" flag, and read the
> value back. If the clear was honoured, the driver will treat this as
> an indicator that the watchdog is functional and create the geust
Typo: "guest"
> watchdog device.
>
> QEMU implements a second "no reboot" flag, however, via pin straps
> which overrides the behaviour of the guest controlled "no reboot"
> flag:
>
> commit 5add35bec1e249bb5345a47008c8f298d4760be4
> Author: Paulo Alcantara <pcacjr@gmail.com>
> Date: Sun Jun 28 14:58:58 2015 -0300
>
> ich9: implement strap SPKR pin logic
>
> This second 'noreboot' pin was defaulted to high, which also inhibits
> triggering of the requested watchdog actions, unless QEMU is launched
> with the magic flag "-global ICH9-LPC.noreboot=false".
>
> This is a bad default as we are exposing a watchdog to every guest OS
> using the q35 machine type, but preventing it from actually doing what
> it is designed to do. What is worse is that the guest OS and its apps
> have no way to know that the watchdog is never going to fire, due to
> this second 'noreboot' pin.
>
> If a guest OS had no watchdog device at all, then apps whose operation
> and/or data integrity relies on a watchdog can refuse to launch, and
> alert the administrator of the problematic deployment. With Q35 machines
> unconditionally exposing a watchdog though, apps will think their
> deployment is correct but in fact have no protection at all.
>
> This patch flips the default of the second 'no reboot' flag, so that
> configured watchdog actions will be honoured out of the box for the
> 7.2 Q35 machine type onwards, if the guest enables use of the watchdog.
>
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Add Fixes: or some other reference to the BZs? We have a few!
https://bugzilla.redhat.com/show_bug.cgi?id=2136889
https://bugzilla.redhat.com/show_bug.cgi?id=2080207
https://bugzilla.redhat.com/show_bug.cgi?id=2137346 (libvirt)
> hw/i386/pc.c | 4 +++-
> hw/isa/lpc_ich9.c | 2 +-
> tests/qtest/tco-test.c | 2 +-
> 3 files changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index 3e86083db3..e814f62fc6 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -108,7 +108,9 @@
> { "qemu64-" TYPE_X86_CPU, "model-id", "QEMU Virtual CPU version " v, },\
> { "athlon-" TYPE_X86_CPU, "model-id", "QEMU Virtual CPU version " v, },
>
> -GlobalProperty pc_compat_7_1[] = {};
> +GlobalProperty pc_compat_7_1[] = {
> + { "ICH9-LPC", "noreboot", "true" },
> +};
> const size_t pc_compat_7_1_len = G_N_ELEMENTS(pc_compat_7_1);
>
> GlobalProperty pc_compat_7_0[] = {};
> diff --git a/hw/isa/lpc_ich9.c b/hw/isa/lpc_ich9.c
> index 66062a344c..f9ce2ee1dc 100644
> --- a/hw/isa/lpc_ich9.c
> +++ b/hw/isa/lpc_ich9.c
> @@ -789,7 +789,7 @@ static const VMStateDescription vmstate_ich9_lpc = {
> };
>
> static Property ich9_lpc_properties[] = {
> - DEFINE_PROP_BOOL("noreboot", ICH9LPCState, pin_strap.spkr_hi, true),
> + DEFINE_PROP_BOOL("noreboot", ICH9LPCState, pin_strap.spkr_hi, false),
> DEFINE_PROP_BOOL("smm-compat", ICH9LPCState, pm.smm_compat, false),
> DEFINE_PROP_BIT64("x-smi-broadcast", ICH9LPCState, smi_host_features,
> ICH9_LPC_SMI_F_BROADCAST_BIT, true),
> diff --git a/tests/qtest/tco-test.c b/tests/qtest/tco-test.c
> index 254f735370..caabcac6e5 100644
> --- a/tests/qtest/tco-test.c
> +++ b/tests/qtest/tco-test.c
> @@ -60,7 +60,7 @@ static void test_init(TestData *d)
> QTestState *qs;
>
> qs = qtest_initf("-machine q35 %s %s",
> - d->noreboot ? "" : "-global ICH9-LPC.noreboot=false",
> + d->noreboot ? "-global ICH9-LPC.noreboot=true" : "",
> !d->args ? "" : d->args);
> qtest_irq_intercept_in(qs, "ioapic");
Acked-by: Richard W.M. Jones <rjones@redhat.com>
Rich.
--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 0/4] hw: make TCO watchdog actually work by default for Q35
2022-10-31 13:19 [PATCH 0/4] hw: make TCO watchdog actually work by default for Q35 Daniel P. Berrangé
` (3 preceding siblings ...)
2022-10-31 13:19 ` [PATCH 4/4] hw/isa: enable TCO watchdog reboot pin strap by default Daniel P. Berrangé
@ 2022-10-31 13:50 ` Daniel P. Berrangé
2022-10-31 15:48 ` Michael S. Tsirkin
2022-11-10 16:30 ` Michael S. Tsirkin
5 siblings, 1 reply; 17+ messages in thread
From: Daniel P. Berrangé @ 2022-10-31 13:50 UTC (permalink / raw)
To: qemu-devel
Cc: Laurent Vivier, Richard Henderson, Ani Sinha, Igor Mammedov,
Michael S. Tsirkin, Thomas Huth, Paolo Bonzini, Eduardo Habkost,
Marcel Apfelbaum, Richard W.M. Jones
On Mon, Oct 31, 2022 at 01:19:30PM +0000, Daniel P. Berrangé wrote:
> The TCO watchdog is unconditionally integrated into the Q35 machine
> type by default, but at the same time is unconditionally disabled
> from firing by a host config option that overrides guest OS attempts
> to enable it. People have to know to set a magic -global to make
> it non-broken
Incidentally I found that originally the TCO watchdog was not
unconditionally enabled. Its exposure to the guest could be
turned on/off using
-global ICH9-LPC.enable_tco=bool
This was implemented for machine type compat, but it also gave
apps a way to disable the watchdog functionality. Unfortunately
that ability was discarded in this series:
https://lore.kernel.org/all/1453564933-29638-1-git-send-email-ehabkost@redhat.com/
but the 'enable_tco' property still exists in QOM, but silently
ignored.
Seems we should either fix the impl of 'enable_tco', or remove the
QOM property entirely, so we don't pretend it can be toggled anymore.
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 3/4] hw/watchdog: add trace events for watchdog action handling
2022-10-31 13:19 ` [PATCH 3/4] hw/watchdog: add trace events for watchdog action handling Daniel P. Berrangé
2022-10-31 13:36 ` Richard W.M. Jones
@ 2022-10-31 15:46 ` Philippe Mathieu-Daudé
1 sibling, 0 replies; 17+ messages in thread
From: Philippe Mathieu-Daudé @ 2022-10-31 15:46 UTC (permalink / raw)
To: Daniel P. Berrangé, qemu-devel
Cc: Laurent Vivier, Richard Henderson, Ani Sinha, Igor Mammedov,
Michael S. Tsirkin, Thomas Huth, Paolo Bonzini, Eduardo Habkost,
Marcel Apfelbaum, Richard W.M. Jones
On 31/10/22 14:19, Daniel P. Berrangé wrote:
> The tracepoints aid in debugging the triggering of watchdog devices.
>
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> ---
> hw/watchdog/trace-events | 4 ++++
> hw/watchdog/watchdog.c | 4 ++++
> 2 files changed, 8 insertions(+)
>
> diff --git a/hw/watchdog/trace-events b/hw/watchdog/trace-events
> index 89ccbcfdfd..fc1d048702 100644
> --- a/hw/watchdog/trace-events
> +++ b/hw/watchdog/trace-events
> @@ -16,3 +16,7 @@ spapr_watchdog_stop(uint64_t num, uint64_t ret) "num=%" PRIu64 " ret=%" PRId64
> spapr_watchdog_query(uint64_t caps) "caps=0x%" PRIx64
> spapr_watchdog_query_lpm(uint64_t caps) "caps=0x%" PRIx64
> spapr_watchdog_expired(uint64_t num, unsigned action) "num=%" PRIu64 " action=%u"
> +
> +# watchdog.c
> +watchdog_perform_action(unsigned int action) "action=%d"
> +watchdog_set_action(unsigned int action) "action=%d"
"%u", otherwise:
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 0/4] hw: make TCO watchdog actually work by default for Q35
2022-10-31 13:50 ` [PATCH 0/4] hw: make TCO watchdog actually work by default for Q35 Daniel P. Berrangé
@ 2022-10-31 15:48 ` Michael S. Tsirkin
2022-11-01 12:57 ` Igor Mammedov
0 siblings, 1 reply; 17+ messages in thread
From: Michael S. Tsirkin @ 2022-10-31 15:48 UTC (permalink / raw)
To: Daniel P. Berrangé
Cc: qemu-devel, Laurent Vivier, Richard Henderson, Ani Sinha,
Igor Mammedov, Thomas Huth, Paolo Bonzini, Eduardo Habkost,
Marcel Apfelbaum, Richard W.M. Jones
On Mon, Oct 31, 2022 at 01:50:24PM +0000, Daniel P. Berrangé wrote:
> On Mon, Oct 31, 2022 at 01:19:30PM +0000, Daniel P. Berrangé wrote:
> > The TCO watchdog is unconditionally integrated into the Q35 machine
> > type by default, but at the same time is unconditionally disabled
> > from firing by a host config option that overrides guest OS attempts
> > to enable it. People have to know to set a magic -global to make
> > it non-broken
>
> Incidentally I found that originally the TCO watchdog was not
> unconditionally enabled. Its exposure to the guest could be
> turned on/off using
>
> -global ICH9-LPC.enable_tco=bool
>
> This was implemented for machine type compat, but it also gave
> apps a way to disable the watchdog functionality. Unfortunately
> that ability was discarded in this series:
>
> https://lore.kernel.org/all/1453564933-29638-1-git-send-email-ehabkost@redhat.com/
>
> but the 'enable_tco' property still exists in QOM, but silently
> ignored.
>
> Seems we should either fix the impl of 'enable_tco', or remove the
> QOM property entirely, so we don't pretend it can be toggled anymore.
>
> With regards,
> Daniel
i am inclined to say you are right and the fix is to fix the impl.
> --
> |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o- https://fstop138.berrange.com :|
> |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 0/4] hw: make TCO watchdog actually work by default for Q35
2022-10-31 15:48 ` Michael S. Tsirkin
@ 2022-11-01 12:57 ` Igor Mammedov
2022-11-01 13:03 ` Daniel P. Berrangé
0 siblings, 1 reply; 17+ messages in thread
From: Igor Mammedov @ 2022-11-01 12:57 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Daniel P. Berrangé, qemu-devel, Laurent Vivier,
Richard Henderson, Ani Sinha, Thomas Huth, Paolo Bonzini,
Eduardo Habkost, Marcel Apfelbaum, Richard W.M. Jones
On Mon, 31 Oct 2022 11:48:58 -0400
"Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Mon, Oct 31, 2022 at 01:50:24PM +0000, Daniel P. Berrangé wrote:
> > On Mon, Oct 31, 2022 at 01:19:30PM +0000, Daniel P. Berrangé wrote:
> > > The TCO watchdog is unconditionally integrated into the Q35 machine
> > > type by default, but at the same time is unconditionally disabled
> > > from firing by a host config option that overrides guest OS attempts
> > > to enable it. People have to know to set a magic -global to make
> > > it non-broken
> >
> > Incidentally I found that originally the TCO watchdog was not
> > unconditionally enabled. Its exposure to the guest could be
> > turned on/off using
> >
> > -global ICH9-LPC.enable_tco=bool
> >
> > This was implemented for machine type compat, but it also gave
> > apps a way to disable the watchdog functionality. Unfortunately
> > that ability was discarded in this series:
> >
> > https://lore.kernel.org/all/1453564933-29638-1-git-send-email-ehabkost@redhat.com/
> >
> > but the 'enable_tco' property still exists in QOM, but silently
> > ignored.
> >
> > Seems we should either fix the impl of 'enable_tco', or remove the
> > QOM property entirely, so we don't pretend it can be toggled anymore.
> >
> > With regards,
> > Daniel
>
> i am inclined to say you are right and the fix is to fix the impl.
Is there need for users to disable whatchdog at all?
It was always present since then and no one complained,
so perhaps we should ditch property instead fixing it
to keep it simple.
>
> > --
> > |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
> > |: https://libvirt.org -o- https://fstop138.berrange.com :|
> > |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 0/4] hw: make TCO watchdog actually work by default for Q35
2022-11-01 12:57 ` Igor Mammedov
@ 2022-11-01 13:03 ` Daniel P. Berrangé
2022-11-10 16:29 ` Michael S. Tsirkin
0 siblings, 1 reply; 17+ messages in thread
From: Daniel P. Berrangé @ 2022-11-01 13:03 UTC (permalink / raw)
To: Igor Mammedov
Cc: Michael S. Tsirkin, qemu-devel, Laurent Vivier, Richard Henderson,
Ani Sinha, Thomas Huth, Paolo Bonzini, Eduardo Habkost,
Marcel Apfelbaum, Richard W.M. Jones
On Tue, Nov 01, 2022 at 01:57:24PM +0100, Igor Mammedov wrote:
> On Mon, 31 Oct 2022 11:48:58 -0400
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
>
> > On Mon, Oct 31, 2022 at 01:50:24PM +0000, Daniel P. Berrangé wrote:
> > > On Mon, Oct 31, 2022 at 01:19:30PM +0000, Daniel P. Berrangé wrote:
> > > > The TCO watchdog is unconditionally integrated into the Q35 machine
> > > > type by default, but at the same time is unconditionally disabled
> > > > from firing by a host config option that overrides guest OS attempts
> > > > to enable it. People have to know to set a magic -global to make
> > > > it non-broken
> > >
> > > Incidentally I found that originally the TCO watchdog was not
> > > unconditionally enabled. Its exposure to the guest could be
> > > turned on/off using
> > >
> > > -global ICH9-LPC.enable_tco=bool
> > >
> > > This was implemented for machine type compat, but it also gave
> > > apps a way to disable the watchdog functionality. Unfortunately
> > > that ability was discarded in this series:
> > >
> > > https://lore.kernel.org/all/1453564933-29638-1-git-send-email-ehabkost@redhat.com/
> > >
> > > but the 'enable_tco' property still exists in QOM, but silently
> > > ignored.
> > >
> > > Seems we should either fix the impl of 'enable_tco', or remove the
> > > QOM property entirely, so we don't pretend it can be toggled anymore.
> > >
> > > With regards,
> > > Daniel
> >
> > i am inclined to say you are right and the fix is to fix the impl.
>
> Is there need for users to disable whatchdog at all?
> It was always present since then and no one complained,
> so perhaps we should ditch property instead fixing it
> to keep it simple.
Thinking about it more, I think we should NOT fix the 'enable_tco' property,
because there will be no way for a mgmt appp to tell if they're using a
fixed or broken QEMU. So if they use 'enable_tco' on a broken QEMU and then
live migrate, they'll get an guest ABI change. If we did want to support
disabling it, then we should have a brand new property that apps can probe
for.
In the absence of a request to disable watchdog, I'd say we just delete
'enable_tco' right now. If someone wants it in future, we can add it with
a new name.
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 0/4] hw: make TCO watchdog actually work by default for Q35
2022-11-01 13:03 ` Daniel P. Berrangé
@ 2022-11-10 16:29 ` Michael S. Tsirkin
2022-11-10 18:21 ` Daniel P. Berrangé
0 siblings, 1 reply; 17+ messages in thread
From: Michael S. Tsirkin @ 2022-11-10 16:29 UTC (permalink / raw)
To: Daniel P. Berrangé
Cc: Igor Mammedov, qemu-devel, Laurent Vivier, Richard Henderson,
Ani Sinha, Thomas Huth, Paolo Bonzini, Eduardo Habkost,
Marcel Apfelbaum, Richard W.M. Jones
On Tue, Nov 01, 2022 at 01:03:17PM +0000, Daniel P. Berrangé wrote:
> On Tue, Nov 01, 2022 at 01:57:24PM +0100, Igor Mammedov wrote:
> > On Mon, 31 Oct 2022 11:48:58 -0400
> > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >
> > > On Mon, Oct 31, 2022 at 01:50:24PM +0000, Daniel P. Berrangé wrote:
> > > > On Mon, Oct 31, 2022 at 01:19:30PM +0000, Daniel P. Berrangé wrote:
> > > > > The TCO watchdog is unconditionally integrated into the Q35 machine
> > > > > type by default, but at the same time is unconditionally disabled
> > > > > from firing by a host config option that overrides guest OS attempts
> > > > > to enable it. People have to know to set a magic -global to make
> > > > > it non-broken
> > > >
> > > > Incidentally I found that originally the TCO watchdog was not
> > > > unconditionally enabled. Its exposure to the guest could be
> > > > turned on/off using
> > > >
> > > > -global ICH9-LPC.enable_tco=bool
> > > >
> > > > This was implemented for machine type compat, but it also gave
> > > > apps a way to disable the watchdog functionality. Unfortunately
> > > > that ability was discarded in this series:
> > > >
> > > > https://lore.kernel.org/all/1453564933-29638-1-git-send-email-ehabkost@redhat.com/
> > > >
> > > > but the 'enable_tco' property still exists in QOM, but silently
> > > > ignored.
> > > >
> > > > Seems we should either fix the impl of 'enable_tco', or remove the
> > > > QOM property entirely, so we don't pretend it can be toggled anymore.
> > > >
> > > > With regards,
> > > > Daniel
> > >
> > > i am inclined to say you are right and the fix is to fix the impl.
> >
> > Is there need for users to disable whatchdog at all?
> > It was always present since then and no one complained,
> > so perhaps we should ditch property instead fixing it
> > to keep it simple.
>
> Thinking about it more, I think we should NOT fix the 'enable_tco' property,
> because there will be no way for a mgmt appp to tell if they're using a
> fixed or broken QEMU.
This is always the case for any bug. We don't as a rule add properties
for this, it is distro's responsibility to pick bugfixes for
features users care about.
What makes this bug different?
> So if they use 'enable_tco' on a broken QEMU and then
> live migrate, they'll get an guest ABI change. If we did want to support
> disabling it, then we should have a brand new property that apps can probe
> for.
>
> In the absence of a request to disable watchdog, I'd say we just delete
> 'enable_tco' right now. If someone wants it in future, we can add it with
> a new name.
>
> With regards,
> Daniel
I am really stressed about watchdog firing and resetting VMs that previously
were working fine. Adding this without a safety mechanism to quickly
disable in case of problems in the field is not wise imho.
> --
> |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o- https://fstop138.berrange.com :|
> |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 0/4] hw: make TCO watchdog actually work by default for Q35
2022-10-31 13:19 [PATCH 0/4] hw: make TCO watchdog actually work by default for Q35 Daniel P. Berrangé
` (4 preceding siblings ...)
2022-10-31 13:50 ` [PATCH 0/4] hw: make TCO watchdog actually work by default for Q35 Daniel P. Berrangé
@ 2022-11-10 16:30 ` Michael S. Tsirkin
5 siblings, 0 replies; 17+ messages in thread
From: Michael S. Tsirkin @ 2022-11-10 16:30 UTC (permalink / raw)
To: Daniel P. Berrangé
Cc: qemu-devel, Laurent Vivier, Richard Henderson, Ani Sinha,
Igor Mammedov, Thomas Huth, Paolo Bonzini, Eduardo Habkost,
Marcel Apfelbaum, Richard W.M. Jones
On Mon, Oct 31, 2022 at 01:19:30PM +0000, Daniel P. Berrangé wrote:
> The TCO watchdog is unconditionally integrated into the Q35 machine
> type by default, but at the same time is unconditionally disabled
> from firing by a host config option that overrides guest OS attempts
> to enable it. People have to know to set a magic -global to make
> it non-broken
>
> IOW we're exposing a broken watchdog by default to all Q35 machines,
> but which to the guest OS & its apps looks fully functional :-(
>
> This behaviour was set in response to feedback from Michael:
>
> https://lists.gnu.org/archive/html/qemu-devel/2015-06/msg07128.html
>
> "I think sample high is a safer default."
>
> but as explained in the commit message in the last patch, I think the
> watchdog defaults were already safe without that pin strap setting.
> The guest OS needs to take explicit action to clear the guest visible
> 'no reboot' flag, and so we don't need a second guest hidden 'no reboot'
> flag to override that choice IMHO. Am I missing something ?
>
> NB, I'm toggling this for 7.2 machine type since that's the current
> git latest machine. Since this has already been "broken" for 7 years
> though, I am ambivalent about whether we try todo this for 7.2, vs
> just wait until the 8.0 machine types arrive.
So I expect v2 with minor issues fixed and hopefully a
fixed property (we can debate removing it down the road
if we want).
> Daniel P. Berrangé (4):
> hw/acpi: add trace events for TCO watchdog register access
> hw/isa: add trace events for ICH9 LPC chip config access
> hw/watchdog: add trace events for watchdog action handling
> hw/isa: enable TCO watchdog reboot pin strap by default
>
> hw/acpi/tco.c | 41 +++++++++++++++++++++++++++-------------
> hw/acpi/trace-events | 2 ++
> hw/i386/pc.c | 4 +++-
> hw/isa/lpc_ich9.c | 5 ++++-
> hw/isa/trace-events | 4 ++++
> hw/watchdog/trace-events | 4 ++++
> hw/watchdog/watchdog.c | 4 ++++
> tests/qtest/tco-test.c | 2 +-
> 8 files changed, 50 insertions(+), 16 deletions(-)
>
> --
> 2.37.3
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 0/4] hw: make TCO watchdog actually work by default for Q35
2022-11-10 16:29 ` Michael S. Tsirkin
@ 2022-11-10 18:21 ` Daniel P. Berrangé
0 siblings, 0 replies; 17+ messages in thread
From: Daniel P. Berrangé @ 2022-11-10 18:21 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Igor Mammedov, qemu-devel, Laurent Vivier, Richard Henderson,
Ani Sinha, Thomas Huth, Paolo Bonzini, Eduardo Habkost,
Marcel Apfelbaum, Richard W.M. Jones
On Thu, Nov 10, 2022 at 11:29:21AM -0500, Michael S. Tsirkin wrote:
> On Tue, Nov 01, 2022 at 01:03:17PM +0000, Daniel P. Berrangé wrote:
> > On Tue, Nov 01, 2022 at 01:57:24PM +0100, Igor Mammedov wrote:
> > > On Mon, 31 Oct 2022 11:48:58 -0400
> > > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > >
> > > > On Mon, Oct 31, 2022 at 01:50:24PM +0000, Daniel P. Berrangé wrote:
> > > > > On Mon, Oct 31, 2022 at 01:19:30PM +0000, Daniel P. Berrangé wrote:
> > > > > > The TCO watchdog is unconditionally integrated into the Q35 machine
> > > > > > type by default, but at the same time is unconditionally disabled
> > > > > > from firing by a host config option that overrides guest OS attempts
> > > > > > to enable it. People have to know to set a magic -global to make
> > > > > > it non-broken
> > > > >
> > > > > Incidentally I found that originally the TCO watchdog was not
> > > > > unconditionally enabled. Its exposure to the guest could be
> > > > > turned on/off using
> > > > >
> > > > > -global ICH9-LPC.enable_tco=bool
> > > > >
> > > > > This was implemented for machine type compat, but it also gave
> > > > > apps a way to disable the watchdog functionality. Unfortunately
> > > > > that ability was discarded in this series:
> > > > >
> > > > > https://lore.kernel.org/all/1453564933-29638-1-git-send-email-ehabkost@redhat.com/
> > > > >
> > > > > but the 'enable_tco' property still exists in QOM, but silently
> > > > > ignored.
> > > > >
> > > > > Seems we should either fix the impl of 'enable_tco', or remove the
> > > > > QOM property entirely, so we don't pretend it can be toggled anymore.
> > > > >
> > > > > With regards,
> > > > > Daniel
> > > >
> > > > i am inclined to say you are right and the fix is to fix the impl.
> > >
> > > Is there need for users to disable whatchdog at all?
> > > It was always present since then and no one complained,
> > > so perhaps we should ditch property instead fixing it
> > > to keep it simple.
> >
> > Thinking about it more, I think we should NOT fix the 'enable_tco' property,
> > because there will be no way for a mgmt appp to tell if they're using a
> > fixed or broken QEMU.
>
> This is always the case for any bug. We don't as a rule add properties
> for this, it is distro's responsibility to pick bugfixes for
> features users care about.
>
> What makes this bug different?
It is impossible to safely use the enable_tco property because it
will break ABI across live migrations between old and new QEMU.
If it was a recent bug I'd be ok with just fixing it. Since this
property has been broken for 6 years though, IMHO it just needs
to be deleted entirely.
> > So if they use 'enable_tco' on a broken QEMU and then
> > live migrate, they'll get an guest ABI change. If we did want to support
> > disabling it, then we should have a brand new property that apps can probe
> > for.
> >
> > In the absence of a request to disable watchdog, I'd say we just delete
> > 'enable_tco' right now. If someone wants it in future, we can add it with
> > a new name.
>
> I am really stressed about watchdog firing and resetting VMs that previously
> were working fine. Adding this without a safety mechanism to quickly
> disable in case of problems in the field is not wise imho.
We would only toggle the ICH9-LPC.noreboot flag for new machine types, so
wouldn't affect existing VMs, and if anyone has a problem they can explicitly
set ICH9-LPC.noreboot back to true, or pick the older machine type again.
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2022-11-10 18:22 UTC | newest]
Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-10-31 13:19 [PATCH 0/4] hw: make TCO watchdog actually work by default for Q35 Daniel P. Berrangé
2022-10-31 13:19 ` [PATCH 1/4] hw/acpi: add trace events for TCO watchdog register access Daniel P. Berrangé
2022-10-31 13:34 ` Richard W.M. Jones
2022-10-31 13:19 ` [PATCH 2/4] hw/isa: add trace events for ICH9 LPC chip config access Daniel P. Berrangé
2022-10-31 13:36 ` Richard W.M. Jones
2022-10-31 13:19 ` [PATCH 3/4] hw/watchdog: add trace events for watchdog action handling Daniel P. Berrangé
2022-10-31 13:36 ` Richard W.M. Jones
2022-10-31 15:46 ` Philippe Mathieu-Daudé
2022-10-31 13:19 ` [PATCH 4/4] hw/isa: enable TCO watchdog reboot pin strap by default Daniel P. Berrangé
2022-10-31 13:40 ` Richard W.M. Jones
2022-10-31 13:50 ` [PATCH 0/4] hw: make TCO watchdog actually work by default for Q35 Daniel P. Berrangé
2022-10-31 15:48 ` Michael S. Tsirkin
2022-11-01 12:57 ` Igor Mammedov
2022-11-01 13:03 ` Daniel P. Berrangé
2022-11-10 16:29 ` Michael S. Tsirkin
2022-11-10 18:21 ` Daniel P. Berrangé
2022-11-10 16:30 ` Michael S. Tsirkin
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).