From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
To: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>,
Michael S Tsirkin <mst@redhat.com>,
Shiju Jose <shiju.jose@huawei.com>,
qemu-devel@nongnu.org, Igor Mammedov <imammedo@redhat.com>,
Cleber Rosa <crosa@redhat.com>, John Snow <jsnow@redhat.com>
Subject: Re: [PATCH 06/13] scripts/qmp_helper: add support for a timeout logic
Date: Wed, 21 Jan 2026 16:56:00 +0100
Message-ID: <aXD1Nb4MU2XYI0HL@foz.lan>
In-Reply-To: <20260121123927.00001daa@huawei.com>
On Wed, Jan 21, 2026 at 12:39:27PM +0000, Jonathan Cameron wrote:
> On Wed, 21 Jan 2026 12:25:14 +0100
> Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
>
> > We can't inject a new GHES record to the same source before
> > it has been acked. There is an async mechanism to verify when
> > the Kernel is ready, which is implemented at QEMU's ghes
> > driver.
> >
> > If error inject is too fast, QEMU may return an error. When
> > such errors occur, implement a retry mechanism, based on a
> > maximum timeout.
> >
> > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> A few trivial comments below. Either way this seems fine to me and
> should make the tooling easier to use.
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
>
> > ---
> > scripts/qmp_helper.py | 47 +++++++++++++++++++++++++++++++------------
> > 1 file changed, 34 insertions(+), 13 deletions(-)
> >
> > diff --git a/scripts/qmp_helper.py b/scripts/qmp_helper.py
> > index 40059cd105f6..63f3df2d75c3 100755
> > --- a/scripts/qmp_helper.py
> > +++ b/scripts/qmp_helper.py
> > @@ -14,6 +14,7 @@
> >
> > from datetime import datetime
> > from os import path as os_path
> > +from time import sleep
> >
> > try:
> > qemu_dir = os_path.abspath(os_path.dirname(os_path.dirname(__file__)))
> > @@ -324,7 +325,8 @@ class qmp:
> > Opens a connection and send/receive QMP commands.
> > """
> >
> > -    def send_cmd(self, command, args=None, may_open=False, return_error=True):
> > +    def send_cmd(self, command, args=None, may_open=False, return_error=True,
> > +                 timeout=None):
> >          """Send a command to QMP, optinally opening a connection"""
> >
> >          if may_open:
> > @@ -336,12 +338,31 @@ def send_cmd(self, command, args=None, may_open=False, return_error=True):
> >          if args:
> >              msg['arguments'] = args
> >
> > -        try:
> > -            obj = self.qmp_monitor.cmd_obj(msg)
> > -        # Can we use some other exception class here?
> > -        except Exception as e:  # pylint: disable=W0718
> > -            print(f"Command: {command}")
> > -            print(f"Failed to inject error: {e}.")
> > +        if timeout and timeout > 0:
> > +            attempts = int(timeout * 10)
> > +        else:
> > +            attempts = 1
> > +
> > +        # Try up to attempts
> That reads oddly because of the variable name. Made me ask myself
> "How many attempts?"
> Maybe " Retry up to attempts times" or something like that.
I'll improve the comment. The goal here is to keep retrying for up to
about "timeout" seconds.
That's why we multiply it by 10...
>
> > +        for i in range(0, attempts):
> > +            try:
> > +                obj = self.qmp_monitor.cmd_obj(msg)
> > +
> > +                if obj and "return" in obj and not obj["return"]:
> > +                    break
> > +
> > +            except Exception as e:  # pylint: disable=W0718
> > +                print(f"Command: {command}")
> > +                print(f"Failed to inject error: {e}.")
> > +                obj = None
> > +
> > +            if attempts > 1:
> > +                print(f"Error inject attempt {i + 1}/{attempts} failed.")
> > +
> > +            if i + 1 < attempts:
> > +                sleep(0.1)
... and here, we sleep for 0.1 seconds.
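To make the arithmetic concrete, here is a minimal standalone sketch
(not the patch itself). The names attempts_for(), retry_call() and
sleep_fn are made up for illustration, and the clamp to at least one
attempt is an assumption of this sketch, not something the patch does:

```python
from time import sleep

POLL_INTERVAL = 0.1  # seconds between attempts, same value as the patch


def attempts_for(timeout):
    """Convert a timeout in seconds into a number of attempts."""
    if timeout and timeout > 0:
        # 10 attempts per second; clamp so a tiny timeout still tries once
        # (the clamp is an assumption of this sketch).
        return max(1, int(timeout * 10))
    return 1


def retry_call(func, timeout=None, sleep_fn=sleep):
    """Call func() until it returns something other than None.

    Returns a (result, attempts_used) tuple. sleep_fn is injectable
    only to make the sketch easy to test.
    """
    attempts = attempts_for(timeout)
    for i in range(attempts):
        result = func()
        if result is not None:
            return result, i + 1
        # No point in sleeping after the last attempt.
        if i + 1 < attempts:
            sleep_fn(POLL_INTERVAL)
    return None, attempts
```

With timeout=2 this gives 20 attempts spaced by 100ms, so the loop
gives up after roughly 2 seconds plus the time spent in the calls
themselves.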
>
> Do we care about a sleep at the end? Feels like a micro optimization that
> isn't needed.
This is not a micro-optimization. It is there to ensure that we don't
retry too fast.
What happens is that the QMP interface asks the BIOS to send an async
message to OSPM, clearing an ack register. When the OSPM reads the
error, it writes 1 to the ack register.
If we send messages too fast, the logic in ghes.c will detect that
the ack didn't happen, immediately returning an error code.
In that case, we sleep for 100ms before trying again.
In practice, on my Ryzen 9 machines with QEMU emulating ARM,
even under massive error injection, 99% of the time no retries
happen. The worst case I got here is that sometimes the
kernel got stuck and took between 5s and 10s to accept the error
submission.
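The handshake described above can be modelled with a toy example.
This is only a sketch: FakeGhesSource, inject(), guest_ack() and
inject_with_retry() are invented names that do not exist in QEMU; the
real logic lives in ghes.c and in the guest kernel.

```python
class FakeGhesSource:
    """Toy model of one GHES error source and its ack register."""

    def __init__(self):
        self.ack = 1        # 1: OSPM has consumed the previous record
        self.records = []   # records accepted so far

    def inject(self, record):
        """Model the QEMU side: refuse if the last record is not acked."""
        if self.ack != 1:
            return False    # too fast: previous record still pending
        self.ack = 0        # cleared when a new record is published
        self.records.append(record)
        return True

    def guest_ack(self):
        """Model OSPM: after reading the error, it writes 1 to the ack."""
        self.ack = 1


def inject_with_retry(source, record, attempts, between_attempts):
    """Retry inject(); between_attempts() stands in for sleep(0.1).

    Returns the number of attempts used, or None if all of them failed.
    """
    for i in range(attempts):
        if source.inject(record):
            return i + 1
        if i + 1 < attempts:
            between_attempts()
    return None
```

A second inject() right after the first one fails until guest_ack()
runs, which is the situation the retry loop is meant to ride out.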
>
> > +
> > +        if not obj:
> >              return None
>
>
--
Thanks,
Mauro