public inbox for linux-kernel@vger.kernel.org
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
To: Igor Mammedov <imammedo@redhat.com>
Cc: Jonathan Cameron <Jonathan.Cameron@Huawei.com>,
	Shiju Jose <shiju.jose@huawei.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	Ani Sinha <anisinha@redhat.com>,
	Dongjiu Geng <gengdongjiu1@gmail.com>,
	<linux-kernel@vger.kernel.org>, <qemu-arm@nongnu.org>,
	<qemu-devel@nongnu.org>
Subject: Re: [PATCH v8 06/13] acpi/ghes: add support for generic error injection via QAPI
Date: Fri, 13 Sep 2024 07:20:25 +0200
Message-ID: <20240913072025.76a329b0@foz.lan>
In-Reply-To: <20240912144233.675d6b63@imammedo.users.ipa.redhat.com>

On Thu, 12 Sep 2024 14:42:33 +0200
Igor Mammedov <imammedo@redhat.com> wrote:

> On Wed, 11 Sep 2024 16:34:36 +0100
> Jonathan Cameron <Jonathan.Cameron@Huawei.com> wrote:
> 
> > On Wed, 11 Sep 2024 15:21:32 +0200
> > Igor Mammedov <imammedo@redhat.com> wrote:
> > 
> > > On Sun, 25 Aug 2024 05:29:23 +0200
> > > Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> > >   
> > > > On Mon, 19 Aug 2024 14:51:36 +0200
> > > > Igor Mammedov <imammedo@redhat.com> wrote:
> > > >     
> > > > > > +        read_ack = 1;
> > > > > > +        cpu_physical_memory_write(read_ack_start_addr,
> > > > > > +                                  &read_ack, (uint64_t));        
> > > > > we don't do this for SEV so, why are you setting it to 1 here?    

The quoted diff hunk doesn't give enough context here. The full code is:

    /* zero means OSPM does not acknowledge the error */
    if (!read_ack) {
        error_setg(errp,
                   "Last CPER record was not acknowledged yet");
        read_ack = 1;
        cpu_physical_memory_write(read_ack_start_addr,
                                  &read_ack, sizeof(read_ack));
        return;
    }

> > > what you are doing here by setting read_ack = 1,
> > > is making ack on behalf of OSPM when OSPM haven't handled existing error yet.
> > > 
> > > Essentially making HW/FW do the job of OSPM. That looks wrong to me.
> > > From HW/FW side read_ack register should be thought as read-only.  
> > 
> > It's not read-only because HW/FW has to clear it so that HW/FW can detect
> > when the OSPM next writes it.
> 
> By readonly, I've meant that hw shall not do above mentioned write
> (bad phrasing on my side).

The code above is actually an error-handling path: if errors are
triggered too fast for some reason, or there is a bug in QEMU or in
the OSPM, an error message is raised and the logic resets the record
to a sane state, so that OSPM will receive the next error.

As described at https://uefi.org/specs/ACPI/6.5/18_Platform_Error_Interfaces.html?highlight=asynchronous#generic-hardware-error-source:

   "Some platforms may describe multiple Generic Hardware Error Source
    structures with different notification types, as defined in 
    Table 18.10. For example, a platform may describe one error source
    for the handling of synchronous errors (e.g. MCE or SEA), and a 
    second source for handling asynchronous errors (e.g. SCI or
    External Interrupt)."

Basically, the error-handling logic above seems to fit the asynchronous
case: it detects that another error happened before OSPM handled the
first one.

IMO, there are a few alternatives for handling such a case:

1. Keep the code as-is: if this ever happens, an error message will
   be issued. If SEA/MCE gets implemented synchronously on HW/FW/OSPM,
   the above code will never be called;
2. Change the logic to do that only for asynchronous sources
   (currently, only if the source ID is QMP);
3. Add a special QMP message to reset the notification ack. It would
   probably take the notification type as an input parameter;
4. Write much more complex code to implement asynchronous notifications,
   with a queue to receive HEST errors and a separate thread to deliver
   errors to OSPM asynchronously. If we go this way, QMP would
   return the number of error messages queued, allowing the error
   injection code to know whether there is trouble delivering errors
   to OSPM;
5. Just return an error code without doing any resets. To me, this is
   the worst scenario.

I don't like (5): if something bad happens, there's nothing that can
be done to recover.

For QMP error injection, (4) seems to be overkill. It may be needed in
the future if we end up implementing logic where the host OS informs the
guest about hardware problems, and such errors use asynchronous
notifications.

I would also avoid implementing (3), at least for now, as reporting
such an error via QMP seems enough for the QMP use case.

So, if that's OK with you, I'll change the code to (2).
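
To make (2) more concrete, here is a rough, standalone sketch of the
idea. It is not the code from the patch: the enum, the struct and the
function names are hypothetical, and the Read Ack Register is modeled
as a plain variable instead of guest memory accessed via
cpu_physical_memory_read()/cpu_physical_memory_write(). It only
illustrates that the forced reset of read_ack would happen for
asynchronous sources, while synchronous ones would just report the
error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical source IDs; the real enum in the series may differ. */
enum ghes_source_id {
    GHES_SRC_SEA,   /* synchronous notification */
    GHES_SRC_QMP,   /* asynchronous notification, injected via QMP */
};

/* Stand-in for the Read Ack Register that really lives in guest memory. */
struct ghes_state {
    uint64_t read_ack;
};

/* Assumption: for now, QMP is the only asynchronous source. */
static bool ghes_source_is_async(enum ghes_source_id id)
{
    return id == GHES_SRC_QMP;
}

/*
 * Returns true if a new CPER record may be written for this source.
 *
 * On success, read_ack is cleared so that the next OSPM ack can be
 * detected. If the previous record was not acknowledged yet, the call
 * fails; read_ack is forced back to 1 only for asynchronous sources,
 * so that the next injection attempt can go through.
 */
static bool ghes_begin_record(struct ghes_state *st, enum ghes_source_id id)
{
    assert(st != NULL);

    /* zero means OSPM did not acknowledge the last error */
    if (!st->read_ack) {
        if (ghes_source_is_async(id)) {
            st->read_ack = 1;
        }
        return false;
    }

    st->read_ack = 0;
    return true;
}
```

With this model, a second QMP injection issued before the guest acks
fails but leaves the register in a sane state for a third attempt,
while a synchronous source hitting the same condition fails without
touching the register.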


> > Agreed this write to 1 looks wrong, but the one a few lines further down (to zero
> > it) is correct.
> 
> yep, hw should clear register.
> It would be better to do so on OSPM ACK, but alas we can't intercept that,
> so the next option would be to do that at the time when we add a new error block
> 
> > 
> > My bug a long time back I think.
> > 
> > Jonathan
> > 
> > >   
> > > > 
> > > > IMO, this is needed, independently of the notification mechanism.
> > > > 
> > > > Regards,
> > > > Mauro
> > > >     
> > > 
> > >   
> > 
> 



Thanks,
Mauro
