From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id D66DFCCFA1A for ; Tue, 11 Nov 2025 10:09:30 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1vIlIn-0003dF-EK; Tue, 11 Nov 2025 05:08:41 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1vIlIG-0003Oa-J3; Tue, 11 Nov 2025 05:08:10 -0500 Received: from frasgout.his.huawei.com ([185.176.79.56]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1vIlID-0005BK-FO; Tue, 11 Nov 2025 05:08:08 -0500 Received: from mail.maildlp.com (unknown [172.18.186.231]) by frasgout.his.huawei.com (SkyGuard) with ESMTPS id 4d5Mdx4T7yzJ46dq; Tue, 11 Nov 2025 18:07:17 +0800 (CST) Received: from dubpeml100005.china.huawei.com (unknown [7.214.146.113]) by mail.maildlp.com (Postfix) with ESMTPS id 4DFF114038F; Tue, 11 Nov 2025 18:07:48 +0800 (CST) Received: from localhost (10.203.177.15) by dubpeml100005.china.huawei.com (7.214.146.113) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1544.36; Tue, 11 Nov 2025 10:07:47 +0000 Date: Tue, 11 Nov 2025 10:07:46 +0000 To: Gavin Shan CC: Igor Mammedov , , , , , , , , , Subject: Re: [PATCH v3 4/8] acpi/ghes: Extend acpi_ghes_memory_errors() to support multiple CPERs Message-ID: <20251111100746.00003136@huawei.com> In-Reply-To: References: <20251105114453.2164073-1-gshan@redhat.com> <20251105114453.2164073-5-gshan@redhat.com> <20251105141455.000052f0@huawei.com> <20251110154957.5c481df7@fedora> X-Mailer: Claws Mail 4.3.0 (GTK 3.24.42; x86_64-w64-mingw32) MIME-Version: 1.0 Content-Type: text/plain; charset="US-ASCII" Content-Transfer-Encoding: 7bit X-Originating-IP: [10.203.177.15] X-ClientProxiedBy: lhrpeml500011.china.huawei.com (7.191.174.215) To dubpeml100005.china.huawei.com (7.214.146.113) Received-SPF: pass client-ip=185.176.79.56; envelope-from=jonathan.cameron@huawei.com; helo=frasgout.his.huawei.com X-Spam_score_int: -41 X-Spam_score: -4.2 X-Spam_bar: ---- X-Spam_report: (-4.2 / 5.0 requ) BAYES_00=-1.9, RCVD_IN_DNSWL_MED=-2.3, RCVD_IN_MSPIKE_H4=0.001, RCVD_IN_MSPIKE_WL=0.001, RCVD_IN_VALIDITY_CERTIFIED_BLOCKED=0.001, RCVD_IN_VALIDITY_RPBL_BLOCKED=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-to: Jonathan Cameron From: Jonathan Cameron via Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org On Tue, 11 Nov 2025 14:08:13 +1000 Gavin Shan wrote: > Hi Igor and Jonathan, > > On 11/11/25 12:49 AM, Igor Mammedov wrote: > > On Thu, 6 Nov 2025 13:15:52 +1000 > > Gavin Shan wrote: > >> On 11/6/25 12:14 AM, Jonathan Cameron wrote: > >>> On Wed, 5 Nov 2025 21:44:49 +1000 > >>> Gavin Shan wrote: > >>> > >>>> In the situation where host and guest has 64KiB and 4KiB page sizes, > >>>> one problematic host page affects 16 guest pages. we need to send 16 > >>>> consective errors in this specific case. > >>>> > >>>> Extend acpi_ghes_memory_errors() to support multiple CPERs after the > >>>> hunk of code to generate the GHES error status is pulled out from > >>>> ghes_gen_err_data_uncorrectable_recoverable(). The status field of > >>>> generic error status block is also updated accordingly if multiple > >>>> error data entries are contained in the generic error status block. > >>>> > >>>> Signed-off-by: Gavin Shan > >>> Hi Gavin, > >>> > >>> Mostly fine, but a few comments on the defines added and a > >>> question on what the multiple things are meant to mean? > >>> > >> > >> Thanks for your review and comments, replies as below. > >> > >>>> diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c > >>>> index a9c08e73c0..527b85c8d8 100644 > >>>> --- a/hw/acpi/ghes.c > >>>> +++ b/hw/acpi/ghes.c > >>>> @@ -57,8 +57,12 @@ > >>>> /* The memory section CPER size, UEFI 2.6: N.2.5 Memory Error Section */ > >>>> #define ACPI_GHES_MEM_CPER_LENGTH 80 > >>>> > >>>> -/* Masks for block_status flags */ > >>>> -#define ACPI_GEBS_UNCORRECTABLE 1 > >>>> +/* Bits for block_status flags */ > >>>> +#define ACPI_GEBS_UNCORRECTABLE 0 > >>>> +#define ACPI_GEBS_CORRECTABLE 1 > >>>> +#define ACPI_GEBS_MULTIPLE_UNCORRECTABLE 2 > >>>> +#define ACPI_GEBS_MULTIPLE_CORRECTABLE 3 > >>> > >>> So this maps to the bits in block status. > >>> > >>> I'm not actually sure what these multiple variants are meant to tell us. > >>> The multiple error blocks example referred to by the spec is a way to represent > >>> the same error applying to multiple places. So that's one error, many blocks. > >>> I have no idea if we set these bits in that case. > >>> > >>> Based on a quick look I don't think linux even takes any notice. THere > >>> are defines in actbl1.h but I'm not seeing any use made of them. > >>> > >> > >> I hope Igor can confirm since it was suggested by him. > >> > >> It's hard to understand how exactly these multiple variants are used from the > >> spec. In ACPI 6.5 Table 18.11, it's explained as below. > >> > >> Bit [2] - Multiple Uncorrectable Errors: If set to one, indicates that more > >> than one uncorrectable errors have been detected. > >> > >> I don't see those multiple variants have been used by Linux. So I think it's > >> safe to drop them. > > > > even though example describes 'same' error at different components, > > the bit fields descriptions doesn't set any limits on what 'more than one' means. > > > > Also from guest POV it's multiple different pages that we are reporting here > > as multiple CPERs. > > It seems to me that setting *_MULTIPLE_* here is correct thing to do. > > > > I don't have strong opinions. Lets keep to set _MULTIPLE_ flag if Jonathan > is fine. Again, this field isn't used by Linux guest. I don't care strongly. Maybe we should ask for a spec clarification as I doubt implementations will be consistent on this given the vague description and that Linux ignores it today. > > >>>> +#define ACPI_GEBS_ERROR_DATA_ENTRIES 4 > >>> > >>> This is bits 4-13 and the define isn't used. I'd drop it. > >>> > >> > >> The definition is used in acpi_ghes_memory_errors() of this patch. However, > >> I don't see it has been used by Linux. This field isn't used by Linux to determine > >> the total number of error entries. So I think I can drop it either if Igor is ok. > >> > > Lets keep this field either in next revision if Jonathan is fine. I'm fine with the field, but not the value. As far as I can tell form the spec, it should be a mask, not a single bit. > > Thanks, > Gavin > > >