Date: Tue, 17 Sep 2024 14:15:19 +0200
From: Igor Mammedov
To: Mauro Carvalho Chehab
Cc: Jonathan Cameron, Shiju Jose, "Michael S. Tsirkin", Ani Sinha, Cleber Rosa, Dongjiu Geng, Eric Blake, John Snow, Markus Armbruster, Michael Roth, Paolo Bonzini, Peter Maydell, Shannon Zhao, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, qemu-arm@nongnu.org, qemu-devel@nongnu.org
Subject: Re: [PATCH v10 00/21] Add ACPI CPER firmware first error injection on ARM emulation
Message-ID: <20240917141519.57766bb6@imammedo.users.ipa.redhat.com>

On Sat, 14 Sep 2024 08:13:21 +0200
Mauro Carvalho Chehab wrote:

> This series adds support for injecting generic CPER records. Such records
> are generated outside QEMU via a provided script.
>
> In this version, the patch reworking the way offsets are calculated was
> split into several patches, to make one logical change per patch and
> make review easier.
>
> Despite the number of patches increasing from 12 to 21, there is just one
> real new patch (the other ones were split out of one big change):
>
> acpi/generic_event_device: Update GHES migration to cover hest addr

I'm done with this round of review.

Given that the series has accumulated a bunch of cleanups, I'd suggest
moving all cleanups/renamings not related to the new HEST lookup and the
new source ID mapping to the beginning of the series. That way, once they
are reviewed, they can be split out into a separate series and merged
while we are ironing out the new functionality.

> ---
>
> v10:
> - Patch 1 split into several patches to make reviews easier;
> - Added a migration patch;
> - CPER QMP command was renamed;
> - Updated some comments to better reflect the exact ACPI version;
> - Removed code that reset acks when OSPM fails to read records;
> - Removed a duplicated config GHES_CPER symbol;
> - There is now an arch-independent namespace for GHES source IDs;
> - Fixed the size of the hest_ghes_notify array when creating tables;
> - acpi-hest.json is now a section of ACPI;
> - QMP command renamed from @ghes-cper to inject-ghes-error.
>
> v9:
> - Patches reorganized to make review easier;
> - Source ID is now guest-OS specific;
> - Some patches got a revision history since v8;
> - Several minor cleanups.
>
> v8:
> - Fixed one of the BIOS links that was incorrect;
> - Changed mem error internal injection to use common code;
> - No more hardcoded values for CPER: instead of using just the
>   payload at the QAPI, it now has the full raw CPER there;
> - Error injection script now supports changing fields in the
>   Generic Error Data section of the CPER;
> - Several minor cleanups.
>
> v7:
> - Change the way offsets are calculated and used in the HEST table.
>   Now, it is compatible with migration, as all offsets are relative
>   to the HEST table;
> - GHES interface is now more generic: the entire CPER is sent via
>   QMP, instead of just the payload;
> - Some code cleanups to make the code more robust;
> - The Python script now uses the QEMUMonitorProtocol class.
>
> v6:
> - PNP0C33 device creation moved to aml-build.c;
> - acpi_ghes record functions now use the ACPI notify parameter
>   instead of the source ID;
> - The number of source IDs is now calculated automatically;
> - Some code cleanups and function/variable renames;
> - Some fixes and cleanups in the error injection script;
> - The GHES CPER stub now produces an error if the CPER JSON is not compiled;
> - Offset calculation logic for GHES was refactored;
> - Updated documentation to reflect the GHES allocated size;
> - Added an x-mpidr object for QOM usage;
> - Added a patch making use of the x-mpidr field in the ARM injection
>   script;
>
> v5:
> - CPER GUID is now passed as a string;
> - raw-data is now passed base64-encoded;
> - Removed several GPIO leftovers from arm/virt.c changes;
> - Lots of cleanups and improvements to the error injection script.
>   It now handles the QMP dialog better and doesn't print debug messages.
>   Also, the code was split into two modules, to make it easier to add
>   more error injection commands.
>
> v4:
> - CPER generation moved to happen outside QEMU;
> - One patch adding support for mpidr query was removed.
>
> v3:
> - Patch 1 cleanups with some comment changes, plus another place where
>   the poweroff GPIO define should be used. No changes to other patches
>   (except due to conflict resolution).
>
> v2:
> - Added a new patch using a define for the GPIO power pin;
> - Patch 2 changed to also use a define for the generic error GPIO pin;
> - A couple of cleanups in patch 2 removing unneeded else clauses.
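Per the v7, v8 and v10 notes, the whole raw CPER record now travels over QMP through the renamed inject-ghes-error command, and v5 notes that the raw data is base64-encoded. A minimal Python sketch of what such a request could look like, using only `json` and `base64`: the `{"execute": ..., "arguments": ...}` envelope is standard QMP, but the argument name `cper` and the byte order of the parts (status block, then data entry, then payload) are assumptions, and `build_inject_request` is a hypothetical helper. The series' own scripts/qmp_helper.py and the QEMUMonitorProtocol class handle the real dialog.

```python
import base64
import json


def build_inject_request(status_block: bytes, data_entry: bytes,
                         payload: bytes) -> str:
    """Serialize one raw CPER record into a QMP request (hypothetical shape)."""
    raw_cper = status_block + data_entry + payload
    request = {
        "execute": "inject-ghes-error",
        # Assumption: the raw record travels base64-encoded in an argument
        # named "cper"; check qapi/acpi-hest.json for the real schema.
        "arguments": {"cper": base64.b64encode(raw_cper).decode("ascii")},
    }
    return json.dumps(request)


# Zero-filled stand-ins sized like the example record (20 + 72 + 72 bytes).
req = build_inject_request(bytes(20), bytes(72), bytes(72))
print(req)
```

Keeping the whole record opaque on the wire is what lets CPER generation live entirely outside QEMU, as the v4 note describes.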
>
> Example of generating a CPER record:
>
> $ scripts/ghes_inject.py -d arm -p 0xdeadbeef
> GUID: e19e3d16-bc11-11e4-9caa-c2051d5d46b0
> Generic Error Status Block (20 bytes):
> 00000000 01 00 00 00 00 00 00 00 00 00 00 00 90 00 00 00 ................
> 00000010 00 00 00 00 ....
>
> Generic Error Data Entry (72 bytes):
> 00000000 16 3d 9e e1 11 bc e4 11 9c aa c2 05 1d 5d 46 b0 .=...........]F.
> 00000010 00 00 00 00 00 03 00 00 48 00 00 00 00 00 00 00 ........H.......
> 00000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
> 00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
> 00000040 00 00 00 00 00 00 00 00 ........
>
> Payload (72 bytes):
> 00000000 05 00 00 00 01 00 00 00 48 00 00 00 00 00 00 00 ........H.......
> 00000010 00 00 00 80 00 00 00 00 10 05 0f 00 00 00 00 00 ................
> 00000020 00 00 00 00 00 00 00 00 00 20 14 00 02 01 00 03 ......... ......
> 00000030 0f 00 91 00 00 00 00 00 ef be ad de 00 00 00 00 ................
> 00000040 ef be ad de 00 00 00 00 ........
>
> Error injected.
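To make the dump easier to follow, here is a minimal Python sketch (not part of the series) that decodes the Generic Error Status Block and the Generic Error Data Entry header above with `struct`. It assumes the little-endian layouts from the ACPI specification: Block Status, Raw Data Offset, Raw Data Length, Data Length and Error Severity for the status block; Section Type, Error Severity, Revision, Validation Bits, Flags, Error Data Length, FRU Id, FRU Text and Timestamp for the 72-byte data entry. The helper names are hypothetical.

```python
import struct
import uuid

# Bytes copied from the "Generic Error Status Block (20 bytes)" dump.
STATUS_BLOCK = bytes.fromhex(
    "01 00 00 00 00 00 00 00 00 00 00 00 90 00 00 00 00 00 00 00")

# First 28 bytes of the "Generic Error Data Entry (72 bytes)" dump; the
# remaining 44 bytes (FRU Id, FRU Text, Timestamp) are all zero there.
DATA_ENTRY = bytes.fromhex(
    "16 3d 9e e1 11 bc e4 11 9c aa c2 05 1d 5d 46 b0 "
    "00 00 00 00 00 03 00 00 48 00 00 00") + bytes(44)

# CPER error severity encoding.
SEVERITY = {0: "recoverable", 1: "fatal", 2: "corrected", 3: "none"}


def parse_status_block(raw: bytes) -> dict:
    """Decode a 20-byte Generic Error Status Block (five little-endian u32)."""
    block_status, raw_off, raw_len, data_len, severity = struct.unpack("<5I", raw)
    return {
        "block_status": block_status,
        "raw_data_offset": raw_off,
        "raw_data_length": raw_len,
        "data_length": data_len,
        "error_severity": severity,
    }


def parse_data_entry_header(raw: bytes) -> dict:
    """Decode a 72-byte Generic Error Data Entry header."""
    (section_type, severity, revision, validation_bits, flags,
     error_data_length, _fru_id, _fru_text, timestamp) = struct.unpack(
        "<16sIHBBI16s20sQ", raw)
    return {
        # The section type GUID is stored in mixed-endian (EFI) byte order.
        "section_type": str(uuid.UUID(bytes_le=section_type)),
        "error_severity": severity,
        "revision": revision,
        "validation_bits": validation_bits,
        "flags": flags,
        "error_data_length": error_data_length,
        "timestamp": timestamp,
    }


status = parse_status_block(STATUS_BLOCK)
entry = parse_data_entry_header(DATA_ENTRY)
print(status)
print(entry, SEVERITY[entry["error_severity"]])
```

The decoded values line up with the script output: Data Length 0x90 (144) covers the 72-byte data entry plus the 72-byte payload, the section type decodes to the GUID printed on the first line, and severity 0 matches the guest's "event severity: recoverable".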
>
> [ 9.358364] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
> [ 9.359027] {1}[Hardware Error]: event severity: recoverable
> [ 9.359586] {1}[Hardware Error]: Error 0, type: recoverable
> [ 9.360124] {1}[Hardware Error]: section_type: ARM processor error
> [ 9.360561] {1}[Hardware Error]: MIDR: 0x00000000000f0510
> [ 9.361160] {1}[Hardware Error]: Multiprocessor Affinity Register (MPIDR): 0x0000000080000000
> [ 9.361643] {1}[Hardware Error]: running state: 0x0
> [ 9.362142] {1}[Hardware Error]: Power State Coordination Interface state: 0
> [ 9.362682] {1}[Hardware Error]: Error info structure 0:
> [ 9.363030] {1}[Hardware Error]: num errors: 2
> [ 9.363656] {1}[Hardware Error]: error_type: 0x02: cache error
> [ 9.364163] {1}[Hardware Error]: error_info: 0x000000000091000f
> [ 9.364834] {1}[Hardware Error]: transaction type: Data Access
> [ 9.365599] {1}[Hardware Error]: cache error, operation type: Data write
> [ 9.366441] {1}[Hardware Error]: cache level: 2
> [ 9.367005] {1}[Hardware Error]: processor context not corrupted
> [ 9.367753] {1}[Hardware Error]: physical fault address: 0x00000000deadbeef
> [ 9.374267] Memory failure: 0xdeadb: recovery action for free buddy page: Recovered
>
> This script currently supports the ARM processor error CPER, but it can
> easily be extended to other GHES notification types.
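The guest log can be cross-checked by hand. The hypothetical helper below (plain Python, not part of the series) splits the reported MPIDR into the Armv8 MPIDR_EL1 fields (Aff0 in bits [7:0], Aff1 [15:8], Aff2 [23:16], Aff3 [39:32], MT at bit 24, U at bit 30, bit 31 RES1) and derives the page frame number from the physical fault address, assuming 4 KiB guest pages.

```python
# Values copied from the guest kernel log above.
MPIDR = 0x0000000080000000
PHYS_ADDR = 0x00000000DEADBEEF
PAGE_SHIFT = 12  # assumption: 4 KiB guest pages


def mpidr_fields(mpidr: int) -> dict:
    """Split an Armv8 MPIDR_EL1 value into its affinity and flag fields."""
    return {
        "aff0": mpidr & 0xFF,
        "aff1": (mpidr >> 8) & 0xFF,
        "aff2": (mpidr >> 16) & 0xFF,
        "aff3": (mpidr >> 32) & 0xFF,
        "mt": (mpidr >> 24) & 1,    # multithreading-style affinity
        "u": (mpidr >> 30) & 1,     # uniprocessor system
        "res1": (mpidr >> 31) & 1,  # architecturally reads as 1
    }


fields = mpidr_fields(MPIDR)
pfn = PHYS_ADDR >> PAGE_SHIFT
print(fields, hex(pfn))
```

With these inputs every affinity field is zero, so the record points at the CPU with affinity 0.0.0.0 (bit 31 is only the RES1 bit), and 0xdeadbeef >> 12 gives 0xdeadb, the PFN named in the "Memory failure" line.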
>
> Mauro Carvalho Chehab (21):
>   acpi/ghes: add a firmware file with HEST address
>   acpi/generic_event_device: Update GHES migration to cover hest addr
>   acpi/ghes: get rid of ACPI_HEST_SRC_ID_RESERVED
>   acpi/ghes: simplify acpi_ghes_record_errors() code
>   acpi/ghes: better handle source_id and notification
>   acpi/ghes: Remove a duplicated out of bounds check
>   acpi/ghes: rework the logic to handle HEST source ID
>   acpi/ghes: Change the type for source_id
>   acpi/ghes: Don't hardcode the number of sources on ghes
>   acpi/ghes: make the GHES record generation more generic
>   acpi/ghes: don't crash QEMU if ghes GED is not found
>   acpi/ghes: rename etc/hardware_error file macros
>   acpi/ghes: better name GHES memory error function
>   acpi/ghes: add a notifier to notify when error data is ready
>   acpi/generic_event_device: add an APEI error device
>   arm/virt: Wire up a GED error device for ACPI / GHES
>   qapi/acpi-hest: add an interface to do generic CPER error injection
>   docs: acpi_hest_ghes: fix documentation for CPER size
>   scripts/ghes_inject: add a script to generate GHES error inject
>   target/arm: add an experimental mpidr arm cpu property object
>   scripts/arm_processor_error.py: retrieve mpidr if not filled
>
>  MAINTAINERS | 10 +
>  docs/specs/acpi_hest_ghes.rst | 6 +-
>  hw/acpi/Kconfig | 5 +
>  hw/acpi/aml-build.c | 10 +
>  hw/acpi/generic_event_device.c | 19 +-
>  hw/acpi/ghes-stub.c | 2 +-
>  hw/acpi/ghes.c | 312 +++++++----
>  hw/acpi/ghes_cper.c | 32 ++
>  hw/acpi/ghes_cper_stub.c | 19 +
>  hw/acpi/meson.build | 2 +
>  hw/arm/virt-acpi-build.c | 12 +-
>  hw/arm/virt.c | 19 +-
>  include/hw/acpi/acpi_dev_interface.h | 1 +
>  include/hw/acpi/aml-build.h | 2 +
>  include/hw/acpi/generic_event_device.h | 1 +
>  include/hw/acpi/ghes.h | 37 +-
>  include/hw/arm/virt.h | 2 +
>  qapi/acpi-hest.json | 35 ++
>  qapi/meson.build | 1 +
>  qapi/qapi-schema.json | 1 +
>  scripts/arm_processor_error.py | 388 ++++++++++++++
>  scripts/ghes_inject.py | 51 ++
>  scripts/qmp_helper.py | 702 +++++++++++++++++++++++++
>  target/arm/cpu.c | 1 +
>  target/arm/cpu.h | 1 +
>  target/arm/helper.c | 10 +-
>  target/arm/kvm.c | 3 +-
>  27 files changed, 1552 insertions(+), 132 deletions(-)
>  create mode 100644 hw/acpi/ghes_cper.c
>  create mode 100644 hw/acpi/ghes_cper_stub.c
>  create mode 100644 qapi/acpi-hest.json
>  create mode 100644 scripts/arm_processor_error.py
>  create mode 100755 scripts/ghes_inject.py
>  create mode 100644 scripts/qmp_helper.py