Date: Tue, 17 Sep 2024 14:15:19 +0200
From: Igor Mammedov
To: Mauro Carvalho Chehab
Cc: Jonathan Cameron, Shiju Jose, "Michael S. Tsirkin", Ani Sinha,
 Cleber Rosa, Dongjiu Geng, Eric Blake, John Snow, Markus Armbruster,
 Michael Roth, Paolo Bonzini, Peter Maydell, Shannon Zhao,
 kvm@vger.kernel.org, linux-kernel@vger.kernel.org, qemu-arm@nongnu.org,
 qemu-devel@nongnu.org
Subject: Re: [PATCH v10 00/21] Add ACPI CPER firmware first error injection on ARM emulation
Message-ID: <20240917141519.57766bb6@imammedo.users.ipa.redhat.com>
X-Mailing-List: kvm@vger.kernel.org

On Sat, 14 Sep 2024 08:13:21 +0200
Mauro Carvalho Chehab wrote:

> This series adds support for injecting generic CPER records. Such records
> are generated outside QEMU via a provided script.
>
> In this version, the patch reworking the way offsets are calculated was
> split into several patches, to make one logical change per patch and
> make review easier.
>
> Although the number of patches increased from 12 to 21, there is just one
> really new patch (the others are split out of a big change):
>
>   acpi/generic_event_device: Update GHES migration to cover hest addr

I'm done with this round of review.
Given that the series accumulated a bunch of cleanups,
I'd suggest moving all cleanups/renamings not related to the new HEST lookup
and the new src id mapping to the beginning of the series, so that once they
are reviewed they can be split off into a separate series that could be
merged while we are ironing out the new functionality.

> ---
>
> v10:
> - Patch 1 split into several patches to make reviews easier;
> - Added a migration patch;
> - CPER QMP command was renamed;
> - Updated some comments to better reflect exact ACPI version;
> - Removed code that reset acks when OSPM fails to read records;
> - Removed a duplicated config GHES_CPER symbol;
> - There is now an arch-independent namespace for GHES source IDs;
> - Fixed the size of hest_ghes_notify array when creating tables;
> - acpi-hest.json is now a section of ACPI;
> - QMP command renamed from @ghes-cper to inject-ghes-error.
>
> v9:
> - Patches reorganized to make review easier;
> - source ID is now guest-OS specific;
> - Some patches got a revision history since v8;
> - Several minor cleanups.
>
> v8:
> - Fix one of the BIOS links that was incorrect;
> - Changed mem error internal injection to use a common code;
> - No more hardcoded values for CPER: instead of using just the
>   payload at the QAPI, it now has the full raw CPER there;
> - Error injection script now supports changing fields at the
>   Generic Error Data section of the CPER;
> - Several minor cleanups.
>
> v7:
> - Change the way offsets are calculated and used on HEST table.
>   Now, it is compatible with migrations as all offsets are relative
>   to the HEST table;
> - GHES interface is now more generic: the entire CPER is sent via
>   QMP, instead of just the payload;
> - Some code cleanups to make the code more robust;
> - The python script now uses QEMUMonitorProtocol class.
>
> v6:
> - PNP0C33 device creation moved to aml-build.c;
> - acpi_ghes record functions now use ACPI notify parameter,
>   instead of source ID;
> - the number of source IDs is now automatically calculated;
> - some code cleanups and function/var renames;
> - some fixes and cleanups at the error injection script;
> - ghes cper stub now produces an error if cper JSON is not compiled;
> - Offset calculation logic for GHES was refactored;
> - Updated documentation to reflect the GHES allocated size;
> - Added an x-mpidr object for QOM usage;
> - Added a patch making use of the x-mpidr field in the ARM injection
>   script;
>
> v5:
> - CPER guid is now passed as a string;
> - raw-data is now passed with base64 encoding;
> - Removed several GPIO left-overs from arm/virt.c changes;
> - Lots of cleanups and improvements at the error injection script.
>   It now better handles QMP dialog and doesn't print debug messages.
>   Also, code was split into two modules, to make it easier to add more
>   error injection commands.
>
> v4:
> - CPER generation moved to happen outside QEMU;
> - One patch adding support for mpidr query was removed.
>
> v3:
> - patch 1 cleanups with some comment changes and adding another place where
>   the poweroff GPIO define should be used. No changes on other patches (except
>   due to conflict resolution).
>
> v2:
> - added a new patch using a define for GPIO power pin;
> - patch 2 changed to also use a define for generic error GPIO pin;
> - a couple of cleanups at patch 2 removing unneeded else clauses.
>
> Example of generating a CPER record:
>
> $ scripts/ghes_inject.py -d arm -p 0xdeadbeef
> GUID: e19e3d16-bc11-11e4-9caa-c2051d5d46b0
> Generic Error Status Block (20 bytes):
>       00000000  01 00 00 00 00 00 00 00 00 00 00 00 90 00 00 00   ................
>       00000010  00 00 00 00                                       ....
>
> Generic Error Data Entry (72 bytes):
>       00000000  16 3d 9e e1 11 bc e4 11 9c aa c2 05 1d 5d 46 b0   .=...........]F.
>       00000010  00 00 00 00 00 03 00 00 48 00 00 00 00 00 00 00   ........H.......
>       00000020  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   ................
>       00000030  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   ................
>       00000040  00 00 00 00 00 00 00 00                           ........
>
> Payload (72 bytes):
>       00000000  05 00 00 00 01 00 00 00 48 00 00 00 00 00 00 00   ........H.......
>       00000010  00 00 00 80 00 00 00 00 10 05 0f 00 00 00 00 00   ................
>       00000020  00 00 00 00 00 00 00 00 00 20 14 00 02 01 00 03   ......... ......
>       00000030  0f 00 91 00 00 00 00 00 ef be ad de 00 00 00 00   ................
>       00000040  ef be ad de 00 00 00 00                           ........
>
> Error injected.
>
> [    9.358364] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
> [    9.359027] {1}[Hardware Error]: event severity: recoverable
> [    9.359586] {1}[Hardware Error]: Error 0, type: recoverable
> [    9.360124] {1}[Hardware Error]: section_type: ARM processor error
> [    9.360561] {1}[Hardware Error]: MIDR: 0x00000000000f0510
> [    9.361160] {1}[Hardware Error]: Multiprocessor Affinity Register (MPIDR): 0x0000000080000000
> [    9.361643] {1}[Hardware Error]: running state: 0x0
> [    9.362142] {1}[Hardware Error]: Power State Coordination Interface state: 0
> [    9.362682] {1}[Hardware Error]: Error info structure 0:
> [    9.363030] {1}[Hardware Error]: num errors: 2
> [    9.363656] {1}[Hardware Error]: error_type: 0x02: cache error
> [    9.364163] {1}[Hardware Error]: error_info: 0x000000000091000f
> [    9.364834] {1}[Hardware Error]: transaction type: Data Access
> [    9.365599] {1}[Hardware Error]: cache error, operation type: Data write
> [    9.366441] {1}[Hardware Error]: cache level: 2
> [    9.367005] {1}[Hardware Error]: processor context not corrupted
> [    9.367753] {1}[Hardware Error]: physical fault address: 0x00000000deadbeef
> [    9.374267] Memory failure: 0xdeadb: recovery action for free buddy page: Recovered
>
> The script currently supports ARM processor error CPERs, but can easily be
> extended to other GHES notification types.
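For readers following the dump by hand, the two headers in the quoted example can be decoded with a few lines of Python. This is only a sketch: it assumes the Generic Error Status Block and revision 0x300 Generic Error Data Entry layouts from the ACPI specification, and the function and dictionary key names are illustrative, not taken from ghes_inject.py or qmp_helper.py.

```python
import struct
import uuid

# Bytes copied from the two header dumps in the quoted example.
GESB = bytes.fromhex(
    "01000000 00000000 00000000 90000000"
    "00000000"
)
GEDE = bytes.fromhex(
    "163d9ee1 11bce411 9caac205 1d5d46b0"
    "00000000 00030000 48000000 00000000"
) + bytes(40)  # remaining 40 bytes (FRU id, FRU text, timestamp) are zero

# Severity encoding assumed from the ACPI GHES definitions.
SEVERITY = {0: "recoverable", 1: "fatal", 2: "corrected", 3: "none"}


def decode_status_block(raw: bytes) -> dict:
    """Decode a 20-byte Generic Error Status Block (five little-endian u32)."""
    status, raw_off, raw_len, data_len, sev = struct.unpack("<5I", raw)
    return {
        "block_status": status,
        "raw_data_offset": raw_off,
        "raw_data_length": raw_len,
        "data_length": data_len,
        "error_severity": SEVERITY.get(sev, sev),
    }


def decode_data_entry(raw: bytes) -> dict:
    """Decode a 72-byte Generic Error Data Entry header."""
    (guid, sev, revision, valid, flags, data_len,
     _fru_id, _fru_text, _timestamp) = struct.unpack("<16sIHBBI16s20sQ", raw)
    return {
        # CPER GUIDs are stored in mixed-endian form, hence bytes_le.
        "section_type": str(uuid.UUID(bytes_le=guid)),
        "error_severity": SEVERITY.get(sev, sev),
        "revision": revision,
        "validation_bits": valid,
        "flags": flags,
        "error_data_length": data_len,
    }


if __name__ == "__main__":
    print(decode_status_block(GESB))
    print(decode_data_entry(GEDE))
```

The decoded data_length of 144 is the 72-byte data entry plus the 72-byte payload, and the section type matches the GUID printed by the script.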
>
>
> Mauro Carvalho Chehab (21):
>   acpi/ghes: add a firmware file with HEST address
>   acpi/generic_event_device: Update GHES migration to cover hest addr
>   acpi/ghes: get rid of ACPI_HEST_SRC_ID_RESERVED
>   acpi/ghes: simplify acpi_ghes_record_errors() code
>   acpi/ghes: better handle source_id and notification
>   acpi/ghes: Remove a duplicated out of bounds check
>   acpi/ghes: rework the logic to handle HEST source ID
>   acpi/ghes: Change the type for source_id
>   acpi/ghes: Don't hardcode the number of sources on ghes
>   acpi/ghes: make the GHES record generation more generic
>   acpi/ghes: don't crash QEMU if ghes GED is not found
>   acpi/ghes: rename etc/hardware_error file macros
>   acpi/ghes: better name GHES memory error function
>   acpi/ghes: add a notifier to notify when error data is ready
>   acpi/generic_event_device: add an APEI error device
>   arm/virt: Wire up a GED error device for ACPI / GHES
>   qapi/acpi-hest: add an interface to do generic CPER error injection
>   docs: acpi_hest_ghes: fix documentation for CPER size
>   scripts/ghes_inject: add a script to generate GHES error inject
>   target/arm: add an experimental mpidr arm cpu property object
>   scripts/arm_processor_error.py: retrieve mpidr if not filled
>
>  MAINTAINERS                            |  10 +
>  docs/specs/acpi_hest_ghes.rst          |   6 +-
>  hw/acpi/Kconfig                        |   5 +
>  hw/acpi/aml-build.c                    |  10 +
>  hw/acpi/generic_event_device.c         |  19 +-
>  hw/acpi/ghes-stub.c                    |   2 +-
>  hw/acpi/ghes.c                         | 312 +++++++----
>  hw/acpi/ghes_cper.c                    |  32 ++
>  hw/acpi/ghes_cper_stub.c               |  19 +
>  hw/acpi/meson.build                    |   2 +
>  hw/arm/virt-acpi-build.c               |  12 +-
>  hw/arm/virt.c                          |  19 +-
>  include/hw/acpi/acpi_dev_interface.h   |   1 +
>  include/hw/acpi/aml-build.h            |   2 +
>  include/hw/acpi/generic_event_device.h |   1 +
>  include/hw/acpi/ghes.h                 |  37 +-
>  include/hw/arm/virt.h                  |   2 +
>  qapi/acpi-hest.json                    |  35 ++
>  qapi/meson.build                       |   1 +
>  qapi/qapi-schema.json                  |   1 +
>  scripts/arm_processor_error.py         | 388 ++++++++++++++
>  scripts/ghes_inject.py                 |  51 ++
>  scripts/qmp_helper.py                  | 702 +++++++++++++++++++++++++
>  target/arm/cpu.c                       |   1 +
>  target/arm/cpu.h                       |   1 +
>  target/arm/helper.c                    |  10 +-
>  target/arm/kvm.c                       |   3 +-
>  27 files changed, 1552 insertions(+), 132 deletions(-)
>  create mode 100644 hw/acpi/ghes_cper.c
>  create mode 100644 hw/acpi/ghes_cper_stub.c
>  create mode 100644 qapi/acpi-hest.json
>  create mode 100644 scripts/arm_processor_error.py
>  create mode 100755 scripts/ghes_inject.py
>  create mode 100644 scripts/qmp_helper.py
>
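As a cross-check of the example in the cover letter, the 72-byte payload can be unpacked and compared with the values the guest kernel logged (MIDR, MPIDR, error_info, physical fault address). The sketch below assumes the ARM Processor Error Section layout from the UEFI CPER appendix (a 40-byte header followed by one 32-byte error information structure); the field names are mine and are not taken from scripts/arm_processor_error.py.

```python
import struct

# Bytes copied from the "Payload (72 bytes)" dump in the quoted example.
PAYLOAD = bytes.fromhex(
    "05000000 01000000 48000000 00000000"
    "00000080 00000000 10050f00 00000000"
    "00000000 00000000 00201400 02010003"
    "0f009100 00000000 efbeadde 00000000"
    "efbeadde 00000000"
)

# Assumed layouts, little-endian with no implicit padding.
HEADER_FMT = "<IHHIB3xQQII"   # 40-byte section header
ERR_INFO_FMT = "<BBHBHBQQQ"   # 32-byte error information structure


def decode_arm_processor_error(raw: bytes) -> dict:
    """Decode the section header and the first error information structure."""
    (valid, err_info_num, ctx_info_num, section_len, affinity,
     mpidr, midr, running_state, psci_state) = struct.unpack_from(
        HEADER_FMT, raw, 0)
    (version, length, info_valid, err_type, multiple_error, flags,
     error_info, virt_addr, phys_addr) = struct.unpack_from(
        ERR_INFO_FMT, raw, struct.calcsize(HEADER_FMT))
    return {
        "validation_bits": valid,
        "err_info_num": err_info_num,
        "context_info_num": ctx_info_num,
        "section_length": section_len,
        "affinity_level": affinity,
        "mpidr": mpidr,
        "midr": midr,
        "running_state": running_state,
        "psci_state": psci_state,
        "err_info": {
            "version": version,
            "length": length,
            "validation_bits": info_valid,
            "type": err_type,
            "multiple_error": multiple_error,
            "flags": flags,
            "error_info": error_info,
            "virt_fault_addr": virt_addr,
            "phys_fault_addr": phys_addr,
        },
    }


if __name__ == "__main__":
    sec = decode_arm_processor_error(PAYLOAD)
    print(f"MIDR:  0x{sec['midr']:016x}")
    print(f"MPIDR: 0x{sec['mpidr']:016x}")
    print(f"error_info: 0x{sec['err_info']['error_info']:016x}")
    print(f"physical fault address: 0x{sec['err_info']['phys_fault_addr']:016x}")
```

The decoded MIDR, MPIDR, error_info and physical fault address agree with the [Hardware Error] lines in the quoted kernel log, which suggests the record reaches the guest unmodified.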