From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 27 Sep 2022 18:00:19 +0200
From: Thierry Reding
To: Borislav Petkov, Arnd Bergmann
Cc: arm@kernel.org, soc@kernel.org, Jon Hunter, linux-tegra@vger.kernel.org,
 linux-arm-kernel@lists.infradead.org, linux-edac@vger.kernel.org,
 Mauro Carvalho Chehab, Tony Luck, James Morse, Robert Richter,
 Rahul Bedarkar
Subject: Re: [GIT PULL 1/7] soc/tegra: Changes for v5.20-rc1
References: <20220708185608.676474-1-thierry.reding@gmail.com>
 <20220708185608.676474-2-thierry.reding@gmail.com>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha256;
 protocol="application/pgp-signature"; boundary="XRzEosQ5RA2jqi3s"
Content-Disposition: inline
User-Agent: Mutt/2.2.7 (2022-08-07)

--XRzEosQ5RA2jqi3s
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Thu, Jul 14, 2022 at 03:31:07PM +0200, Borislav Petkov wrote:
> On Wed, Jul 13, 2022 at 02:14:27PM +0200, Arnd Bergmann wrote:
> > I think this is just a reflection of what other hardware can do:
> > most machines only detect memory errors, but the EDAC subsystem
> > can work with any type in principle. There are also a lot of
> > conditions elsewhere that can be detected but not corrected.
>
> Just a couple of thoughts from looking at this:
>
> So the EDAC thing reports *hardware* errors by using the RAS
> capabilities built into an IP block. So it started with memory
> controllers but it is getting extended to other blocks. AMD are looking
> at how to integrate GPU hw errors reporting into it, for example.
>
> Looking at that CBB thing, it looks like it is supposed to report not
> so much hardware errors but operational errors. Some of the hw errors
> reported by RAS hw are also operation-related but not the majority.
>
> Then, EDAC has this counters exposed in:
>
> $ grep -r . /sys/devices/system/edac/
> /sys/devices/system/edac/power/runtime_active_time:0
> /sys/devices/system/edac/power/runtime_status:unsupported
> /sys/devices/system/edac/power/runtime_suspended_time:0
> /sys/devices/system/edac/power/control:auto
> /sys/devices/system/edac/pci/edac_pci_log_pe:1
> /sys/devices/system/edac/pci/pci0/pe_count:0
> /sys/devices/system/edac/pci/pci0/npe_count:0
> /sys/devices/system/edac/pci/pci_parity_count:0
> /sys/devices/system/edac/pci/pci_nonparity_count:0
> /sys/devices/system/edac/pci/edac_pci_log_npe:1
> /sys/devices/system/edac/pci/edac_pci_panic_on_pe:0
> /sys/devices/system/edac/pci/check_pci_errors:0
> /sys/devices/system/edac/mc/power/runtime_active_time:0
> /sys/devices/system/edac/mc/power/runtime_status:unsupported
> ...
>
> with the respective hierarchy: memory controllers, PCI errors, etc.
>
> So the main question is, does it make sense for you to fit this into the
> EDAC hierarchy and what would even be the advantage of making it part of
> EDAC?

Closing the loop on this: we've decided to keep this in drivers/soc for
now, with the option of re-evaluating when we encounter similar
functionality on other hardware.

I'm also going to hijack the thread because something else came up
recently that fits the audience here and it's up the same alley: on
Tegra234 a mechanism, called FSI (Functional Safety Island), exists to
report failures to an external MCU that's monitoring the system. Special
hardware exists in the SoC that can send these errors to the MCU via
different transports, and the idea is to report software-detected
failures from kernel drivers such as I2C or PCI via this mechanism, so
appropriate action can be taken.

So essentially we're looking at adding some new API, preferably
something generic, to these bus drivers along with "provider" drivers
that get notified of these reports so that they can be forwarded to the
FSI (and then the MCU).

This again doesn't seem to be a great fit for EDAC as it is today, but I
can also not find anything better looking around the kernel. So I'm
wondering if this is something that others have encountered and might
have solved already and I just haven't found it, or if this is something
that would be worth creating a new subsystem for. Or perhaps this could
be integrated into EDAC somehow? I'm a bit reluctant to add yet another
custom infrastructure for this, given that it's functionality that
likely exists in other SoCs as well.

Any thoughts on this?

Thierry

--XRzEosQ5RA2jqi3s
Content-Type: application/pgp-signature; name="signature.asc"

-----BEGIN PGP SIGNATURE-----

iQIzBAABCAAdFiEEiOrDCAFJzPfAjcif3SOs138+s6EFAmMzHhAACgkQ3SOs138+
s6G4wxAAnW+S0l13pKsw2U4zZ5sNcYSoEgHdCR8jCxsO0W4Lc3b8L/xQkfxIleZW
9/6fKDe6sygRujnqnYgBGIUUC6SjgklTDMpQaI39oFK6XglyFsFaYVyZpKBfFISc
UbVFT71QClax5nUbOXmftPPMSHp7OUQzql3ANq2gT5r9usHIqtE3cDamJv1p/CRA
TMPQSakOv9cI+CtQHWPMV7b9DUG9e6SNzsHhpC/JZsbySATcsTW0IyTBY2pNTBf7
Rqe/uCh8jlJx42rCuY/GVGl0TGDO/Im1GjrII5M3OoSkb1aTH0VaSCa4KQhi5WRS
Jh2tnMCkKwwmjpuuNnUc2K6+OUHl2Iya2Gfa+Q0OVhTvi+stpjImpG5xjLpyBj3W
KHFdE1tzXSn1yxwYhe7WS1vd4ZsHWeP19oo6VRFsav+51e1qyXsj2I7PoucD67bG
o+NjiVIZxNg/lnaf2za/6qZ7qMT9kJAXDeH4oUmYhDsydnpLvQZ6aPkDl3X11qaK
BejA3/XatDIk/hL8sqFiVTvgyiqaZJzaVYdEGrr0yJRpuDVCXkCHrG9njyLOYtdH
LQ2aTmFIgunI+r2tDLoEjR8bjxEvV+C9kwhRVAySSYC9Pdwz+Dnb2yWE6AYt8EnR
ZV3aVTGFyjc5OByFB3KBIptfB/syLfnnq9ZcKcWfQvDyI2EMS8E=
=UI0x
-----END PGP SIGNATURE-----

--XRzEosQ5RA2jqi3s--