Date: Tue, 27 Sep 2022 18:00:19 +0200
From: Thierry Reding
To: Borislav Petkov, Arnd Bergmann
Cc: arm@kernel.org, soc@kernel.org, Jon Hunter, linux-tegra@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-edac@vger.kernel.org, Mauro Carvalho Chehab, Tony Luck, James Morse, Robert Richter, Rahul Bedarkar
Subject: Re: [GIT PULL 1/7] soc/tegra: Changes for v5.20-rc1

On Thu, Jul 14, 2022 at 03:31:07PM +0200, Borislav Petkov wrote:
> On Wed, Jul 13, 2022 at 02:14:27PM +0200, Arnd Bergmann wrote:
> > I think this is just a reflection of what other hardware can do:
> > most machines only detect memory errors, but the EDAC subsystem
> > can work with any type in principle.
> > There are also a lot of conditions elsewhere that can be detected
> > but not corrected.
>
> Just a couple of thoughts from looking at this:
>
> So the EDAC thing reports *hardware* errors by using the RAS
> capabilities built into an IP block. So it started with memory
> controllers but it is getting extended to other blocks. AMD are looking
> at how to integrate GPU hw errors reporting into it, for example.
>
> Looking at that CBB thing, it looks like it is supposed to report not
> so much hardware errors but operational errors. Some of the hw errors
> reported by RAS hw are also operation-related but not the majority.
>
> Then, EDAC has these counters exposed in:
>
> $ grep -r . /sys/devices/system/edac/
> /sys/devices/system/edac/power/runtime_active_time:0
> /sys/devices/system/edac/power/runtime_status:unsupported
> /sys/devices/system/edac/power/runtime_suspended_time:0
> /sys/devices/system/edac/power/control:auto
> /sys/devices/system/edac/pci/edac_pci_log_pe:1
> /sys/devices/system/edac/pci/pci0/pe_count:0
> /sys/devices/system/edac/pci/pci0/npe_count:0
> /sys/devices/system/edac/pci/pci_parity_count:0
> /sys/devices/system/edac/pci/pci_nonparity_count:0
> /sys/devices/system/edac/pci/edac_pci_log_npe:1
> /sys/devices/system/edac/pci/edac_pci_panic_on_pe:0
> /sys/devices/system/edac/pci/check_pci_errors:0
> /sys/devices/system/edac/mc/power/runtime_active_time:0
> /sys/devices/system/edac/mc/power/runtime_status:unsupported
> ...
>
> with the respective hierarchy: memory controllers, PCI errors, etc.
>
> So the main question is, does it make sense for you to fit this into the
> EDAC hierarchy and what would even be the advantage of making it part of
> EDAC?

Closing the loop on this: we've decided to keep this in drivers/soc for
now, with the option of re-evaluating when we encounter similar
functionality on other hardware.

I'm also going to hijack the thread because something else came up
recently that fits the audience here and is along the same lines: on
Tegra234, a mechanism called FSI (Functional Safety Island) exists to
report failures to an external MCU that monitors the system. Special
hardware in the SoC can send these errors to the MCU via different
transports, and the idea is to report software-detected failures from
kernel drivers such as I2C or PCI via this mechanism, so that
appropriate action can be taken.

So essentially we're looking at adding some new API, preferably
something generic, to these bus drivers, along with "provider" drivers
that get notified of these reports so that they can be forwarded to the
FSI (and from there to the MCU).

This again doesn't seem to be a great fit for EDAC as it is today, but
I also cannot find anything better elsewhere in the kernel. So I'm
wondering whether others have encountered this and might have solved it
already and I just haven't found it, or whether this would be worth
creating a new subsystem for. Or perhaps this could be integrated into
EDAC somehow?

I'm a bit reluctant to add yet another custom infrastructure for this,
given that this functionality likely exists in other SoCs as well. Any
thoughts on this?
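
To make the bus-driver/provider split concrete, here is a minimal sketch of what such a generic API could look like, written as plain userspace C so it compiles standalone. All names (struct fault_report, struct fault_provider, fault_provider_register(), fault_report_error() and the "fsi-stub" provider) are hypothetical and do not correspond to any existing kernel interface. An in-kernel version would more likely build on the existing notifier chains in include/linux/notifier.h and would need locking, which this sketch omits.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical severity levels; not part of any existing kernel API. */
enum fault_severity {
	FAULT_CORRECTABLE,
	FAULT_UNCORRECTABLE,
};

/* A software-detected failure as a bus driver (I2C, PCI, ...) might describe it. */
struct fault_report {
	const char *source;           /* name of the reporting driver */
	enum fault_severity severity;
	unsigned int code;            /* driver-specific error code */
};

/* A provider receives every report and forwards it to its own transport. */
struct fault_provider {
	const char *name;
	int (*notify)(struct fault_provider *p, const struct fault_report *r);
	struct fault_provider *next;
	unsigned int forwarded;       /* number of reports this provider accepted */
};

/* Head of the singly linked list of registered providers (no locking here). */
static struct fault_provider *fault_providers;

/* Returns 0 on success, -1 if @p is already registered. */
int fault_provider_register(struct fault_provider *p)
{
	for (struct fault_provider *it = fault_providers; it; it = it->next)
		if (it == p)
			return -1;

	p->next = fault_providers;
	fault_providers = p;
	return 0;
}

/* Called from bus drivers; returns how many providers accepted the report. */
int fault_report_error(const struct fault_report *r)
{
	int accepted = 0;

	assert(r != NULL);
	for (struct fault_provider *p = fault_providers; p; p = p->next)
		if (p->notify(p, r) == 0)
			accepted++;

	return accepted;
}

/* Stand-in for an FSI-backed provider: it logs instead of messaging the MCU. */
static int fsi_stub_notify(struct fault_provider *p, const struct fault_report *r)
{
	printf("%s: %s reported %s error 0x%x\n", p->name, r->source,
	       r->severity == FAULT_UNCORRECTABLE ? "uncorrectable" : "correctable",
	       r->code);
	p->forwarded++;
	return 0;
}

static struct fault_provider fsi_stub = {
	.name = "fsi-stub",
	.notify = fsi_stub_notify,
};
```

A bus driver would fill in a struct fault_report and call fault_report_error() without knowing whether the report ends up on the FSI, in EDAC, or nowhere at all; keeping the transport behind the provider callback is what would let other SoCs plug in their own mechanism.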

Thierry

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel