From: Florian Fainelli
To: Juerg Haefliger, Florian Fainelli
Cc: Nicolas Saenz Julienne, Robin Murphy, stefan.wahren@i2se.com, Catalin Marinas, bcm-kernel-feedback-list@broadcom.com, linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-pm@vger.kernel.org
Subject: Re: bcm2711_thermal: Kernel panic - not syncing: Asynchronous SError Interrupt
Date: Mon, 1 Aug 2022 08:34:21 -0700
In-Reply-To: <20220728103513.38e93fa9@gollum>

On 7/28/2022 2:06 AM, Juerg Haefliger wrote:
> On Wed, 27 Jul 2022 14:51:24 -0700
> Florian Fainelli wrote:
>
>> On 7/27/22 01:05, Juerg Haefliger wrote:
>>> On Wed, 10 Feb 2021 14:59:45 -0800
>>> Florian Fainelli wrote:
>>>
>>>> On 2/10/2021 8:55 AM, Nicolas Saenz Julienne wrote:
>>>>> Hi Robin,
>>>>>
>>>>> On Wed, 2021-02-10 at 16:25 +0000, Robin Murphy wrote:
>>>>>> On 2021-02-10 13:15, Nicolas Saenz Julienne wrote:
>>>>>>> [ Add Robin, Catalin and Florian in case they want to chime in ]
>>>>>>>
>>>>>>> Hi Juerg, thanks for the report!
>>>>>>>
>>>>>>> On Wed, 2021-02-10 at 11:48 +0100, Juerg Haefliger wrote:
>>>>>>>> Trying to dump the BCM2711 registers kills the kernel:
>>>>>>>>
>>>>>>>> # cat /sys/kernel/debug/regmap/dummy-avs-monitor\@fd5d2000/range
>>>>>>>> 0-efc
>>>>>>>> # cat /sys/kernel/debug/regmap/dummy-avs-monitor\@fd5d2000/registers
>>>>>>>>
>>>>>>>> [ 62.857661] SError Interrupt on CPU1, code 0xbf000002 -- SError
>>>>>>>
>>>>>>> So ESR's IDS (bit 24) is set, which means it's an 'Implementation Defined
>>>>>>> SError,' hence IIUC the rest of the error code is meaningless to anyone outside
>>>>>>> of Broadcom/RPi.
>>>>>>
>>>>>> It's imp-def from the architecture's PoV, but the implementation in this
>>>>>> case is Cortex-A72, where 0x000002 means an attributable, containable
>>>>>> Slave Error:
>>>>>>
>>>>>> https://developer.arm.com/documentation/100095/0003/system-control/aarch64-register-descriptions/exception-syndrome-register--el1-and-el3?lang=en
>>>>>>
>>>>>> In other words, the thing at the other end of an interconnect
>>>>>> transaction said "no" :)
>>>>>>
>>>>>> (The fact that Cortex-A72 gets too far ahead of itself to take it as a
>>>>>> synchronous external abort is a mild annoyance, but hey...)
>>>>>
>>>>> Thanks for both your clarifications! Reading arm documentation is a skill on
>>>>> its own.
>>>>
>>>> Yes it is.
>>>>
>>>>>
>>>>>>> The regmap is created through the following syscon device:
>>>>>>>
>>>>>>>     avs_monitor: avs-monitor@7d5d2000 {
>>>>>>>         compatible = "brcm,bcm2711-avs-monitor",
>>>>>>>                      "syscon", "simple-mfd";
>>>>>>>         reg = <0x7d5d2000 0xf00>;
>>>>>>>
>>>>>>>         thermal: thermal {
>>>>>>>             compatible = "brcm,bcm2711-thermal";
>>>>>>>             #thermal-sensor-cells = <0>;
>>>>>>>         };
>>>>>>>     };
>>>>>>>
>>>>>>> I've done some tests with devmem, and the whole <0x7d5d2000 0xf00> range is
>>>>>>> full of addresses that trigger this same error. Also note that as per Florian's
>>>>>>> comments[1]: "AVS_RO_REGISTERS_0: 0x7d5d2200 - 0x7d5d22e3." But from what I can
>>>>>>> tell, at least 0x7d5d22b0 seems to be faulty too.
>>>>>>>
>>>>>>> Any ideas/comments? My guess is that those addresses are marked somehow as
>>>>>>> secure, and only for VC4 to access (VC4 is RPi4's co-processor). Ultimately,
>>>>>>> the solution is to narrow the register range exposed by avs-monitor to whatever
>>>>>>> bcm2711-thermal needs (which is ATM a single 32bit register).
>>>>>>
>>>>>> When a peripheral decodes a region of address space, nobody says it has
>>>>>> to accept accesses to *every* address in that space; registers may be
>>>>>> sparsely populated, and although some devices might be "nice" and make
>>>>>> unused areas behave as RAZ/WI, others may throw slave errors if you poke
>>>>>> at the wrong places. As you note, in a TrustZone-aware device some
>>>>>> registers may only exist in one or other of the Secure/Non-Secure
>>>>>> address spaces.
>>>>>>
>>>>>> Even when there is a defined register at a given address, it still
>>>>>> doesn't necessarily accept all possible types of access; it wouldn't be
>>>>>> particularly friendly, but a device *could* have, say, some registers
>>>>>> that support 32-bit accesses and others that only support 16-bit
>>>>>> accesses, and thus throw slave errors if you do the wrong thing in the
>>>>>> wrong place.
>>>>>>
>>>>>> It really all depends on the device itself.
>>>>>
>>>>> All in all, assuming there is no special device quirk to apply, the feeling I'm
>>>>> getting is to just let the error be. As you hint, firmware has no blame here,
>>>>> and debugfs is a 'best effort, zero guarantees' interface after all.
>>>>
>>>> We should probably fill a regmap_access_table to deny reading registers
>>>> for which there is no address decoding and possibly another one to deny
>>>> writing to the read-only registers.
>>>
>>>
>>> Below is a patch that adds a read access table but it seems wrong to include
>>> 'internal.h' and add the table in the thermal driver. Shouldn't this happen
>>> in a higher layer, somehow between syscon and the thermal node?
>>
>> What is the purpose of doing this, though, that cannot already be done
>> using devmem/devmem2 if the point is to explore the address space?
>
> The goal is to prevent a kernel crash when doing
> $ cat /sys/kernel/debug/regmap/dummy-avs-monitor\@fd5d2000/registers

Fair enough, but that does not scale across drivers, nor across the
power management decisions made for those drivers. The thermal sensor
is unlikely to ever be clock gated by the time Linux runs, but if you
did the same thing with any other type of peripheral, chances are you
would get the same outcome.

So this raises the question of how to address this globally, short of
disabling regmap debugfs support, which is likely what happens in a
production environment anyway.
-- 
Florian
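
For reference, the SError code quoted earlier in the thread (0xbf000002) can be split into its ESR fields with a few lines of Python. This is only a sketch: the function and constant names are made up for illustration, and it assumes the usual ESR layout for an SError, with EC in bits [31:26], IL in bit 25, IDS in bit 24 and the implementation-defined syndrome in bits [23:0].

```python
# Assumed ESR field layout for an SError exception (names are illustrative).
ESR_EC_SHIFT = 26
ESR_EC_MASK = 0x3F           # exception class, bits [31:26]
ESR_IL_BIT = 1 << 25         # instruction length bit
ESR_IDS_BIT = 1 << 24        # implementation-defined syndrome flag
ESR_IMPDEF_MASK = 0xFFFFFF   # bits [23:0], meaningful to the CPU vendor when IDS is set


def decode_serror_esr(esr):
    """Split a raw ESR value into the fields discussed in the thread."""
    return {
        "ec": (esr >> ESR_EC_SHIFT) & ESR_EC_MASK,
        "il": bool(esr & ESR_IL_BIT),
        "ids": bool(esr & ESR_IDS_BIT),
        "impdef": esr & ESR_IMPDEF_MASK,
    }


if __name__ == "__main__":
    # Value taken from the kernel log line quoted above.
    fields = decode_serror_esr(0xBF000002)
    print(
        f"EC=0x{fields['ec']:02x} IL={int(fields['il'])} "
        f"IDS={int(fields['ids'])} impdef=0x{fields['impdef']:06x}"
    )
```

With IDS set, the low 24 bits come out as 0x000002, which is the value Robin maps to a slave error on Cortex-A72.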