Date: Tue, 8 Oct 2024 15:10:19 +0100
From: Cristian Marussi
To: Sudeep Holla
Cc: Florian Fainelli, Cristian Marussi,
	linux-arm-kernel@lists.infradead.org, Rob Herring,
	Krzysztof Kozlowski, Conor Dooley,
	"open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS",
	open list,
	"open list:SYSTEM CONTROL & POWER/MANAGEMENT INTERFACE",
	"moderated list:SYSTEM CONTROL & POWER/MANAGEMENT INTERFACE",
	justin.chen@broadcom.com, opendmb@gmail.com,
	kapil.hali@broadcom.com, bcm-kernel-feedback-list@broadcom.com,
	Arnd Bergmann
Subject: Re: [PATCH] firmware: arm_scmi: Give SMC transport precedence over mailbox
References: <20241006043317.3867421-1-florian.fainelli@broadcom.com>

On Tue, Oct 08, 2024 at 02:06:17PM +0100, Sudeep Holla wrote:
> Hi Florian,
>
> Thanks for the detailed explanation.
>
> On Mon, Oct 07, 2024 at 10:07:46AM -0700, Florian Fainelli wrote:
> > Hi Cristian,
> >
> > On October 7, 2024 4:52:33 AM PDT, Cristian Marussi wrote:
> > > On Sat, Oct 05, 2024 at 09:33:17PM -0700, Florian Fainelli wrote:
> > > > Broadcom STB platforms have for historical reasons included both
> > > > "arm,scmi-smc" and "arm,scmi" in their SCMI Device Tree node compatible
> > > > string.
> > >
> > > Hi Florian,
> > >
> > > did not know this..
> >
> > It stems from us starting with a mailbox driver that did the SMC call, and
> > later transitioning to the "smc" transport proper. Our boot loader provides
> > the Device Tree blob to the kernel and we maintain backward/forward
> > compatibility as much as possible.
> >
>
> IIUC, you need to support old kernel with SMC mailbox driver and new SMC
> transport within the SCMI. Is that right understanding ?
>
> > > >
> > > > After the commit cited in the Fixes tag and with a kernel
> > > > configuration that enables both the SCMI and the Mailbox transports, we
> > > > would probe the mailbox transport, but fail to complete since we would
> > > > not have a mailbox driver available.
> > > >
> > > Not sure to have understood this...
> > >
> > > ...you mean you DO have the SMC/Mailbox SCMI transport drivers compiled
> > > into the Kconfig AND you have BOTH the SMC AND Mailbox compatibles in
> > > DT, BUT your platform does NOT physically have a mbox/shmem transport
> > > and as a consequence, when MBOX probes (at first), you see an error from
> > > the core like:
> > >
> > > "arm-scmi: unable to communicate with SCMI"
> > >
> > > since it gets no reply from the SCMI server (being not connected via
> > > mbox) and it bails out .... am I right ?
> >
> > In an unmodified kernel where both the "mailbox" and "smc" transports are
> > enabled, we get the "mailbox" driver to probe first since it matched the
> > "arm,scmi" part of the compatible string and it is linked first into the
> > kernel. Down the road though we will fail the initialization with:
> >
> > [    1.135363] arm-scmi arm-scmi.1.auto: Using scmi_mailbox_transport
> > [    1.141901] arm-scmi arm-scmi.1.auto: SCMI max-rx-timeout: 30ms
> > [    1.148113] arm-scmi arm-scmi.1.auto: failed to setup channel for
> > protocol:0x10
>
> IIUC, the DTB has mailbox nodes that are available but fail only in the setup
> stage ? Or is it marked unavailable and we are missing some checks either
> in SCMI or mailbox ?
>
> IOW, have you already explored that this -EINVAL is correct return value
> here and can't be changed to -ENODEV ? I might be not following the failure
> path correctly here, but I assume it is
> scmi_chan_setup()
>   info->desc->ops->chan_setup()
>     mailbox_chan_setup()
>       mbox_request_channel()
>
> > [    1.155828] arm-scmi arm-scmi.1.auto: error -EINVAL: failed to setup
> > channels
> > [    1.163379] arm-scmi arm-scmi.1.auto: probe with driver arm-scmi failed
> > with error -22
> >
> > Because the platform device is now bound, and there is no mechanism to
> > return -ENODEV, we won't try another transport driver that would attempt to
> > match the other compatibility strings. That makes sense because in general
> > you specify the Device Tree precisely, and you also have a tailored kernel
> > configuration. Right now this is only an issue using arm's
> > multi_v7_defconfig and arm64's defconfig, both of which we intend to
> > keep on using for CI purposes.
> >
> > >
> > > If this is the case, without this patch, after this error and the mbox probe
> > > failing, the SMC transport, instead, DO probe successfully at the end, right ?
> >
> > With my patch we probe the "smc" transport first and foremost and we
> > successfully initialize it, therefore we do not even try the "mailbox"
> > transport at all, which is intended.
> >
> > >
> > > IOW, what is the impact without this patch, an error and a delay in the
> > > probe sequence till it gets to the SMC transport probe (as second
> > > attempt) or worse ? (trying to understand here...)
> >
> > There is no recovery without the patch, we are not giving up the arm_scmi
> > platform device because there is no mechanism to return -ENODEV and allow
> > any of the subsequent transport drivers enabled to attempt to take over the
> > platform device and probe it again.
> >
>
> OK this sounds like you have already explored returning -ENODEV is not
> an option ? It is fair enough, but just want to understand correctly.
>

I still think I am missing something.

Having a quick look at dd.c, it seems to me that the probe error from the
first matched driver->probe is propagated back up the call chain (and a
driver that fails its probe in any way is NOT bound at that point) as far
as driver_probe_device().

THEN, on one side, in __driver_attach() the retval is ignored:

dd.c::__driver_attach()

	/*
	 * Lock device and try to bind to it. We drop the error
	 * here and always return 0, because we need to keep trying
	 * to bind to devices and some drivers will return an error
	 * simply if it didn't support the device.
	 *
	 * driver_probe_device() will spit a warning if there
	 * is an error.
	 */

...while, on the other side, looking at __device_attach_driver(), it DOES
report the error from driver_probe_device(), BUT the
__device_attach_driver() routine is called by bus_for_each_drv() inside
__device_attach(), and the error DOES cause that loop (bus_for_each_drv())
to bail out...BUT, again, no further driver match/probe is attempted, and I
suppose that if you somehow restart the sequence you will end up failing
again at the same point on the same first-match driver...

So it seems a sort of structural issue...also because you do indeed have
something that is somehow a malformed DT, so the device_match succeeds for
good reasons...

I may have missed a lot more, though :D

Thanks,
Cristian
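
P.S. To make the ordering point concrete, here is a minimal userspace
sketch in C. It is a toy model under stated assumptions, not code from
drivers/base/dd.c or the SCMI stack: struct toy_driver, toy_match() and
toy_first_match() are invented names, and the driver names used with them
are placeholders. It only illustrates that, when a node lists both
"arm,scmi-smc" and "arm,scmi", the driver registered (linked) first is the
one that gets to match, regardless of the order of the compatible strings.
It does not model what happens after a probe error.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical toy type: this is NOT a kernel structure. */
struct toy_driver {
	const char *name;        /* illustrative label only */
	const char *compatible;  /* the single compatible string it matches */
};

/* Return 1 if any entry of the NULL-terminated list equals drv->compatible. */
static int toy_match(const struct toy_driver *drv, const char *const *compat)
{
	for (size_t i = 0; compat[i] != NULL; i++)
		if (strcmp(compat[i], drv->compatible) == 0)
			return 1;
	return 0;
}

/*
 * Walk the drivers in registration order and return the first one that
 * matches, or NULL if none does. The order of the drivers, not the order
 * of the compatible strings, decides which driver is picked.
 */
static const struct toy_driver *
toy_first_match(const struct toy_driver *drivers, size_t n,
		const char *const *compat)
{
	for (size_t i = 0; i < n; i++)
		if (toy_match(&drivers[i], compat))
			return &drivers[i];
	return NULL;
}
```

With a compatible list of { "arm,scmi-smc", "arm,scmi", NULL }, a driver
array that lists the "arm,scmi" driver first yields that driver, while
swapping the two array entries yields the "arm,scmi-smc" driver, which is
the effect the patch is after.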