Date: Tue, 8 Oct 2024 14:17:00 +0100
From: Sudeep Holla
To: Justin Chen
Cc: Cristian Marussi, arm-scmi@vger.kernel.org, linux-arm-kernel@lists.infradead.org, peng.fan@nxp.com, Sudeep Holla, bcm-kernel-feedback-list@broadcom.com, florian.fainelli@broadcom.com
Subject: Re: [PATCH] firmware: arm_scmi: Queue in scmi layer for mailbox implementation
References: <20241004221257.2888603-1-justin.chen@broadcom.com> <1ad5c4e9-9f98-40ab-afa4-a7939781e8cc@broadcom.com>
In-Reply-To: <1ad5c4e9-9f98-40ab-afa4-a7939781e8cc@broadcom.com>

On Mon, Oct 07, 2024 at 10:58:47AM -0700, Justin Chen wrote:
>
>
> On 10/7/24 6:10 AM, Cristian Marussi wrote:
> > On Mon, Oct 07, 2024 at 02:04:10PM +0100, Sudeep Holla wrote:
> > > On Fri, Oct 04, 2024 at 03:12:57PM -0700, Justin Chen wrote:
> > > > The mailbox layer has its own queue. However this confuses the per
> > > > message timeouts since the clock starts ticking the moment the messages
> > > > get queued up. So all messages in the queue have their timeout clocks
> > > > ticking instead of only the message inflight. To fix this, let's move the
> > > > queue back into the SCMI layer.
> > > >
> > >
> > > I think this has come up in the past. We have avoided adding additional
> > > locking here as the mailbox layer takes care of it. Has anything changed
> > > recently?
> > >
> > I asked for an explanation in my reply (we probably crossed each other's mails)
> > since it already came up in the past a few times and central locking seemed not
> > to be needed...here the difference is about the reason...Justin talks about
> > message timeouts related to the queueing process..so I asked to better
> > explain the detail (and the anomaly observed) since it still does not
> > seem to me that even in this case the lock is needed....anyway I can
> > definitely be wrong of course :D
> >
>
> Hello Cristian,
>
> Thanks for the response. I'll try to elaborate.
>
> When comparing SMC and mailbox transport, we noticed mailbox transport
> times out much quicker when under load. Originally we thought this was the
> latency of the mailbox implementation, but after debugging we noticed a
> weird behavior. We saw SCMI transactions timing out before the mailbox even
> transmitted the message.
>
> This issue lies in the SCMI layer: the do_xfer() function in
> drivers/firmware/arm_scmi/driver.c.
>
> The fundamental issue is send_message() blocks for SMC transport, but
> doesn't block for mailbox transport. So if send_message() doesn't block we
> can have multiple messages waiting at scmi_wait_for_message_response().
>
> SMC looks like this
> CPU #0 SCMI message 0 -> calls send_message() then calls
> scmi_wait_for_message_response(), times out after 30ms.
> CPU #1 SCMI message 1 -> blocks at send_message() waiting for SCMI message 0
> to complete.
>
> Mailbox looks like this
> CPU #0 SCMI message 0 -> calls send_message(), mailbox layer queues up
> message, mailbox layer sees no message is outgoing and sends it. CPU waits
> at scmi_wait_for_message_response(), times out after 30ms.
> CPU #1 SCMI message 1 -> calls send_message(), mailbox layer queues up
> message, mailbox layer sees message pending, holds message in queue. CPU
> waits at scmi_wait_for_message_response(), times out after 30ms.
>
> Let's say the transport takes 25ms. The first message would succeed, the
> second message would time out 5ms after it is actually sent.
>

I understand this issue and was aware of it. I had just assumed it wouldn't
have such an impact, as I was assuming the transport takes a couple of ms for
any synchronous command. Typically I would expect the transport to take much
less than 25ms (1-4ms IMO) to complete any synchronous command. If it takes
25ms for a sync command, then there could be some serious issue.

Anyway, I am not against the idea. More details in my response to Cristian.

--
Regards,
Sudeep
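
The 30ms/25ms arithmetic from the quoted example can be checked with a small sketch. This is a hypothetical model, not code from drivers/firmware/arm_scmi: the struct, the function name model_xfer() and its parameters are made up for illustration, and it assumes all messages are submitted at t = 0 and the transport handles exactly one message at a time.

```c
#include <stdbool.h>

/* Hypothetical model, not kernel code: every message is submitted at t = 0
 * and the transport serializes them, one message in flight at a time. */
struct xfer_timing {
	int tx_start_ms;  /* when the transport starts sending this message */
	int done_ms;      /* when the response for this message arrives */
	int budget_ms;    /* timeout budget left once transmission starts */
	bool timed_out;   /* true if the response arrives after the deadline */
};

/*
 * position:          0-based position of the message in the queue
 * transport_ms:      time the transport needs per message
 * timeout_ms:        per-message timeout
 * clock_from_submit: true  -> timeout counted from submission (the mailbox
 *                             behaviour described in the quoted text)
 *                    false -> timeout counted from start of transmission
 */
struct xfer_timing model_xfer(int position, int transport_ms,
			      int timeout_ms, bool clock_from_submit)
{
	struct xfer_timing t;
	int deadline_ms;

	/* Messages ahead in the queue must complete before this one is sent. */
	t.tx_start_ms = position * transport_ms;
	t.done_ms = t.tx_start_ms + transport_ms;

	/* The deadline depends on when the timeout clock starts ticking. */
	deadline_ms = clock_from_submit ? timeout_ms
					: t.tx_start_ms + timeout_ms;

	t.budget_ms = deadline_ms - t.tx_start_ms;
	t.timed_out = t.done_ms > deadline_ms;
	return t;
}
```

Under these assumptions, model_xfer(1, 25, 30, true) gives the second message a budget of 5ms and marks it as timed out, while model_xfer(1, 25, 30, false) leaves it the full 30ms. With a transport time in the 1-4ms range the same model shows no timeout for the second message in either mode.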