Date: Mon, 12 Sep 2022 19:16:26 +0100
From: Catalin Marinas
To: Will Deacon
Cc: Mark Zhang, linux-arm-kernel@lists.infradead.org, Yishai Hadas,
    Jason Gunthorpe, Maor Gottlieb, Leon Romanovsky, Michael Guralnik,
    Michael Berezin, yong.xu@arm.com, Eran Ben Elisha
Subject: Re: Should we use "dsb" or "dmb" between write to buffer and write to register
References: <2bea1a7b-935e-695d-ddaa-13eacda5672c@nvidia.com> <20220908135017.GB31677@willie-the-truck>
In-Reply-To: <20220908135017.GB31677@willie-the-truck>

On Thu, Sep 08, 2022 at 02:50:17PM +0100, Will Deacon wrote:
> On Wed, Sep 07, 2022 at 06:53:43PM +0100, Catalin Marinas wrote:
> > On Mon, Aug 22, 2022 at 03:53:42PM +0800, Mark Zhang wrote:
> > > May I consult when to use dsb or dmb in our device driver, thanks:
> > >
> > > For example when send a command a FW/HW, usually we do it with 3 steps:
> > >   1. memcpy(buff, src, size);
> > >   2. wmb();
> > >   3. write64(ctrl, reg_addr);
>
> I'm assuming that write64 is just a plain 64-bit store to a device mapping
> and doesn't imply any further ordering.

That was my assumption as well, an STR to device memory (if it's an MSR,
we do need a DSB).

> > > IIUC in kernel wmb() is implemented with "dsb st". When we change this to
> > > "dmb st" then we get better performance, but we are not sure if it's safe.
> > > I have read Will's post[1] but still not sure.
> > >
> > > So our questions are:
> > > 1. can we use "dmb" here?
> > > 2. If we can then should we use "dmb st", or "dmb oshst"?
> > >
> > > Thank you very much.
> > >
> > > [1] https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git/commit/?id=22ec71615d824f4f11d38d0e55a88d8956b7e45f
> >
> > Will convinced me at the time that it's sufficient, though every time I
> > revisit this I get confused ;). Not sure whether we have updated the
> > memory model since to cover such scenarios. In practice at least from
> > what I recall that should be safe.
>
> The Armv8 memory model is "other-multi-copy-atomic" which means that a
> store is either visible _only_ to the observer from which it originates
> or it is visible to all observers. It cannot exist in some intermediate
> state.
>
> With that, the insight is that a write to the MMIO interface of a shared
> peripheral must be observed by all observers when it reaches the endpoint.

What's the endpoint here? The device itself or some serialisation point
on the path to the device? IIUC, this can be a serialisation point in
certain circumstances (e.g. with early write acknowledgement).

> Consequently, we only need to ensure that the stores from your memcpy()
> in the motivating example are observed before the MMIO write is observed
> and a DMB ST is sufficient for that.

Yes but this is all about other observers observing the MMIO write
rather than the device itself which cannot observe the MMIO write, so
the CPU doesn't need to impose any order between these two.
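
For reference, the three-step sequence being discussed can be sketched in
plain C as below. This is only a userspace illustration under stated
assumptions: send_command() and barrier_before_doorbell() are made-up
names, 'reg' is an ordinary variable rather than an ioremap()ed register,
and the C11 release fence merely marks the place where the kernel's wmb()
or a DMB-based barrier would sit. It says nothing about which barrier is
actually sufficient for the device.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical placeholder for step 2. In a driver this would be wmb()
 * ("dsb st" on arm64, per the thread) or a DMB-based barrier. A C11 fence
 * only orders normal memory as seen by other threads, not device accesses.
 */
static inline void barrier_before_doorbell(void)
{
	atomic_thread_fence(memory_order_release);
}

/* Hypothetical helper mirroring the three steps from the original question. */
static void send_command(void *buff, const void *src, size_t size,
			 volatile uint64_t *reg, uint64_t ctrl)
{
	assert(buff && src && reg);

	memcpy(buff, src, size);	/* 1. fill the command buffer in RAM */
	barrier_before_doorbell();	/* 2. buffer stores before the doorbell */
	*reg = ctrl;			/* 3. plain 64-bit store to the register */
}
```

The whole question in this thread is what step 2 has to be so that the
device, once it sees the store from step 3, also sees the data written in
step 1.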

Let's say we have a topology with two ports, one for MMIO and the other
for RAM accesses, each with its own serialisation point:

  +-------+    +-------+
  | CPU 0 |    | CPU 1 |
  +-------+    +-------+
    |   |        |   |
   (a)--|--------+   |       (a) MMIO serialisation point
    |   +-----------(b)---+  (b) RAM serialisation point
    |                |    |
 +-----+          +-----+ |
 | Dev |          | RAM | |
 +-----+          +-----+ |
    |                     |
    +-----DMA-------------+

All accesses to RAM, including the device DMA, go through serialisation
point (b). The MMIO accesses go through point (a).

I don't know how realistic this is in practice (well, it can be a lot
more complex) but with a few rules the above topology can obey the
memory model. The simplest is for a DMB to cause the CPU to wait for the
acknowledgement that a transaction reached a serialisation point before
issuing new ones but there can be other ways like accesses issued on
both ports before reaching the corresponding serialisation points. The
serialisation points could communicate between them to ensure ordering
in the presence of a third observer.

My worry is that in the absence of CPU1 (or transactions from CPU1), the
hardware may decide to forward an MMIO access to the device even if it
is ordered after a RAM transaction since it doesn't break any
observability rules (it might as well consider the device private).

> > I guess the question is what does it mean for the device that a third
> > observer saw the write64. In one interpretation of observability,
> > another write64 from the third observer is ordered after the original
> > write64 but to me it still doesn't help clarify any order imposed on the
> > device access to 'buff':
> >
> >   Initial state:
> >     buff=0
> >     ctrl=0
> >
> >   P0:           P1:           Device:
> >     Wbuff=1       Wctrl=2       Ry=buff
> >     DMB           DMB
> >     Wctrl=1       Rx=buff
> >
> > If the final 'ctrl' register value is 2 then x==1. But I don't see how
> > y==0 or 1 is influenced by Wctrl=2.
> > If x==1 on P1, any other observer, including the device, should see
> > the buff value of 1 but this assumes that there is some other
> > ordering for when Ry=buff is issued.
>
> You need to relate the write to 'ctrl' with the device's read of 'buff'
> somehow. Under which circumstances does the device read 'buff' (i.e.
> what are the register fields in 'ctrl')?

I don't think we have anything in the memory model that can relate the
write to MMIO with the device read from memory (DMA) since the device
doesn't do a 'master' access to its own registers (i.e. go through
serialisation point (a)). That's where I fail to explain in terms of the
memory model why a DMB is sufficient (but I'm far from an expert here).

The scenario I have in mind is that P0 might forward the Wctrl=1 before
Wbuff=1 reaches serialisation point (b) (e.g. there is some congestion
on that port). If Wctrl=2 on P1 arrives at (a) after Wctrl=1,
serialisation point (a) could stall it until point (b) confirms that all
transactions prior to DMB have been sent so that the P0/P1 ordering is
respected. However, this has no effect on the device observing Wbuff=1.

I think the other way around holds - if the device observes Wbuff=1,
then P1 must observe it as well. But if the device doesn't observe
Wbuff=1, nothing breaks AFAICT.

I think we need a whiteboard (or a table in a pub after Plumbers).

-- 
Catalin

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
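
To make the litmus test quoted above concrete, here is a small sketch
that enumerates every interleaving of its five accesses under a
deliberately simplified model: a single shared memory, each observer
executing in program order (so the DMBs are implicit), and the device's
Ry=buff free to happen at any point. That model is an assumption and is
much stronger than the Armv8 one; struct litmus_result and run_litmus()
are hypothetical names. Even so, it shows the same thing: a final ctrl
value of 2 forces x==1 but leaves y unconstrained, because nothing in the
test ties Ry=buff to Wctrl=1.

```c
#include <assert.h>
#include <stdbool.h>

enum { P0, P1, DEV, NTHREADS };

/* Accesses per observer. The DMBs are modelled implicitly by always
 * executing each observer's accesses in program order. */
static const int naccesses[NTHREADS] = { 2, 2, 1 };

struct state {
	int buff, ctrl;		/* shared locations, both initially 0 */
	int x, y;		/* values read by P1 and by the device */
	int pc[NTHREADS];	/* next access to run for each observer */
};

struct litmus_result {
	int interleavings;	/* complete interleavings explored */
	int ctrl2;		/* ... of which the final ctrl value is 2 */
	bool x0_with_ctrl2;	/* x == 0 seen together with final ctrl == 2 */
	bool y0_with_ctrl2;	/* y == 0 seen together with final ctrl == 2 */
	bool y1_with_ctrl2;	/* y == 1 seen together with final ctrl == 2 */
};

/* Execute the next access of observer t against the single shared memory. */
static void step(struct state *s, int t)
{
	int i = s->pc[t]++;

	assert(i < naccesses[t]);
	if (t == P0) {
		if (i == 0)
			s->buff = 1;	/* Wbuff=1 */
		else
			s->ctrl = 1;	/* Wctrl=1 */
	} else if (t == P1) {
		if (i == 0)
			s->ctrl = 2;	/* Wctrl=2 */
		else
			s->x = s->buff;	/* Rx=buff */
	} else {
		s->y = s->buff;		/* Ry=buff */
	}
}

/* Depth-first walk over every interleaving that respects program order. */
static void explore(struct state s, struct litmus_result *r)
{
	bool done = true;

	for (int t = 0; t < NTHREADS; t++) {
		if (s.pc[t] < naccesses[t]) {
			struct state next = s;

			done = false;
			step(&next, t);
			explore(next, r);
		}
	}
	if (!done)
		return;

	r->interleavings++;
	if (s.ctrl != 2)
		return;
	r->ctrl2++;
	if (s.x == 0)
		r->x0_with_ctrl2 = true;
	if (s.y == 0)
		r->y0_with_ctrl2 = true;
	else
		r->y1_with_ctrl2 = true;
}

struct litmus_result run_litmus(void)
{
	struct state init = { 0 };
	struct litmus_result r = { 0 };

	explore(init, &r);
	return r;
}
```

In this model run_litmus() explores 30 interleavings, 5 of which end with
ctrl==2. In all 5 of those x is 1, while y is 0 in one and 1 in the other
four, depending only on where the device read happens to fall.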