Date: Thu, 25 May 2023 09:30:27 +0100
From: Catalin Marinas
To: David Clear
Cc: ardb@kernel.org, linux-arm-kernel@lists.infradead.org, mark.rutland@arm.com, maz@kernel.org, will@kernel.org
Subject: Re: ARM64: Question: How to map non-shareable memory
In-Reply-To: <20230525003359.3690-1-dclear@amd.com>

Hi David,

On Wed, May 24, 2023 at 05:33:59PM -0700, David Clear wrote:
> On Wed, 24 May 2023 at 23:59, Ard Biesheuvel wrote:
> > Non-shareable cacheable mappings are problematic because they are not
> > covered by the hardware coherency protocol that keeps caches
> > synchronized between CPUs and cluster-level and system-level caches.
> > (IOW, accesses to non-shareable mappings will have snooping disabled).
> >
> > This means that, unless your system only has a single CPU and does not
> > support cache coherent DMA at all, the cached view of those RAM
> > regions will go out of sync between CPUs and wrt other coherent
> > masters, which is probably not what you're after.
>
> Hi Ard. Thanks for the quick reply.
>
> I understand your concerns. The general Linux memory within the
> (multi-cluster) system is fully coherent, and there are no surprises
> w.r.t normal SMP system operation and device DMA.
>
> The non-coherent memories are outside of the general Linux pool, owned
> by autonomous hardware units, and are used for product-specific purposes.
> These memories are either internal to the units (far away from coherence
> machinery) or purposefully avoid the system coherency controllers so as
> to not incur the latency tax in back-to-back dependent transactions. In
> this product it would be a significant performance burden to maintain
> coherence with ARM caches that are essentially nothing to do with these
> units' operations.

Are these memories bus masters themselves? I doubt it. My guess is that
such memory is also accessed by a device that cannot maintain coherency
with the CPU caches. So IIUC you want a cached mapping from the CPU side
for performance reasons but treat it as non-coherent from a DMA
perspective.

For some hardware reason, shareable cacheable transactions to such
memory trigger SErrors. Do you know why this is the case? Because any
other non-cacheable transactions are considered shareable anyway. Or is
it that outer shareable is fine but inner shareable is not? The Arm CPUs
don't really distinguish between these AFAIK.

> For the userspace software that needs to access this memory, the current
> non-cached mapping is obtained via a device driver and the goal is
> to minimize the number of discrete memory transactions by supporting
> cached burst-reads and burst-writes, bracketed with appropriate cache
> maintenance ops. There are already private caches within the hardware
> pipelines that software needs to explicitly flush or invalidate,
> so this is just one more thing.

I agree with Ard, such a mapping won't work. When you mark it as
non-shareable, it tells the CPU that the cache lines for that mapping
are not shared with other CPUs, so they don't participate in the cache
coherency protocols. Any cache maintenance to PoC is also limited to
that CPU. See "Effects of instructions that operate by VA to the PoC"
in the latest Arm ARM (page D7-5784).
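
As a rough illustration of the point about maintenance being local to
one CPU, here is a toy Python model. It is only a sketch under strong
assumptions: one memory word, one private cache line per CPU, no
snooping at all, and an invalidate that only ever touches the issuing
CPU. The names (ToySystem, invalidate_local and so on) are made up for
this example and do not correspond to any kernel or hardware interface.

```python
class ToySystem:
    """Toy model: one memory word, one private cache line per CPU, no snooping."""

    def __init__(self, ncpus):
        self.memory = 0
        self.lines = [None] * ncpus  # None means the line is invalid

    def device_write(self, value):
        # A non-coherent device updates memory without touching CPU caches.
        self.memory = value

    def cpu_read(self, cpu):
        # Cached read: a miss fills from memory, a hit returns the private copy.
        if self.lines[cpu] is None:
            self.lines[cpu] = self.memory
        return self.lines[cpu]

    def invalidate_local(self, cpu):
        # Stand-in for DC IVAC on a non-shareable mapping (assumption of this
        # sketch): only the issuing CPU's line is dropped.
        self.lines[cpu] = None


def stale_after_local_invalidate():
    system = ToySystem(ncpus=2)
    system.device_write(1)
    system.cpu_read(0)
    system.cpu_read(1)          # both CPUs now hold the value 1
    system.device_write(2)      # the device produces new data
    system.invalidate_local(0)  # maintenance issued while running on CPU0
    return system.cpu_read(0), system.cpu_read(1)


print(stale_after_local_invalidate())  # (2, 1): CPU1 still returns the old value
```

In this model CPU0 sees the new data after its invalidate, while CPU1
keeps returning the old value until something invalidates its own line.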

So let's say that your user process starts reading from such a mapping
(potentially speculatively), having done a DC IVAC beforehand (which
has to happen in the kernel). The process is then migrated by the
kernel to another CPU which has stale cache lines for that range because
the DC IVAC only affected the first CPU. Similarly with the writes, you
can't guarantee that the write and the DC CVAC happen on the same CPU.

I also have no idea how some "transparent" system caches behave here,
whether they do anything on the DC instructions and how shareability
changes their behaviour.

Your best bet is Normal Non-cacheable here. On newer architecture
versions Arm introduced ST64B/LD64B for similar performance reasons
(FEAT_LS64 in Armv8.7) but I don't think there's hardware yet.

-- 
Catalin