From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.5 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH, MAILING_LIST_MULTI,MENTIONS_GIT_HOSTING,SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS, URIBL_BLOCKED,USER_AGENT_SANE_1 autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id C6418C433E7 for ; Wed, 2 Sep 2020 16:48:33 +0000 (UTC) Received: from merlin.infradead.org (merlin.infradead.org [205.233.59.134]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 941F620767 for ; Wed, 2 Sep 2020 16:48:33 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=lists.infradead.org header.i=@lists.infradead.org header.b="i4k3hpa3" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 941F620767 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=arm.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=merlin.20170209; h=Sender:Content-Transfer-Encoding: Content-Type:Cc:List-Subscribe:List-Help:List-Post:List-Archive: List-Unsubscribe:List-Id:In-Reply-To:MIME-Version:References:Message-ID: Subject:To:From:Date:Reply-To:Content-ID:Content-Description:Resent-Date: Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Owner; bh=2Dk5Hsv68SuDGxziDjL4m3fMpDIhdWR/HnBnE3efhJA=; b=i4k3hpa3G7zLdCZRK5i7uxUpk ITNk8Z/5InMY80puNIcpDaQjt0kyrtJUXTiNi0eY847myKe72O1QEssSznw1OITkcDyMgX5jEWaE/ EPICmjYRApGEP6A+z2dGJg6z/BQLT2W0G0FQnPKXUA2gzZw6tIW8zMCnSldrhUP1vTHS3T9kj/lMj 8hMkEh2lWCDvhHXA9XDtiB95JtD7kOqAs0+pMYMepF1axxCW4bQXpNtI24+2+f6OO7QRvRmlJzvk+ zDjU1jx4+B4vEh+nwDENo/Rsbshh2/CZwj3N0nRcPJ86vl3u301u2BNWKyH6F1Bsluh1Y+p+D3hFz 6/Oi9+BDw==; Received: from localhost ([::1] helo=merlin.infradead.org) by merlin.infradead.org with esmtp (Exim 4.92.3 #3 (Red Hat Linux)) id 1kDVuj-0000Km-Ak; Wed, 02 Sep 2020 16:47:13 +0000 Received: from foss.arm.com ([217.140.110.172]) by merlin.infradead.org with esmtp (Exim 4.92.3 #3 (Red Hat Linux)) id 1kDVug-0000FS-8G for linux-arm-kernel@lists.infradead.org; Wed, 02 Sep 2020 16:47:11 +0000 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 99C481045; Wed, 2 Sep 2020 09:47:08 -0700 (PDT) Received: from e121166-lin.cambridge.arm.com (e121166-lin.cambridge.arm.com [10.1.196.255]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 99F153F66F; Wed, 2 Sep 2020 09:47:07 -0700 (PDT) Date: Wed, 2 Sep 2020 17:47:02 +0100 From: Lorenzo Pieralisi To: Clint Sbisa Subject: Re: [PATCH] arm64: Enable PCI write-combine resources under sysfs Message-ID: <20200902164702.GA30611@e121166-lin.cambridge.arm.com> References: <20200831151827.pumm2p54fyj7fz5s@amazon.com> <20200902113207.GA27676@e121166-lin.cambridge.arm.com> <20200902142922.xc4x6m33unkzewuh@amazon.com> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20200902142922.xc4x6m33unkzewuh@amazon.com> User-Agent: Mutt/1.9.4 (2018-02-28) X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20200902_124710_380659_F7F9CAB0 X-CRM114-Status: GOOD ( 48.02 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: benh@kernel.crashing.org, linux-pci@vger.kernel.org, Bjorn Helgaas , catalin.marinas@arm.com, will@kernel.org, linux-arm-kernel@lists.infradead.org Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org [+LAKML, Will, Catalin] This is an ARM64 arch discussion so please keep the above CCed from now onwards. On Wed, Sep 02, 2020 at 02:29:22PM +0000, Clint Sbisa wrote: > On Wed, Sep 02, 2020 at 12:32:07PM +0100, Lorenzo Pieralisi wrote: > > > > On Mon, Aug 31, 2020 at 03:18:27PM +0000, Clint Sbisa wrote: > > > Using write-combine is crucial for performance of PCI devices where > > > significant amounts of transactions go over PCI BARs. > > > > Write-combine is an x86ism that means nothing on ARM64 platforms > > so this should be rewritten to say what you actually mean, namely, > > you want to allow prefetchable resources to be mapped with > > "write combine" semantics (which means normal non-cacheable > > memory on arm64) through proc/sysfs. > > > > This is an outright can of worms and the PCI specs don't help in this > > respect, since we may end up mapping resources that have read > > side-effects with normal NC mappings (ie that's what "write combine" is > > in arm64 - pgprot_writecombine() and that's speculative memory). > > > > I am referring to "Additional Guidance on the Prefetchable Bit > > in Memory Space BARs" in the PCI specifications - it does not make > > any sense and must be removed because people use it to design > > endpoints. > > > > True - this is a problem even in kernel drivers but at least there > > the ioremap_ semantics is in the driver and can be vetted. > > > > This patch would make it user space ABI so I am a little nervous > > about merging this code TBH. > > > > User space applications are utilizing WC already. You can see DPDK code using > resourceX_wc over the usual resourceX file at > https://github.com/DPDK/dpdk/blob/main/drivers/bus/pci/linux/pci_uio.c#L312 > (commit https://github.com/DPDK/dpdk/commit/4a928ef9f61). I know. > Given that write-combine support was added in 2008 for x86 (and is > also enabled for powerpc and ia64), I'm not sure if there'd be a > downside to enabling it on arm64 as well given how prevalent it is. I explained to you the reasons why this can have downsides and write combine is a concept that does not exist in the ARM64 world, actually it would be good if Ben can chime in to define how this works on power. >Lorenzo, do you still have particular concerns about exposing this to >userspace? Yes I do and I expressed them. The first concern is the WC ambiguity on non-x86 systems, it looks like write combinining means everything and nothing at the same time on != x86 arches. On x86 prefetchable BAR == WC mapping (still conditional on arch features ie PAT, not a blanket enable). On ARM64 WC mapping currently corresponds to normal NC memory and the PCIe specs allow read side-effects BAR to be marked as prefetchable, I need to force PCI sig to remove the section I mentioned from the specifications because there is NO way it can be detected if a prefetchable BAR maps to read side-effects memory. A kernel device driver would (hopefully) know, sysfs code that just checks the prefetchable attribute and exports resource_WC does not. As I mentioned, if the mapping is done in a device specific driver it can be vetted and there are not many drivers mapping BARs as ioremap_wc(). > I understand that my commit message is outright wrong after your explanation, > so I'll rewrite that. > > > > arm64 supports write-combine PCI mappings, so the appropriate define > > > has been added which will expose write-combine mappings under sysfs > > > for prefetchable PCI resources. > > > > > > Signed-off-by: Clint Sbisa > > > --- > > > arch/arm64/include/asm/pci.h | 1 + > > > 1 file changed, 1 insertion(+) > > > > > > diff --git a/arch/arm64/include/asm/pci.h b/arch/arm64/include/asm/pci.h > > > index 70b323cf8300..b33ca260e3c9 100644 > > > --- a/arch/arm64/include/asm/pci.h > > > +++ b/arch/arm64/include/asm/pci.h > > > @@ -17,6 +17,7 @@ > > > #define pcibios_assign_all_busses() \ > > > (pci_has_flag(PCI_REASSIGN_ALL_BUS)) > > > > > > +#define arch_can_pci_mmap_wc() 1 > > > > I am not comfortable with this blanket enable. Some existing drivers, > > eg: > > > > drivers/infiniband/hw/mlx5 > > > > use this macro to detect WC capability which again, it is x86 specific, > > on arm64 it means nothing and can have consequences on the driver > > operations. > > If that driver is fixed to check what it actually wants to check, would that > address your concern about the blanket enable? I don't see any other references > to this in kernel drivers and I think the documentation at > `filesystems/sysfs-pci.rst` outlines it pretty explicitly: > > Platforms which support write-combining maps of PCI resources must define > arch_can_pci_mmap_wc() which shall evaluate to non-zero at runtime when > write-combining is permitted. That's exactly the problem. I am asking you: what does "write-combining maps of PCI resources" mean ? I understand we do want weak ordering for prefetchable BAR mappings but my worry is that by exposing the resources as WC to user space we are giving user space the impression that those mappings mirror x86 WC mappings behaviour that is not true on ARM64. Again - Ben has extensive experience on this on power I would be happy to get his point of view before proceeding, it is important to undestand how this work on non-x86 systems. > It is otherwise only used by pci-sysfs to determine if a resourceX_wc > file should be exposed. I know - that's understood. Thanks, Lorenzo _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel