From: Srinath Mannam <srinath.mannam@broadcom.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: Ray Jui <ray.jui@broadcom.com>,
Vikram Prakash <vikram.prakash@broadcom.com>,
Scott Branden <scott.branden@broadcom.com>,
kvm@vger.kernel.org,
BCM Kernel Feedback <bcm-kernel-feedback-list@broadcom.com>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
aik@ozlabs.ru, David Gibson <david@gibson.dropbear.id.au>,
benh@kernel.crashing.org
Subject: Re: [RFC PATCH] vfio/pci: map prefetchble bars as writecombine
Date: Wed, 1 Aug 2018 23:28:53 +0530 [thread overview]
Message-ID: <CABe79T7-+qL3siCDc6SeEqEaj3TJQ6O_sVTZzm+wXoYJQWyvbA@mail.gmail.com> (raw)
In-Reply-To: <CABe79T5du2Ctx9LuFU8_5eO+9ckm6+BMB0vMf6yL2A_wNwE1TQ@mail.gmail.com>
Hi Alex,
In user space UIO driver (DPDK) implementation, sysfs interface
"/sys/devices/pci/.../resource0_wc" is used to map prefetchable PCI
resources as WC.
Platforms which support write-combining maps of PCI resources have
arch_can_pci_mmap_wc() flag enabled. So that it allows to map resources as WC.
In this approach mmap calls "pci_mmap_resource_range" kernel function
with write_combine parameter set.
"drivers/pci/pci-sysfs.c" kernel file has this implementation.
If this approach fits to vfio driver, then code change in vfio driver are
if (arch_can_pci_mmap_wc() &&
(pci_resource_flags(pdev, index) & IORESOURCE_PREFETCH))
vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
else
vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
Please provide your feedback.
Thank you.
On Mon, Jul 23, 2018 at 2:03 PM, Srinath Mannam
<srinath.mannam@broadcom.com> wrote:
> On Sat, Jul 21, 2018 at 1:57 AM, Alex Williamson
> <alex.williamson@redhat.com> wrote:
>> On Thu, 19 Jul 2018 21:49:48 +0530
>> Srinath Mannam <srinath.mannam@broadcom.com> wrote:
>>
>>> HI Alex,
>>>
>>> On Thu, Jul 19, 2018 at 8:42 PM, Alex Williamson
>>> <alex.williamson@redhat.com> wrote:
>>> > On Thu, 19 Jul 2018 20:17:11 +0530
>>> > Srinath Mannam <srinath.mannam@broadcom.com> wrote:
>>> >
>>> >> HI Alex,
>>> >>
>>> >> On Thu, Jul 19, 2018 at 2:55 AM, Alex Williamson
>>> >> <alex.williamson@redhat.com> wrote:
>>> >> > On Thu, 19 Jul 2018 00:05:18 +0530
>>> >> > Srinath Mannam <srinath.mannam@broadcom.com> wrote:
>>> >> >
>>> >> >> Hi Alex,
>>> >> >>
>>> >> >> On Tue, Jul 17, 2018 at 8:52 PM, Alex Williamson
>>> >> >> <alex.williamson@redhat.com> wrote:
>>> >> >> > On Fri, 13 Jul 2018 10:26:17 +0530
>>> >> >> > Srinath Mannam <srinath.mannam@broadcom.com> wrote:
>>> >> >> >
>>> >> >> >> By default all BARs map with VMA access permissions
>>> >> >> >> as pgprot_noncached.
>>> >> >> >>
>>> >> >> >> In ARM64 pgprot_noncached is MT_DEVICE_nGnRnE which
>>> >> >> >> is strongly ordered and allows aligned access.
>>> >> >> >> This type of mapping works for NON-PREFETCHABLE bars
>>> >> >> >> containing EP controller registers.
>>> >> >> >> But it restricts PREFETCHABLE bars from doing
>>> >> >> >> unaligned access.
>>> >> >> >>
>>> >> >> >> In CMB NVMe drives PREFETCHABLE bars are required to
>>> >> >> >> map as MT_NORMAL_NC to do unaligned access.
>>> >> >> >>
>>> >> >> >> Signed-off-by: Srinath Mannam <srinath.mannam@broadcom.com>
>>> >> >> >> Reviewed-by: Ray Jui <ray.jui@broadcom.com>
>>> >> >> >> Reviewed-by: Vikram Prakash <vikram.prakash@broadcom.com>
>>> >> >> >> ---
>>> >> >> >
>>> >> >> > This has been discussed before:
>>> >> >> >
>>> >> >> > https://www.spinics.net/lists/kvm/msg156548.html
>>> >> >> Thank you for inputs.. I have gone through the long list of mail chain
>>> >> >> discussion.
>>> >> >> >
>>> >> >> > CC'ing the usual suspects from the previous thread. I'm not convinced
>>> >> >> > that the patch here has considered anything other than the ARM64
>>> >> >> > implications and it's not clear that it considers compatibility with
>>> >> >> > existing users or devices at all. Can we guarantee for all devices and
>>> >> >> > use cases that WC is semantically equivalent and preferable to UC? If
>>> >> >> > not then we need to device an extension to the interface that allows
>>> >> >> > the user to specify WC. Thanks,
>>> >> >> >
>>> >> >> To implement with user specified WC flags, many changes need to be done.
>>> >> >> Suppose In DPDK, prefetcable BARs map using WC flag, then also same
>>> >> >> question comes
>>> >> >> that WC may be different for different CPUs.
>>> >> >> As per functionality, both WC and PREFETCHABLE are same, like merging writes and
>>> >> >> typically WC is uncached.
>>> >> >> So, based on prefetchable BARs behavior and usage we need to map bar memory.
>>> >> >> Is it right to map prefetchable BARs as strongly ordered, aligned
>>> >> >> access and uncached?
>>> >> >
>>> >> > Is it possible to answer that question generically? Whether to map a
>>> >> > BAR as UC or WC is generally a question for the driver. Does the
>>> >> > device handle unaligned accesses? Does the device need strong memory
>>> >> > ordering? If this is a driver level question then the driver that
>>> >> > needs to make that decision is the userspace driver. VFIO is just a
>>> >> > pass-through here and since we don't offer the user a choice of
>>> >> > mappings, we take the safer and more conservative mapping, ie. UC.
>>> >> >
>>> >> Yes, you are right, driver should make the decision based on its requirement.
>>> >> In my case, user space driver is part of SPDK, so SPDK should request DPDK
>>> >> and DPDK should request VFIO to map BAR for its choice of mapping.
>>> >> So to implement this we need code changes in VFIO, DPDK and SPDK.
>>> >>
>>> >> > You're suggesting that there are many changes to be done if we modify
>>> >> > the vfio interface to expose WC under the user's control rather than
>>> >> > simply transparently impose WC for all mappings, but is that really the
>>> >> > case? Most devices on most platforms seem to work fine now. Perhaps WC
>>> >> > is a performance optimization, but this is the first instance I've seen
>>> >> > of it as a functional issue. Does that suggest that the imposed
>>> >> > alignment on your platform is perhaps unique and the relaxed alignment
>>> >> > should be implemented at the architecture specific memory flags for UC
>>> >> > mappings? For instance, does x86 require this change for the same
>>> >> > device? The chance for regressions of other devices on other platforms
>>> >> > seems rather high as proposed. Thanks,
>>> >> This issue is not specific to platform or device. this is the requirement of
>>> >> CMB enabled NVMe cards.
>>> >> NVMe kernel driver already has support to map CMB bar as WC using
>>> >> ioremap_wc function.
>>> >> File: drivers/nvme/host/pci.c
>>> >> Function: nvme_map_cmb
>>> >> code: dev->cmb = ioremap_wc(pci_resource_start(pdev, bar) + offset, size);
>>> >> It means ioremap_wc is working with all platforms and WC map of
>>> >> perfetchable BARs does not harm.
>>> >> Same is required in SPDK NVMe driver also, without WC map it may work
>>> >> in x86 platform, but it does not work in ARM platforms.
>>> >
>>> > Doesn't this contradict your assertion that it's not specific to
>>> > platform or device? The device requires support for unaligned
>>> > accesses. The platform chooses to restrict unaligned accesses for
>>> > non-WC mappings while other platforms do not. The native driver can
>>> > still clearly have performance considerations for choosing to use WC
>>> > mappings, but it's still not clear to me that the functionality issue
>>> > isn't self inflicted by the platform definition of UC vs WC. Thanks,
>>> >
>>> Device allows both aligned and unaligned access.. so software have
>>> flexibility can do unaligned access.
>>> As per ARM64 platform, with UC map, memory access should be un-cached,
>>> aligned access and strongly order mapping.
>>> with WC map, memory access can be aligned/unaligned and un-cached. I
>>> thought this is the property of platform not issue.
>>> To allow software to do unaligned access of device memory, we need to
>>> use WC map of ARM64 platform case.
>>> In ARM platforms UC mapping is used to map controller registers which
>>> are 4 byte aligned exposed by non-prefetchable bars.
>>> Also prefetchable BARs allows write merging so I thought using WC map
>>> fulfills both write merging (add performance) and unaligned
>>> access.
>>> Can I modify the code as below to enable only for ARM platforms.
>>> vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
>>> +#if defined(CONFIG_ARM) || defined(CONFIG_ARM64)
>>> + if (pci_resource_flags(pdev, index) & IORESOURCE_PREFETCH)
>>> + vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
>>> +#endif
>>> vma->vm_pgoff = (pci_resource_start(pdev, index) >> PAGE_SHIFT) + pgoff;
>>
>> While the risk of regression is smaller by restricting this to ARM, I
>> don't think it's the right solution. What happens when a device
>> requires strict ordering? ARM now behaves differently than any other
>> architecture, that's not acceptable. Thanks,
> If strict ordering is required for prefetchable bars, driver software
> has to add barrier instructions.
> With this we can assume for prefetchable bars WC mapping should be fine?
>
> Regards,
> Srinath.
>>
>> Alex
next prev parent reply other threads:[~2018-08-01 17:58 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-07-13 4:56 [RFC PATCH] vfio/pci: map prefetchble bars as writecombine Srinath Mannam
2018-07-17 15:22 ` Alex Williamson
2018-07-18 18:35 ` Srinath Mannam
2018-07-18 21:25 ` Alex Williamson
2018-07-19 14:47 ` Srinath Mannam
2018-07-19 15:12 ` Alex Williamson
2018-07-19 16:19 ` Srinath Mannam
2018-07-20 20:27 ` Alex Williamson
2018-07-23 8:33 ` Srinath Mannam
2018-08-01 17:58 ` Srinath Mannam [this message]
2018-08-01 19:38 ` Alex Williamson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CABe79T7-+qL3siCDc6SeEqEaj3TJQ6O_sVTZzm+wXoYJQWyvbA@mail.gmail.com \
--to=srinath.mannam@broadcom.com \
--cc=aik@ozlabs.ru \
--cc=alex.williamson@redhat.com \
--cc=bcm-kernel-feedback-list@broadcom.com \
--cc=benh@kernel.crashing.org \
--cc=david@gibson.dropbear.id.au \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=ray.jui@broadcom.com \
--cc=scott.branden@broadcom.com \
--cc=vikram.prakash@broadcom.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).