* [RFC PATCH] PCI: Add quirk to always map ivshmem as write-back
From: Alexey Kardashevskiy @ 2025-06-12 8:22 UTC
To: linux-pci
Cc: linux-kernel, Bjorn Helgaas, David Woodhouse, Kai-Heng Feng,
Paolo Bonzini, Sean Christopherson, Nikunj A Dadhania,
Santosh Shukla, Alexey Kardashevskiy

QEMU Inter-VM Shared Memory (ivshmem) is designed to share a memory
region between guest and host. The host creates a file and passes it to
QEMU, which presents it to the guest via PCI BAR#2. The guest userspace
can map /sys/bus/pci/devices/0000:01:02.3/resource2(_wc) to use the region
without having a guest driver for the device at all.
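
For illustration, a minimal userspace sketch of the driverless mapping
described above. map_bar() is a hypothetical helper name, not an existing
API; the caller is assumed to pass the sysfs node path and a length no
larger than the BAR.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map the first `len` bytes of `path` shared and read/write.
 * For ivshmem the path would be something like
 * /sys/bus/pci/devices/0000:01:02.3/resource2 or .../resource2_wc.
 * Returns MAP_FAILED on any error. Hypothetical helper, for illustration.
 */
void *map_bar(const char *path, size_t len)
{
	int fd = open(path, O_RDWR);
	void *p;

	if (fd < 0)
		return MAP_FAILED;

	p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	close(fd);	/* the mapping remains valid after close() */
	return p;
}
```

Which of the two nodes is opened decides the caching attributes of the
resulting mapping, which is what the rest of this message is about.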

The problem with this is that, since it is a PCI resource, the PCI sysfs
reasonably enforces:
- no caching when mapped via "resourceN" (PTE::PCD on x86) or
- write-through when mapped via "resourceN_wc" (PTE::PWT on x86).

As a result, the host writes are seen by the guest immediately
(as the region is just a mapped file) but it takes quite some time for
the host to see non-cached guest writes.
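
As a rough aid for reading the PTE bits mentioned above: on x86 the PAT,
PCD and PWT bits of a 4 KiB PTE form a 3-bit index into the PAT, and the
memory type comes from whatever that slot holds. The sketch below decodes
the index and looks it up in two tables. The power-on table is
architectural; the second table is my understanding of the layout Linux
programs when PAT is enabled and should be treated as an assumption. The
sketch ignores MTRRs and any hypervisor-side combining of guest and host
attributes.

```c
#include <stdint.h>

/* Cache-control bits of an x86 PTE for a 4 KiB page. */
#define PTE_PWT (UINT64_C(1) << 3)
#define PTE_PCD (UINT64_C(1) << 4)
#define PTE_PAT (UINT64_C(1) << 7)

/* PAT:PCD:PWT form the 3-bit PAT slot index. */
unsigned int pte_pat_index(uint64_t pte)
{
	return (unsigned int)((!!(pte & PTE_PAT) << 2) |
			      (!!(pte & PTE_PCD) << 1) |
			      !!(pte & PTE_PWT));
}

/* Architectural power-on contents of the PAT. */
const char *const pat_reset[8] = {
	"WB", "WT", "UC-", "UC", "WB", "WT", "UC-", "UC"
};

/* Assumed: the layout Linux programs when PAT is enabled. */
const char *const pat_linux[8] = {
	"WB", "WC", "UC-", "UC", "WB", "WP", "UC-", "WT"
};

/* Memory type a PTE selects under the given PAT contents. */
const char *pte_memtype(uint64_t pte, const char *const pat[8])
{
	return pat[pte_pat_index(pte)];
}
```

Under these tables, PCD alone selects UC- in both layouts, while PWT alone
selects WT with the power-on PAT and WC with the assumed Linux layout.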

Add a quirk to always map ivshmem's BAR2 as cacheable (==write-back) as
ivshmem is backed by RAM anyway.
(Re)use the already defined but unused IORESOURCE_CACHEABLE flag.

This does not affect other ways of mapping a PCI BAR; a driver can use
memremap() for this functionality.
Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
---
What is this IORESOURCE_CACHEABLE for actually?
Anyway, the alternatives are:
1. add a new node in sysfs - "resourceN_wb" - for mapping as writeback
but this requires changing existing (and likely old) userspace tools;
2. fix the kernel to strictly follow /proc/mtrr (now it is rather
a recommendation) but Documentation/arch/x86/mtrr.rst says it is replaced
with PAT which does not seem to allow overriding caching for specific
devices (==MMIO ranges).
---
drivers/pci/mmap.c | 6 ++++++
drivers/pci/quirks.c | 8 ++++++++
2 files changed, 14 insertions(+)
diff --git a/drivers/pci/mmap.c b/drivers/pci/mmap.c
index 8da3347a95c4..8495bee08fae 100644
--- a/drivers/pci/mmap.c
+++ b/drivers/pci/mmap.c
@@ -35,6 +35,7 @@ int pci_mmap_resource_range(struct pci_dev *pdev, int bar,
 	if (write_combine)
 		vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
 	else
+	else if (!(pci_resource_flags(pdev, bar) & IORESOURCE_CACHEABLE))
 		vma->vm_page_prot = pgprot_device(vma->vm_page_prot);
 
 	if (mmap_state == pci_mmap_io) {
@@ -46,6 +47,11 @@ int pci_mmap_resource_range(struct pci_dev *pdev, int bar,
 
 	vma->vm_ops = &pci_phys_vm_ops;
 
+	if (pci_resource_flags(pdev, bar) & IORESOURCE_CACHEABLE)
+		return remap_pfn_range_notrack(vma, vma->vm_start, vma->vm_pgoff,
+				  vma->vm_end - vma->vm_start,
+				  vma->vm_page_prot);
+
 	return io_remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff,
 				  vma->vm_end - vma->vm_start,
 				  vma->vm_page_prot);
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index d7f4ee634263..858869ec6612 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -6335,3 +6335,11 @@ static void pci_mask_replay_timer_timeout(struct pci_dev *pdev)
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_GLI, 0x9750, pci_mask_replay_timer_timeout);
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_GLI, 0x9755, pci_mask_replay_timer_timeout);
 #endif
+
+static void pci_ivshmem_writeback(struct pci_dev *dev)
+{
+	struct resource *r = &dev->resource[2];
+
+	r->flags |= IORESOURCE_CACHEABLE;
+}
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_REDHAT_QUMRANET, 0x1110, pci_ivshmem_writeback);
--
2.49.0
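
Note that the first mmap.c hunk, as posted, keeps the bare "else" as a
context line and adds the "else if" right below it. Reading the added line
as a replacement for that "else" (an assumption, not something the patch
states), the page-protection choice could be modelled in plain userspace C
as below. The enum, the function name and the flag value are stand-ins for
illustration only, and the model does not cover the second hunk's switch
to remap_pfn_range_notrack().

```c
#include <stdbool.h>

/* Stand-in for the kernel flag; the value is illustrative only. */
#define IORESOURCE_CACHEABLE 0x00008000UL

/* Hypothetical labels for the three outcomes of the first hunk. */
enum bar_prot {
	BAR_PROT_DEVICE,	/* pgprot_device() applied */
	BAR_PROT_WRITECOMBINE,	/* pgprot_writecombine() applied */
	BAR_PROT_DEFAULT	/* vm_page_prot left as is (cacheable) */
};

/*
 * Model of the protection choice, assuming "else if" replaces "else".
 * A write-combine request is checked first, so it wins over the flag.
 */
enum bar_prot bar_mmap_prot(bool write_combine, unsigned long res_flags)
{
	if (write_combine)
		return BAR_PROT_WRITECOMBINE;
	if (!(res_flags & IORESOURCE_CACHEABLE))
		return BAR_PROT_DEVICE;
	return BAR_PROT_DEFAULT;
}
```

In this model only the plain "resourceN" path changes for a BAR carrying
the flag; BARs without the flag behave as before.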
* Re: [RFC PATCH] PCI: Add quirk to always map ivshmem as write-back
From: Alexey Kardashevskiy @ 2025-06-12 8:27 UTC
To: linux-pci
Cc: linux-kernel, Bjorn Helgaas, David Woodhouse, Kai-Heng Feng,
Paolo Bonzini, Sean Christopherson, Santosh Shukla,
Nikunj A. Dadhania, kvm@vger.kernel.org
Wrong email for Nikunj :) And I missed the KVM ml. Sorry for the noise.
--
Alexey
* Re: [RFC PATCH] PCI: Add quirk to always map ivshmem as write-back
From: Alexey Kardashevskiy @ 2025-06-24 1:42 UTC
To: linux-pci
Cc: linux-kernel, Bjorn Helgaas, David Woodhouse, Kai-Heng Feng,
Paolo Bonzini, Sean Christopherson, Santosh Shukla,
Nikunj A. Dadhania, kvm@vger.kernel.org
Ping? Thanks,
--
Alexey
* Re: [RFC PATCH] PCI: Add quirk to always map ivshmem as write-back
From: Manivannan Sadhasivam @ 2025-06-25 16:28 UTC
To: Alexey Kardashevskiy
Cc: linux-pci, linux-kernel, Bjorn Helgaas, David Woodhouse,
Kai-Heng Feng, Paolo Bonzini, Sean Christopherson, Santosh Shukla,
Nikunj A. Dadhania, kvm@vger.kernel.org
On Tue, Jun 24, 2025 at 11:42:47AM +1000, Alexey Kardashevskiy wrote:
> Ping? Thanks,
>
>
> On 12/6/25 18:27, Alexey Kardashevskiy wrote:
> > Wrong email for Nikunj :) And I missed the KVM ml. Sorry for the noise.
> >
> >
> > On 12/6/25 18:22, Alexey Kardashevskiy wrote:
> > > > QEMU Inter-VM Shared Memory (ivshmem) is designed to share a memory
> > > > region between guest and host. The host creates a file and passes it to
> > > > QEMU, which presents it to the guest via PCI BAR#2. The guest userspace
> > > > can map /sys/bus/pci/devices/0000:01:02.3/resource2(_wc) to use the region
> > > > without having a guest driver for the device at all.
> > > >
> > > > The problem with this is that, since it is a PCI resource, the PCI sysfs
> > > > reasonably enforces:
> > > > - no caching when mapped via "resourceN" (PTE::PCD on x86) or
> > > > - write-through when mapped via "resourceN_wc" (PTE::PWT on x86).
> > > >
> > > > As a result, the host writes are seen by the guest immediately
> > > > (as the region is just a mapped file) but it takes quite some time for
> > > > the host to see non-cached guest writes.
> > > >
> > > > Add a quirk to always map ivshmem's BAR2 as cacheable (==write-back) as
> > > > ivshmem is backed by RAM anyway.
> > > > (Re)use the already defined but unused IORESOURCE_CACHEABLE flag.
> > >
It just makes me nervous to change the semantics of the sysfs attribute, even if
the user knows what to expect. Now the "resourceN_wc" essentially becomes
"resourceN_wb", which goes against the rule of sysfs I'm afraid.
> > > > This does not affect other ways of mapping a PCI BAR; a driver can use
> > > > memremap() for this functionality.
> > >
> > > Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
> > > ---
> > >
> > > What is this IORESOURCE_CACHEABLE for actually?
> > >
> > > Anyway, the alternatives are:
> > >
> > > 1. add a new node in sysfs - "resourceN_wb" - for mapping as writeback
> > > but this requires changing existing (and likely old) userspace tools;
> > >
I guess this would be the cleanest approach. The old tools can continue to suffer
from the performance issue and the new tools can work faster.
- Mani
--
மணிவண்ணன் சதாசிவம்
* Re: [RFC PATCH] PCI: Add quirk to always map ivshmem as write-back
From: Alexey Kardashevskiy @ 2025-07-18 1:58 UTC
To: Manivannan Sadhasivam
Cc: linux-pci, linux-kernel, Bjorn Helgaas, David Woodhouse,
Kai-Heng Feng, Paolo Bonzini, Sean Christopherson, Santosh Shukla,
Nikunj A. Dadhania, kvm@vger.kernel.org
On 26/6/25 02:28, Manivannan Sadhasivam wrote:
> On Tue, Jun 24, 2025 at 11:42:47AM +1000, Alexey Kardashevskiy wrote:
>> Ping? Thanks,
>>
>>
>> On 12/6/25 18:27, Alexey Kardashevskiy wrote:
>>> Wrong email for Nikunj :) And I missed the KVM ml. Sorry for the noise.
>>>
>>>
>>> On 12/6/25 18:22, Alexey Kardashevskiy wrote:
>>>> QEMU Inter-VM Shared Memory (ivshmem) is designed to share a memory
>>>> region between guest and host. The host creates a file and passes it to
>>>> QEMU, which presents it to the guest via PCI BAR#2. The guest userspace
>>>> can map /sys/bus/pci/devices/0000:01:02.3/resource2(_wc) to use the region
>>>> without having a guest driver for the device at all.
>>>>
>>>> The problem with this is that, since it is a PCI resource, the PCI sysfs
>>>> reasonably enforces:
>>>> - no caching when mapped via "resourceN" (PTE::PCD on x86) or
>>>> - write-through when mapped via "resourceN_wc" (PTE::PWT on x86).
>>>>
>>>> As a result, the host writes are seen by the guest immediately
>>>> (as the region is just a mapped file) but it takes quite some time for
>>>> the host to see non-cached guest writes.
>>>>
>>>> Add a quirk to always map ivshmem's BAR2 as cacheable (==write-back) as
>>>> ivshmem is backed by RAM anyway.
>>>> (Re)use the already defined but unused IORESOURCE_CACHEABLE flag.
>>>>
>
> It just makes me nervous to change the semantics of the sysfs attribute, even if
> the user knows what to expect.
On 1) Intel 2) without VFIO, the user already gets this semantic. Which seems... alright?
> Now the "resourceN_wc" essentially becomes
> "resourceN_wb", which goes against the rule of sysfs I'm afraid.
What is this rule?
>
>>>> This does not affect other ways of mapping a PCI BAR; a driver can use
>>>> memremap() for this functionality.
>>>>
>>>> Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
>>>> ---
>>>>
>>>> What is this IORESOURCE_CACHEABLE for actually?
>>>>
>>>> Anyway, the alternatives are:
>>>>
>>>> 1. add a new node in sysfs - "resourceN_wb" - for mapping as writeback
>>>> but this requires changing existing (and likely old) userspace tools;
>>>>
>
> I guess this would be the cleanest approach. The old tools can continue to suffer
> from the performance issue and the new tools can work faster.
Well yes but the only possible user of this is going to be ivshmem as every other cache coherent thing has a driver which can pick any sort of caching policy, and nobody will ever want a slow ivshmem because there will be no added benefit. I can send a patch if we get consensus on this though. Thanks,
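
To make the "new tools" side of that alternative concrete, here is a
rough sketch of how userspace might probe for the node. "resourceN_wb" is
only the name proposed in this thread and is not an existing sysfs node;
pick_bar_node() is likewise a hypothetical helper. The fallback order is
an assumption about what a tool would prefer.

```c
#include <stddef.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Pick the first accessible sysfs node for a BAR, trying the proposed
 * (not yet existing) "resourceN_wb" first, then "resourceN_wc", then
 * plain "resourceN". Writes the chosen path to `out`.
 * Returns 0 on success, -1 if nothing is accessible or `out` is too small.
 */
int pick_bar_node(const char *dev_dir, int bar, char *out, size_t outlen)
{
	static const char *const suffix[] = { "_wb", "_wc", "" };
	size_t i;

	for (i = 0; i < sizeof(suffix) / sizeof(suffix[0]); i++) {
		int n = snprintf(out, outlen, "%s/resource%d%s",
				 dev_dir, bar, suffix[i]);

		if (n < 0 || (size_t)n >= outlen)
			return -1;	/* path would be truncated */
		if (access(out, R_OK | W_OK) == 0)
			return 0;
	}
	return -1;
}
```

An old tool would keep opening "resourceN" or "resourceN_wc" directly and
see no change in behaviour.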
--
Alexey