* [PATCH] x86: define arch_vm_get_page_prot to set _PAGE_IOMAP on VM_IO vmas
@ 2010-10-21 22:40 Jeremy Fitzhardinge
2010-10-21 22:47 ` H. Peter Anvin
0 siblings, 1 reply; 13+ messages in thread
From: Jeremy Fitzhardinge @ 2010-10-21 22:40 UTC (permalink / raw)
To: H. Peter Anvin
Cc: the arch/x86 maintainers, Xen-devel@lists.xensource.com,
Linux Kernel Mailing List, Konrad Rzeszutek Wilk
[-- Attachment #1: Type: text/plain, Size: 1581 bytes --]
Set _PAGE_IOMAP in ptes mapping a VM_IO vma. This says that the mapping
is of a real piece of physical hardware, and not just system memory.
Xen, in particular, uses this to inhibit the pfn->mfn conversion that
would normally happen - in other words, treat the address directly as a
machine physical address without converting it from pseudo-physical.
[ Impact: make VM_IO mappings map the right thing under Xen ]
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 3cc06e3..4595ae2 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -394,6 +394,9 @@ static inline unsigned long pages_to_mb(unsigned long npg)
#define io_remap_pfn_range(vma, vaddr, pfn, size, prot) \
remap_pfn_range(vma, vaddr, pfn, size, prot)
+#define arch_vm_get_page_prot arch_vm_get_page_prot
+extern pgprot_t arch_vm_get_page_prot(unsigned vm_flags);
+
#if PAGETABLE_LEVELS > 2
static inline int pud_none(pud_t pud)
{
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 8e43bdd..e68aea6 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -6,6 +6,16 @@
#define PGALLOC_GFP GFP_KERNEL | __GFP_NOTRACK | __GFP_REPEAT | __GFP_ZERO
+pgprot_t arch_vm_get_page_prot(unsigned vm_flags)
+{
+ pgprot_t ret = __pgprot(0);
+
+ if (vm_flags & VM_IO)
+ ret = __pgprot(_PAGE_IOMAP);
+
+ return ret;
+}
+
pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address)
{
return (pte_t *)__get_free_page(PGALLOC_GFP);
[-- Attachment #2: define-arch_vm_get_page_prot.patch --]
[-- Type: text/plain, Size: 1830 bytes --]
From 81550b51436c282311c531d3ba7a79defd21c729 Mon Sep 17 00:00:00 2001
From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Date: Fri, 20 Feb 2009 12:58:42 -0800
Subject: [PATCH] x86: define arch_vm_get_page_prot to set _PAGE_IOMAP on VM_IO vmas
Set _PAGE_IOMAP in ptes mapping a VM_IO vma. This says that the mapping
is of a real piece of physical hardware, and not just system memory.
Xen, in particular, uses this to inhibit the pfn->mfn conversion that
would normally happen - in other words, treat the address directly as a
machine physical address without converting it from pseudo-physical.
[ Impact: make VM_IO mappings map the right thing under Xen ]
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 3cc06e3..4595ae2 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -394,6 +394,9 @@ static inline unsigned long pages_to_mb(unsigned long npg)
#define io_remap_pfn_range(vma, vaddr, pfn, size, prot) \
remap_pfn_range(vma, vaddr, pfn, size, prot)
+#define arch_vm_get_page_prot arch_vm_get_page_prot
+extern pgprot_t arch_vm_get_page_prot(unsigned vm_flags);
+
#if PAGETABLE_LEVELS > 2
static inline int pud_none(pud_t pud)
{
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 8e43bdd..e68aea6 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -6,6 +6,16 @@
#define PGALLOC_GFP GFP_KERNEL | __GFP_NOTRACK | __GFP_REPEAT | __GFP_ZERO
+pgprot_t arch_vm_get_page_prot(unsigned vm_flags)
+{
+ pgprot_t ret = __pgprot(0);
+
+ if (vm_flags & VM_IO)
+ ret = __pgprot(_PAGE_IOMAP);
+
+ return ret;
+}
+
pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address)
{
return (pte_t *)__get_free_page(PGALLOC_GFP);
* Re: [PATCH] x86: define arch_vm_get_page_prot to set _PAGE_IOMAP on VM_IO vmas
2010-10-21 22:40 [PATCH] x86: define arch_vm_get_page_prot to set _PAGE_IOMAP on VM_IO vmas Jeremy Fitzhardinge
@ 2010-10-21 22:47 ` H. Peter Anvin
2010-10-21 23:17 ` Jeremy Fitzhardinge
0 siblings, 1 reply; 13+ messages in thread
From: H. Peter Anvin @ 2010-10-21 22:47 UTC (permalink / raw)
To: Jeremy Fitzhardinge
Cc: the arch/x86 maintainers, Xen-devel@lists.xensource.com,
Linux Kernel Mailing List, Konrad Rzeszutek Wilk
On 10/21/2010 03:40 PM, Jeremy Fitzhardinge wrote:
>
> Set _PAGE_IOMAP in ptes mapping a VM_IO vma. This says that the mapping
> is of a real piece of physical hardware, and not just system memory.
>
> Xen, in particular, uses this to inhibit the normal pfn->mfn conversion
> that would normally happen - in other words, treat the address directly
> as a machine physical address without converting it from pseudo-physical.
>
> [ Impact: make VM_IO mappings map the right thing under Xen ]
>
> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
>
Am I the only one who thinks it seems extremely odd that the guest is
trusted to make this distinction?
-hpa
* Re: [PATCH] x86: define arch_vm_get_page_prot to set _PAGE_IOMAP on VM_IO vmas
2010-10-21 22:47 ` H. Peter Anvin
@ 2010-10-21 23:17 ` Jeremy Fitzhardinge
2010-10-22 0:45 ` H. Peter Anvin
0 siblings, 1 reply; 13+ messages in thread
From: Jeremy Fitzhardinge @ 2010-10-21 23:17 UTC (permalink / raw)
To: H. Peter Anvin
Cc: the arch/x86 maintainers, Xen-devel@lists.xensource.com,
Linux Kernel Mailing List, Konrad Rzeszutek Wilk
On 10/21/2010 03:47 PM, H. Peter Anvin wrote:
> On 10/21/2010 03:40 PM, Jeremy Fitzhardinge wrote:
>> Set _PAGE_IOMAP in ptes mapping a VM_IO vma. This says that the mapping
>> is of a real piece of physical hardware, and not just system memory.
>>
>> Xen, in particular, uses this to inhibit the normal pfn->mfn conversion
>> that would normally happen - in other words, treat the address directly
>> as a machine physical address without converting it from pseudo-physical.
>>
>> [ Impact: make VM_IO mappings map the right thing under Xen ]
>>
>> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
>>
> Am I the only one who thinks this seems extremely odd that the guest is
> trusted to make this distinction?
Xen PV guests are always responsible for constructing ptes with machine
addresses in them (ie, doing their own pseudo-physical to machine
address conversion), and Xen verifies that the pages they want to map
either belong to them or have been granted to them. The _PAGE_IOMAP
flag is a kernel-internal one which allows us to distinguish between
ptes intended to map memory vs machine hardware addresses; it is not
part of the Xen ABI.
If you're passing a device through to a domain, the domain is given
access to the device's address space so it can legally map those pages
(and if an IOMMU is available, the device is constrained to only DMA
that domain's memory).
J
* Re: [PATCH] x86: define arch_vm_get_page_prot to set _PAGE_IOMAP on VM_IO vmas
2010-10-21 23:17 ` Jeremy Fitzhardinge
@ 2010-10-22 0:45 ` H. Peter Anvin
2010-10-22 15:08 ` Konrad Rzeszutek Wilk
0 siblings, 1 reply; 13+ messages in thread
From: H. Peter Anvin @ 2010-10-22 0:45 UTC (permalink / raw)
To: Jeremy Fitzhardinge
Cc: the arch/x86 maintainers, Xen-devel@lists.xensource.com,
Linux Kernel Mailing List, Konrad Rzeszutek Wilk
On 10/21/2010 04:17 PM, Jeremy Fitzhardinge wrote:
>
> Xen PV guests are always responsible for constructing ptes with machine
> addresses in them (ie, doing their own pseudo-physical to machine
> address conversion), and Xen verifies that the pages they want to map
> either belong to them or have been granted to them. The _PAGE_IOMAP
> flag is a kernel-internal one which allows us to distinguish between
> ptes intended to map memory vs machine hardware addresses; it is not
> part of the Xen ABI.
>
> If you're passing a device through to a domain, the domain is given
> access to the device's address space so it can legally map those pages
> (and if an IOMMU is available, the device is constrained to only DMA
> that domain's memory).
>
Okay, could you clarify this part a bit? Why does the kernel need to
know the difference between "pseudo-physical" and "machine addresses" at
all? If they overlap, there is a problem, and if they don't overlap, it
will be a 1:1 mapping anyway...
-hpa
--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
* Re: [PATCH] x86: define arch_vm_get_page_prot to set _PAGE_IOMAP on VM_IO vmas
2010-10-22 0:45 ` H. Peter Anvin
@ 2010-10-22 15:08 ` Konrad Rzeszutek Wilk
2010-10-22 16:44 ` H. Peter Anvin
0 siblings, 1 reply; 13+ messages in thread
From: Konrad Rzeszutek Wilk @ 2010-10-22 15:08 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Jeremy Fitzhardinge, the arch/x86 maintainers,
Xen-devel@lists.xensource.com, Linux Kernel Mailing List
On Thu, Oct 21, 2010 at 05:45:44PM -0700, H. Peter Anvin wrote:
> On 10/21/2010 04:17 PM, Jeremy Fitzhardinge wrote:
> >
> > Xen PV guests are always responsible for constructing ptes with machine
> > addresses in them (ie, doing their own pseudo-physical to machine
> > address conversion), and Xen verifies that the pages they want to map
> > either belong to them or have been granted to them. The _PAGE_IOMAP
> > flag is a kernel-internal one which allows us to distinguish between
> > ptes intended to map memory vs machine hardware addresses; it is not
> > part of the Xen ABI.
> >
> > If you're passing a device through to a domain, the domain is given
> > access to the device's address space so it can legally map those pages
> > (and if an IOMMU is available, the device is constrained to only DMA
> > that domain's memory).
> >
>
> Okay, could you clarify this part a bit? Why does the kernel need to
> know the difference between "pseudo-physical" and "machine addresses" at
> all? If they overlap, there is a problem, and if they don't overlap, it
> will be a 1:1 mapping anyway...
The flag (_PAGE_IOMAP) is used when we set the PTE so that the MFN value is
used instead of the PFN. We need that b/c when a driver does page_to_pfn()
it ends up using the PFN as the bus address to write out register data.
Without this patch, the page->virt->PFN value is used, and since the PFN is
not the real MFN, we end up writing to a memory address that the PCI device
has no idea about. By setting the PTE with the MFN, virt->PFN gets the real
MFN value.
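The mismatch described above can be sketched in a toy user-space model. Everything here is invented for illustration (the frame count, the p2m contents, the helper names); it only shows why a PFN-based CPU mapping and an MFN-programmed device end up looking at different memory.

```c
#include <string.h>

/* Toy model: a few machine frames of "RAM", and a p2m table mapping the
 * guest's pseudo-physical frames (PFNs) onto machine frames (MFNs).
 * All values are made up. */
enum { NFRAMES = 4, FRAME_SIZE = 16 };
static char machine_ram[NFRAMES][FRAME_SIZE];
static const unsigned p2m[NFRAMES] = { 2, 0, 3, 1 };

/* A CPU store through a mapping built from a raw frame number. */
static void cpu_write(unsigned frame, const char *data)
{
    strcpy(machine_ram[frame], data);
}

/* The device's DMA engine reads machine frames directly. */
static const char *device_read(unsigned mfn)
{
    return machine_ram[mfn];
}
```

If the CPU mapping is built from the PFN (cpu_write(0, ...)) while the device is programmed with the MFN (p2m[0] == 2 here), the device reads a frame the CPU never wrote. Building the mapping from the MFN (cpu_write(p2m[0], ...)) makes the two sides agree, which is what putting the MFN into the PTE achieves.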
The drivers I am talking about are mostly, if not all, located in drivers/gpu,
and it looks like we are missing two more patches to utilize the patch
that Jeremy posted.
Please note that I am _not_ suggesting that the two patches
below should go out - I still need to post them on drm mailing list.
commit 1b5a6e09831f44445cdb96f29c287e18d0317136
Author: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Date: Fri Oct 2 09:49:05 2009 -0700
drm: recompute vma->vm_page_prot after changing vm_flags
vm_get_page_prot() computes vm_page_prot depending on vm_flags, so
we need to re-call it if we change flags.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
index 1c040d0..3dc8d6b 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
@@ -272,6 +272,7 @@ int ttm_bo_mmap(struct file *filp, struct vm_area_struct *vma,
vma->vm_private_data = bo;
vma->vm_flags |= VM_RESERVED | VM_IO | VM_MIXEDMAP | VM_DONTEXPAND;
+ vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
return 0;
out_unref:
ttm_bo_unref(&bo);
@@ -287,6 +288,7 @@ int ttm_fbdev_mmap(struct vm_area_struct *vma, struct ttm_buffer_object *bo)
vma->vm_ops = &ttm_bo_vm_ops;
vma->vm_private_data = ttm_bo_reference(bo);
vma->vm_flags |= VM_RESERVED | VM_IO | VM_MIXEDMAP | VM_DONTEXPAND;
+ vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
return 0;
}
EXPORT_SYMBOL(ttm_fbdev_mmap);
commit 9551827190db2d34f106255f0b5bfdc452e85cd8
Author: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Date: Mon Jul 26 14:15:55 2010 -0400
ttm: Set VM_IO only on pages with TTM_MEMTYPE_FLAG_NEEDS_IOREMAP set.
This patch is based on "[Patch RFC] ttm: nouveau accelerated on Xen
pv-ops kernel"
http://lists.freedesktop.org/archives/nouveau/2010-March/005326.html
Under Xen, the PFN of a page is virtualized. The physical addresses used
for DMA programming need to be the Machine Frame Number (MFN).
Xen transparently does the correct translation using the _PAGE_IOMAP
PTE bit. If the bit is set, Xen assumes that the backing memory is in
the IOMEM space, and PFN equals MFN. If not set, page_to_pfn() returns
a phantom MFN.
The patch enables the ttm_bo_vm_fault() handler to behave correctly
under Xen, and has no side-effects on normal (not under Xen) operations.
The use of TTM_MEMTYPE_FLAG_NEEDS_IOREMAP in the check assumes that
only pages which have this flag are backed by device memory or IO.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Arvind R <arvino55@gmail.com>
diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
index 3dc8d6b..ea3b8fb 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
@@ -239,6 +239,7 @@ int ttm_bo_mmap(struct file *filp, struct vm_area_struct *vma,
{
struct ttm_bo_driver *driver;
struct ttm_buffer_object *bo;
+ struct ttm_mem_type_manager *man;
int ret;
read_lock(&bdev->vm_lock);
@@ -271,7 +272,10 @@ int ttm_bo_mmap(struct file *filp, struct vm_area_struct *vma,
*/
vma->vm_private_data = bo;
- vma->vm_flags |= VM_RESERVED | VM_IO | VM_MIXEDMAP | VM_DONTEXPAND;
+ vma->vm_flags |= VM_RESERVED | VM_MIXEDMAP | VM_DONTEXPAND;
+ man = &bdev->man[bo->mem.mem_type];
+ if (man->flags & TTM_MEMTYPE_FLAG_NEEDS_IOREMAP)
+ vma->vm_flags |= VM_IO;
vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
return 0;
out_unref:
@@ -288,7 +292,6 @@ int ttm_fbdev_mmap(struct vm_area_struct *vma, struct ttm_buffer_object *bo)
vma->vm_ops = &ttm_bo_vm_ops;
vma->vm_private_data = ttm_bo_reference(bo);
vma->vm_flags |= VM_RESERVED | VM_IO | VM_MIXEDMAP | VM_DONTEXPAND;
- vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
return 0;
}
EXPORT_SYMBOL(ttm_fbdev_mmap);
>
> -hpa
>
> --
> H. Peter Anvin, Intel Open Source Technology Center
> I work for Intel. I don't speak on their behalf.
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
* Re: [PATCH] x86: define arch_vm_get_page_prot to set _PAGE_IOMAP on VM_IO vmas
2010-10-22 15:08 ` Konrad Rzeszutek Wilk
@ 2010-10-22 16:44 ` H. Peter Anvin
2010-10-22 18:02 ` Konrad Rzeszutek Wilk
2010-10-22 19:06 ` Jeremy Fitzhardinge
0 siblings, 2 replies; 13+ messages in thread
From: H. Peter Anvin @ 2010-10-22 16:44 UTC (permalink / raw)
To: Konrad Rzeszutek Wilk
Cc: Jeremy Fitzhardinge, the arch/x86 maintainers,
Xen-devel@lists.xensource.com, Linux Kernel Mailing List
On 10/22/2010 08:08 AM, Konrad Rzeszutek Wilk wrote:
>>
>> Okay, could you clarify this part a bit? Why does the kernel need to
>> know the difference between "pseudo-physical" and "machine addresses" at
>> all? If they overlap, there is a problem, and if they don't overlap, it
>> will be a 1:1 mapping anyway...
>
> The flag (_PAGE_IOMAP) is used when we set the PTE so that the MFN value is
> used instead of the PFN. We need that b/c when a driver does page_to_pfn()
> it ends up using the PFN as bus address to write out registers data.
>
> Without this patch, the page->virt->PFN value is used and the PFN != to real MFN
> so we end up writing in a memory address that the PCI device has no idea about.
> By setting the PTE with the MFN, the virt->PFN gets the real MFN value.
>
> The drivers I am talking about are mostly, if not all, located in drivers/gpu
> and it looks that we are missing two more patches to utilize the patch
> that Jeremy posted.
>
> Please note that I am _not_ suggesting that the two patches
> below should go out - I still need to post them on drm mailing list.
>
I'm still seriously confused. If I understand this correctly, we're
talking about DMA addresses here (as opposed to PIO addresses, i.e.
BARs), right?
It's the bimodality that really bothers me. I understand of course that
Xen imposes yet another address remapping layer, but I'm having a hard
time understanding any conditions under which we would need that layer to
go away, as long as DMA addresses are translated via the DMA APIs -- and
if they aren't, then iommus will break, too.
As such, I don't grok this page flag and what it does, and why it's
needed. I'm not saying it's *wrong*, I'm saying the design is opaque to
me and I'm not sure it is the right solution.
-hpa
--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
* Re: [PATCH] x86: define arch_vm_get_page_prot to set _PAGE_IOMAP on VM_IO vmas
2010-10-22 16:44 ` H. Peter Anvin
@ 2010-10-22 18:02 ` Konrad Rzeszutek Wilk
2010-10-22 18:05 ` Konrad Rzeszutek Wilk
2010-10-22 19:06 ` H. Peter Anvin
2010-10-22 19:06 ` Jeremy Fitzhardinge
1 sibling, 2 replies; 13+ messages in thread
From: Konrad Rzeszutek Wilk @ 2010-10-22 18:02 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Jeremy Fitzhardinge, the arch/x86 maintainers,
Xen-devel@lists.xensource.com, Linux Kernel Mailing List
On Fri, Oct 22, 2010 at 09:44:08AM -0700, H. Peter Anvin wrote:
> On 10/22/2010 08:08 AM, Konrad Rzeszutek Wilk wrote:
> >>
> >> Okay, could you clarify this part a bit? Why does the kernel need to
> >> know the difference between "pseudo-physical" and "machine addresses" at
> >> all? If they overlap, there is a problem, and if they don't overlap, it
> >> will be a 1:1 mapping anyway...
> >
> > The flag (_PAGE_IOMAP) is used when we set the PTE so that the MFN value is
> > used instead of the PFN. We need that b/c when a driver does page_to_pfn()
> > it ends up using the PFN as bus address to write out registers data.
> >
> > Without this patch, the page->virt->PFN value is used and the PFN != to real MFN
> > so we end up writing in a memory address that the PCI device has no idea about.
> > By setting the PTE with the MFN, the virt->PFN gets the real MFN value.
> >
> > The drivers I am talking about are mostly, if not all, located in drivers/gpu
> > and it looks that we are missing two more patches to utilize the patch
> > that Jeremy posted.
> >
> > Please note that I am _not_ suggesting that the two patches
> > below should go out - I still need to post them on drm mailing list.
> >
>
> I'm still seriously confused. If I understand this correctly, we're
> talking about DMA addresses here (as opposed to PIO addresses, i.e.
> BARs), right?
Correct. The BARs are ok since they go through the ioremap.
>
> It's the bimodality that really bothers me. I understand of course that
> Xen imposes yet another address remapping layer, but I'm having a hard
> time understanding any conditions under which we would need that layer to
> go away, as long as DMA addresses are translated via the DMA APIs -- and
> if they aren't, then iommus will break, too.
That is it. They aren't using the DMA or PCI API completely(*). Try doing
'iommu=soft swiotlb=force' with your radeon card under baremetal
(I used an ATI ES1000). I think it will grind to a halt during the writeback test.
(*): This was with 2.6.34; I haven't touched 2.6.36, and there was a drm/iomem rewrite,
so it might be that this is now working. The incomplete part of the graphics
drivers was that they would not do pci_dma_sync_*. The MFN was programmed into the
GTT/GART (check out radeon_gart_bind; the call to pci_map_page gets the bus address, also
known as the MFN), so the GPU would have a virt->MFN mapping. However, on the CPU side,
when the driver writes a texture to a virtual address, the mapping is virt->PFN.
So when we kick the GPU to do its magic, the VM on the graphics card would translate
the virtual address to the MFN, which did not have the data that was written by the
kernel to the PFN. In other words *PFN != *MFN, while we need *PFN == *MFN.
There are two ways of making this work:
1). PFN == MFN (this is what Jeremy's patch ends up doing); under
baremetal it has no effect, as baremetal doesn't care what the VM_IO flag
stands for.
2). Add a whole bunch of pci_dma_sync calls in the appropriate sections of the
graphics drivers.
I am not qualified to do 2) - that code scares me. Also, 1) is the easier option :-)
I am actually not sure how it works with AMD-Vi or Intel VT-d. I do remember
something about letting certain devices bypass VT-d, and I think I saw
the nouveau driver making the DMAR throw a fit.
> As such, I don't grok this page flag and what it does, and why it's
> needed. I'm not saying it's *wrong*, I'm saying the design is opaque to
I hope my explanation cleared the confusion.
> me and I'm not sure it is the right solution.
* Re: [PATCH] x86: define arch_vm_get_page_prot to set _PAGE_IOMAP on VM_IO vmas
2010-10-22 18:02 ` Konrad Rzeszutek Wilk
@ 2010-10-22 18:05 ` Konrad Rzeszutek Wilk
2010-10-22 19:06 ` H. Peter Anvin
1 sibling, 0 replies; 13+ messages in thread
From: Konrad Rzeszutek Wilk @ 2010-10-22 18:05 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Jeremy Fitzhardinge, the arch/x86 maintainers,
Xen-devel@lists.xensource.com, Linux Kernel Mailing List
On Fri, Oct 22, 2010 at 02:02:31PM -0400, Konrad Rzeszutek Wilk wrote:
> On Fri, Oct 22, 2010 at 09:44:08AM -0700, H. Peter Anvin wrote:
> > On 10/22/2010 08:08 AM, Konrad Rzeszutek Wilk wrote:
> > >>
> > >> Okay, could you clarify this part a bit? Why does the kernel need to
> > >> know the difference between "pseudo-physical" and "machine addresses" at
> > >> all? If they overlap, there is a problem, and if they don't overlap, it
> > >> will be a 1:1 mapping anyway...
> > >
> > > The flag (_PAGE_IOMAP) is used when we set the PTE so that the MFN value is
> > > used instead of the PFN. We need that b/c when a driver does page_to_pfn()
> > > it ends up using the PFN as bus address to write out registers data.
> > >
> > > Without this patch, the page->virt->PFN value is used and the PFN != to real MFN
> > > so we end up writing in a memory address that the PCI device has no idea about.
> > > By setting the PTE with the MFN, the virt->PFN gets the real MFN value.
> > >
> > > The drivers I am talking about are mostly, if not all, located in drivers/gpu
> > > and it looks that we are missing two more patches to utilize the patch
> > > that Jeremy posted.
> > >
> > > Please note that I am _not_ suggesting that the two patches
> > > below should go out - I still need to post them on drm mailing list.
> > >
> >
> > I'm still seriously confused. If I understand this correctly, we're
> > talking about DMA addresses here (as opposed to PIO addresses, i.e.
> > BARs), right?
>
> Correct. The BARs are ok since they go through the ioremap.
Whoops. I answered that incorrectly. No, we are not talking about
BARs. The BARs are ok (look at the "x86/PCI: make sure _PAGE_IOMAP is set on pci mappings"
patch, http://amailbox.org/mailarchive/linux-kernel/2010/10/12/4630930/thread).
We are talking about the GPU's VM engine (or the GART on the Northbridge).
* Re: [PATCH] x86: define arch_vm_get_page_prot to set _PAGE_IOMAP on VM_IO vmas
2010-10-22 16:44 ` H. Peter Anvin
2010-10-22 18:02 ` Konrad Rzeszutek Wilk
@ 2010-10-22 19:06 ` Jeremy Fitzhardinge
2010-10-22 19:20 ` H. Peter Anvin
1 sibling, 1 reply; 13+ messages in thread
From: Jeremy Fitzhardinge @ 2010-10-22 19:06 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Konrad Rzeszutek Wilk, the arch/x86 maintainers,
Xen-devel@lists.xensource.com, Linux Kernel Mailing List
On 10/22/2010 09:44 AM, H. Peter Anvin wrote:
> On 10/22/2010 08:08 AM, Konrad Rzeszutek Wilk wrote:
>>> Okay, could you clarify this part a bit? Why does the kernel need to
>>> know the difference between "pseudo-physical" and "machine addresses" at
>>> all? If they overlap, there is a problem, and if they don't overlap, it
>>> will be a 1:1 mapping anyway...
>> The flag (_PAGE_IOMAP) is used when we set the PTE so that the MFN value is
>> used instead of the PFN. We need that b/c when a driver does page_to_pfn()
>> it ends up using the PFN as bus address to write out registers data.
>>
>> Without this patch, the page->virt->PFN value is used and the PFN != to real MFN
>> so we end up writing in a memory address that the PCI device has no idea about.
>> By setting the PTE with the MFN, the virt->PFN gets the real MFN value.
>>
>> The drivers I am talking about are mostly, if not all, located in drivers/gpu
>> and it looks that we are missing two more patches to utilize the patch
>> that Jeremy posted.
>>
>> Please note that I am _not_ suggesting that the two patches
>> below should go out - I still need to post them on drm mailing list.
>>
> I'm still seriously confused. If I understand this correctly, we're
> talking about DMA addresses here (as opposed to PIO addresses, i.e.
> BARs), right?
>
> It's the bimodality that really bothers me. I understand of course that
> Xen imposes yet another address remapping layer, but I'm having a hard
> time understanding any conditions under with we would need that layer to
> go away, as long as DMA addresses are translated via the DMA APIs -- and
> if they aren't, then iommus will break, too.
>
> As such, I don't grok this page flag and what it does, and why it's
> needed. I'm not saying it's *wrong*, I'm saying the design is opaque to
> me and I'm not sure it is the right solution.
Well, if you want to map a normal memory page, you'd use, say,
pfn_pte(pfn, PAGE_KERNEL) to generate the pte. The pfn is a
domain-local pseudo-physical address. When it ends up in
xen_make_pte(), it will translate the pfn into a machine-global mfn
to generate a pte_t which can be inserted into a pagetable. (And when
that pagetable starts being used as such, Xen will validate that the mfn
is actually one the domain is allowed to address.)
However, if you're doing an ioremap(), then the mapped address is a
hardware one. In that case, we construct the pte with
pfn_pte(device_pfn, PAGE_KERNEL_IO), which sets the _PAGE_IOMAP flag in
the pte flags. When it gets to xen_make_pte(), it sees _PAGE_IOMAP and
constructs a pte_t containing the literal untranslated device_pfn
(really an mfn). (And again, Xen will check that the domain has access
to that mfn before allowing the mapping to be used.)
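That decision point can be sketched in a few lines of user-space C. The flag value, the p2m contents, and the helper name make_pte() are all invented for illustration; the real xen_make_pte() handles many more details.

```c
#define _PAGE_IOMAP  0x0200UL   /* illustrative value, not the real bit */
#define PAGE_SHIFT   12

typedef unsigned long pteval_t;

/* Toy pseudo-physical -> machine translation table (a stand-in for the
 * real p2m machinery; contents are invented). */
static const unsigned long p2m[] = { 7, 3, 9, 1 };

/* The decision xen_make_pte() makes: with _PAGE_IOMAP the frame number
 * is already a machine frame and passes through untranslated; without
 * it, the frame is a pfn and is looked up in the p2m table. */
static pteval_t make_pte(unsigned long frame, pteval_t prot)
{
    unsigned long mfn = (prot & _PAGE_IOMAP) ? frame : p2m[frame];
    return (mfn << PAGE_SHIFT) | (prot & ((1UL << PAGE_SHIFT) - 1));
}
```

Both callers use the same (frame, prot) interface; only the prot bit tells the lower layer which frame-number address space it was handed.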
We use the DMA API to make sure that pfn<->mfn conversions happen
correctly when setting up DMA (all the Xen swiotlb stuff Konrad has been
working on). _PAGE_IOMAP is used in the implementation of that, as well
as in ioremap(), remap_pfn_range(), drm, etc.
All this machinery has been in the kernel for quite a while. This
particular patch fixes a gap, making sure that a vma with VM_IO set will
be mapped with _PAGE_IOMAP set, which makes a large class of drm, fbdev,
capture, etc device drivers work properly under Xen unmodified. (Though
DRM is full of other pitfalls, and Konrad has been up to his neck in
piranhas lately, which is why his answer is a little fixated on the DRM
aspect.)
J
* Re: [PATCH] x86: define arch_vm_get_page_prot to set _PAGE_IOMAP on VM_IO vmas
2010-10-22 18:02 ` Konrad Rzeszutek Wilk
2010-10-22 18:05 ` Konrad Rzeszutek Wilk
@ 2010-10-22 19:06 ` H. Peter Anvin
2010-10-22 19:11 ` [Xen-devel] " Konrad Rzeszutek Wilk
1 sibling, 1 reply; 13+ messages in thread
From: H. Peter Anvin @ 2010-10-22 19:06 UTC (permalink / raw)
To: Konrad Rzeszutek Wilk
Cc: Jeremy Fitzhardinge, the arch/x86 maintainers,
Xen-devel@lists.xensource.com, Linux Kernel Mailing List
On 10/22/2010 11:02 AM, Konrad Rzeszutek Wilk wrote:
>
> I hope my explanation cleared the confusion.
>
>> me and I'm not sure it is the right solution.
OK, now I'm *positive* this isn't the right solution...
-hpa
--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
* Re: [Xen-devel] Re: [PATCH] x86: define arch_vm_get_page_prot to set _PAGE_IOMAP on VM_IO vmas
2010-10-22 19:06 ` H. Peter Anvin
@ 2010-10-22 19:11 ` Konrad Rzeszutek Wilk
0 siblings, 0 replies; 13+ messages in thread
From: Konrad Rzeszutek Wilk @ 2010-10-22 19:11 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Jeremy Fitzhardinge, the arch/x86 maintainers,
Xen-devel@lists.xensource.com, Linux Kernel Mailing List
> > I hope my explanation cleared the confusion.
> >
> >> me and I'm not sure it is the right solution.
>
> OK, now I'm *positive* this isn't the right solution...
Uhhh.. could you elaborate on a right solution?
* Re: [PATCH] x86: define arch_vm_get_page_prot to set _PAGE_IOMAP on VM_IO vmas
2010-10-22 19:06 ` Jeremy Fitzhardinge
@ 2010-10-22 19:20 ` H. Peter Anvin
2010-10-22 19:36 ` Jeremy Fitzhardinge
0 siblings, 1 reply; 13+ messages in thread
From: H. Peter Anvin @ 2010-10-22 19:20 UTC (permalink / raw)
To: Jeremy Fitzhardinge
Cc: Konrad Rzeszutek Wilk, the arch/x86 maintainers,
Xen-devel@lists.xensource.com, Linux Kernel Mailing List
On 10/22/2010 12:06 PM, Jeremy Fitzhardinge wrote:
>
> Well, if you want to map a normal memory page, you'd use, say,
> pfn_pte(pfn, PAGE_KERNEL) to generate the pte. The pfn is a
> domain-local pseudo-physical address. When it ends up in
> xen_make_pte(), it will translate the pfn into a machine-global mfn
> to generate a pte_t which can be inserted into a pagetable. (And when
> that pagetable starts being used as such, Xen will validate that the mfn
> is actually one the domain is allowed to address.)
>
> However, if you're doing an ioremap(), then the mapped address is a
> hardware one. In that case, we construct the pte with
> pfn_pte(device_pfn, PAGE_KERNEL_IO), which sets the _PAGE_IOMAP flag in
> the pte flags. When it gets to xen_make_pte(), it sees _PAGE_IOMAP and
> constructs a pte_t containing the literal untranslated device_pfn
> (really an mfn). (And again, Xen will check that the domain has access
> to that mfn before allowing the mapping to be used.)
>
When you're doing an ioremap(), then the mapped address is *both* a PFN
and an MFN, right? So why do your need a flag? That is the part I
don't get...
-hpa
--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
* Re: [PATCH] x86: define arch_vm_get_page_prot to set _PAGE_IOMAP on VM_IO vmas
2010-10-22 19:20 ` H. Peter Anvin
@ 2010-10-22 19:36 ` Jeremy Fitzhardinge
0 siblings, 0 replies; 13+ messages in thread
From: Jeremy Fitzhardinge @ 2010-10-22 19:36 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Konrad Rzeszutek Wilk, the arch/x86 maintainers,
Xen-devel@lists.xensource.com, Linux Kernel Mailing List
On 10/22/2010 12:20 PM, H. Peter Anvin wrote:
> On 10/22/2010 12:06 PM, Jeremy Fitzhardinge wrote:
>> Well, if you want to map a normal memory page, you'd use, say,
>> pfn_pte(pfn, PAGE_KERNEL) to generate the pte. The pfn is a
>> domain-local pseudo-physical address. When it ends up in
>> xen_make_pte(), it will translate the pfn into a machine-global mfn
>> to generate a pte_t which can be inserted into a pagetable. (And when
>> that pagetable starts being used as such, Xen will validate that the mfn
>> is actually one the domain is allowed to address.)
>>
>> However, if you're doing an ioremap(), then the mapped address is a
>> hardware one. In that case, we construct the pte with
>> pfn_pte(device_pfn, PAGE_KERNEL_IO), which sets the _PAGE_IOMAP flag in
>> the pte flags. When it gets to xen_make_pte(), it sees _PAGE_IOMAP and
>> constructs a pte_t containing the literal untranslated device_pfn
>> (really an mfn). (And again, Xen will check that the domain has access
>> to that mfn before allowing the mapping to be used.)
>>
> When you're doing an ioremap(), then the mapped address is *both* a PFN
> and an MFN, right? So why do your need a flag? That is the part I
> don't get...
Xen always needs an mfn, so for memory the pfn needs to be converted to
an mfn, but for devices the frame number is already an mfn. You could
look at it as having two distinct frame number address spaces, and the
_PAGE_IOMAP flag tells the lower levels which address space the frame
number is in. Both memory and device mappings use the same interface -
__pte() - to convert a (pfn,prot) tuple into a pte_t, so we use a bit in
prot to distinguish them.
In the kernel, the places where device mappings vs memory mappings are
made are quite well delineated. There used to be a few places which could
be either depending on context, but they've since been cleaned up by other
efforts (the PAT/pageattr work did a lot of them, I think).
J
end of thread, other threads:[~2010-10-22 19:36 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-10-21 22:40 [PATCH] x86: define arch_vm_get_page_prot to set _PAGE_IOMAP on VM_IO vmas Jeremy Fitzhardinge
2010-10-21 22:47 ` H. Peter Anvin
2010-10-21 23:17 ` Jeremy Fitzhardinge
2010-10-22 0:45 ` H. Peter Anvin
2010-10-22 15:08 ` Konrad Rzeszutek Wilk
2010-10-22 16:44 ` H. Peter Anvin
2010-10-22 18:02 ` Konrad Rzeszutek Wilk
2010-10-22 18:05 ` Konrad Rzeszutek Wilk
2010-10-22 19:06 ` H. Peter Anvin
2010-10-22 19:11 ` [Xen-devel] " Konrad Rzeszutek Wilk
2010-10-22 19:06 ` Jeremy Fitzhardinge
2010-10-22 19:20 ` H. Peter Anvin
2010-10-22 19:36 ` Jeremy Fitzhardinge