From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To: "H. Peter Anvin" <hpa@zytor.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>,
the arch/x86 maintainers <x86@kernel.org>,
"Xen-devel@lists.xensource.com" <Xen-devel@lists.xensource.com>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] x86: define arch_vm_get_page_prot to set _PAGE_IOMAP on VM_IO vmas
Date: Fri, 22 Oct 2010 14:02:31 -0400 [thread overview]
Message-ID: <20101022180231.GA26365@dumpdata.com> (raw)
In-Reply-To: <4CC1BF58.9020001@zytor.com>
On Fri, Oct 22, 2010 at 09:44:08AM -0700, H. Peter Anvin wrote:
> On 10/22/2010 08:08 AM, Konrad Rzeszutek Wilk wrote:
> >>
> >> Okay, could you clarify this part a bit? Why does the kernel need to
> >> know the difference between "pseudo-physical" and "machine addresses" at
> >> all? If they overlap, there is a problem, and if they don't overlap, it
> >> will be a 1:1 mapping anyway...
> >
> > The flag (_PAGE_IOMAP) is used when we set the PTE so that the MFN value is
> > used instead of the PFN. We need that b/c when a driver does page_to_pfn()
> > it ends up using the PFN as the bus address to write out register data.
> >
> > Without this patch, the page->virt->PFN value is used, and the PFN != the real MFN,
> > so we end up writing to a memory address that the PCI device has no idea about.
> > By setting the PTE with the MFN, the virt->PFN gets the real MFN value.
> >
> > The drivers I am talking about are mostly, if not all, located in drivers/gpu
> > and it looks that we are missing two more patches to utilize the patch
> > that Jeremy posted.
> >
> > Please note that I am _not_ suggesting that the two patches
> > below should go out - I still need to post them on drm mailing list.
> >
>
> I'm still seriously confused. If I understand this correctly, we're
> talking about DMA addresses here (as opposed to PIO addresses, i.e.
> BARs), right?
Correct. The BARs are OK since they go through ioremap.
>
> It's the bimodality that really bothers me. I understand of course that
> Xen imposes yet another address remapping layer, but I'm having a hard
> time understanding any conditions under with we would need that layer to
> go away, as long as DMA addresses are translated via the DMA APIs -- and
> if they aren't, then iommus will break, too.
That is it. They aren't using the DMA or PCI API completely(*). Try doing
'iommu=soft swiotlb=force' with your radeon card under baremetal
(I used an ATI ES1000). I think it will grind to a halt during the writeback test.
(*): This was with 2.6.34; I haven't touched 2.6.36, and there was a drm/iomem rewrite,
so this might now be working. The incomplete part of the graphics
drivers was that they would not do pci_dma_sync_*. The MFN was programmed in the
GTT/GART (check out radeon_gart_bind: the call to pci_map_page returns the bus address, also
known as the MFN), so the GPU would have a virt->MFN mapping. However, on the CPU side,
when the driver writes a texture to a virtual address, the mapping is virt->PFN.
So when we kick the GPU to do its magic, the VM on the graphics card translates
the virtual address to the MFN, which does not have the data that was written by the
kernel to the PFN. In other words *PFN != *MFN, while we need *PFN == *MFN.
There are two ways of making this work:
1). Make PFN == MFN (this is what Jeremy's patch ends up doing); under
baremetal it has no effect, as baremetal doesn't care what the VM_IO flag
stands for.
2). Add a whole bunch of pci_dma_sync calls in the appropriate sections of the
graphics drivers.
I am not qualified to do 2) - that code scares me. Also, 1) is the easier option :-)
I am actually not sure how it works with AMD-Vi or Intel VT-d. I do remember
something about letting certain devices bypass the VT-d, and I think I saw
nouveau making the DMAR throw a fit.
> As such, I don't grok this page flag and what it does, and why it's
> needed. I'm not saying it's *wrong*, I'm saying the design is opaque to
> me and I'm not sure it is the right solution.
I hope my explanation cleared the confusion.
Thread overview: 13+ messages
2010-10-21 22:40 [PATCH] x86: define arch_vm_get_page_prot to set _PAGE_IOMAP on VM_IO vmas Jeremy Fitzhardinge
2010-10-21 22:47 ` H. Peter Anvin
2010-10-21 23:17 ` Jeremy Fitzhardinge
2010-10-22 0:45 ` H. Peter Anvin
2010-10-22 15:08 ` Konrad Rzeszutek Wilk
2010-10-22 16:44 ` H. Peter Anvin
2010-10-22 18:02 ` Konrad Rzeszutek Wilk [this message]
2010-10-22 18:05 ` Konrad Rzeszutek Wilk
2010-10-22 19:06 ` H. Peter Anvin
2010-10-22 19:11 ` [Xen-devel] " Konrad Rzeszutek Wilk
2010-10-22 19:06 ` Jeremy Fitzhardinge
2010-10-22 19:20 ` H. Peter Anvin
2010-10-22 19:36 ` Jeremy Fitzhardinge