linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* chasing the four level page table
@ 2005-01-06 17:17 Jon Smirl
  2005-01-06 18:12 ` Andi Kleen
  0 siblings, 1 reply; 11+ messages in thread
From: Jon Smirl @ 2005-01-06 17:17 UTC (permalink / raw)
  To: lkml

The DRM driver contains this routine:

drivers/char/drm/drm_memory.h

static inline unsigned long
drm_follow_page (void *vaddr)
{
	pgd_t *pgd = pgd_offset_k((unsigned long) vaddr);
	pud_t *pud = pud_offset(pgd, (unsigned long) vaddr);
	pmd_t *pmd = pmd_offset(pud, (unsigned long) vaddr);
	pte_t *ptep = pte_offset_kernel(pmd, (unsigned long) vaddr);
	return pte_pfn(*ptep) << PAGE_SHIFT;
}

No other driver needs to chase the page table like this so there is
probably some other way to achieve this. Can someone who knows more
about the VM system tell me if there is a way to eliminate this code?

If there are any VM/AGP experts with some free time, drm_memory.h
could use some rewriting to make it pass sparse checks.

-- 
Jon Smirl
jonsmirl@gmail.com

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: chasing the four level page table
  2005-01-06 17:17 chasing the four level page table Jon Smirl
@ 2005-01-06 18:12 ` Andi Kleen
  2005-01-06 18:36   ` Jon Smirl
  0 siblings, 1 reply; 11+ messages in thread
From: Andi Kleen @ 2005-01-06 18:12 UTC (permalink / raw)
  To: Jon Smirl; +Cc: linux-kernel

Jon Smirl <jonsmirl@gmail.com> writes:

> The DRM driver contains this routine:
>
> drivers/char/drm/drm_memory.h
>
> static inline unsigned long
> drm_follow_page (void *vaddr)
> {
> 	pgd_t *pgd = pgd_offset_k((unsigned long) vaddr);
> 	pud_t *pud = pud_offset(pgd, (unsigned long) vaddr);
> 	pmd_t *pmd = pmd_offset(pud, (unsigned long) vaddr);
> 	pte_t *ptep = pte_offset_kernel(pmd, (unsigned long) vaddr);
> 	return pte_pfn(*ptep) << PAGE_SHIFT;
> }
>
> No other driver needs to chase the page table like this so there is
> probably some other way to achieve this. Can someone who knows more
> about the VM system tell me if there is a way to eliminate this code?

Yes, you should use get_user_pages() instead if you access real memory.
If you try to find hardware mappings using that there is no ready
function for you right now, although I guess it could be added.

The function is also not quite correct, it should already least take
the page_table_lock (depending on where you call it from) and check
p*_none() on each level.

-Andi

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: chasing the four level page table
  2005-01-06 18:12 ` Andi Kleen
@ 2005-01-06 18:36   ` Jon Smirl
  2005-01-06 19:38     ` Andi Kleen
  0 siblings, 1 reply; 11+ messages in thread
From: Jon Smirl @ 2005-01-06 18:36 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel, DRI Devel

On Thu, 06 Jan 2005 19:12:15 +0100, Andi Kleen <ak@muc.de> wrote:
> Yes, you should use get_user_pages() instead if you access real memory.
> If you try to find hardware mappings using that there is no ready
> function for you right now, although I guess it could be added.

drm_follow_page is used like this:

offset = drm_follow_page(pt) | ((unsigned long) pt & ~PAGE_MASK);
map = drm_lookup_map(offset, size, dev);
if (map && map->type == _DRM_AGP) {
	vunmap(pt);
	return;
}

I think pt is a user space address. In DRM AGP memory is mapped into
kernel and user space so the user space address is being converted
into a kernel space one. The kernel space one is used to verify that
the address is a valid mapping to AGP space, if so the page is
unmapped.  I didn't write this code so I'm not 100% sure of what is
going on.

On Thu, 06 Jan 2005 19:12:15 +0100, Andi Kleen <ak@muc.de> wrote:
> Jon Smirl <jonsmirl@gmail.com> writes:
> 
> > The DRM driver contains this routine:
> >
> > drivers/char/drm/drm_memory.h
> >
> > static inline unsigned long
> > drm_follow_page (void *vaddr)
> > {
> >       pgd_t *pgd = pgd_offset_k((unsigned long) vaddr);
> >       pud_t *pud = pud_offset(pgd, (unsigned long) vaddr);
> >       pmd_t *pmd = pmd_offset(pud, (unsigned long) vaddr);
> >       pte_t *ptep = pte_offset_kernel(pmd, (unsigned long) vaddr);
> >       return pte_pfn(*ptep) << PAGE_SHIFT;
> > }
> >
> > No other driver needs to chase the page table like this so there is
> > probably some other way to achieve this. Can someone who knows more
> > about the VM system tell me if there is a way to eliminate this code?
> 
> Yes, you should use get_user_pages() instead if you access real memory.
> If you try to find hardware mappings using that there is no ready
> function for you right now, although I guess it could be added.
> 
> The function is also not quite correct, it should already least take
> the page_table_lock (depending on where you call it from) and check
> p*_none() on each level.
> 
> -Andi
> 

-- 
Jon Smirl
jonsmirl@gmail.com

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: chasing the four level page table
  2005-01-06 18:36   ` Jon Smirl
@ 2005-01-06 19:38     ` Andi Kleen
  2005-01-06 20:05       ` Jon Smirl
  0 siblings, 1 reply; 11+ messages in thread
From: Andi Kleen @ 2005-01-06 19:38 UTC (permalink / raw)
  To: Jon Smirl; +Cc: linux-kernel, DRI Devel

On Thu, Jan 06, 2005 at 01:36:46PM -0500, Jon Smirl wrote:
> On Thu, 06 Jan 2005 19:12:15 +0100, Andi Kleen <ak@muc.de> wrote:
> > Yes, you should use get_user_pages() instead if you access real memory.
> > If you try to find hardware mappings using that there is no ready
> > function for you right now, although I guess it could be added.
> 
> drm_follow_page is used like this:
> 
> offset = drm_follow_page(pt) | ((unsigned long) pt & ~PAGE_MASK);
> map = drm_lookup_map(offset, size, dev);
> if (map && map->type == _DRM_AGP) {
> 	vunmap(pt);
> 	return;
> }
> 
> I think pt is a user space address. In DRM AGP memory is mapped into
> kernel and user space so the user space address is being converted
> into a kernel space one. The kernel space one is used to verify that
> the address is a valid mapping to AGP space, if so the page is
> unmapped.  I didn't write this code so I'm not 100% sure of what is
> going on.

You can't use get_user_pages in this case because the AGP aperture
can be above mem_map.  If none of the callers take page_table_lock
already you would need to add that too. I guess from the context the lock
is not taken, but better double check.

Perhaps we should add a get_user_phys() or somesuch for this.

-Andi


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: chasing the four level page table
  2005-01-06 19:38     ` Andi Kleen
@ 2005-01-06 20:05       ` Jon Smirl
  2005-01-06 21:41         ` Dave Jones
  0 siblings, 1 reply; 11+ messages in thread
From: Jon Smirl @ 2005-01-06 20:05 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel, DRI Devel

On 6 Jan 2005 20:38:27 +0100, Andi Kleen <ak@muc.de> wrote:
> You can't use get_user_pages in this case because the AGP aperture
> can be above mem_map.  If none of the callers take page_table_lock
> already you would need to add that too. I guess from the context the lock
> is not taken, but better double check.
> 
> Perhaps we should add a get_user_phys() or somesuch for this.

No where in DRM is page_table_lock being taken.  Also, no other device
driver takes page_table_lock either, so that probably implies that DRM
shouldn't start doing it to. Best solution would probably be add an mm
function for get_user_phys() that takes the lock internally. If you
add the function I'll convert DRM to use it.

-- 
Jon Smirl
jonsmirl@gmail.com

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: chasing the four level page table
  2005-01-06 20:05       ` Jon Smirl
@ 2005-01-06 21:41         ` Dave Jones
  2005-01-08  5:22           ` Jon Smirl
  0 siblings, 1 reply; 11+ messages in thread
From: Dave Jones @ 2005-01-06 21:41 UTC (permalink / raw)
  To: Jon Smirl; +Cc: Andi Kleen, linux-kernel, DRI Devel

On Thu, Jan 06, 2005 at 03:05:49PM -0500, Jon Smirl wrote:
 > On 6 Jan 2005 20:38:27 +0100, Andi Kleen <ak@muc.de> wrote:
 > > You can't use get_user_pages in this case because the AGP aperture
 > > can be above mem_map.  If none of the callers take page_table_lock
 > > already you would need to add that too. I guess from the context the lock
 > > is not taken, but better double check.
 > > 
 > > Perhaps we should add a get_user_phys() or somesuch for this.
 > 
 > No where in DRM is page_table_lock being taken.  Also, no other device
 > driver takes page_table_lock either, so that probably implies that DRM
 > shouldn't start doing it to. 

No other device driver is also doing such lowlevel stuff with
page tables directly afaics. drivers/char/drm seem to be the only drivers
using [pgd|pmd|pte]_offset() routines.

		Dave


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: chasing the four level page table
  2005-01-06 21:41         ` Dave Jones
@ 2005-01-08  5:22           ` Jon Smirl
  2005-01-15  2:54             ` H. Peter Anvin
  0 siblings, 1 reply; 11+ messages in thread
From: Jon Smirl @ 2005-01-08  5:22 UTC (permalink / raw)
  To: Dave Jones, Andi Kleen, linux-kernel, DRI Devel

On Thu, 6 Jan 2005 16:41:59 -0500, Dave Jones <davej@redhat.com> wrote:
> No other device driver is also doing such lowlevel stuff with
> page tables directly afaics. drivers/char/drm seem to be the only drivers
> using [pgd|pmd|pte]_offset() routines.

On 6 Jan 2005 20:38:27 +0100, Andi Kleen <ak@muc.de> wrote:
> Perhaps we should add a get_user_phys() or somesuch for this.

I think this is a case where the memory manager is missing a function
that DRM needs. If there was a get_user_phys() function DRM wouldn't
need to walk the page tables.

-- 
Jon Smirl
jonsmirl@gmail.com

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: chasing the four level page table
  2005-01-08  5:22           ` Jon Smirl
@ 2005-01-15  2:54             ` H. Peter Anvin
  2005-01-15  3:37               ` Roland Dreier
  2005-01-15  4:24               ` Andi Kleen
  0 siblings, 2 replies; 11+ messages in thread
From: H. Peter Anvin @ 2005-01-15  2:54 UTC (permalink / raw)
  To: linux-kernel

Followup to:  <9e47339105010721225c0cfb32@mail.gmail.com>
By author:    Jon Smirl <jonsmirl@gmail.com>
In newsgroup: linux.dev.kernel
>
> On Thu, 6 Jan 2005 16:41:59 -0500, Dave Jones <davej@redhat.com> wrote:
> > No other device driver is also doing such lowlevel stuff with
> > page tables directly afaics. drivers/char/drm seem to be the only drivers
> > using [pgd|pmd|pte]_offset() routines.
> 
> On 6 Jan 2005 20:38:27 +0100, Andi Kleen <ak@muc.de> wrote:
> > Perhaps we should add a get_user_phys() or somesuch for this.
> 
> I think this is a case where the memory manager is missing a function
> that DRM needs. If there was a get_user_phys() function DRM wouldn't
> need to walk the page tables.
> 

FWIW, the Nvidia device driver wrapper also has this issue.

There seems to be at least two classes of device drivers -- graphics
and RDMA -- which have a genuine need to DMA user pages, after
appropriate locking, of course.

At that point we're better off having the mm export the right
functionality to keep device driver authors from doing it wrong.

	-hpa

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: chasing the four level page table
  2005-01-15  2:54             ` H. Peter Anvin
@ 2005-01-15  3:37               ` Roland Dreier
  2005-01-15  4:24               ` Andi Kleen
  1 sibling, 0 replies; 11+ messages in thread
From: Roland Dreier @ 2005-01-15  3:37 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: linux-kernel

    ak> Perhaps we should add a get_user_phys() or somesuch for this.

    hpa> There seems to be at least two classes of device drivers --
    hpa> graphics and RDMA -- which have a genuine need to DMA user
    hpa> pages, after appropriate locking, of course.

I'm working on InfiniBand drivers, which will be doing RDMA.  I'm
probably missing something, but get_user_pages() seems to be all I
need -- I don't see how get_user_phys() would help me.  Of course I
need to do dma_map_sg() or the like to get an address I can pass to
the InfiniBand device but I don't see anything wrong with that.

There are some issues around fork() and copy-on-write, but those
really require more access to vma handling than page tables.  What
would be nice would be an equivalent to do_mlock() that also sets or
clears the VM_DONTCOPY flag.  This is because an RDMA application
wants to do something like mlock() to keep any pages to be DMAed
present, but even after doing mlock(), if the application forks and
touches one of these locked pages, the COW will move the page to a new
physical address.

 - Roland

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: chasing the four level page table
  2005-01-15  2:54             ` H. Peter Anvin
  2005-01-15  3:37               ` Roland Dreier
@ 2005-01-15  4:24               ` Andi Kleen
  2005-01-15  5:59                 ` Jon Smirl
  1 sibling, 1 reply; 11+ messages in thread
From: Andi Kleen @ 2005-01-15  4:24 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: linux-kernel

hpa@zytor.com (H. Peter Anvin) writes:
>
> There seems to be at least two classes of device drivers -- graphics
> and RDMA -- which have a genuine need to DMA user pages, after
> appropriate locking, of course.

And block devices with O_DIRECT and network devices. We supported that
fine for years with get_user_pages.

The issue with DRM is just that it does something more different:

it wants to get at a AGP page outside get_user_pages doesn't work for
this because the AGP hole is often outside mem_map. For that a 
nice helper is missing.

I'm not 100% we really want a helper because it's rather obscure
requirement, unlikely to be useful for others, and it may be better 
to keep it in DRM.

-Andi

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: chasing the four level page table
  2005-01-15  4:24               ` Andi Kleen
@ 2005-01-15  5:59                 ` Jon Smirl
  0 siblings, 0 replies; 11+ messages in thread
From: Jon Smirl @ 2005-01-15  5:59 UTC (permalink / raw)
  To: Andi Kleen; +Cc: H. Peter Anvin, linux-kernel

On Sat, 15 Jan 2005 05:24:54 +0100, Andi Kleen <ak@muc.de> wrote:
> it wants to get at a AGP page outside get_user_pages doesn't work for
> this because the AGP hole is often outside mem_map. For that a
> nice helper is missing.
> 
> I'm not 100% we really want a helper because it's rather obscure
> requirement, unlikely to be useful for others, and it may be better
> to keep it in DRM.

Wouldn't it be better as a helper where the memory management people
maintain it? For example I only work on x86, are there cross platform
issues? Also, the DRM code is missing the page_table_lock around the
calls too.

-- 
Jon Smirl
jonsmirl@gmail.com

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2005-01-15  5:59 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-01-06 17:17 chasing the four level page table Jon Smirl
2005-01-06 18:12 ` Andi Kleen
2005-01-06 18:36   ` Jon Smirl
2005-01-06 19:38     ` Andi Kleen
2005-01-06 20:05       ` Jon Smirl
2005-01-06 21:41         ` Dave Jones
2005-01-08  5:22           ` Jon Smirl
2005-01-15  2:54             ` H. Peter Anvin
2005-01-15  3:37               ` Roland Dreier
2005-01-15  4:24               ` Andi Kleen
2005-01-15  5:59                 ` Jon Smirl

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).