[PATCH] mm: export mmu notifier invalidates

* [PATCH] mm: export mmu notifier invalidates
@ 2013-01-04 15:41 Cliff Wickman
  2013-01-04 21:35 ` Christoph Hellwig
  2013-01-07 14:14 ` Mel Gorman
  0 siblings, 2 replies; 14+ messages in thread
From: Cliff Wickman @ 2013-01-04 15:41 UTC (permalink / raw)
  To: aarcange, akpm, avi, hughd, mgorman; +Cc: linux-mm

From: Cliff Wickman <cpw@sgi.com>

Avi, Andrea, Andrew, Hugh, Mel,

We at SGI have a need to address some very high physical address ranges with
our GRU (global reference unit), sometimes across partitioned machine boundaries
and sometimes with larger addresses than the cpu supports.
We do this with the aid of our own 'extended vma' module which mimics the vma.
When something (either unmap or exit) frees an 'extended vma' we use the mmu
notifiers to clean them up.

We had been able to mimic the functions __mmu_notifier_invalidate_range_start()
and __mmu_notifier_invalidate_range_end() by locking the per-mm lock and 
walking the per-mm notifier list.  But with the change to a global srcu
lock (static in mmu_notifier.c) we can no longer do that.  Our module has
no access to that lock.

So we request that these two functions be exported.

Signed-off-by: Cliff Wickman <cpw@sgi.com>
Acked-by: Robin Holt <holt@sgi.com>

---
 mm/mmu_notifier.c |    2 ++
 1 file changed, 2 insertions(+)

Index: linux/mm/mmu_notifier.c
===================================================================
--- linux.orig/mm/mmu_notifier.c
+++ linux/mm/mmu_notifier.c
@@ -170,6 +170,7 @@ void __mmu_notifier_invalidate_range_sta
 	}
 	srcu_read_unlock(&srcu, id);
 }
+EXPORT_SYMBOL_GPL(__mmu_notifier_invalidate_range_start);

 void __mmu_notifier_invalidate_range_end(struct mm_struct *mm,
 				  unsigned long start, unsigned long end)
@@ -185,6 +186,7 @@ void __mmu_notifier_invalidate_range_end
 	}
 	srcu_read_unlock(&srcu, id);
 }
+EXPORT_SYMBOL_GPL(__mmu_notifier_invalidate_range_end);

 static int do_mmu_notifier_register(struct mmu_notifier *mn,
 				    struct mm_struct *mm,

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
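
The open-coded walk that the message above describes (visit every notifier registered for an address space and call its range hooks) can be sketched in user space roughly as follows. This is only an illustration: struct range_notifier, notify_range_start(), notify_range_end() and the counting hooks are hypothetical stand-ins rather than the kernel's types, and the SRCU read-side locking that the real code in mm/mmu_notifier.c takes around the walk is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; these are NOT the kernel's types. */
struct range_notifier;

struct range_notifier_ops {
	void (*invalidate_range_start)(struct range_notifier *n,
				       unsigned long start, unsigned long end);
	void (*invalidate_range_end)(struct range_notifier *n,
				     unsigned long start, unsigned long end);
};

struct range_notifier {
	const struct range_notifier_ops *ops;
	struct range_notifier *next;	/* singly linked list per address space */
	unsigned long nr_start;		/* counters so a caller can observe calls */
	unsigned long nr_end;
};

/* Call every registered start hook; entries without one are skipped. */
static void notify_range_start(struct range_notifier *head,
			       unsigned long start, unsigned long end)
{
	struct range_notifier *n;

	assert(start <= end);
	for (n = head; n; n = n->next)
		if (n->ops->invalidate_range_start)
			n->ops->invalidate_range_start(n, start, end);
}

/* Call every registered end hook; entries without one are skipped. */
static void notify_range_end(struct range_notifier *head,
			     unsigned long start, unsigned long end)
{
	struct range_notifier *n;

	assert(start <= end);
	for (n = head; n; n = n->next)
		if (n->ops->invalidate_range_end)
			n->ops->invalidate_range_end(n, start, end);
}

/* Example hooks that only count how often they were invoked. */
static void count_start(struct range_notifier *n,
			unsigned long start, unsigned long end)
{
	(void)start;
	(void)end;
	n->nr_start++;
}

static void count_end(struct range_notifier *n,
		      unsigned long start, unsigned long end)
{
	(void)start;
	(void)end;
	n->nr_end++;
}
```

A caller would invoke notify_range_start() before tearing down its mappings and notify_range_end() afterwards. With the list and its lock private to mmu_notifier.c, a module can no longer perform this walk itself, which is why the patch asks for the two existing functions to be exported instead.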

* Re: [PATCH] mm: export mmu notifier invalidates
  2013-01-04 15:41 [PATCH] mm: export mmu notifier invalidates Cliff Wickman
@ 2013-01-04 21:35 ` Christoph Hellwig
  2013-01-04 22:09   ` Cliff Wickman
  2013-01-07 14:14 ` Mel Gorman
  1 sibling, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2013-01-04 21:35 UTC (permalink / raw)
  To: Cliff Wickman; +Cc: aarcange, akpm, avi, hughd, mgorman, linux-mm

On Fri, Jan 04, 2013 at 09:41:53AM -0600, Cliff Wickman wrote:
> So we request that these two functions be exported.

Can you please post the patch that actually uses it in the same series?


* Re: [PATCH] mm: export mmu notifier invalidates
  2013-01-04 21:35 ` Christoph Hellwig
@ 2013-01-04 22:09   ` Cliff Wickman
  0 siblings, 0 replies; 14+ messages in thread
From: Cliff Wickman @ 2013-01-04 22:09 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: aarcange, akpm, avi, hughd, mgorman, linux-mm

On Fri, Jan 04, 2013 at 04:35:17PM -0500, Christoph Hellwig wrote:
> On Fri, Jan 04, 2013 at 09:41:53AM -0600, Cliff Wickman wrote:
> > So we request that these two functions be exported.
> 
> Can you please post the patch that actually uses it in the same series?

The code that needs to use these two functions is an SGI module.  We'd
be happy to open source it, but I think no one else is interested in it.

This is what that patch looks like:

---
 opensource/xvma/xvma/kernel/xvma.c |   12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

Index: 121214.rhel7/opensource/xvma/xvma/kernel/xvma.c
===================================================================
--- 121214.rhel7.orig/opensource/xvma/xvma/kernel/xvma.c
+++ 121214.rhel7/opensource/xvma/xvma/kernel/xvma.c
@@ -32,6 +32,7 @@
 #include <linux/mmu_notifier.h>
 #include <linux/rculist.h>
 #include <linux/spinlock.h>
+#include <linux/version.h>
 #include <asm/current.h>
 #include "xvma.h"
 static struct rb_root xmm_rb_root = RB_ROOT;
@@ -1248,16 +1249,19 @@ void
 zap_xvma_ptes(struct xvma_struct * xvma, unsigned long start, unsigned long size)
 {
 	struct mm_struct * mm = xvma->xvma_mm;
+	unsigned long end = start + size;
+#if LINUX_VERSION_CODE <= KERNEL_VERSION(3,5,0)
 	struct mmu_notifier * mn;
 	struct hlist_node * n;
-	unsigned long end = start + size;
+	int srcu;
+#endif
 
 	DPRINTK_XMM_XVMA(xvma->xvma_xmm, xvma);
 	if (mm) {
-		int srcu;
 		/* don't remove this - superpages may have no mmu notifier */
         	if (!mm->mmu_notifier_mm)
                 	return;
+#if LINUX_VERSION_CODE <= KERNEL_VERSION(3,5,0)
 		srcu = srcu_read_lock(&mm->mmu_notifier_mm->srcu);
 		hlist_for_each_entry_rcu(mn, n, &mm->mmu_notifier_mm->list, hlist) {
 			if (mn->ops->invalidate_range_start)
@@ -1268,6 +1272,10 @@ zap_xvma_ptes(struct xvma_struct * xvma,
 				mn->ops->invalidate_range_end(mn, mm, start, end);
 		}
 		srcu_read_unlock(&mm->mmu_notifier_mm->srcu, srcu);
+#else
+                __mmu_notifier_invalidate_range_start(mm, start, end);
+                __mmu_notifier_invalidate_range_end(mm, start, end);
+#endif
 	} else if (xvma->xvma_xmm->xmm_invalidate_high_range) {
 		xvma->xvma_xmm->xmm_invalidate_high_range(xvma->xvma_xmm, start, end);
 	}
-- 
Cliff Wickman
SGI
cpw@sgi.com
(651) 683-3824
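
The version gate in the xvma patch above works because kernel versions are packed into a single integer that can be compared directly. A minimal sketch of that arithmetic, assuming the classic KERNEL_VERSION() layout from <linux/version.h> (major in bits 16 and up, minor in bits 8-15, sublevel in bits 0-7): SKETCH_KERNEL_VERSION, sketch_version() and uses_open_coded_walk() are hypothetical names chosen so the example builds outside a kernel tree.

```c
#include <assert.h>

/*
 * Same arithmetic as the classic KERNEL_VERSION() macro, redefined under a
 * different (hypothetical) name so this builds without kernel headers.
 */
#define SKETCH_KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/* Pack a version at run time, checking that each field fits its bit range. */
static unsigned int sketch_version(unsigned int major, unsigned int minor,
				   unsigned int sublevel)
{
	assert(minor < 256 && sublevel < 256);
	return SKETCH_KERNEL_VERSION(major, minor, sublevel);
}

/*
 * Mirrors the patch's "#if LINUX_VERSION_CODE <= KERNEL_VERSION(3,5,0)" test:
 * returns 1 when the module would open-code the notifier walk and 0 when it
 * would call the exported functions instead.
 */
static int uses_open_coded_walk(unsigned int version_code)
{
	return version_code <= SKETCH_KERNEL_VERSION(3, 5, 0);
}
```

Because the comparison is "<=" against 3.5.0 exactly, a 3.5.1 kernel already takes the #else branch in this sketch, just as it would under the preprocessor test in the patch.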

* Re: [PATCH] mm: export mmu notifier invalidates
  2013-01-04 15:41 [PATCH] mm: export mmu notifier invalidates Cliff Wickman
  2013-01-04 21:35 ` Christoph Hellwig
@ 2013-01-07 14:14 ` Mel Gorman
  2013-01-07 15:35   ` Andrea Arcangeli
  1 sibling, 1 reply; 14+ messages in thread
From: Mel Gorman @ 2013-01-07 14:14 UTC (permalink / raw)
  To: Cliff Wickman; +Cc: aarcange, akpm, avi, hughd, linux-mm

On Fri, Jan 04, 2013 at 09:41:53AM -0600, Cliff Wickman wrote:
> From: Cliff Wickman <cpw@sgi.com>
> 
> Avi, Andrea, Andrew, Hugh, Mel,
> 
> We at SGI have a need to address some very high physical address ranges with
> our GRU (global reference unit), sometimes across partitioned machine boundaries
> and sometimes with larger addresses than the cpu supports.
> We do this with the aid of our own 'extended vma' module which mimics the vma.
> When something (either unmap or exit) frees an 'extended vma' we use the mmu
> notifiers to clean them up.
> 
> We had been able to mimic the functions __mmu_notifier_invalidate_range_start()
> and __mmu_notifier_invalidate_range_end() by locking the per-mm lock and 
> walking the per-mm notifier list.  But with the change to a global srcu
> lock (static in mmu_notifier.c) we can no longer do that.  Our module has
> no access to that lock.
> 
> So we request that these two functions be exported.
> 

I do not believe I wrote any of the MMU notifier code so it's not up to
me how it should be exported (or if it should even be allowed). I find it
curious that it appears that no other driver needs this and wonder if you
could also abuse the vma_ops->close interface to do some of the cleanup
but I've no idea what your module is doing. I've no objection to the
export as such but it's really not my call.

Andrea?

-- 
Mel Gorman
SUSE Labs

* Re: [PATCH] mm: export mmu notifier invalidates
  2013-01-07 14:14 ` Mel Gorman
@ 2013-01-07 15:35   ` Andrea Arcangeli
  0 siblings, 0 replies; 14+ messages in thread
From: Andrea Arcangeli @ 2013-01-07 15:35 UTC (permalink / raw)
  To: Mel Gorman; +Cc: Cliff Wickman, akpm, avi, hughd, linux-mm

Hi Mel,

On Mon, Jan 07, 2013 at 02:14:46PM +0000, Mel Gorman wrote:
> On Fri, Jan 04, 2013 at 09:41:53AM -0600, Cliff Wickman wrote:
> > From: Cliff Wickman <cpw@sgi.com>
> > 
> > Avi, Andrea, Andrew, Hugh, Mel,
> > 
> > We at SGI have a need to address some very high physical address ranges with
> > our GRU (global reference unit), sometimes across partitioned machine boundaries
> > and sometimes with larger addresses than the cpu supports.
> > We do this with the aid of our own 'extended vma' module which mimics the vma.
> > When something (either unmap or exit) frees an 'extended vma' we use the mmu
> > notifiers to clean them up.
> > 
> > We had been able to mimic the functions __mmu_notifier_invalidate_range_start()
> > and __mmu_notifier_invalidate_range_end() by locking the per-mm lock and 
> > walking the per-mm notifier list.  But with the change to a global srcu
> > lock (static in mmu_notifier.c) we can no longer do that.  Our module has
> > no access to that lock.
> > 
> > So we request that these two functions be exported.
> > 
> 
> I do not believe I wrote any of the MMU notifier code so it's not up to
> me how it should be exported (or if it should even be allowed). I find it
> curious that it appears that no other driver needs this and wonder if you
> could also abuse the vma_ops->close interface to do some of the cleanup
> but I've no idea what your module is doing. I've no objection to the
> export as such but it's really not my call.

The patch itself is zero risk and in fact it will make life easier for
their out-of-tree kernel module (which will be able to use the common
code in mmu_notifier.c and remove some duplication).

The real question is if we're going to support extended vma
abstractions in kernel modules out of tree and that's not only my call
so I suggest others comment too. If yes, then applying this patch to
mmu notifier (so the device driver can call those methods) sounds fine
to me. I'm neutral on the broader question.

* [PATCH] mm: export mmu notifier invalidates
@ 2013-02-12 21:35 Cliff Wickman
  2013-02-12 21:57 ` Andrew Morton
  0 siblings, 1 reply; 14+ messages in thread
From: Cliff Wickman @ 2013-02-12 21:35 UTC (permalink / raw)
  To: akpm; +Cc: linux-mm, aarcange, mgorman

Commenting on this patch ended with Andrea's post on 07Jan, which was
a more-or-less endorsement and a question about support for extended vma
abstractions in kernel modules out of tree.
(that comment can be found at http://marc.info/?l=linux-mm&m=135757292605395&w=2)

I'd like to make the request again to consider export of these two symbols. 

We at SGI have a need to address some very high physical address ranges with
our GRU (global reference unit), sometimes across partitioned machine boundaries
and sometimes with larger addresses than the cpu supports.
We do this with the aid of our own 'extended vma' module which mimics the vma.
When something (either unmap or exit) frees an 'extended vma' we use the mmu
notifiers to clean them up.

We had been able to mimic the functions __mmu_notifier_invalidate_range_start()
and __mmu_notifier_invalidate_range_end() by locking the per-mm lock and 
walking the per-mm notifier list.  But with the change to a global srcu
lock (static in mmu_notifier.c) we can no longer do that.  Our module has
no access to that lock.

So we request that these two functions be exported.

Signed-off-by: Cliff Wickman <cpw@sgi.com>
Acked-by: Robin Holt <holt@sgi.com>

---
 mm/mmu_notifier.c |    2 ++
 1 file changed, 2 insertions(+)

Index: linux/mm/mmu_notifier.c
===================================================================
--- linux.orig/mm/mmu_notifier.c
+++ linux/mm/mmu_notifier.c
@@ -170,6 +170,7 @@ void __mmu_notifier_invalidate_range_sta
 	}
 	srcu_read_unlock(&srcu, id);
 }
+EXPORT_SYMBOL_GPL(__mmu_notifier_invalidate_range_start);

 void __mmu_notifier_invalidate_range_end(struct mm_struct *mm,
 				  unsigned long start, unsigned long end)
@@ -185,6 +186,7 @@ void __mmu_notifier_invalidate_range_end
 	}
 	srcu_read_unlock(&srcu, id);
 }
+EXPORT_SYMBOL_GPL(__mmu_notifier_invalidate_range_end);

 static int do_mmu_notifier_register(struct mmu_notifier *mn,
 				    struct mm_struct *mm,

----- End forwarded message -----

-- 
Cliff Wickman
SGI
cpw@sgi.com
(651) 683-3824

* Re: [PATCH] mm: export mmu notifier invalidates
  2013-02-12 21:35 Cliff Wickman
@ 2013-02-12 21:57 ` Andrew Morton
  2013-02-13 15:03   ` Robin Holt
  0 siblings, 1 reply; 14+ messages in thread
From: Andrew Morton @ 2013-02-12 21:57 UTC (permalink / raw)
  To: Cliff Wickman; +Cc: linux-mm, aarcange, mgorman

On Tue, 12 Feb 2013 15:35:34 -0600
Cliff Wickman <cpw@sgi.com> wrote:

> 
> Commenting on this patch ended with Andrea's post on 07Jan, which was
> a more-or-less endorsement and a question about support for extended vma
> abstractions in kernel modules out of tree.
> (that comment can be found at http://marc.info/?l=linux-mm&m=135757292605395&w=2)
> 
> I'd like to make the request again to consider export of these two symbols. 
> 
> 
> We at SGI have a need to address some very high physical address ranges with
> our GRU (global reference unit), sometimes across partitioned machine boundaries
> and sometimes with larger addresses than the cpu supports.
> We do this with the aid of our own 'extended vma' module which mimics the vma.
> When something (either unmap or exit) frees an 'extended vma' we use the mmu
> notifiers to clean them up.
> 
> We had been able to mimic the functions __mmu_notifier_invalidate_range_start()
> and __mmu_notifier_invalidate_range_end() by locking the per-mm lock and 
> walking the per-mm notifier list.  But with the change to a global srcu
> lock (static in mmu_notifier.c) we can no longer do that.  Our module has
> no access to that lock.
> 
> So we request that these two functions be exported.
> 
> ...
>
> +EXPORT_SYMBOL_GPL(__mmu_notifier_invalidate_range_start);
> +EXPORT_SYMBOL_GPL(__mmu_notifier_invalidate_range_end);

erk.  Having remote, modular, out-of-tree *sending* mmu notifications
is pretty abusive :(

I don't have a problem with the patch personally.  It's a GPL export
and it's only 2 lines and if we break it, you own both pieces ;)

But in a better world, the core kernel would support your machines
adequately and you wouldn't need to maintain that out-of-tree MM code. 
What are the prospects of this?

* Re: [PATCH] mm: export mmu notifier invalidates
  2013-02-12 21:57 ` Andrew Morton
@ 2013-02-13 15:03   ` Robin Holt
  2013-02-13 20:11     ` Andrew Morton
  0 siblings, 1 reply; 14+ messages in thread
From: Robin Holt @ 2013-02-13 15:03 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Cliff Wickman, linux-mm, aarcange, mgorman

On Tue, Feb 12, 2013 at 01:57:26PM -0800, Andrew Morton wrote:
> On Tue, 12 Feb 2013 15:35:34 -0600
> Cliff Wickman <cpw@sgi.com> wrote:
> 
> > 
> > Commenting on this patch ended with Andrea's post on 07Jan, which was
> > a more-or-less endorsement and a question about support for extended vma
> > abstractions in kernel modules out of tree.
> > (that comment can be found at http://marc.info/?l=linux-mm&m=135757292605395&w=2)
> > 
> > I'd like to make the request again to consider export of these two symbols. 
> > 
> > 
> > We at SGI have a need to address some very high physical address ranges with
> > our GRU (global reference unit), sometimes across partitioned machine boundaries
> > and sometimes with larger addresses than the cpu supports.
> > We do this with the aid of our own 'extended vma' module which mimics the vma.
> > When something (either unmap or exit) frees an 'extended vma' we use the mmu
> > notifiers to clean them up.
> > 
> > We had been able to mimic the functions __mmu_notifier_invalidate_range_start()
> > and __mmu_notifier_invalidate_range_end() by locking the per-mm lock and 
> > walking the per-mm notifier list.  But with the change to a global srcu
> > lock (static in mmu_notifier.c) we can no longer do that.  Our module has
> > no access to that lock.
> > 
> > So we request that these two functions be exported.
> > 
> > ...
> >
> > +EXPORT_SYMBOL_GPL(__mmu_notifier_invalidate_range_start);
> > +EXPORT_SYMBOL_GPL(__mmu_notifier_invalidate_range_end);
> 
> erk.  Having remote, modular, out-of-tree *sending* mmu notifications
> is pretty abusive :(
> 
> I don't have a problem with the patch personally.  It's a GPL export
> and it's only 2 lines and if we break it, you own both pieces ;)
> 
> But in a better world, the core kernel would support your machines
> adequately and you wouldn't need to maintain that out-of-tree MM code. 
> What are the prospects of this?

We can put it on our todo list.  Getting a user of this infrastructure
will require changes by Dimitri for the GRU driver (drivers/misc/sgi-gru).
He is currently focused on getting the design of some upcoming hardware
finalized and design changes tested in our simulation environment so he
will be consumed for the next several months.

If you would like, I can clean up the driver in my spare time and submit
it for review.  Would you consider allowing its inclusion without the
GRU driver as a user?

In the transition period, could we allow this change in and then remove
the exports as part of that driver being accepted?  That would help us
with an upcoming distro release.

Thanks,
Robin

* Re: [PATCH] mm: export mmu notifier invalidates
  2013-02-13 15:03   ` Robin Holt
@ 2013-02-13 20:11     ` Andrew Morton
  2013-02-13 21:03       ` Robin Holt
  0 siblings, 1 reply; 14+ messages in thread
From: Andrew Morton @ 2013-02-13 20:11 UTC (permalink / raw)
  To: Robin Holt; +Cc: Cliff Wickman, linux-mm, aarcange, mgorman

On Wed, 13 Feb 2013 09:03:40 -0600
Robin Holt <holt@sgi.com> wrote:

> > But in a better world, the core kernel would support your machines
> > adequately and you wouldn't need to maintain that out-of-tree MM code. 
> > What are the prospects of this?
> 
> We can put it on our todo list.  Getting a user of this infrastructure
> will require changes by Dimitri for the GRU driver (drivers/misc/sgi-gru).
> He is currently focused on getting the design of some upcoming hardware
> finalized and design changes tested in our simulation environment so he
> will be consumed for the next several months.
> 
> If you would like, I can clean up the driver in my spare time and submit
> it for review.  Would you consider allowing its inclusion without the
> GRU driver as a user?

From Cliff's description it sounded like that driver is
duplicating/augmenting core MM functions.  I was more wondering
whether core MM could be enhanced so that driver becomes obsolete?

> In the transition period, could we allow this change in and then remove
> the exports as part of that driver being accepted?  That would help us
> with an upcoming distro release.

I'm OK with this patch for 3.9-rc1.

* Re: [PATCH] mm: export mmu notifier invalidates
  2013-02-13 20:11     ` Andrew Morton
@ 2013-02-13 21:03       ` Robin Holt
  2013-02-14 21:08         ` Andrew Morton
  0 siblings, 1 reply; 14+ messages in thread
From: Robin Holt @ 2013-02-13 21:03 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Robin Holt, Cliff Wickman, linux-mm, aarcange, mgorman

On Wed, Feb 13, 2013 at 12:11:49PM -0800, Andrew Morton wrote:
> On Wed, 13 Feb 2013 09:03:40 -0600
> Robin Holt <holt@sgi.com> wrote:
> 
> > > But in a better world, the core kernel would support your machines
> > > adequately and you wouldn't need to maintain that out-of-tree MM code. 
> > > What are the prospects of this?
> > 
> > We can put it on our todo list.  Getting a user of this infrastructure
> > will require changes by Dimitri for the GRU driver (drivers/misc/sgi-gru).
> > He is currently focused on getting the design of some upcoming hardware
> > finalized and design changes tested in our simulation environment so he
> > will be consumed for the next several months.
> > 
> > If you would like, I can clean up the driver in my spare time and submit
> > it for review.  Would you consider allowing its inclusion without the
> > GRU driver as a user?
> 
> From Cliff's description it sounded like that driver is
> duplicating/augmenting core MM functions.  I was more wondering
> whether core MM could be enhanced so that driver becomes obsolete?

That would be fine with me.  The requirements on the driver are fairly
small and well known.  We separate virtual addresses above processor
addressable space into two "regions".  Memory from 1UL << 53 to 1UL <<
63 is considered one set of virtual addresses.  Memory above 1UL << 63
is considered "shared among a process group".

I will only mention in passing that we also have a driver which exposes
mega-size pages which the kernel has not been informed of by the EFI
memory map and xvma is used to allow the GRU to fault pages of a supported
page size (e.g. 64KB, 256KB, 512KB, 2MB, 8MB, ... 1TB).

The shared address has a couple of unusual features.  One task makes an ioctl
(happens to come via XPMEM) which creates a shared_xmm.  This is roughly
equivalent to an mm for a pthread app.  Once it is created, a shared_xmm
id is returned.  Other tasks then join that shared xmm.

At any time, any process can create shared mmap entries (again, currently
via XPMEM).  Again, this is like a pthread in that this new mapping is
now referenceable from all tasks at the same virtual address.

There are similar functions for removing the shared mapping.

The non-shared case is equivalent to a regular mm/vma, but beyond
processor addressable space.

SGI's MPI utilizes these address spaces for directly mapping portions
of the other tasks' address space.  This can include processes in other
portions of the machine beyond the processor's ability to physically
address.

The above, of course, is an oversimplification, but should give you an
idea of the big picture design goals.

Does any of this make sense?  Do you see areas where you think we should
extend regular mm functionality to include these functions?

How would you like me to proceed?

Thanks,
Robin
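
The region split described in the message above can be sketched as a small classifier. This is an illustration only: the enum, macro and function names are hypothetical, and two boundary choices are assumptions not spelled out in the message, namely that addresses below 1UL << 53 are treated as ordinary processor-addressable space and that 1UL << 63 itself belongs to the shared region.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; the real xvma module's identifiers are not shown here. */
enum xva_region {
	XVA_CPU,	/* assumed: ordinary processor-addressable space */
	XVA_PRIVATE,	/* 1UL << 53 up to (but not including) 1UL << 63 */
	XVA_SHARED	/* 1UL << 63 and above, shared among a process group */
};

#define XVA_PRIVATE_BASE (UINT64_C(1) << 53)
#define XVA_SHARED_BASE  (UINT64_C(1) << 63)

/* Map a single virtual address to the region it falls in. */
static enum xva_region xva_classify(uint64_t addr)
{
	if (addr >= XVA_SHARED_BASE)
		return XVA_SHARED;
	if (addr >= XVA_PRIVATE_BASE)
		return XVA_PRIVATE;
	return XVA_CPU;
}

/*
 * Classify a [start, start + size) range.  The range must be non-empty,
 * must not wrap around, and must not straddle a region boundary.
 */
static enum xva_region xva_classify_range(uint64_t start, uint64_t size)
{
	enum xva_region region = xva_classify(start);

	assert(size > 0 && start + (size - 1) >= start);
	assert(xva_classify(start + (size - 1)) == region);
	return region;
}
```

Using fixed-width uint64_t rather than unsigned long keeps the shifts by 53 and 63 well defined even on a platform where long is 32 bits wide.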

* Re: [PATCH] mm: export mmu notifier invalidates
  2013-02-13 21:03       ` Robin Holt
@ 2013-02-14 21:08         ` Andrew Morton
  2013-02-14 21:35           ` Robin Holt
  0 siblings, 1 reply; 14+ messages in thread
From: Andrew Morton @ 2013-02-14 21:08 UTC (permalink / raw)
  To: Robin Holt; +Cc: Cliff Wickman, linux-mm, aarcange, mgorman

On Wed, 13 Feb 2013 15:03:05 -0600
Robin Holt <holt@sgi.com> wrote:

> On Wed, Feb 13, 2013 at 12:11:49PM -0800, Andrew Morton wrote:
> > On Wed, 13 Feb 2013 09:03:40 -0600
> > Robin Holt <holt@sgi.com> wrote:
> > 
> > > > But in a better world, the core kernel would support your machines
> > > > adequately and you wouldn't need to maintain that out-of-tree MM code. 
> > > > What are the prospects of this?
> > > 
> > > We can put it on our todo list.  Getting a user of this infrastructure
> > > will require changes by Dimitri for the GRU driver (drivers/misc/sgi-gru).
> > > He is currently focused on getting the design of some upcoming hardware
> > > finalized and design changes tested in our simulation environment so he
> > > will be consumed for the next several months.
> > > 
> > > If you would like, I can clean up the driver in my spare time and submit
> > > it for review.  Would you consider allowing its inclusion without the
> > > GRU driver as a user?
> > 
> > From Cliff's description it sounded like that driver is
> > duplicating/augmenting core MM functions.  I was more wondering
> > whether core MM could be enhanced so that driver becomes obsolete?
> 
> That would be fine with me.  The requirements on the driver are fairly
> small and well known.  We separate virtual addresses above processor
> addressable space into two "regions".  Memory from 1UL << 53 to 1UL <<
> 63 is considered one set of virtual addresses.  Memory above 1UL << 63
> is considered "shared among a process group".
> 
> I will only mention in passing that we also have a driver which exposes
> mega-size pages which the kernel has not been informed of by the EFI
> memory map and xvma is used to allow the GRU to fault pages of a supported
> page size (e.g. 64KB, 256KB, 512KB, 2MB, 8MB, ... 1TB).
> 
> The shared address has a couple of unusual features.  One task makes an ioctl
> (happens to come via XPMEM) which creates a shared_xmm.  This is roughly
> equivalent to an mm for a pthread app.  Once it is created, a shared_xmm
> id is returned.  Other tasks then join that shared xmm.
> 
> At any time, any process can create shared mmap entries (again, currently
> via XPMEM).  Again, this is like a pthread in that this new mapping is
> now referenceable from all tasks at the same virtual address.
> 
> There are similar functions for removing the shared mapping.
> 
> The non-shared case is equivalent to a regular mm/vma, but beyond
> processor addressable space.
> 
> SGI's MPI utilizes these address spaces for directly mapping portions
> of the other tasks' address space.  This can include processes in other
> portions of the machine beyond the processor's ability to physically
> address.

What exactly is "SGI's MPI" from the kernel POV?  A separate
out-of-tree driver?

If the objective is to "directly map portions of the other tasks
address space" then how does this slicing-up of physical address
regions come into play?  If one wishes to map another mm's memory,
wouldn't you just go ahead and map it, regardless of physical address?

To what extent is all this specific to SGI hardware characteristics?

> The above, of course, is an oversimplification, but should give you an
> idea of the big picture design goals.
>
> Does any of this make sense?  Do you see areas where you think we should
> extend regular mm functionality to include these functions?
> 
> How would you like me to proceed?

I'm obviously on first base here, but overall approach:

- Is the top-level feature useful to general Linux users?  Perhaps
  after suitable generalisations (aka dumbing down :))

- Even if the answer to that is "no", should we maintain the feature
  in-tree rather than out-of-tree?

* Re: [PATCH] mm: export mmu notifier invalidates
  2013-02-14 21:08         ` Andrew Morton
@ 2013-02-14 21:35           ` Robin Holt
  2013-02-14 21:52             ` Andrew Morton
  0 siblings, 1 reply; 14+ messages in thread
From: Robin Holt @ 2013-02-14 21:35 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Robin Holt, Cliff Wickman, linux-mm, aarcange, mgorman

On Thu, Feb 14, 2013 at 01:08:56PM -0800, Andrew Morton wrote:
> On Wed, 13 Feb 2013 15:03:05 -0600
> Robin Holt <holt@sgi.com> wrote:
> 
> > On Wed, Feb 13, 2013 at 12:11:49PM -0800, Andrew Morton wrote:
> > > On Wed, 13 Feb 2013 09:03:40 -0600
> > > Robin Holt <holt@sgi.com> wrote:
> > > 
> > > > > But in a better world, the core kernel would support your machines
> > > > > adequately and you wouldn't need to maintain that out-of-tree MM code. 
> > > > > What are the prospects of this?
> > > > 
> > > > We can put it on our todo list.  Getting a user of this infrastructure
> > > > will require changes by Dimitri for the GRU driver (drivers/misc/sgi-gru).
> > > > He is currently focused on getting the design of some upcoming hardware
> > > > finalized and design changes tested in our simulation environment so he
> > > > will be consumed for the next several months.
> > > > 
> > > > If you would like, I can clean up the driver in my spare time and submit
> > > > it for review.  Would you consider allowing its inclusion without the
> > > > GRU driver as a user?
> > > 
> > > From Cliff's description it sounded like that driver is
> > > duplicating/augmenting core MM functions.  I was more wondering
> > > whether core MM could be enhanced so that driver becomes obsolete?
> > 
> > That would be fine with me.  The requirements on the driver are fairly
> > small and well known.  We separate virtual addresses above processor
> > addressable space into two "regions".  Memory from 1UL << 53 to 1UL <<
> > 63 is considered one set of virtual addresses.  Memory above 1UL << 63
> > is considered "shared among a process group".
> > 
> > I will only mention in passing that we also have a driver which exposes
> > mega-size pages which the kernel has not been informed of by the EFI
> > memory map and xvma is used to allow the GRU to fault pages of a supported
> > page size (e.g. 64KB, 256KB, 512KB, 2MB, 8MB, ... 1TB).
> > 
> > The shared address has a couple of unusual features.  One task makes an ioctl
> > (happens to come via XPMEM) which creates a shared_xmm.  This is roughly
> > equivalent to an mm for a pthread app.  Once it is created, a shared_xmm
> > id is returned.  Other tasks then join that shared xmm.
> > 
> > At any time, any process can create shared mmap entries (again, currently
> > via XPMEM).  Again, this is like a pthread in that this new mapping is
> > now referenceable from all tasks at the same virtual address.
> > 
> > There are similar functions for removing the shared mapping.
> > 
> > The non-shared case is equivalent to a regular mm/vma, but beyond
> > processor addressable space.
> > 
> > SGI's MPI utilizes these address spaces for directly mapping portions
> > of the other tasks' address space.  This can include processes in other
> > portions of the machine beyond the processor's ability to physically
> > address.
> 
> What exactly is "SGI's MPI" from the kernel POV?  A separate
> out-of-tree driver?

MPI (Message Passing Interface) is a standardized library interface for
building parallel jobs.  SGI has its own implementation.  Cray has a
similar implementation and, as I understand it, has leveraged an earlier
version of xpmem that I posted here a few years ago.  There are also Intel
MPI, HP MPI, IBM MPI, and many others.
They are all libraries that provide a means to rapidly communicate
between processing units.  IBM has, in the past, attempted to get changes
introduced for their direct communications between jobs.

SGI and Cray's implementations both do single copy between ranks without
going to kernel space.  Think of it as RDMA using IB, but the processor
does the work.

> If the objective is to "directly map portions of the other tasks
> address space" then how does this slicing-up of physical address
> regions come into play?  If one wishes to map another mm's memory,
> wouldn't you just go ahead and map it, regardless of physical address?

I am probably either not quite understanding your meaning here or not
explaining myself well enough, but the library does not control what portion
of the address space contains the data for use by the collective.
A library call is made and the collective does the work of signalling
to the other ranks where to find the data.  With XPMEM and the GRU's much
larger virtual addressing capabilities, we can have all of the other ranks'
virtual address spaces pre-mapped.

I am open to suggestions.  Can you suggest existing kernel functionality
that allows one task to map another virtual address space into their
va space to allow userland-to-userland copies without system calls?
If there is functionality that has been introduced in the last couple
years, I could very well have missed it as I have been fairly heads-down
on other things for some time.

> To what extent is all this specific to SGI hardware characteristics?

SGI's hardware allows two things: a vastly larger virtual address space
and the ability to access memory in other system images on the same NUMA
fabric which are beyond the processor's physical addressing capabilities.

I am fairly sure Cray has taken an older version of XPMEM and stripped
out a bunch of SGI specific bits and implemented it on their hardware.

> > The above, of course, is an oversimplification, but should give you an
> > idea of the big picture design goals.
> >
> > Does any of this make sense?  Do you see areas where you think we should
> > extend regular mm functionality to include these functions?
> > 
> > How would you like me to proceed?
> 
> I'm obviously on first base here, but overall approach:
> 
> - Is the top-level feature useful to general Linux users?  Perhaps
>   after suitable generalisations (aka dumbing down :))

I am not sure how useful it is.  I know IBM has tried in the past to
get a similar feature introduced.  I believe they settled on a ptrace
extension to do direct user-to-user copies from within the kernel.

> - Even if the answer to that is "no", should we maintain the feature
>   in-tree rather than out-of-tree?

Not sure on the second one, but I believe Linus' objection is security and
I can certainly understand that.  Right now, SGI's xpmem implementation
enforces that all jobs in the task need to have the same UID.  There is
no exception for root or an administrator.

Thanks,
Robin

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* Re: [PATCH] mm: export mmu notifier invalidates
  2013-02-14 21:35           ` Robin Holt
@ 2013-02-14 21:52             ` Andrew Morton
  2013-02-14 22:25               ` Robin Holt
  0 siblings, 1 reply; 14+ messages in thread
From: Andrew Morton @ 2013-02-14 21:52 UTC (permalink / raw)
  To: Robin Holt; +Cc: Cliff Wickman, linux-mm, aarcange, mgorman

On Thu, 14 Feb 2013 15:35:12 -0600
Robin Holt <holt@sgi.com> wrote:

> I am open to suggestions.  Can you suggest existing kernel functionality
> that allows one task to map another virtual address space into their
> va space to allow userland-to-userland copies without system calls?
> If there is functionality that has been introduced in the last couple
> years, I could very well have missed it as I have been fairly heads-down
> on other things for some time.

That's conceptually very similar to mm/process_vm_access.c. 
process_vm_readv/writev do kernel-based copying rather than a direct
mmap.
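
As a rough illustration of that interface (a minimal sketch, not code from
this thread): process_vm_readv() copies bytes out of another process's
address space through the kernel, so every transfer costs a system call.
The helper name copy_from_process() is hypothetical.  The call needs
Linux 3.2 or later and is subject to the same permission check as a
ptrace attach.

```c
#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy len bytes starting at remote_addr in the
 * address space of process pid into local_buf in the caller.
 * Returns the number of bytes copied, or -1 with errno set.
 */
static ssize_t copy_from_process(pid_t pid, void *local_buf,
				 void *remote_addr, size_t len)
{
	/* Destination buffer in the calling process. */
	struct iovec local = { .iov_base = local_buf, .iov_len = len };
	/* Source range, interpreted in the target's address space. */
	struct iovec remote = { .iov_base = remote_addr, .iov_len = len };

	/* One local and one remote segment; flags must currently be 0. */
	return process_vm_readv(pid, &local, 1, &remote, 1, 0);
}
```

A caller would pass the peer's pid plus an address that the peer has
communicated out of band, which is the same bookkeeping an MPI library
already does when it tells the other ranks where to find the data.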

> > To what extent is all this specific to SGI hardware characteristics?
> 
> SGI's hardware allows two things: a vastly larger virtual address space
> and the ability to access memory in other system images on the same NUMA
> fabric which are beyond the processor's physical addressing capabilities.
> 
> I am fairly sure Cray has taken an older version of XPMEM and stripped
> out a bunch of SGI specific bits and implemented it on their hardware.
> 
> > > The above, of course, is an oversimplification, but should give you an
> > > idea of the big picture design goals.
> > >
> > > Does any of this make sense?  Do you see areas where you think we should
> > > extend regular mm functionality to include these functions?
> > > 
> > > How would you like me to proceed?
> > 
> > I'm obviously on first base here, but overall approach:
> > 
> > - Is the top-level feature useful to general Linux users?  Perhaps
> >   after suitable generalisations (aka dumbing down :))
> 
> I am not sure how useful it is.  I know IBM has tried in the past to
> get a similar feature introduced.  I believe they settled on a ptrace
> extension to do direct user-to-user copies from within the kernel.

process_vm_readv/writev is from Christopher Yeoh@IBM.

> > - Even if the answer to that is "no", should we maintain the feature
> >   in-tree rather than out-of-tree?
> 
> Not sure on the second one, but I believe Linus' objection is security and
> I can certainly understand that.  Right now, SGI's xpmem implementation
> enforces that all jobs in the task need to have the same UID.  There is
> no exception for root or an administrator.

I'd have thought that the security processing of a direct map would be
identical to that in process_vm_readv/writev?

If we were to add a general map-this-into-that facility which is
available to and runs adequately on our typical machines, I assume your
systems would need some SGI-specific augmentation?



* Re: [PATCH] mm: export mmu notifier invalidates
  2013-02-14 21:52             ` Andrew Morton
@ 2013-02-14 22:25               ` Robin Holt
  0 siblings, 0 replies; 14+ messages in thread
From: Robin Holt @ 2013-02-14 22:25 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Robin Holt, Cliff Wickman, linux-mm, aarcange, mgorman

On Thu, Feb 14, 2013 at 01:52:34PM -0800, Andrew Morton wrote:
> On Thu, 14 Feb 2013 15:35:12 -0600
> Robin Holt <holt@sgi.com> wrote:
> 
> > I am open to suggestions.  Can you suggest existing kernel functionality
> > that allows one task to map another virtual address space into their
> > va space to allow userland-to-userland copies without system calls?
> > If there is functionality that has been introduced in the last couple
> > years, I could very well have missed it as I have been fairly heads-down
> > on other things for some time.
> 
> That's conceptually very similar to mm/process_vm_access.c. 
> process_vm_readv/writev do kernel-based copying rather than a direct
> mmap.

I will go look at those now.  I am not familiar with them as they went
in during my "dark period" where I was working on system controller
functionality and not paying attention to kernel activity.

> 
> > > To what extent is all this specific to SGI hardware characteristics?
> > 
> > SGI's hardware allows two things: a vastly larger virtual address space
> > and the ability to access memory in other system images on the same NUMA
> > fabric which are beyond the processor's physical addressing capabilities.
> > 
> > I am fairly sure Cray has taken an older version of XPMEM and stripped
> > out a bunch of SGI specific bits and implemented it on their hardware.
> > 
> > > > The above, of course, is an oversimplification, but should give you an
> > > > idea of the big picture design goals.
> > > >
> > > > Does any of this make sense?  Do you see areas where you think we should
> > > > extend regular mm functionality to include these functions?
> > > > 
> > > > How would you like me to proceed?
> > > 
> > > I'm obviously on first base here, but overall approach:
> > > 
> > > - Is the top-level feature useful to general Linux users?  Perhaps
> > >   after suitable generalisations (aka dumbing down :))
> > 
> > I am not sure how useful it is.  I know IBM has tried in the past to
> > get a similar feature introduced.  I believe they settled on a ptrace
> > extension to do direct user-to-user copies from within the kernel.
> 
> process_vm_readv/writev is from Christopher Yeoh@IBM.
> 
> > > - Even if the answer to that is "no", should we maintain the feature
> > >   in-tree rather than out-of-tree?
> > 
> > Not sure on the second one, but I believe Linus' objection is security and
> > I can certainly understand that.  Right now, SGI's xpmem implementation
> > enforces that all jobs in the task need to have the same UID.  There is
> > no exception for root or an administrator.
> 
> I'd have thought that the security processing of a direct map would be
> identical to that in process_vm_readv/writev?
> 
> If we were to add a general map-this-into-that facility which is
> available to and runs adequately on our typical machines, I assume your
> systems would need some SGI-specific augmentation?

Yes, for the extended virtual and physical address space and for the
weird page sizes.
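
To make the extended virtual address split concrete, here is a minimal
sketch of the layout described earlier in the thread: addresses from
1UL << 53 up to 1UL << 63 form one set, and addresses above 1UL << 63 are
shared among a process group.  The enum and function names are hypothetical
and are not taken from the xvma driver.  Treating everything below
1UL << 53 as ordinary processor-addressable space, and treating 1UL << 63
itself as the first shared address, are both assumptions.

```c
#include <stdint.h>

/* Hypothetical region labels for a 64-bit extended virtual address. */
enum xva_region {
	XVA_CPU_ADDRESSABLE,	/* assumed: below 1UL << 53, ordinary mm/vma */
	XVA_PRIVATE,		/* 1UL << 53 up to 1UL << 63, per-process */
	XVA_SHARED,		/* 1UL << 63 and up, shared by a process group */
};

#define XVA_PRIVATE_BASE	(UINT64_C(1) << 53)
#define XVA_SHARED_BASE		(UINT64_C(1) << 63)

/* Classify an address by comparing it against the two region bases. */
static enum xva_region xva_classify(uint64_t va)
{
	if (va >= XVA_SHARED_BASE)
		return XVA_SHARED;
	if (va >= XVA_PRIVATE_BASE)
		return XVA_PRIVATE;
	return XVA_CPU_ADDRESSABLE;
}
```

A generic map-this-into-that facility would only ever see the first
region; the other two are where the SGI-specific augmentation would come in.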

Thanks,
Robin



Thread overview: 14+ messages
2013-01-04 15:41 [PATCH] mm: export mmu notifier invalidates Cliff Wickman
2013-01-04 21:35 ` Christoph Hellwig
2013-01-04 22:09   ` Cliff Wickman
2013-01-07 14:14 ` Mel Gorman
2013-01-07 15:35   ` Andrea Arcangeli
  -- strict thread matches above, loose matches on Subject: below --
2013-02-12 21:35 Cliff Wickman
2013-02-12 21:57 ` Andrew Morton
2013-02-13 15:03   ` Robin Holt
2013-02-13 20:11     ` Andrew Morton
2013-02-13 21:03       ` Robin Holt
2013-02-14 21:08         ` Andrew Morton
2013-02-14 21:35           ` Robin Holt
2013-02-14 21:52             ` Andrew Morton
2013-02-14 22:25               ` Robin Holt
