* [PATCH] mm: rearrange exit_mmap() to unlock before arch_exit_mmap
@ 2009-02-09 17:29 Lee Schermerhorn
2009-02-10 21:31 ` Andrew Morton
0 siblings, 1 reply; 3+ messages in thread
From: Lee Schermerhorn @ 2009-02-09 17:29 UTC (permalink / raw)
To: linux-kernel, stable, Andrew Morton
Cc: Jeremy Fitzhardinge, Keir Fraser, Christophe Saout,
Alex Williamson, Nick Piggin
From: Jeremy Fitzhardinge <jeremy@goop.org>
Subject: mm: rearrange exit_mmap() to unlock before arch_exit_mmap
Applicable to 29-rc4 and 28-stable
Christophe Saout reported [in precursor to:
http://marc.info/?l=linux-kernel&m=123209902707347&w=4]:
> Note that I also some a different issue with CONFIG_UNEVICTABLE_LRU.
> Seems like Xen tears down current->mm early on process termination, so
> that __get_user_pages in exit_mmap causes nasty messages when the
> process had any mlocked pages. (in fact, it somehow manages to get into
> the swapping code and produces a null pointer dereference trying to get
> a swap token)
Jeremy explained:
Yes. In the normal case under Xen, an in-use pagetable is "pinned",
meaning that it is RO to the kernel, and all updates must go via
hypercall (or writes are trapped and emulated, which is much the same
thing). An unpinned pagetable is not currently in use by any process,
and can be directly accessed as normal RW pages.
As an optimisation at process exit time, we unpin the pagetable as early
as possible (switching the process to init_mm), so that all the normal
pagetable teardown can happen with direct memory accesses.
This happens in exit_mmap() -> arch_exit_mmap(). The munlocking happens
a few lines below. The obvious thing to do would be to move
arch_exit_mmap() to below the munlock code, but I think we'd want to
call it even if mm->mmap is NULL, just to be on the safe side.
Thus, this patch:
exit_mmap() needs to unlock any locked vmas before calling
arch_exit_mmap, as the latter may switch the current mm to init_mm,
which would cause the former to fail.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
---
mm/mmap.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
===================================================================
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2078,12 +2078,8 @@
unsigned long end;
/* mm's last user has gone, and its about to be pulled down */
- arch_exit_mmap(mm);
mmu_notifier_release(mm);
- if (!mm->mmap) /* Can happen if dup_mmap() received an OOM */
- return;
-
if (mm->locked_vm) {
vma = mm->mmap;
while (vma) {
@@ -2092,7 +2088,13 @@
vma = vma->vm_next;
}
}
+
+ arch_exit_mmap(mm);
+
vma = mm->mmap;
+ if (!vma) /* Can happen if dup_mmap() received an OOM */
+ return;
+
lru_add_drain();
flush_cache_mm(mm);
tlb = tlb_gather_mmu(mm, 1);
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH] mm: rearrange exit_mmap() to unlock before arch_exit_mmap
2009-02-09 17:29 [PATCH] mm: rearrange exit_mmap() to unlock before arch_exit_mmap Lee Schermerhorn
@ 2009-02-10 21:31 ` Andrew Morton
2009-02-10 21:45 ` Lee Schermerhorn
0 siblings, 1 reply; 3+ messages in thread
From: Andrew Morton @ 2009-02-10 21:31 UTC (permalink / raw)
To: Lee Schermerhorn
Cc: linux-kernel, stable, jeremy, keir.fraser, christophe,
alex.williamson, npiggin
On Mon, 09 Feb 2009 12:29:48 -0500
Lee Schermerhorn <Lee.Schermerhorn@hp.com> wrote:
> From: Jeremy Fitzhardinge <jeremy@goop.org>
>
> Subject: mm: rearrange exit_mmap() to unlock before arch_exit_mmap
>
> Applicable to 29-rc4 and 28-stable
>
> Christophe Saout reported [in precursor to:
> http://marc.info/?l=linux-kernel&m=123209902707347&w=4]:
>
> > Note that I also some a different issue with CONFIG_UNEVICTABLE_LRU.
> > Seems like Xen tears down current->mm early on process termination, so
> > that __get_user_pages in exit_mmap causes nasty messages when the
> > process had any mlocked pages. (in fact, it somehow manages to get into
> > the swapping code and produces a null pointer dereference trying to get
> > a swap token)
>
> Jeremy explained:
>
> Yes. In the normal case under Xen, an in-use pagetable is "pinned",
> meaning that it is RO to the kernel, and all updates must go via
> hypercall (or writes are trapped and emulated, which is much the same
> thing). An unpinned pagetable is not currently in use by any process,
> and can be directly accessed as normal RW pages.
>
> As an optimisation at process exit time, we unpin the pagetable as early
> as possible (switching the process to init_mm), so that all the normal
> pagetable teardown can happen with direct memory accesses.
>
> This happens in exit_mmap() -> arch_exit_mmap(). The munlocking happens
> a few lines below. The obvious thing to do would be to move
> arch_exit_mmap() to below the munlock code, but I think we'd want to
> call it even if mm->mmap is NULL, just to be on the safe side.
>
> Thus, this patch:
>
> exit_mmap() needs to unlock any locked vmas before calling
> arch_exit_mmap, as the latter may switch the current mm to init_mm,
> which would cause the former to fail.
>
> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
> Acked-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
>
> ---
> mm/mmap.c | 10 ++++++----
> 1 file changed, 6 insertions(+), 4 deletions(-)
>
> ===================================================================
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -2078,12 +2078,8 @@
> unsigned long end;
>
> /* mm's last user has gone, and its about to be pulled down */
> - arch_exit_mmap(mm);
> mmu_notifier_release(mm);
>
> - if (!mm->mmap) /* Can happen if dup_mmap() received an OOM */
> - return;
> -
> if (mm->locked_vm) {
> vma = mm->mmap;
> while (vma) {
> @@ -2092,7 +2088,13 @@
> vma = vma->vm_next;
> }
> }
> +
> + arch_exit_mmap(mm);
> +
> vma = mm->mmap;
> + if (!vma) /* Can happen if dup_mmap() received an OOM */
> + return;
> +
> lru_add_drain();
> flush_cache_mm(mm);
> tlb = tlb_gather_mmu(mm, 1);
The patch as it stands doesn't apply cleanly to 2.6.28. I didn't look
into what needs to be done to fix it up. Presumably the stable beavers
would like a fixed-up and tested version for backporting sometime.
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH] mm: rearrange exit_mmap() to unlock before arch_exit_mmap
2009-02-10 21:31 ` Andrew Morton
@ 2009-02-10 21:45 ` Lee Schermerhorn
0 siblings, 0 replies; 3+ messages in thread
From: Lee Schermerhorn @ 2009-02-10 21:45 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-kernel, stable, jeremy, keir.fraser, christophe,
alex.williamson, npiggin
On Tue, 2009-02-10 at 13:31 -0800, Andrew Morton wrote:
> On Mon, 09 Feb 2009 12:29:48 -0500
> Lee Schermerhorn <Lee.Schermerhorn@hp.com> wrote:
>
> > From: Jeremy Fitzhardinge <jeremy@goop.org>
> >
> > Subject: mm: rearrange exit_mmap() to unlock before arch_exit_mmap
> >
> > Applicable to 29-rc4 and 28-stable
> >
> > Christophe Saout reported [in precursor to:
> > http://marc.info/?l=linux-kernel&m=123209902707347&w=4]:
> >
> > > Note that I also some a different issue with CONFIG_UNEVICTABLE_LRU.
> > > Seems like Xen tears down current->mm early on process termination, so
> > > that __get_user_pages in exit_mmap causes nasty messages when the
> > > process had any mlocked pages. (in fact, it somehow manages to get into
> > > the swapping code and produces a null pointer dereference trying to get
> > > a swap token)
> >
> > Jeremy explained:
> >
> > Yes. In the normal case under Xen, an in-use pagetable is "pinned",
> > meaning that it is RO to the kernel, and all updates must go via
> > hypercall (or writes are trapped and emulated, which is much the same
> > thing). An unpinned pagetable is not currently in use by any process,
> > and can be directly accessed as normal RW pages.
> >
> > As an optimisation at process exit time, we unpin the pagetable as early
> > as possible (switching the process to init_mm), so that all the normal
> > pagetable teardown can happen with direct memory accesses.
> >
> > This happens in exit_mmap() -> arch_exit_mmap(). The munlocking happens
> > a few lines below. The obvious thing to do would be to move
> > arch_exit_mmap() to below the munlock code, but I think we'd want to
> > call it even if mm->mmap is NULL, just to be on the safe side.
> >
> > Thus, this patch:
> >
> > exit_mmap() needs to unlock any locked vmas before calling
> > arch_exit_mmap, as the latter may switch the current mm to init_mm,
> > which would cause the former to fail.
> >
> > Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
> > Acked-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
> >
> > ---
> > mm/mmap.c | 10 ++++++----
> > 1 file changed, 6 insertions(+), 4 deletions(-)
> >
> > ===================================================================
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c
> > @@ -2078,12 +2078,8 @@
> > unsigned long end;
> >
> > /* mm's last user has gone, and its about to be pulled down */
> > - arch_exit_mmap(mm);
> > mmu_notifier_release(mm);
> >
> > - if (!mm->mmap) /* Can happen if dup_mmap() received an OOM */
> > - return;
> > -
> > if (mm->locked_vm) {
> > vma = mm->mmap;
> > while (vma) {
> > @@ -2092,7 +2088,13 @@
> > vma = vma->vm_next;
> > }
> > }
> > +
> > + arch_exit_mmap(mm);
> > +
> > vma = mm->mmap;
> > + if (!vma) /* Can happen if dup_mmap() received an OOM */
> > + return;
> > +
> > lru_add_drain();
> > flush_cache_mm(mm);
> > tlb = tlb_gather_mmu(mm, 1);
>
> The patch as it stands doesn't apply cleanly to 2.6.28. I didn't look
> into what needs to be done to fix it up. Presumably the stable beavers
> would like a fixed-up and tested version for backporting sometime.
The only difference I can see in this area, looking at a 28.2 tree I
have, the 28.4 patch on kernel.org and the context in the patch, is that
"flush_cache_mm(mm);" line.
Lee
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2009-02-10 21:45 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-02-09 17:29 [PATCH] mm: rearrange exit_mmap() to unlock before arch_exit_mmap Lee Schermerhorn
2009-02-10 21:31 ` Andrew Morton
2009-02-10 21:45 ` Lee Schermerhorn
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox