From: Ingo Molnar <mingo@kernel.org>
To: Davidlohr Bueso <davidlohr@hp.com>, Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
Michel Lespinasse <walken@google.com>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
Rik van Riel <riel@redhat.com>,
Tim Chen <tim.c.chen@linux.intel.com>,
aswin@hp.com, linux-mm <linux-mm@kvack.org>,
linux-kernel@vger.kernel.org,
Russell King <linux@arm.linux.org.uk>,
Catalin Marinas <catalin.marinas@arm.com>,
Will Deacon <will.deacon@arm.com>,
Richard Kuo <rkuo@codeaurora.org>,
Ralf Baechle <ralf@linux-mips.org>,
Benjamin Herrenschmidt <benh@kernel.crashing.org>,
Paul Mackerras <paulus@samba.org>,
Martin Schwidefsky <schwidefsky@de.ibm.com>,
Heiko Carstens <heiko.carstens@de.ibm.com>,
Paul Mundt <lethal@linux-sh.org>,
Chris Metcalf <cmetcalf@tilera.com>,
Jeff Dike <jdike@addtoit.com>,
Richard Weinberger <richard@nod.at>,
Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>
Subject: [PATCH 4/3] x86/vdso: Optimize setup_additional_pages()
Date: Fri, 18 Oct 2013 08:05:01 +0200 [thread overview]
Message-ID: <20131018060501.GA3411@gmail.com> (raw)
In-Reply-To: <1382057438-3306-4-git-send-email-davidlohr@hp.com>
* Davidlohr Bueso <davidlohr@hp.com> wrote:
> --- a/arch/x86/vdso/vma.c
> +++ b/arch/x86/vdso/vma.c
> @@ -154,12 +154,17 @@ static int setup_additional_pages(struct linux_binprm *bprm,
> unsigned size)
> {
> struct mm_struct *mm = current->mm;
> + struct vm_area_struct *vma;
> unsigned long addr;
> int ret;
>
> if (!vdso_enabled)
> return 0;
>
> + vma = kmem_cache_zalloc(vm_area_cachep, GFP_KERNEL);
> + if (unlikely(!vma))
> + return -ENOMEM;
> +
> down_write(&mm->mmap_sem);
> addr = vdso_addr(mm->start_stack, size);
> addr = get_unmapped_area(NULL, addr, size, 0, 0);
> @@ -173,14 +178,17 @@ static int setup_additional_pages(struct linux_binprm *bprm,
> ret = install_special_mapping(mm, addr, size,
> VM_READ|VM_EXEC|
> VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC,
> - pages);
> + pages, &vma);
> if (ret) {
> current->mm->context.vdso = NULL;
> goto up_fail;
> }
>
> + up_write(&mm->mmap_sem);
> + return ret;
> up_fail:
> up_write(&mm->mmap_sem);
> + kmem_cache_free(vm_area_cachep, vma);
> return ret;
> }
>
1)
Beyond the simplification that Linus suggested, why not introduce a new
function, named 'install_special_mapping_vma()' or so, and convert
architectures one by one, without pressure to get it all done (and all
correct) in a single patch?
2)
I don't see the justification: this code gets executed in exec() where a
new mm has just been allocated. There's only a single user of the mm and
thus the critical section width of mmap_sem is more or less irrelevant.
mmap_sem critical section size only matters for codepaths that threaded
programs can hit.
3)
But, if we do all that, a couple of other (micro-)optimizations are
possible in setup_additional_pages() as well:
- vdso_addr(), which is actually much _more_ expensive than kmalloc()
because on most distros it will call into the RNG, can also be done
outside the mmap_sem.
- the error paths can all be merged and the common case can be made
fall-through.
- use 'mm' consistently instead of repeating 'current->mm'
- set 'mm->context.vdso' only once we know it's all a success, and do it
outside the lock
- add a few comments about which operations are locked, which unlocked,
and why. Please double check the assumptions I documented there.
See the diff attached below. (Totally untested and all that.)
Also note that I think, in theory, if exec() guaranteed the privacy and
single threadedness of the new mm, we could probably do _all_ of this
unlocked. Right now I don't think this is guaranteed: ptrace() users might
look up the new PID and might interfere on the MM via
PTRACE_PEEK*/PTRACE_POKE*.
( Furthermore, /proc/ might also give early access to aspects of the mm -
although no manipulation of the mm is possible there. )
If such privacy of the new mm was guaranteed then that would also remove
the need to move the allocation out of install_special_mapping().
But, I don't think it all matters, due to #2 - and your changes actively
complicate setup_pages(), which makes this security sensitive code a bit
more fragile. We can still do it out of sheer principle, I just don't see
where it's supposed to help scale better.
Thanks,
Ingo
arch/x86/vdso/vma.c | 40 +++++++++++++++++++++++++++-------------
1 file changed, 27 insertions(+), 13 deletions(-)
diff --git a/arch/x86/vdso/vma.c b/arch/x86/vdso/vma.c
index 431e875..c590387 100644
--- a/arch/x86/vdso/vma.c
+++ b/arch/x86/vdso/vma.c
@@ -157,30 +157,44 @@ static int setup_additional_pages(struct linux_binprm *bprm,
unsigned long addr;
int ret;
- if (!vdso_enabled)
+ if (unlikely(!vdso_enabled))
return 0;
- down_write(&mm->mmap_sem);
+ /*
+ * Do this outside the MM lock - we are in exec() with a new MM,
+ * nobody else can use these fields of the mm:
+ */
addr = vdso_addr(mm->start_stack, size);
- addr = get_unmapped_area(NULL, addr, size, 0, 0);
- if (IS_ERR_VALUE(addr)) {
- ret = addr;
- goto up_fail;
- }
- current->mm->context.vdso = (void *)addr;
+ /*
+ * This must be done under the MM lock - there might be parallel
+ * accesses to this mm, such as ptrace().
+ *
+ * [ This could be further optimized if exec() reliably inhibited
+ * all early access to the mm. ]
+ */
+ down_write(&mm->mmap_sem);
+ addr = get_unmapped_area(NULL, addr, size, 0, 0);
+ if (IS_ERR_VALUE(addr))
+ goto up_fail_addr;
ret = install_special_mapping(mm, addr, size,
VM_READ|VM_EXEC|
VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC,
pages);
- if (ret) {
- current->mm->context.vdso = NULL;
- goto up_fail;
- }
+ up_write(&mm->mmap_sem);
+ if (ret)
+ goto fail;
-up_fail:
+ mm->context.vdso = (void *)addr;
+ return ret;
+
+up_fail_addr:
+ ret = addr;
up_write(&mm->mmap_sem);
+fail:
+ mm->context.vdso = NULL;
+
return ret;
}
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
WARNING: multiple messages have this Message-ID (diff)
From: Ingo Molnar <mingo@kernel.org>
To: Davidlohr Bueso <davidlohr@hp.com>, Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
Michel Lespinasse <walken@google.com>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
Rik van Riel <riel@redhat.com>,
Tim Chen <tim.c.chen@linux.intel.com>,
aswin@hp.com, linux-mm <linux-mm@kvack.org>,
linux-kernel@vger.kernel.org,
Russell King <linux@arm.linux.org.uk>,
Catalin Marinas <catalin.marinas@arm.com>,
Will Deacon <will.deacon@arm.com>,
Richard Kuo <rkuo@codeaurora.org>,
Ralf Baechle <ralf@linux-mips.org>,
Benjamin Herrenschmidt <benh@kernel.crashing.org>,
Paul Mackerras <paulus@samba.org>,
Martin Schwidefsky <schwidefsky@de.ibm.com>,
Heiko Carstens <heiko.carstens@de.ibm.com>,
Paul Mundt <lethal@linux-sh.org>,
Chris Metcalf <cmetcalf@tilera.com>,
Jeff Dike <jdike@addtoit.com>,
Richard Weinberger <richard@nod.at>,
Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>
Subject: [PATCH 4/3] x86/vdso: Optimize setup_additional_pages()
Date: Fri, 18 Oct 2013 08:05:01 +0200 [thread overview]
Message-ID: <20131018060501.GA3411@gmail.com> (raw)
In-Reply-To: <1382057438-3306-4-git-send-email-davidlohr@hp.com>
* Davidlohr Bueso <davidlohr@hp.com> wrote:
> --- a/arch/x86/vdso/vma.c
> +++ b/arch/x86/vdso/vma.c
> @@ -154,12 +154,17 @@ static int setup_additional_pages(struct linux_binprm *bprm,
> unsigned size)
> {
> struct mm_struct *mm = current->mm;
> + struct vm_area_struct *vma;
> unsigned long addr;
> int ret;
>
> if (!vdso_enabled)
> return 0;
>
> + vma = kmem_cache_zalloc(vm_area_cachep, GFP_KERNEL);
> + if (unlikely(!vma))
> + return -ENOMEM;
> +
> down_write(&mm->mmap_sem);
> addr = vdso_addr(mm->start_stack, size);
> addr = get_unmapped_area(NULL, addr, size, 0, 0);
> @@ -173,14 +178,17 @@ static int setup_additional_pages(struct linux_binprm *bprm,
> ret = install_special_mapping(mm, addr, size,
> VM_READ|VM_EXEC|
> VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC,
> - pages);
> + pages, &vma);
> if (ret) {
> current->mm->context.vdso = NULL;
> goto up_fail;
> }
>
> + up_write(&mm->mmap_sem);
> + return ret;
> up_fail:
> up_write(&mm->mmap_sem);
> + kmem_cache_free(vm_area_cachep, vma);
> return ret;
> }
>
1)
Beyond the simplification that Linus suggested, why not introduce a new
function, named 'install_special_mapping_vma()' or so, and convert
architectures one by one, without pressure to get it all done (and all
correct) in a single patch?
2)
I don't see the justification: this code gets executed in exec() where a
new mm has just been allocated. There's only a single user of the mm and
thus the critical section width of mmap_sem is more or less irrelevant.
mmap_sem critical section size only matters for codepaths that threaded
programs can hit.
3)
But, if we do all that, a couple of other (micro-)optimizations are
possible in setup_additional_pages() as well:
- vdso_addr(), which is actually much _more_ expensive than kmalloc()
because on most distros it will call into the RNG, can also be done
outside the mmap_sem.
- the error paths can all be merged and the common case can be made
fall-through.
- use 'mm' consistently instead of repeating 'current->mm'
- set 'mm->context.vdso' only once we know it's all a success, and do it
outside the lock
- add a few comments about which operations are locked, which unlocked,
and why. Please double check the assumptions I documented there.
See the diff attached below. (Totally untested and all that.)
Also note that I think, in theory, if exec() guaranteed the privacy and
single threadedness of the new mm, we could probably do _all_ of this
unlocked. Right now I don't think this is guaranteed: ptrace() users might
look up the new PID and might interfere on the MM via
PTRACE_PEEK*/PTRACE_POKE*.
( Furthermore, /proc/ might also give early access to aspects of the mm -
although no manipulation of the mm is possible there. )
If such privacy of the new mm was guaranteed then that would also remove
the need to move the allocation out of install_special_mapping().
But, I don't think it all matters, due to #2 - and your changes actively
complicate setup_pages(), which makes this security sensitive code a bit
more fragile. We can still do it out of sheer principle, I just don't see
where it's supposed to help scale better.
Thanks,
Ingo
arch/x86/vdso/vma.c | 40 +++++++++++++++++++++++++++-------------
1 file changed, 27 insertions(+), 13 deletions(-)
diff --git a/arch/x86/vdso/vma.c b/arch/x86/vdso/vma.c
index 431e875..c590387 100644
--- a/arch/x86/vdso/vma.c
+++ b/arch/x86/vdso/vma.c
@@ -157,30 +157,44 @@ static int setup_additional_pages(struct linux_binprm *bprm,
unsigned long addr;
int ret;
- if (!vdso_enabled)
+ if (unlikely(!vdso_enabled))
return 0;
- down_write(&mm->mmap_sem);
+ /*
+ * Do this outside the MM lock - we are in exec() with a new MM,
+ * nobody else can use these fields of the mm:
+ */
addr = vdso_addr(mm->start_stack, size);
- addr = get_unmapped_area(NULL, addr, size, 0, 0);
- if (IS_ERR_VALUE(addr)) {
- ret = addr;
- goto up_fail;
- }
- current->mm->context.vdso = (void *)addr;
+ /*
+ * This must be done under the MM lock - there might be parallel
+ * accesses to this mm, such as ptrace().
+ *
+ * [ This could be further optimized if exec() reliably inhibited
+ * all early access to the mm. ]
+ */
+ down_write(&mm->mmap_sem);
+ addr = get_unmapped_area(NULL, addr, size, 0, 0);
+ if (IS_ERR_VALUE(addr))
+ goto up_fail_addr;
ret = install_special_mapping(mm, addr, size,
VM_READ|VM_EXEC|
VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC,
pages);
- if (ret) {
- current->mm->context.vdso = NULL;
- goto up_fail;
- }
+ up_write(&mm->mmap_sem);
+ if (ret)
+ goto fail;
-up_fail:
+ mm->context.vdso = (void *)addr;
+ return ret;
+
+up_fail_addr:
+ ret = addr;
up_write(&mm->mmap_sem);
+fail:
+ mm->context.vdso = NULL;
+
return ret;
}
next prev parent reply other threads:[~2013-10-18 6:05 UTC|newest]
Thread overview: 46+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-10-18 0:50 [PATCH 0/3] mm,vdso: preallocate new vmas Davidlohr Bueso
2013-10-18 0:50 ` Davidlohr Bueso
2013-10-18 0:50 ` [PATCH 1/3] mm: add mlock_future_check helper Davidlohr Bueso
2013-10-18 0:50 ` Davidlohr Bueso
2013-10-23 9:30 ` walken
2013-10-23 9:30 ` walken
2013-10-18 0:50 ` [PATCH 2/3] mm/mlock: prepare params outside critical region Davidlohr Bueso
2013-10-18 0:50 ` Davidlohr Bueso
2013-10-23 9:33 ` walken
2013-10-23 9:33 ` walken
2013-10-23 9:46 ` Vlastimil Babka
2013-10-23 9:46 ` Vlastimil Babka
2013-10-18 0:50 ` [PATCH 3/3] vdso: preallocate new vmas Davidlohr Bueso
2013-10-18 0:50 ` Davidlohr Bueso
2013-10-18 1:17 ` Linus Torvalds
2013-10-18 1:17 ` Linus Torvalds
2013-10-18 5:59 ` Richard Weinberger
2013-10-18 5:59 ` Richard Weinberger
2013-10-18 6:05 ` Ingo Molnar [this message]
2013-10-18 6:05 ` [PATCH 4/3] x86/vdso: Optimize setup_additional_pages() Ingo Molnar
2013-10-21 3:52 ` Davidlohr Bueso
2013-10-21 3:52 ` Davidlohr Bueso
2013-10-21 5:27 ` Ingo Molnar
2013-10-21 5:27 ` Ingo Molnar
2013-10-21 3:26 ` [PATCH 3/3] vdso: preallocate new vmas Davidlohr Bueso
2013-10-21 3:26 ` Davidlohr Bueso
2013-10-23 9:53 ` walken
2013-10-23 9:53 ` walken
2013-10-25 0:55 ` Davidlohr Bueso
2013-10-25 0:55 ` Davidlohr Bueso
2013-10-22 15:48 ` [PATCH 0/3] mm,vdso: " walken
2013-10-22 15:48 ` walken
2013-10-22 16:20 ` Linus Torvalds
2013-10-22 16:20 ` Linus Torvalds
2013-10-22 17:04 ` Michel Lespinasse
2013-10-22 17:04 ` Michel Lespinasse
2013-10-22 17:54 ` Andy Lutomirski
2013-10-22 17:54 ` Andy Lutomirski
2013-10-23 10:13 ` Michel Lespinasse
2013-10-23 10:13 ` Michel Lespinasse
2013-10-23 21:42 ` Andy Lutomirski
2013-10-23 21:42 ` Andy Lutomirski
2013-10-23 2:46 ` Davidlohr Bueso
2013-10-23 2:46 ` Davidlohr Bueso
2013-11-05 0:39 ` Davidlohr Bueso
2013-11-05 0:39 ` Davidlohr Bueso
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20131018060501.GA3411@gmail.com \
--to=mingo@kernel.org \
--cc=a.p.zijlstra@chello.nl \
--cc=akpm@linux-foundation.org \
--cc=aswin@hp.com \
--cc=benh@kernel.crashing.org \
--cc=catalin.marinas@arm.com \
--cc=cmetcalf@tilera.com \
--cc=davidlohr@hp.com \
--cc=heiko.carstens@de.ibm.com \
--cc=hpa@zytor.com \
--cc=jdike@addtoit.com \
--cc=lethal@linux-sh.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux@arm.linux.org.uk \
--cc=mingo@redhat.com \
--cc=paulus@samba.org \
--cc=ralf@linux-mips.org \
--cc=richard@nod.at \
--cc=riel@redhat.com \
--cc=rkuo@codeaurora.org \
--cc=schwidefsky@de.ibm.com \
--cc=tglx@linutronix.de \
--cc=tim.c.chen@linux.intel.com \
--cc=torvalds@linux-foundation.org \
--cc=viro@zeniv.linux.org.uk \
--cc=walken@google.com \
--cc=will.deacon@arm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.