From: Ingo Molnar <mingo@kernel.org>
To: Davidlohr Bueso <davidlohr@hp.com>, Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
Michel Lespinasse <walken@google.com>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
Rik van Riel <riel@redhat.com>,
Tim Chen <tim.c.chen@linux.intel.com>,
aswin@hp.com, linux-mm <linux-mm@kvack.org>,
linux-kernel@vger.kernel.org,
Russell King <linux@arm.linux.org.uk>,
Catalin Marinas <catalin.marinas@arm.com>,
Will Deacon <will.deacon@arm.com>,
Richard Kuo <rkuo@codeaurora.org>,
Ralf Baechle <ralf@linux-mips.org>,
Benjamin Herrenschmidt <benh@kernel.crashing.org>,
Paul Mackerras <paulus@samba.org>,
Martin Schwidefsky <schwidefsky@de.ibm.com>,
Heiko Carstens <heiko.carstens@de.ibm.com>,
Paul Mundt <lethal@linux-sh.org>,
Chris Metcalf <cmetcalf@tilera.com>,
Jeff Dike <jdike@addtoit.com>,
Richard Weinberger <richard@nod.at>,
Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>
Subject: [PATCH 4/3] x86/vdso: Optimize setup_additional_pages()
Date: Fri, 18 Oct 2013 08:05:01 +0200 [thread overview]
Message-ID: <20131018060501.GA3411@gmail.com> (raw)
In-Reply-To: <1382057438-3306-4-git-send-email-davidlohr@hp.com>
* Davidlohr Bueso <davidlohr@hp.com> wrote:
> --- a/arch/x86/vdso/vma.c
> +++ b/arch/x86/vdso/vma.c
> @@ -154,12 +154,17 @@ static int setup_additional_pages(struct linux_binprm *bprm,
> unsigned size)
> {
> struct mm_struct *mm = current->mm;
> + struct vm_area_struct *vma;
> unsigned long addr;
> int ret;
>
> if (!vdso_enabled)
> return 0;
>
> + vma = kmem_cache_zalloc(vm_area_cachep, GFP_KERNEL);
> + if (unlikely(!vma))
> + return -ENOMEM;
> +
> down_write(&mm->mmap_sem);
> addr = vdso_addr(mm->start_stack, size);
> addr = get_unmapped_area(NULL, addr, size, 0, 0);
> @@ -173,14 +178,17 @@ static int setup_additional_pages(struct linux_binprm *bprm,
> ret = install_special_mapping(mm, addr, size,
> VM_READ|VM_EXEC|
> VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC,
> - pages);
> + pages, &vma);
> if (ret) {
> current->mm->context.vdso = NULL;
> goto up_fail;
> }
>
> + up_write(&mm->mmap_sem);
> + return ret;
> up_fail:
> up_write(&mm->mmap_sem);
> + kmem_cache_free(vm_area_cachep, vma);
> return ret;
> }
>
1)
Beyond the simplification that Linus suggested, why not introduce a new
function, named 'install_special_mapping_vma()' or so, and convert
architectures one by one, without pressure to get it all done (and all
correct) in a single patch?
2)
I don't see the justification: this code gets executed in exec() where a
new mm has just been allocated. There's only a single user of the mm and
thus the critical section width of mmap_sem is more or less irrelevant.
mmap_sem critical section size only matters for codepaths that threaded
programs can hit.
3)
But, if we do all that, a couple of other (micro-)optimizations are
possible in setup_additional_pages() as well:
- vdso_addr(), which is actually much _more_ expensive than kmalloc()
because on most distros it will call into the RNG, can also be done
outside the mmap_sem.
- the error paths can all be merged and the common case can be made
fall-through.
- use 'mm' consistently instead of repeating 'current->mm'
- set 'mm->context.vdso' only once we know it's all a success, and do it
outside the lock
- add a few comments about which operations are locked, which unlocked,
and why. Please double check the assumptions I documented there.
See the diff attached below. (Totally untested and all that.)
Also note that I think, in theory, if exec() guaranteed the privacy and
single threadedness of the new mm, we could probably do _all_ of this
unlocked. Right now I don't think this is guaranteed: ptrace() users might
look up the new PID and might interfere on the MM via
PTRACE_PEEK*/PTRACE_POKE*.
( Furthermore, /proc/ might also give early access to aspects of the mm -
although no manipulation of the mm is possible there. )
If such privacy of the new mm was guaranteed then that would also remove
the need to move the allocation out of install_special_mapping().
But, I don't think it all matters, due to #2 - and your changes actively
complicate setup_pages(), which makes this security sensitive code a bit
more fragile. We can still do it out of sheer principle, I just don't see
where it's supposed to help scale better.
Thanks,
Ingo
arch/x86/vdso/vma.c | 40 +++++++++++++++++++++++++++-------------
1 file changed, 27 insertions(+), 13 deletions(-)
diff --git a/arch/x86/vdso/vma.c b/arch/x86/vdso/vma.c
index 431e875..c590387 100644
--- a/arch/x86/vdso/vma.c
+++ b/arch/x86/vdso/vma.c
@@ -157,30 +157,44 @@ static int setup_additional_pages(struct linux_binprm *bprm,
unsigned long addr;
int ret;
- if (!vdso_enabled)
+ if (unlikely(!vdso_enabled))
return 0;
- down_write(&mm->mmap_sem);
+ /*
+ * Do this outside the MM lock - we are in exec() with a new MM,
+ * nobody else can use these fields of the mm:
+ */
addr = vdso_addr(mm->start_stack, size);
- addr = get_unmapped_area(NULL, addr, size, 0, 0);
- if (IS_ERR_VALUE(addr)) {
- ret = addr;
- goto up_fail;
- }
- current->mm->context.vdso = (void *)addr;
+ /*
+ * This must be done under the MM lock - there might be parallel
+ * accesses to this mm, such as ptrace().
+ *
+ * [ This could be further optimized if exec() reliably inhibited
+ * all early access to the mm. ]
+ */
+ down_write(&mm->mmap_sem);
+ addr = get_unmapped_area(NULL, addr, size, 0, 0);
+ if (IS_ERR_VALUE(addr))
+ goto up_fail_addr;
ret = install_special_mapping(mm, addr, size,
VM_READ|VM_EXEC|
VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC,
pages);
- if (ret) {
- current->mm->context.vdso = NULL;
- goto up_fail;
- }
+ up_write(&mm->mmap_sem);
+ if (ret)
+ goto fail;
-up_fail:
+ mm->context.vdso = (void *)addr;
+ return ret;
+
+up_fail_addr:
+ ret = addr;
up_write(&mm->mmap_sem);
+fail:
+ mm->context.vdso = NULL;
+
return ret;
}
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2013-10-18 6:05 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-10-18 0:50 [PATCH 0/3] mm,vdso: preallocate new vmas Davidlohr Bueso
2013-10-18 0:50 ` [PATCH 1/3] mm: add mlock_future_check helper Davidlohr Bueso
2013-10-23 9:30 ` walken
2013-10-18 0:50 ` [PATCH 2/3] mm/mlock: prepare params outside critical region Davidlohr Bueso
2013-10-23 9:33 ` walken
2013-10-23 9:46 ` Vlastimil Babka
2013-10-18 0:50 ` [PATCH 3/3] vdso: preallocate new vmas Davidlohr Bueso
2013-10-18 1:17 ` Linus Torvalds
2013-10-18 5:59 ` Richard Weinberger
2013-10-18 6:05 ` Ingo Molnar [this message]
2013-10-21 3:52 ` [PATCH 4/3] x86/vdso: Optimize setup_additional_pages() Davidlohr Bueso
2013-10-21 5:27 ` Ingo Molnar
2013-10-21 3:26 ` [PATCH 3/3] vdso: preallocate new vmas Davidlohr Bueso
2013-10-23 9:53 ` walken
2013-10-25 0:55 ` Davidlohr Bueso
2013-10-22 15:48 ` [PATCH 0/3] mm,vdso: " walken
2013-10-22 16:20 ` Linus Torvalds
2013-10-22 17:04 ` Michel Lespinasse
2013-10-22 17:54 ` Andy Lutomirski
2013-10-23 10:13 ` Michel Lespinasse
2013-10-23 21:42 ` Andy Lutomirski
2013-10-23 2:46 ` Davidlohr Bueso
2013-11-05 0:39 ` Davidlohr Bueso
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20131018060501.GA3411@gmail.com \
--to=mingo@kernel.org \
--cc=a.p.zijlstra@chello.nl \
--cc=akpm@linux-foundation.org \
--cc=aswin@hp.com \
--cc=benh@kernel.crashing.org \
--cc=catalin.marinas@arm.com \
--cc=cmetcalf@tilera.com \
--cc=davidlohr@hp.com \
--cc=heiko.carstens@de.ibm.com \
--cc=hpa@zytor.com \
--cc=jdike@addtoit.com \
--cc=lethal@linux-sh.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux@arm.linux.org.uk \
--cc=mingo@redhat.com \
--cc=paulus@samba.org \
--cc=ralf@linux-mips.org \
--cc=richard@nod.at \
--cc=riel@redhat.com \
--cc=rkuo@codeaurora.org \
--cc=schwidefsky@de.ibm.com \
--cc=tglx@linutronix.de \
--cc=tim.c.chen@linux.intel.com \
--cc=torvalds@linux-foundation.org \
--cc=viro@zeniv.linux.org.uk \
--cc=walken@google.com \
--cc=will.deacon@arm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).