From: Dmitry Safonov <dima@arista.com>
To: linux-kernel@vger.kernel.org
Cc: Dmitry Safonov <dima@arista.com>, Adrian Reber <adrian@lisas.de>,
	Andrei Vagin <avagin@openvz.org>,
	Andy Lutomirski <luto@kernel.org>, Arnd Bergmann <arnd@arndb.de>,
	Christian Brauner <christian.brauner@ubuntu.com>,
	Cyrill Gorcunov <gorcunov@openvz.org>,
	Dmitry Safonov <0x7f454c46@gmail.com>,
	"Eric W. Biederman" <ebiederm@xmission.com>,
	"H. Peter Anvin" <hpa@zytor.com>, Ingo Molnar <mingo@redhat.com>,
	Jann Horn <jannh@google.com>, Jeff Dike <jdike@addtoit.com>,
	Oleg Nesterov <oleg@redhat.com>,
	Pavel Emelyanov <xemul@virtuozzo.com>,
	Shuah Khan <shuah@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Vincenzo Frascino <vincenzo.frascino@arm.com>,
	containers@lists.linux-foundation.org, criu@openvz.org,
	linux-api@vger.kernel.org, x86@kernel.org,
	Andrei Vagin <avagin@gmail.com>
Subject: [PATCHv4 17/28] x86/vdso: Switch image on setns()/unshare()/clone()
Date: Wed, 12 Jun 2019 20:26:16 +0100
Message-ID: <20190612192628.23797-18-dima@arista.com>
In-Reply-To: <20190612192628.23797-1-dima@arista.com>

As discussed on the timens RFC, adding a new conditional branch
`if (inside_time_ns)` to the VDSO for all processes is undesirable.
It would add a penalty for everybody, as the branch predictor may
mispredict the jump. Instruction cache lines would also be wasted on
cmp/jmp.

Those effects of introducing the time namespace are very much unwanted,
bearing in mind how much work has been spent on micro-optimising the
vdso code.

To address those problems, there are two versions of the VDSO's .so:
one for host tasks (without any penalty) and one for processes inside
a time namespace, with clk_to_ns() that subtracts offsets from the
host's time.
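
As a rough userspace illustration of the subtraction described above (not
the vdso code from this series), the sketch below takes a host timestamp
and a per-namespace offset and returns the namespace view of the clock.
The name demo_clk_to_ns() and its signature are hypothetical, and both
inputs are assumed to be normalised, i.e. tv_nsec in [0, NSEC_PER_SEC).

```c
#include <assert.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/*
 * Hypothetical helper: subtract a time namespace offset from a host
 * timestamp. Assumes both arguments are already normalised.
 */
static struct timespec demo_clk_to_ns(struct timespec host, struct timespec off)
{
	struct timespec ts;

	ts.tv_sec = host.tv_sec - off.tv_sec;
	ts.tv_nsec = host.tv_nsec - off.tv_nsec;

	/* Borrow one second if the nanosecond part went negative. */
	if (ts.tv_nsec < 0) {
		ts.tv_nsec += NSEC_PER_SEC;
		ts.tv_sec--;
	}

	/* With normalised inputs a single borrow is always sufficient. */
	assert(ts.tv_nsec >= 0 && ts.tv_nsec < NSEC_PER_SEC);
	return ts;
}
```

Host tasks never run this path, which is the point of keeping a separate
image: only the timens variant of the .so pays for the extra arithmetic.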

Whenever a user does setns()/unshare() or clone() with CLONE_TIMENS,
change the VDSO image in the mm and zap the existing VVAR/VDSO page
tables. They will be re-faulted with the corresponding image and VVAR
offsets.
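
The zap-then-refault idea can be observed from userspace with an analogy
that is not part of this patch: madvise(MADV_DONTNEED) on a private
anonymous mapping drops the populated pages, and the next access faults
them in again. For anonymous memory the refault yields a zero page; for
the VVAR/VDSO special mappings the fault handler would instead supply
pages from whichever image is current. demo_zap_and_refault() is a
hypothetical name, and the sketch assumes Linux.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Write a marker into a private anonymous page, drop the page with
 * MADV_DONTNEED, then read the first byte back through a fresh fault.
 * Returns the byte read after the refault, or -1 on failure.
 */
static int demo_zap_and_refault(void)
{
	long page = sysconf(_SC_PAGESIZE);
	unsigned char *p;
	int val;

	if (page <= 0)
		return -1;

	p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	p[0] = 42;	/* populate the page */

	/* Discard the page; the mapping itself stays in place. */
	if (madvise(p, (size_t)page, MADV_DONTNEED)) {
		munmap(p, (size_t)page);
		return -1;
	}

	val = p[0];	/* this access faults in a new zero-filled page */
	munmap(p, (size_t)page);
	return val;
}
```

The mapping survives the zap, so addresses already handed out to
userspace stay valid; only the contents behind them are replaced on the
next fault.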

Co-developed-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
---
 arch/x86/entry/vdso/vma.c   | 28 ++++++++++++++++++++++++++++
 arch/x86/include/asm/vdso.h |  1 +
 kernel/time_namespace.c     | 11 +++++++++++
 3 files changed, 40 insertions(+)

diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index cc06c6b70167..3ed5bf4932af 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -25,6 +25,7 @@
 #include <asm/cpufeature.h>
 #include <asm/mshyperv.h>
 #include <asm/page.h>
+#include <asm/tlb.h>
 
 #if defined(CONFIG_X86_64)
 unsigned int __read_mostly vdso64_enabled = 1;
@@ -266,6 +267,33 @@ static const struct vm_special_mapping vvar_mapping = {
 	.mremap = vvar_mremap,
 };
 
+#ifdef CONFIG_TIME_NS
+int vdso_join_timens(struct task_struct *task)
+{
+	struct mm_struct *mm = task->mm;
+	struct vm_area_struct *vma;
+
+	if (down_write_killable(&mm->mmap_sem))
+		return -EINTR;
+
+	for (vma = mm->mmap; vma; vma = vma->vm_next) {
+		unsigned long size = vma->vm_end - vma->vm_start;
+
+		if (vma_is_special_mapping(vma, &vvar_mapping) ||
+		    vma_is_special_mapping(vma, &vdso_mapping))
+			zap_page_range(vma, vma->vm_start, size);
+	}
+
+	up_write(&mm->mmap_sem);
+	return 0;
+}
+#else /* CONFIG_TIME_NS */
+int vdso_join_timens(struct task_struct *task)
+{
+	return -ENXIO;
+}
+#endif
+
 /*
  * Add vdso and vvar mappings to current process.
  * @image          - blob to map
diff --git a/arch/x86/include/asm/vdso.h b/arch/x86/include/asm/vdso.h
index 03f468c63a24..ccf89dedd04f 100644
--- a/arch/x86/include/asm/vdso.h
+++ b/arch/x86/include/asm/vdso.h
@@ -45,6 +45,7 @@ extern struct vdso_image vdso_image_32;
 extern void __init init_vdso_image(struct vdso_image *image);
 
 extern int map_vdso_once(const struct vdso_image *image, unsigned long addr);
+extern int vdso_join_timens(struct task_struct *task);
 
 #endif /* __ASSEMBLER__ */
 
diff --git a/kernel/time_namespace.c b/kernel/time_namespace.c
index b3cffdf2635c..2a2cab14ac29 100644
--- a/kernel/time_namespace.c
+++ b/kernel/time_namespace.c
@@ -14,6 +14,7 @@
 #include <linux/proc_ns.h>
 #include <linux/sched/task.h>
 #include <linux/mm.h>
+#include <asm/vdso.h>
 
 ktime_t do_timens_ktime_to_host(clockid_t clockid, ktime_t tim, struct timens_offsets *ns_offsets)
 {
@@ -182,11 +183,16 @@ static void timens_put(struct ns_common *ns)
 static int timens_install(struct nsproxy *nsproxy, struct ns_common *new)
 {
 	struct time_namespace *ns = to_time_ns(new);
+	int ret;
 
 	if (!ns_capable(ns->user_ns, CAP_SYS_ADMIN) ||
 	    !ns_capable(current_user_ns(), CAP_SYS_ADMIN))
 		return -EPERM;
 
+	ret = vdso_join_timens(current);
+	if (ret)
+		return ret;
+
 	get_time_ns(ns);
 	get_time_ns(ns);
 	put_time_ns(nsproxy->time_ns);
@@ -201,10 +207,15 @@ int timens_on_fork(struct nsproxy *nsproxy, struct task_struct *tsk)
 {
 	struct ns_common *nsc = &nsproxy->time_ns_for_children->ns;
 	struct time_namespace *ns = to_time_ns(nsc);
+	int ret;
 
 	if (nsproxy->time_ns == nsproxy->time_ns_for_children)
 		return 0;
 
+	ret = vdso_join_timens(tsk);
+	if (ret)
+		return ret;
+
 	get_time_ns(ns);
 	put_time_ns(nsproxy->time_ns);
 	nsproxy->time_ns = ns;
-- 
2.22.0
