* [PATCH] x86/vdso: Add prctl to set per-process VDSO load
@ 2014-09-17 0:05 Richard Larocque
2014-09-17 0:13 ` Andi Kleen
` (2 more replies)
0 siblings, 3 replies; 36+ messages in thread
From: Richard Larocque @ 2014-09-17 0:05 UTC (permalink / raw)
To: mingo, tglx, hpa, luto, filbranden, md, gthelen
Cc: x86, linux-api, linux-kernel, Richard Larocque
Adds new prctl calls to enable or disable VDSO loading for a process
and its children.
The PR_SET_DISABLE_VDSO call takes one argument, which is interpreted as
a boolean value. If true, it disables the loading of the VDSO on exec()
for this process and any children created after this call. A false
value unsets the flag.
The PR_GET_DISABLE_VDSO option returns a non-negative true value if VDSO
loading has been disabled for this process, zero if it has not been
disabled, and a negative value in case of error.
These prctl calls are hidden behind a new Kconfig,
CONFIG_VDSO_DISABLE_PRCTL. This feature is available only on x86.
The command line option vdso=0 overrides the behavior of
PR_SET_DISABLE_VDSO, however, PR_GET_DISABLE_VDSO will coninue to return
whetever setting was last set with PR_SET_DISABLE_VDSO.
Signed-off-by: Richard Larocque <rlarocque@google.com>
---
This patch is part of some work to better handle times and CRIU migration.
I suspect that there are other use cases out there, so I'm offering this
patch separately.
When considering CRIU migration and times, we put some thought into how
to handle the rdtsc instruction. If we migrate between machines or across
reboots, the migrated process will see values that could break its assumptions
about how rdtsc is supposed to work. To deal with this, we could:
* let the application handle it
* ban the instruction (ie. PR_TSC_SIGSEGV), to make sure the application
doesn't use it by accident
* trap the instruction then mark the process as "tainted" for migration
purposes
* trap the instruction then apply an adjustment to keep values consistent
across machines
* do something really crazy involving the VMCS
There's no great option here. Which one we choose probably depends on
what kind of process is being migrated.
Many of these options involve setting a trap for the rdtsc instruction in the
process, which creates problems for the vDSO. The vDSO implementations of
clock_gettime() make use of that instruction from userspace. We can use
workarounds in the kernel to turn the trap into a no-op when it comes from vDSO
code, but that somewhat defeats the purpose of having a vDSO in the first
place. It was supposed to avoid unnecessary calls to kernel space. Trapping
its instructions goes against that goal.
So we think it would be nice to disable the vDSO for some processes on a
machine. This would allow us to implement some of the rdtsc handling options
without having to worry about the vDSO.
We have some additional plans for clock_gettime() and friends that may or
may not depend on disabling the vDSO, but it might be best if we defer that
part of the discussion to the next patch set. I think this patch and its
use case can stand on their own.
arch/x86/Kconfig | 8 ++++++++
arch/x86/vdso/vma.c | 18 ++++++++++++++++++
include/linux/sched.h | 5 +++++
include/uapi/linux/prctl.h | 9 +++++++++
kernel/fork.c | 4 ++++
kernel/sys.c | 16 ++++++++++++++++
6 files changed, 60 insertions(+)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 3632743..ff54ead 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1875,6 +1875,14 @@ config COMPAT_VDSO
If unsure, say N: if you are compiling your own kernel, you
are unlikely to be using a buggy version of glibc.
+config VDSO_DISABLE_PRCTL
+ depends on X86
+ bool "prctl to disable VDSO loading"
+ ---help---
+ Enabling this option adds support for prctl calls that
+ set and retrieve a per-process flag to disable VDSO loading on
+ exec() for this process and all of its children.
+
config CMDLINE_BOOL
bool "Built-in kernel command line"
---help---
diff --git a/arch/x86/vdso/vma.c b/arch/x86/vdso/vma.c
index 970463b..496c48b 100644
--- a/arch/x86/vdso/vma.c
+++ b/arch/x86/vdso/vma.c
@@ -23,6 +23,15 @@ unsigned int __read_mostly vdso64_enabled = 1;
extern unsigned short vdso_sync_cpuid;
#endif
+static int vdso_enabled_for_current_process(void)
+{
+#if defined(CONFIG_VDSO_DISABLE_PRCTL)
+ return !current->signal->disable_vdso;
+#else
+ return 1;
+#endif
+}
+
void __init init_vdso_image(const struct vdso_image *image)
{
int i;
@@ -185,6 +194,9 @@ static int load_vdso32(void)
if (vdso32_enabled != 1) /* Other values all mean "disabled" */
return 0;
+ if (!vdso_enabled_for_current_process())
+ return 0;
+
ret = map_vdso(selected_vdso32, false);
if (ret)
return ret;
@@ -204,6 +216,9 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
if (!vdso64_enabled)
return 0;
+ if (!vdso_enabled_for_current_process())
+ return 0;
+
return map_vdso(&vdso_image_64, true);
}
@@ -216,6 +231,9 @@ int compat_arch_setup_additional_pages(struct linux_binprm *bprm,
if (!vdso64_enabled)
return 0;
+ if (!vdso_enabled_for_current_process())
+ return 0;
+
return map_vdso(&vdso_image_x32, true);
}
#endif
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 5c2c885..37f6a7a 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -708,6 +708,11 @@ struct signal_struct {
struct mutex cred_guard_mutex; /* guard against foreign influences on
* credential calculations
* (notably. ptrace) */
+
+#ifdef CONFIG_VDSO_DISABLE_PRCTL
+ unsigned int disable_vdso; /* If true, prevents loading of VDSO on
+ next exec() */
+#endif
};
/*
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index 58afc04..3dbbeda 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -152,4 +152,13 @@
#define PR_SET_THP_DISABLE 41
#define PR_GET_THP_DISABLE 42
+/*
+ * These can be used to flag a process so that neither it nor its children will
+ * receive VDSO mappings on their next exec() call.
+ */
+#define PR_SET_VDSO 43
+#define PR_GET_VDSO 44
+# define PR_VDSO_DISABLE 0 /* prevent loading of VDSO on exec() */
+# define PR_VDSO_ENABLE 1 /* allow loading of VDSO on exec() */
+
#endif /* _LINUX_PRCTL_H */
diff --git a/kernel/fork.c b/kernel/fork.c
index 0cf9cdb..11ede19 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1091,6 +1091,10 @@ static int copy_signal(unsigned long clone_flags, struct task_struct *tsk)
sig->has_child_subreaper = current->signal->has_child_subreaper ||
current->signal->is_child_subreaper;
+#ifdef CONFIG_VDSO_DISABLE_PRCTL
+ sig->disable_vdso = current->signal->disable_vdso;
+#endif
+
mutex_init(&sig->cred_guard_mutex);
return 0;
diff --git a/kernel/sys.c b/kernel/sys.c
index ce81291..eb94a96 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -2011,6 +2011,22 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
me->mm->def_flags &= ~VM_NOHUGEPAGE;
up_write(&me->mm->mmap_sem);
break;
+#ifdef CONFIG_VDSO_DISABLE_PRCTL
+ case PR_SET_VDSO:
+ if (arg2 == PR_VDSO_ENABLE)
+ me->signal->disable_vdso = 0;
+ else if (arg2 == PR_VDSO_DISABLE)
+ me->signal->disable_vdso = 1;
+ else
+ return -EINVAL;
+ break;
+ case PR_GET_VDSO:
+ if (!me->signal->disable_vdso)
+ error = put_user(PR_VDSO_ENABLE, (int __user *)arg2);
+ else
+ error = put_user(PR_VDSO_DISABLE, (int __user *)arg2);
+ break;
+#endif
default:
error = -EINVAL;
break;
--
2.1.0.rc2.206.gedb03e5
^ permalink raw reply related [flat|nested] 36+ messages in thread* Re: [PATCH] x86/vdso: Add prctl to set per-process VDSO load
2014-09-17 0:05 [PATCH] x86/vdso: Add prctl to set per-process VDSO load Richard Larocque
@ 2014-09-17 0:13 ` Andi Kleen
[not found] ` <8738brhzoc.fsf-KWJ+5VKanrL29G5dvP0v1laTQe2KTcn/@public.gmane.org>
[not found] ` <1410912351-31273-1-git-send-email-rlarocque-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
2014-09-17 11:36 ` Kevin Easton
2 siblings, 1 reply; 36+ messages in thread
From: Andi Kleen @ 2014-09-17 0:13 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-api
Richard Larocque <rlarocque@google.com> writes:
Perhaps I'm missing something, but how do you modify the AUX vector
for the children?
> +config VDSO_DISABLE_PRCTL
> + depends on X86
> + bool "prctl to disable VDSO loading"
> + ---help---
> + Enabling this option adds support for prctl calls that
> + set and retrieve a per-process flag to disable VDSO loading on
> + exec() for this process and all of its children.
I don't think it makes any sense to have a config for a simple
feature like this. Just do it unconditionally.
-Andi
--
ak@linux.intel.com -- Speaking for myself only
^ permalink raw reply [flat|nested] 36+ messages in thread[parent not found: <1410912351-31273-1-git-send-email-rlarocque-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>]
* Re: [PATCH] x86/vdso: Add prctl to set per-process VDSO load
2014-09-17 0:05 [PATCH] x86/vdso: Add prctl to set per-process VDSO load Richard Larocque
@ 2014-09-17 0:27 ` Andy Lutomirski
[not found] ` <1410912351-31273-1-git-send-email-rlarocque-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
2014-09-17 11:36 ` Kevin Easton
2 siblings, 0 replies; 36+ messages in thread
From: Andy Lutomirski @ 2014-09-17 0:27 UTC (permalink / raw)
To: Richard Larocque
Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
filbranden-hpIqsD4AKlfQT0dZR+AlfA, md-hpIqsD4AKlfQT0dZR+AlfA,
gthelen-hpIqsD4AKlfQT0dZR+AlfA, X86 ML, Linux API,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
On Tue, Sep 16, 2014 at 5:05 PM, Richard Larocque <rlarocque-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> wrote:
> Adds new prctl calls to enable or disable VDSO loading for a process
> and its children.
>
> The PR_SET_DISABLE_VDSO call takes one argument, which is interpreted as
> a boolean value. If true, it disables the loading of the VDSO on exec()
> for this process and any children created after this call. A false
> value unsets the flag.
>
> The PR_GET_DISABLE_VDSO option returns a non-negative true value if VDSO
> loading has been disabled for this process, zero if it has not been
> disabled, and a negative value in case of error.
>
> These prctl calls are hidden behind a new Kconfig,
> CONFIG_VDSO_DISABLE_PRCTL. This feature is available only on x86.
>
> The command line option vdso=0 overrides the behavior of
> PR_SET_DISABLE_VDSO, however, PR_GET_DISABLE_VDSO will coninue to return
> whetever setting was last set with PR_SET_DISABLE_VDSO.
>
> Signed-off-by: Richard Larocque <rlarocque-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> ---
> This patch is part of some work to better handle times and CRIU migration.
> I suspect that there are other use cases out there, so I'm offering this
> patch separately.
>
> When considering CRIU migration and times, we put some thought into how
> to handle the rdtsc instruction. If we migrate between machines or across
> reboots, the migrated process will see values that could break its assumptions
> about how rdtsc is supposed to work.
I don't get it.
If __vdso_clock_gettime returns the wrong value in any scenario, we
should fix that. Simiarly, CRIU *already works*, unless there's
something I don't know of.
That being said, I would like an option to gate off RDTSC for a
process and its children in order to make PR_TSC_SIGSEGV more useful.
All the prerequisites are there now.
What problem are you trying to solve exactly?
--Andy
^ permalink raw reply [flat|nested] 36+ messages in thread* Re: [PATCH] x86/vdso: Add prctl to set per-process VDSO load
@ 2014-09-17 0:27 ` Andy Lutomirski
0 siblings, 0 replies; 36+ messages in thread
From: Andy Lutomirski @ 2014-09-17 0:27 UTC (permalink / raw)
To: Richard Larocque
Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, filbranden, md,
gthelen, X86 ML, Linux API, linux-kernel@vger.kernel.org
On Tue, Sep 16, 2014 at 5:05 PM, Richard Larocque <rlarocque@google.com> wrote:
> Adds new prctl calls to enable or disable VDSO loading for a process
> and its children.
>
> The PR_SET_DISABLE_VDSO call takes one argument, which is interpreted as
> a boolean value. If true, it disables the loading of the VDSO on exec()
> for this process and any children created after this call. A false
> value unsets the flag.
>
> The PR_GET_DISABLE_VDSO option returns a non-negative true value if VDSO
> loading has been disabled for this process, zero if it has not been
> disabled, and a negative value in case of error.
>
> These prctl calls are hidden behind a new Kconfig,
> CONFIG_VDSO_DISABLE_PRCTL. This feature is available only on x86.
>
> The command line option vdso=0 overrides the behavior of
> PR_SET_DISABLE_VDSO, however, PR_GET_DISABLE_VDSO will coninue to return
> whetever setting was last set with PR_SET_DISABLE_VDSO.
>
> Signed-off-by: Richard Larocque <rlarocque@google.com>
> ---
> This patch is part of some work to better handle times and CRIU migration.
> I suspect that there are other use cases out there, so I'm offering this
> patch separately.
>
> When considering CRIU migration and times, we put some thought into how
> to handle the rdtsc instruction. If we migrate between machines or across
> reboots, the migrated process will see values that could break its assumptions
> about how rdtsc is supposed to work.
I don't get it.
If __vdso_clock_gettime returns the wrong value in any scenario, we
should fix that. Simiarly, CRIU *already works*, unless there's
something I don't know of.
That being said, I would like an option to gate off RDTSC for a
process and its children in order to make PR_TSC_SIGSEGV more useful.
All the prerequisites are there now.
What problem are you trying to solve exactly?
--Andy
^ permalink raw reply [flat|nested] 36+ messages in thread
[parent not found: <CALCETrXtYV5xKkTxothuqNb7ra80Be7ZXJ-hDnC6p-bfEPZ=Lw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: [PATCH] x86/vdso: Add prctl to set per-process VDSO load
2014-09-17 0:27 ` Andy Lutomirski
@ 2014-09-17 1:18 ` Richard Larocque
-1 siblings, 0 replies; 36+ messages in thread
From: Richard Larocque @ 2014-09-17 1:18 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
Filipe Brandenburger, Michael Davidson, Greg Thelen, X86 ML,
Linux API, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
On Tue, Sep 16, 2014 at 5:27 PM, Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> wrote:
> On Tue, Sep 16, 2014 at 5:05 PM, Richard Larocque <rlarocque-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> wrote:
>> Adds new prctl calls to enable or disable VDSO loading for a process
>> and its children.
>>
>> The PR_SET_DISABLE_VDSO call takes one argument, which is interpreted as
>> a boolean value. If true, it disables the loading of the VDSO on exec()
>> for this process and any children created after this call. A false
>> value unsets the flag.
>>
>> The PR_GET_DISABLE_VDSO option returns a non-negative true value if VDSO
>> loading has been disabled for this process, zero if it has not been
>> disabled, and a negative value in case of error.
>>
>> These prctl calls are hidden behind a new Kconfig,
>> CONFIG_VDSO_DISABLE_PRCTL. This feature is available only on x86.
>>
>> The command line option vdso=0 overrides the behavior of
>> PR_SET_DISABLE_VDSO, however, PR_GET_DISABLE_VDSO will coninue to return
>> whetever setting was last set with PR_SET_DISABLE_VDSO.
>>
>> Signed-off-by: Richard Larocque <rlarocque-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
>> ---
>> This patch is part of some work to better handle times and CRIU migration.
>> I suspect that there are other use cases out there, so I'm offering this
>> patch separately.
>>
>> When considering CRIU migration and times, we put some thought into how
>> to handle the rdtsc instruction. If we migrate between machines or across
>> reboots, the migrated process will see values that could break its assumptions
>> about how rdtsc is supposed to work.
>
> I don't get it.
>
> If __vdso_clock_gettime returns the wrong value in any scenario, we
> should fix that. Simiarly, CRIU *already works*, unless there's
> something I don't know of.
Right. As far as I know, there's nothing wrong with the use of RDTSC
in the vDSO following a migration. The problem is that some
applications might use RDTSC outside of the vDSO. If they save the
returned values, then compare pre- and post- migration values, bad
things could happen (in theory).
Anything we do to try to trap and handle the use of RDTSC in wider
userspace will affect its use in the vDSO, too. In some situations,
it might be nice to run applications with no vDSO and PR_TSC_SIGSEGV,
just to make sure they don't have any heavy reliance on the TSC. It
would be nice if those applications didn't crash when they called
clock_gettime().
Another alternative is to trap and adjust the RDTSC. That might be a
viable option for applications that care about reliable RDTSC behavior
and migration, but don't care about performance. I think it makes
sense to disable the vDSO in that case, rather than trap on every call
that it makes.
> That being said, I would like an option to gate off RDTSC for a
> process and its children in order to make PR_TSC_SIGSEGV more useful.
> All the prerequisites are there now.
Agreed. That's what this patch is attempting to do, and that's the
main reason why I figured it was worth submitting independent of any
other time-related work.
> What problem are you trying to solve exactly?
Eventually, we'd like to make it so that neither RDTSC nor
CLOCK_MONOTONIC can go backwards following a migration.
The fix for RDTSC starts here. Building on this patch as a base, we
can either ban it from being used entirely, or write some code to
adjust its value as necessary.
The CLOCK_MONOTONIC fix will be a different patch stack. We're
currently hoping to do that without disable the vDSO, but that's
another discussion.
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] x86/vdso: Add prctl to set per-process VDSO load
@ 2014-09-17 1:18 ` Richard Larocque
0 siblings, 0 replies; 36+ messages in thread
From: Richard Larocque @ 2014-09-17 1:18 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
Filipe Brandenburger, Michael Davidson, Greg Thelen, X86 ML,
Linux API, linux-kernel@vger.kernel.org
On Tue, Sep 16, 2014 at 5:27 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Tue, Sep 16, 2014 at 5:05 PM, Richard Larocque <rlarocque@google.com> wrote:
>> Adds new prctl calls to enable or disable VDSO loading for a process
>> and its children.
>>
>> The PR_SET_DISABLE_VDSO call takes one argument, which is interpreted as
>> a boolean value. If true, it disables the loading of the VDSO on exec()
>> for this process and any children created after this call. A false
>> value unsets the flag.
>>
>> The PR_GET_DISABLE_VDSO option returns a non-negative true value if VDSO
>> loading has been disabled for this process, zero if it has not been
>> disabled, and a negative value in case of error.
>>
>> These prctl calls are hidden behind a new Kconfig,
>> CONFIG_VDSO_DISABLE_PRCTL. This feature is available only on x86.
>>
>> The command line option vdso=0 overrides the behavior of
>> PR_SET_DISABLE_VDSO, however, PR_GET_DISABLE_VDSO will coninue to return
>> whetever setting was last set with PR_SET_DISABLE_VDSO.
>>
>> Signed-off-by: Richard Larocque <rlarocque@google.com>
>> ---
>> This patch is part of some work to better handle times and CRIU migration.
>> I suspect that there are other use cases out there, so I'm offering this
>> patch separately.
>>
>> When considering CRIU migration and times, we put some thought into how
>> to handle the rdtsc instruction. If we migrate between machines or across
>> reboots, the migrated process will see values that could break its assumptions
>> about how rdtsc is supposed to work.
>
> I don't get it.
>
> If __vdso_clock_gettime returns the wrong value in any scenario, we
> should fix that. Simiarly, CRIU *already works*, unless there's
> something I don't know of.
Right. As far as I know, there's nothing wrong with the use of RDTSC
in the vDSO following a migration. The problem is that some
applications might use RDTSC outside of the vDSO. If they save the
returned values, then compare pre- and post- migration values, bad
things could happen (in theory).
Anything we do to try to trap and handle the use of RDTSC in wider
userspace will affect its use in the vDSO, too. In some situations,
it might be nice to run applications with no vDSO and PR_TSC_SIGSEGV,
just to make sure they don't have any heavy reliance on the TSC. It
would be nice if those applications didn't crash when they called
clock_gettime().
Another alternative is to trap and adjust the RDTSC. That might be a
viable option for applications that care about reliable RDTSC behavior
and migration, but don't care about performance. I think it makes
sense to disable the vDSO in that case, rather than trap on every call
that it makes.
> That being said, I would like an option to gate off RDTSC for a
> process and its children in order to make PR_TSC_SIGSEGV more useful.
> All the prerequisites are there now.
Agreed. That's what this patch is attempting to do, and that's the
main reason why I figured it was worth submitting independent of any
other time-related work.
> What problem are you trying to solve exactly?
Eventually, we'd like to make it so that neither RDTSC nor
CLOCK_MONOTONIC can go backwards following a migration.
The fix for RDTSC starts here. Building on this patch as a base, we
can either ban it from being used entirely, or write some code to
adjust its value as necessary.
The CLOCK_MONOTONIC fix will be a different patch stack. We're
currently hoping to do that without disable the vDSO, but that's
another discussion.
^ permalink raw reply [flat|nested] 36+ messages in thread
[parent not found: <CABgu+=NC1ZMQkV0J2c9-MD1+TAmTjDm4ACZASPnfX2Fwiu5rMQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: [PATCH] x86/vdso: Add prctl to set per-process VDSO load
2014-09-17 1:18 ` Richard Larocque
@ 2014-09-17 5:00 ` Andy Lutomirski
-1 siblings, 0 replies; 36+ messages in thread
From: Andy Lutomirski @ 2014-09-17 5:00 UTC (permalink / raw)
To: Richard Larocque
Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
Filipe Brandenburger, Michael Davidson, Greg Thelen, X86 ML,
Linux API, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
On Tue, Sep 16, 2014 at 6:18 PM, Richard Larocque <rlarocque-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> wrote:
> On Tue, Sep 16, 2014 at 5:27 PM, Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> wrote:
>> On Tue, Sep 16, 2014 at 5:05 PM, Richard Larocque <rlarocque-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> wrote:
>>> Adds new prctl calls to enable or disable VDSO loading for a process
>>> and its children.
>>>
>>> The PR_SET_DISABLE_VDSO call takes one argument, which is interpreted as
>>> a boolean value. If true, it disables the loading of the VDSO on exec()
>>> for this process and any children created after this call. A false
>>> value unsets the flag.
>>>
>>> The PR_GET_DISABLE_VDSO option returns a non-negative true value if VDSO
>>> loading has been disabled for this process, zero if it has not been
>>> disabled, and a negative value in case of error.
>>>
>>> These prctl calls are hidden behind a new Kconfig,
>>> CONFIG_VDSO_DISABLE_PRCTL. This feature is available only on x86.
>>>
>>> The command line option vdso=0 overrides the behavior of
>>> PR_SET_DISABLE_VDSO, however, PR_GET_DISABLE_VDSO will coninue to return
>>> whetever setting was last set with PR_SET_DISABLE_VDSO.
>>>
>>> Signed-off-by: Richard Larocque <rlarocque-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
>>> ---
>>> This patch is part of some work to better handle times and CRIU migration.
>>> I suspect that there are other use cases out there, so I'm offering this
>>> patch separately.
>>>
>>> When considering CRIU migration and times, we put some thought into how
>>> to handle the rdtsc instruction. If we migrate between machines or across
>>> reboots, the migrated process will see values that could break its assumptions
>>> about how rdtsc is supposed to work.
>>
>> I don't get it.
>>
>> If __vdso_clock_gettime returns the wrong value in any scenario, we
>> should fix that. Simiarly, CRIU *already works*, unless there's
>> something I don't know of.
>
> Right. As far as I know, there's nothing wrong with the use of RDTSC
> in the vDSO following a migration. The problem is that some
> applications might use RDTSC outside of the vDSO. If they save the
> returned values, then compare pre- and post- migration values, bad
> things could happen (in theory).
These applications are broken, full stop. They will misbehave on VMs,
or older machines, and even on the rather new piece of sh*t MSI
motherboard under my desk. I think that CRIU is just icing on the
cake. Also, they'll probably just crash if you turn off RDTSC.
>
> Anything we do to try to trap and handle the use of RDTSC in wider
> userspace will affect its use in the vDSO, too. In some situations,
> it might be nice to run applications with no vDSO and PR_TSC_SIGSEGV,
> just to make sure they don't have any heavy reliance on the TSC. It
> would be nice if those applications didn't crash when they called
> clock_gettime().
Agreed. But let's do it without turning off the vdso. Also, turning
off the 32-bit vdso could break a lot of things.
>
> Another alternative is to trap and adjust the RDTSC. That might be a
> viable option for applications that care about reliable RDTSC behavior
> and migration, but don't care about performance. I think it makes
> sense to disable the vDSO in that case, rather than trap on every call
> that it makes.
Here I disagree. Let's just tweak the vdso not to use rdtsc in this case.
>
>> That being said, I would like an option to gate off RDTSC for a
>> process and its children in order to make PR_TSC_SIGSEGV more useful.
>> All the prerequisites are there now.
>
> Agreed. That's what this patch is attempting to do, and that's the
> main reason why I figured it was worth submitting independent of any
> other time-related work.
>
>> What problem are you trying to solve exactly?
>
> Eventually, we'd like to make it so that neither RDTSC nor
> CLOCK_MONOTONIC can go backwards following a migration.
>
> The fix for RDTSC starts here. Building on this patch as a base, we
> can either ban it from being used entirely, or write some code to
> adjust its value as necessary.
>
> The CLOCK_MONOTONIC fix will be a different patch stack. We're
> currently hoping to do that without disable the vDSO, but that's
> another discussion.
I think that the patch should instead tweak the vvar mapping to tell
the vdso not to use rdtsc. It should be based on this:
https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/log/?h=x86/vsyscall
and I'll talk to hpa tomorrow about about getting that, or something
like it, into the tip tree. In particular, you'll need this:
https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/commit/?h=x86/vsyscall&id=0cc410a05cb95e073ebfe099c9e03cef48d2be0f
Also, this kind of inheritable restriction may end up requiring
no_new_privs or CAP_SYS_ADMIN to be secure.
--Andy
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] x86/vdso: Add prctl to set per-process VDSO load
@ 2014-09-17 5:00 ` Andy Lutomirski
0 siblings, 0 replies; 36+ messages in thread
From: Andy Lutomirski @ 2014-09-17 5:00 UTC (permalink / raw)
To: Richard Larocque
Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
Filipe Brandenburger, Michael Davidson, Greg Thelen, X86 ML,
Linux API, linux-kernel@vger.kernel.org
On Tue, Sep 16, 2014 at 6:18 PM, Richard Larocque <rlarocque@google.com> wrote:
> On Tue, Sep 16, 2014 at 5:27 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>> On Tue, Sep 16, 2014 at 5:05 PM, Richard Larocque <rlarocque@google.com> wrote:
>>> Adds new prctl calls to enable or disable VDSO loading for a process
>>> and its children.
>>>
>>> The PR_SET_DISABLE_VDSO call takes one argument, which is interpreted as
>>> a boolean value. If true, it disables the loading of the VDSO on exec()
>>> for this process and any children created after this call. A false
>>> value unsets the flag.
>>>
>>> The PR_GET_DISABLE_VDSO option returns a non-negative true value if VDSO
>>> loading has been disabled for this process, zero if it has not been
>>> disabled, and a negative value in case of error.
>>>
>>> These prctl calls are hidden behind a new Kconfig,
>>> CONFIG_VDSO_DISABLE_PRCTL. This feature is available only on x86.
>>>
>>> The command line option vdso=0 overrides the behavior of
>>> PR_SET_DISABLE_VDSO, however, PR_GET_DISABLE_VDSO will coninue to return
>>> whetever setting was last set with PR_SET_DISABLE_VDSO.
>>>
>>> Signed-off-by: Richard Larocque <rlarocque@google.com>
>>> ---
>>> This patch is part of some work to better handle times and CRIU migration.
>>> I suspect that there are other use cases out there, so I'm offering this
>>> patch separately.
>>>
>>> When considering CRIU migration and times, we put some thought into how
>>> to handle the rdtsc instruction. If we migrate between machines or across
>>> reboots, the migrated process will see values that could break its assumptions
>>> about how rdtsc is supposed to work.
>>
>> I don't get it.
>>
>> If __vdso_clock_gettime returns the wrong value in any scenario, we
>> should fix that. Simiarly, CRIU *already works*, unless there's
>> something I don't know of.
>
> Right. As far as I know, there's nothing wrong with the use of RDTSC
> in the vDSO following a migration. The problem is that some
> applications might use RDTSC outside of the vDSO. If they save the
> returned values, then compare pre- and post- migration values, bad
> things could happen (in theory).
These applications are broken, full stop. They will misbehave on VMs,
or older machines, and even on the rather new piece of sh*t MSI
motherboard under my desk. I think that CRIU is just icing on the
cake. Also, they'll probably just crash if you turn off RDTSC.
>
> Anything we do to try to trap and handle the use of RDTSC in wider
> userspace will affect its use in the vDSO, too. In some situations,
> it might be nice to run applications with no vDSO and PR_TSC_SIGSEGV,
> just to make sure they don't have any heavy reliance on the TSC. It
> would be nice if those applications didn't crash when they called
> clock_gettime().
Agreed. But let's do it without turning off the vdso. Also, turning
off the 32-bit vdso could break a lot of things.
>
> Another alternative is to trap and adjust the RDTSC. That might be a
> viable option for applications that care about reliable RDTSC behavior
> and migration, but don't care about performance. I think it makes
> sense to disable the vDSO in that case, rather than trap on every call
> that it makes.
Here I disagree. Let's just tweak the vdso not to use rdtsc in this case.
>
>> That being said, I would like an option to gate off RDTSC for a
>> process and its children in order to make PR_TSC_SIGSEGV more useful.
>> All the prerequisites are there now.
>
> Agreed. That's what this patch is attempting to do, and that's the
> main reason why I figured it was worth submitting independent of any
> other time-related work.
>
>> What problem are you trying to solve exactly?
>
> Eventually, we'd like to make it so that neither RDTSC nor
> CLOCK_MONOTONIC can go backwards following a migration.
>
> The fix for RDTSC starts here. Building on this patch as a base, we
> can either ban it from being used entirely, or write some code to
> adjust its value as necessary.
>
> The CLOCK_MONOTONIC fix will be a different patch stack. We're
> currently hoping to do that without disable the vDSO, but that's
> another discussion.
I think that the patch should instead tweak the vvar mapping to tell
the vdso not to use rdtsc. It should be based on this:
https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/log/?h=x86/vsyscall
and I'll talk to hpa tomorrow about about getting that, or something
like it, into the tip tree. In particular, you'll need this:
https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/commit/?h=x86/vsyscall&id=0cc410a05cb95e073ebfe099c9e03cef48d2be0f
Also, this kind of inheritable restriction may end up requiring
no_new_privs or CAP_SYS_ADMIN to be secure.
--Andy
^ permalink raw reply [flat|nested] 36+ messages in thread
[parent not found: <CALCETrWj8Pj8d8YjybvOKG-=xmy-XGFo9cGQ9qn0V4t9Oj+dOw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: [PATCH] x86/vdso: Add prctl to set per-process VDSO load
2014-09-17 5:00 ` Andy Lutomirski
@ 2014-09-17 5:34 ` Andy Lutomirski
-1 siblings, 0 replies; 36+ messages in thread
From: Andy Lutomirski @ 2014-09-17 5:34 UTC (permalink / raw)
To: Richard Larocque
Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
Filipe Brandenburger, Michael Davidson, Greg Thelen, X86 ML,
Linux API, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
On Tue, Sep 16, 2014 at 10:00 PM, Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> wrote:
> On Tue, Sep 16, 2014 at 6:18 PM, Richard Larocque <rlarocque-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> wrote:
>> On Tue, Sep 16, 2014 at 5:27 PM, Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> wrote:
> I think that the patch should instead tweak the vvar mapping to tell
> the vdso not to use rdtsc. It should be based on this:
>
> https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/log/?h=x86/vsyscall
>
> and I'll talk to hpa tomorrow about about getting that, or something
> like it, into the tip tree. In particular, you'll need this:
>
> https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/commit/?h=x86/vsyscall&id=0cc410a05cb95e073ebfe099c9e03cef48d2be0f
Crud. Now that I've said that, I realize that this won't work right
if rdtscp is off. I'll drop that patch.
I don't think that this changes the conclusion. It should be possible
to swap out the vvar page to keep the vdso working even without rdtsc
available.
--Andy
>
> Also, this kind of inheritable restriction may end up requiring
> no_new_privs or CAP_SYS_ADMIN to be secure.
>
> --Andy
--
Andy Lutomirski
AMA Capital Management, LLC
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] x86/vdso: Add prctl to set per-process VDSO load
@ 2014-09-17 5:34 ` Andy Lutomirski
0 siblings, 0 replies; 36+ messages in thread
From: Andy Lutomirski @ 2014-09-17 5:34 UTC (permalink / raw)
To: Richard Larocque
Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
Filipe Brandenburger, Michael Davidson, Greg Thelen, X86 ML,
Linux API, linux-kernel@vger.kernel.org
On Tue, Sep 16, 2014 at 10:00 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Tue, Sep 16, 2014 at 6:18 PM, Richard Larocque <rlarocque@google.com> wrote:
>> On Tue, Sep 16, 2014 at 5:27 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> I think that the patch should instead tweak the vvar mapping to tell
> the vdso not to use rdtsc. It should be based on this:
>
> https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/log/?h=x86/vsyscall
>
> and I'll talk to hpa tomorrow about about getting that, or something
> like it, into the tip tree. In particular, you'll need this:
>
> https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/commit/?h=x86/vsyscall&id=0cc410a05cb95e073ebfe099c9e03cef48d2be0f
Crud. Now that I've said that, I realize that this won't work right
if rdtscp is off. I'll drop that patch.
I don't think that this changes the conclusion. It should be possible
to swap out the vvar page to keep the vdso working even without rdtsc
available.
--Andy
>
> Also, this kind of inheritable restriction may end up requiring
> no_new_privs or CAP_SYS_ADMIN to be secure.
>
> --Andy
--
Andy Lutomirski
AMA Capital Management, LLC
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] x86/vdso: Add prctl to set per-process VDSO load
2014-09-17 5:00 ` Andy Lutomirski
@ 2014-09-17 6:21 ` Filipe Brandenburger
-1 siblings, 0 replies; 36+ messages in thread
From: Filipe Brandenburger @ 2014-09-17 6:21 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Richard Larocque, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
Michael Davidson, Greg Thelen, X86 ML, Linux API,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Hi Andy,
On Tue, Sep 16, 2014 at 10:00 PM, Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> wrote:
> I think that the patch should instead tweak the vvar mapping to tell
> the vdso not to use rdtsc. It should be based on this:
I've been working on this approach which extends the vvar from 2 to 3
pages. The third page would initially be mapped to a zero page but
then through a prctl a task could replace it with a real page that
could then be inherited through fork and exec.
That would make it possible to have per-task vvar contents.
We could use some of those values as flags to indicate whether vdso
routines may use RDTSC or not.
In the future, we're planning to also use that to store clock offsets
so that we can ensure CLOCK_MONOTONIC works after CRIU migration
without having to turn off the VDSO or have to always fallback to full
syscalls on every case.
Do you think that would be a reasonable way to accomplish that?
Thanks,
Filipe
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] x86/vdso: Add prctl to set per-process VDSO load
@ 2014-09-17 6:21 ` Filipe Brandenburger
0 siblings, 0 replies; 36+ messages in thread
From: Filipe Brandenburger @ 2014-09-17 6:21 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Richard Larocque, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
Michael Davidson, Greg Thelen, X86 ML, Linux API,
linux-kernel@vger.kernel.org
Hi Andy,
On Tue, Sep 16, 2014 at 10:00 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> I think that the patch should instead tweak the vvar mapping to tell
> the vdso not to use rdtsc. It should be based on this:
I've been working on this approach which extends the vvar from 2 to 3
pages. The third page would initially be mapped to a zero page but
then through a prctl a task could replace it with a real page that
could then be inherited through fork and exec.
That would make it possible to have per-task vvar contents.
We could use some of those values as flags to indicate whether vdso
routines may use RDTSC or not.
In the future, we're planning to also use that to store clock offsets
so that we can ensure CLOCK_MONOTONIC works after CRIU migration
without having to turn off the VDSO or have to always fallback to full
syscalls on every case.
Do you think that would be a reasonable way to accomplish that?
Thanks,
Filipe
^ permalink raw reply [flat|nested] 36+ messages in thread
[parent not found: <CADU+-uCguxhkw259MefABuzOxvyK_F6bhwza0YhveCCJxEV8wQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: [PATCH] x86/vdso: Add prctl to set per-process VDSO load
2014-09-17 6:21 ` Filipe Brandenburger
@ 2014-09-17 8:46 ` H. Peter Anvin
-1 siblings, 0 replies; 36+ messages in thread
From: H. Peter Anvin @ 2014-09-17 8:46 UTC (permalink / raw)
To: Filipe Brandenburger, Andy Lutomirski
Cc: Richard Larocque, Ingo Molnar, Thomas Gleixner, Michael Davidson,
Greg Thelen, X86 ML, Linux API,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
On 09/16/2014 11:21 PM, Filipe Brandenburger wrote:
> Hi Andy,
>
> On Tue, Sep 16, 2014 at 10:00 PM, Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> wrote:
>> I think that the patch should instead tweak the vvar mapping to tell
>> the vdso not to use rdtsc. It should be based on this:
>
> I've been working on this approach which extends the vvar from 2 to 3
> pages. The third page would initially be mapped to a zero page but
> then through a prctl a task could replace it with a real page that
> could then be inherited through fork and exec.
>
> That would make it possible to have per-task vvar contents.
>
> We could use some of those values as flags to indicate whether vdso
> routines may use RDTSC or not.
>
> In the future, we're planning to also use that to store clock offsets
> so that we can ensure CLOCK_MONOTONIC works after CRIU migration
> without having to turn off the VDSO or have to always fallback to full
> syscalls on every case.
>
> Do you think that would be a reasonable way to accomplish that?
>
Why would we need/want per process vvar contents? It seems better to
have the code swapped out.
-hpa
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] x86/vdso: Add prctl to set per-process VDSO load
@ 2014-09-17 8:46 ` H. Peter Anvin
0 siblings, 0 replies; 36+ messages in thread
From: H. Peter Anvin @ 2014-09-17 8:46 UTC (permalink / raw)
To: Filipe Brandenburger, Andy Lutomirski
Cc: Richard Larocque, Ingo Molnar, Thomas Gleixner, Michael Davidson,
Greg Thelen, X86 ML, Linux API, linux-kernel@vger.kernel.org
On 09/16/2014 11:21 PM, Filipe Brandenburger wrote:
> Hi Andy,
>
> On Tue, Sep 16, 2014 at 10:00 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>> I think that the patch should instead tweak the vvar mapping to tell
>> the vdso not to use rdtsc. It should be based on this:
>
> I've been working on this approach which extends the vvar from 2 to 3
> pages. The third page would initially be mapped to a zero page but
> then through a prctl a task could replace it with a real page that
> could then be inherited through fork and exec.
>
> That would make it possible to have per-task vvar contents.
>
> We could use some of those values as flags to indicate whether vdso
> routines may use RDTSC or not.
>
> In the future, we're planning to also use that to store clock offsets
> so that we can ensure CLOCK_MONOTONIC works after CRIU migration
> without having to turn off the VDSO or have to always fallback to full
> syscalls on every case.
>
> Do you think that would be a reasonable way to accomplish that?
>
Why would we need/want per process vvar contents? It seems better to
have the code swapped out.
-hpa
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] x86/vdso: Add prctl to set per-process VDSO load
2014-09-17 8:46 ` H. Peter Anvin
(?)
@ 2014-09-17 13:48 ` Filipe Brandenburger
-1 siblings, 0 replies; 36+ messages in thread
From: Filipe Brandenburger @ 2014-09-17 13:48 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Andy Lutomirski, Richard Larocque, Ingo Molnar, Thomas Gleixner,
Michael Davidson, Greg Thelen, X86 ML, Linux API,
linux-kernel@vger.kernel.org
Hi,
On Wed, Sep 17, 2014 at 1:46 AM, H. Peter Anvin <hpa@zytor.com> wrote:
> Why would we need/want per process vvar contents? It seems better to
> have the code swapped out.
We are looking at the migration use case (CRIU).
In specific, we want to make CLOCK_MONOTONIC keep the "monotonic"
promise after migration. In order to accomplish that, we may need to
add an offset to CLOCK_MONOTONIC calls after migration.
By keeping a per process vvar page we should be able to implement that
without having to fallback to a real syscall for calls to
clock_gettime(CLOCK_MONOTONIC).
One other thought was that we could use this process vvar page to
expose a migration counter that would get incremented after migration.
That way we could allow (some) userspace tasks (and possibly some VDSO
routines) to use RDTSC but check the contents of that counter
before/after taking measurements to check whether the elapsed time is
valid or not. That means we could still provide tasks with a fast time
counter if they really need one.
Cheers,
Filipe
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] x86/vdso: Add prctl to set per-process VDSO load
2014-09-17 8:46 ` H. Peter Anvin
(?)
(?)
@ 2014-09-17 14:28 ` Andy Lutomirski
2014-09-19 19:27 ` Andy Lutomirski
-1 siblings, 1 reply; 36+ messages in thread
From: Andy Lutomirski @ 2014-09-17 14:28 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Greg Thelen, X86 ML, Thomas Gleixner, Michael Davidson,
Ingo Molnar, Richard Larocque, linux-kernel@vger.kernel.org,
Filipe Brandenburger, Linux API
On Sep 17, 2014 1:46 AM, "H. Peter Anvin" <hpa@zytor.com> wrote:
>
> On 09/16/2014 11:21 PM, Filipe Brandenburger wrote:
> > Hi Andy,
> >
> > On Tue, Sep 16, 2014 at 10:00 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> >> I think that the patch should instead tweak the vvar mapping to tell
> >> the vdso not to use rdtsc. It should be based on this:
> >
> > I've been working on this approach which extends the vvar from 2 to 3
> > pages. The third page would initially be mapped to a zero page but
> > then through a prctl a task could replace it with a real page that
> > could then be inherited through fork and exec.
> >
> > That would make it possible to have per-task vvar contents.
> >
> > We could use some of those values as flags to indicate whether vdso
> > routines may use RDTSC or not.
> >
> > In the future, we're planning to also use that to store clock offsets
> > so that we can ensure CLOCK_MONOTONIC works after CRIU migration
> > without having to turn off the VDSO or have to always fallback to full
> > syscalls on every case.
> >
> > Do you think that would be a reasonable way to accomplish that?
> >
>
> Why would we need/want per process vvar contents? It seems better to
> have the code swapped out.
That seems messier from a build perspective. Also, if we ever want to
switch this dynamically, swapping out data is much easier than
swapping out code. I think we should be able to replace the vvar page
with the zero page, though.
One tricky bit: currently we can only easily do this on exec, but we
should be able to do it immediately if we start tracking mremap of the
vdso. Should we make that a prerequisite? I don't really want this
to end up being permanently weird.
As for an actual post-migration offset, I'd rather add support for
per-mm forced syscall fallback and then get something into the code
timing code before even thinking about an x86 vdso fast path. I don't
think that a feature like per-mm timing offsets should happen as an
arch-specific thing first.
--Andy
>
> -hpa
>
>
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] x86/vdso: Add prctl to set per-process VDSO load
2014-09-17 14:28 ` Andy Lutomirski
@ 2014-09-19 19:27 ` Andy Lutomirski
2014-09-19 21:26 ` Richard Larocque
[not found] ` <CALCETrXpB4qOS+JUBqJnTY8JUfQ=A6DaebU6e=32sSv=0c0QCg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
0 siblings, 2 replies; 36+ messages in thread
From: Andy Lutomirski @ 2014-09-19 19:27 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Greg Thelen, X86 ML, Thomas Gleixner, Michael Davidson,
Ingo Molnar, Richard Larocque, linux-kernel@vger.kernel.org,
Filipe Brandenburger, Linux API
On Wed, Sep 17, 2014 at 7:28 AM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Sep 17, 2014 1:46 AM, "H. Peter Anvin" <hpa@zytor.com> wrote:
>>
>> On 09/16/2014 11:21 PM, Filipe Brandenburger wrote:
>> > Hi Andy,
>> >
>> > On Tue, Sep 16, 2014 at 10:00 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>> >> I think that the patch should instead tweak the vvar mapping to tell
>> >> the vdso not to use rdtsc. It should be based on this:
>> >
>> > I've been working on this approach which extends the vvar from 2 to 3
>> > pages. The third page would initially be mapped to a zero page but
>> > then through a prctl a task could replace it with a real page that
>> > could then be inherited through fork and exec.
>> >
>> > That would make it possible to have per-task vvar contents.
>> >
>> > We could use some of those values as flags to indicate whether vdso
>> > routines may use RDTSC or not.
>> >
>> > In the future, we're planning to also use that to store clock offsets
>> > so that we can ensure CLOCK_MONOTONIC works after CRIU migration
>> > without having to turn off the VDSO or have to always fallback to full
>> > syscalls on every case.
>> >
>> > Do you think that would be a reasonable way to accomplish that?
>> >
>>
>> Why would we need/want per process vvar contents? It seems better to
>> have the code swapped out.
>
> That seems messier from a build perspective. Also, if we ever want to
> switch this dynamically, swapping out data is much easier than
> swapping out code. I think we should be able to replace the vvar page
> with the zero page, though.
>
> One tricky bit: currently we can only easily do this on exec, but we
> should be able to do it immediately if we start tracking mremap of the
> vdso. Should we make that a prerequisite? I don't really want this
> to end up being permanently weird.
I have this (special mapping tracking) 3/4 implemented. I'm planning
on making it fully functional for 64-bit programs and almost correct
for 32-bit. (You'll still crash if you have multiple threads, you use
sysenter, and you remap the vdso, but I think that this is essentially
unavoidable until someone lets mremap work on multiple vmas at once.)
>
> As for an actual post-migration offset, I'd rather add support for
> per-mm forced syscall fallback and then get something into the code
> timing code before even thinking about an x86 vdso fast path. I don't
> think that a feature like per-mm timing offsets should happen as an
> arch-specific thing first.
>
> --Andy
>
>>
>> -hpa
>>
>>
--
Andy Lutomirski
AMA Capital Management, LLC
^ permalink raw reply [flat|nested] 36+ messages in thread* Re: [PATCH] x86/vdso: Add prctl to set per-process VDSO load
2014-09-19 19:27 ` Andy Lutomirski
@ 2014-09-19 21:26 ` Richard Larocque
[not found] ` <CALCETrXpB4qOS+JUBqJnTY8JUfQ=A6DaebU6e=32sSv=0c0QCg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
1 sibling, 0 replies; 36+ messages in thread
From: Richard Larocque @ 2014-09-19 21:26 UTC (permalink / raw)
To: Andy Lutomirski
Cc: H. Peter Anvin, Greg Thelen, X86 ML, Thomas Gleixner,
Michael Davidson, Ingo Molnar, linux-kernel@vger.kernel.org,
Filipe Brandenburger, Linux API
On Fri, Sep 19, 2014 at 12:27 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Wed, Sep 17, 2014 at 7:28 AM, Andy Lutomirski <luto@amacapital.net> wrote:
>> On Sep 17, 2014 1:46 AM, "H. Peter Anvin" <hpa@zytor.com> wrote:
>>>
>>> On 09/16/2014 11:21 PM, Filipe Brandenburger wrote:
>>> > Hi Andy,
>>> >
>>> > On Tue, Sep 16, 2014 at 10:00 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>>> >> I think that the patch should instead tweak the vvar mapping to tell
>>> >> the vdso not to use rdtsc. It should be based on this:
>>> >
>>> > I've been working on this approach which extends the vvar from 2 to 3
>>> > pages. The third page would initially be mapped to a zero page but
>>> > then through a prctl a task could replace it with a real page that
>>> > could then be inherited through fork and exec.
>>> >
>>> > That would make it possible to have per-task vvar contents.
>>> >
>>> > We could use some of those values as flags to indicate whether vdso
>>> > routines may use RDTSC or not.
>>> >
>>> > In the future, we're planning to also use that to store clock offsets
>>> > so that we can ensure CLOCK_MONOTONIC works after CRIU migration
>>> > without having to turn off the VDSO or have to always fallback to full
>>> > syscalls on every case.
>>> >
>>> > Do you think that would be a reasonable way to accomplish that?
>>> >
>>>
>>> Why would we need/want per process vvar contents? It seems better to
>>> have the code swapped out.
>>
>> That seems messier from a build perspective. Also, if we ever want to
>> switch this dynamically, swapping out data is much easier than
>> swapping out code. I think we should be able to replace the vvar page
>> with the zero page, though.
>>
>> One tricky bit: currently we can only easily do this on exec, but we
>> should be able to do it immediately if we start tracking mremap of the
>> vdso. Should we make that a prerequisite? I don't really want this
>> to end up being permanently weird.
>
> I have this (special mapping tracking) 3/4 implemented. I'm planning
> on making it fully functional for 64-bit programs and almost correct
> for 32-bit. (You'll still crash if you have multiple threads, you use
> sysenter, and you remap the vdso, but I think that this is essentially
> unavoidable until someone lets mremap work on multiple vmas at once.)
>
Thanks! I look forward to seeing the result.
I have some per-process clock offset patches that currently work only
when the vDSO is disabled. It sounds like your patches will provide a
clean solution to deal with that issue. I'll try to rebase my work on
top of your changes when they're ready.
We've also got some patches to apply an offset to the TSC that could
benefit from your changes, but I guess there's not much appetite for
merging them. That's fine with me. I don't see any need for that
feature until we have a few examples of applications that could be
broken by TSC changes during migration.
^ permalink raw reply [flat|nested] 36+ messages in thread[parent not found: <CALCETrXpB4qOS+JUBqJnTY8JUfQ=A6DaebU6e=32sSv=0c0QCg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: [PATCH] x86/vdso: Add prctl to set per-process VDSO load
2014-09-19 19:27 ` Andy Lutomirski
@ 2014-09-19 22:02 ` Filipe Brandenburger
[not found] ` <CALCETrXpB4qOS+JUBqJnTY8JUfQ=A6DaebU6e=32sSv=0c0QCg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
1 sibling, 0 replies; 36+ messages in thread
From: Filipe Brandenburger @ 2014-09-19 22:02 UTC (permalink / raw)
To: Andy Lutomirski
Cc: H. Peter Anvin, Greg Thelen, X86 ML, Thomas Gleixner,
Michael Davidson, Ingo Molnar, Richard Larocque,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Linux API
Hi Andy,
On Fri, Sep 19, 2014 at 12:27 PM, Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> wrote:
> I have this (special mapping tracking) 3/4 implemented. I'm planning
> on making it fully functional for 64-bit programs and almost correct
> for 32-bit. (You'll still crash if you have multiple threads, you use
> sysenter, and you remap the vdso, but I think that this is essentially
> unavoidable until someone lets mremap work on multiple vmas at once.)
In case that's useful, I was looking at swapping the vvar page by
changing the vm_special_mapping to change the pages array between the
actual vvar page and the zero page and using zap_page_range to force
the next access to go through a page fault that would remap it.
I didn't have all the details figured out (I was closer to 1/4 of it
implemented) but I didn't see any issues on 32-bit programs.
Let me know if you'd like to see some of my patches or if you think I
should keep working on them.
Cheers,
Filipe
^ permalink raw reply [flat|nested] 36+ messages in thread* Re: [PATCH] x86/vdso: Add prctl to set per-process VDSO load
@ 2014-09-19 22:02 ` Filipe Brandenburger
0 siblings, 0 replies; 36+ messages in thread
From: Filipe Brandenburger @ 2014-09-19 22:02 UTC (permalink / raw)
To: Andy Lutomirski
Cc: H. Peter Anvin, Greg Thelen, X86 ML, Thomas Gleixner,
Michael Davidson, Ingo Molnar, Richard Larocque,
linux-kernel@vger.kernel.org, Linux API
Hi Andy,
On Fri, Sep 19, 2014 at 12:27 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> I have this (special mapping tracking) 3/4 implemented. I'm planning
> on making it fully functional for 64-bit programs and almost correct
> for 32-bit. (You'll still crash if you have multiple threads, you use
> sysenter, and you remap the vdso, but I think that this is essentially
> unavoidable until someone lets mremap work on multiple vmas at once.)
In case that's useful, I was looking at swapping the vvar page by
changing the vm_special_mapping to change the pages array between the
actual vvar page and the zero page and using zap_page_range to force
the next access to go through a page fault that would remap it.
I didn't have all the details figured out (I was closer to 1/4 of it
implemented) but I didn't see any issues on 32-bit programs.
Let me know if you'd like to see some of my patches or if you think I
should keep working on them.
Cheers,
Filipe
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] x86/vdso: Add prctl to set per-process VDSO load
2014-09-19 22:02 ` Filipe Brandenburger
(?)
@ 2014-09-19 22:09 ` Andy Lutomirski
[not found] ` <CALCETrXOeD=zBPvYV+-j0Ok6MXNZwdjRaz73rjuLGShx+o6Y0w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
-1 siblings, 1 reply; 36+ messages in thread
From: Andy Lutomirski @ 2014-09-19 22:09 UTC (permalink / raw)
To: Filipe Brandenburger
Cc: H. Peter Anvin, Greg Thelen, X86 ML, Thomas Gleixner,
Michael Davidson, Ingo Molnar, Richard Larocque,
linux-kernel@vger.kernel.org, Linux API
On Fri, Sep 19, 2014 at 3:02 PM, Filipe Brandenburger
<filbranden@google.com> wrote:
> Hi Andy,
>
> On Fri, Sep 19, 2014 at 12:27 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>> I have this (special mapping tracking) 3/4 implemented. I'm planning
>> on making it fully functional for 64-bit programs and almost correct
>> for 32-bit. (You'll still crash if you have multiple threads, you use
>> sysenter, and you remap the vdso, but I think that this is essentially
>> unavoidable until someone lets mremap work on multiple vmas at once.)
>
> In case that's useful, I was looking at swapping the vvar page by
> changing the vm_special_mapping to change the pages array between the
> actual vvar page and the zero page and using zap_page_range to force
> the next access to go through a page fault that would remap it.
That will do it globally, since the vm_special_mapping is global. I
was just thinking of using remap_pfn_range.
>
> I didn't have all the details figured out (I was closer to 1/4 of it
> implemented) but I didn't see any issues on 32-bit programs.
The 32-bit issue comes from mremap.
>
> Let me know if you'd like to see some of my patches or if you think I
> should keep working on them.
Give me another day or two to straighten out the vma stuff, although
it shouldn't impact your patches too much. The main effect will be
that you'll be able to rely on mm->context.vdso (renamed to
vvar_vma_start, most likey) being correct. Currently, you should
*not* rely on it, especially if CRIU is involved.
--Andy
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] x86/vdso: Add prctl to set per-process VDSO load
2014-09-17 0:05 [PATCH] x86/vdso: Add prctl to set per-process VDSO load Richard Larocque
2014-09-17 0:13 ` Andi Kleen
[not found] ` <1410912351-31273-1-git-send-email-rlarocque-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
@ 2014-09-17 11:36 ` Kevin Easton
2 siblings, 0 replies; 36+ messages in thread
From: Kevin Easton @ 2014-09-17 11:36 UTC (permalink / raw)
To: Richard Larocque
Cc: mingo, tglx, hpa, luto, filbranden, md, gthelen, x86, linux-api,
linux-kernel
On Tue, Sep 16, 2014 at 05:05:51PM -0700, Richard Larocque wrote:
> + case PR_SET_VDSO:
> + if (arg2 == PR_VDSO_ENABLE)
> + me->signal->disable_vdso = 0;
> + else if (arg2 == PR_VDSO_DISABLE)
> + me->signal->disable_vdso = 1;
> + else
> + return -EINVAL;
> + break;
> + case PR_GET_VDSO:
> + if (!me->signal->disable_vdso)
> + error = put_user(PR_VDSO_ENABLE, (int __user *)arg2);
> + else
> + error = put_user(PR_VDSO_DISABLE, (int __user *)arg2);
> + break;
Perhaps both of those should do
if (arg3 || arg4 || arg5)
return -EINVAL;
- Kevin
^ permalink raw reply [flat|nested] 36+ messages in thread
end of thread, other threads:[~2014-09-20 5:30 UTC | newest]
Thread overview: 36+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-09-17 0:05 [PATCH] x86/vdso: Add prctl to set per-process VDSO load Richard Larocque
2014-09-17 0:13 ` Andi Kleen
[not found] ` <8738brhzoc.fsf-KWJ+5VKanrL29G5dvP0v1laTQe2KTcn/@public.gmane.org>
2014-09-17 0:21 ` Richard Larocque
2014-09-17 0:21 ` Richard Larocque
[not found] ` <1410912351-31273-1-git-send-email-rlarocque-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
2014-09-17 0:27 ` Andy Lutomirski
2014-09-17 0:27 ` Andy Lutomirski
[not found] ` <CALCETrXtYV5xKkTxothuqNb7ra80Be7ZXJ-hDnC6p-bfEPZ=Lw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-09-17 1:18 ` Richard Larocque
2014-09-17 1:18 ` Richard Larocque
[not found] ` <CABgu+=NC1ZMQkV0J2c9-MD1+TAmTjDm4ACZASPnfX2Fwiu5rMQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-09-17 5:00 ` Andy Lutomirski
2014-09-17 5:00 ` Andy Lutomirski
[not found] ` <CALCETrWj8Pj8d8YjybvOKG-=xmy-XGFo9cGQ9qn0V4t9Oj+dOw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-09-17 5:34 ` Andy Lutomirski
2014-09-17 5:34 ` Andy Lutomirski
2014-09-17 6:21 ` Filipe Brandenburger
2014-09-17 6:21 ` Filipe Brandenburger
[not found] ` <CADU+-uCguxhkw259MefABuzOxvyK_F6bhwza0YhveCCJxEV8wQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-09-17 8:46 ` H. Peter Anvin
2014-09-17 8:46 ` H. Peter Anvin
2014-09-17 13:48 ` Filipe Brandenburger
2014-09-17 14:28 ` Andy Lutomirski
2014-09-19 19:27 ` Andy Lutomirski
2014-09-19 21:26 ` Richard Larocque
[not found] ` <CALCETrXpB4qOS+JUBqJnTY8JUfQ=A6DaebU6e=32sSv=0c0QCg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-09-19 22:02 ` Filipe Brandenburger
2014-09-19 22:02 ` Filipe Brandenburger
2014-09-19 22:09 ` Andy Lutomirski
[not found] ` <CALCETrXOeD=zBPvYV+-j0Ok6MXNZwdjRaz73rjuLGShx+o6Y0w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-09-19 22:19 ` Filipe Brandenburger
2014-09-19 22:19 ` Filipe Brandenburger
[not found] ` <CADU+-uBmQdKSX==5NrYfSEWLg-X4cvJYXj=N8UdMdN_rHnWk4w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-09-19 22:31 ` Andy Lutomirski
2014-09-19 22:31 ` Andy Lutomirski
[not found] ` <CALCETrV7thL-0mST4O6YZgDNm1LyLqB=_bQ3vwwrAPYZCB=W+A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-09-20 3:10 ` Filipe Brandenburger
2014-09-20 3:10 ` Filipe Brandenburger
[not found] ` <CADU+-uBoMrF=vbOM4TW7YzD5cdHSn8BJdfs5nnPM6uS9-6n_Ag-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-09-20 3:27 ` Andy Lutomirski
2014-09-20 3:27 ` Andy Lutomirski
[not found] ` <CALCETrUgtveB-ddeL2t0cK18r_VeP5T7LrWCzszfFo+8GcdmEg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-09-20 3:46 ` Filipe Brandenburger
2014-09-20 3:46 ` Filipe Brandenburger
[not found] ` <CADU+-uAw8wLbY6E0msuwZFgW9hDXz8a6oeDJBTSURUfS3aO1gw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-09-20 5:30 ` Andy Lutomirski
2014-09-20 5:30 ` Andy Lutomirski
2014-09-17 11:36 ` Kevin Easton
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.