* Re: [PATCHv4 26/28] x86/vdso: Align VDSO functions by CPU L1 cache line
From: Thomas Gleixner @ 2019-06-14 14:13 UTC (permalink / raw)
To: Dmitry Safonov
Cc: linux-kernel, Andrei Vagin, Adrian Reber, Andrei Vagin,
Andy Lutomirski, Arnd Bergmann, Christian Brauner,
Cyrill Gorcunov, Dmitry Safonov, Eric W. Biederman,
H. Peter Anvin, Ingo Molnar, Jann Horn, Jeff Dike, Oleg Nesterov,
Pavel Emelyanov, Shuah Khan, Vincenzo Frascino, containers, criu,
linux-api, x86
In-Reply-To: <20190612192628.23797-27-dima@arista.com>
On Wed, 12 Jun 2019, Dmitry Safonov wrote:
> From: Andrei Vagin <avagin@gmail.com>
>
> After performance testing VDSO patches a noticeable 20% regression was
> found on gettime_perf selftest with a cold cache.
> As it turns to be, before time namespaces introduction, VDSO functions
> were quite aligned to cache lines, but adding a new code to adjust
> timens offset inside namespace created a small shift and vdso functions
> become unaligned on cache lines.
>
> Add align to vdso functions with gcc option to fix performance drop.
>
> Coping the resulting numbers from cover letter:
>
> Hot CPU cache (more gettime_perf.c cycles - the better):
> | before | CONFIG_TIME_NS=n | host | inside timens
> --------|------------|------------------|-------------|-------------
> cycles | 139887013 | 139453003 | 139899785 | 128792458
> diff (%)| 100 | 99.7 | 100 | 92
Why is CONFIG_TIME_NS=n behaving worse than current mainline and
worse than 'host' mode?
> Cold cache (lesser tsc per gettime_perf_cold.c cycle - the better):
> | before | CONFIG_TIME_NS=n | host | inside timens
> --------|------------|------------------|-------------|-------------
> tsc | 6748 | 6718 | 6862 | 12682
> diff (%)| 100 | 99.6 | 101.7 | 188
Weird, now CONFIG_TIME_NS=n is better than current mainline and 'host' mode
drops.
Either I'm misreading the numbers or missing something or I'm just confused
as usual :)
Thanks,
tglx
^ permalink raw reply
* Re: [PATCHv4 17/28] x86/vdso: Switch image on setns()/unshare()/clone()
From: Thomas Gleixner @ 2019-06-14 14:05 UTC (permalink / raw)
To: Dmitry Safonov
Cc: linux-kernel, Adrian Reber, Andrei Vagin, Andy Lutomirski,
Arnd Bergmann, Christian Brauner, Cyrill Gorcunov, Dmitry Safonov,
Eric W. Biederman, H. Peter Anvin, Ingo Molnar, Jann Horn,
Jeff Dike, Oleg Nesterov, Pavel Emelyanov, Shuah Khan,
Vincenzo Frascino, containers, criu, linux-api, x86, Andrei Vagin
In-Reply-To: <20190612192628.23797-18-dima@arista.com>
On Wed, 12 Jun 2019, Dmitry Safonov wrote:
>
> +#ifdef CONFIG_TIME_NS
> +int vdso_join_timens(struct task_struct *task)
> +{
> + struct mm_struct *mm = task->mm;
> + struct vm_area_struct *vma;
> +
> + if (down_write_killable(&mm->mmap_sem))
> + return -EINTR;
> +
> + for (vma = mm->mmap; vma; vma = vma->vm_next) {
> + unsigned long size = vma->vm_end - vma->vm_start;
> +
> + if (vma_is_special_mapping(vma, &vvar_mapping) ||
> + vma_is_special_mapping(vma, &vdso_mapping))
> + zap_page_range(vma, vma->vm_start, size);
> + }
> +
> + up_write(&mm->mmap_sem);
> + return 0;
> +}
> +#else /* CONFIG_TIME_NS */
> +int vdso_join_timens(struct task_struct *task)
> +{
> + return -ENXIO;
> +}
Is that else path really required? The callsite is only compiled when
CONFIG_TIME_NS is enabled, right?
Thanks,
tglx
^ permalink raw reply
* Re: [PATCH 1/5] glibc: Perform rseq(2) registration at C startup and thread creation (v10)
From: Mathieu Desnoyers @ 2019-06-14 13:59 UTC (permalink / raw)
To: Florian Weimer
Cc: carlos, Joseph Myers, Szabolcs Nagy, libc-alpha, Thomas Gleixner,
Ben Maurer, Peter Zijlstra, Paul E. McKenney, Boqun Feng,
Will Deacon, Dave Watson, Paul Turner, Rich Felker, linux-kernel,
linux-api
In-Reply-To: <87ef3wtgs4.fsf@oldenburg2.str.redhat.com>
----- On Jun 14, 2019, at 3:53 PM, Florian Weimer fweimer@redhat.com wrote:
> * Mathieu Desnoyers:
>
>> ----- On Jun 14, 2019, at 3:42 PM, Florian Weimer fweimer@redhat.com wrote:
>>
>>> * Mathieu Desnoyers:
>>>
>>>> + /* Publicize rseq registration ownership. This must be performed
>>>> + after rtld re-relocation, before invoking constructors of
>>>> + preloaded libraries. */
>>>> + rseq_init ();
>>>
>>> Please add a comment that IFUNC resolvers do not see the initialized
>>> value. I think this is okay because we currently do not support access
>>> to extern variables in IFUNC resolvers.
>>
>> Do IFUNC resolvers happen to observe the __rseq_handled address that
>> was internal to ld.so ?
>
> They should observe the correct address, but they can access the
> variable before initialization. An initializer in ld.so will not have
> an effect if an interposed definition initalized the variable to
> something else.
>
>> If so, we could simply initialize __rseq_handled twice: early before calling
>> IFUNC resolvers, and after ld.so re-relocation.
>
> No, I don't think this will make a difference.
So comment it is:
/* Publicize rseq registration ownership. This must be performed
after rtld re-relocation, before invoking constructors of
preloaded libraries. IFUNC resolvers are called before this
initialization, so they may not observe the initialized state. */
rseq_init ();
Thanks,
Mathieu
--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
^ permalink raw reply
* Re: [PATCHv4 15/28] x86/vdso: Add offsets page in vvar
From: Thomas Gleixner @ 2019-06-14 13:58 UTC (permalink / raw)
To: Dmitry Safonov
Cc: linux-kernel, Andrei Vagin, Adrian Reber, Andy Lutomirski,
Arnd Bergmann, Christian Brauner, Cyrill Gorcunov, Dmitry Safonov,
Eric W. Biederman, H. Peter Anvin, Ingo Molnar, Jann Horn,
Jeff Dike, Oleg Nesterov, Pavel Emelyanov, Shuah Khan,
Vincenzo Frascino, containers, criu, linux-api, x86, Andrei Vagin
In-Reply-To: <20190612192628.23797-16-dima@arista.com>
On Wed, 12 Jun 2019, Dmitry Safonov wrote:
>
> +#ifdef CONFIG_TIME_NS
> +notrace static __always_inline void clk_to_ns(clockid_t clk, struct timespec *ts)
> +{
> + struct timens_offsets *timens = (struct timens_offsets *) &timens_page;
> + struct timespec64 *offset64;
> +
> + switch (clk) {
> + case CLOCK_MONOTONIC:
> + case CLOCK_MONOTONIC_COARSE:
> + case CLOCK_MONOTONIC_RAW:
> + offset64 = &timens->monotonic;
> + break;
> + case CLOCK_BOOTTIME:
> + offset64 = &timens->boottime;
> + default:
> + return;
> + }
> +
> + ts->tv_nsec += offset64->tv_nsec;
> + ts->tv_sec += offset64->tv_sec;
> + if (ts->tv_nsec >= NSEC_PER_SEC) {
> + ts->tv_nsec -= NSEC_PER_SEC;
> + ts->tv_sec++;
> + }
> + if (ts->tv_nsec < 0) {
> + ts->tv_nsec += NSEC_PER_SEC;
> + ts->tv_sec--;
> + }
I had to think twice why adding the offset (which can be negative) can
never result in negative time being returned. A comment explaining this
would be appreciated.
As I'm planning to merge Vincezos VDSO consolidation into 5.3, can you
please start to work on top of his series, which should be available as
final v7 next week hopefully.
Thanks,
tglx
^ permalink raw reply
* Re: [PATCH 1/5] glibc: Perform rseq(2) registration at C startup and thread creation (v10)
From: Florian Weimer @ 2019-06-14 13:53 UTC (permalink / raw)
To: Mathieu Desnoyers
Cc: carlos, Joseph Myers, Szabolcs Nagy, libc-alpha, Thomas Gleixner,
Ben Maurer, Peter Zijlstra, Paul E. McKenney, Boqun Feng,
Will Deacon, Dave Watson, Paul Turner, Rich Felker, linux-kernel,
linux-api
In-Reply-To: <26171199.3391.1560520033825.JavaMail.zimbra@efficios.com>
* Mathieu Desnoyers:
> ----- On Jun 14, 2019, at 3:42 PM, Florian Weimer fweimer@redhat.com wrote:
>
>> * Mathieu Desnoyers:
>>
>>> + /* Publicize rseq registration ownership. This must be performed
>>> + after rtld re-relocation, before invoking constructors of
>>> + preloaded libraries. */
>>> + rseq_init ();
>>
>> Please add a comment that IFUNC resolvers do not see the initialized
>> value. I think this is okay because we currently do not support access
>> to extern variables in IFUNC resolvers.
>
> Do IFUNC resolvers happen to observe the __rseq_handled address that
> was internal to ld.so ?
They should observe the correct address, but they can access the
variable before initialization. An initializer in ld.so will not have
an effect if an interposed definition initalized the variable to
something else.
> If so, we could simply initialize __rseq_handled twice: early before calling
> IFUNC resolvers, and after ld.so re-relocation.
No, I don't think this will make a difference.
Thanks,
Florian
^ permalink raw reply
* Re: [PATCHv4 09/28] timens: Shift /proc/uptime
From: Thomas Gleixner @ 2019-06-14 13:50 UTC (permalink / raw)
To: Dmitry Safonov
Cc: linux-kernel, Adrian Reber, Andrei Vagin, Andy Lutomirski,
Arnd Bergmann, Christian Brauner, Cyrill Gorcunov, Dmitry Safonov,
Eric W. Biederman, H. Peter Anvin, Ingo Molnar, Jann Horn,
Jeff Dike, Oleg Nesterov, Pavel Emelyanov, Shuah Khan,
Vincenzo Frascino, containers, criu, linux-api, x86
In-Reply-To: <20190612192628.23797-10-dima@arista.com>
On Wed, 12 Jun 2019, Dmitry Safonov wrote:
Again, please use the usual prefix and bolt not everything to
timens. timens: is the proper prefix for the actual time namespace core
code.
Thanks,
tglx
^ permalink raw reply
* Re: [PATCHv4 08/28] timens/kernel: Take into account timens clock offsets in clock_nanosleep
From: Thomas Gleixner @ 2019-06-14 13:49 UTC (permalink / raw)
To: Dmitry Safonov
Cc: linux-kernel, Andrei Vagin, Adrian Reber, Andrei Vagin,
Andy Lutomirski, Arnd Bergmann, Christian Brauner,
Cyrill Gorcunov, Dmitry Safonov, Eric W. Biederman,
H. Peter Anvin, Ingo Molnar, Jann Horn, Jeff Dike, Oleg Nesterov,
Pavel Emelyanov, Shuah Khan, Vincenzo Frascino, containers, criu,
linux-api, x86
In-Reply-To: <20190612192628.23797-9-dima@arista.com>
On Wed, 12 Jun 2019, Dmitry Safonov wrote:
Again the subsystem prefix is something pulled out of thin air.
> From: Andrei Vagin <avagin@gmail.com>
>
> Wire up clock_nanosleep() to timens offsets.
>
> Signed-off-by: Andrei Vagin <avagin@openvz.org>
> Co-developed-by: Dmitry Safonov <dima@arista.com>
> Signed-off-by: Dmitry Safonov <dima@arista.com>
> ---
> include/linux/hrtimer.h | 2 +-
> kernel/time/alarmtimer.c | 2 ++
> kernel/time/hrtimer.c | 8 ++++----
> kernel/time/posix-stubs.c | 12 ++++++++++--
> kernel/time/posix-timers.c | 19 ++++++++++++++++---
> 5 files changed, 33 insertions(+), 10 deletions(-)
>
> diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h
> index 2e8957eac4d4..5a3b3e17d0e8 100644
> --- a/include/linux/hrtimer.h
> +++ b/include/linux/hrtimer.h
> @@ -473,7 +473,7 @@ static inline u64 hrtimer_forward_now(struct hrtimer *timer,
> /* Precise sleep: */
>
> extern int nanosleep_copyout(struct restart_block *, struct timespec64 *);
> -extern long hrtimer_nanosleep(const struct timespec64 *rqtp,
> +extern long hrtimer_nanosleep(ktime_t rqtp,
> const enum hrtimer_mode mode,
> const clockid_t clockid);
>
> diff --git a/kernel/time/alarmtimer.c b/kernel/time/alarmtimer.c
> index 6346e6ee0d32..f1f42df179d0 100644
> --- a/kernel/time/alarmtimer.c
> +++ b/kernel/time/alarmtimer.c
> @@ -819,6 +819,8 @@ static int alarm_timer_nsleep(const clockid_t which_clock, int flags,
> ktime_t now = alarm_bases[type].gettime();
>
> exp = ktime_add_safe(now, exp);
> + } else {
> + exp = timens_ktime_to_host(which_clock, exp);
> }
This one is independent of the hrtimer part. Please split it out into
Subject: alarmtimer: Make nanosleep time namespace aware
> ret = alarmtimer_do_nsleep(&alarm, exp, type);
> diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
> index 41dfff23c1f9..b245f6ff9c8f 100644
> --- a/kernel/time/hrtimer.c
> +++ b/kernel/time/hrtimer.c
> @@ -1716,7 +1716,7 @@ static long __sched hrtimer_nanosleep_restart(struct restart_block *restart)
> return ret;
> }
>
> -long hrtimer_nanosleep(const struct timespec64 *rqtp,
> +long hrtimer_nanosleep(ktime_t rqtp,
> const enum hrtimer_mode mode, const clockid_t clockid)
This signature change wants to be separate.
Subject: hrtimers: Prepare hrtimer_nanosleep() for time namespaces
> {
> struct restart_block *restart;
> @@ -1729,7 +1729,7 @@ long hrtimer_nanosleep(const struct timespec64 *rqtp,
> slack = 0;
>
> hrtimer_init_on_stack(&t.timer, clockid, mode);
> - hrtimer_set_expires_range_ns(&t.timer, timespec64_to_ktime(*rqtp), slack);
> + hrtimer_set_expires_range_ns(&t.timer, rqtp, slack);
> ret = do_nanosleep(&t, mode);
> if (ret != -ERESTART_RESTARTBLOCK)
> goto out;
> @@ -1764,7 +1764,7 @@ SYSCALL_DEFINE2(nanosleep, struct __kernel_timespec __user *, rqtp,
>
> current->restart_block.nanosleep.type = rmtp ? TT_NATIVE : TT_NONE;
> current->restart_block.nanosleep.rmtp = rmtp;
> - return hrtimer_nanosleep(&tu, HRTIMER_MODE_REL, CLOCK_MONOTONIC);
> + return hrtimer_nanosleep(timespec64_to_ktime(tu), HRTIMER_MODE_REL, CLOCK_MONOTONIC);
> }
>
> #endif
> @@ -1784,7 +1784,7 @@ SYSCALL_DEFINE2(nanosleep_time32, struct old_timespec32 __user *, rqtp,
>
> current->restart_block.nanosleep.type = rmtp ? TT_COMPAT : TT_NONE;
> current->restart_block.nanosleep.compat_rmtp = rmtp;
> - return hrtimer_nanosleep(&tu, HRTIMER_MODE_REL, CLOCK_MONOTONIC);
> + return hrtimer_nanosleep(timespec64_to_ktime(tu), HRTIMER_MODE_REL, CLOCK_MONOTONIC);
> }
> #endif
>
> diff --git a/kernel/time/posix-stubs.c b/kernel/time/posix-stubs.c
> index edaf075d1ee4..4ee0dc180866 100644
> --- a/kernel/time/posix-stubs.c
> +++ b/kernel/time/posix-stubs.c
> @@ -129,6 +129,7 @@ SYSCALL_DEFINE4(clock_nanosleep, const clockid_t, which_clock, int, flags,
> struct __kernel_timespec __user *, rmtp)
> {
> struct timespec64 t;
> + ktime_t texp;
>
> switch (which_clock) {
> case CLOCK_REALTIME:
> @@ -147,7 +148,10 @@ SYSCALL_DEFINE4(clock_nanosleep, const clockid_t, which_clock, int, flags,
> rmtp = NULL;
> current->restart_block.nanosleep.type = rmtp ? TT_NATIVE : TT_NONE;
> current->restart_block.nanosleep.rmtp = rmtp;
> - return hrtimer_nanosleep(&t, flags & TIMER_ABSTIME ?
> + texp = timespec64_to_ktime(t);
> + if (flags & TIMER_ABSTIME)
> + texp = timens_ktime_to_host(clockid, texp;
And then add actual name space support with:
Subject: posix-timers: Make clock_nanosleep() time namespace aware
> + return hrtimer_nanosleep(texp, flags & TIMER_ABSTIME ?
> HRTIMER_MODE_ABS : HRTIMER_MODE_REL,
> which_clock);
> }
Thanks,
tglx
^ permalink raw reply
* Re: [PATCH 1/5] glibc: Perform rseq(2) registration at C startup and thread creation (v10)
From: Mathieu Desnoyers @ 2019-06-14 13:47 UTC (permalink / raw)
To: Florian Weimer
Cc: carlos, Joseph Myers, Szabolcs Nagy, libc-alpha, Thomas Gleixner,
Ben Maurer, Peter Zijlstra, Paul E. McKenney, Boqun Feng,
Will Deacon, Dave Watson, Paul Turner, Rich Felker, linux-kernel,
linux-api
In-Reply-To: <87imt8tha5.fsf@oldenburg2.str.redhat.com>
----- On Jun 14, 2019, at 3:42 PM, Florian Weimer fweimer@redhat.com wrote:
> * Mathieu Desnoyers:
>
>> + /* Publicize rseq registration ownership. This must be performed
>> + after rtld re-relocation, before invoking constructors of
>> + preloaded libraries. */
>> + rseq_init ();
>
> Please add a comment that IFUNC resolvers do not see the initialized
> value. I think this is okay because we currently do not support access
> to extern variables in IFUNC resolvers.
Do IFUNC resolvers happen to observe the __rseq_handled address that
was internal to ld.so ?
If so, we could simply initialize __rseq_handled twice: early before calling
IFUNC resolvers, and after ld.so re-relocation.
Thanks,
Mathieu
--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
^ permalink raw reply
* Re: [PATCHv4 07/28] posix-timers/timens: Take into account clock offsets
From: Thomas Gleixner @ 2019-06-14 13:42 UTC (permalink / raw)
To: Dmitry Safonov
Cc: linux-kernel, Andrei Vagin, Adrian Reber, Andrei Vagin,
Andy Lutomirski, Arnd Bergmann, Christian Brauner,
Cyrill Gorcunov, Dmitry Safonov, Eric W. Biederman,
H. Peter Anvin, Ingo Molnar, Jann Horn, Jeff Dike, Oleg Nesterov,
Pavel Emelyanov, Shuah Khan, Vincenzo Frascino, containers, criu,
linux-api, x86
In-Reply-To: <20190612192628.23797-8-dima@arista.com>
On Wed, 12 Jun 2019, Dmitry Safonov wrote:
> Subject: posix-timers/timens: Take into account clock offsets
Please avoid that '/timens' appendix. It's not really a new subsystem or
subfunction of posix-timers.
posix-timers: Add time namespace support to common_timer_set()
> From: Andrei Vagin <avagin@gmail.com>
>
> Wire timer_settime() syscall into time namespace virtualization.
Please explain why this only affects common_timer_set() and not any other
incarnation along with an explanation why only ABSTIME timers need to be
converted.
Thanks,
tglx
^ permalink raw reply
* Re: [PATCH 1/5] glibc: Perform rseq(2) registration at C startup and thread creation (v10)
From: Florian Weimer @ 2019-06-14 13:42 UTC (permalink / raw)
To: Mathieu Desnoyers
Cc: carlos, Joseph Myers, Szabolcs Nagy, libc-alpha, Thomas Gleixner,
Ben Maurer, Peter Zijlstra, Paul E. McKenney, Boqun Feng,
Will Deacon, Dave Watson, Paul Turner, Rich Felker, linux-kernel,
linux-api
In-Reply-To: <189377747.3315.1560519247118.JavaMail.zimbra@efficios.com>
* Mathieu Desnoyers:
> + /* Publicize rseq registration ownership. This must be performed
> + after rtld re-relocation, before invoking constructors of
> + preloaded libraries. */
> + rseq_init ();
Please add a comment that IFUNC resolvers do not see the initialized
value. I think this is okay because we currently do not support access
to extern variables in IFUNC resolvers.
> /* Do any necessary cleanups for the startup OS interface code.
> We do these now so that no calls are made after rtld re-relocation
> which might be resolved to different functions than we expect.
>
> It works fine now!
Great.
Thanks,
Florian
^ permalink raw reply
* Re: [PATCH 1/5] glibc: Perform rseq(2) registration at C startup and thread creation (v10)
From: Mathieu Desnoyers @ 2019-06-14 13:39 UTC (permalink / raw)
To: David Laight
Cc: Florian Weimer, carlos, Joseph Myers, Szabolcs Nagy, libc-alpha,
Thomas Gleixner, Ben Maurer, Peter Zijlstra, Paul E. McKenney,
Boqun Feng, Will Deacon, Dave Watson, Paul Turner, Rich Felker,
linux-kernel, linux-api
In-Reply-To: <69a53ec2ce184af29c4cae58e0b2fb57@AcuMS.aculab.com>
----- On Jun 14, 2019, at 3:29 PM, David Laight David.Laight@ACULAB.COM wrote:
> From: Mathieu Desnoyers
>> Sent: 14 June 2019 14:02
> ...
>> But my original issue remains: if I define a variable called __rseq_handled
>> within either the main executable or the preloaded library, it overshadows
>> the libc one:
>
> 1) That is the was elf symbol resolution is required to work.
> Otherwise variables like 'errno' (non-thread safe form) wouldn't work.
>
> 2) Don't do it then :-)
> Names starting with __ will be reserved (probably 'for the implementation').
>
> The real 'fun' starts because, under some circumstances, looking up a symbol as:
> foo = dlsym(lib_handle, "foo");
> Can find the data item instead of the function!
> Usually it works (even when foo is global data) because 'lib_handle' refers
> to a different symbol table.
> But it can go horribly wrong.
I was setting __rseq_handled too soon, before re-relocation of the dynamic linker.
I moved the initialization after re-relocation and it works fine now.
The purpose of __rseq_handled is to allow early adopter libraries and applications
to define their own global instance of the symbol, and check whether the libc
they are linked against handle rseq registration or not.
libc specifies the layout of that variable (an integer). The dynamic linker
chooses one of those instances so it's used in the global symbol table of the
program. The important thing is that all libraries agree on that global symbol.
Of course this is not compatible with libraries compiled with forced "hidden"
symbols only.
Thanks,
Mathieu
--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
^ permalink raw reply
* Re: [PATCHv4 06/28] timerfd/timens: Take into account ns clock offsets
From: Thomas Gleixner @ 2019-06-14 13:37 UTC (permalink / raw)
To: Dmitry Safonov
Cc: linux-kernel, Andrei Vagin, Adrian Reber, Andrei Vagin,
Andy Lutomirski, Arnd Bergmann, Christian Brauner,
Cyrill Gorcunov, Dmitry Safonov, Eric W. Biederman,
H. Peter Anvin, Ingo Molnar, Jann Horn, Jeff Dike, Oleg Nesterov,
Pavel Emelyanov, Shuah Khan, Vincenzo Frascino, containers, criu,
linux-api, x86
In-Reply-To: <20190612192628.23797-7-dima@arista.com>
On Wed, 12 Jun 2019, Dmitry Safonov wrote:
> ---
> fs/timerfd.c | 3 +++
> include/linux/time_namespace.h | 18 ++++++++++++++++++
> kernel/time_namespace.c | 27 +++++++++++++++++++++++++++
> 3 files changed, 48 insertions(+)
Again, please split that into:
1) Introduce the new function
2) Make use of it
> diff --git a/fs/timerfd.c b/fs/timerfd.c
> index 6a6fc8aa1de7..9b0c2f65e7e8 100644
> --- a/fs/timerfd.c
> +++ b/fs/timerfd.c
> @@ -26,6 +26,7 @@
> #include <linux/syscalls.h>
> #include <linux/compat.h>
> #include <linux/rcupdate.h>
> +#include <linux/time_namespace.h>
>
> struct timerfd_ctx {
> union {
> @@ -196,6 +197,8 @@ static int timerfd_setup(struct timerfd_ctx *ctx, int flags,
> }
>
> if (texp != 0) {
> + if (flags & TFD_TIMER_ABSTIME)
> + texp = timens_ktime_to_host(clockid, texp);
> if (isalarm(ctx)) {
> if (flags & TFD_TIMER_ABSTIME)
> alarm_start(&ctx->t.alarm, texp);
> diff --git a/include/linux/time_namespace.h b/include/linux/time_namespace.h
> index 1dda8af6b9fe..d32b55fad953 100644
> --- a/include/linux/time_namespace.h
> +++ b/include/linux/time_namespace.h
> @@ -56,6 +56,19 @@ static inline void timens_add_boottime(struct timespec64 *ts)
> *ts = timespec64_add(*ts, ns_offsets->boottime);
> }
>
> +ktime_t do_timens_ktime_to_host(clockid_t clockid, ktime_t tim,
> + struct timens_offsets *offsets);
> +static inline ktime_t timens_ktime_to_host(clockid_t clockid, ktime_t tim)
> +{
> + struct timens_offsets *offsets = current->nsproxy->time_ns->offsets;
> +
> + if (!offsets) /* fast-path for the root time namespace */
Can you please avoid tail comments. They break the reading flow. Aside of
that I don't see the value of documenting the obvious.
> +ktime_t do_timens_ktime_to_host(clockid_t clockid, ktime_t tim, struct timens_offsets *ns_offsets)
Please line break the arguments
ktime_t do_timens_ktime_to_host(clockid_t clockid, ktime_t tim,
struct timens_offsets *ns_offsets)
> +{
> + ktime_t koff;
> +
> + switch (clockid) {
> + case CLOCK_MONOTONIC:
> + koff = timespec64_to_ktime(ns_offsets->monotonic);
> + break;
> + case CLOCK_BOOTTIME:
> + case CLOCK_BOOTTIME_ALARM:
> + koff = timespec64_to_ktime(ns_offsets->boottime);
> + break;
> + default:
> + return tim;
> + }
> +
> + /* tim - off has to be in [0, KTIME_MAX) */
Please be more elaborate why the below conditions can happen at all.
> + if (tim < koff)
> + tim = 0;
> + else if (KTIME_MAX - tim < -koff)
> + tim = KTIME_MAX;
> + else
> + tim = ktime_sub(tim, koff);
Thanks,
tglx
^ permalink raw reply
* Re: [PATCH 1/5] glibc: Perform rseq(2) registration at C startup and thread creation (v10)
From: Mathieu Desnoyers @ 2019-06-14 13:34 UTC (permalink / raw)
To: Florian Weimer
Cc: carlos, Joseph Myers, Szabolcs Nagy, libc-alpha, Thomas Gleixner,
Ben Maurer, Peter Zijlstra, Paul E. McKenney, Boqun Feng,
Will Deacon, Dave Watson, Paul Turner, Rich Felker, linux-kernel,
linux-api
In-Reply-To: <87wohoti47.fsf@oldenburg2.str.redhat.com>
----- On Jun 14, 2019, at 3:24 PM, Florian Weimer fweimer@redhat.com wrote:
> * Mathieu Desnoyers:
>
>> ----- On Jun 14, 2019, at 3:09 PM, Florian Weimer fweimer@redhat.com wrote:
>>
>>> * Mathieu Desnoyers:
>>>
>>>> But my original issue remains: if I define a variable called __rseq_handled
>>>> within either the main executable or the preloaded library, it overshadows
>>>> the libc one:
>>>>
>>>> efficios@compudjdev:~/test/libc-sym$ ./a
>>>> __rseq_handled main: 0 0x56135fd5102c
>>>> __rseq_abi.cpu_id main: 29 0x7fcbeca6d5a0
>>>> efficios@compudjdev:~/test/libc-sym$ LD_PRELOAD=./s.so ./a
>>>> __rseq_handled s.so: 0 0x558f70aeb02c
>>>> __rseq_abi.cpu_id s.so: -1 0x7fdca78b7760
>>>> __rseq_handled main: 0 0x558f70aeb02c
>>>> __rseq_abi.cpu_id main: 27 0x7fdca78b7760
>>>>
>>>> Which is unexpected.
>>>
>>> Why is this unexpected? It has to be this way if the main program uses
>>> a copy relocation of __rseq_handled. As long as there is just one
>>> address across the entire program and ld.so initializes the copy of the
>>> variable that is actually used, everything will be fine.
>>
>> Here is a printout of the __rseq_handled address observed by ld.so, it
>> does not match:
>>
>> LD_PRELOAD=./s.so ./a
>> elf: __rseq_handled addr: 7f501c98a140
>> __rseq_handled s.so: 0 0x55817a88d02c
>> __rseq_abi.cpu_id s.so: -1 0x7f501c983760
>> __rseq_handled main: 0 0x55817a88d02c
>> __rseq_abi.cpu_id main: 27 0x7f501c983760
>
> Where do you print the address? Before or after the self-relocation of
> the dynamic loader? The address is only correct after self-relocation.
I printed the address within rseq_init (), which happened to be invoked
by the linker startup waaaay too early. I followed your advice and moved
the rseq_init () invocation after linker re-relocation:
diff --git a/elf/rtld.c b/elf/rtld.c
index f29f284a7c..66b0894f9d 100644
--- a/elf/rtld.c
+++ b/elf/rtld.c
@@ -1410,9 +1410,6 @@ ERROR: '%s': cannot process note segment.\n", _dl_argv[0]);
/* Assign a module ID. Do this before loading any audit modules. */
GL(dl_rtld_map).l_tls_modid = _dl_next_tls_modid ();
- /* Publicize rseq registration ownership. */
- rseq_init ();
-
/* If we have auditing DSOs to load, do it now. */
bool need_security_init = true;
if (__glibc_unlikely (audit_list != NULL)
@@ -2284,6 +2281,11 @@ ERROR: ld.so: object '%s' cannot be loaded as audit interface: %s; ignored.\n",
HP_TIMING_ACCUM_NT (relocate_time, add);
}
+ /* Publicize rseq registration ownership. This must be performed
+ after rtld re-relocation, before invoking constructors of
+ preloaded libraries. */
+ rseq_init ();
+
/* Do any necessary cleanups for the startup OS interface code.
We do these now so that no calls are made after rtld re-relocation
which might be resolved to different functions than we expect.
It works fine now!
LD_PRELOAD=./s.so ./a
elf: __rseq_handled addr: 56300f0a402c
__rseq_handled s.so: 1 0x56300f0a402c
__rseq_abi.cpu_id s.so: -1 0x7fad2ff58760
__rseq_handled main: 1 0x56300f0a402c
__rseq_abi.cpu_id main: 27 0x7fad2ff58760
Thanks!
Mathieu
--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
^ permalink raw reply related
* Re: [PATCHv4 03/28] posix-clocks: add another call back to return clock time in ktime_t
From: Thomas Gleixner @ 2019-06-14 13:32 UTC (permalink / raw)
To: Dmitry Safonov
Cc: linux-kernel, Andrei Vagin, Adrian Reber, Andrei Vagin,
Andy Lutomirski, Arnd Bergmann, Christian Brauner,
Cyrill Gorcunov, Dmitry Safonov, Eric W. Biederman,
H. Peter Anvin, Ingo Molnar, Jann Horn, Jeff Dike, Oleg Nesterov,
Pavel Emelyanov, Shuah Khan, Vincenzo Frascino, containers, criu,
linux-api, x86
In-Reply-To: <20190612192628.23797-4-dima@arista.com>
Dmitry,
On Wed, 12 Jun 2019, Dmitry Safonov wrote:
> From: Andrei Vagin <avagin@gmail.com>
>
> The callsite in common_timer_get() has already a comment:
> /*
> * The timespec64 based conversion is suboptimal, but it's not
> * worth to implement yet another callback.
> */
> kc->clock_get(timr->it_clock, &ts64);
> now = timespec64_to_ktime(ts64);
>
> Now we are going to add time namespaces and we need to be able to get:
Please avoid 'we' and try to describe the changes in a neutral technical
form, e.g.:
The upcoming support for time namespaces requires to have access to:
> * clock value in a task time namespace to return it from the clock_gettime
> syscall.
- The time in a tasks time namespace for sys_clock_gettime()
> * clock valuse in the root time namespace to use it in
> common_timer_get().
- The time in the root name space for common_timer_get()
> It looks like another reason why we need a separate callback to return
> clock value in ktime_t.
That adds a valid reason to finally implement a separate callback which
returns the time in ktime_t format.
Hmm?
> +int posix_get_timespec(clockid_t which_clock, struct timespec64 *tp);
> +int posix_get_boottime_timespec(const clockid_t which_clock, struct timespec64 *tp);
> #endif
> diff --git a/kernel/time/alarmtimer.c b/kernel/time/alarmtimer.c
> index 0519a8805aab..68a163c8b4f2 100644
> --- a/kernel/time/alarmtimer.c
> +++ b/kernel/time/alarmtimer.c
> @@ -43,6 +43,8 @@ static struct alarm_base {
> spinlock_t lock;
> struct timerqueue_head timerqueue;
> ktime_t (*gettime)(void);
> + int (*get_timespec)(const clockid_t which_clock,
> + struct timespec64 *tp);
> clockid_t base_clockid;
> } alarm_bases[ALARM_NUMTYPE];
>
> @@ -645,21 +647,30 @@ static int alarm_clock_getres(const clockid_t which_clock, struct timespec64 *tp
> }
>
> /**
> - * alarm_clock_get - posix clock_get interface
> + * alarm_clock_get_timespec - posix clock_get_timespec interface
> * @which_clock: clockid
> * @tp: timespec to fill.
> *
> * Provides the underlying alarm base time.
> */
> -static int alarm_clock_get(clockid_t which_clock, struct timespec64 *tp)
> +static int alarm_clock_get_timespec(clockid_t which_clock, struct timespec64 *tp)
> {
> struct alarm_base *base = &alarm_bases[clock2alarm(which_clock)];
>
> if (!alarmtimer_get_rtcdev())
> return -EINVAL;
>
> - *tp = ktime_to_timespec64(base->gettime());
> - return 0;
> + return base->get_timespec(base->base_clockid, tp);
> +}
> +
> +static ktime_t alarm_clock_get_ktime(clockid_t which_clock)
Please add kernel doc for this function. It does not make sense to have one
documented and the other not.
> +{
> + struct alarm_base *base = &alarm_bases[clock2alarm(which_clock)];
> +
> + if (!alarmtimer_get_rtcdev())
> + return -EINVAL;
> +
> + return base->gettime();
> }
> --- a/kernel/time/posix-timers.h
> +++ b/kernel/time/posix-timers.h
> @@ -6,8 +6,11 @@ struct k_clock {
> struct timespec64 *tp);
> int (*clock_set)(const clockid_t which_clock,
> const struct timespec64 *tp);
> - int (*clock_get)(const clockid_t which_clock,
> - struct timespec64 *tp);
> + /* return the clock value in the current time namespace. */
> + int (*clock_get_timespec)(const clockid_t which_clock,
> + struct timespec64 *tp);
> + /* return the clock value in the root time namespace. */
> + ktime_t (*clock_get_ktime)(const clockid_t which_clock);
> int (*clock_adj)(const clockid_t which_clock, struct __kernel_timex *tx);
> int (*timer_create)(struct k_itimer *timer);
> int (*nsleep)(const clockid_t which_clock, int flags,
TBH, this patch is way to big. It changes too many things at once. Can you
please structure it this way:
1) Rename k_clock::clock_get to k_clock::clock_get_timespec and fix up all
struct initializers
2) Rename the clock_get_timespec functions per instance
3) Add the new callback
4) Add the new functions per instance and add them to the corresponding
struct initializers
5) Use the new callback
Thanks,
tglx
^ permalink raw reply
* RE: [PATCH 1/5] glibc: Perform rseq(2) registration at C startup and thread creation (v10)
From: David Laight @ 2019-06-14 13:29 UTC (permalink / raw)
To: 'Mathieu Desnoyers', Florian Weimer
Cc: carlos, Joseph Myers, Szabolcs Nagy, libc-alpha, Thomas Gleixner,
Ben Maurer, Peter Zijlstra, Paul E. McKenney, Boqun Feng,
Will Deacon, Dave Watson, Paul Turner, Rich Felker, linux-kernel,
linux-api
In-Reply-To: <1085273942.3137.1560517301721.JavaMail.zimbra@efficios.com>
From: Mathieu Desnoyers
> Sent: 14 June 2019 14:02
...
> But my original issue remains: if I define a variable called __rseq_handled
> within either the main executable or the preloaded library, it overshadows
> the libc one:
1) That is the was elf symbol resolution is required to work.
Otherwise variables like 'errno' (non-thread safe form) wouldn't work.
2) Don't do it then :-)
Names starting with __ will be reserved (probably 'for the implementation').
The real 'fun' starts because, under some circumstances, looking up a symbol as:
foo = dlsym(lib_handle, "foo");
Can find the data item instead of the function!
Usually it works (even when foo is global data) because 'lib_handle' refers
to a different symbol table.
But it can go horribly wrong.
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
^ permalink raw reply
* Re: [PATCH 1/5] glibc: Perform rseq(2) registration at C startup and thread creation (v10)
From: Florian Weimer @ 2019-06-14 13:24 UTC (permalink / raw)
To: Mathieu Desnoyers
Cc: carlos, Joseph Myers, Szabolcs Nagy, libc-alpha, Thomas Gleixner,
Ben Maurer, Peter Zijlstra, Paul E. McKenney, Boqun Feng,
Will Deacon, Dave Watson, Paul Turner, Rich Felker, linux-kernel,
linux-api
In-Reply-To: <1779359826.3226.1560518318701.JavaMail.zimbra@efficios.com>
* Mathieu Desnoyers:
> ----- On Jun 14, 2019, at 3:09 PM, Florian Weimer fweimer@redhat.com wrote:
>
>> * Mathieu Desnoyers:
>>
>>> But my original issue remains: if I define a variable called __rseq_handled
>>> within either the main executable or the preloaded library, it overshadows
>>> the libc one:
>>>
>>> efficios@compudjdev:~/test/libc-sym$ ./a
>>> __rseq_handled main: 0 0x56135fd5102c
>>> __rseq_abi.cpu_id main: 29 0x7fcbeca6d5a0
>>> efficios@compudjdev:~/test/libc-sym$ LD_PRELOAD=./s.so ./a
>>> __rseq_handled s.so: 0 0x558f70aeb02c
>>> __rseq_abi.cpu_id s.so: -1 0x7fdca78b7760
>>> __rseq_handled main: 0 0x558f70aeb02c
>>> __rseq_abi.cpu_id main: 27 0x7fdca78b7760
>>>
>>> Which is unexpected.
>>
>> Why is this unexpected? It has to be this way if the main program uses
>> a copy relocation of __rseq_handled. As long as there is just one
>> address across the entire program and ld.so initializes the copy of the
>> variable that is actually used, everything will be fine.
>
> Here is a printout of the __rseq_handled address observed by ld.so, it
> does not match:
>
> LD_PRELOAD=./s.so ./a
> elf: __rseq_handled addr: 7f501c98a140
> __rseq_handled s.so: 0 0x55817a88d02c
> __rseq_abi.cpu_id s.so: -1 0x7f501c983760
> __rseq_handled main: 0 0x55817a88d02c
> __rseq_abi.cpu_id main: 27 0x7f501c983760
Where do you print the address? Before or after the self-relocation of
the dynamic loader? The address is only correct after self-relocation.
Thanks,
Florian
^ permalink raw reply
* Re: [PATCH 1/5] glibc: Perform rseq(2) registration at C startup and thread creation (v10)
From: Mathieu Desnoyers @ 2019-06-14 13:18 UTC (permalink / raw)
To: Florian Weimer
Cc: carlos, Joseph Myers, Szabolcs Nagy, libc-alpha, Thomas Gleixner,
Ben Maurer, Peter Zijlstra, Paul E. McKenney, Boqun Feng,
Will Deacon, Dave Watson, Paul Turner, Rich Felker, linux-kernel,
linux-api
In-Reply-To: <87d0jguxdk.fsf@oldenburg2.str.redhat.com>
----- On Jun 14, 2019, at 3:09 PM, Florian Weimer fweimer@redhat.com wrote:
> * Mathieu Desnoyers:
>
>> But my original issue remains: if I define a variable called __rseq_handled
>> within either the main executable or the preloaded library, it overshadows
>> the libc one:
>>
>> efficios@compudjdev:~/test/libc-sym$ ./a
>> __rseq_handled main: 0 0x56135fd5102c
>> __rseq_abi.cpu_id main: 29 0x7fcbeca6d5a0
>> efficios@compudjdev:~/test/libc-sym$ LD_PRELOAD=./s.so ./a
>> __rseq_handled s.so: 0 0x558f70aeb02c
>> __rseq_abi.cpu_id s.so: -1 0x7fdca78b7760
>> __rseq_handled main: 0 0x558f70aeb02c
>> __rseq_abi.cpu_id main: 27 0x7fdca78b7760
>>
>> Which is unexpected.
>
> Why is this unexpected? It has to be this way if the main program uses
> a copy relocation of __rseq_handled. As long as there is just one
> address across the entire program and ld.so initializes the copy of the
> variable that is actually used, everything will be fine.
Here is a printout of the __rseq_handled address observed by ld.so, it
does not match:
LD_PRELOAD=./s.so ./a
elf: __rseq_handled addr: 7f501c98a140
__rseq_handled s.so: 0 0x55817a88d02c
__rseq_abi.cpu_id s.so: -1 0x7f501c983760
__rseq_handled main: 0 0x55817a88d02c
__rseq_abi.cpu_id main: 27 0x7f501c983760
This is with the following in a.c:
#include <stdio.h>
#include <linux/rseq.h>
__thread struct rseq __rseq_abi
__attribute__ ((tls_model ("initial-exec"))) = {
.cpu_id = -1,
};
int __rseq_handled;
int main()
{
fprintf(stderr, "__rseq_handled main: %d %p\n", __rseq_handled, &__rseq_handled);
fprintf(stderr, "__rseq_abi.cpu_id main: %d %p\n", __rseq_abi.cpu_id, &__rseq_abi);
return 0;
}
As we can see, the state of __rseq_handled observed by the preloaded
lib and the program is "0", but should really be "1". This can be
explained by ld.so not using the same address as the rest of the
program, but how can we fix that ?
Thanks,
Mathieu
--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
^ permalink raw reply
* Re: [PATCHv4 02/28] timens: Add timens_offsets
From: Thomas Gleixner @ 2019-06-14 13:11 UTC (permalink / raw)
To: Dmitry Safonov
Cc: linux-kernel, Andrei Vagin, Adrian Reber, Andy Lutomirski,
Arnd Bergmann, Christian Brauner, Cyrill Gorcunov, Dmitry Safonov,
Eric W. Biederman, H. Peter Anvin, Ingo Molnar, Jann Horn,
Jeff Dike, Oleg Nesterov, Pavel Emelyanov, Shuah Khan,
Vincenzo Frascino, containers, criu, linux-api, x86
In-Reply-To: <20190612192628.23797-3-dima@arista.com>
On Wed, 12 Jun 2019, Dmitry Safonov wrote:
> From: Andrei Vagin <avagin@openvz.org>
>
> Introduce offsets for time namespace. They will contain an adjustment
> needed to convert clocks to/from host's.
>
> Allocate one page for each time namespace that will be premapped into
> userspace among vvar pages.
> index 000000000000..7d7cb68ea778
> --- /dev/null
> +++ b/include/linux/timens_offsets.h
> @@ -0,0 +1,8 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _LINUX_TIME_OFFSETS_H
> +#define _LINUX_TIME_OFFSETS_H
> +
> +struct timens_offsets {
> +};
That empty struct which is nowhere used looks odd. Can you move that to the
patch which actually makes use of it?
Thanks,
tglx
^ permalink raw reply
* Re: [PATCH 1/5] glibc: Perform rseq(2) registration at C startup and thread creation (v10)
From: Florian Weimer @ 2019-06-14 13:09 UTC (permalink / raw)
To: Mathieu Desnoyers
Cc: carlos, Joseph Myers, Szabolcs Nagy, libc-alpha, Thomas Gleixner,
Ben Maurer, Peter Zijlstra, Paul E. McKenney, Boqun Feng,
Will Deacon, Dave Watson, Paul Turner, Rich Felker, linux-kernel,
linux-api
In-Reply-To: <1085273942.3137.1560517301721.JavaMail.zimbra@efficios.com>
* Mathieu Desnoyers:
> But my original issue remains: if I define a variable called __rseq_handled
> within either the main executable or the preloaded library, it overshadows
> the libc one:
>
> efficios@compudjdev:~/test/libc-sym$ ./a
> __rseq_handled main: 0 0x56135fd5102c
> __rseq_abi.cpu_id main: 29 0x7fcbeca6d5a0
> efficios@compudjdev:~/test/libc-sym$ LD_PRELOAD=./s.so ./a
> __rseq_handled s.so: 0 0x558f70aeb02c
> __rseq_abi.cpu_id s.so: -1 0x7fdca78b7760
> __rseq_handled main: 0 0x558f70aeb02c
> __rseq_abi.cpu_id main: 27 0x7fdca78b7760
>
> Which is unexpected.
Why is this unexpected? It has to be this way if the main program uses
a copy relocation of __rseq_handled. As long as there is just one
address across the entire program and ld.so initializes the copy of the
variable that is actually used, everything will be fine.
Thanks,
Florian
^ permalink raw reply
* Re: [PATCH 1/5] glibc: Perform rseq(2) registration at C startup and thread creation (v10)
From: Mathieu Desnoyers @ 2019-06-14 13:01 UTC (permalink / raw)
To: Florian Weimer
Cc: carlos, Joseph Myers, Szabolcs Nagy, libc-alpha, Thomas Gleixner,
Ben Maurer, Peter Zijlstra, Paul E. McKenney, Boqun Feng,
Will Deacon, Dave Watson, Paul Turner, Rich Felker, linux-kernel,
linux-api
In-Reply-To: <1190407525.3131.1560516910936.JavaMail.zimbra@efficios.com>
----- On Jun 14, 2019, at 2:55 PM, Mathieu Desnoyers mathieu.desnoyers@efficios.com wrote:
> ----- On Jun 14, 2019, at 1:35 PM, Florian Weimer fweimer@redhat.com wrote:
>
>> * Mathieu Desnoyers:
>>
>>> * Makefile:
>>>
>>> LIBCPATH=/home/efficios/glibc-test/lib
>>> KERNEL_HEADERS=/home/efficios/git/linux-percpu-dev/usr/include
>>> CFLAGS=-I${KERNEL_HEADERS} -L${LIBCPATH} -Wl,--rpath=${LIBCPATH}
>>> -Wl,--dynamic-linker=${LIBCPATH}/ld-linux-x86-64.so.2
>>>
>>> all:
>>> gcc ${CFLAGS} -o a a.c
>>> gcc ${CFLAGS} -shared -fPIC -o s.so s.c
>>
>> For me, that does not correctly link against the built libc because the
>> system dynamic loader seeps into the link.
>
> I have the same issue. I tried adding "-B${LIBCPATH}" as well, but it did
> not seem to help. I still have this ldd output:
>
> ldd a
> ./a: /lib64/ld-linux-x86-64.so.2: version `GLIBC_2.30' not found (required by
> ./a)
> linux-vdso.so.1 (0x00007fffaa7e9000)
> libc.so.6 => /home/efficios/glibc-test/lib/libc.so.6 (0x00007fac5d479000)
> /home/efficios/glibc-test/lib/ld-linux-x86-64.so.2 =>
> /lib64/ld-linux-x86-64.so.2 (0x00007fac5da33000)
>
> Still no luck there. Any idea what compiler/linker flag I am missing ?
>
Actually, even though ldd seems confused, running the program seems to
use the right ld.so:
efficios@compudjdev:~/test/libc-sym$ ./a
__rseq_handled main: 1 0x55f0ec915020
__rseq_abi.cpu_id main: 28 0x7f54f6c2d4c0
efficios@compudjdev:~/test/libc-sym$ LD_PRELOAD=./s.so ./a
__rseq_handled s.so: 1 0x557350bc6020
__rseq_abi.cpu_id s.so: -1 0x7fe2f30f2680
__rseq_handled main: 1 0x557350bc6020
__rseq_abi.cpu_id main: 27 0x7fe2f30f2680
But my original issue remains: if I define a variable called __rseq_handled
within either the main executable or the preloaded library, it overshadows
the libc one:
efficios@compudjdev:~/test/libc-sym$ ./a
__rseq_handled main: 0 0x56135fd5102c
__rseq_abi.cpu_id main: 29 0x7fcbeca6d5a0
efficios@compudjdev:~/test/libc-sym$ LD_PRELOAD=./s.so ./a
__rseq_handled s.so: 0 0x558f70aeb02c
__rseq_abi.cpu_id s.so: -1 0x7fdca78b7760
__rseq_handled main: 0 0x558f70aeb02c
__rseq_abi.cpu_id main: 27 0x7fdca78b7760
Which is unexpected.
This is with my dev branch at this commit:
https://github.com/compudj/glibc-dev/commit/f0d4e60e5d0ceb0c2642f99da5af61b6ad988531
What am I missing ?
Thanks,
Mathieu
--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
^ permalink raw reply
* Re: [PATCH 1/5] glibc: Perform rseq(2) registration at C startup and thread creation (v10)
From: Mathieu Desnoyers @ 2019-06-14 12:55 UTC (permalink / raw)
To: Florian Weimer
Cc: carlos, Joseph Myers, Szabolcs Nagy, libc-alpha, Thomas Gleixner,
Ben Maurer, Peter Zijlstra, Paul E. McKenney, Boqun Feng,
Will Deacon, Dave Watson, Paul Turner, Rich Felker, linux-kernel,
linux-api
In-Reply-To: <87tvcsv1pk.fsf@oldenburg2.str.redhat.com>
----- On Jun 14, 2019, at 1:35 PM, Florian Weimer fweimer@redhat.com wrote:
> * Mathieu Desnoyers:
>
>> * Makefile:
>>
>> LIBCPATH=/home/efficios/glibc-test/lib
>> KERNEL_HEADERS=/home/efficios/git/linux-percpu-dev/usr/include
>> CFLAGS=-I${KERNEL_HEADERS} -L${LIBCPATH} -Wl,--rpath=${LIBCPATH}
>> -Wl,--dynamic-linker=${LIBCPATH}/ld-linux-x86-64.so.2
>>
>> all:
>> gcc ${CFLAGS} -o a a.c
>> gcc ${CFLAGS} -shared -fPIC -o s.so s.c
>
> For me, that does not correctly link against the built libc because the
> system dynamic loader seeps into the link.
I have the same issue. I tried adding "-B${LIBCPATH}" as well, but it did
not seem to help. I still have this ldd output:
ldd a
./a: /lib64/ld-linux-x86-64.so.2: version `GLIBC_2.30' not found (required by ./a)
linux-vdso.so.1 (0x00007fffaa7e9000)
libc.so.6 => /home/efficios/glibc-test/lib/libc.so.6 (0x00007fac5d479000)
/home/efficios/glibc-test/lib/ld-linux-x86-64.so.2 => /lib64/ld-linux-x86-64.so.2 (0x00007fac5da33000)
Still no luck there. Any idea what compiler/linker flag I am missing ?
>
>> specifically this commit:
>> https://github.com/compudj/glibc-dev/commit/c49a286497d065a7fc00aafd846e6edce14f97fc
>
> This commit links __rseq_handled into libc.so.6 via rseq-sym.c, but does
> not export it from there.
Moving __rseq_handled to elf/dl-support.c and elf/rtld.c was part a commit on top.
I've force-pushed on the dev branch, and the commit moving __rseq_handled to the
dynamic linker it now appears as:
https://github.com/compudj/glibc-dev/commit/f0d4e60e5d0ceb0c2642f99da5af61b6ad988531
Thanks,
Mathieu
--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
^ permalink raw reply
* Re: Regression for MS_MOVE on kernel v5.1
From: Eric W. Biederman @ 2019-06-14 12:08 UTC (permalink / raw)
To: Christian Brauner
Cc: Miklos Szeredi, Linus Torvalds, Al Viro,
Linux List Kernel Mailing, linux-fsdevel, Linux API,
David Howells
In-Reply-To: <20190613233706.6k6struu7valxaxy@brauner.io>
Christian Brauner <christian@brauner.io> writes:
> On Thu, Jun 13, 2019 at 04:59:24PM -0500, Eric W. Biederman wrote:
>> Miklos Szeredi <miklos@szeredi.hu> writes:
>>
>> > On Thu, Jun 13, 2019 at 8:35 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
>> >>
>> >> Christian Brauner <christian@brauner.io> writes:
>> >>
>> >> > On Wed, Jun 12, 2019 at 06:00:39PM -1000, Linus Torvalds wrote:
>> >> >> On Wed, Jun 12, 2019 at 12:54 PM Christian Brauner <christian@brauner.io> wrote:
>> >> >> >
>> >> >> > The commit changes the internal logic to lock mounts when propagating
>> >> >> > mounts (user+)mount namespaces and - I believe - causes do_mount_move()
>> >> >> > to fail at:
>> >> >>
>> >> >> You mean 'do_move_mount()'.
>> >> >>
>> >> >> > if (old->mnt.mnt_flags & MNT_LOCKED)
>> >> >> > goto out;
>> >> >> >
>> >> >> > If that's indeed the case we should either revert this commit (reverts
>> >> >> > cleanly, just tested it) or find a fix.
>> >> >>
>> >> >> Hmm.. I'm not entirely sure of the logic here, and just looking at
>> >> >> that commit 3bd045cc9c4b ("separate copying and locking mount tree on
>> >> >> cross-userns copies") doesn't make me go "Ahh" either.
>> >> >>
>> >> >> Al? My gut feel is that we need to just revert, since this was in 5.1
>> >> >> and it's getting reasonably late in 5.2 too. But maybe you go "guys,
>> >> >> don't be silly, this is easily fixed with this one-liner".
>> >> >
>> >> > David and I have been staring at that code today for a while together.
>> >> > I think I made some sense of it.
>> >> > One thing we weren't absolutely sure is if the old MS_MOVE behavior was
>> >> > intentional or a bug. If it is a bug we have a problem since we quite
>> >> > heavily rely on this...
>> >>
>> >> It was intentional.
>> >>
>> >> The only mounts that are locked in propagation are the mounts that
>> >> propagate together. If you see the mounts come in as individuals you
>> >> can always see/manipulate/work with the underlying mount.
>> >>
>> >> I can think of only a few ways for MNT_LOCKED to become set:
>> >> a) unshare(CLONE_NEWNS)
>> >> b) mount --rclone /path/to/mnt/tree /path/to/propagation/point
>> >> c) mount --move /path/to/mnt/tree /path/to/propgation/point
>> >>
>> >> Nothing in the target namespace should be locked on the propgation point
>> >> but all of the new mounts that came across as a unit should be locked
>> >> together.
>> >
>> > Locked together means the root of the new mount tree doesn't have
>> > MNT_LOCKED set, but all mounts below do have MNT_LOCKED, right?
>> >
>> > Isn't the bug here that the root mount gets MNT_LOCKED as well?
>
> Yes, we suspected this as well. We just couldn't pinpoint where the
> surgery would need to start.
>
>>
>> Yes, and the code to remove MNT_LOCKED is still sitting there in
>> propogate_one right after it calls copy_tree. It should be a trivial
>> matter of moving that change to after the lock_mnt_tree call.
>>
>> Now that I have been elightened about anonymous mount namespaces
>> I am suspecting that we want to take the user_namespace of the anonymous
>> namespace into account when deciding to lock the mounts.
>>
>> >> Then it breaking is definitely a regression that needs to be fixed.
>> >>
>> >> I believe the problematic change as made because the new mount
>> >> api allows attaching floating mounts. Or that was the plan last I
>> >> looked. Those floating mounts don't have a mnt_ns so will result
>> >> in a NULL pointer dereference when they are attached.
>> >
>> > Well, it's called anonymous namespace. So there *is* an mnt_ns, and
>> > its lifetime is bound to the file returned by fsmount().
>>
>> Interesting. That has changed since I last saw the patches.
>>
>> Below is what will probably be a straight forward fix for the regression.
>
> Tested the patch just now applied on top of v5.1. It fixes the
> regression.
> Can you please send a proper patch, Eric?
>
> Tested-by: Christian Brauner <christian@brauner.io>
> Acked-by: Christian Brauner <christian@brauner.io>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
I will let Al or whoever take this over the finish line.
I am too sleep deprived at the moment to say anything about the quality
of my patch.
Eric
>> diff --git a/fs/namespace.c b/fs/namespace.c
>> index ffb13f0562b0..a39edeecbc46 100644
>> --- a/fs/namespace.c
>> +++ b/fs/namespace.c
>> @@ -2105,6 +2105,7 @@ static int attach_recursive_mnt(struct mount *source_mnt,
>> /* Notice when we are propagating across user namespaces */
>> if (child->mnt_parent->mnt_ns->user_ns != user_ns)
>> lock_mnt_tree(child);
>> + child->mnt.mnt_flags &= ~MNT_LOCKED;
>> commit_tree(child);
>> }
>> put_mountpoint(smp);
>> diff --git a/fs/pnode.c b/fs/pnode.c
>> index 7ea6cfb65077..012be405fec0 100644
>> --- a/fs/pnode.c
>> +++ b/fs/pnode.c
>> @@ -262,7 +262,6 @@ static int propagate_one(struct mount *m)
>> child = copy_tree(last_source, last_source->mnt.mnt_root, type);
>> if (IS_ERR(child))
>> return PTR_ERR(child);
>> - child->mnt.mnt_flags &= ~MNT_LOCKED;
>> mnt_set_mountpoint(m, mp, child);
>> last_dest = m;
>> last_source = child;
>>
>>
^ permalink raw reply
* Re: [PATCH 1/5] glibc: Perform rseq(2) registration at C startup and thread creation (v10)
From: Florian Weimer @ 2019-06-14 11:35 UTC (permalink / raw)
To: Mathieu Desnoyers
Cc: carlos, Joseph Myers, Szabolcs Nagy, libc-alpha, Thomas Gleixner,
Ben Maurer, Peter Zijlstra, Paul E. McKenney, Boqun Feng,
Will Deacon, Dave Watson, Paul Turner, Rich Felker, linux-kernel,
linux-api
In-Reply-To: <1635690189.3049.1560507249693.JavaMail.zimbra@efficios.com>
* Mathieu Desnoyers:
> * Makefile:
>
> LIBCPATH=/home/efficios/glibc-test/lib
> KERNEL_HEADERS=/home/efficios/git/linux-percpu-dev/usr/include
> CFLAGS=-I${KERNEL_HEADERS} -L${LIBCPATH} -Wl,--rpath=${LIBCPATH} -Wl,--dynamic-linker=${LIBCPATH}/ld-linux-x86-64.so.2
>
> all:
> gcc ${CFLAGS} -o a a.c
> gcc ${CFLAGS} -shared -fPIC -o s.so s.c
For me, that does not correctly link against the built libc because the
system dynamic loader seeps into the link.
> specifically this commit:
> https://github.com/compudj/glibc-dev/commit/c49a286497d065a7fc00aafd846e6edce14f97fc
This commit links __rseq_handled into libc.so.6 via rseq-sym.c, but does
not export it from there.
Thanks,
Florian
^ permalink raw reply
* Re: [PATCH 1/5] glibc: Perform rseq(2) registration at C startup and thread creation (v10)
From: Mathieu Desnoyers @ 2019-06-14 10:14 UTC (permalink / raw)
To: Florian Weimer
Cc: carlos, Joseph Myers, Szabolcs Nagy, libc-alpha, Thomas Gleixner,
Ben Maurer, Peter Zijlstra, Paul E. McKenney, Boqun Feng,
Will Deacon, Dave Watson, Paul Turner, Rich Felker, linux-kernel,
linux-api
In-Reply-To: <87ftocwkei.fsf@oldenburg2.str.redhat.com>
----- On Jun 14, 2019, at 12:06 PM, Florian Weimer fweimer@redhat.com wrote:
> * Mathieu Desnoyers:
>
>> ----- On Jun 12, 2019, at 4:00 PM, Mathieu Desnoyers
>> mathieu.desnoyers@efficios.com wrote:
>>
>>> ----- On Jun 10, 2019, at 4:43 PM, carlos carlos@redhat.com wrote:
>>>
>>>> On 6/6/19 7:57 AM, Florian Weimer wrote:
>>>>> Let me ask the key question again: Does it matter if code observes the
>>>>> rseq area first without kernel support, and then with kernel support?
>>>>> If we don't expect any problems immediately, we do not need to worry
>>>>> much about the constructor ordering right now. I expect that over time,
>>>>> fixing this properly will become easier.
>>>>
>>>> I just wanted to chime in and say that splitting this into:
>>>>
>>>> * Ownership (__rseq_handled)
>>>>
>>>> * Initialization (__rseq_abi)
>>>>
>>>> Makes sense to me.
>>>>
>>>> I agree we need an answer to this question of ownership but not yet
>>>> initialized, to owned and initialized.
>>>>
>>>> I like the idea of having __rseq_handled in ld.so.
>>>
>>> Very good, so I'll implement this approach. Sorry for the delayed
>>> feedback, I am traveling this week.
>>
>> I had issues with cases where application or LD_PRELOAD library also
>> define the __rseq_handled symbol. They appear not to see the same
>> address as the one initialized by ld.so.
>
> What exactly did you do? How did you determine the addresses? How is
> __rseq_handled defined in ld.so?
The easiest way to answer these questions is through links to my github
dev branch:
https://github.com/compudj/glibc-dev/tree/glibc-rseq
specifically this commit:
https://github.com/compudj/glibc-dev/commit/c49a286497d065a7fc00aafd846e6edce14f97fc
and this attempt at using GL():
https://github.com/compudj/glibc-dev/commit/8a02acfbb6943672bfa36b4fc6f61905ee4fa180
My test programs are:
* a.c:
#include <stdio.h>
#include <linux/rseq.h>
extern __thread struct rseq __rseq_abi
__attribute__ ((tls_model ("initial-exec")));/* = {
.cpu_id = -1,
};*/
extern int __rseq_handled;
int main()
{
fprintf(stderr, "__rseq_handled main: %d %p\n", __rseq_handled, &__rseq_handled);
fprintf(stderr, "__rseq_abi.cpu_id main: %d %p\n", __rseq_abi.cpu_id, &__rseq_abi);
return 0;
}
* s.c:
#include <stdio.h>
#include <linux/rseq.h>
#if 0
__thread struct rseq __rseq_abi
__attribute__ ((tls_model ("initial-exec"))) = {
.cpu_id = -1,
};
int __rseq_handled;
#else
extern __thread struct rseq __rseq_abi
__attribute__ ((tls_model ("initial-exec")));
extern int __rseq_handled;
#endif
void __attribute__((constructor)) myinit(void)
{
fprintf(stderr, "__rseq_handled s.so: %d %p\n", __rseq_handled, &__rseq_handled);
fprintf(stderr, "__rseq_abi.cpu_id s.so: %d %p\n", __rseq_abi.cpu_id, &__rseq_abi);
}
* Makefile:
LIBCPATH=/home/efficios/glibc-test/lib
KERNEL_HEADERS=/home/efficios/git/linux-percpu-dev/usr/include
CFLAGS=-I${KERNEL_HEADERS} -L${LIBCPATH} -Wl,--rpath=${LIBCPATH} -Wl,--dynamic-linker=${LIBCPATH}/ld-linux-x86-64.so.2
all:
gcc ${CFLAGS} -o a a.c
gcc ${CFLAGS} -shared -fPIC -o s.so s.c
Thanks,
Mathieu
>
> Thanks,
> Florian
--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
^ permalink raw reply
* Re: [PATCH 1/5] glibc: Perform rseq(2) registration at C startup and thread creation (v10)
From: Florian Weimer @ 2019-06-14 10:06 UTC (permalink / raw)
To: Mathieu Desnoyers
Cc: carlos, Joseph Myers, Szabolcs Nagy, libc-alpha, Thomas Gleixner,
Ben Maurer, Peter Zijlstra, Paul E. McKenney, Boqun Feng,
Will Deacon, Dave Watson, Paul Turner, Rich Felker, linux-kernel,
linux-api
In-Reply-To: <802638054.3032.1560506584705.JavaMail.zimbra@efficios.com>
* Mathieu Desnoyers:
> ----- On Jun 12, 2019, at 4:00 PM, Mathieu Desnoyers mathieu.desnoyers@efficios.com wrote:
>
>> ----- On Jun 10, 2019, at 4:43 PM, carlos carlos@redhat.com wrote:
>>
>>> On 6/6/19 7:57 AM, Florian Weimer wrote:
>>>> Let me ask the key question again: Does it matter if code observes the
>>>> rseq area first without kernel support, and then with kernel support?
>>>> If we don't expect any problems immediately, we do not need to worry
>>>> much about the constructor ordering right now. I expect that over time,
>>>> fixing this properly will become easier.
>>>
>>> I just wanted to chime in and say that splitting this into:
>>>
>>> * Ownership (__rseq_handled)
>>>
>>> * Initialization (__rseq_abi)
>>>
>>> Makes sense to me.
>>>
>>> I agree we need an answer to this question of ownership but not yet
>>> initialized, to owned and initialized.
>>>
>>> I like the idea of having __rseq_handled in ld.so.
>>
>> Very good, so I'll implement this approach. Sorry for the delayed
>> feedback, I am traveling this week.
>
> I had issues with cases where application or LD_PRELOAD library also
> define the __rseq_handled symbol. They appear not to see the same
> address as the one initialized by ld.so.
What exactly did you do? How did you determine the addresses? How is
__rseq_handled defined in ld.so?
Thanks,
Florian
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox