* [GIT PULL] One more vdso fix for 3.19
@ 2014-12-23 21:19 Andy Lutomirski
2015-01-01 21:23 ` Ingo Molnar
0 siblings, 1 reply; 3+ messages in thread
From: Andy Lutomirski @ 2014-12-23 21:19 UTC (permalink / raw)
To: Ingo Molnar, X86 ML, linux-kernel@vger.kernel.org
Cc: Paolo Bonzini, Marcelo Tosatti
Hi Ingo et al,
Please consider pulling for x86/urgent.
Thanks,
Andy
The following changes since commit 394f56fe480140877304d342dec46d50dc823d46:
x86_64, vdso: Fix the vdso address randomization algorithm
(2014-12-20 16:56:57 -0800)
are available in the git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git
tags/pr-20141223-x86-vdso
for you to fetch changes up to 1ddf0b1b11aa8a90cef6706e935fc31c75c406ba:
x86, vdso: Use asm volatile in __getcpu (2014-12-23 13:05:30 -0800)
----------------------------------------------------------------
This is hopefully the last vdso fix for 3.19. It should be very
safe (it just adds a volatile).
I don't think it fixes an actual bug (the __getcpu calls in the
pvclock code may not have been needed in the first place), but
discussion on that point is ongoing.
It also fixes a big performance issue in 3.18 and earlier in which
the lsl instructions in vclock_gettime got hoisted so far up the
function that they happened even when the function they were in was
never called. n 3.19, the performance issue seems to be gone due to
the whims of my compiler and some interaction with a branch that's
now gone.
I'll hopefully have a much bigger overhaul of the pvclock code
for 3.20, but it needs careful review.
----------------------------------------------------------------
Andy Lutomirski (1):
x86, vdso: Use asm volatile in __getcpu
arch/x86/include/asm/vgtod.h | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
^ permalink raw reply [flat|nested] 3+ messages in thread* Re: [GIT PULL] One more vdso fix for 3.19 2014-12-23 21:19 [GIT PULL] One more vdso fix for 3.19 Andy Lutomirski @ 2015-01-01 21:23 ` Ingo Molnar 2015-01-02 18:19 ` Andy Lutomirski 0 siblings, 1 reply; 3+ messages in thread From: Ingo Molnar @ 2015-01-01 21:23 UTC (permalink / raw) To: Andy Lutomirski Cc: Ingo Molnar, X86 ML, linux-kernel@vger.kernel.org, Paolo Bonzini, Marcelo Tosatti, Linus Torvalds * Andy Lutomirski <luto@amacapital.net> wrote: > Hi Ingo et al, > > Please consider pulling for x86/urgent. > > Thanks, > Andy > > > The following changes since commit 394f56fe480140877304d342dec46d50dc823d46: > > x86_64, vdso: Fix the vdso address randomization algorithm > (2014-12-20 16:56:57 -0800) > > are available in the git repository at: > > git://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git > tags/pr-20141223-x86-vdso > > for you to fetch changes up to 1ddf0b1b11aa8a90cef6706e935fc31c75c406ba: > > x86, vdso: Use asm volatile in __getcpu (2014-12-23 13:05:30 -0800) > > ---------------------------------------------------------------- > This is hopefully the last vdso fix for 3.19. It should be very > safe (it just adds a volatile). > > I don't think it fixes an actual bug (the __getcpu calls in the > pvclock code may not have been needed in the first place), but > discussion on that point is ongoing. > > It also fixes a big performance issue in 3.18 and earlier in which > the lsl instructions in vclock_gettime got hoisted so far up the > function that they happened even when the function they were in was > never called. n 3.19, the performance issue seems to be gone due to > the whims of my compiler and some interaction with a branch that's > now gone. > > I'll hopefully have a much bigger overhaul of the pvclock code > for 3.20, but it needs careful review. > > ---------------------------------------------------------------- > Andy Lutomirski (1): > x86, vdso: Use asm volatile in __getcpu > > arch/x86/include/asm/vgtod.h | 6 ++++-- > 1 file changed, 4 insertions(+), 2 deletions(-) Pulled into tip:x86/urgent, thanks Andy! I've attached the full commit below - I'm not sure it was sent to lkml. Thanks, Ingo ====================> >From 1ddf0b1b11aa8a90cef6706e935fc31c75c406ba Mon Sep 17 00:00:00 2001 From: Andy Lutomirski <luto@amacapital.net> Date: Sun, 21 Dec 2014 08:57:46 -0800 Subject: [PATCH] x86, vdso: Use asm volatile in __getcpu In Linux 3.18 and below, GCC hoists the lsl instructions in the pvclock code all the way to the beginning of __vdso_clock_gettime, slowing the non-paravirt case significantly. For unknown reasons, presumably related to the removal of a branch, the performance issue is gone as of e76b027e6408 x86,vdso: Use LSL unconditionally for vgetcpu but I don't trust GCC enough to expect the problem to stay fixed. There should be no correctness issue, because the __getcpu calls in __vdso_vlock_gettime were never necessary in the first place. Note to stable maintainers: In 3.18 and below, depending on configuration, gcc 4.9.2 generates code like this: 9c3: 44 0f 03 e8 lsl %ax,%r13d 9c7: 45 89 eb mov %r13d,%r11d 9ca: 0f 03 d8 lsl %ax,%ebx This patch won't apply as is to any released kernel, but I'll send a trivial backported version if needed. Fixes: 51c19b4f5927 x86: vdso: pvclock gettime support Cc: stable@vger.kernel.org # 3.8+ Cc: Marcelo Tosatti <mtosatti@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Andy Lutomirski <luto@amacapital.net> --- arch/x86/include/asm/vgtod.h | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/asm/vgtod.h b/arch/x86/include/asm/vgtod.h index e7e9682a33e9..f556c4843aa1 100644 --- a/arch/x86/include/asm/vgtod.h +++ b/arch/x86/include/asm/vgtod.h @@ -80,9 +80,11 @@ static inline unsigned int __getcpu(void) /* * Load per CPU data from GDT. LSL is faster than RDTSCP and - * works on all CPUs. + * works on all CPUs. This is volatile so that it orders + * correctly wrt barrier() and to keep gcc from cleverly + * hoisting it out of the calling function. */ - asm("lsl %1,%0" : "=r" (p) : "r" (__PER_CPU_SEG)); + asm volatile ("lsl %1,%0" : "=r" (p) : "r" (__PER_CPU_SEG)); return p; } ^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [GIT PULL] One more vdso fix for 3.19 2015-01-01 21:23 ` Ingo Molnar @ 2015-01-02 18:19 ` Andy Lutomirski 0 siblings, 0 replies; 3+ messages in thread From: Andy Lutomirski @ 2015-01-02 18:19 UTC (permalink / raw) To: Ingo Molnar Cc: Ingo Molnar, X86 ML, linux-kernel@vger.kernel.org, Paolo Bonzini, Marcelo Tosatti, Linus Torvalds On Thu, Jan 1, 2015 at 1:23 PM, Ingo Molnar <mingo@kernel.org> wrote: > > * Andy Lutomirski <luto@amacapital.net> wrote: > >> Hi Ingo et al, >> >> Please consider pulling for x86/urgent. >> >> Thanks, >> Andy >> >> >> The following changes since commit 394f56fe480140877304d342dec46d50dc823d46: >> >> x86_64, vdso: Fix the vdso address randomization algorithm >> (2014-12-20 16:56:57 -0800) >> >> are available in the git repository at: >> >> git://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git >> tags/pr-20141223-x86-vdso >> >> for you to fetch changes up to 1ddf0b1b11aa8a90cef6706e935fc31c75c406ba: >> >> x86, vdso: Use asm volatile in __getcpu (2014-12-23 13:05:30 -0800) >> >> ---------------------------------------------------------------- >> This is hopefully the last vdso fix for 3.19. It should be very >> safe (it just adds a volatile). >> >> I don't think it fixes an actual bug (the __getcpu calls in the >> pvclock code may not have been needed in the first place), but >> discussion on that point is ongoing. >> >> It also fixes a big performance issue in 3.18 and earlier in which >> the lsl instructions in vclock_gettime got hoisted so far up the >> function that they happened even when the function they were in was >> never called. n 3.19, the performance issue seems to be gone due to >> the whims of my compiler and some interaction with a branch that's >> now gone. >> >> I'll hopefully have a much bigger overhaul of the pvclock code >> for 3.20, but it needs careful review. >> >> ---------------------------------------------------------------- >> Andy Lutomirski (1): >> x86, vdso: Use asm volatile in __getcpu >> >> arch/x86/include/asm/vgtod.h | 6 ++++-- >> 1 file changed, 4 insertions(+), 2 deletions(-) > > Pulled into tip:x86/urgent, thanks Andy! > > I've attached the full commit below - I'm not sure it was sent to > lkml. > It's here: http://lkml.kernel.org/g/cover.1419295081.git.luto@amacapital.net It was part of an "RFC" series, so it may have slipped under the rader. This patch is Obviously Correct (tm) and was acked even in RFC form, so I split it out. The rest needs much more careful review from the paravirt people and might be ready in time for 3.20. --Andy > Thanks, > > Ingo > > ====================> > From 1ddf0b1b11aa8a90cef6706e935fc31c75c406ba Mon Sep 17 00:00:00 2001 > From: Andy Lutomirski <luto@amacapital.net> > Date: Sun, 21 Dec 2014 08:57:46 -0800 > Subject: [PATCH] x86, vdso: Use asm volatile in __getcpu > > In Linux 3.18 and below, GCC hoists the lsl instructions in the > pvclock code all the way to the beginning of __vdso_clock_gettime, > slowing the non-paravirt case significantly. For unknown reasons, > presumably related to the removal of a branch, the performance issue > is gone as of > > e76b027e6408 x86,vdso: Use LSL unconditionally for vgetcpu > > but I don't trust GCC enough to expect the problem to stay fixed. > > There should be no correctness issue, because the __getcpu calls in > __vdso_vlock_gettime were never necessary in the first place. > > Note to stable maintainers: In 3.18 and below, depending on > configuration, gcc 4.9.2 generates code like this: > > 9c3: 44 0f 03 e8 lsl %ax,%r13d > 9c7: 45 89 eb mov %r13d,%r11d > 9ca: 0f 03 d8 lsl %ax,%ebx > > This patch won't apply as is to any released kernel, but I'll send a > trivial backported version if needed. > > Fixes: 51c19b4f5927 x86: vdso: pvclock gettime support > Cc: stable@vger.kernel.org # 3.8+ > Cc: Marcelo Tosatti <mtosatti@redhat.com> > Acked-by: Paolo Bonzini <pbonzini@redhat.com> > Signed-off-by: Andy Lutomirski <luto@amacapital.net> > --- > arch/x86/include/asm/vgtod.h | 6 ++++-- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/arch/x86/include/asm/vgtod.h b/arch/x86/include/asm/vgtod.h > index e7e9682a33e9..f556c4843aa1 100644 > --- a/arch/x86/include/asm/vgtod.h > +++ b/arch/x86/include/asm/vgtod.h > @@ -80,9 +80,11 @@ static inline unsigned int __getcpu(void) > > /* > * Load per CPU data from GDT. LSL is faster than RDTSCP and > - * works on all CPUs. > + * works on all CPUs. This is volatile so that it orders > + * correctly wrt barrier() and to keep gcc from cleverly > + * hoisting it out of the calling function. > */ > - asm("lsl %1,%0" : "=r" (p) : "r" (__PER_CPU_SEG)); > + asm volatile ("lsl %1,%0" : "=r" (p) : "r" (__PER_CPU_SEG)); > > return p; > } -- Andy Lutomirski AMA Capital Management, LLC ^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2015-01-02 18:19 UTC | newest] Thread overview: 3+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2014-12-23 21:19 [GIT PULL] One more vdso fix for 3.19 Andy Lutomirski 2015-01-01 21:23 ` Ingo Molnar 2015-01-02 18:19 ` Andy Lutomirski
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.