public inbox for linux-kernel@vger.kernel.org
* [PATCH v2] x86/vdso: Use RDPID in preference to LSL when available
@ 2016-04-21  1:16 Andy Lutomirski
  2016-04-21 12:16 ` Borislav Petkov
  0 siblings, 1 reply; 4+ messages in thread
From: Andy Lutomirski @ 2016-04-21  1:16 UTC
  To: x86; +Cc: linux-kernel@vger.kernel.org, Borislav Petkov, Andy Lutomirski

RDPID is a new instruction that reads MSR_TSC_AUX quickly.  This
should be considerably faster than reading the GDT.  Add a
cpufeature for it and use it from __vdso_getcpu when available.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---

I don't have a Cannonlake CPU (or whatever CPU I'd need for this).
Could someone who has such a beast give this a try?

Boris, could you double-check me?  You're a lot more familiar with
CPUID stuff and the instruction tables than I am, and I can't fall
back to a real disassembler because the instruction is too new.
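
For anyone checking on hardware, here is a minimal user-space sketch of the
CPUID test and of how the feature number in the patch splits into a word and
a bit.  The helper names are made up for illustration and are not kernel
code; the only facts taken from the patch are the leaf (0x7, subleaf 0, ECX)
and the bit position (22).

```c
#include <stdbool.h>
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>	/* GCC/clang helper header, x86 only */
#endif

/* Same encoding as the patch: word 16, bit 22. */
#define FEATURE_RDPID (16 * 32 + 22)

/* Index of the 32-bit capability word that holds the feature. */
static inline unsigned int feature_word(unsigned int feature)
{
	return feature / 32;
}

/* Bit position of the feature inside its capability word. */
static inline unsigned int feature_bit(unsigned int feature)
{
	return feature % 32;
}

/* Test the RDPID bit in a raw CPUID.(EAX=7,ECX=0):ECX value. */
static inline bool ecx_has_rdpid(uint32_t leaf7_ecx)
{
	return (leaf7_ecx >> feature_bit(FEATURE_RDPID)) & 1;
}

#if defined(__x86_64__) || defined(__i386__)
/* Ask the running CPU; returns false if leaf 7 is not implemented. */
static inline bool cpu_has_rdpid(void)
{
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
		return false;
	return ecx_has_rdpid(ecx);
}
#endif
```

Calling the hypothetical cpu_has_rdpid() on a test box says whether the
RDPID alternative would be patched in there.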

Also, it's time for someone to do UMIP.  I'll see if I can convince
someone in KVM land to emulate it to make it easier to test.

Changes from v1:
 - Remove rdpid() from special_instructions.h.  (Was a leftover -- sorry.)

 arch/x86/include/asm/cpufeatures.h | 1 +
 arch/x86/include/asm/vgtod.h       | 7 ++++++-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 7bfb6b70c745..beaf2fb601ee 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -279,6 +279,7 @@
 /* Intel-defined CPU features, CPUID level 0x00000007:0 (ecx), word 16 */
 #define X86_FEATURE_PKU		(16*32+ 3) /* Protection Keys for Userspace */
 #define X86_FEATURE_OSPKE	(16*32+ 4) /* OS Protection Keys Enable */
+#define X86_FEATURE_RDPID	(16*32+ 22) /* RDPID instruction */
 
 /*
  * BUG word(s)
diff --git a/arch/x86/include/asm/vgtod.h b/arch/x86/include/asm/vgtod.h
index e728699db774..3a01996db58f 100644
--- a/arch/x86/include/asm/vgtod.h
+++ b/arch/x86/include/asm/vgtod.h
@@ -89,8 +89,13 @@ static inline unsigned int __getcpu(void)
 	 * works on all CPUs.  This is volatile so that it orders
 	 * correctly wrt barrier() and to keep gcc from cleverly
 	 * hoisting it out of the calling function.
+	 *
+	 * If RDPID is available, use it.
 	 */
-	asm volatile ("lsl %1,%0" : "=r" (p) : "r" (__PER_CPU_SEG));
+	alternative_io ("lsl %[seg],%[p]",
+			".byte 0xf3,0x0f,0xc7,0xf8", /* RDPID %eax/rax */
+			X86_FEATURE_RDPID,
+			[p] "=a" (p), [seg] "r" (__PER_CPU_SEG));
 
 	return p;
 }
-- 
2.5.5


* Re: [PATCH v2] x86/vdso: Use RDPID in preference to LSL when available
  2016-04-21  1:16 [PATCH v2] x86/vdso: Use RDPID in preference to LSL when available Andy Lutomirski
@ 2016-04-21 12:16 ` Borislav Petkov
  2016-04-21 15:25   ` Andy Lutomirski
  0 siblings, 1 reply; 4+ messages in thread
From: Borislav Petkov @ 2016-04-21 12:16 UTC
  To: Andy Lutomirski; +Cc: x86, linux-kernel@vger.kernel.org

On Wed, Apr 20, 2016 at 06:16:01PM -0700, Andy Lutomirski wrote:
> Also, it's time for someone to do UMIP.  I'll see if I can convince
> someone in KVM land to emulate it to make it easier to test.

That'll be fun - we can simply set that bit in CR4 and see who screams
:-P

> Changes from v1:
>  - Remove rdpid() from special_instructions.h.  (Was a leftover -- sorry.)
> 
> arch/x86/include/asm/cpufeatures.h | 1 +
>  arch/x86/include/asm/vgtod.h       | 7 ++++++-
>  2 files changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> index 7bfb6b70c745..beaf2fb601ee 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -279,6 +279,7 @@
>  /* Intel-defined CPU features, CPUID level 0x00000007:0 (ecx), word 16 */
>  #define X86_FEATURE_PKU		(16*32+ 3) /* Protection Keys for Userspace */
>  #define X86_FEATURE_OSPKE	(16*32+ 4) /* OS Protection Keys Enable */
> +#define X86_FEATURE_RDPID	(16*32+ 22) /* RDPID instruction */
>  
>  /*
>   * BUG word(s)
> diff --git a/arch/x86/include/asm/vgtod.h b/arch/x86/include/asm/vgtod.h
> index e728699db774..3a01996db58f 100644
> --- a/arch/x86/include/asm/vgtod.h
> +++ b/arch/x86/include/asm/vgtod.h
> @@ -89,8 +89,13 @@ static inline unsigned int __getcpu(void)
>  	 * works on all CPUs.  This is volatile so that it orders
>  	 * correctly wrt barrier() and to keep gcc from cleverly
>  	 * hoisting it out of the calling function.
> +	 *
> +	 * If RDPID is available, use it.
>  	 */
> -	asm volatile ("lsl %1,%0" : "=r" (p) : "r" (__PER_CPU_SEG));
> +	alternative_io ("lsl %[seg],%[p]",
> +			".byte 0xf3,0x0f,0xc7,0xf8", /* RDPID %eax/rax */

AFAICT, 0xf8 is correct, if I'm reading the SDM right:

bits [7:6] must be 11b for opcode group 9 and RDPID is in the 11b row,
bits [5:3] are ModRM.reg and they need to be 111b for RDPID (0x7 column)
and the last three [2:0] select the register and they must be 000b for
rAX.
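
The same field split, written out as a small user-space C sketch.  The
struct and function names are invented for illustration and are not taken
from the kernel's instruction decoder; only the bit layout described above
is assumed.

```c
#include <stdint.h>

/* Fields of an x86 ModRM byte. */
struct modrm {
	uint8_t mod;	/* bits [7:6]: 11b means register-direct */
	uint8_t reg;	/* bits [5:3]: opcode extension within the group */
	uint8_t rm;	/* bits [2:0]: register operand, 000b is rAX */
};

/* Split a raw ModRM byte into its three fields. */
static inline struct modrm decode_modrm(uint8_t byte)
{
	struct modrm m = {
		.mod = (byte >> 6) & 0x3,
		.reg = (byte >> 3) & 0x7,
		.rm  = byte & 0x7,
	};

	return m;
}
```

Feeding it the last byte of the sequence, 0xf8, gives mod = 3, reg = 7 and
rm = 0, which matches the reading above.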

HOWEVER, you need to make the asm output register constraint "=a"
because you're specifying rAX as a destination register for RDPID.

Also, I'm wondering: should we supply that alternative in a separate
inline function in special_instructions.h for wider use? I.e., something
like read_cpu_num() or so...

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.


* Re: [PATCH v2] x86/vdso: Use RDPID in preference to LSL when available
  2016-04-21 12:16 ` Borislav Petkov
@ 2016-04-21 15:25   ` Andy Lutomirski
  2016-04-21 16:26     ` Borislav Petkov
  0 siblings, 1 reply; 4+ messages in thread
From: Andy Lutomirski @ 2016-04-21 15:25 UTC
  To: Borislav Petkov; +Cc: Andy Lutomirski, X86 ML, linux-kernel@vger.kernel.org

On Thu, Apr 21, 2016 at 5:16 AM, Borislav Petkov <bp@alien8.de> wrote:
> On Wed, Apr 20, 2016 at 06:16:01PM -0700, Andy Lutomirski wrote:
>> Also, it's time for someone to do UMIP.  I'll see if I can convince
>> someone in KVM land to emulate it to make it easier to test.
>
> That'll be fun - we can simply set that bit in CR4 and see who screams
> :-P
>
>> Changes from v1:
>>  - Remove rdpid() from special_instructions.h.  (Was a leftover -- sorry.)
>>
>> arch/x86/include/asm/cpufeatures.h | 1 +
>>  arch/x86/include/asm/vgtod.h       | 7 ++++++-
>>  2 files changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
>> index 7bfb6b70c745..beaf2fb601ee 100644
>> --- a/arch/x86/include/asm/cpufeatures.h
>> +++ b/arch/x86/include/asm/cpufeatures.h
>> @@ -279,6 +279,7 @@
>>  /* Intel-defined CPU features, CPUID level 0x00000007:0 (ecx), word 16 */
>>  #define X86_FEATURE_PKU              (16*32+ 3) /* Protection Keys for Userspace */
>>  #define X86_FEATURE_OSPKE    (16*32+ 4) /* OS Protection Keys Enable */
>> +#define X86_FEATURE_RDPID    (16*32+ 22) /* RDPID instruction */
>>
>>  /*
>>   * BUG word(s)
>> diff --git a/arch/x86/include/asm/vgtod.h b/arch/x86/include/asm/vgtod.h
>> index e728699db774..3a01996db58f 100644
>> --- a/arch/x86/include/asm/vgtod.h
>> +++ b/arch/x86/include/asm/vgtod.h
>> @@ -89,8 +89,13 @@ static inline unsigned int __getcpu(void)
>>        * works on all CPUs.  This is volatile so that it orders
>>        * correctly wrt barrier() and to keep gcc from cleverly
>>        * hoisting it out of the calling function.
>> +      *
>> +      * If RDPID is available, use it.
>>        */
>> -     asm volatile ("lsl %1,%0" : "=r" (p) : "r" (__PER_CPU_SEG));
>> +     alternative_io ("lsl %[seg],%[p]",
>> +                     ".byte 0xf3,0x0f,0xc7,0xf8", /* RDPID %eax/rax */
>
> AFAICT, 0xf8 is correct, if I'm reading the SDM right:
>
> bits [7:6] must be 11b for opcode group 9 and RDPID is in the 11b row,
> bits [5:3] are ModRM.reg and they need to be 111b for RDPID (0x7 column)
> and the last three [2:0] select the register and they must be 000b for
> rAX.
>
> HOWEVER, you need to make the asm output register constraint "=a"
> because you're specifying rAX as a destination register for RDPID.

Didn't I?

>
> Also, I'm wondering: should we supply that alternative in a separate
> inline function in special_instructions.h for wider use? I.e., something
> like read_cpu_num() or so...
>

I thought about it, and there were two reasons:

1. I don't think we want to use __getcpu in the kernel.  LSL is fairly
slow, and we'd still need to mask off the node number.
raw_smp_processor_id(), in contrast, is a single load.

2. I have no way to benchmark this thing.  I'm assuming the RDPID will
be faster than LSL, but that doesn't mean it's faster than a load.
(It could be -- it will save a cache line.)

So we might actually want something that does an alternative where the
two choices are the percpu load and RDPID ; AND, but that wouldn't end
up sharing code.  But I'll leave that to someone with an actual
RDPID-supporting CPU :)
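
To make the masking in point 1 concrete, here is a rough sketch.  It assumes
the value read back by LSL or RDPID packs the CPU number into the low 12
bits with the node number above it; that split and all of the names below
are assumptions for illustration, so check them against vgtod.h before
relying on them.

```c
#include <stdint.h>

/* Assumed layout: CPU number in bits [11:0], node number above it. */
#define CPU_MASK	0xfffu
#define NODE_SHIFT	12

/* Pack a CPU and node number the way the per-CPU value is assumed to be. */
static inline uint32_t encode_cpu_node(unsigned int cpu, unsigned int node)
{
	return ((uint32_t)node << NODE_SHIFT) | (cpu & CPU_MASK);
}

/* The "AND" step: mask off the node number to get the CPU number. */
static inline unsigned int decode_cpu(uint32_t p)
{
	return p & CPU_MASK;
}

/* The node number is whatever sits above the CPU field. */
static inline unsigned int decode_node(uint32_t p)
{
	return p >> NODE_SHIFT;
}
```

An in-kernel user would need the decode_cpu() step after RDPID, which is the
extra AND that a plain percpu load avoids.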

--Andy


* Re: [PATCH v2] x86/vdso: Use RDPID in preference to LSL when available
  2016-04-21 15:25   ` Andy Lutomirski
@ 2016-04-21 16:26     ` Borislav Petkov
  0 siblings, 0 replies; 4+ messages in thread
From: Borislav Petkov @ 2016-04-21 16:26 UTC
  To: Andy Lutomirski; +Cc: Andy Lutomirski, X86 ML, linux-kernel@vger.kernel.org

On Thu, Apr 21, 2016 at 08:25:45AM -0700, Andy Lutomirski wrote:
> Didn't I?

Bah, I cut off the line which has the "=a" and then did the commenting.
Sorry about the noise.

> I thought about it, and there were two reasons:
> 
> 1. I don't think we want to use __getcpu in the kernel.  LSL is fairly
> slow, and we'd still need to mask off the node number.
> raw_smp_processor_id(), in contrast, is a single load.

Right.

> 2. I have no way to benchmark this thing.  I'm assuming the RDPID will
> be faster than LSL, but that doesn't mean it's faster than a load.
> (It could be -- it will save a cache line.)

But the RDPID reads an MSR. So it probably is microcode and thus slower
than a load... I guess one of the reasons for the RDPID is to avoid the
serialization cost of RDTSCP.

> So we might actually want something that does an alternative where the
> two choices are the percpu load and RDPID ; AND, but that wouldn't end
> up sharing code.  But I'll leave that to someone with an actual
> RDPID-supporting CPU :)

Right.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

