* [PATCH 0/4] arm64: advertise availability of CRC and crypto instructions
@ 2013-12-16 21:04 Ard Biesheuvel
2013-12-16 21:04 ` [PATCH 1/4] arm64: drop redundant macros from read_cpuid() Ard Biesheuvel
` (4 more replies)
0 siblings, 5 replies; 37+ messages in thread
From: Ard Biesheuvel @ 2013-12-16 21:04 UTC (permalink / raw)
To: linux-arm-kernel
This series is an expansion of the patch posted by Steve Capper about 6 weeks
ago that allocates hwcaps bits for CRC and Crypto Extensions instructions so
userland can discover whether the current CPU has any of those capabilities.
Patch #1 is a cleanup patch for read_cpuid(), which allowed me to skip adding
yet another #define to asm/cputype.h (for ID_ISAR5_EL1).
Patch #2 is Steve's original patch, but slightly tweaked because hwcaps bit 2
has been allocated for something else in the meantime.
Patch #3 allocates the capability bits in the arch/arm tree. This is necessary
because 32-bit ARM binaries can execute both under ARM and under arm64 kernels,
so there should be agreement about the meaning of feature bits, even if those
features don't actually exist on systems covered by the arch/arm tree.
@Russell: if this looks ok to you, could you please indicate whether you prefer
to take this patch separately, or ack it and let it be merged as part of the
series?
Patch #4 advertises the CRC and Crypto Extensions to 32-bit binaries running
under an arm64 kernel.
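For illustration, here is a minimal sketch of how a 64-bit userland program might interpret these bits once they are exposed. The bit positions are the ones proposed in patch #2; the helper name describe_hwcaps() and its buffer interface are made up for this example and are not part of the series.

```c
#include <stddef.h>
#include <string.h>

/* Bit positions as proposed in patch #2 of this series (arm64 native). */
#define HWCAP_AES	(1UL << 3)
#define HWCAP_PMULL	(1UL << 4)
#define HWCAP_SHA1	(1UL << 5)
#define HWCAP_SHA2	(1UL << 6)
#define HWCAP_CRC32	(1UL << 7)

/*
 * Hypothetical helper: write a space-separated list of the crypto/CRC
 * feature names present in 'hwcap' into 'buf' (at most 'len' bytes,
 * always NUL-terminated when len > 0). Returns 'buf'.
 */
static char *describe_hwcaps(unsigned long hwcap, char *buf, size_t len)
{
	static const struct {
		unsigned long bit;
		const char *name;
	} caps[] = {
		{ HWCAP_AES,   "aes"   },
		{ HWCAP_PMULL, "pmull" },
		{ HWCAP_SHA1,  "sha1"  },
		{ HWCAP_SHA2,  "sha2"  },
		{ HWCAP_CRC32, "crc32" },
	};
	size_t i;

	if (len == 0)
		return buf;
	buf[0] = '\0';

	for (i = 0; i < sizeof(caps) / sizeof(caps[0]); i++) {
		if (!(hwcap & caps[i].bit))
			continue;
		/* Separate names with a single space. */
		if (buf[0] != '\0')
			strncat(buf, " ", len - strlen(buf) - 1);
		strncat(buf, caps[i].name, len - strlen(buf) - 1);
	}
	return buf;
}
```

On a Linux system with a reasonably recent glibc, the value to pass would typically come from getauxval(AT_HWCAP) in <sys/auxv.h>; the names mirror the strings patch #2 adds to hwcap_str[].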
Ard Biesheuvel (3):
arm64: drop redundant macros from read_cpuid()
ARM: allocate hwcaps bits for v8 crypto extensions
arm64: add 32-bit compat hwcaps for v8 crypto extensions
Steve Capper (1):
arm64: Add hwcaps for crypto and CRC32 extensions.
arch/arm/include/uapi/asm/hwcap.h | 5 +++
arch/arm/kernel/setup.c | 5 +++
arch/arm64/include/asm/cputype.h | 18 +++-------
arch/arm64/include/asm/hwcap.h | 6 ++++
arch/arm64/include/uapi/asm/hwcap.h | 6 +++-
arch/arm64/kernel/setup.c | 69 +++++++++++++++++++++++++++++++++++++
6 files changed, 94 insertions(+), 15 deletions(-)
--
1.8.3.2
^ permalink raw reply [flat|nested] 37+ messages in thread
* [PATCH 1/4] arm64: drop redundant macros from read_cpuid()
2013-12-16 21:04 [PATCH 0/4] arm64: advertise availability of CRC and crypto instructions Ard Biesheuvel
@ 2013-12-16 21:04 ` Ard Biesheuvel
2013-12-17 12:04 ` Catalin Marinas
2013-12-16 21:04 ` [PATCH 2/4] arm64: Add hwcaps for crypto and CRC32 extensions Ard Biesheuvel
` (3 subsequent siblings)
4 siblings, 1 reply; 37+ messages in thread
From: Ard Biesheuvel @ 2013-12-16 21:04 UTC (permalink / raw)
To: linux-arm-kernel
asm/cputype.h contains a bunch of #defines for CPU id registers
that essentially map to themselves. Remove the #defines and pass
the tokens directly to the inline asm() that reads the registers.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
arch/arm64/include/asm/cputype.h | 18 ++++--------------
1 file changed, 4 insertions(+), 14 deletions(-)
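A note on the mechanism, as a standalone sketch rather than kernel code: the '#' preprocessor operator turns the macro argument into a string literal without macro-expanding it first, so the register name reaches the assembler exactly as spelled at the call site. MRS_INSN, SOME_ALIAS and same_insn() below are hypothetical names used only to show this.

```c
#include <string.h>

/*
 * Hypothetical stand-in for the string that read_cpuid() passes to asm():
 * '#reg' becomes a string literal and is pasted onto "mrs %0, ".
 */
#define MRS_INSN(reg) "mrs %0, " #reg

/*
 * An object-like macro passed as the argument is stringified by name,
 * not expanded to its replacement text.
 */
#define SOME_ALIAS midr_el1

/* Returns 1 if the two instruction strings are identical. */
static int same_insn(const char *a, const char *b)
{
	return strcmp(a, b) == 0;
}
```

With this, MRS_INSN(MIDR_EL1) yields "mrs %0, MIDR_EL1" and MRS_INSN(SOME_ALIAS) yields "mrs %0, SOME_ALIAS", which is why the old ID_* string macros can no longer be passed to read_cpuid() after this change.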
diff --git a/arch/arm64/include/asm/cputype.h b/arch/arm64/include/asm/cputype.h
index 5fe138e..e1af1b4 100644
--- a/arch/arm64/include/asm/cputype.h
+++ b/arch/arm64/include/asm/cputype.h
@@ -16,23 +16,13 @@
#ifndef __ASM_CPUTYPE_H
#define __ASM_CPUTYPE_H
-#define ID_MIDR_EL1 "midr_el1"
-#define ID_MPIDR_EL1 "mpidr_el1"
-#define ID_CTR_EL0 "ctr_el0"
-
-#define ID_AA64PFR0_EL1 "id_aa64pfr0_el1"
-#define ID_AA64DFR0_EL1 "id_aa64dfr0_el1"
-#define ID_AA64AFR0_EL1 "id_aa64afr0_el1"
-#define ID_AA64ISAR0_EL1 "id_aa64isar0_el1"
-#define ID_AA64MMFR0_EL1 "id_aa64mmfr0_el1"
-
#define INVALID_HWID ULONG_MAX
#define MPIDR_HWID_BITMASK 0xff00ffffff
#define read_cpuid(reg) ({ \
u64 __val; \
- asm("mrs %0, " reg : "=r" (__val)); \
+ asm("mrs %0, " #reg : "=r" (__val)); \
__val; \
})
@@ -54,12 +44,12 @@
*/
static inline u32 __attribute_const__ read_cpuid_id(void)
{
- return read_cpuid(ID_MIDR_EL1);
+ return read_cpuid(MIDR_EL1);
}
static inline u64 __attribute_const__ read_cpuid_mpidr(void)
{
- return read_cpuid(ID_MPIDR_EL1);
+ return read_cpuid(MPIDR_EL1);
}
static inline unsigned int __attribute_const__ read_cpuid_implementor(void)
@@ -74,7 +64,7 @@ static inline unsigned int __attribute_const__ read_cpuid_part_number(void)
static inline u32 __attribute_const__ read_cpuid_cachetype(void)
{
- return read_cpuid(ID_CTR_EL0);
+ return read_cpuid(CTR_EL0);
}
#endif /* __ASSEMBLY__ */
--
1.8.3.2
* [PATCH 2/4] arm64: Add hwcaps for crypto and CRC32 extensions.
2013-12-16 21:04 [PATCH 0/4] arm64: advertise availability of CRC and crypto instructions Ard Biesheuvel
2013-12-16 21:04 ` [PATCH 1/4] arm64: drop redundant macros from read_cpuid() Ard Biesheuvel
@ 2013-12-16 21:04 ` Ard Biesheuvel
2013-12-17 12:08 ` Catalin Marinas
2013-12-16 21:04 ` [PATCH 3/4] ARM: allocate hwcaps bits for v8 crypto extensions Ard Biesheuvel
` (2 subsequent siblings)
4 siblings, 1 reply; 37+ messages in thread
From: Ard Biesheuvel @ 2013-12-16 21:04 UTC (permalink / raw)
To: linux-arm-kernel
From: Steve Capper <steve.capper@linaro.org>
Advertise the optional cryptographic and CRC32 instructions to
user space where present. Several hwcap bits [3-7] are allocated.
Signed-off-by: Steve Capper <steve.capper@linaro.org>
[bit 2 is taken now so use bits 3-7 instead]
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
arch/arm64/include/uapi/asm/hwcap.h | 6 +++++-
arch/arm64/kernel/setup.c | 37 +++++++++++++++++++++++++++++++++++++
2 files changed, 42 insertions(+), 1 deletion(-)
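The field decoding in the setup.c hunk can be exercised outside the kernel. The sketch below restates the same logic as a pure function over a 64-bit register value, assuming the field offsets used in the patch (AES at bit 4, SHA1 at 8, SHA2 at 12, CRC32 at 16); isar_field() and hwcaps_from_isar0() are hypothetical names, and the switch fall-through is written as comparisons for readability.

```c
#include <stdint.h>

/* Bit positions as allocated by this patch. */
#define HWCAP_AES	(1UL << 3)
#define HWCAP_PMULL	(1UL << 4)
#define HWCAP_SHA1	(1UL << 5)
#define HWCAP_SHA2	(1UL << 6)
#define HWCAP_CRC32	(1UL << 7)

/* Extract one 4-bit feature block starting at bit 'shift'. */
static unsigned int isar_field(uint64_t features, unsigned int shift)
{
	return (features >> shift) & 0xf;
}

/* A block with bit 3 set is a negative (reserved) value. */
static int field_present(unsigned int block)
{
	return block != 0 && !(block & 0x8);
}

/* Hypothetical helper: map an ID_AA64ISAR0_EL1 value to hwcap bits. */
static unsigned long hwcaps_from_isar0(uint64_t features)
{
	unsigned long hwcap = 0;
	unsigned int block;

	/* AES block: 1 means AES only, 2 or higher adds PMULL. */
	block = isar_field(features, 4);
	if (field_present(block)) {
		hwcap |= HWCAP_AES;
		if (block >= 2)
			hwcap |= HWCAP_PMULL;
	}

	if (field_present(isar_field(features, 8)))
		hwcap |= HWCAP_SHA1;
	if (field_present(isar_field(features, 12)))
		hwcap |= HWCAP_SHA2;
	if (field_present(isar_field(features, 16)))
		hwcap |= HWCAP_CRC32;

	return hwcap;
}
```

For example, a register value of 0x11120 has the AES block set to 2 and the other three blocks set to 1, so all five bits (3-7) end up set; a block value of 8 or above is treated as reserved and contributes nothing.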
diff --git a/arch/arm64/include/uapi/asm/hwcap.h b/arch/arm64/include/uapi/asm/hwcap.h
index 9b12476..73cf0f5 100644
--- a/arch/arm64/include/uapi/asm/hwcap.h
+++ b/arch/arm64/include/uapi/asm/hwcap.h
@@ -22,6 +22,10 @@
#define HWCAP_FP (1 << 0)
#define HWCAP_ASIMD (1 << 1)
#define HWCAP_EVTSTRM (1 << 2)
-
+#define HWCAP_AES (1 << 3)
+#define HWCAP_PMULL (1 << 4)
+#define HWCAP_SHA1 (1 << 5)
+#define HWCAP_SHA2 (1 << 6)
+#define HWCAP_CRC32 (1 << 7)
#endif /* _UAPI__ASM_HWCAP_H */
diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index 0bc5e4c..961c961 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -116,6 +116,7 @@ bool arch_match_cpu_phys_id(int cpu, u64 phys_id)
static void __init setup_processor(void)
{
struct cpu_info *cpu_info;
+ u64 features, block;
/*
* locate processor in the list of supported processor
@@ -136,6 +137,37 @@ static void __init setup_processor(void)
sprintf(init_utsname()->machine, ELF_PLATFORM);
elf_hwcap = 0;
+
+ /*
+ * ID_AA64ISAR0_EL1 contains 4-bit wide signed feature blocks.
+ * The blocks we test below represent incremental functionality
+ * for non-negative values. Negative values are reserved.
+ */
+ features = read_cpuid(ID_AA64ISAR0_EL1);
+ block = (features >> 4) & 0xf;
+ if (!(block & 0x8)) {
+ switch (block) {
+ default:
+ case 2:
+ elf_hwcap |= HWCAP_PMULL;
+ case 1:
+ elf_hwcap |= HWCAP_AES;
+ case 0:
+ break;
+ }
+ }
+
+ block = (features >> 8) & 0xf;
+ if (block && !(block & 0x8))
+ elf_hwcap |= HWCAP_SHA1;
+
+ block = (features >> 12) & 0xf;
+ if (block && !(block & 0x8))
+ elf_hwcap |= HWCAP_SHA2;
+
+ block = (features >> 16) & 0xf;
+ if (block && !(block & 0x8))
+ elf_hwcap |= HWCAP_CRC32;
}
static void __init setup_machine_fdt(phys_addr_t dt_phys)
@@ -270,6 +302,11 @@ static const char *hwcap_str[] = {
"fp",
"asimd",
"evtstrm",
+ "aes",
+ "pmull",
+ "sha1",
+ "sha2",
+ "crc32",
NULL
};
--
1.8.3.2
* [PATCH 3/4] ARM: allocate hwcaps bits for v8 crypto extensions
2013-12-16 21:04 [PATCH 0/4] arm64: advertise availability of CRC and crypto instructions Ard Biesheuvel
2013-12-16 21:04 ` [PATCH 1/4] arm64: drop redundant macros from read_cpuid() Ard Biesheuvel
2013-12-16 21:04 ` [PATCH 2/4] arm64: Add hwcaps for crypto and CRC32 extensions Ard Biesheuvel
@ 2013-12-16 21:04 ` Ard Biesheuvel
2013-12-16 21:04 ` [PATCH 4/4] arm64: add 32-bit compat hwcaps " Ard Biesheuvel
2013-12-17 12:25 ` [PATCH 0/4] arm64: advertise availability of CRC and crypto instructions Catalin Marinas
4 siblings, 0 replies; 37+ messages in thread
From: Ard Biesheuvel @ 2013-12-16 21:04 UTC (permalink / raw)
To: linux-arm-kernel
ARM binaries running under an arm64 kernel are able to use
the special 32 bit versions of the v8 Crypto Extensions.
Even if the ARM port itself does not cover systems with such
capabilities, the allocation of 32-bit hwcaps bits should be
aligned between ARM and arm64 so a 32-bit userland does not need
to care about the difference.
This patch allocates bits 22-26 for the Crypto Extensions AES,
PMULL.64, SHA1, SHA2 and CRC32 respectively.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
arch/arm/include/uapi/asm/hwcap.h | 5 +++++
arch/arm/kernel/setup.c | 5 +++++
2 files changed, 10 insertions(+)
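To show what the shared allocation buys a 32-bit userland, here is a small sketch of a runtime dispatch that would behave the same under an arch/arm kernel and under an arm64 kernel in compat mode. It assumes bit 22 for AES as proposed in this patch (the value in the installed <asm/hwcap.h> should be treated as authoritative); PROPOSED_HWCAP_AES, enum aes_impl and pick_aes_impl() are illustrative names only.

```c
/* Bit position proposed by this patch for 32-bit ARM; an assumption here. */
#define PROPOSED_HWCAP_AES	(1UL << 22)

/* Hypothetical set of implementations a 32-bit library might choose from. */
enum aes_impl {
	AES_IMPL_GENERIC,	/* plain C or NEON fallback */
	AES_IMPL_CE		/* v8 Crypto Extensions instructions */
};

/*
 * Pick an implementation from the 32-bit hwcap word. The caller does not
 * need to know which kernel port supplied the value.
 */
static enum aes_impl pick_aes_impl(unsigned long hwcap)
{
	return (hwcap & PROPOSED_HWCAP_AES) ? AES_IMPL_CE : AES_IMPL_GENERIC;
}
```

Under an arch/arm kernel the bit would simply never be set, so the same binary falls back to the generic path without any port-specific checks.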
diff --git a/arch/arm/include/uapi/asm/hwcap.h b/arch/arm/include/uapi/asm/hwcap.h
index 7dcc10d..0726024 100644
--- a/arch/arm/include/uapi/asm/hwcap.h
+++ b/arch/arm/include/uapi/asm/hwcap.h
@@ -27,5 +27,10 @@
#define HWCAP_IDIV (HWCAP_IDIVA | HWCAP_IDIVT)
#define HWCAP_LPAE (1 << 20)
#define HWCAP_EVTSTRM (1 << 21)
+#define HWCAP_AES (1 << 22)
+#define HWCAP_PMULL (1 << 23)
+#define HWCAP_SHA1 (1 << 24)
+#define HWCAP_SHA2 (1 << 25)
+#define HWCAP_CRC32 (1 << 26)
#endif /* _UAPI__ASMARM_HWCAP_H */
diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c
index 6a1b8a8..57e6b5e 100644
--- a/arch/arm/kernel/setup.c
+++ b/arch/arm/kernel/setup.c
@@ -990,6 +990,11 @@ static const char *hwcap_str[] = {
"vfpd32",
"lpae",
"evtstrm",
+ "aes",
+ "pmull",
+ "sha1",
+ "sha2",
+ "crc32",
NULL
};
--
1.8.3.2
* [PATCH 4/4] arm64: add 32-bit compat hwcaps for v8 crypto extensions
2013-12-16 21:04 [PATCH 0/4] arm64: advertise availability of CRC and crypto instructions Ard Biesheuvel
` (2 preceding siblings ...)
2013-12-16 21:04 ` [PATCH 3/4] ARM: allocate hwcaps bits for v8 crypto extensions Ard Biesheuvel
@ 2013-12-16 21:04 ` Ard Biesheuvel
2013-12-17 12:25 ` [PATCH 0/4] arm64: advertise availability of CRC and crypto instructions Catalin Marinas
4 siblings, 0 replies; 37+ messages in thread
From: Ard Biesheuvel @ 2013-12-16 21:04 UTC (permalink / raw)
To: linux-arm-kernel
The ARMv8 Crypto Extensions may also be available to userland
processes running in 32-bit mode. Allocate the compat bits and
set them at boot.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
arch/arm64/include/asm/hwcap.h | 6 ++++++
arch/arm64/kernel/setup.c | 32 ++++++++++++++++++++++++++++++++
2 files changed, 38 insertions(+)
diff --git a/arch/arm64/include/asm/hwcap.h b/arch/arm64/include/asm/hwcap.h
index 6cddbb0..4ae8c69 100644
--- a/arch/arm64/include/asm/hwcap.h
+++ b/arch/arm64/include/asm/hwcap.h
@@ -31,6 +31,12 @@
#define COMPAT_HWCAP_IDIVT (1 << 18)
#define COMPAT_HWCAP_IDIV (COMPAT_HWCAP_IDIVA|COMPAT_HWCAP_IDIVT)
#define COMPAT_HWCAP_EVTSTRM (1 << 21)
+#define COMPAT_HWCAP_EVTSTRM (1 << 21)
+#define COMPAT_HWCAP_AES (1 << 22)
+#define COMPAT_HWCAP_PMULL (1 << 23)
+#define COMPAT_HWCAP_SHA1 (1 << 24)
+#define COMPAT_HWCAP_SHA2 (1 << 25)
+#define COMPAT_HWCAP_CRC32 (1 << 26)
#ifndef __ASSEMBLY__
/*
diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index 961c961..283039d 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -168,6 +168,38 @@ static void __init setup_processor(void)
block = (features >> 16) & 0xf;
if (block && !(block & 0x8))
elf_hwcap |= HWCAP_CRC32;
+
+#ifdef CONFIG_COMPAT
+ /*
+ * ID_ISAR5_EL1 carries similar information as above, but pertaining to
+ * the Aarch32 32-bit execution state.
+ */
+ features = read_cpuid(ID_ISAR5_EL1);
+ block = (features >> 4) & 0xf;
+ if (!(block & 0x8)) {
+ switch (block) {
+ default:
+ case 2:
+ compat_elf_hwcap |= COMPAT_HWCAP_PMULL;
+ case 1:
+ compat_elf_hwcap |= COMPAT_HWCAP_AES;
+ case 0:
+ break;
+ }
+ }
+
+ block = (features >> 8) & 0xf;
+ if (block && !(block & 0x8))
+ compat_elf_hwcap |= COMPAT_HWCAP_SHA1;
+
+ block = (features >> 12) & 0xf;
+ if (block && !(block & 0x8))
+ compat_elf_hwcap |= COMPAT_HWCAP_SHA2;
+
+ block = (features >> 16) & 0xf;
+ if (block && !(block & 0x8))
+ compat_elf_hwcap |= COMPAT_HWCAP_CRC32;
+#endif
}
static void __init setup_machine_fdt(phys_addr_t dt_phys)
--
1.8.3.2
* [PATCH 1/4] arm64: drop redundant macros from read_cpuid()
2013-12-16 21:04 ` [PATCH 1/4] arm64: drop redundant macros from read_cpuid() Ard Biesheuvel
@ 2013-12-17 12:04 ` Catalin Marinas
2013-12-17 12:10 ` Will Deacon
0 siblings, 1 reply; 37+ messages in thread
From: Catalin Marinas @ 2013-12-17 12:04 UTC (permalink / raw)
To: linux-arm-kernel
On Mon, Dec 16, 2013 at 09:04:35PM +0000, Ard Biesheuvel wrote:
> #define read_cpuid(reg) ({ \
> u64 __val; \
> - asm("mrs %0, " reg : "=r" (__val)); \
> + asm("mrs %0, " #reg : "=r" (__val)); \
> __val; \
> })
>
> @@ -54,12 +44,12 @@
> */
> static inline u32 __attribute_const__ read_cpuid_id(void)
> {
> - return read_cpuid(ID_MIDR_EL1);
> + return read_cpuid(MIDR_EL1);
> }
It makes sense. Just nitpick, could you please use lowercase register
names for consistency?
Thanks.
--
Catalin
* [PATCH 2/4] arm64: Add hwcaps for crypto and CRC32 extensions.
2013-12-16 21:04 ` [PATCH 2/4] arm64: Add hwcaps for crypto and CRC32 extensions Ard Biesheuvel
@ 2013-12-17 12:08 ` Catalin Marinas
2013-12-17 12:11 ` Catalin Marinas
0 siblings, 1 reply; 37+ messages in thread
From: Catalin Marinas @ 2013-12-17 12:08 UTC (permalink / raw)
To: linux-arm-kernel
On Mon, Dec 16, 2013 at 09:04:36PM +0000, Ard Biesheuvel wrote:
> diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
> index 0bc5e4c..961c961 100644
> --- a/arch/arm64/kernel/setup.c
> +++ b/arch/arm64/kernel/setup.c
> @@ -116,6 +116,7 @@ bool arch_match_cpu_phys_id(int cpu, u64 phys_id)
> static void __init setup_processor(void)
> {
> struct cpu_info *cpu_info;
> + u64 features, block;
>
> /*
> * locate processor in the list of supported processor
> @@ -136,6 +137,37 @@ static void __init setup_processor(void)
>
> sprintf(init_utsname()->machine, ELF_PLATFORM);
> elf_hwcap = 0;
> +
> + /*
> + * ID_AA64ISAR0_EL1 contains 4-bit wide signed feature blocks.
> + * The blocks we test below represent incremental functionality
> + * for non-negative values. Negative values are reserved.
> + */
> + features = read_cpuid(ID_AA64ISAR0_EL1);
Have you built this?
--
Catalin
* [PATCH 1/4] arm64: drop redundant macros from read_cpuid()
2013-12-17 12:04 ` Catalin Marinas
@ 2013-12-17 12:10 ` Will Deacon
2013-12-17 12:12 ` Catalin Marinas
0 siblings, 1 reply; 37+ messages in thread
From: Will Deacon @ 2013-12-17 12:10 UTC (permalink / raw)
To: linux-arm-kernel
On Tue, Dec 17, 2013 at 12:04:31PM +0000, Catalin Marinas wrote:
> On Mon, Dec 16, 2013 at 09:04:35PM +0000, Ard Biesheuvel wrote:
> > #define read_cpuid(reg) ({ \
> > u64 __val; \
> > - asm("mrs %0, " reg : "=r" (__val)); \
> > + asm("mrs %0, " #reg : "=r" (__val)); \
> > __val; \
> > })
> >
> > @@ -54,12 +44,12 @@
> > */
> > static inline u32 __attribute_const__ read_cpuid_id(void)
> > {
> > - return read_cpuid(ID_MIDR_EL1);
> > + return read_cpuid(MIDR_EL1);
> > }
>
> It makes sense. Just nitpick, could you please use lowercase register
> names for consistency?
Hmm: cputype, hw_breakpoint, perf and kvm are using upper-case names...
Will
* [PATCH 2/4] arm64: Add hwcaps for crypto and CRC32 extensions.
2013-12-17 12:08 ` Catalin Marinas
@ 2013-12-17 12:11 ` Catalin Marinas
0 siblings, 0 replies; 37+ messages in thread
From: Catalin Marinas @ 2013-12-17 12:11 UTC (permalink / raw)
To: linux-arm-kernel
On Tue, Dec 17, 2013 at 12:08:31PM +0000, Catalin Marinas wrote:
> On Mon, Dec 16, 2013 at 09:04:36PM +0000, Ard Biesheuvel wrote:
> > diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
> > index 0bc5e4c..961c961 100644
> > --- a/arch/arm64/kernel/setup.c
> > +++ b/arch/arm64/kernel/setup.c
> > @@ -116,6 +116,7 @@ bool arch_match_cpu_phys_id(int cpu, u64 phys_id)
> > static void __init setup_processor(void)
> > {
> > struct cpu_info *cpu_info;
> > + u64 features, block;
> >
> > /*
> > * locate processor in the list of supported processor
> > @@ -136,6 +137,37 @@ static void __init setup_processor(void)
> >
> > sprintf(init_utsname()->machine, ELF_PLATFORM);
> > elf_hwcap = 0;
> > +
> > + /*
> > + * ID_AA64ISAR0_EL1 contains 4-bit wide signed feature blocks.
> > + * The blocks we test below represent incremental functionality
> > + * for non-negative values. Negative values are reserved.
> > + */
> > + features = read_cpuid(ID_AA64ISAR0_EL1);
>
> Have you built this?
I guess you did, sorry for the noise (got confused with the other ID_*
macros that you removed). As I keep staring at them, I'm fine with upper
case as well ;)
--
Catalin
* [PATCH 1/4] arm64: drop redundant macros from read_cpuid()
2013-12-17 12:10 ` Will Deacon
@ 2013-12-17 12:12 ` Catalin Marinas
0 siblings, 0 replies; 37+ messages in thread
From: Catalin Marinas @ 2013-12-17 12:12 UTC (permalink / raw)
To: linux-arm-kernel
On Tue, Dec 17, 2013 at 12:10:33PM +0000, Will Deacon wrote:
> On Tue, Dec 17, 2013 at 12:04:31PM +0000, Catalin Marinas wrote:
> > On Mon, Dec 16, 2013 at 09:04:35PM +0000, Ard Biesheuvel wrote:
> > > #define read_cpuid(reg) ({ \
> > > u64 __val; \
> > > - asm("mrs %0, " reg : "=r" (__val)); \
> > > + asm("mrs %0, " #reg : "=r" (__val)); \
> > > __val; \
> > > })
> > >
> > > @@ -54,12 +44,12 @@
> > > */
> > > static inline u32 __attribute_const__ read_cpuid_id(void)
> > > {
> > > - return read_cpuid(ID_MIDR_EL1);
> > > + return read_cpuid(MIDR_EL1);
> > > }
> >
> > It makes sense. Just nitpick, could you please use lowercase register
> > names for consistency?
>
> Hmm: cputype, hw_breakpoint, perf and kvm are using upper-case names...
OK, I don't care ;). In .S we are using lower-case.
--
Catalin
* [PATCH 0/4] arm64: advertise availability of CRC and crypto instructions
2013-12-16 21:04 [PATCH 0/4] arm64: advertise availability of CRC and crypto instructions Ard Biesheuvel
` (3 preceding siblings ...)
2013-12-16 21:04 ` [PATCH 4/4] arm64: add 32-bit compat hwcaps " Ard Biesheuvel
@ 2013-12-17 12:25 ` Catalin Marinas
2013-12-18 9:50 ` Ard Biesheuvel
4 siblings, 1 reply; 37+ messages in thread
From: Catalin Marinas @ 2013-12-17 12:25 UTC (permalink / raw)
To: linux-arm-kernel
On Mon, Dec 16, 2013 at 09:04:34PM +0000, Ard Biesheuvel wrote:
> This series is an expansion of the patch posted by Steve Capper about 6 weeks
> ago that allocates hwcaps bits for CRC and Crypto Extensions instructions so
> userland can discover whether the current CPU has any of those capabilities.
>
> Patch #1 is a cleanup patch for read_cpuid(), which allowed me to skip adding
> yet another #define to asm/cputype.h (for ID_ISAR5_EL1)
>
> Patch #2 is Steve's original patch, but slightly tweaked because hwcaps bit 2
> has been allocated for something else in the mean time.
>
> Patch #3 allocates the capability bits in the arch/arm tree. This is necessary
> because 32-bit ARM binaries can execute both under ARM and under arm64 kernels,
> so there should be agreement about the meaning of feature bits, even if those
> features don't actually exist on systems covered by the arch/arm tree.
>
> @Russell: if this looks ok to you, could you please indicate whether you prefer
> to take this patch separately, or ack it and let it be merged as part of the
> series.
>
> Patch #4 advertises the CRC and Crypto Extensions to 32-bit binaries running
> under an arm64 kernel.
The series looks fine to me. I'm happy to take all the patches if Russell
Acks the arm one (or it can be sent via the patch system).
--
Catalin
* [PATCH 0/4] arm64: advertise availability of CRC and crypto instructions
2013-12-17 12:25 ` [PATCH 0/4] arm64: advertise availability of CRC and crypto instructions Catalin Marinas
@ 2013-12-18 9:50 ` Ard Biesheuvel
2013-12-18 10:03 ` Russell King - ARM Linux
0 siblings, 1 reply; 37+ messages in thread
From: Ard Biesheuvel @ 2013-12-18 9:50 UTC (permalink / raw)
To: linux-arm-kernel
On 17 December 2013 13:25, Catalin Marinas <catalin.marinas@arm.com> wrote:
> On Mon, Dec 16, 2013 at 09:04:34PM +0000, Ard Biesheuvel wrote:
>> This series is an expansion of the patch posted by Steve Capper about 6 weeks
>> ago that allocates hwcaps bits for CRC and Crypto Extensions instructions so
>> userland can discover whether the current CPU has any of those capabilities.
>>
>> Patch #1 is a cleanup patch for read_cpuid(), which allowed me to skip adding
>> yet another #define to asm/cputype.h (for ID_ISAR5_EL1)
>>
>> Patch #2 is Steve's original patch, but slightly tweaked because hwcaps bit 2
>> has been allocated for something else in the mean time.
>>
>> Patch #3 allocates the capability bits in the arch/arm tree. This is necessary
>> because 32-bit ARM binaries can execute both under ARM and under arm64 kernels,
>> so there should be agreement about the meaning of feature bits, even if those
>> features don't actually exist on systems covered by the arch/arm tree.
>>
>> @Russell: if this looks ok to you, could you please indicate whether you prefer
>> to take this patch separately, or ack it and let it be merged as part of the
>> series.
>>
>> Patch #4 advertises the CRC and Crypto Extensions to 32-bit binaries running
>> under an arm64 kernel.
>
> The series look fine to me. I'm happy to take all the patches if Russell
> Acks the arm one (or it can send it via the patch system).
>
Hello Russell,
Care to share your take on this? I imagine new hwcaps bits take a
while to percolate and make their way into a stable glibc, so I would
like to have these changes in sooner rather than later, and 3.14 seems
feasible.
Regards,
Ard.
* [PATCH 0/4] arm64: advertise availability of CRC and crypto instructions
2013-12-18 9:50 ` Ard Biesheuvel
@ 2013-12-18 10:03 ` Russell King - ARM Linux
2013-12-18 10:25 ` Ard Biesheuvel
0 siblings, 1 reply; 37+ messages in thread
From: Russell King - ARM Linux @ 2013-12-18 10:03 UTC (permalink / raw)
To: linux-arm-kernel
On Wed, Dec 18, 2013 at 10:50:38AM +0100, Ard Biesheuvel wrote:
> On 17 December 2013 13:25, Catalin Marinas <catalin.marinas@arm.com> wrote:
> > On Mon, Dec 16, 2013 at 09:04:34PM +0000, Ard Biesheuvel wrote:
> >> This series is an expansion of the patch posted by Steve Capper about 6 weeks
> >> ago that allocates hwcaps bits for CRC and Crypto Extensions instructions so
> >> userland can discover whether the current CPU has any of those capabilities.
> >>
> >> Patch #1 is a cleanup patch for read_cpuid(), which allowed me to skip adding
> >> yet another #define to asm/cputype.h (for ID_ISAR5_EL1)
> >>
> >> Patch #2 is Steve's original patch, but slightly tweaked because hwcaps bit 2
> >> has been allocated for something else in the mean time.
> >>
> >> Patch #3 allocates the capability bits in the arch/arm tree. This is necessary
> >> because 32-bit ARM binaries can execute both under ARM and under arm64 kernels,
> >> so there should be agreement about the meaning of feature bits, even if those
> >> features don't actually exist on systems covered by the arch/arm tree.
> >>
> >> @Russell: if this looks ok to you, could you please indicate whether you prefer
> >> to take this patch separately, or ack it and let it be merged as part of the
> >> series.
> >>
> >> Patch #4 advertises the CRC and Crypto Extensions to 32-bit binaries running
> >> under an arm64 kernel.
> >
> > The series look fine to me. I'm happy to take all the patches if Russell
> > Acks the arm one (or it can send it via the patch system).
> >
>
> Hello Russell,
>
> Care to share your take on this? I imagine new hwcaps bits take a
> while to percolate and make their way into a stable glibc, so I would
> like to have these changes in sooner rather than later, and 3.14 seems
> feasible.
I'm not all that happy as it gobbles up a load of bits in the hwcap that
will never be set for ARM, and we only have 32 of them (limited by the
size of elf_addr_t). On ARM64, it's less of a problem because the hwcap
is 64-bit there.
If we allocate the ARM64 private never-will-appear-on-ARM hwcaps in bit
32 and above, they'll be hidden from 32-bit stuff. Hopefully, glibc
doesn't concatenate the HWCAP and HWCAP2 fields though - someone should
check that.
Since the bits in the ARM64 hwcap are different from the ARM32 hwcap, I
don't see any point in defining them for ARM32 - userspace needs to make
the definition conditional anyway, and can't interpret the bits as-is
because ARM64 already omits many of the ARM32 ones.
* [PATCH 0/4] arm64: advertise availability of CRC and crypto instructions
2013-12-18 10:03 ` Russell King - ARM Linux
@ 2013-12-18 10:25 ` Ard Biesheuvel
2013-12-18 10:55 ` Russell King - ARM Linux
0 siblings, 1 reply; 37+ messages in thread
From: Ard Biesheuvel @ 2013-12-18 10:25 UTC (permalink / raw)
To: linux-arm-kernel
On 18 December 2013 11:03, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Wed, Dec 18, 2013 at 10:50:38AM +0100, Ard Biesheuvel wrote:
>> On 17 December 2013 13:25, Catalin Marinas <catalin.marinas@arm.com> wrote:
>> > On Mon, Dec 16, 2013 at 09:04:34PM +0000, Ard Biesheuvel wrote:
>> >> This series is an expansion of the patch posted by Steve Capper about 6 weeks
>> >> ago that allocates hwcaps bits for CRC and Crypto Extensions instructions so
>> >> userland can discover whether the current CPU has any of those capabilities.
>> >>
>> >> Patch #1 is a cleanup patch for read_cpuid(), which allowed me to skip adding
>> >> yet another #define to asm/cputype.h (for ID_ISAR5_EL1)
>> >>
>> >> Patch #2 is Steve's original patch, but slightly tweaked because hwcaps bit 2
>> >> has been allocated for something else in the mean time.
>> >>
>> >> Patch #3 allocates the capability bits in the arch/arm tree. This is necessary
>> >> because 32-bit ARM binaries can execute both under ARM and under arm64 kernels,
>> >> so there should be agreement about the meaning of feature bits, even if those
>> >> features don't actually exist on systems covered by the arch/arm tree.
>> >>
>> >> @Russell: if this looks ok to you, could you please indicate whether you prefer
>> >> to take this patch separately, or ack it and let it be merged as part of the
>> >> series.
>> >>
>> >> Patch #4 advertises the CRC and Crypto Extensions to 32-bit binaries running
>> >> under an arm64 kernel.
>> >
>> > The series look fine to me. I'm happy to take all the patches if Russell
>> > Acks the arm one (or it can send it via the patch system).
>> >
>>
>> Hello Russell,
>>
>> Care to share your take on this? I imagine new hwcaps bits take a
>> while to percolate and make their way into a stable glibc, so I would
>> like to have these changes in sooner rather than later, and 3.14 seems
>> feasible.
>
> I'm not all that happy as it gobbles up a load of bits in the hwcap that
> will never be set for ARM, and we only have 32 of them (limited by the
> size of elf_addr_t). On ARM64, it's less of a problem because the hwcap
> is 64-bit there.
>
I see. Unfortunately, all these features are really separate, i.e., it
is up to the implementor to decide which arbitrary combination of the
extensions he will implement. We could consider merging some bits,
e.g., HWCAP_SHA iff both SHA1 and SHA2 extensions are available, but
it seems like a bit of a hack.
> If we allocate the ARM64 private never-will-appear-on-ARM hwcaps in bit
> 32 and above, they'll be hidden from 32-bit stuff. Hopefully, glibc
> doesn't concatenate the HWCAP and HWCAP2 fields though - someone should
> check that.
>
> Since the bits in the ARM64 hwcap are different from the ARM32 hwcap, I
> don't see any point in defining them for ARM32 - userspace needs to make
> the definition conditional anyway, and can't interpret the bits as-is
> because ARM64 already omits many of the ARM32 ones.
Please note that this is about the compat bits, not the ARM64 specific
ones. These correspond 1:1 with the ARM32 ones. The idea is that a
binary built for ARM will have access to the extended instructions
which ARM64 offers to ARM32 binaries running in 32 bit compatibility
mode (such as AES, SHAx etc). The point of allocating them for ARM is
to avoid conflicts, so if there is another way to ensure that
(HWCAP2?), we could consider that as well. [However, I personally feel
that it makes more sense to spill over to HWCAP2 once we really have
run out of bits in HWCAP, not to logically partition the hwcaps space]
Regards,
Ard.
* [PATCH 0/4] arm64: advertise availability of CRC and crypto instructions
2013-12-18 10:25 ` Ard Biesheuvel
@ 2013-12-18 10:55 ` Russell King - ARM Linux
2013-12-18 11:15 ` Ard Biesheuvel
0 siblings, 1 reply; 37+ messages in thread
From: Russell King - ARM Linux @ 2013-12-18 10:55 UTC (permalink / raw)
To: linux-arm-kernel
On Wed, Dec 18, 2013 at 11:25:40AM +0100, Ard Biesheuvel wrote:
> On 18 December 2013 11:03, Russell King - ARM Linux
> <linux@arm.linux.org.uk> wrote:
> > If we allocate the ARM64 private never-will-appear-on-ARM hwcaps in bit
> > 32 and above, they'll be hidden from 32-bit stuff. Hopefully, glibc
> > doesn't concatenate the HWCAP and HWCAP2 fields though - someone should
> > check that.
> >
> > Since the bits in the ARM64 hwcap are different from the ARM32 hwcap, I
> > don't see any point in defining them for ARM32 - userspace needs to make
> > the definition conditional anyway, and can't interpret the bits as-is
> > because ARM64 already omits many of the ARM32 ones.
>
> Please note that this is about the compat bits, not the ARM64 specific
> ones. These correspond 1:1 with the ARM32 ones. The idea is that a
> binary built for ARM will have access to the extended instructions
> which ARM64 offers to ARM32 binaries running in 32 bit compatibility
> mode (such as AES, SHAx etc).
This all sounds rather silly IMHO. As ARM32 natively doesn't support
these instructions, why should running an ARM32 binary under ARM64
end up offering this?
If the ARM64 additional instructions are to be used, surely it's not
unreasonable to require ARM64 native applications?
In order to use these new instructions, you're going to have to build
using 64-bit anyway, at which point you have the problem of linking
32-bit applications with 64-bit libraries, and the inherent
incompatibility in the API between the two. Remember, in 64-bit mode,
your pointers are 64-bit whereas in 32-bit mode, they're 32-bit.
Frankly, I don't see the point, nor do I see what you're talking about
being technically possible.
* [PATCH 0/4] arm64: advertise availability of CRC and crypto instructions
2013-12-18 10:55 ` Russell King - ARM Linux
@ 2013-12-18 11:15 ` Ard Biesheuvel
2013-12-18 11:27 ` Catalin Marinas
0 siblings, 1 reply; 37+ messages in thread
From: Ard Biesheuvel @ 2013-12-18 11:15 UTC (permalink / raw)
To: linux-arm-kernel
On 18 December 2013 11:55, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Wed, Dec 18, 2013 at 11:25:40AM +0100, Ard Biesheuvel wrote:
>> On 18 December 2013 11:03, Russell King - ARM Linux
>> <linux@arm.linux.org.uk> wrote:
>> > If we allocate the ARM64 private never-will-appear-on-ARM hwcaps in bit
>> > 32 and above, they'll be hidden from 32-bit stuff. Hopefully, glibc
>> > doesn't concatenate the HWCAP and HWCAP2 fields though - someone should
>> > check that.
>> >
>> > Since the bits in the ARM64 hwcap are different from the ARM32 hwcap, I
>> > don't see any point in defining them for ARM32 - userspace needs to make
>> > the definition conditional anyway, and can't interpret the bits as-is
>> > because ARM64 already omits many of the ARM32 ones.
>>
>> Please note that this is about the compat bits, not the ARM64 specific
>> ones. These correspond 1:1 with the ARM32 ones. The idea is that a
>> binary built for ARM will have access to the extended instructions
>> which ARM64 offers to ARM32 binaries running in 32 bit compatibility
>> mode (such as AES, SHAx etc).
>
> This all sounds rather silly IMHO. As ARM32 natively doesn't support
> these instructions, why should running an ARM32 binary under ARM64
> end up offering this?
>
> If the ARM64 additional instructions are to be used, surely it's not
> unreasonable to require ARM64 native applications?
>
Well, the ARM architects have decided that there shall be Crypto
Extensions instructions not only for ARMv8/AArch64 but also for
ARMv8/AArch32. This is fully spec'ed in the latest ARM ARM. For
instance, previously unused NEON opcodes on ARM32 have been allocated
to AES instructions. (QEMU already implements them, see
https://git.linaro.org/people/peter.maydell/qemu-arm.git/commitdiff/9d935509)
> In order to use these new instructions, you're going to have to build
> using 64-bit anyway, at which point you have the problem of linking
> 32-bit applications with 64-bit libraries, and the inherent
> incompatibility in the API between the two. Remember, in 64-bit mode,
> your pointers are 64-bit whereas in 32-bit mode, they're 32-bit.
>
Not quite. The latest Binutils for *32 bit* ARM already supports
something like this:
    .arch armv8-a
    .arch_extension crypto
    aese q0, q1
    aesmc q0, q0
which results in an ordinary 32 bit ARM binary but using instructions
that are not available on v7 and earlier.
> Frankly, I don't see the point, nor do I see what you're talking about
> being technically possible.
Well, what can I say. I am not making this up: the instructions are
there and functional, and all I am proposing is that the ARM and arm64
trees align on how to advertise the (lack of) capabilities to
userland.
Regards,
Ard.
* [PATCH 0/4] arm64: advertise availability of CRC and crypto instructions
2013-12-18 11:15 ` Ard Biesheuvel
@ 2013-12-18 11:27 ` Catalin Marinas
2013-12-18 11:34 ` Catalin Marinas
2013-12-18 11:42 ` Russell King - ARM Linux
0 siblings, 2 replies; 37+ messages in thread
From: Catalin Marinas @ 2013-12-18 11:27 UTC (permalink / raw)
To: linux-arm-kernel
On Wed, Dec 18, 2013 at 11:15:45AM +0000, Ard Biesheuvel wrote:
> On 18 December 2013 11:55, Russell King - ARM Linux
> <linux@arm.linux.org.uk> wrote:
> > On Wed, Dec 18, 2013 at 11:25:40AM +0100, Ard Biesheuvel wrote:
> >> On 18 December 2013 11:03, Russell King - ARM Linux
> >> <linux@arm.linux.org.uk> wrote:
> >> > If we allocate the ARM64 private never-will-appear-on-ARM hwcaps in bit
> >> > 32 and above, they'll be hidden from 32-bit stuff. Hopefully, glibc
> >> > doesn't concatenate the HWCAP and HWCAP2 fields though - someone should
> >> > check that.
> >> >
> >> > Since the bits in the ARM64 hwcap are different from the ARM32 hwcap, I
> >> > don't see any point in defining them for ARM32 - userspace needs to make
> >> > the definition conditional anyway, and can't interpret the bits as-is
> >> > because ARM64 already omits many of the ARM32 ones.
> >>
> >> Please note that this is about the compat bits, not the ARM64 specific
> >> ones. These correspond 1:1 with the ARM32 ones. The idea is that a
> >> binary built for ARM will have access to the extended instructions
> >> which ARM64 offers to ARM32 binaries running in 32 bit compatibility
> >> mode (such as AES, SHAx etc).
> >
> > This all sounds rather silly IMHO. As ARM32 natively doesn't support
> > these instructions, why should running an ARM32 binary under ARM64
> > end up offering this?
> >
> > If the ARM64 additional instructions are to be used, surely it's not
> > unreasonable to require ARM64 native applications?
>
> Well, the ARM architects have decided that there shall be Crypto
> Extensions instructions not only for ARMv8/Aarch64 but also for
> ARMv8/Aarch32. This is fully spec'ed in the latest ARM ARM. For
> instance, previously unused NEON opcodes on ARM32 have been allocated
> to AES instructions. (for instance, implemented for QEMU here
> https://git.linaro.org/people/peter.maydell/qemu-arm.git/commitdiff/9d935509)
Indeed. AArch32 is not _dead_ with ARMv8 but getting new features. The
point of this patch is to have a common set of bits between compat arm64
and arm kernel. The AArch32 applications running on ARMv8 (most likely
with an arm64 kernel) may want to make use of the crypto extensions.
If you want a more complete solution, we could add ID_ISAR5 checks on
the arm kernel.
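
As a rough illustration of what an ID_ISAR5-based check could look like, here is a minimal sketch that decodes the 4-bit feature fields of an ID_ISAR5 value. It assumes the field layout as I understand it from the ARMv8 ARM (AES in bits [7:4], SHA1 in [11:8], SHA2 in [15:12], CRC32 in [19:16], with AES value 2 also implying the 64-bit polynomial multiply). The FEAT_* flag values and the function names are hypothetical and are not the hwcap bits proposed in this series.

```c
#include <assert.h>
#include <stdint.h>

/* HYPOTHETICAL flag values, for illustration only (not real hwcap bits). */
enum {
    FEAT_AES   = 1u << 0,
    FEAT_PMULL = 1u << 1,
    FEAT_SHA1  = 1u << 2,
    FEAT_SHA2  = 1u << 3,
    FEAT_CRC32 = 1u << 4,
};

/*
 * Extract the 4-bit ID field that starts at bit 'shift'.
 * Values with the top bit set are treated as "not implemented",
 * which is a conservative assumption made for this sketch.
 */
static unsigned int isar5_field(uint32_t isar5, unsigned int shift)
{
    unsigned int f;

    assert(shift <= 28);
    f = (isar5 >> shift) & 0xf;
    return (f & 0x8) ? 0 : f;
}

/* Translate an ID_ISAR5 value into the illustrative FEAT_* flags. */
static unsigned int decode_isar5(uint32_t isar5)
{
    unsigned int feats = 0;
    unsigned int aes = isar5_field(isar5, 4);

    if (aes >= 1)
        feats |= FEAT_AES;      /* AESE/AESD/AESMC/AESIMC */
    if (aes >= 2)
        feats |= FEAT_PMULL;    /* 64-bit polynomial multiply */
    if (isar5_field(isar5, 8) >= 1)
        feats |= FEAT_SHA1;
    if (isar5_field(isar5, 12) >= 1)
        feats |= FEAT_SHA2;
    if (isar5_field(isar5, 16) >= 1)
        feats |= FEAT_CRC32;

    return feats;
}
```

In a kernel the input would come from reading the ID register itself; here it is a plain integer so the decoding logic can be exercised on any machine.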
--
Catalin
* [PATCH 0/4] arm64: advertise availability of CRC and crypto instructions
2013-12-18 11:27 ` Catalin Marinas
@ 2013-12-18 11:34 ` Catalin Marinas
2013-12-18 11:42 ` Russell King - ARM Linux
1 sibling, 0 replies; 37+ messages in thread
From: Catalin Marinas @ 2013-12-18 11:34 UTC (permalink / raw)
To: linux-arm-kernel
On Wed, Dec 18, 2013 at 11:27:14AM +0000, Catalin Marinas wrote:
> On Wed, Dec 18, 2013 at 11:15:45AM +0000, Ard Biesheuvel wrote:
> > On 18 December 2013 11:55, Russell King - ARM Linux
> > <linux@arm.linux.org.uk> wrote:
> > > On Wed, Dec 18, 2013 at 11:25:40AM +0100, Ard Biesheuvel wrote:
> > >> On 18 December 2013 11:03, Russell King - ARM Linux
> > >> <linux@arm.linux.org.uk> wrote:
> > >> > If we allocate the ARM64 private never-will-appear-on-ARM hwcaps in bit
> > >> > 32 and above, they'll be hidden from 32-bit stuff. Hopefully, glibc
> > >> > doesn't concatenate the HWCAP and HWCAP2 fields though - someone should
> > >> > check that.
> > >> >
> > >> > Since the bits in the ARM64 hwcap are different from the ARM32 hwcap, I
> > >> > don't see any point in defining them for ARM32 - userspace needs to make
> > >> > the definition conditional anyway, and can't interpret the bits as-is
> > >> > because ARM64 already omits many of the ARM32 ones.
> > >>
> > >> Please note that this is about the compat bits, not the ARM64 specific
> > >> ones. These correspond 1:1 with the ARM32 ones. The idea is that a
> > >> binary built for ARM will have access to the extended instructions
> > >> which ARM64 offers to ARM32 binaries running in 32 bit compatibility
> > >> mode (such as AES, SHAx etc).
> > >
> > > This all sounds rather silly IMHO. As ARM32 natively doesn't support
> > > these instructions, why should running an ARM32 binary under ARM64
> > > end up offering this?
> > >
> > > If the ARM64 additional instructions are to be used, surely it's not
> > > unreasonable to require ARM64 native applications?
> >
> > Well, the ARM architects have decided that there shall be Crypto
> > Extensions instructions not only for ARMv8/Aarch64 but also for
> > ARMv8/Aarch32. This is fully spec'ed in the latest ARM ARM. For
> > instance, previously unused NEON opcodes on ARM32 have been allocated
> > to AES instructions. (for instance, implemented for QEMU here
> > https://git.linaro.org/people/peter.maydell/qemu-arm.git/commitdiff/9d935509)
>
> Indeed. AArch32 is not _dead_ with ARMv8 but getting new features. The
> point of this patch is to have a common set of bits between compat arm64
> and arm kernel. The AArch32 applications running on ARMv8 (most likely
> with an arm64 kernel) may want to make use of the crypto extensions.
>
> If you want a more complete solution, we could add ID_ISAR5 checks on
> the arm kernel.
BTW, at some point we'll need to add ARMv8 support to the arm (uClinux) kernel:
http://www.arm.com/products/processors/instruction-set-architectures/armv8-r-architecture.php
ARMv8-R for now is 32-bit only.
--
Catalin
* [PATCH 0/4] arm64: advertise availability of CRC and crypto instructions
2013-12-18 11:27 ` Catalin Marinas
2013-12-18 11:34 ` Catalin Marinas
@ 2013-12-18 11:42 ` Russell King - ARM Linux
2013-12-18 11:59 ` Ard Biesheuvel
2013-12-18 12:03 ` Catalin Marinas
1 sibling, 2 replies; 37+ messages in thread
From: Russell King - ARM Linux @ 2013-12-18 11:42 UTC (permalink / raw)
To: linux-arm-kernel
On Wed, Dec 18, 2013 at 11:27:14AM +0000, Catalin Marinas wrote:
> On Wed, Dec 18, 2013 at 11:15:45AM +0000, Ard Biesheuvel wrote:
> > On 18 December 2013 11:55, Russell King - ARM Linux
> > <linux@arm.linux.org.uk> wrote:
> > > On Wed, Dec 18, 2013 at 11:25:40AM +0100, Ard Biesheuvel wrote:
> > >> On 18 December 2013 11:03, Russell King - ARM Linux
> > >> <linux@arm.linux.org.uk> wrote:
> > >> > If we allocate the ARM64 private never-will-appear-on-ARM hwcaps in bit
> > >> > 32 and above, they'll be hidden from 32-bit stuff. Hopefully, glibc
> > >> > doesn't concatenate the HWCAP and HWCAP2 fields though - someone should
> > >> > check that.
> > >> >
> > >> > Since the bits in the ARM64 hwcap are different from the ARM32 hwcap, I
> > >> > don't see any point in defining them for ARM32 - userspace needs to make
> > >> > the definition conditional anyway, and can't interpret the bits as-is
> > >> > because ARM64 already omits many of the ARM32 ones.
> > >>
> > >> Please note that this is about the compat bits, not the ARM64 specific
> > >> ones. These correspond 1:1 with the ARM32 ones. The idea is that a
> > >> binary built for ARM will have access to the extended instructions
> > >> which ARM64 offers to ARM32 binaries running in 32 bit compatibility
> > >> mode (such as AES, SHAx etc).
> > >
> > > This all sounds rather silly IMHO. As ARM32 natively doesn't support
> > > these instructions, why should running an ARM32 binary under ARM64
> > > end up offering this?
> > >
> > > If the ARM64 additional instructions are to be used, surely it's not
> > > unreasonable to require ARM64 native applications?
> >
> > Well, the ARM architects have decided that there shall be Crypto
> > Extensions instructions not only for ARMv8/Aarch64 but also for
> > ARMv8/Aarch32. This is fully spec'ed in the latest ARM ARM. For
> > instance, previously unused NEON opcodes on ARM32 have been allocated
> > to AES instructions. (for instance, implemented for QEMU here
> > https://git.linaro.org/people/peter.maydell/qemu-arm.git/commitdiff/9d935509)
>
> Indeed. AArch32 is not _dead_ with ARMv8 but getting new features. The
> point of this patch is to have a common set of bits between compat arm64
> and arm kernel. The AArch32 applications running on ARMv8 (most likely
> with an arm64 kernel) may want to make use of the crypto extensions.
>
> If you want a more complete solution, we could add ID_ISAR5 checks on
> the arm kernel.
The point is that they'll never appear on an ARMv7 implementation because
they're not part of the ARMv7 architecture. I see no point in needlessly
polluting ARM32 with ARM64 stuff - in exactly the same way that you see
no point in polluting ARM64 with ARM32 stuff.
So, frankly, find a different way to do this. We don't need to needlessly
waste HWCAP bits on ARM32.
* [PATCH 0/4] arm64: advertise availability of CRC and crypto instructions
2013-12-18 11:42 ` Russell King - ARM Linux
@ 2013-12-18 11:59 ` Ard Biesheuvel
2013-12-18 12:03 ` Catalin Marinas
1 sibling, 0 replies; 37+ messages in thread
From: Ard Biesheuvel @ 2013-12-18 11:59 UTC (permalink / raw)
To: linux-arm-kernel
On 18 December 2013 12:42, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Wed, Dec 18, 2013 at 11:27:14AM +0000, Catalin Marinas wrote:
>> On Wed, Dec 18, 2013 at 11:15:45AM +0000, Ard Biesheuvel wrote:
>> > On 18 December 2013 11:55, Russell King - ARM Linux
>> > <linux@arm.linux.org.uk> wrote:
>> > > On Wed, Dec 18, 2013 at 11:25:40AM +0100, Ard Biesheuvel wrote:
>> > >> On 18 December 2013 11:03, Russell King - ARM Linux
>> > >> <linux@arm.linux.org.uk> wrote:
>> > >> > If we allocate the ARM64 private never-will-appear-on-ARM hwcaps in bit
>> > >> > 32 and above, they'll be hidden from 32-bit stuff. Hopefully, glibc
>> > >> > doesn't concatenate the HWCAP and HWCAP2 fields though - someone should
>> > >> > check that.
>> > >> >
>> > >> > Since the bits in the ARM64 hwcap are different from the ARM32 hwcap, I
>> > >> > don't see any point in defining them for ARM32 - userspace needs to make
>> > >> > the definition conditional anyway, and can't interpret the bits as-is
>> > >> > because ARM64 already omits many of the ARM32 ones.
>> > >>
>> > >> Please note that this is about the compat bits, not the ARM64 specific
>> > >> ones. These correspond 1:1 with the ARM32 ones. The idea is that a
>> > >> binary built for ARM will have access to the extended instructions
>> > >> which ARM64 offers to ARM32 binaries running in 32 bit compatibility
>> > >> mode (such as AES, SHAx etc).
>> > >
>> > > This all sounds rather silly IMHO. As ARM32 natively doesn't support
>> > > these instructions, why should running an ARM32 binary under ARM64
>> > > end up offering this?
>> > >
>> > > If the ARM64 additional instructions are to be used, surely it's not
>> > > unreasonable to require ARM64 native applications?
>> >
>> > Well, the ARM architects have decided that there shall be Crypto
>> > Extensions instructions not only for ARMv8/Aarch64 but also for
>> > ARMv8/Aarch32. This is fully spec'ed in the latest ARM ARM. For
>> > instance, previously unused NEON opcodes on ARM32 have been allocated
>> > to AES instructions. (for instance, implemented for QEMU here
>> > https://git.linaro.org/people/peter.maydell/qemu-arm.git/commitdiff/9d935509)
>>
>> Indeed. AArch32 is not _dead_ with ARMv8 but getting new features. The
>> point of this patch is to have a common set of bits between compat arm64
>> and arm kernel. The AArch32 applications running on ARMv8 (most likely
>> with an arm64 kernel) may want to make use of the crypto extensions.
>>
>> If you want a more complete solution, we could add ID_ISAR5 checks on
>> the arm kernel.
>
> The point is that they'll never appear on an ARMv7 implementation because
> they're not part of the ARMv7 architecture. I see no point in needlessly
> polluting ARM32 with ARM64 stuff - in exactly the same way that you see
> no point in polluting ARM64 with ARM32 stuff.
>
You are assuming ARMv7 == ARM32 and ARMv8 == ARM64, while in reality,
they are orthogonal.
The 32-bit kernel you are maintaining can potentially run on 32-bit
only v8 implementations (as Catalin pointed out), and both the arm64
and ARM kernels contain implementations for the AArch32 userland ABI.
So if the current proposal is unsuitable in your opinion, could we at
least have your input on how it should be done instead? Preferably
using a method that supports ifunc relocations, so the loader can
automatically resolve the correct implementation for the hardware at
hand?
Regards,
Ard.
* [PATCH 0/4] arm64: advertise availability of CRC and crypto instructions
2013-12-18 11:42 ` Russell King - ARM Linux
2013-12-18 11:59 ` Ard Biesheuvel
@ 2013-12-18 12:03 ` Catalin Marinas
2013-12-18 14:27 ` Christopher Covington
1 sibling, 1 reply; 37+ messages in thread
From: Catalin Marinas @ 2013-12-18 12:03 UTC (permalink / raw)
To: linux-arm-kernel
On Wed, Dec 18, 2013 at 11:42:12AM +0000, Russell King - ARM Linux wrote:
> On Wed, Dec 18, 2013 at 11:27:14AM +0000, Catalin Marinas wrote:
> > On Wed, Dec 18, 2013 at 11:15:45AM +0000, Ard Biesheuvel wrote:
> > > On 18 December 2013 11:55, Russell King - ARM Linux
> > > <linux@arm.linux.org.uk> wrote:
> > > > On Wed, Dec 18, 2013 at 11:25:40AM +0100, Ard Biesheuvel wrote:
> > > >> On 18 December 2013 11:03, Russell King - ARM Linux
> > > >> <linux@arm.linux.org.uk> wrote:
> > > >> > If we allocate the ARM64 private never-will-appear-on-ARM hwcaps in bit
> > > >> > 32 and above, they'll be hidden from 32-bit stuff. Hopefully, glibc
> > > >> > doesn't concatenate the HWCAP and HWCAP2 fields though - someone should
> > > >> > check that.
> > > >> >
> > > >> > Since the bits in the ARM64 hwcap are different from the ARM32 hwcap, I
> > > >> > don't see any point in defining them for ARM32 - userspace needs to make
> > > >> > the definition conditional anyway, and can't interpret the bits as-is
> > > >> > because ARM64 already omits many of the ARM32 ones.
> > > >>
> > > >> Please note that this is about the compat bits, not the ARM64 specific
> > > >> ones. These correspond 1:1 with the ARM32 ones. The idea is that a
> > > >> binary built for ARM will have access to the extended instructions
> > > >> which ARM64 offers to ARM32 binaries running in 32 bit compatibility
> > > >> mode (such as AES, SHAx etc).
> > > >
> > > > This all sounds rather silly IMHO. As ARM32 natively doesn't support
> > > > these instructions, why should running an ARM32 binary under ARM64
> > > > end up offering this?
> > > >
> > > > If the ARM64 additional instructions are to be used, surely it's not
> > > > unreasonable to require ARM64 native applications?
> > >
> > > Well, the ARM architects have decided that there shall be Crypto
> > > Extensions instructions not only for ARMv8/Aarch64 but also for
> > > ARMv8/Aarch32. This is fully spec'ed in the latest ARM ARM. For
> > > instance, previously unused NEON opcodes on ARM32 have been allocated
> > > to AES instructions. (for instance, implemented for QEMU here
> > > https://git.linaro.org/people/peter.maydell/qemu-arm.git/commitdiff/9d935509)
> >
> > Indeed. AArch32 is not _dead_ with ARMv8 but getting new features. The
> > point of this patch is to have a common set of bits between compat arm64
> > and arm kernel. The AArch32 applications running on ARMv8 (most likely
> > with an arm64 kernel) may want to make use of the crypto extensions.
> >
> > If you want a more complete solution, we could add ID_ISAR5 checks on
> > the arm kernel.
>
> The point is that they'll never appear on an ARMv7 implementation because
> they're not part of the ARMv7 architecture. I see no point in needlessly
> polluting ARM32 with ARM64 stuff - in exactly the same way that you see
> no point in polluting ARM64 with ARM32 stuff.
I'm not sure whether you are confusing architecture versions with
instruction sets / exception models or you are simply stating that the
32-bit arm kernel will stop at ARMv7.
> So, frankly, find a different way to do this. We don't need to needlessly
> waste HWCAP bits on ARM32.
So in your opinion the 32-bit-only ARMv8-R profile won't be fully
supported in the mainline kernel?
(I mistakenly said uClinux in my previous email; the normal/rich OS part
of ARMv8-R is AArch32 and MMU capable, while the Hyp and real-time
capabilities are MMU-less, with only an MPU.)
--
Catalin
* [PATCH 0/4] arm64: advertise availability of CRC and crypto instructions
2013-12-18 12:03 ` Catalin Marinas
@ 2013-12-18 14:27 ` Christopher Covington
2013-12-18 16:13 ` Ard Biesheuvel
0 siblings, 1 reply; 37+ messages in thread
From: Christopher Covington @ 2013-12-18 14:27 UTC (permalink / raw)
To: linux-arm-kernel
On 12/18/2013 07:03 AM, Catalin Marinas wrote:
> On Wed, Dec 18, 2013 at 11:42:12AM +0000, Russell King - ARM Linux wrote:
>> On Wed, Dec 18, 2013 at 11:27:14AM +0000, Catalin Marinas wrote:
>>> On Wed, Dec 18, 2013 at 11:15:45AM +0000, Ard Biesheuvel wrote:
>>>> On 18 December 2013 11:55, Russell King - ARM Linux
>>>> <linux@arm.linux.org.uk> wrote:
>>>>> On Wed, Dec 18, 2013 at 11:25:40AM +0100, Ard Biesheuvel wrote:
>>>>>> On 18 December 2013 11:03, Russell King - ARM Linux
>>>>>> <linux@arm.linux.org.uk> wrote:
>>>>>>> If we allocate the ARM64 private never-will-appear-on-ARM hwcaps in bit
>>>>>>> 32 and above, they'll be hidden from 32-bit stuff. Hopefully, glibc
>>>>>>> doesn't concatenate the HWCAP and HWCAP2 fields though - someone should
>>>>>>> check that.
>>>>>>>
>>>>>>> Since the bits in the ARM64 hwcap are different from the ARM32 hwcap, I
>>>>>>> don't see any point in defining them for ARM32 - userspace needs to make
>>>>>>> the definition conditional anyway, and can't interpret the bits as-is
>>>>>>> because ARM64 already omits many of the ARM32 ones.
>>>>>>
>>>>>> Please note that this is about the compat bits, not the ARM64 specific
>>>>>> ones. These correspond 1:1 with the ARM32 ones. The idea is that a
>>>>>> binary built for ARM will have access to the extended instructions
>>>>>> which ARM64 offers to ARM32 binaries running in 32 bit compatibility
>>>>>> mode (such as AES, SHAx etc).
>>>>>
>>>>> This all sounds rather silly IMHO. As ARM32 natively doesn't support
>>>>> these instructions, why should running an ARM32 binary under ARM64
>>>>> end up offering this?
>>>>>
>>>>> If the ARM64 additional instructions are to be used, surely it's not
>>>>> unreasonable to require ARM64 native applications?
>>>>
>>>> Well, the ARM architects have decided that there shall be Crypto
>>>> Extensions instructions not only for ARMv8/Aarch64 but also for
>>>> ARMv8/Aarch32. This is fully spec'ed in the latest ARM ARM. For
>>>> instance, previously unused NEON opcodes on ARM32 have been allocated
>>>> to AES instructions. (for instance, implemented for QEMU here
>>>> https://git.linaro.org/people/peter.maydell/qemu-arm.git/commitdiff/9d935509)
>>>
>>> Indeed. AArch32 is not _dead_ with ARMv8 but getting new features. The
>>> point of this patch is to have a common set of bits between compat arm64
>>> and arm kernel. The AArch32 applications running on ARMv8 (most likely
>>> with an arm64 kernel) may want to make use of the crypto extensions.
>>>
>>> If you want a more complete solution, we could add ID_ISAR5 checks on
>>> the arm kernel.
>>
>> The point is that they'll never appear on an ARMv7 implementation because
>> they're not part of the ARMv7 architecture. I see no point in needlessly
>> polluting ARM32 with ARM64 stuff - in exactly the same way that you see
>> no point in polluting ARM64 with ARM32 stuff.
>
> I'm not sure whether you are confusing architecture versions with
> instruction sets / exception models or you are simply stating that the
> 32-bit arm kernel will stop at ARMv7.
I do not think that Russell is the source of the confusion. Ard wrote, "The
idea is that a binary built for ARM will have access to the extended
instructions which ARM64 offers to ARM32 binaries running in 32 bit
compatibility mode (such as AES, SHAx etc)." I think s/ARM64/ARMv8/ is
necessary to make the statement correct, and hopefully less confusing.
Christopher
--
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by the Linux Foundation.
* [PATCH 0/4] arm64: advertise availability of CRC and crypto instructions
2013-12-18 14:27 ` Christopher Covington
@ 2013-12-18 16:13 ` Ard Biesheuvel
2013-12-18 17:29 ` Catalin Marinas
2013-12-18 19:57 ` Nicolas Pitre
0 siblings, 2 replies; 37+ messages in thread
From: Ard Biesheuvel @ 2013-12-18 16:13 UTC (permalink / raw)
To: linux-arm-kernel
On 18 December 2013 15:27, Christopher Covington <cov@codeaurora.org> wrote:
>
> I do not think that Russell is the source of the confusion. Ard wrote, "The
> idea is that a binary built for ARM will have access to the extended
> instructions which ARM64 offers to ARM32 binaries running in 32 bit
> compatibility mode (such as AES, SHAx etc)." I think s/ARM64/ARMv8/ is
> necessary to make the statement correct, and hopefully less confusing.
>
My apologies for adding to the confusion (or creating it in the first place).
However, the bottom line is that, as the 32 bit and 64 bit kernels are
both able to support userland processes running in the execution state
that has retroactively been dubbed 'AArch32', they should both honor
the same contract with AArch32 userland on how to discover CPU
capabilities at runtime. I do understand Russell's reservations about
allocating 6 of the remaining 10 hwcaps bits, and I am open to
suggestions on a better approach. But it is essential that this be
sorted between ARM and arm64 so that AArch32 userland does not need to
be aware of the flavor of kernel it is running under. (Tricks like
trapping SIGILL to infer hwcaps are suboptimal and likely to create
problems going forward). Also, the suggestion that those hwcaps bits
are essentially 'wasted' for ARM32 does not make sense considering
that 32-bit only ARMv8-R CPUs (which may or may not support some or
all of the Crypto Extensions) will need to be supported by the ARM32
kernel as well.
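
To make the "contract with userland" concrete, here is a minimal sketch of how an AArch32 program might test a hwcap bit at runtime, regardless of which kernel flavor it runs under. It assumes glibc's getauxval() from <sys/auxv.h>. The EX_HWCAP_* masks are placeholders picked for illustration only; they are not the bit assignments proposed in this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/auxv.h>   /* getauxval(), AT_HWCAP (glibc 2.16 or later) */

/* HYPOTHETICAL bit assignments, for illustration only. */
#define EX_HWCAP_AES    (1UL << 22)
#define EX_HWCAP_PMULL  (1UL << 23)
#define EX_HWCAP_SHA1   (1UL << 24)
#define EX_HWCAP_SHA2   (1UL << 25)
#define EX_HWCAP_CRC32  (1UL << 26)

/* True if every bit in 'mask' is set in the hwcap word. */
static bool hwcap_has(unsigned long hwcap, unsigned long mask)
{
    return (hwcap & mask) == mask;
}

/* Query the hwcap word the kernel passed in the auxiliary vector. */
static bool cpu_has(unsigned long mask)
{
    return hwcap_has(getauxval(AT_HWCAP), mask);
}
```

A caller would then write something like `if (cpu_has(EX_HWCAP_AES | EX_HWCAP_PMULL))` before taking an accelerated code path. The point of agreeing on the bits is that this check behaves identically under an ARM kernel and under an arm64 kernel in compat mode.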
Regards,
Ard.
* [PATCH 0/4] arm64: advertise availability of CRC and crypto instructions
2013-12-18 16:13 ` Ard Biesheuvel
@ 2013-12-18 17:29 ` Catalin Marinas
2013-12-18 18:50 ` Ard Biesheuvel
2013-12-18 19:57 ` Nicolas Pitre
1 sibling, 1 reply; 37+ messages in thread
From: Catalin Marinas @ 2013-12-18 17:29 UTC (permalink / raw)
To: linux-arm-kernel
On Wed, Dec 18, 2013 at 04:13:52PM +0000, Ard Biesheuvel wrote:
> On 18 December 2013 15:27, Christopher Covington <cov@codeaurora.org> wrote:
> > I do not think that Russell is the source of the confusion. Ard wrote, "The
> > idea is that a binary built for ARM will have access to the extended
> > instructions which ARM64 offers to ARM32 binaries running in 32 bit
> > compatibility mode (such as AES, SHAx etc)." I think s/ARM64/ARMv8/ is
> > necessary to make the statement correct, and hopefully less confusing.
> >
>
> My apologies for adding to the confusion (or creating it in the first place).
>
> However, the bottom line is that, as the 32 bit and 64 bit kernels are
> both able to support userland processes running in the execution state
> that has retroactively been dubbed 'AArch32', they should both honor
> the same contract with AArch32 userland on how to discover CPU
> capabilities at runtime.
For the time being, I merged the first two patches for AArch64 support.
I'm not merging the compat one yet as this should strictly follow the
arch/arm support.
Thanks.
--
Catalin
* [PATCH 0/4] arm64: advertise availability of CRC and crypto instructions
2013-12-18 17:29 ` Catalin Marinas
@ 2013-12-18 18:50 ` Ard Biesheuvel
2013-12-19 11:11 ` Catalin Marinas
0 siblings, 1 reply; 37+ messages in thread
From: Ard Biesheuvel @ 2013-12-18 18:50 UTC (permalink / raw)
To: linux-arm-kernel
On 18 December 2013 18:29, Catalin Marinas <catalin.marinas@arm.com> wrote:
> For the time being, I merged the first two patches for AArch64 support.
> I'm not merging the compat one yet as this should strictly follow the
> arch/arm support.
>
Yeah, I guess that makes sense.
I find this ghetto style drive-by vetoing less than productive; I
would much rather have had an informed discussion involving both
Russell and you and settled on a solution that is acceptable for both
ARM and arm64.
Thanks,
Ard.
* [PATCH 0/4] arm64: advertise availability of CRC and crypto instructions
2013-12-18 16:13 ` Ard Biesheuvel
2013-12-18 17:29 ` Catalin Marinas
@ 2013-12-18 19:57 ` Nicolas Pitre
2013-12-18 20:26 ` Ard Biesheuvel
1 sibling, 1 reply; 37+ messages in thread
From: Nicolas Pitre @ 2013-12-18 19:57 UTC (permalink / raw)
To: linux-arm-kernel
On Wed, 18 Dec 2013, Ard Biesheuvel wrote:
> On 18 December 2013 15:27, Christopher Covington <cov@codeaurora.org> wrote:
> >
> > I do not think that Russell is the source of the confusion. Ard wrote, "The
> > idea is that a binary built for ARM will have access to the extended
> > instructions which ARM64 offers to ARM32 binaries running in 32 bit
> > compatibility mode (such as AES, SHAx etc)." I think s/ARM64/ARMv8/ is
> > necessary to make the statement correct, and hopefully less confusing.
> >
>
> My apologies for adding to the confusion (or creating it in the first place).
>
> However, the bottom line is that, as the 32 bit and 64 bit kernels are
> both able to support userland processes running in the execution state
> that has retroactively been dubbed 'AArch32', they should both honor
> the same contract with AArch32 userland on how to discover CPU
> capabilities at runtime. I do understand Russell's reservations about
> allocating 6 of the remaining 10 hwcaps bits, and I am open to
> suggestions on a better approach.
What is the reason for eating a grand total of 6 bits at once in the
first place?
Are those capabilities really going to be independently integrated? In
other words, what is the probability for a vendor to integrate some but
not the others? If this probability is low then maybe a smaller set of
wider-covering bits would be good enough in practice, and then some
kernel emulation could be added for the odd cases.
Nicolas
* [PATCH 0/4] arm64: advertise availability of CRC and crypto instructions
2013-12-18 19:57 ` Nicolas Pitre
@ 2013-12-18 20:26 ` Ard Biesheuvel
2013-12-18 21:18 ` Nicolas Pitre
0 siblings, 1 reply; 37+ messages in thread
From: Ard Biesheuvel @ 2013-12-18 20:26 UTC (permalink / raw)
To: linux-arm-kernel
On 18 December 2013 20:57, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
> On Wed, 18 Dec 2013, Ard Biesheuvel wrote:
>
>> On 18 December 2013 15:27, Christopher Covington <cov@codeaurora.org> wrote:
>> >
>> > I do not think that Russell is the source of the confusion. Ard wrote, "The
>> > idea is that a binary built for ARM will have access to the extended
>> > instructions which ARM64 offers to ARM32 binaries running in 32 bit
>> > compatibility mode (such as AES, SHAx etc)." I think s/ARM64/ARMv8/ is
>> > necessary to make the statement correct, and hopefully less confusing.
>> >
>>
>> My apologies for adding to the confusion (or creating it in the first place).
>>
>> However, the bottom line is that, as the 32 bit and 64 bit kernels are
>> both able to support userland processes running in the execution state
>> that has retroactively been dubbed 'AArch32', they should both honor
>> the same contract with AArch32 userland on how to discover CPU
>> capabilities at runtime. I do understand Russell's reservations about
>> allocating 6 of the remaining 10 hwcaps bits, and I am open to
>> suggestions on a better approach.
>
> What is the reason for eating a grand total of 6 bits at once in the
> first place?
>
I wasn't entirely accurate: it's 5 bits not 6 ...
> Are those capabilities really going to be independently integrated? In
> other words, what is the probability for a vendor to integrate some but
> not the others? If this probability is low then maybe a smaller set of
> wider-covering bits would be good enough in practice, and then some
> kernel emulation could be added for the odd cases.
>
The capabilities in question are:
* AES
* 64 bit polynomial (carry-less) multiply
* SHA1
* SHA2
* CRC32
and it is up to the implementor to choose the combination. To me,
there are no obviously more likely combinations, but perhaps others
have other ideas?
The nice thing about hwcaps is that it is already integrated into the
ifunc resolution done by the loader, which makes it very easy and
straightforward to offer alternative implementations of library
functions based on CPU capabilities.
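
The selection step can be sketched in plain C. This is not ifunc itself: it is a manual function-pointer dispatch that mimics what an ifunc resolver would do with the hwcap word, so that it compiles and runs on any machine. EX_HWCAP_CRC32 is a hypothetical bit, and crc32_accel is a stand-in that simply calls the generic code where a real build would use the CRC32 instructions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EX_HWCAP_CRC32 (1UL << 0)   /* HYPOTHETICAL bit, illustration only */

typedef uint32_t (*crc32_fn)(uint32_t crc, const unsigned char *buf, size_t len);

/* Portable bitwise CRC-32 (reflected polynomial 0xEDB88320). */
static uint32_t crc32_generic(uint32_t crc, const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    crc = ~crc;
    while (len--) {
        crc ^= *buf++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & -(crc & 1u));
    }
    return ~crc;
}

/* Stand-in for a version built around the ARMv8 CRC32 instructions. */
static uint32_t crc32_accel(uint32_t crc, const unsigned char *buf, size_t len)
{
    return crc32_generic(crc, buf, len);
}

/* Pick an implementation from the hwcap word, as a resolver would. */
static crc32_fn resolve_crc32(unsigned long hwcap)
{
    return (hwcap & EX_HWCAP_CRC32) ? crc32_accel : crc32_generic;
}
```

With real ifunc support the resolver runs once, at relocation time, and callers bind directly to the chosen implementation. The sketch only shows the decision that a hwcap bit makes possible.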
As any kind of emulation in the kernel is likely to be slower than an
optimized implementation for a CPU without the feature in question,
trapping SIGILL to infer hwcaps is probably the only viable alternate
approach that does not require a new kernel interface.
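
For comparison, the SIGILL approach could look roughly like the sketch below: install a temporary SIGILL handler, try the instruction, and jump back out if it traps. The names probe_insn, insn_present and insn_missing are hypothetical. To keep the sketch runnable on any POSIX system, insn_missing uses raise(SIGILL) as a stand-in for executing an unimplemented instruction.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdbool.h>
#include <string.h>

/* Single global jump buffer: this probe is neither reentrant nor thread-safe. */
static sigjmp_buf probe_env;

static void on_sigill(int sig)
{
    (void)sig;
    siglongjmp(probe_env, 1);
}

/* Returns true if try_insn() completed without raising SIGILL. */
static bool probe_insn(void (*try_insn)(void))
{
    struct sigaction sa, old;
    bool ok;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigill;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGILL, &sa, &old) != 0)
        return false;

    /* savemask = 1 so the signal mask is restored after the jump. */
    if (sigsetjmp(probe_env, 1) == 0) {
        try_insn();
        ok = true;
    } else {
        ok = false;
    }

    sigaction(SIGILL, &old, NULL);   /* put the previous handler back */
    return ok;
}

/* Stand-ins for "instruction exists" and "instruction traps". */
static void insn_present(void) { }
static void insn_missing(void) { raise(SIGILL); }
```

Every library that wants to know about a feature has to repeat this dance and temporarily take over a process-wide signal handler, which is part of why it is a less attractive option than a hwcap bit.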
Regards,
Ard.
* [PATCH 0/4] arm64: advertise availability of CRC and crypto instructions
2013-12-18 20:26 ` Ard Biesheuvel
@ 2013-12-18 21:18 ` Nicolas Pitre
2013-12-18 21:57 ` Ard Biesheuvel
0 siblings, 1 reply; 37+ messages in thread
From: Nicolas Pitre @ 2013-12-18 21:18 UTC (permalink / raw)
To: linux-arm-kernel
On Wed, 18 Dec 2013, Ard Biesheuvel wrote:
> On 18 December 2013 20:57, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
> > On Wed, 18 Dec 2013, Ard Biesheuvel wrote:
> >
> >> On 18 December 2013 15:27, Christopher Covington <cov@codeaurora.org> wrote:
> >> >
> >> > I do not think that Russell is the source of the confusion. Ard wrote, "The
> >> > idea is that a binary built for ARM will have access to the extended
> >> > instructions which ARM64 offers to ARM32 binaries running in 32 bit
> >> > compatibility mode (such as AES, SHAx etc)." I think s/ARM64/ARMv8/ is
> >> > necessary to make the statement correct, and hopefully less confusing.
> >> >
> >>
> >> My apologies for adding to the confusion (or creating it in the first place).
> >>
> >> However, the bottom line is that, as the 32 bit and 64 bit kernels are
> >> both able to support userland processes running in the execution state
> >> that has retroactively been dubbed 'AArch32', they should both honor
> >> the same contract with AArch32 userland on how to discover CPU
> >> capabilities at runtime. I do understand Russell's reservations about
> >> allocating 6 of the remaining 10 hwcaps bits, and I am open to
> >> suggestions on a better approach.
> >
> > What is the reason for eating a grand total of 6 bits at once in the
> > first place?
> >
>
> I wasn't entirely accurate: it's 5 bits not 6 ...
>
> > Are those capabilities really going to be independently integrated? In
> > other words, what is the probability for a vendor to integrate some but
> > not the others? If this probability is low then maybe a smaller set of
> > wider-covering bits would be good enough in practice, and then some
> > kernel emulation could be added for the odd cases.
> >
>
> The capabilities in question are:
> * AES
> * 64 bit polynomial (carry-less) multiply
> * SHA1
> * SHA2
> * CRC32
> and it is up to the implementor to choose the combination. To me,
> there are no obviously more likely combinations, but perhaps others
> have other ideas?
In any case, I agree with Russell that it looks a bit excessive to
have a single bit for individual instructions. The current hwcaps is
certainly not suitable for that level of granularity without a way to
extend it.
What does the ARM ARM say about those instructions? Are they
individually optional?
> The nice thing about hwcaps is that it is already integrated into the
> ifunc resolution done by the loader, which makes it very easy and
> straightforward to offer alternative implementations of library
> functions based on CPU capabilities.
The library may as well implement its own ifunc that tests the
instruction while trapping SIGILL. On those systems with the supported
instruction there will be no trap. On those that trap, the
alternative implementation is going to be much slower anyway.
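
A rough sketch of such a probe, assuming POSIX signals: the candidate instruction is wrapped in a small function, and a temporary SIGILL handler jumps back out if it faults. The names probe_insn, demo_supported and demo_unsupported are invented here; demo_unsupported uses raise(SIGILL) as a portable stand-in for executing an unimplemented instruction.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>

static sigjmp_buf probe_env;

/* Temporary handler: abandon the probe and report "not supported". */
static void on_sigill(int sig)
{
    (void)sig;
    siglongjmp(probe_env, 1);
}

/* Returns 1 if try_insn() ran to completion, 0 if it raised SIGILL. */
int probe_insn(void (*try_insn)(void))
{
    struct sigaction sa, old;
    volatile int supported = 0;

    sa.sa_handler = on_sigill;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGILL, &sa, &old) != 0)
        return 0;

    /* savemask=1 so SIGILL is unblocked again after the jump. */
    if (sigsetjmp(probe_env, 1) == 0) {
        try_insn();
        supported = 1;
    }

    sigaction(SIGILL, &old, NULL);   /* restore the previous handler */
    return supported;
}

/* Stand-ins for "instruction present" and "instruction absent". */
void demo_supported(void) { }
void demo_unsupported(void) { raise(SIGILL); }
```

Note that this probe is not thread-safe as written (it uses one static jump buffer), and it cannot tell a natively executed instruction from one the kernel emulated.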
> As any kind of emulation in the kernel is likely to be slower than an
> optimized implementation for a CPU without the feature in question,
> trapping SIGILL to infer hwcaps is probably the only viable alternate
> approach that does not require a new kernel interface.
True. However the kernel side infrastructure to emulate any instruction
is already there. So this is just a matter of adding an additional
entry making the call to the existing libs. At least that would make
things work in case the user space libs, or some inline assembly in
application code, is not expecting the lack of hardware support.
Nicolas
* [PATCH 0/4] arm64: advertise availability of CRC and crypto instructions
2013-12-18 21:18 ` Nicolas Pitre
@ 2013-12-18 21:57 ` Ard Biesheuvel
2013-12-19 6:48 ` Siarhei Siamashka
2013-12-19 18:07 ` Nicolas Pitre
0 siblings, 2 replies; 37+ messages in thread
From: Ard Biesheuvel @ 2013-12-18 21:57 UTC (permalink / raw)
To: linux-arm-kernel
On 18 December 2013 22:18, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
> On Wed, 18 Dec 2013, Ard Biesheuvel wrote:
>> The capabilities in question are:
>> * AES
>> * 64 bit polynomial (carry-less) multiply
>> * SHA1
>> * SHA2
>> * CRC32
>> and it is up to the implementor to choose the combination. To me,
>> there are no obviously more likely combinations, but perhaps others
>> have other ideas?
>
> In any case, I agree with Russell that this looks a bit excessive to
> have a single bit for individual instructions.
They are not in fact individual instructions:
* AES -> aese, aesd, aesmc, aesimc
* SHA1 -> sha1c, sha1h, sha1m, sha1p, sha1su0, sha1su1
* SHA2 -> sha256h, sha256h2, sha256su0, sha256su1
* CRC32 -> crc32, crc32c
The only feature that covers a single instruction is poly64 multiply.
> The current hwcaps is
> certainly not suitable for that level of granularity without a way to
> extend it.
>
There is a way to extend it. It is called HWCAP2, and it is already in
use by powerpc.
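
As a sketch of what this looks like from userland, both words can be read from the auxiliary vector with glibc's getauxval(). The struct and helper names below are invented for the example; the fallback value 26 for AT_HWCAP2 is the Linux value, supplied only in case older headers lack the macro.

```c
#include <sys/auxv.h>

#ifndef AT_HWCAP2
#define AT_HWCAP2 26   /* Linux auxv tag; older headers may not define it */
#endif

/* Hypothetical container for the two capability words. */
struct hwcaps {
    unsigned long hwcap;
    unsigned long hwcap2;
};

/* Read both words; getauxval() returns 0 for an entry that is absent. */
struct hwcaps read_hwcaps(void)
{
    struct hwcaps caps;

    caps.hwcap = getauxval(AT_HWCAP);
    caps.hwcap2 = getauxval(AT_HWCAP2);
    return caps;
}

/* Test a single bit, rejecting out-of-range bit numbers. */
int has_cap(unsigned long word, unsigned bit)
{
    if (bit >= sizeof(word) * 8)
        return 0;
    return (int)((word >> bit) & 1UL);
}
```

On a kernel that does not provide AT_HWCAP2, the second word simply reads as zero, so new software still works on older kernels.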
> What does the ARM ARM say about those instructions? Are they
> individually optional?
>
Yes, the features (mostly) are. The register is called ID_ISAR5: the
way the bits map to the features is a bit odd, though. AES, SHA1, SHA2
and CRC can be enabled independently, only (oddly) poly64 multiply
implies AES support as well.
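
To make the mapping concrete, here is a sketch of decoding those fields in C. It assumes the 4-bit field layout AES [7:4], SHA1 [11:8], SHA2 [15:12], CRC32 [19:16], with an AES field value of 2 meaning AES plus poly64 multiply; the FEAT_* flags and decode_isar5() are invented names, and the ARM ARM should be checked for the authoritative encoding.

```c
#include <stdint.h>

/* Hypothetical feature flags used only by this sketch. */
enum {
    FEAT_AES   = 1 << 0,
    FEAT_PMULL = 1 << 1,
    FEAT_SHA1  = 1 << 2,
    FEAT_SHA2  = 1 << 3,
    FEAT_CRC32 = 1 << 4
};

/* Extract one 4-bit ID field starting at bit 'shift'. */
static unsigned id_field(uint32_t reg, unsigned shift)
{
    return (reg >> shift) & 0xf;
}

/* Translate an ID_ISAR5 value into a feature mask. */
unsigned decode_isar5(uint32_t isar5)
{
    unsigned feats = 0;
    unsigned aes = id_field(isar5, 4);

    if (aes >= 1)
        feats |= FEAT_AES;
    if (aes >= 2)
        feats |= FEAT_PMULL;    /* poly64 multiply implies AES */
    if (id_field(isar5, 8) >= 1)
        feats |= FEAT_SHA1;
    if (id_field(isar5, 12) >= 1)
        feats |= FEAT_SHA2;
    if (id_field(isar5, 16) >= 1)
        feats |= FEAT_CRC32;
    return feats;
}
```

Because poly64 multiply shares a field with AES, the register cannot express "PMULL without AES", which is the oddity mentioned above.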
>> The nice thing about hwcaps is that it is already integrated into the
>> ifunc resolution done by the loader, which makes it very easy and
>> straightforward to offer alternative implementations of library
>> functions based on CPU capabilities.
>
> The library may as well implement its own ifunc that tests the
> instruction while trapping SIGILL. On those systems with the supported
> instruction there will be no trap. On those that traps then the
> alternative implementation is going to be much slower anyway.
>
True. And the trap still only occurs at load time. But I think we
agree it is essentially a poor man's hwcaps.
>> As any kind of emulation in the kernel is likely to be slower than an
>> optimized implementation for a CPU without the feature in question,
>> trapping SIGILL to infer hwcaps is probably the only viable alternate
>> approach that does not require a new kernel interface.
>
> True. However the kernel side infrastructure to emulate any instruction
> is already there. So this is just a matter of adding an additional
> entry making the call to the existing libs. At least that would make
> things work in case the user space libs, or some inline assembly in
> application code, is not expecting the lack of hardware support.
>
I agree that this is all feasible. But I still feel that the best
solution is to allocate 5 bits in HWCAP, which leaves us with 5 spares
now and another 32 when we (trivially) enable HWCAP2 for ARM.
Regards,
Ard.
* [PATCH 0/4] arm64: advertise availability of CRC and crypto instructions
2013-12-18 21:57 ` Ard Biesheuvel
@ 2013-12-19 6:48 ` Siarhei Siamashka
2013-12-19 11:48 ` Catalin Marinas
2013-12-19 17:33 ` Ard Biesheuvel
2013-12-19 18:07 ` Nicolas Pitre
1 sibling, 2 replies; 37+ messages in thread
From: Siarhei Siamashka @ 2013-12-19 6:48 UTC (permalink / raw)
To: linux-arm-kernel
On Wed, 18 Dec 2013 22:57:33 +0100
Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> On 18 December 2013 22:18, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
> > On Wed, 18 Dec 2013, Ard Biesheuvel wrote:
> >> The nice thing about hwcaps is that it is already integrated into the
> >> ifunc resolution done by the loader, which makes it very easy and
> >> straightforward to offer alternative implementations of library
> >> functions based on CPU capabilities.
> >
> > The library may as well implement its own ifunc that tests the
> > instruction while trapping SIGILL. On those systems with the supported
> > instruction there will be no trap. On those that traps then the
> > alternative implementation is going to be much slower anyway.
> >
>
> True. And the trap still only occurs at load time. But I think we
> agree it is essentially a poor man's hwcaps.
And the hwcaps is essentially a poor man's replacement for a userspace
accessible CPUID instruction enjoyed by x86.
It's sad to see that the runtime CPU features detection still remains
a PITA with AArch64. Basically, it's not enough to know if the
instruction is supported or not. Different microarchitectures may
have various performance quirks for certain instructions. For example,
VFPLite in Cortex-A8 is non-pipelined and slow. Cortex-A15 can
dual-issue NEON instructions (nice for the code which can enjoy
high ILP), but Cortex-A15 NEON instructions have relatively high
latency (bad for the code, which is essentially a long dependency
chain). The fastest way to read uncached memory for most ARM
processors is to use the VFP load multiple instruction with as
many registers as possible, but this is slow on Marvell PJ4. And
so on.
The information, usable for basic microarchitecture identification
(the value from MIDR register) is only exposed in /proc/cpuinfo, which
makes it an overall winner for the runtime CPU features detection
method. Additionally, reading /proc/self/auxv for retrieving hwcaps has
issues when run under qemu or valgrind. Instructions trapping is a very
bad idea for multiple reasons (one of them is the fact that we can't
easily distinguish between trapped&emulated and natively supported
by hardware, think about FP instructions emulation for example).
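
For reference, a MIDR value splits into a handful of fixed fields, which is what makes it usable for basic microarchitecture identification. A minimal decoding sketch, with invented struct and function names:

```c
#include <stdint.h>

/* Fields of the Main ID Register; names chosen for this sketch. */
struct midr_fields {
    unsigned implementer;   /* bits [31:24], e.g. 0x41 for ARM Ltd. */
    unsigned variant;       /* bits [23:20] */
    unsigned architecture;  /* bits [19:16] */
    unsigned part;          /* bits [15:4], e.g. 0xc0f for Cortex-A15 */
    unsigned revision;      /* bits [3:0] */
};

struct midr_fields decode_midr(uint32_t midr)
{
    struct midr_fields f;

    f.implementer  = (midr >> 24) & 0xff;
    f.variant      = (midr >> 20) & 0xf;
    f.architecture = (midr >> 16) & 0xf;
    f.part         = (midr >> 4)  & 0xfff;
    f.revision     = midr & 0xf;
    return f;
}
```

The input value still has to come from somewhere, which today means the text in /proc/cpuinfo.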
So there is no really good alternative to /proc/cpuinfo parsing. But
text parsing is relatively cumbersome to implement. And this method is
obviously not blazingly fast. Also the big.LITTLE systems introduce
an interesting new challenge. How do we know whether we are running
the code on Cortex-A7 or Cortex-A15 at any arbitrary moment? We might
want to have several different assembly optimized functions, one
optimized for Cortex-A15 pipeline and another one optimized for
Cortex-A7. It would be nice to be able to frequently poll for the CPU
features of the currently running CPU core (for example, once per
frame in a video encoder/decoder) to select the fastest code path.
With /proc/cpuinfo text parsing this is not going to work nicely.
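
To show how clumsy the text route is, here is a minimal sketch that pulls the first "CPU part" value out of a buffer holding /proc/cpuinfo contents. The function name is invented, the field label is assumed to match what the ARM kernel prints, and reading the file into the buffer is left out.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return the part number from the first "CPU part" line in 'text',
 * or -1 if no such line (or no value on that line) is found.
 */
long cpuinfo_first_part(const char *text)
{
    const char *line = strstr(text, "CPU part");
    const char *colon, *eol;

    if (line == NULL)
        return -1;

    colon = strchr(line, ':');
    eol = strchr(line, '\n');
    /* The colon must belong to the same line as the label. */
    if (colon == NULL || (eol != NULL && colon > eol))
        return -1;

    /* Base 0 lets strtol() accept the "0x" prefix. */
    return strtol(colon + 1, NULL, 0);
}
```

Even this small parser only reports the first core listed, so it says nothing about which core the calling thread is running on.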
The best solution would be in my opinion a userspace accessible (and
guaranteed not to trap) CPUID instruction. This has proven to work
nicely for x86, so why invent something overly complicated instead?
If the OS wants to conceal the CPU features from the
userspace application, some special "I don't want to tell you,
please use the slowest code path possible" value could be defined
and returned by this instruction.
Well, if it's not desired (and already too late) to change how the
hardware works, another solution would be to have runtime CPU
features detection supported as part of the run-time ABI. For example,
make it mandatory for any EABI conforming system to provide some helper
functions like __aeabi_read_midr() or __aeabi_read_hwcaps(). They could
be implemented for ARM Linux via the kernel-provided user helpers, VDSO
or whatever other method that is appropriate. If this works for the
things like TLS (__aeabi_read_tp), why can't it work for runtime CPU
features detection too? The recent gcc versions also have some nice
built-in functions for runtime cpu features detection on x86
such as __builtin_cpu_is(), __builtin_cpu_supports():
http://gcc.gnu.org/gcc-4.8/changes.html
Please, could we finally have something sane for the runtime CPU
features detection on ARM hardware?
--
Best regards,
Siarhei Siamashka
* [PATCH 0/4] arm64: advertise availability of CRC and crypto instructions
2013-12-18 18:50 ` Ard Biesheuvel
@ 2013-12-19 11:11 ` Catalin Marinas
0 siblings, 0 replies; 37+ messages in thread
From: Catalin Marinas @ 2013-12-19 11:11 UTC (permalink / raw)
To: linux-arm-kernel
On Wed, Dec 18, 2013 at 06:50:35PM +0000, Ard Biesheuvel wrote:
> On 18 December 2013 18:29, Catalin Marinas <catalin.marinas@arm.com> wrote:
>
> > For the time being, I merged the first two patches for AArch64 support.
> > I'm not merging the compat one yet as this should strictly follow the
> > arch/arm support.
>
> Yeah, I guess that makes sense.
>
> I find this ghetto style drive-by veto'ing less than productive,
LOL. That's how we add a bit of excitement to our monotonous maintainer
life ;).
> I would much rather have had an informed discussion involving both
> Russell and you and settle on a solution that is acceptable for both
> ARM and arm64.
I think we first need to get clarification from Russell whether the
problem is the number of hwcap bits used or just not willing to take new
ARMv8/AArch32 features in the 32-bit kernel.
If the latter, we can wait until AArch32 user space people start working
on optimised crypto libraries and restart the discussion at the time.
--
Catalin
* [PATCH 0/4] arm64: advertise availability of CRC and crypto instructions
2013-12-19 6:48 ` Siarhei Siamashka
@ 2013-12-19 11:48 ` Catalin Marinas
2013-12-20 6:29 ` Siarhei Siamashka
2013-12-19 17:33 ` Ard Biesheuvel
1 sibling, 1 reply; 37+ messages in thread
From: Catalin Marinas @ 2013-12-19 11:48 UTC (permalink / raw)
To: linux-arm-kernel
On Thu, Dec 19, 2013 at 06:48:16AM +0000, Siarhei Siamashka wrote:
> On Wed, 18 Dec 2013 22:57:33 +0100
> Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>
> > On 18 December 2013 22:18, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
> > > On Wed, 18 Dec 2013, Ard Biesheuvel wrote:
> > >> The nice thing about hwcaps is that it is already integrated into the
> > >> ifunc resolution done by the loader, which makes it very easy and
> > >> straightforward to offer alternative implementations of library
> > >> functions based on CPU capabilities.
> > >
> > > The library may as well implement its own ifunc that tests the
> > > instruction while trapping SIGILL. On those systems with the supported
> > > instruction there will be no trap. On those that traps then the
> > > alternative implementation is going to be much slower anyway.
> > >
> >
> > True. And the trap still only occurs at load time. But I think we
> > agree it is essentially a poor man's hwcaps.
>
> And the hwcaps is essentially a poor man's replacement for a userspace
> accessible CPUID instruction enjoyed by x86.
hwcaps has its value but I agree that some quicker access would be good
in certain cases. However, simply exposing the CPUID scheme to user
space may look nice initially but has other problems. All the
discussions we had (in ARM) basically ended up with having some scratch
registers that could be accessed from user via mrs and the kernel would
either copy the CPUID registers or hwcap-like bits (but basically it is
just an ABI between user and kernel).
> So there is no really good alternative to /proc/cpuinfo parsing. But
> text parsing is relatively cumbersome to implement. And this method is
> obviously not blazingly fast. Also the big.LITTLE systems introduce
> an interesting new challenge. How do we know whether we are running
> the code on Cortex-A7 or Cortex-A15 at any arbitrary moment? We might
> want to have several different assembly optimized functions, one
> optimized for Cortex-A15 pipeline and another one optimized for
> Cortex-A7. It would be nice to be able to frequently poll for the CPU
> features of the currently running CPU core (for example, once per
> frame in a video encoder/decoder) to select the fastest code path.
> With /proc/cpuinfo text parsing this is not going to work nicely.
With big.LITTLE user-space can't tell on which CPU it is running. Even
if it could, it needs to cope with preemption and migration to another
CPU. If we assume that the same features are present on both, some
routines may occasionally be suboptimal but it shouldn't be that bad.
Anyway, for such A7/A15 combinations, the idea is to optimise for A7's
pipeline since A15 execution is more out of order and tolerant to
instruction order.
> The best solution would be in my opinion a userspace accessible (and
> guaranteed not to trap) CPUID instruction. This has proven to work
> nicely for x86, so why inventing something overly complicated instead?
> In the case if the OS wants to conceal the CPU features from the
> userspace application, some special "I don't want to tell you,
> please use the slowest code path possible" value could be defined
> and returned by this instruction.
As I said above, just raw access to the CPUID registers may not always
be desirable. Some features require kernel support (like FP register
saving/restoring), so if you run an older kernel on a newer CPU you
shouldn't really use such feature.
(I'm also not entirely sure about crypto stuff and export regulations,
whether a mobile vendor may want to disable some hwcap bits in kernel
even though the hardware supports it)
> Well, if it's not desired (and already too late) to change how the
> hardware works, another solution would be to have runtime CPU
> features detection supported as part of the run-time ABI. For example,
> make it mandatory for any EABI conforming system to provide some helper
> functions like __aeabi_read_midr() or __aeabi_read_hwcaps(). They could
> be implemented for ARM Linux via the kernel-provided user helpers, VDSO
> or whatever other method that is appropriate. If this works for the
> things like TLS (__aeabi_read_tp), why can't it work for runtime CPU
> features detection too? The recent gcc versions also have some nice
> built-in functions for runtime cpu features detection on x86
> such as __builtin_cpu_is(), __builtin_cpu_supports():
> http://gcc.gnu.org/gcc-4.8/changes.html
We discussed this in ARM with the toolchain guys and I'm fine with the
idea. But for backwards compatibility, we would need a way for newer
software to work on older kernels. On arm64, this is easier with the VDSO since
glibc could have a weak function that returns not-implemented. I would
rather have a VDSO on arm as well rather than abusing the vectors page.
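
The weak-function fallback could look roughly like the sketch below, assuming a GCC-compatible toolchain. The helper name demo_read_midr is purely hypothetical; no such interface exists today. The weak default reports "not implemented", and a stronger definition supplied elsewhere would override it at link time.

```c
#include <errno.h>

/*
 * Hypothetical helper. This weak default is what software would get
 * when nothing better is available; it reports "not implemented".
 */
__attribute__((weak)) int demo_read_midr(unsigned long *midr)
{
    (void)midr;
    return -ENOSYS;
}

/* Caller side: use the helper if it works, else a caller-chosen default. */
unsigned long midr_or(unsigned long fallback)
{
    unsigned long midr = 0;

    if (demo_read_midr(&midr) < 0)
        return fallback;
    return midr;
}
```

The point of the pattern is that newer software keeps a working, if less informed, code path on older kernels.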
If you want to distinguish between CPUs, we can use one of the unused
TLS registers as offset into a VDSO data array with per-CPU information
(all handled via the VDSO code, so user shouldn't really know the
meaning). We have a user read-only thread register unused on arm64 (and
that's what we had in mind when using the read/write register for user
TLS).
However, that's an optimisation and I don't think it should replace
hwcap bits for new features.
--
Catalin
* [PATCH 0/4] arm64: advertise availability of CRC and crypto instructions
2013-12-19 6:48 ` Siarhei Siamashka
2013-12-19 11:48 ` Catalin Marinas
@ 2013-12-19 17:33 ` Ard Biesheuvel
2013-12-20 1:35 ` Siarhei Siamashka
1 sibling, 1 reply; 37+ messages in thread
From: Ard Biesheuvel @ 2013-12-19 17:33 UTC (permalink / raw)
To: linux-arm-kernel
On 19 December 2013 07:48, Siarhei Siamashka
<siarhei.siamashka@gmail.com> wrote:
> On Wed, 18 Dec 2013 22:57:33 +0100
> Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>
>> On 18 December 2013 22:18, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
>> > On Wed, 18 Dec 2013, Ard Biesheuvel wrote:
>> >> The nice thing about hwcaps is that it is already integrated into the
>> >> ifunc resolution done by the loader, which makes it very easy and
>> >> straightforward to offer alternative implementations of library
>> >> functions based on CPU capabilities.
>> >
>> > The library may as well implement its own ifunc that tests the
>> > instruction while trapping SIGILL. On those systems with the supported
>> > instruction there will be no trap. On those that traps then the
>> > alternative implementation is going to be much slower anyway.
>> >
>>
>> True. And the trap still only occurs at load time. But I think we
>> agree it is essentially a poor man's hwcaps.
>
> And the hwcaps is essentially a poor man's replacement for a userspace
> accessible CPUID instruction enjoyed by x86.
>
> It's sad to see that the runtime CPU features detection still remains
> a PITA with AArch64. Basically, it's not enough to know if the
> instruction is supported or not. Different microarchitectures may
> various performance quirks for certain instructions. For example,
> VFPLite in Cortex-A8 is non-pipelined and slow. Cortex-A15 can
> dual-issue NEON instructions (nice for the code which can enjoy
> high ILP), but Cortex-A15 NEON instructions have relatively high
> latency (bad for the code, which is essentially a long dependency
> chain). The fastest way to read uncached memory for most ARM
> processors is to use the VFP load multiple instruction with as
> many registers as possible, but this is slow on Marvell PJ4. And
> so on.
>
You are comparing apples and oranges.
It is fairly well known that you are better off using the NEON for
floating point on a Cortex-A8, if you can afford the reduced
precision. But if you /can't/ afford the reduced precision, you are
still better off using VFP-lite than using software emulation.
The same applies to the Crypto Extensions: it is highly unlikely that
you will care about the particular implementation of the AES
instructions if you are faced with the choice of using those
instructions or using a software implementation. So using hwcaps bits
for these kinds of features makes perfect sense. (And so does enabling
the 'has-vfp' bit for VFP-lite)
I do agree with you that the heterogeneity between various ARM
implementors is a PITA at times, and knowing which CPU exactly you are
running on is a valid question in those cases (btw this applies to SSE
on Atom as well).
But please don't confuse it with the simple presence or absence of
some CPU extension.
Regards,
Ard.
* [PATCH 0/4] arm64: advertise availability of CRC and crypto instructions
2013-12-18 21:57 ` Ard Biesheuvel
2013-12-19 6:48 ` Siarhei Siamashka
@ 2013-12-19 18:07 ` Nicolas Pitre
1 sibling, 0 replies; 37+ messages in thread
From: Nicolas Pitre @ 2013-12-19 18:07 UTC (permalink / raw)
To: linux-arm-kernel
On Wed, 18 Dec 2013, Ard Biesheuvel wrote:
> On 18 December 2013 22:18, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
> > The current hwcaps is
> > certainly not suitable for that level of granularity without a way to
> > extend it.
> >
>
> There is a way to extend it. It is called HWCAP2, and it is already in
> use by powerpc.
Great. Shall we do that first then?
Nicolas
* [PATCH 0/4] arm64: advertise availability of CRC and crypto instructions
2013-12-19 17:33 ` Ard Biesheuvel
@ 2013-12-20 1:35 ` Siarhei Siamashka
0 siblings, 0 replies; 37+ messages in thread
From: Siarhei Siamashka @ 2013-12-20 1:35 UTC (permalink / raw)
To: linux-arm-kernel
On Thu, 19 Dec 2013 18:33:45 +0100
Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> On 19 December 2013 07:48, Siarhei Siamashka
> <siarhei.siamashka@gmail.com> wrote:
> > On Wed, 18 Dec 2013 22:57:33 +0100
> > Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> >
> >> On 18 December 2013 22:18, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
> >> > On Wed, 18 Dec 2013, Ard Biesheuvel wrote:
> >> >> The nice thing about hwcaps is that it is already integrated into the
> >> >> ifunc resolution done by the loader, which makes it very easy and
> >> >> straightforward to offer alternative implementations of library
> >> >> functions based on CPU capabilities.
> >> >
> >> > The library may as well implement its own ifunc that tests the
> >> > instruction while trapping SIGILL. On those systems with the supported
> >> > instruction there will be no trap. On those that traps then the
> >> > alternative implementation is going to be much slower anyway.
> >> >
> >>
> >> True. And the trap still only occurs at load time. But I think we
> >> agree it is essentially a poor man's hwcaps.
> >
> > And the hwcaps is essentially a poor man's replacement for a userspace
> > accessible CPUID instruction enjoyed by x86.
> >
> > It's sad to see that the runtime CPU features detection still remains
> > a PITA with AArch64. Basically, it's not enough to know if the
> > instruction is supported or not. Different microarchitectures may
> > various performance quirks for certain instructions. For example,
> > VFPLite in Cortex-A8 is non-pipelined and slow. Cortex-A15 can
> > dual-issue NEON instructions (nice for the code which can enjoy
> > high ILP), but Cortex-A15 NEON instructions have relatively high
> > latency (bad for the code, which is essentially a long dependency
> > chain). The fastest way to read uncached memory for most ARM
> > processors is to use the VFP load multiple instruction with as
> > many registers as possible, but this is slow on Marvell PJ4. And
> > so on.
> >
>
> You are comparing apples and oranges.
>
> It is fairly well known that you are better off using the NEON for
> floating point on a Cortex-A8, if you can afford the reduced
> precision. But if you /can't/ afford the reduced precision, you are
> still better off using VFP-lite than using software emulation.
If the reduced precision of 32-bit floats can't be afforded, it is still
sometimes possible to use more accurate fixed point calculations
instead. And do them faster than using VFP-lite. The generic and slow
software emulation of 64-bit doubles is not even an option.
That's exactly the point. If we know more information about the CPU
capabilities, we can select a more suitable implementation at runtime.
Even the implementation, which uses a somewhat different algorithm
for doing the same job.
> The same applies to the Crypto Extensions: it is highly unlikely that
> you will care about the particular implementation of the AES
> instructions if you are faced with the choice of using those
> instructions or using a software implementation. So using hwcaps bits
> for these kinds of features makes perfect sense. (And so does enabling
> the 'has-vfp' bit for VFP-lite)
I'm not opposing the addition of Crypto Extensions support to hwcaps.
Still this just covers only the basic use cases (which is great!) but
is not enough to make everyone happy.
> I do agree with you that the heterogeneity between various ARM
> implementors is a PITA at times, and knowing which CPU exactly you are
> running on is a valid question in those cases
Again, this was exactly the point of my e-mail. And it appears that we
agree with each other.
> (btw this applies to SSE on Atom as well).
I'm well aware of the Atom SSSE3 performance issues (the microcoded
PSHUFB instruction in particular). The key difference is that the x86
architecture makes it easy to identify the CPU cores.
> But please don't confuse it with the simple presence or absence of
> some CPU extension.
...
--
Best regards,
Siarhei Siamashka
* [PATCH 0/4] arm64: advertise availability of CRC and crypto instructions
2013-12-19 11:48 ` Catalin Marinas
@ 2013-12-20 6:29 ` Siarhei Siamashka
2013-12-20 11:27 ` Catalin Marinas
0 siblings, 1 reply; 37+ messages in thread
From: Siarhei Siamashka @ 2013-12-20 6:29 UTC (permalink / raw)
To: linux-arm-kernel
On Thu, 19 Dec 2013 11:48:16 +0000
Catalin Marinas <catalin.marinas@arm.com> wrote:
> On Thu, Dec 19, 2013 at 06:48:16AM +0000, Siarhei Siamashka wrote:
> > On Wed, 18 Dec 2013 22:57:33 +0100
> > Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> >
> > > On 18 December 2013 22:18, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
> > > > On Wed, 18 Dec 2013, Ard Biesheuvel wrote:
> > > >> The nice thing about hwcaps is that it is already integrated into the
> > > >> ifunc resolution done by the loader, which makes it very easy and
> > > >> straightforward to offer alternative implementations of library
> > > >> functions based on CPU capabilities.
> > > >
> > > > The library may as well implement its own ifunc that tests the
> > > > instruction while trapping SIGILL. On those systems with the supported
> > > > instruction there will be no trap. On those that traps then the
> > > > alternative implementation is going to be much slower anyway.
> > > >
> > >
> > > True. And the trap still only occurs at load time. But I think we
> > > agree it is essentially a poor man's hwcaps.
> >
> > And the hwcaps is essentially a poor man's replacement for a userspace
> > accessible CPUID instruction enjoyed by x86.
>
> hwcaps has its value but I agree that some quicker access would be good
> in certain cases. However, simply exposing the CPUID scheme to user
> space may look nice initially but has other problems. All the
> discussions we had (in ARM) basically ended up with having some scratch
> registers that could be accessed from user via mrs and the kernel would
> either copy the CPUID registers or hwcap-like bits (but basically it is
> just an ABI between user and kernel).
Sorry, I don't seem to follow what exactly was wrong with this approach.
It looks like a good idea to me. Was it abandoned?
> > So there is no really good alternative to /proc/cpuinfo parsing. But
> > text parsing is relatively cumbersome to implement. And this method is
> > obviously not blazingly fast. Also the big.LITTLE systems introduce
> > an interesting new challenge. How do we know whether we are running
> > the code on Cortex-A7 or Cortex-A15 at any arbitrary moment? We might
> > want to have several different assembly optimized functions, one
> > optimized for Cortex-A15 pipeline and another one optimized for
> > Cortex-A7. It would be nice to be able to frequently poll for the CPU
> > features of the currently running CPU core (for example, once per
> > frame in a video encoder/decoder) to select the fastest code path.
> > With /proc/cpuinfo text parsing this is not going to work nicely.
>
> With big.LITTLE user-space can't tell on which CPU it is running. Even
> if it could, it needs to cope with preemption and migration to another
> CPU. If we assume the that the same features are present on both, some
> routines may occasionally be unoptimal but it shouldn't be that bad.
Just periodically checking the type of the currently running CPU and
adapting at runtime could perhaps make the performance better on
average. We don't strictly need to ensure that the choice of some
optimized function is always optimal for the currently running CPU.
It would be perfectly fine if it's right most of the time.
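
A sketch of what such periodic polling might look like, assuming some cheap way to ask for the current core's part number exists (it does not today, so query_part is a hypothetical callback, and demo_query_a15 is a stand-in that always answers Cortex-A15):

```c
enum path { PATH_GENERIC, PATH_A7, PATH_A15 };

/* Map a MIDR part number to a tuned code path. */
enum path path_for_part(unsigned part)
{
    switch (part) {
    case 0xc07: return PATH_A7;    /* Cortex-A7 */
    case 0xc0f: return PATH_A15;   /* Cortex-A15 */
    default:    return PATH_GENERIC;
    }
}

struct poller {
    unsigned (*query_part)(void);  /* hypothetical per-core query */
    unsigned interval;             /* re-check every 'interval' frames */
    unsigned frames_left;          /* frames until the next re-check */
    enum path current;
};

/* Call once per frame; re-evaluates the path only every 'interval' frames. */
enum path poller_next_frame(struct poller *p)
{
    if (p->frames_left == 0) {
        p->current = path_for_part(p->query_part());
        p->frames_left = p->interval ? p->interval : 1;
    }
    p->frames_left--;
    return p->current;
}

/* Stand-in query for the example: always reports a Cortex-A15. */
unsigned demo_query_a15(void) { return 0xc0f; }
```

The thread may migrate between polls, so the choice is only a hint; correctness must not depend on it.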
> Anyway, for such A7/A15 combinations, the idea is to optimise for A7's
> pipeline since A15 execution is more out of order and tolerant to
> instruction order.
So all the software is supposed to be optimized just for A7 in the
A7/A15 big.LITTLE combinations?
Let's take some video codec as an example. If somebody starts
multi-threaded transcoding of some video hogging all CPU cores, then
the execution is going to be migrated to A15, right? In this case it
makes sense to have this codec optimized for A15.
But if somebody just uses this codec for watching some video (the
faster than realtime performance is not required), then the execution
could be migrated to A7 and we are going to be more worried about
optimizing for A7 and reducing power consumption.
Anyway, I'm not going to argue whether it is useful or not (it would
only work if there is non-negligible difference between A7-tuned and
A15-tuned code when run on the right or wrong core). But having a
simple, low-overhead CPU type and feature detection would make it possible
to experiment with optimizations like this.
> > The best solution would be in my opinion a userspace accessible (and
> > guaranteed not to trap) CPUID instruction. This has proven to work
> > nicely for x86, so why inventing something overly complicated instead?
> > In the case if the OS wants to conceal the CPU features from the
> > userspace application, some special "I don't want to tell you,
> > please use the slowest code path possible" value could be defined
> > and returned by this instruction.
>
> As I said above, just raw access to the CPUID registers may not always
> be desirable. Some features require kernel support (like FP register
> saving/restoring), so if you run an older kernel on a newer CPU you
> shouldn't really use such feature.
>
> (I'm also not entirely sure about crypto stuff and export regulations,
> whether a mobile vendor may want to disable some hwcap bits in kernel
> even though the hardware supports it)
AFAIK the new registers saving/restoring is somehow handled in the x86
world?
One argument that I heard against providing raw access to the CPUID
registers was that it could help evil hackers to identify the core type
and revision. And then they could use this information for exploiting
some errata.
But doesn't having a sanitized copy of CPUID register values in the
scratch registers that you mentioned earlier solve all the problems?
> > Well, if it's not desired (and already too late) to change how the
> > hardware works, another solution would be to have runtime CPU
> > features detection supported as part of the run-time ABI. For example,
> > make it mandatory for any EABI conforming system to provide some helper
> > functions like __aeabi_read_midr() or __aeabi_read_hwcaps(). They could
> > be implemented for ARM Linux via the kernel-provided user helpers, VDSO
> > or whatever other method that is appropriate. If this works for the
> > things like TLS (__aeabi_read_tp), why can't it work for runtime CPU
> > features detection too? The recent gcc versions also have some nice
> > built-in functions for runtime cpu features detection on x86
> > such as __builtin_cpu_is(), __builtin_cpu_supports():
> > http://gcc.gnu.org/gcc-4.8/changes.html
>
> We discussed this in ARM with the toolchain guys and I'm fine with the
> idea. But for backwards compatibility, we would need a way for newer
> software to work on older kernels. On arm64, with VDSO is easier since
> glibc could have a weak function that returns not-implemented. I would
> rather have a VDSO on arm as well rather than abusing the vectors page.
>
> If you want to distinguish between CPUs, we can use one of the unused
> TLS registers as offset into a VDSO data array with per-CPU information
> (all handled via the VDSO code, so user shouldn't really know the
> meaning). We have a user read-only thread register unused on arm64 (and
> that's what we had in mind when using the read/write register for user
> TLS).
Sounds good. And I like that this proposal has not been immediately
dismissed yet. Would somebody from ARM or Linaro be willing to invest
some time into trying to develop some prototype patches (for AArch64)?
If I were to develop some prototype for 32-bit arm, it would probably
have kuser helpers extended to add a new function which would just
return a 32-bit variable, initialized to store a copy of MIDR value.
Then add the __aeabi_read_midr() function (to libgcc instead of glibc),
which would rely on check_kuser_version() and the new kuser helper
function. And then try to add the __builtin_cpu_is() built-in function
to gcc, which would use this new helper function for getting the MIDR
value and checking the cpu type. Using libgcc would eliminate any
dependency on glibc version. I believe it would only take a new gcc
release to have this feature working in applications. And it would
only take a new kernel release for this built-in function to actually
identify cpu types instead of returning 0 (or maybe -1 to indicate that
the cpu type check has failed). However kuser helpers have security
implications:
http://lwn.net/Articles/562443/
And the kuser helpers are already disabled in Android (if I understand
this LWN article correctly). This kinda defeats the purpose if the
feature is not going to work on all Linux systems. So now I wonder,
how difficult would it be to get VDSO working on 32-bit arm?
If the clever and more knowledgeable guys from around here could
advise on this, that would surely be appreciated.
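[Editorial sketch, not part of the original message.] The MIDR-based
check described above could look roughly like the following. It assumes
that some helper (the proposed, not yet existing __aeabi_read_midr())
hands back a 32-bit copy of MIDR, or 0 when the kernel does not provide
one. The names classify_midr(), enum cpu_kind and the ARM_PART_* macros
are hypothetical and only illustrate the idea; the field layout and the
Cortex-A7/A15 part numbers are the documented ones.

```c
#include <stdint.h>

/* MIDR fields: implementer in bits [31:24], primary part number in [15:4]. */
#define MIDR_IMPLEMENTER(midr) (((midr) >> 24) & 0xffu)
#define MIDR_PARTNUM(midr)     (((midr) >> 4) & 0xfffu)

#define ARM_IMPLEMENTER_ARM 0x41u   /* ASCII 'A', ARM Ltd. */
#define ARM_PART_CORTEX_A7  0xc07u
#define ARM_PART_CORTEX_A15 0xc0fu

/* Hypothetical result type for a __builtin_cpu_is()-style query. */
enum cpu_kind { CPU_UNKNOWN, CPU_CORTEX_A7, CPU_CORTEX_A15 };

/*
 * Classify a MIDR value. A value of 0 is treated as "helper not
 * available" (assumption: an older kernel would report 0), in which
 * case the caller should fall back to its generic code path.
 */
static enum cpu_kind classify_midr(uint32_t midr)
{
    if (midr == 0 || MIDR_IMPLEMENTER(midr) != ARM_IMPLEMENTER_ARM)
        return CPU_UNKNOWN;

    switch (MIDR_PARTNUM(midr)) {
    case ARM_PART_CORTEX_A7:
        return CPU_CORTEX_A7;
    case ARM_PART_CORTEX_A15:
        return CPU_CORTEX_A15;
    default:
        return CPU_UNKNOWN;
    }
}
```

On a big.LITTLE system the answer may differ from one call to the next
as the thread migrates, so a caller would have to treat the result as a
hint rather than a guarantee.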
> However, that's an optimisation and I don't think it should replace
> hwcap bits for new features.
Yes, it's understandable that the hwcap bits are still in use. And
they are going to be in use in the foreseeable future (maybe even
forever?).
--
Best regards,
Siarhei Siamashka
* [PATCH 0/4] arm64: advertise availability of CRC and crypto instructions
2013-12-20 6:29 ` Siarhei Siamashka
@ 2013-12-20 11:27 ` Catalin Marinas
0 siblings, 0 replies; 37+ messages in thread
From: Catalin Marinas @ 2013-12-20 11:27 UTC (permalink / raw)
To: linux-arm-kernel
On Fri, Dec 20, 2013 at 06:29:26AM +0000, Siarhei Siamashka wrote:
> On Thu, 19 Dec 2013 11:48:16 +0000
> Catalin Marinas <catalin.marinas@arm.com> wrote:
> > On Thu, Dec 19, 2013 at 06:48:16AM +0000, Siarhei Siamashka wrote:
> > > On Wed, 18 Dec 2013 22:57:33 +0100
> > > Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> > > > On 18 December 2013 22:18, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
> > > > > On Wed, 18 Dec 2013, Ard Biesheuvel wrote:
> > > > >> The nice thing about hwcaps is that it is already integrated into the
> > > > >> ifunc resolution done by the loader, which makes it very easy and
> > > > >> straightforward to offer alternative implementations of library
> > > > >> functions based on CPU capabilities.
> > > > >
> > > > > The library may as well implement its own ifunc that tests the
> > > > > instruction while trapping SIGILL. On those systems with the supported
> > > > > instruction there will be no trap. On those that traps then the
> > > > > alternative implementation is going to be much slower anyway.
> > > >
> > > > True. And the trap still only occurs at load time. But I think we
> > > > agree it is essentially a poor man's hwcaps.
> > >
> > > And the hwcaps is essentially a poor man's replacement for a userspace
> > > accessible CPUID instruction enjoyed by x86.
> >
> > hwcaps has its value but I agree that some quicker access would be good
> > in certain cases. However, simply exposing the CPUID scheme to user
> > space may look nice initially but has other problems. All the
> > discussions we had (in ARM) basically ended up with having some scratch
> > registers that could be accessed from user via mrs and the kernel would
> > either copy the CPUID registers or hwcap-like bits (but basically it is
> > just an ABI between user and kernel).
>
> Sorry, I don't seem to follow what exactly was wrong with this approach.
> It looks like a good idea to me. Was it abandoned?
It isn't present in ARMv8/AArch64. My point was that it pretty much
turns into a software-only ABI with another set of registers similar to
the thread ones. That's where you need to balance between more hardware
registers and a software VDSO-like mechanism.
> > Anyway, for such A7/A15 combinations, the idea is to optimise for A7's
> > pipeline since A15 execution is more out of order and tolerant to
> > instruction order.
>
> So all the software is supposed to be optimized just for A7 in the
> A7/A15 big.LITTLE combinations?
>
> Let's take some video codec as an example. If somebody starts
> multi-threaded transcoding of some video hogging all CPU cores, then
> the execution is going to be migrated to A15, right? In this case it
> makes sense to have this codec optimized for A15.
As I said above, the A15 is more tolerant to pipeline optimisations, so
you may not see a significant difference if you optimise for A7 or A15.
But I haven't done any benchmarks, that's what the toolchain guys say.
> > > The best solution would be in my opinion a userspace accessible (and
> > > guaranteed not to trap) CPUID instruction. This has proven to work
> > > nicely for x86, so why inventing something overly complicated instead?
> > > In the case if the OS wants to conceal the CPU features from the
> > > userspace application, some special "I don't want to tell you,
> > > please use the slowest code path possible" value could be defined
> > > and returned by this instruction.
> >
> > As I said above, just raw access to the CPUID registers may not always
> > be desirable. Some features require kernel support (like FP register
> > saving/restoring), so if you run an older kernel on a newer CPU you
> > shouldn't really use such feature.
> >
> > (I'm also not entirely sure about crypto stuff and export regulations,
> > whether a mobile vendor may want to disable some hwcap bits in kernel
> > even though the hardware supports it)
>
> AFAIK the new registers saving/restoring is somehow handled in the x86
> world?
ARM is not x86.
A past example is VFP with 16 double registers and we later got Neon
with 32. The kernel needs updating to save/restore the extra registers.
> > > Well, if it's not desired (and already too late) to change how the
> > > hardware works, another solution would be to have runtime CPU
> > > features detection supported as part of the run-time ABI. For example,
> > > make it mandatory for any EABI conforming system to provide some helper
> > > functions like __aeabi_read_midr() or __aeabi_read_hwcaps(). They could
> > > be implemented for ARM Linux via the kernel-provided user helpers, VDSO
> > > or whatever other method that is appropriate. If this works for the
> > > things like TLS (__aeabi_read_tp), why can't it work for runtime CPU
> > > features detection too? The recent gcc versions also have some nice
> > > built-in functions for runtime cpu features detection on x86
> > > such as __builtin_cpu_is(), __builtin_cpu_supports():
> > > http://gcc.gnu.org/gcc-4.8/changes.html
> >
> > We discussed this in ARM with the toolchain guys and I'm fine with the
> > idea. But for backwards compatibility, we would need a way for newer
> > software to work on older kernels. On arm64, with VDSO is easier since
> > glibc could have a weak function that returns not-implemented. I would
> > rather have a VDSO on arm as well rather than abusing the vectors page.
> >
> > If you want to distinguish between CPUs, we can use one of the unused
> > TLS registers as offset into a VDSO data array with per-CPU information
> > (all handled via the VDSO code, so user shouldn't really know the
> > meaning). We have a user read-only thread register unused on arm64 (and
> > that's what we had in mind when using the read/write register for user
> > TLS).
>
> Sounds good. And I like that this proposal has not been immediately
> dismissed yet. Would somebody from ARM or Linaro be willing to invest
> some time into trying to develop some prototype patches (for AArch64)?
I think the kernel patches part is not hard, it's more like talking to
the toolchain/library guys and agreeing on the actual ABI, how much
information we want to expose.
AFAIK so far the decision on which library to use is made at dynamic
link time based on the hwcap bits. If we make this some __builtin_*
in gcc, I think it cannot be overridden dynamically via VDSO. So better
get some toolchain guys involved first.
(and yes, it could be a nice Linaro project ;))
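[Editorial sketch, not part of the original message.] For comparison,
hwcap-based selection from user space might look like the code below.
It is a minimal sketch under stated assumptions: EXAMPLE_HWCAP_CRC32
mirrors the bit that mainline arm64 ended up using for HWCAP_CRC32 and
real code should take the constant from <asm/hwcap.h>; the
"accelerated" function pointer is a hypothetical stand-in for a routine
built on the CRC instructions. getauxval() and AT_HWCAP are the
existing glibc interface for reading the hwcap word.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__linux__)
#include <sys/auxv.h>   /* getauxval(), AT_HWCAP */
#endif

/* Assumption: bit position of HWCAP_CRC32 on arm64; prefer <asm/hwcap.h>. */
#define EXAMPLE_HWCAP_CRC32 (1UL << 7)

typedef uint32_t (*crc32c_fn)(uint32_t crc, const unsigned char *buf,
                              size_t len);

/* Portable bitwise CRC32C (reflected Castagnoli polynomial 0x82F63B78). */
static uint32_t crc32c_generic(uint32_t crc, const unsigned char *buf,
                               size_t len)
{
    crc = ~crc;
    while (len--) {
        crc ^= *buf++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Read the hwcap word from the auxiliary vector; 0 if unavailable. */
static unsigned long read_hwcap(void)
{
#if defined(__linux__)
    return getauxval(AT_HWCAP);
#else
    return 0;
#endif
}

/*
 * Pick an implementation once, e.g. from an ifunc resolver or at
 * library init. 'accelerated' may be NULL when no such routine was
 * compiled in; the generic version is then always used.
 */
static crc32c_fn select_crc32c(unsigned long hwcap, crc32c_fn accelerated)
{
    if ((hwcap & EXAMPLE_HWCAP_CRC32) && accelerated != NULL)
        return accelerated;
    return crc32c_generic;
}
```

A caller on arm64 would do something like
select_crc32c(read_hwcap(), my_crc32c_hw), where my_crc32c_hw is its
own (hypothetical) accelerated routine. The bit test is only
meaningful on the architecture whose hwcap layout it assumes.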
> So now I wonder, how difficult would it be to get VDSO working on
> 32-bit arm?
Couple of days I guess ;).
--
Catalin
Thread overview: 37+ messages
2013-12-16 21:04 [PATCH 0/4] arm64: advertise availability of CRC and crypto instructions Ard Biesheuvel
2013-12-16 21:04 ` [PATCH 1/4] arm64: drop redundant macros from read_cpuid() Ard Biesheuvel
2013-12-17 12:04 ` Catalin Marinas
2013-12-17 12:10 ` Will Deacon
2013-12-17 12:12 ` Catalin Marinas
2013-12-16 21:04 ` [PATCH 2/4] arm64: Add hwcaps for crypto and CRC32 extensions Ard Biesheuvel
2013-12-17 12:08 ` Catalin Marinas
2013-12-17 12:11 ` Catalin Marinas
2013-12-16 21:04 ` [PATCH 3/4] ARM: allocate hwcaps bits for v8 crypto extensions Ard Biesheuvel
2013-12-16 21:04 ` [PATCH 4/4] arm64: add 32-bit compat hwcaps " Ard Biesheuvel
2013-12-17 12:25 ` [PATCH 0/4] arm64: advertise availability of CRC and crypto instructions Catalin Marinas
2013-12-18 9:50 ` Ard Biesheuvel
2013-12-18 10:03 ` Russell King - ARM Linux
2013-12-18 10:25 ` Ard Biesheuvel
2013-12-18 10:55 ` Russell King - ARM Linux
2013-12-18 11:15 ` Ard Biesheuvel
2013-12-18 11:27 ` Catalin Marinas
2013-12-18 11:34 ` Catalin Marinas
2013-12-18 11:42 ` Russell King - ARM Linux
2013-12-18 11:59 ` Ard Biesheuvel
2013-12-18 12:03 ` Catalin Marinas
2013-12-18 14:27 ` Christopher Covington
2013-12-18 16:13 ` Ard Biesheuvel
2013-12-18 17:29 ` Catalin Marinas
2013-12-18 18:50 ` Ard Biesheuvel
2013-12-19 11:11 ` Catalin Marinas
2013-12-18 19:57 ` Nicolas Pitre
2013-12-18 20:26 ` Ard Biesheuvel
2013-12-18 21:18 ` Nicolas Pitre
2013-12-18 21:57 ` Ard Biesheuvel
2013-12-19 6:48 ` Siarhei Siamashka
2013-12-19 11:48 ` Catalin Marinas
2013-12-20 6:29 ` Siarhei Siamashka
2013-12-20 11:27 ` Catalin Marinas
2013-12-19 17:33 ` Ard Biesheuvel
2013-12-20 1:35 ` Siarhei Siamashka
2013-12-19 18:07 ` Nicolas Pitre