* [PATCH 00/10] crypto: x86 - avoid absolute references
From: Ard Biesheuvel @ 2023-04-08 15:27 UTC (permalink / raw)
To: linux-crypto; +Cc: Ard Biesheuvel, Herbert Xu, Eric Biggers, Kees Cook
This is preparatory work for allowing the kernel to be built as a PIE
executable. PIE executables rely mostly on RIP-relative symbol references
from code; such references do not need to be updated when the binary is
loaded at an address that differs from its link-time address.
Most changes are quite straightforward: in many cases, adding a (%rip)
suffix is enough. Some are slightly trickier, however, and need minor
reshuffling of the asm code to get rid of the absolute references.
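As a rough illustration of both cases (the labels and registers below are
placeholders, not code taken from any of the patched files): a plain memory
operand only needs the (%rip) suffix, whereas an operand that also uses an
index register has to be split in two, because RIP-relative addressing cannot
be combined with an index register.

	/* trivial case: add the (%rip) suffix */
	movdqa	.Lconst, %xmm0			/* absolute, needs a relocation fixup */
	movdqa	.Lconst(%rip), %xmm0		/* RIP-relative, no fixup needed */

	/* trickier case: table lookup using an index register */
	movl	table(, %rcx, 4), %eax		/* absolute table base plus index */
	leaq	table(%rip), %rdx		/* materialize the base RIP-relatively */
	movl	(%rdx, %rcx, 4), %eax		/* then index off the scratch register */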
Tested with CONFIG_CRYPTO_MANAGER_EXTRA_TESTS=y on an x86 CPU that
implements AVX, AVX2 and AVX512.
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Ard Biesheuvel (10):
crypto: x86/aegis128 - Use RIP-relative addressing
crypto: x86/aesni - Use RIP-relative addressing
crypto: x86/aria - Use RIP-relative addressing
crypto: x86/camellia - Use RIP-relative addressing
crypto: x86/cast5 - Use RIP-relative addressing
crypto: x86/cast6 - Use RIP-relative addressing
crypto: x86/crc32c - Use RIP-relative addressing
crypto: x86/des3 - Use RIP-relative addressing
crypto: x86/ghash - Use RIP-relative addressing
crypto: x86/sha256 - Use RIP-relative addressing
arch/x86/crypto/aegis128-aesni-asm.S | 6 +-
arch/x86/crypto/aesni-intel_asm.S | 2 +-
arch/x86/crypto/aesni-intel_avx-x86_64.S | 6 +-
arch/x86/crypto/aria-aesni-avx-asm_64.S | 28 +++---
arch/x86/crypto/aria-aesni-avx2-asm_64.S | 28 +++---
arch/x86/crypto/aria-gfni-avx512-asm_64.S | 24 ++---
arch/x86/crypto/camellia-aesni-avx-asm_64.S | 30 +++---
arch/x86/crypto/camellia-aesni-avx2-asm_64.S | 30 +++---
arch/x86/crypto/camellia-x86_64-asm_64.S | 8 +-
arch/x86/crypto/cast5-avx-x86_64-asm_64.S | 50 +++++-----
arch/x86/crypto/cast6-avx-x86_64-asm_64.S | 44 +++++----
arch/x86/crypto/crc32c-pcl-intel-asm_64.S | 3 +-
arch/x86/crypto/des3_ede-asm_64.S | 96 +++++++++++++-------
arch/x86/crypto/ghash-clmulni-intel_asm.S | 4 +-
arch/x86/crypto/sha256-avx2-asm.S | 18 ++--
15 files changed, 213 insertions(+), 164 deletions(-)
--
2.39.2
* [PATCH 01/10] crypto: x86/aegis128 - Use RIP-relative addressing
From: Ard Biesheuvel @ 2023-04-08 15:27 UTC (permalink / raw)
To: linux-crypto; +Cc: Ard Biesheuvel, Herbert Xu, Eric Biggers, Kees Cook
Prefer RIP-relative addressing where possible, which removes the need
for boot-time relocation fixups.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/x86/crypto/aegis128-aesni-asm.S | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/arch/x86/crypto/aegis128-aesni-asm.S b/arch/x86/crypto/aegis128-aesni-asm.S
index cdf3215ec272ced2..ad7f4c89162568b0 100644
--- a/arch/x86/crypto/aegis128-aesni-asm.S
+++ b/arch/x86/crypto/aegis128-aesni-asm.S
@@ -201,8 +201,8 @@ SYM_FUNC_START(crypto_aegis128_aesni_init)
movdqa KEY, STATE4
/* load the constants: */
- movdqa .Laegis128_const_0, STATE2
- movdqa .Laegis128_const_1, STATE1
+ movdqa .Laegis128_const_0(%rip), STATE2
+ movdqa .Laegis128_const_1(%rip), STATE1
pxor STATE2, STATE3
pxor STATE1, STATE4
@@ -682,7 +682,7 @@ SYM_TYPED_FUNC_START(crypto_aegis128_aesni_dec_tail)
punpcklbw T0, T0
punpcklbw T0, T0
punpcklbw T0, T0
- movdqa .Laegis128_counter, T1
+ movdqa .Laegis128_counter(%rip), T1
pcmpgtb T1, T0
pand T0, MSG
--
2.39.2
* [PATCH 02/10] crypto: x86/aesni - Use RIP-relative addressing
From: Ard Biesheuvel @ 2023-04-08 15:27 UTC (permalink / raw)
To: linux-crypto; +Cc: Ard Biesheuvel, Herbert Xu, Eric Biggers, Kees Cook
Prefer RIP-relative addressing where possible, which removes the need
for boot-time relocation fixups.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/x86/crypto/aesni-intel_asm.S | 2 +-
arch/x86/crypto/aesni-intel_avx-x86_64.S | 6 +++---
2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/arch/x86/crypto/aesni-intel_asm.S b/arch/x86/crypto/aesni-intel_asm.S
index 837c1e0aa0217783..ca99a2274d551015 100644
--- a/arch/x86/crypto/aesni-intel_asm.S
+++ b/arch/x86/crypto/aesni-intel_asm.S
@@ -2717,7 +2717,7 @@ SYM_FUNC_END(aesni_cts_cbc_dec)
* BSWAP_MASK == endian swapping mask
*/
SYM_FUNC_START_LOCAL(_aesni_inc_init)
- movaps .Lbswap_mask, BSWAP_MASK
+ movaps .Lbswap_mask(%rip), BSWAP_MASK
movaps IV, CTR
pshufb BSWAP_MASK, CTR
mov $1, TCTR_LOW
diff --git a/arch/x86/crypto/aesni-intel_avx-x86_64.S b/arch/x86/crypto/aesni-intel_avx-x86_64.S
index 0852ab573fd306ac..cb6acca1550b78a6 100644
--- a/arch/x86/crypto/aesni-intel_avx-x86_64.S
+++ b/arch/x86/crypto/aesni-intel_avx-x86_64.S
@@ -647,9 +647,9 @@ _get_AAD_rest0\@:
/* finalize: shift out the extra bytes we read, and align
left. since pslldq can only shift by an immediate, we use
vpshufb and an array of shuffle masks */
- movq %r12, %r11
- salq $4, %r11
- vmovdqu aad_shift_arr(%r11), \T1
+ leaq aad_shift_arr(%rip), %r11
+ leaq (%r11, %r12, 8), %r11
+ vmovdqu (%r11, %r12, 8), \T1
vpshufb \T1, \T7, \T7
_get_AAD_rest_final\@:
vpshufb SHUF_MASK(%rip), \T7, \T7
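A note on the aad_shift_arr hunk above: the old code stepped through the
table in 16-byte units via salq $4, but an x86 SIB scale factor tops out at 8.
The replacement therefore applies the *8 scale twice, once in the leaq and
once in the vmovdqu, which together reproduce the original *16 offset.
Schematically (with a plain register standing in for the \T1 macro argument):

	/* old: load from aad_shift_arr + %r12 * 16 */
	movq	%r12, %r11
	salq	$4, %r11
	vmovdqu	aad_shift_arr(%r11), %xmm0

	/* new: %r11 = aad_shift_arr + %r12 * 8, then load from %r11 + %r12 * 8 */
	leaq	aad_shift_arr(%rip), %r11
	leaq	(%r11, %r12, 8), %r11
	vmovdqu	(%r11, %r12, 8), %xmm0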
--
2.39.2
* [PATCH 03/10] crypto: x86/aria - Use RIP-relative addressing
From: Ard Biesheuvel @ 2023-04-08 15:27 UTC (permalink / raw)
To: linux-crypto; +Cc: Ard Biesheuvel, Herbert Xu, Eric Biggers, Kees Cook
Prefer RIP-relative addressing where possible, which removes the need
for boot-time relocation fixups.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/x86/crypto/aria-aesni-avx-asm_64.S | 28 ++++++++++----------
arch/x86/crypto/aria-aesni-avx2-asm_64.S | 28 ++++++++++----------
arch/x86/crypto/aria-gfni-avx512-asm_64.S | 24 ++++++++---------
3 files changed, 40 insertions(+), 40 deletions(-)
diff --git a/arch/x86/crypto/aria-aesni-avx-asm_64.S b/arch/x86/crypto/aria-aesni-avx-asm_64.S
index 9243f6289d34bfbf..7c1abc513f34621e 100644
--- a/arch/x86/crypto/aria-aesni-avx-asm_64.S
+++ b/arch/x86/crypto/aria-aesni-avx-asm_64.S
@@ -80,7 +80,7 @@
transpose_4x4(c0, c1, c2, c3, a0, a1); \
transpose_4x4(d0, d1, d2, d3, a0, a1); \
\
- vmovdqu .Lshufb_16x16b, a0; \
+ vmovdqu .Lshufb_16x16b(%rip), a0; \
vmovdqu st1, a1; \
vpshufb a0, a2, a2; \
vpshufb a0, a3, a3; \
@@ -132,7 +132,7 @@
transpose_4x4(c0, c1, c2, c3, a0, a1); \
transpose_4x4(d0, d1, d2, d3, a0, a1); \
\
- vmovdqu .Lshufb_16x16b, a0; \
+ vmovdqu .Lshufb_16x16b(%rip), a0; \
vmovdqu st1, a1; \
vpshufb a0, a2, a2; \
vpshufb a0, a3, a3; \
@@ -300,11 +300,11 @@
x4, x5, x6, x7, \
t0, t1, t2, t3, \
t4, t5, t6, t7) \
- vmovdqa .Ltf_s2_bitmatrix, t0; \
- vmovdqa .Ltf_inv_bitmatrix, t1; \
- vmovdqa .Ltf_id_bitmatrix, t2; \
- vmovdqa .Ltf_aff_bitmatrix, t3; \
- vmovdqa .Ltf_x2_bitmatrix, t4; \
+ vmovdqa .Ltf_s2_bitmatrix(%rip), t0; \
+ vmovdqa .Ltf_inv_bitmatrix(%rip), t1; \
+ vmovdqa .Ltf_id_bitmatrix(%rip), t2; \
+ vmovdqa .Ltf_aff_bitmatrix(%rip), t3; \
+ vmovdqa .Ltf_x2_bitmatrix(%rip), t4; \
vgf2p8affineinvqb $(tf_s2_const), t0, x1, x1; \
vgf2p8affineinvqb $(tf_s2_const), t0, x5, x5; \
vgf2p8affineqb $(tf_inv_const), t1, x2, x2; \
@@ -324,13 +324,13 @@
x4, x5, x6, x7, \
t0, t1, t2, t3, \
t4, t5, t6, t7) \
- vmovdqa .Linv_shift_row, t0; \
- vmovdqa .Lshift_row, t1; \
- vbroadcastss .L0f0f0f0f, t6; \
- vmovdqa .Ltf_lo__inv_aff__and__s2, t2; \
- vmovdqa .Ltf_hi__inv_aff__and__s2, t3; \
- vmovdqa .Ltf_lo__x2__and__fwd_aff, t4; \
- vmovdqa .Ltf_hi__x2__and__fwd_aff, t5; \
+ vmovdqa .Linv_shift_row(%rip), t0; \
+ vmovdqa .Lshift_row(%rip), t1; \
+ vbroadcastss .L0f0f0f0f(%rip), t6; \
+ vmovdqa .Ltf_lo__inv_aff__and__s2(%rip), t2; \
+ vmovdqa .Ltf_hi__inv_aff__and__s2(%rip), t3; \
+ vmovdqa .Ltf_lo__x2__and__fwd_aff(%rip), t4; \
+ vmovdqa .Ltf_hi__x2__and__fwd_aff(%rip), t5; \
\
vaesenclast t7, x0, x0; \
vaesenclast t7, x4, x4; \
diff --git a/arch/x86/crypto/aria-aesni-avx2-asm_64.S b/arch/x86/crypto/aria-aesni-avx2-asm_64.S
index 82a14b4ad920f792..c60fa2980630379b 100644
--- a/arch/x86/crypto/aria-aesni-avx2-asm_64.S
+++ b/arch/x86/crypto/aria-aesni-avx2-asm_64.S
@@ -96,7 +96,7 @@
transpose_4x4(c0, c1, c2, c3, a0, a1); \
transpose_4x4(d0, d1, d2, d3, a0, a1); \
\
- vbroadcasti128 .Lshufb_16x16b, a0; \
+ vbroadcasti128 .Lshufb_16x16b(%rip), a0; \
vmovdqu st1, a1; \
vpshufb a0, a2, a2; \
vpshufb a0, a3, a3; \
@@ -148,7 +148,7 @@
transpose_4x4(c0, c1, c2, c3, a0, a1); \
transpose_4x4(d0, d1, d2, d3, a0, a1); \
\
- vbroadcasti128 .Lshufb_16x16b, a0; \
+ vbroadcasti128 .Lshufb_16x16b(%rip), a0; \
vmovdqu st1, a1; \
vpshufb a0, a2, a2; \
vpshufb a0, a3, a3; \
@@ -307,11 +307,11 @@
x4, x5, x6, x7, \
t0, t1, t2, t3, \
t4, t5, t6, t7) \
- vpbroadcastq .Ltf_s2_bitmatrix, t0; \
- vpbroadcastq .Ltf_inv_bitmatrix, t1; \
- vpbroadcastq .Ltf_id_bitmatrix, t2; \
- vpbroadcastq .Ltf_aff_bitmatrix, t3; \
- vpbroadcastq .Ltf_x2_bitmatrix, t4; \
+ vpbroadcastq .Ltf_s2_bitmatrix(%rip), t0; \
+ vpbroadcastq .Ltf_inv_bitmatrix(%rip), t1; \
+ vpbroadcastq .Ltf_id_bitmatrix(%rip), t2; \
+ vpbroadcastq .Ltf_aff_bitmatrix(%rip), t3; \
+ vpbroadcastq .Ltf_x2_bitmatrix(%rip), t4; \
vgf2p8affineinvqb $(tf_s2_const), t0, x1, x1; \
vgf2p8affineinvqb $(tf_s2_const), t0, x5, x5; \
vgf2p8affineqb $(tf_inv_const), t1, x2, x2; \
@@ -332,12 +332,12 @@
t4, t5, t6, t7) \
vpxor t7, t7, t7; \
vpxor t6, t6, t6; \
- vbroadcasti128 .Linv_shift_row, t0; \
- vbroadcasti128 .Lshift_row, t1; \
- vbroadcasti128 .Ltf_lo__inv_aff__and__s2, t2; \
- vbroadcasti128 .Ltf_hi__inv_aff__and__s2, t3; \
- vbroadcasti128 .Ltf_lo__x2__and__fwd_aff, t4; \
- vbroadcasti128 .Ltf_hi__x2__and__fwd_aff, t5; \
+ vbroadcasti128 .Linv_shift_row(%rip), t0; \
+ vbroadcasti128 .Lshift_row(%rip), t1; \
+ vbroadcasti128 .Ltf_lo__inv_aff__and__s2(%rip), t2; \
+ vbroadcasti128 .Ltf_hi__inv_aff__and__s2(%rip), t3; \
+ vbroadcasti128 .Ltf_lo__x2__and__fwd_aff(%rip), t4; \
+ vbroadcasti128 .Ltf_hi__x2__and__fwd_aff(%rip), t5; \
\
vextracti128 $1, x0, t6##_x; \
vaesenclast t7##_x, x0##_x, x0##_x; \
@@ -369,7 +369,7 @@
vaesdeclast t7##_x, t6##_x, t6##_x; \
vinserti128 $1, t6##_x, x6, x6; \
\
- vpbroadcastd .L0f0f0f0f, t6; \
+ vpbroadcastd .L0f0f0f0f(%rip), t6; \
\
/* AES inverse shift rows */ \
vpshufb t0, x0, x0; \
diff --git a/arch/x86/crypto/aria-gfni-avx512-asm_64.S b/arch/x86/crypto/aria-gfni-avx512-asm_64.S
index 3193f07014506655..860887e5d02ed6ef 100644
--- a/arch/x86/crypto/aria-gfni-avx512-asm_64.S
+++ b/arch/x86/crypto/aria-gfni-avx512-asm_64.S
@@ -80,7 +80,7 @@
transpose_4x4(c0, c1, c2, c3, a0, a1); \
transpose_4x4(d0, d1, d2, d3, a0, a1); \
\
- vbroadcasti64x2 .Lshufb_16x16b, a0; \
+ vbroadcasti64x2 .Lshufb_16x16b(%rip), a0; \
vmovdqu64 st1, a1; \
vpshufb a0, a2, a2; \
vpshufb a0, a3, a3; \
@@ -132,7 +132,7 @@
transpose_4x4(c0, c1, c2, c3, a0, a1); \
transpose_4x4(d0, d1, d2, d3, a0, a1); \
\
- vbroadcasti64x2 .Lshufb_16x16b, a0; \
+ vbroadcasti64x2 .Lshufb_16x16b(%rip), a0; \
vmovdqu64 st1, a1; \
vpshufb a0, a2, a2; \
vpshufb a0, a3, a3; \
@@ -308,11 +308,11 @@
x4, x5, x6, x7, \
t0, t1, t2, t3, \
t4, t5, t6, t7) \
- vpbroadcastq .Ltf_s2_bitmatrix, t0; \
- vpbroadcastq .Ltf_inv_bitmatrix, t1; \
- vpbroadcastq .Ltf_id_bitmatrix, t2; \
- vpbroadcastq .Ltf_aff_bitmatrix, t3; \
- vpbroadcastq .Ltf_x2_bitmatrix, t4; \
+ vpbroadcastq .Ltf_s2_bitmatrix(%rip), t0; \
+ vpbroadcastq .Ltf_inv_bitmatrix(%rip), t1; \
+ vpbroadcastq .Ltf_id_bitmatrix(%rip), t2; \
+ vpbroadcastq .Ltf_aff_bitmatrix(%rip), t3; \
+ vpbroadcastq .Ltf_x2_bitmatrix(%rip), t4; \
vgf2p8affineinvqb $(tf_s2_const), t0, x1, x1; \
vgf2p8affineinvqb $(tf_s2_const), t0, x5, x5; \
vgf2p8affineqb $(tf_inv_const), t1, x2, x2; \
@@ -332,11 +332,11 @@
y4, y5, y6, y7, \
t0, t1, t2, t3, \
t4, t5, t6, t7) \
- vpbroadcastq .Ltf_s2_bitmatrix, t0; \
- vpbroadcastq .Ltf_inv_bitmatrix, t1; \
- vpbroadcastq .Ltf_id_bitmatrix, t2; \
- vpbroadcastq .Ltf_aff_bitmatrix, t3; \
- vpbroadcastq .Ltf_x2_bitmatrix, t4; \
+ vpbroadcastq .Ltf_s2_bitmatrix(%rip), t0; \
+ vpbroadcastq .Ltf_inv_bitmatrix(%rip), t1; \
+ vpbroadcastq .Ltf_id_bitmatrix(%rip), t2; \
+ vpbroadcastq .Ltf_aff_bitmatrix(%rip), t3; \
+ vpbroadcastq .Ltf_x2_bitmatrix(%rip), t4; \
vgf2p8affineinvqb $(tf_s2_const), t0, x1, x1; \
vgf2p8affineinvqb $(tf_s2_const), t0, x5, x5; \
vgf2p8affineqb $(tf_inv_const), t1, x2, x2; \
--
2.39.2
* [PATCH 04/10] crypto: x86/camellia - Use RIP-relative addressing
From: Ard Biesheuvel @ 2023-04-08 15:27 UTC (permalink / raw)
To: linux-crypto; +Cc: Ard Biesheuvel, Herbert Xu, Eric Biggers, Kees Cook
Prefer RIP-relative addressing where possible, which removes the need
for boot-time relocation fixups.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/x86/crypto/camellia-aesni-avx-asm_64.S | 30 ++++++++++----------
arch/x86/crypto/camellia-aesni-avx2-asm_64.S | 30 ++++++++++----------
arch/x86/crypto/camellia-x86_64-asm_64.S | 8 ++++--
3 files changed, 35 insertions(+), 33 deletions(-)
diff --git a/arch/x86/crypto/camellia-aesni-avx-asm_64.S b/arch/x86/crypto/camellia-aesni-avx-asm_64.S
index 4a30618281ec2e9e..646477a13e110fed 100644
--- a/arch/x86/crypto/camellia-aesni-avx-asm_64.S
+++ b/arch/x86/crypto/camellia-aesni-avx-asm_64.S
@@ -52,10 +52,10 @@
/* \
* S-function with AES subbytes \
*/ \
- vmovdqa .Linv_shift_row, t4; \
- vbroadcastss .L0f0f0f0f, t7; \
- vmovdqa .Lpre_tf_lo_s1, t0; \
- vmovdqa .Lpre_tf_hi_s1, t1; \
+ vmovdqa .Linv_shift_row(%rip), t4; \
+ vbroadcastss .L0f0f0f0f(%rip), t7; \
+ vmovdqa .Lpre_tf_lo_s1(%rip), t0; \
+ vmovdqa .Lpre_tf_hi_s1(%rip), t1; \
\
/* AES inverse shift rows */ \
vpshufb t4, x0, x0; \
@@ -68,8 +68,8 @@
vpshufb t4, x6, x6; \
\
/* prefilter sboxes 1, 2 and 3 */ \
- vmovdqa .Lpre_tf_lo_s4, t2; \
- vmovdqa .Lpre_tf_hi_s4, t3; \
+ vmovdqa .Lpre_tf_lo_s4(%rip), t2; \
+ vmovdqa .Lpre_tf_hi_s4(%rip), t3; \
filter_8bit(x0, t0, t1, t7, t6); \
filter_8bit(x7, t0, t1, t7, t6); \
filter_8bit(x1, t0, t1, t7, t6); \
@@ -83,8 +83,8 @@
filter_8bit(x6, t2, t3, t7, t6); \
\
/* AES subbytes + AES shift rows */ \
- vmovdqa .Lpost_tf_lo_s1, t0; \
- vmovdqa .Lpost_tf_hi_s1, t1; \
+ vmovdqa .Lpost_tf_lo_s1(%rip), t0; \
+ vmovdqa .Lpost_tf_hi_s1(%rip), t1; \
vaesenclast t4, x0, x0; \
vaesenclast t4, x7, x7; \
vaesenclast t4, x1, x1; \
@@ -95,16 +95,16 @@
vaesenclast t4, x6, x6; \
\
/* postfilter sboxes 1 and 4 */ \
- vmovdqa .Lpost_tf_lo_s3, t2; \
- vmovdqa .Lpost_tf_hi_s3, t3; \
+ vmovdqa .Lpost_tf_lo_s3(%rip), t2; \
+ vmovdqa .Lpost_tf_hi_s3(%rip), t3; \
filter_8bit(x0, t0, t1, t7, t6); \
filter_8bit(x7, t0, t1, t7, t6); \
filter_8bit(x3, t0, t1, t7, t6); \
filter_8bit(x6, t0, t1, t7, t6); \
\
/* postfilter sbox 3 */ \
- vmovdqa .Lpost_tf_lo_s2, t4; \
- vmovdqa .Lpost_tf_hi_s2, t5; \
+ vmovdqa .Lpost_tf_lo_s2(%rip), t4; \
+ vmovdqa .Lpost_tf_hi_s2(%rip), t5; \
filter_8bit(x2, t2, t3, t7, t6); \
filter_8bit(x5, t2, t3, t7, t6); \
\
@@ -443,7 +443,7 @@ SYM_FUNC_END(roundsm16_x4_x5_x6_x7_x0_x1_x2_x3_y4_y5_y6_y7_y0_y1_y2_y3_ab)
transpose_4x4(c0, c1, c2, c3, a0, a1); \
transpose_4x4(d0, d1, d2, d3, a0, a1); \
\
- vmovdqu .Lshufb_16x16b, a0; \
+ vmovdqu .Lshufb_16x16b(%rip), a0; \
vmovdqu st1, a1; \
vpshufb a0, a2, a2; \
vpshufb a0, a3, a3; \
@@ -482,7 +482,7 @@ SYM_FUNC_END(roundsm16_x4_x5_x6_x7_x0_x1_x2_x3_y4_y5_y6_y7_y0_y1_y2_y3_ab)
#define inpack16_pre(x0, x1, x2, x3, x4, x5, x6, x7, y0, y1, y2, y3, y4, y5, \
y6, y7, rio, key) \
vmovq key, x0; \
- vpshufb .Lpack_bswap, x0, x0; \
+ vpshufb .Lpack_bswap(%rip), x0, x0; \
\
vpxor 0 * 16(rio), x0, y7; \
vpxor 1 * 16(rio), x0, y6; \
@@ -533,7 +533,7 @@ SYM_FUNC_END(roundsm16_x4_x5_x6_x7_x0_x1_x2_x3_y4_y5_y6_y7_y0_y1_y2_y3_ab)
vmovdqu x0, stack_tmp0; \
\
vmovq key, x0; \
- vpshufb .Lpack_bswap, x0, x0; \
+ vpshufb .Lpack_bswap(%rip), x0, x0; \
\
vpxor x0, y7, y7; \
vpxor x0, y6, y6; \
diff --git a/arch/x86/crypto/camellia-aesni-avx2-asm_64.S b/arch/x86/crypto/camellia-aesni-avx2-asm_64.S
index deaf62aa73a6b09c..a0eb94e53b1bb12d 100644
--- a/arch/x86/crypto/camellia-aesni-avx2-asm_64.S
+++ b/arch/x86/crypto/camellia-aesni-avx2-asm_64.S
@@ -64,12 +64,12 @@
/* \
* S-function with AES subbytes \
*/ \
- vbroadcasti128 .Linv_shift_row, t4; \
- vpbroadcastd .L0f0f0f0f, t7; \
- vbroadcasti128 .Lpre_tf_lo_s1, t5; \
- vbroadcasti128 .Lpre_tf_hi_s1, t6; \
- vbroadcasti128 .Lpre_tf_lo_s4, t2; \
- vbroadcasti128 .Lpre_tf_hi_s4, t3; \
+ vbroadcasti128 .Linv_shift_row(%rip), t4; \
+ vpbroadcastd .L0f0f0f0f(%rip), t7; \
+ vbroadcasti128 .Lpre_tf_lo_s1(%rip), t5; \
+ vbroadcasti128 .Lpre_tf_hi_s1(%rip), t6; \
+ vbroadcasti128 .Lpre_tf_lo_s4(%rip), t2; \
+ vbroadcasti128 .Lpre_tf_hi_s4(%rip), t3; \
\
/* AES inverse shift rows */ \
vpshufb t4, x0, x0; \
@@ -115,8 +115,8 @@
vinserti128 $1, t2##_x, x6, x6; \
vextracti128 $1, x1, t3##_x; \
vextracti128 $1, x4, t2##_x; \
- vbroadcasti128 .Lpost_tf_lo_s1, t0; \
- vbroadcasti128 .Lpost_tf_hi_s1, t1; \
+ vbroadcasti128 .Lpost_tf_lo_s1(%rip), t0; \
+ vbroadcasti128 .Lpost_tf_hi_s1(%rip), t1; \
vaesenclast t4##_x, x2##_x, x2##_x; \
vaesenclast t4##_x, t6##_x, t6##_x; \
vinserti128 $1, t6##_x, x2, x2; \
@@ -131,16 +131,16 @@
vinserti128 $1, t2##_x, x4, x4; \
\
/* postfilter sboxes 1 and 4 */ \
- vbroadcasti128 .Lpost_tf_lo_s3, t2; \
- vbroadcasti128 .Lpost_tf_hi_s3, t3; \
+ vbroadcasti128 .Lpost_tf_lo_s3(%rip), t2; \
+ vbroadcasti128 .Lpost_tf_hi_s3(%rip), t3; \
filter_8bit(x0, t0, t1, t7, t6); \
filter_8bit(x7, t0, t1, t7, t6); \
filter_8bit(x3, t0, t1, t7, t6); \
filter_8bit(x6, t0, t1, t7, t6); \
\
/* postfilter sbox 3 */ \
- vbroadcasti128 .Lpost_tf_lo_s2, t4; \
- vbroadcasti128 .Lpost_tf_hi_s2, t5; \
+ vbroadcasti128 .Lpost_tf_lo_s2(%rip), t4; \
+ vbroadcasti128 .Lpost_tf_hi_s2(%rip), t5; \
filter_8bit(x2, t2, t3, t7, t6); \
filter_8bit(x5, t2, t3, t7, t6); \
\
@@ -475,7 +475,7 @@ SYM_FUNC_END(roundsm32_x4_x5_x6_x7_x0_x1_x2_x3_y4_y5_y6_y7_y0_y1_y2_y3_ab)
transpose_4x4(c0, c1, c2, c3, a0, a1); \
transpose_4x4(d0, d1, d2, d3, a0, a1); \
\
- vbroadcasti128 .Lshufb_16x16b, a0; \
+ vbroadcasti128 .Lshufb_16x16b(%rip), a0; \
vmovdqu st1, a1; \
vpshufb a0, a2, a2; \
vpshufb a0, a3, a3; \
@@ -514,7 +514,7 @@ SYM_FUNC_END(roundsm32_x4_x5_x6_x7_x0_x1_x2_x3_y4_y5_y6_y7_y0_y1_y2_y3_ab)
#define inpack32_pre(x0, x1, x2, x3, x4, x5, x6, x7, y0, y1, y2, y3, y4, y5, \
y6, y7, rio, key) \
vpbroadcastq key, x0; \
- vpshufb .Lpack_bswap, x0, x0; \
+ vpshufb .Lpack_bswap(%rip), x0, x0; \
\
vpxor 0 * 32(rio), x0, y7; \
vpxor 1 * 32(rio), x0, y6; \
@@ -565,7 +565,7 @@ SYM_FUNC_END(roundsm32_x4_x5_x6_x7_x0_x1_x2_x3_y4_y5_y6_y7_y0_y1_y2_y3_ab)
vmovdqu x0, stack_tmp0; \
\
vpbroadcastq key, x0; \
- vpshufb .Lpack_bswap, x0, x0; \
+ vpshufb .Lpack_bswap(%rip), x0, x0; \
\
vpxor x0, y7, y7; \
vpxor x0, y6, y6; \
diff --git a/arch/x86/crypto/camellia-x86_64-asm_64.S b/arch/x86/crypto/camellia-x86_64-asm_64.S
index 347c059f59403d3c..b7c822d813a82772 100644
--- a/arch/x86/crypto/camellia-x86_64-asm_64.S
+++ b/arch/x86/crypto/camellia-x86_64-asm_64.S
@@ -77,11 +77,13 @@
#define RXORbl %r9b
#define xor2ror16(T0, T1, tmp1, tmp2, ab, dst) \
+ leaq T0(%rip), tmp1; \
movzbl ab ## bl, tmp2 ## d; \
+ xorq (tmp1, tmp2, 8), dst; \
+ leaq T1(%rip), tmp2; \
movzbl ab ## bh, tmp1 ## d; \
- rorq $16, ab; \
- xorq T0(, tmp2, 8), dst; \
- xorq T1(, tmp1, 8), dst;
+ xorq (tmp2, tmp1, 8), dst; \
+ rorq $16, ab;
/**********************************************************************
1-way camellia
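The camellia-x86_64 hunk above is easier to follow with the register roles
spelled out; the version below restates the new xor2ror16() body with
annotations (the comments are added for illustration and the macro's trailing
backslashes are omitted; neither is part of the patch):

	leaq	T0(%rip), tmp1;			/* tmp1 = address of table T0 */
	movzbl	ab ## bl, tmp2 ## d;		/* tmp2 = low byte of ab */
	xorq	(tmp1, tmp2, 8), dst;		/* dst ^= T0[low byte] */
	leaq	T1(%rip), tmp2;			/* tmp2 now holds the address of T1 */
	movzbl	ab ## bh, tmp1 ## d;		/* tmp1 now holds the high byte of ab */
	xorq	(tmp2, tmp1, 8), dst;		/* dst ^= T1[high byte] */
	rorq	$16, ab;			/* rotate only after both bytes were read */

With just two temporaries available, each register alternates between serving
as a table base and as a byte index, and the rorq moves to the end so that
both movzbl extractions still see the un-rotated low 16 bits of ab.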
--
2.39.2
* [PATCH 05/10] crypto: x86/cast5 - Use RIP-relative addressing
From: Ard Biesheuvel @ 2023-04-08 15:27 UTC (permalink / raw)
To: linux-crypto; +Cc: Ard Biesheuvel, Herbert Xu, Eric Biggers, Kees Cook
Prefer RIP-relative addressing where possible, which removes the need
for boot-time relocation fixups.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/x86/crypto/cast5-avx-x86_64-asm_64.S | 50 +++++++++++---------
1 file changed, 27 insertions(+), 23 deletions(-)
diff --git a/arch/x86/crypto/cast5-avx-x86_64-asm_64.S b/arch/x86/crypto/cast5-avx-x86_64-asm_64.S
index 0326a01503c3a554..438c404a03bcd33e 100644
--- a/arch/x86/crypto/cast5-avx-x86_64-asm_64.S
+++ b/arch/x86/crypto/cast5-avx-x86_64-asm_64.S
@@ -83,16 +83,20 @@
#define lookup_32bit(src, dst, op1, op2, op3, interleave_op, il_reg) \
- movzbl src ## bh, RID1d; \
- movzbl src ## bl, RID2d; \
- shrq $16, src; \
- movl s1(, RID1, 4), dst ## d; \
- op1 s2(, RID2, 4), dst ## d; \
- movzbl src ## bh, RID1d; \
- movzbl src ## bl, RID2d; \
- interleave_op(il_reg); \
- op2 s3(, RID1, 4), dst ## d; \
- op3 s4(, RID2, 4), dst ## d;
+ movzbl src ## bh, RID1d; \
+ leaq s1(%rip), RID2; \
+ movl (RID2, RID1, 4), dst ## d; \
+ movzbl src ## bl, RID2d; \
+ leaq s2(%rip), RID1; \
+ op1 (RID1, RID2, 4), dst ## d; \
+ shrq $16, src; \
+ movzbl src ## bh, RID1d; \
+ leaq s3(%rip), RID2; \
+ op2 (RID2, RID1, 4), dst ## d; \
+ movzbl src ## bl, RID2d; \
+ leaq s4(%rip), RID1; \
+ op3 (RID1, RID2, 4), dst ## d; \
+ interleave_op(il_reg);
#define dummy(d) /* do nothing */
@@ -151,15 +155,15 @@
subround(l ## 3, r ## 3, l ## 4, r ## 4, f);
#define enc_preload_rkr() \
- vbroadcastss .L16_mask, RKR; \
+ vbroadcastss .L16_mask(%rip), RKR; \
/* add 16-bit rotation to key rotations (mod 32) */ \
vpxor kr(CTX), RKR, RKR;
#define dec_preload_rkr() \
- vbroadcastss .L16_mask, RKR; \
+ vbroadcastss .L16_mask(%rip), RKR; \
/* add 16-bit rotation to key rotations (mod 32) */ \
vpxor kr(CTX), RKR, RKR; \
- vpshufb .Lbswap128_mask, RKR, RKR;
+ vpshufb .Lbswap128_mask(%rip), RKR, RKR;
#define transpose_2x4(x0, x1, t0, t1) \
vpunpckldq x1, x0, t0; \
@@ -235,9 +239,9 @@ SYM_FUNC_START_LOCAL(__cast5_enc_blk16)
movq %rdi, CTX;
- vmovdqa .Lbswap_mask, RKM;
- vmovd .Lfirst_mask, R1ST;
- vmovd .L32_mask, R32;
+ vmovdqa .Lbswap_mask(%rip), RKM;
+ vmovd .Lfirst_mask(%rip), R1ST;
+ vmovd .L32_mask(%rip), R32;
enc_preload_rkr();
inpack_blocks(RL1, RR1, RTMP, RX, RKM);
@@ -271,7 +275,7 @@ SYM_FUNC_START_LOCAL(__cast5_enc_blk16)
popq %rbx;
popq %r15;
- vmovdqa .Lbswap_mask, RKM;
+ vmovdqa .Lbswap_mask(%rip), RKM;
outunpack_blocks(RR1, RL1, RTMP, RX, RKM);
outunpack_blocks(RR2, RL2, RTMP, RX, RKM);
@@ -308,9 +312,9 @@ SYM_FUNC_START_LOCAL(__cast5_dec_blk16)
movq %rdi, CTX;
- vmovdqa .Lbswap_mask, RKM;
- vmovd .Lfirst_mask, R1ST;
- vmovd .L32_mask, R32;
+ vmovdqa .Lbswap_mask(%rip), RKM;
+ vmovd .Lfirst_mask(%rip), R1ST;
+ vmovd .L32_mask(%rip), R32;
dec_preload_rkr();
inpack_blocks(RL1, RR1, RTMP, RX, RKM);
@@ -341,7 +345,7 @@ SYM_FUNC_START_LOCAL(__cast5_dec_blk16)
round(RL, RR, 1, 2);
round(RR, RL, 0, 1);
- vmovdqa .Lbswap_mask, RKM;
+ vmovdqa .Lbswap_mask(%rip), RKM;
popq %rbx;
popq %r15;
@@ -504,8 +508,8 @@ SYM_FUNC_START(cast5_ctr_16way)
vpcmpeqd RKR, RKR, RKR;
vpaddq RKR, RKR, RKR; /* low: -2, high: -2 */
- vmovdqa .Lbswap_iv_mask, R1ST;
- vmovdqa .Lbswap128_mask, RKM;
+ vmovdqa .Lbswap_iv_mask(%rip), R1ST;
+ vmovdqa .Lbswap128_mask(%rip), RKM;
/* load IV and byteswap */
vmovq (%rcx), RX;
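The reworked lookup_32bit() above follows the same pattern as the camellia
change: RID1 and RID2 take turns holding a table address and a byte index.
Note also that the shrq $16 on src is deferred until the s1/s2 lookups have
consumed the two low bytes, and that interleave_op() moves to the very end.
An annotated excerpt of the first half (comments added for illustration, not
part of the patch):

	movzbl	src ## bh, RID1d;		/* RID1 = second byte of src */
	leaq	s1(%rip), RID2;			/* RID2 = address of table s1 */
	movl	(RID2, RID1, 4), dst ## d;	/* dst = s1[byte] */
	movzbl	src ## bl, RID2d;		/* RID2 = lowest byte of src */
	leaq	s2(%rip), RID1;			/* RID1 = address of table s2 */
	op1	(RID1, RID2, 4), dst ## d;	/* dst op1= s2[byte] */
	shrq	$16, src;			/* drop the two bytes just consumed */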
--
2.39.2
* [PATCH 06/10] crypto: x86/cast6 - Use RIP-relative addressing
From: Ard Biesheuvel @ 2023-04-08 15:27 UTC (permalink / raw)
To: linux-crypto; +Cc: Ard Biesheuvel, Herbert Xu, Eric Biggers, Kees Cook
Prefer RIP-relative addressing where possible, which removes the need
for boot-time relocation fixups.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/x86/crypto/cast6-avx-x86_64-asm_64.S | 44 +++++++++++---------
1 file changed, 24 insertions(+), 20 deletions(-)
diff --git a/arch/x86/crypto/cast6-avx-x86_64-asm_64.S b/arch/x86/crypto/cast6-avx-x86_64-asm_64.S
index 82b716fd5dbac65a..180fb9c78de2d315 100644
--- a/arch/x86/crypto/cast6-avx-x86_64-asm_64.S
+++ b/arch/x86/crypto/cast6-avx-x86_64-asm_64.S
@@ -83,16 +83,20 @@
#define lookup_32bit(src, dst, op1, op2, op3, interleave_op, il_reg) \
- movzbl src ## bh, RID1d; \
- movzbl src ## bl, RID2d; \
- shrq $16, src; \
- movl s1(, RID1, 4), dst ## d; \
- op1 s2(, RID2, 4), dst ## d; \
- movzbl src ## bh, RID1d; \
- movzbl src ## bl, RID2d; \
- interleave_op(il_reg); \
- op2 s3(, RID1, 4), dst ## d; \
- op3 s4(, RID2, 4), dst ## d;
+ movzbl src ## bh, RID1d; \
+ leaq s1(%rip), RID2; \
+ movl (RID2, RID1, 4), dst ## d; \
+ movzbl src ## bl, RID2d; \
+ leaq s2(%rip), RID1; \
+ op1 (RID1, RID2, 4), dst ## d; \
+ shrq $16, src; \
+ movzbl src ## bh, RID1d; \
+ leaq s3(%rip), RID2; \
+ op2 (RID2, RID1, 4), dst ## d; \
+ movzbl src ## bl, RID2d; \
+ leaq s4(%rip), RID1; \
+ op3 (RID1, RID2, 4), dst ## d; \
+ interleave_op(il_reg);
#define dummy(d) /* do nothing */
@@ -175,10 +179,10 @@
qop(RD, RC, 1);
#define shuffle(mask) \
- vpshufb mask, RKR, RKR;
+ vpshufb mask(%rip), RKR, RKR;
#define preload_rkr(n, do_mask, mask) \
- vbroadcastss .L16_mask, RKR; \
+ vbroadcastss .L16_mask(%rip), RKR; \
/* add 16-bit rotation to key rotations (mod 32) */ \
vpxor (kr+n*16)(CTX), RKR, RKR; \
do_mask(mask);
@@ -258,9 +262,9 @@ SYM_FUNC_START_LOCAL(__cast6_enc_blk8)
movq %rdi, CTX;
- vmovdqa .Lbswap_mask, RKM;
- vmovd .Lfirst_mask, R1ST;
- vmovd .L32_mask, R32;
+ vmovdqa .Lbswap_mask(%rip), RKM;
+ vmovd .Lfirst_mask(%rip), R1ST;
+ vmovd .L32_mask(%rip), R32;
inpack_blocks(RA1, RB1, RC1, RD1, RTMP, RX, RKRF, RKM);
inpack_blocks(RA2, RB2, RC2, RD2, RTMP, RX, RKRF, RKM);
@@ -284,7 +288,7 @@ SYM_FUNC_START_LOCAL(__cast6_enc_blk8)
popq %rbx;
popq %r15;
- vmovdqa .Lbswap_mask, RKM;
+ vmovdqa .Lbswap_mask(%rip), RKM;
outunpack_blocks(RA1, RB1, RC1, RD1, RTMP, RX, RKRF, RKM);
outunpack_blocks(RA2, RB2, RC2, RD2, RTMP, RX, RKRF, RKM);
@@ -306,9 +310,9 @@ SYM_FUNC_START_LOCAL(__cast6_dec_blk8)
movq %rdi, CTX;
- vmovdqa .Lbswap_mask, RKM;
- vmovd .Lfirst_mask, R1ST;
- vmovd .L32_mask, R32;
+ vmovdqa .Lbswap_mask(%rip), RKM;
+ vmovd .Lfirst_mask(%rip), R1ST;
+ vmovd .L32_mask(%rip), R32;
inpack_blocks(RA1, RB1, RC1, RD1, RTMP, RX, RKRF, RKM);
inpack_blocks(RA2, RB2, RC2, RD2, RTMP, RX, RKRF, RKM);
@@ -332,7 +336,7 @@ SYM_FUNC_START_LOCAL(__cast6_dec_blk8)
popq %rbx;
popq %r15;
- vmovdqa .Lbswap_mask, RKM;
+ vmovdqa .Lbswap_mask(%rip), RKM;
outunpack_blocks(RA1, RB1, RC1, RD1, RTMP, RX, RKRF, RKM);
outunpack_blocks(RA2, RB2, RC2, RD2, RTMP, RX, RKRF, RKM);
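The lookup_32bit() rework here mirrors the cast5 change above. Also worth
noting is that the (%rip) suffix can be attached to a macro parameter: with
the new shuffle() definition, an invocation such as shuffle(.Lbswap128_mask)
(label shown for illustration) expands to a RIP-relative operand:

	/* #define shuffle(mask)	vpshufb mask(%rip), RKR, RKR; */
	vpshufb	.Lbswap128_mask(%rip), RKR, RKR;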
--
2.39.2
* [PATCH 07/10] crypto: x86/crc32c - Use RIP-relative addressing
From: Ard Biesheuvel @ 2023-04-08 15:27 UTC (permalink / raw)
To: linux-crypto; +Cc: Ard Biesheuvel, Herbert Xu, Eric Biggers, Kees Cook
Prefer RIP-relative addressing where possible, which removes the need
for boot-time relocation fixups.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/x86/crypto/crc32c-pcl-intel-asm_64.S | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/x86/crypto/crc32c-pcl-intel-asm_64.S b/arch/x86/crypto/crc32c-pcl-intel-asm_64.S
index ec35915f0901a087..5f843dce77f1de66 100644
--- a/arch/x86/crypto/crc32c-pcl-intel-asm_64.S
+++ b/arch/x86/crypto/crc32c-pcl-intel-asm_64.S
@@ -168,7 +168,8 @@ continue_block:
xor crc2, crc2
## branch into array
- mov jump_table(,%rax,8), %bufp
+ leaq jump_table(%rip), %bufp
+ mov (%bufp,%rax,8), %bufp
JMP_NOSPEC bufp
################################################################
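In the hunk above, %bufp does double duty: it first receives the RIP-relative
address of jump_table, and is then overwritten with the jump target loaded
from the table (comments added for illustration, not part of the patch):

	leaq	jump_table(%rip), %bufp		/* %bufp = address of jump_table */
	mov	(%bufp, %rax, 8), %bufp		/* %bufp = jump_table[%rax], the target */
	JMP_NOSPEC bufp				/* kernel's speculation-safe indirect jump */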
--
2.39.2
* [PATCH 08/10] crypto: x86/des3 - Use RIP-relative addressing
From: Ard Biesheuvel @ 2023-04-08 15:27 UTC (permalink / raw)
To: linux-crypto; +Cc: Ard Biesheuvel, Herbert Xu, Eric Biggers, Kees Cook
Prefer RIP-relative addressing where possible, which removes the need
for boot-time relocation fixups.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/x86/crypto/des3_ede-asm_64.S | 96 +++++++++++++-------
1 file changed, 64 insertions(+), 32 deletions(-)
diff --git a/arch/x86/crypto/des3_ede-asm_64.S b/arch/x86/crypto/des3_ede-asm_64.S
index f4c760f4cade6d7b..cf21b998e77cc4ea 100644
--- a/arch/x86/crypto/des3_ede-asm_64.S
+++ b/arch/x86/crypto/des3_ede-asm_64.S
@@ -129,21 +129,29 @@
movzbl RW0bl, RT2d; \
movzbl RW0bh, RT3d; \
shrq $16, RW0; \
- movq s8(, RT0, 8), RT0; \
- xorq s6(, RT1, 8), to; \
+ leaq s8(%rip), RW1; \
+ movq (RW1, RT0, 8), RT0; \
+ leaq s6(%rip), RW1; \
+ xorq (RW1, RT1, 8), to; \
movzbl RW0bl, RL1d; \
movzbl RW0bh, RT1d; \
shrl $16, RW0d; \
- xorq s4(, RT2, 8), RT0; \
- xorq s2(, RT3, 8), to; \
+ leaq s4(%rip), RW1; \
+ xorq (RW1, RT2, 8), RT0; \
+ leaq s2(%rip), RW1; \
+ xorq (RW1, RT3, 8), to; \
movzbl RW0bl, RT2d; \
movzbl RW0bh, RT3d; \
- xorq s7(, RL1, 8), RT0; \
- xorq s5(, RT1, 8), to; \
- xorq s3(, RT2, 8), RT0; \
+ leaq s7(%rip), RW1; \
+ xorq (RW1, RL1, 8), RT0; \
+ leaq s5(%rip), RW1; \
+ xorq (RW1, RT1, 8), to; \
+ leaq s3(%rip), RW1; \
+ xorq (RW1, RT2, 8), RT0; \
load_next_key(n, RW0); \
xorq RT0, to; \
- xorq s1(, RT3, 8), to; \
+ leaq s1(%rip), RW1; \
+ xorq (RW1, RT3, 8), to; \
#define load_next_key(n, RWx) \
movq (((n) + 1) * 8)(CTX), RWx;
@@ -355,65 +363,89 @@ SYM_FUNC_END(des3_ede_x86_64_crypt_blk)
movzbl RW0bl, RT3d; \
movzbl RW0bh, RT1d; \
shrq $16, RW0; \
- xorq s8(, RT3, 8), to##0; \
- xorq s6(, RT1, 8), to##0; \
+ leaq s8(%rip), RT2; \
+ xorq (RT2, RT3, 8), to##0; \
+ leaq s6(%rip), RT2; \
+ xorq (RT2, RT1, 8), to##0; \
movzbl RW0bl, RT3d; \
movzbl RW0bh, RT1d; \
shrq $16, RW0; \
- xorq s4(, RT3, 8), to##0; \
- xorq s2(, RT1, 8), to##0; \
+ leaq s4(%rip), RT2; \
+ xorq (RT2, RT3, 8), to##0; \
+ leaq s2(%rip), RT2; \
+ xorq (RT2, RT1, 8), to##0; \
movzbl RW0bl, RT3d; \
movzbl RW0bh, RT1d; \
shrl $16, RW0d; \
- xorq s7(, RT3, 8), to##0; \
- xorq s5(, RT1, 8), to##0; \
+ leaq s7(%rip), RT2; \
+ xorq (RT2, RT3, 8), to##0; \
+ leaq s5(%rip), RT2; \
+ xorq (RT2, RT1, 8), to##0; \
movzbl RW0bl, RT3d; \
movzbl RW0bh, RT1d; \
load_next_key(n, RW0); \
- xorq s3(, RT3, 8), to##0; \
- xorq s1(, RT1, 8), to##0; \
+ leaq s3(%rip), RT2; \
+ xorq (RT2, RT3, 8), to##0; \
+ leaq s1(%rip), RT2; \
+ xorq (RT2, RT1, 8), to##0; \
xorq from##1, RW1; \
movzbl RW1bl, RT3d; \
movzbl RW1bh, RT1d; \
shrq $16, RW1; \
- xorq s8(, RT3, 8), to##1; \
- xorq s6(, RT1, 8), to##1; \
+ leaq s8(%rip), RT2; \
+ xorq (RT2, RT3, 8), to##1; \
+ leaq s6(%rip), RT2; \
+ xorq (RT2, RT1, 8), to##1; \
movzbl RW1bl, RT3d; \
movzbl RW1bh, RT1d; \
shrq $16, RW1; \
- xorq s4(, RT3, 8), to##1; \
- xorq s2(, RT1, 8), to##1; \
+ leaq s4(%rip), RT2; \
+ xorq (RT2, RT3, 8), to##1; \
+ leaq s2(%rip), RT2; \
+ xorq (RT2, RT1, 8), to##1; \
movzbl RW1bl, RT3d; \
movzbl RW1bh, RT1d; \
shrl $16, RW1d; \
- xorq s7(, RT3, 8), to##1; \
- xorq s5(, RT1, 8), to##1; \
+ leaq s7(%rip), RT2; \
+ xorq (RT2, RT3, 8), to##1; \
+ leaq s5(%rip), RT2; \
+ xorq (RT2, RT1, 8), to##1; \
movzbl RW1bl, RT3d; \
movzbl RW1bh, RT1d; \
do_movq(RW0, RW1); \
- xorq s3(, RT3, 8), to##1; \
- xorq s1(, RT1, 8), to##1; \
+ leaq s3(%rip), RT2; \
+ xorq (RT2, RT3, 8), to##1; \
+ leaq s1(%rip), RT2; \
+ xorq (RT2, RT1, 8), to##1; \
xorq from##2, RW2; \
movzbl RW2bl, RT3d; \
movzbl RW2bh, RT1d; \
shrq $16, RW2; \
- xorq s8(, RT3, 8), to##2; \
- xorq s6(, RT1, 8), to##2; \
+ leaq s8(%rip), RT2; \
+ xorq (RT2, RT3, 8), to##2; \
+ leaq s6(%rip), RT2; \
+ xorq (RT2, RT1, 8), to##2; \
movzbl RW2bl, RT3d; \
movzbl RW2bh, RT1d; \
shrq $16, RW2; \
- xorq s4(, RT3, 8), to##2; \
- xorq s2(, RT1, 8), to##2; \
+ leaq s4(%rip), RT2; \
+ xorq (RT2, RT3, 8), to##2; \
+ leaq s2(%rip), RT2; \
+ xorq (RT2, RT1, 8), to##2; \
movzbl RW2bl, RT3d; \
movzbl RW2bh, RT1d; \
shrl $16, RW2d; \
- xorq s7(, RT3, 8), to##2; \
- xorq s5(, RT1, 8), to##2; \
+ leaq s7(%rip), RT2; \
+ xorq (RT2, RT3, 8), to##2; \
+ leaq s5(%rip), RT2; \
+ xorq (RT2, RT1, 8), to##2; \
movzbl RW2bl, RT3d; \
movzbl RW2bh, RT1d; \
do_movq(RW0, RW2); \
- xorq s3(, RT3, 8), to##2; \
- xorq s1(, RT1, 8), to##2;
+ leaq s3(%rip), RT2; \
+ xorq (RT2, RT3, 8), to##2; \
+ leaq s1(%rip), RT2; \
+ xorq (RT2, RT1, 8), to##2;
#define __movq(src, dst) \
movq src, dst;
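This is the largest diff in the series because every s-box reference in both
the single-block and the 3-way code now needs its table base in a register
first. The patch uses RW1 (single-block path) and RT2 (3-way path) as scratch
registers for the base, at the cost of one extra leaq per lookup; the
recurring pattern is (comments added for illustration, not part of the patch):

	leaq	s8(%rip), RT2;			/* RT2 = base of s-box table s8 */
	xorq	(RT2, RT3, 8), to##0;		/* to##0 ^= s8[RT3] */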
--
2.39.2
* [PATCH 09/10] crypto: x86/ghash - Use RIP-relative addressing
From: Ard Biesheuvel @ 2023-04-08 15:27 UTC (permalink / raw)
To: linux-crypto; +Cc: Ard Biesheuvel, Herbert Xu, Eric Biggers, Kees Cook
Prefer RIP-relative addressing where possible, which removes the need
for boot-time relocation fixups.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/x86/crypto/ghash-clmulni-intel_asm.S | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/x86/crypto/ghash-clmulni-intel_asm.S b/arch/x86/crypto/ghash-clmulni-intel_asm.S
index 257ed9446f3ee1a9..99cb983ded9e369f 100644
--- a/arch/x86/crypto/ghash-clmulni-intel_asm.S
+++ b/arch/x86/crypto/ghash-clmulni-intel_asm.S
@@ -93,7 +93,7 @@ SYM_FUNC_START(clmul_ghash_mul)
FRAME_BEGIN
movups (%rdi), DATA
movups (%rsi), SHASH
- movaps .Lbswap_mask, BSWAP
+ movaps .Lbswap_mask(%rip), BSWAP
pshufb BSWAP, DATA
call __clmul_gf128mul_ble
pshufb BSWAP, DATA
@@ -110,7 +110,7 @@ SYM_FUNC_START(clmul_ghash_update)
FRAME_BEGIN
cmp $16, %rdx
jb .Lupdate_just_ret # check length
- movaps .Lbswap_mask, BSWAP
+ movaps .Lbswap_mask(%rip), BSWAP
movups (%rdi), DATA
movups (%rcx), SHASH
pshufb BSWAP, DATA
--
2.39.2
* [PATCH 10/10] crypto: x86/sha256 - Use RIP-relative addressing
From: Ard Biesheuvel @ 2023-04-08 15:27 UTC (permalink / raw)
To: linux-crypto; +Cc: Ard Biesheuvel, Herbert Xu, Eric Biggers, Kees Cook
Prefer RIP-relative addressing where possible, which removes the need
for boot-time relocation fixups.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/x86/crypto/sha256-avx2-asm.S | 18 ++++++++++++------
1 file changed, 12 insertions(+), 6 deletions(-)
diff --git a/arch/x86/crypto/sha256-avx2-asm.S b/arch/x86/crypto/sha256-avx2-asm.S
index 3eada94168526665..e2a4024fb0a3f5d5 100644
--- a/arch/x86/crypto/sha256-avx2-asm.S
+++ b/arch/x86/crypto/sha256-avx2-asm.S
@@ -589,19 +589,23 @@ last_block_enter:
.align 16
loop1:
- vpaddd K256+0*32(SRND), X0, XFER
+ leaq K256+0*32(%rip), INP ## reuse INP as scratch reg
+ vpaddd (INP, SRND), X0, XFER
vmovdqa XFER, 0*32+_XFER(%rsp, SRND)
FOUR_ROUNDS_AND_SCHED _XFER + 0*32
- vpaddd K256+1*32(SRND), X0, XFER
+ leaq K256+1*32(%rip), INP
+ vpaddd (INP, SRND), X0, XFER
vmovdqa XFER, 1*32+_XFER(%rsp, SRND)
FOUR_ROUNDS_AND_SCHED _XFER + 1*32
- vpaddd K256+2*32(SRND), X0, XFER
+ leaq K256+2*32(%rip), INP
+ vpaddd (INP, SRND), X0, XFER
vmovdqa XFER, 2*32+_XFER(%rsp, SRND)
FOUR_ROUNDS_AND_SCHED _XFER + 2*32
- vpaddd K256+3*32(SRND), X0, XFER
+ leaq K256+3*32(%rip), INP
+ vpaddd (INP, SRND), X0, XFER
vmovdqa XFER, 3*32+_XFER(%rsp, SRND)
FOUR_ROUNDS_AND_SCHED _XFER + 3*32
@@ -611,11 +615,13 @@ loop1:
loop2:
## Do last 16 rounds with no scheduling
- vpaddd K256+0*32(SRND), X0, XFER
+ leaq K256+0*32(%rip), INP
+ vpaddd (INP, SRND), X0, XFER
vmovdqa XFER, 0*32+_XFER(%rsp, SRND)
DO_4ROUNDS _XFER + 0*32
- vpaddd K256+1*32(SRND), X1, XFER
+ leaq K256+1*32(%rip), INP
+ vpaddd (INP, SRND), X1, XFER
vmovdqa XFER, 1*32+_XFER(%rsp, SRND)
DO_4ROUNDS _XFER + 1*32
add $2*32, SRND
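The K256 round-constant loads above combine a symbol displacement with the
SRND register, which cannot be expressed as a single RIP-relative operand, so
each access is split: the constant block's address is materialized into INP
(reused as a scratch register, as the in-line comment notes) and SRND is then
applied inside the vpaddd memory operand (comments added for illustration):

	leaq	K256+0*32(%rip), INP		/* INP = address of the first 32-byte K256 block */
	vpaddd	(INP, SRND), X0, XFER		/* XFER = X0 + K256 block at byte offset SRND */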
--
2.39.2
* Re: [PATCH 00/10] crypto: x86 - avoid absolute references
From: Ard Biesheuvel @ 2023-04-08 15:32 UTC (permalink / raw)
To: linux-crypto; +Cc: Herbert Xu, Eric Biggers, Kees Cook
On Sat, 8 Apr 2023 at 17:27, Ard Biesheuvel <ardb@kernel.org> wrote:
>
> This is preparatory work for allowing the kernel to be built as a PIE
> executable. PIE executables rely mostly on RIP-relative symbol references
> from code; such references do not need to be updated when the binary is
> loaded at an address that differs from its link-time address.
>
> Most changes are quite straightforward: in many cases, adding a (%rip)
> suffix is enough. Some are slightly trickier, however, and need minor
> reshuffling of the asm code to get rid of the absolute references.
>
> Tested with CONFIG_CRYPTO_MANAGER_EXTRA_TESTS=y on an x86 CPU that
> implements AVX, AVX2 and AVX512.
>
> Cc: Herbert Xu <herbert@gondor.apana.org.au>
> Cc: Eric Biggers <ebiggers@kernel.org>
> Cc: Kees Cook <keescook@chromium.org>
>
> Ard Biesheuvel (10):
> crypto: x86/camellia - Use RIP-relative addressing
> crypto: x86/cast5 - Use RIP-relative addressing
> crypto: x86/cast6 - Use RIP-relative addressing
> crypto: x86/des3 - Use RIP-relative addressing
Note: the patches above are
Co-developed-by: Thomas Garnier <thgarnie@chromium.org>
Signed-off-by: Thomas Garnier <thgarnie@chromium.org>
but this got lost inadvertently - apologies.
Herbert: will patchwork pick those up if I put them in a reply to each
of those individual patches?
Thanks,
> crypto: x86/aegis128 - Use RIP-relative addressing
> crypto: x86/aesni - Use RIP-relative addressing
> crypto: x86/aria - Use RIP-relative addressing
> crypto: x86/crc32c - Use RIP-relative addressing
> crypto: x86/ghash - Use RIP-relative addressing
> crypto: x86/sha256 - Use RIP-relative addressing
>
> arch/x86/crypto/aegis128-aesni-asm.S | 6 +-
> arch/x86/crypto/aesni-intel_asm.S | 2 +-
> arch/x86/crypto/aesni-intel_avx-x86_64.S | 6 +-
> arch/x86/crypto/aria-aesni-avx-asm_64.S | 28 +++---
> arch/x86/crypto/aria-aesni-avx2-asm_64.S | 28 +++---
> arch/x86/crypto/aria-gfni-avx512-asm_64.S | 24 ++---
> arch/x86/crypto/camellia-aesni-avx-asm_64.S | 30 +++---
> arch/x86/crypto/camellia-aesni-avx2-asm_64.S | 30 +++---
> arch/x86/crypto/camellia-x86_64-asm_64.S | 8 +-
> arch/x86/crypto/cast5-avx-x86_64-asm_64.S | 50 +++++-----
> arch/x86/crypto/cast6-avx-x86_64-asm_64.S | 44 +++++----
> arch/x86/crypto/crc32c-pcl-intel-asm_64.S | 3 +-
> arch/x86/crypto/des3_ede-asm_64.S | 96 +++++++++++++-------
> arch/x86/crypto/ghash-clmulni-intel_asm.S | 4 +-
> arch/x86/crypto/sha256-avx2-asm.S | 18 ++--
> 15 files changed, 213 insertions(+), 164 deletions(-)
>
> --
> 2.39.2
>
* Re: [PATCH 00/10] crypto: x86 - avoid absolute references
From: Ard Biesheuvel @ 2023-04-10 9:10 UTC (permalink / raw)
To: linux-crypto; +Cc: Herbert Xu, Eric Biggers, Kees Cook
On Sat, 8 Apr 2023 at 17:32, Ard Biesheuvel <ardb@kernel.org> wrote:
>
> On Sat, 8 Apr 2023 at 17:27, Ard Biesheuvel <ardb@kernel.org> wrote:
> >
> > This is preparatory work for allowing the kernel to be built as a PIE
> > executable. PIE executables rely mostly on RIP-relative symbol references
> > from code; such references do not need to be updated when the binary is
> > loaded at an address that differs from its link-time address.
> >
> > Most changes are quite straightforward: in many cases, adding a (%rip)
> > suffix is enough. Some are slightly trickier, however, and need minor
> > reshuffling of the asm code to get rid of the absolute references.
> >
> > Tested with CONFIG_CRYPTO_MANAGER_EXTRA_TESTS=y on an x86 CPU that
> > implements AVX, AVX2 and AVX512.
> >
> > Cc: Herbert Xu <herbert@gondor.apana.org.au>
> > Cc: Eric Biggers <ebiggers@kernel.org>
> > Cc: Kees Cook <keescook@chromium.org>
> >
> > Ard Biesheuvel (10):
>
> > crypto: x86/camellia - Use RIP-relative addressing
> > crypto: x86/cast5 - Use RIP-relative addressing
> > crypto: x86/cast6 - Use RIP-relative addressing
> > crypto: x86/des3 - Use RIP-relative addressing
>
> Note: the patches above are
>
> Co-developed-by: Thomas Garnier <thgarnie@chromium.org>
> Signed-off-by: Thomas Garnier <thgarnie@chromium.org>
>
> but this got lost inadvertently - apologies.
>
> Herbert: will patchwork pick those up if I put them in a reply to each
> of those individual patches?
>
Never mind, I'll be sending out a v2 in any case.