* [PATCH 1/2] arm64: add macro to handle large immediates
From: Mark Rutland @ 2016-01-06 11:05 UTC
To: linux-arm-kernel
Sometimes we want to be able to load values greater than 0xff into a
register, without placing said values in a literal pool. Arranging for
the value to be split up across a number of movz and movk instructions
is tedious and error-prone.
Following the example of {adr,str,ldr}_l, this patch adds a new mov_l
macro which can be used to load immediate values of up to 64 bits into a
register.
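For example, the second patch in this series uses it to load a
linker-provided absolute value:

	mov_l	x2, __bss_size

This expands to a movz followed by three movk instructions, one per
16-bit chunk of the value.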
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
---
arch/arm64/include/asm/assembler.h | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index 12eff92..64fd0a2 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -193,6 +193,19 @@ lr .req x30 // link register
str \src, [\tmp, :lo12:\sym]
.endm
+ /*
+ * Move a large immediate up to 64-bits.
+ *
+ * @dst: destination register (64 bit wide)
+ * @val: value
+ */
+ .macro mov_l, dst, val
+ movz \dst, :abs_g0_nc:\val
+ movk \dst, :abs_g1_nc:\val
+ movk \dst, :abs_g2_nc:\val
+ movk \dst, :abs_g3:\val
+ .endm
+
/*
* Annotate a function as position independent, i.e., safe to be called before
* the kernel virtual mapping is activated.
--
1.9.1
* [PATCH 2/2] arm64: use memset to clear BSS
From: Mark Rutland @ 2016-01-06 11:05 UTC
To: linux-arm-kernel
Currently we use an open-coded memzero to clear the BSS. As it is a
trivial implementation, it is sub-optimal.
Our optimised memset doesn't use the stack, is position-independent, and
for the memzero case can use DC ZVA to clear large blocks
efficiently. In __mmap_switched the MMU is on and there are no live
caller-saved registers, so we can safely call an uninstrumented memset.
This patch changes __mmap_switched to use memset when clearing the BSS.
We use the __pi_memset alias so as to avoid any instrumentation in all
kernel configurations. As with the head symbols, we must get the linker
to generate __bss_size, as there is no ELF relocation for the
subtraction of two symbols.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
---
arch/arm64/kernel/head.S | 14 ++++++--------
arch/arm64/kernel/image.h | 2 ++
2 files changed, 8 insertions(+), 8 deletions(-)
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 23cfc08..247a97b 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -415,14 +415,12 @@ ENDPROC(__create_page_tables)
*/
.set initial_sp, init_thread_union + THREAD_START_SP
__mmap_switched:
- adr_l x6, __bss_start
- adr_l x7, __bss_stop
-
-1: cmp x6, x7
- b.hs 2f
- str xzr, [x6], #8 // Clear BSS
- b 1b
-2:
+ // clear BSS
+ adr_l x0, __bss_start
+ mov x1, xzr
+ mov_l x2, __bss_size
+ bl __pi_memset
+
adr_l sp, initial_sp, x4
str_l x21, __fdt_pointer, x5 // Save FDT pointer
str_l x24, memstart_addr, x6 // Save PHYS_OFFSET
diff --git a/arch/arm64/kernel/image.h b/arch/arm64/kernel/image.h
index bc2abb8..5fd76b5 100644
--- a/arch/arm64/kernel/image.h
+++ b/arch/arm64/kernel/image.h
@@ -95,4 +95,6 @@ __efistub__edata = _edata;
#endif
+__bss_size = __bss_stop - __bss_start;
+
#endif /* __ASM_IMAGE_H */
--
1.9.1
* [PATCH 2/2] arm64: use memset to clear BSS
From: Ard Biesheuvel @ 2016-01-06 11:12 UTC
To: linux-arm-kernel
On 6 January 2016 at 12:05, Mark Rutland <mark.rutland@arm.com> wrote:
> Currently we use an open-coded memzero to clear the BSS. As it is a
> trivial implementation, it is sub-optimal.
>
> Our optimised memset doesn't use the stack, is position-independent, and
> for the memzero case can use DC ZVA to clear large blocks
> efficiently. In __mmap_switched the MMU is on and there are no live
> caller-saved registers, so we can safely call an uninstrumented memset.
>
> This patch changes __mmap_switched to use memset when clearing the BSS.
> We use the __pi_memset alias so as to avoid any instrumentation in all
> kernel configurations. As with the head symbols, we must get the linker
> to generate __bss_size, as there is no ELF relocation for the
> subtraction of two symbols.
>
> Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Marc Zyngier <marc.zyngier@arm.com>
> Cc: Will Deacon <will.deacon@arm.com>
> ---
> arch/arm64/kernel/head.S | 14 ++++++--------
> arch/arm64/kernel/image.h | 2 ++
> 2 files changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
> index 23cfc08..247a97b 100644
> --- a/arch/arm64/kernel/head.S
> +++ b/arch/arm64/kernel/head.S
> @@ -415,14 +415,12 @@ ENDPROC(__create_page_tables)
> */
> .set initial_sp, init_thread_union + THREAD_START_SP
> __mmap_switched:
> - adr_l x6, __bss_start
> - adr_l x7, __bss_stop
> -
> -1: cmp x6, x7
> - b.hs 2f
> - str xzr, [x6], #8 // Clear BSS
> - b 1b
> -2:
> + // clear BSS
> + adr_l x0, __bss_start
> + mov x1, xzr
> + mov_l x2, __bss_size
Is it such a big deal to do
adr_l x2, __bss_stop
sub x2, x2, x0
instead?
Either way:
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> + bl __pi_memset
> +
> adr_l sp, initial_sp, x4
> str_l x21, __fdt_pointer, x5 // Save FDT pointer
> str_l x24, memstart_addr, x6 // Save PHYS_OFFSET
> diff --git a/arch/arm64/kernel/image.h b/arch/arm64/kernel/image.h
> index bc2abb8..5fd76b5 100644
> --- a/arch/arm64/kernel/image.h
> +++ b/arch/arm64/kernel/image.h
> @@ -95,4 +95,6 @@ __efistub__edata = _edata;
>
> #endif
>
> +__bss_size = __bss_stop - __bss_start;
> +
> #endif /* __ASM_IMAGE_H */
> --
> 1.9.1
>
* [PATCH 1/2] arm64: add macro to handle large immediates
From: Ard Biesheuvel @ 2016-01-06 11:15 UTC
To: linux-arm-kernel
On 6 January 2016 at 12:05, Mark Rutland <mark.rutland@arm.com> wrote:
> Sometimes we want to be able to load values greater than 0xff into a
> register, without placing said values in a literal pool. Arranging for
> the value to be split up across a number of movz and movk instructions
> is tedious and error-prone.
>
> Following the example of {adr,str,ldr}_l, this patch adds a new mov_l
> macro which can be used to load immediate values of up to 64 bits into a
> register.
>
> Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Marc Zyngier <marc.zyngier@arm.com>
> Cc: Will Deacon <will.deacon@arm.com>
> ---
> arch/arm64/include/asm/assembler.h | 13 +++++++++++++
> 1 file changed, 13 insertions(+)
>
> diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
> index 12eff92..64fd0a2 100644
> --- a/arch/arm64/include/asm/assembler.h
> +++ b/arch/arm64/include/asm/assembler.h
> @@ -193,6 +193,19 @@ lr .req x30 // link register
> str \src, [\tmp, :lo12:\sym]
> .endm
>
> + /*
> + * Move a large immediate up to 64-bits.
> + *
> + * @dst: destination register (64 bit wide)
> + * @val: value
> + */
> + .macro mov_l, dst, val
> + movz \dst, :abs_g0_nc:\val
> + movk \dst, :abs_g1_nc:\val
> + movk \dst, :abs_g2_nc:\val
> + movk \dst, :abs_g3:\val
> + .endm
> +
Ack for the general idea, but for correctness, you should pair the
movk instructions with the _nc relocations (i.e., keep movz first, but
invert the order of the relocs)
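i.e. something along these lines (untested sketch):

	movz	\dst, :abs_g3:\val
	movk	\dst, :abs_g2_nc:\val
	movk	\dst, :abs_g1_nc:\val
	movk	\dst, :abs_g0_nc:\val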
* [PATCH 2/2] arm64: use memset to clear BSS
From: Mark Rutland @ 2016-01-06 11:40 UTC
To: linux-arm-kernel
On Wed, Jan 06, 2016 at 12:12:45PM +0100, Ard Biesheuvel wrote:
> On 6 January 2016 at 12:05, Mark Rutland <mark.rutland@arm.com> wrote:
> > Currently we use an open-coded memzero to clear the BSS. As it is a
> > trivial implementation, it is sub-optimal.
> >
> > Our optimised memset doesn't use the stack, is position-independent, and
> > for the memzero case can use DC ZVA to clear large blocks
> > efficiently. In __mmap_switched the MMU is on and there are no live
> > caller-saved registers, so we can safely call an uninstrumented memset.
> >
> > This patch changes __mmap_switched to use memset when clearing the BSS.
> > We use the __pi_memset alias so as to avoid any instrumentation in all
> > kernel configurations. As with the head symbols, we must get the linker
> > to generate __bss_size, as there is no ELF relocation for the
> > subtraction of two symbols.
> >
> > Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> > Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> > Cc: Catalin Marinas <catalin.marinas@arm.com>
> > Cc: Marc Zyngier <marc.zyngier@arm.com>
> > Cc: Will Deacon <will.deacon@arm.com>
> > ---
> > arch/arm64/kernel/head.S | 14 ++++++--------
> > arch/arm64/kernel/image.h | 2 ++
> > 2 files changed, 8 insertions(+), 8 deletions(-)
> >
> > diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
> > index 23cfc08..247a97b 100644
> > --- a/arch/arm64/kernel/head.S
> > +++ b/arch/arm64/kernel/head.S
> > @@ -415,14 +415,12 @@ ENDPROC(__create_page_tables)
> > */
> > .set initial_sp, init_thread_union + THREAD_START_SP
> > __mmap_switched:
> > - adr_l x6, __bss_start
> > - adr_l x7, __bss_stop
> > -
> > -1: cmp x6, x7
> > - b.hs 2f
> > - str xzr, [x6], #8 // Clear BSS
> > - b 1b
> > -2:
> > + // clear BSS
> > + adr_l x0, __bss_start
> > + mov x1, xzr
> > + mov_l x2, __bss_size
>
> Is it such a big deal to do
>
> adr_l x2, __bss_stop
> sub x2, x2, x0
>
> instead?
I'm happy either way.
If no-one else has a use for mov_l, I'll drop it and move to that.
> Either way:
> Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Thanks!
Mark.
* [PATCH 1/2] arm64: add macro to handle large immediates
From: Mark Rutland @ 2016-01-06 12:21 UTC
To: linux-arm-kernel
On Wed, Jan 06, 2016 at 12:15:14PM +0100, Ard Biesheuvel wrote:
> On 6 January 2016 at 12:05, Mark Rutland <mark.rutland@arm.com> wrote:
> > Sometimes we want to be able to load values greater than 0xff into a
> > register, without placing said values in a literal pool. Arranging for
> > the value to be split up across a number of movz and movk instructions
> > is tedious and error-prone.
> >
> > Following the example of {adr,str,ldr}_l, this patch adds a new mov_l
> > macro which can be used to load immediate values of up to 64 bits into a
> > register.
> >
> > Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> > Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> > Cc: Catalin Marinas <catalin.marinas@arm.com>
> > Cc: Marc Zyngier <marc.zyngier@arm.com>
> > Cc: Will Deacon <will.deacon@arm.com>
> > ---
> > arch/arm64/include/asm/assembler.h | 13 +++++++++++++
> > 1 file changed, 13 insertions(+)
> >
> > diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
> > index 12eff92..64fd0a2 100644
> > --- a/arch/arm64/include/asm/assembler.h
> > +++ b/arch/arm64/include/asm/assembler.h
> > @@ -193,6 +193,19 @@ lr .req x30 // link register
> > str \src, [\tmp, :lo12:\sym]
> > .endm
> >
> > + /*
> > + * Move a large immediate up to 64-bits.
> > + *
> > + * @dst: destination register (64 bit wide)
> > + * @val: value
> > + */
> > + .macro mov_l, dst, val
> > + movz \dst, :abs_g0_nc:\val
> > + movk \dst, :abs_g1_nc:\val
> > + movk \dst, :abs_g2_nc:\val
> > + movk \dst, :abs_g3:\val
> > + .endm
> > +
>
> Ack for the general idea, but for correctness, you should pair the
> movk instructions with the _nc relocations (i.e., keep movz first, but
> invert the order of the relocs)
Ah, I hadn't spotted the restriction. I'll change that to:
movz \dst, :abs_g3:\val
movk \dst, :abs_g2:\val
movk \dst, :abs_g1:\val
movk \dst, :abs_g0:\val
That raises a related question. Is it the linker's responsibility to
fill in the shift encoding in the hw field as part of the g{3,2,1}
relocs?
Mine seems to, but I don't know if that's strictly required or correct
as the AArch64 ELF spec only mentions the immediate field for *ABS_G*,
and the shift is encoded in hw rather than imm16.
Thanks,
Mark.
* [PATCH 1/2] arm64: add macro to handle large immediates
From: Ard Biesheuvel @ 2016-01-06 12:26 UTC
To: linux-arm-kernel
On 6 January 2016 at 13:21, Mark Rutland <mark.rutland@arm.com> wrote:
> On Wed, Jan 06, 2016 at 12:15:14PM +0100, Ard Biesheuvel wrote:
>> On 6 January 2016 at 12:05, Mark Rutland <mark.rutland@arm.com> wrote:
>> > Sometimes we want to be able to load values greater than 0xff into a
>> > register, without placing said values in a literal pool. Arranging for
>> > the value to be split up across a number of movz and movk instructions
>> > is tedious and error-prone.
>> >
>> > Following the example of {adr,str,ldr}_l, this patch adds a new mov_l
>> > macro which can be used to load immediate values of up to 64 bits into a
>> > register.
>> >
>> > Signed-off-by: Mark Rutland <mark.rutland@arm.com>
>> > Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>> > Cc: Catalin Marinas <catalin.marinas@arm.com>
>> > Cc: Marc Zyngier <marc.zyngier@arm.com>
>> > Cc: Will Deacon <will.deacon@arm.com>
>> > ---
>> > arch/arm64/include/asm/assembler.h | 13 +++++++++++++
>> > 1 file changed, 13 insertions(+)
>> >
>> > diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
>> > index 12eff92..64fd0a2 100644
>> > --- a/arch/arm64/include/asm/assembler.h
>> > +++ b/arch/arm64/include/asm/assembler.h
>> > @@ -193,6 +193,19 @@ lr .req x30 // link register
>> > str \src, [\tmp, :lo12:\sym]
>> > .endm
>> >
>> > + /*
>> > + * Move a large immediate up to 64-bits.
>> > + *
>> > + * @dst: destination register (64 bit wide)
>> > + * @val: value
>> > + */
>> > + .macro mov_l, dst, val
>> > + movz \dst, :abs_g0_nc:\val
>> > + movk \dst, :abs_g1_nc:\val
>> > + movk \dst, :abs_g2_nc:\val
>> > + movk \dst, :abs_g3:\val
>> > + .endm
>> > +
>>
>> Ack for the general idea, but for correctness, you should pair the
>> movk instructions with the _nc relocations (i.e., keep movz first, but
>> invert the order of the relocs)
>
> Ah, I hadn't spotted the restriction. I'll change that to:
>
> movz \dst, :abs_g3:\val
> movk \dst, :abs_g2:\val
> movk \dst, :abs_g1:\val
> movk \dst, :abs_g0:\val
>
Yes, but with the _nc suffix on the latter three.
> That raises a related question. Is it the linker's responsibility to
> fill in the shift encoding in the hw field as part of the g{3,2,1}
> relocs?
>
This
movz x0, :abs_g3:val
movk x0, :abs_g2_nc:val
movk x0, :abs_g1_nc:val
movk x0, :abs_g0_nc:val
assembles to
0000000000000000 <.text>:
0: d2e00000 movz x0, #0x0, lsl #48
4: f2c00000 movk x0, #0x0, lsl #32
8: f2a00000 movk x0, #0x0, lsl #16
c: f2800000 movk x0, #0x0
so it is in fact the assembler that sets the hw field.
> Mine seems to, but I don't know if that's strictly required or correct
> as the AArch64 ELF spec only mentions the immediate field for *ABS_G*,
> and the shift is encoded in hw rather than imm16.
>
* [PATCH 2/2] arm64: use memset to clear BSS
From: Mark Rutland @ 2016-01-06 12:34 UTC
To: linux-arm-kernel
On Wed, Jan 06, 2016 at 11:40:39AM +0000, Mark Rutland wrote:
> On Wed, Jan 06, 2016 at 12:12:45PM +0100, Ard Biesheuvel wrote:
> > On 6 January 2016 at 12:05, Mark Rutland <mark.rutland@arm.com> wrote:
> > > Currently we use an open-coded memzero to clear the BSS. As it is a
> > > trivial implementation, it is sub-optimal.
> > >
> > > Our optimised memset doesn't use the stack, is position-independent, and
> > > for the memzero case can use DC ZVA to clear large blocks
> > > efficiently. In __mmap_switched the MMU is on and there are no live
> > > caller-saved registers, so we can safely call an uninstrumented memset.
> > >
> > > This patch changes __mmap_switched to use memset when clearing the BSS.
> > > We use the __pi_memset alias so as to avoid any instrumentation in all
> > > kernel configurations. As with the head symbols, we must get the linker
> > > to generate __bss_size, as there is no ELF relocation for the
> > > subtraction of two symbols.
> > >
> > > Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> > > Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> > > Cc: Catalin Marinas <catalin.marinas@arm.com>
> > > Cc: Marc Zyngier <marc.zyngier@arm.com>
> > > Cc: Will Deacon <will.deacon@arm.com>
> > > ---
> > > arch/arm64/kernel/head.S | 14 ++++++--------
> > > arch/arm64/kernel/image.h | 2 ++
> > > 2 files changed, 8 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
> > > index 23cfc08..247a97b 100644
> > > --- a/arch/arm64/kernel/head.S
> > > +++ b/arch/arm64/kernel/head.S
> > > @@ -415,14 +415,12 @@ ENDPROC(__create_page_tables)
> > > */
> > > .set initial_sp, init_thread_union + THREAD_START_SP
> > > __mmap_switched:
> > > - adr_l x6, __bss_start
> > > - adr_l x7, __bss_stop
> > > -
> > > -1: cmp x6, x7
> > > - b.hs 2f
> > > - str xzr, [x6], #8 // Clear BSS
> > > - b 1b
> > > -2:
> > > + // clear BSS
> > > + adr_l x0, __bss_start
> > > + mov x1, xzr
> > > + mov_l x2, __bss_size
> >
> > Is it such a big deal to do
> >
> > adr_l x2, __bss_stop
> > sub x2, x2, x0
> >
> > instead?
>
> I'm happy either way.
>
> If no-one else has a use for mov_l, I'll drop it and move to that.
From a discussion with Will, it sounds like the sub form is preferable,
so I'll drop mov_l for now.
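The BSS clearing would then presumably look something like (untested):

	adr_l	x0, __bss_start
	mov	x1, xzr
	adr_l	x2, __bss_stop
	sub	x2, x2, x0
	bl	__pi_memset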
Thanks,
Mark.
* [PATCH 1/2] arm64: add macro to handle large immediates
From: Mark Rutland @ 2016-01-06 12:37 UTC
To: linux-arm-kernel
> >> Ack for the general idea, but for correctness, you should pair the
> >> movk instructions with the _nc relocations (i.e., keep movz first, but
> >> invert the order of the relocs)
> >
> > Ah, I hadn't spotted the restriction. I'll change that to:
> >
> > movz \dst, :abs_g3:\val
> > movk \dst, :abs_g2:\val
> > movk \dst, :abs_g1:\val
> > movk \dst, :abs_g0:\val
> >
>
> Yes, but with the _nc suffix on the latter three.
Yup.
> > That raises a related question. Is it the linker's responsibility to
> > fill in the shift encoding in the hw field as part of the g{3,2,1}
> > relocs?
> >
>
> This
>
> movz x0, :abs_g3:val
> movk x0, :abs_g2_nc:val
> movk x0, :abs_g1_nc:val
> movk x0, :abs_g0_nc:val
>
> assembles to
>
> 0000000000000000 <.text>:
> 0: d2e00000 movz x0, #0x0, lsl #48
> 4: f2c00000 movk x0, #0x0, lsl #32
> 8: f2a00000 movk x0, #0x0, lsl #16
> c: f2800000 movk x0, #0x0
>
> so it is in fact the assembler that sets the hw field.
Interesting!
As I mentioned in another reply, for the moment I'm going to drop mov_l
unless we have another need for it.
Thanks,
Mark.