[PATCH 1/2] arm64: add macro to handle large immediates

linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 1/2] arm64: add macro to handle large immediates
@ 2016-01-06 11:05 Mark Rutland
  2016-01-06 11:05 ` [PATCH 2/2] arm64: use memset to clear BSS Mark Rutland
  2016-01-06 11:15 ` [PATCH 1/2] arm64: add macro to handle large immediates Ard Biesheuvel
  0 siblings, 2 replies; 9+ messages in thread
From: Mark Rutland @ 2016-01-06 11:05 UTC (permalink / raw)
  To: linux-arm-kernel

Sometimes we want to be able to load values greater than 0xff into a
register, without placing said values in a literal pool. Arranging for
the value to be split up across a number of movz and movk instructions
is tedious and error-prone.

Following the example of {adr,str,ldr}_l, this patch adds a new mov_l
macro which can be used to load immediate values of up to 64 bits into a
register.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/include/asm/assembler.h | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index 12eff92..64fd0a2 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -193,6 +193,19 @@ lr	.req	x30		// link register
 	str	\src, [\tmp, :lo12:\sym]
 	.endm
 
+	/*
+	 * Move a large immediate up to 64-bits.
+	 *
+	 * @dst: destination register (64 bit wide)
+	 * @val: value
+	 */
+	.macro	mov_l, dst, val
+	movz	\dst, :abs_g0_nc:\val
+	movk	\dst, :abs_g1_nc:\val
+	movk	\dst, :abs_g2_nc:\val
+	movk	\dst, :abs_g3:\val
+	.endm
+
 /*
  * Annotate a function as position independent, i.e., safe to be called before
  * the kernel virtual mapping is activated.
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 2/2] arm64: use memset to clear BSS
  2016-01-06 11:05 [PATCH 1/2] arm64: add macro to handle large immediates Mark Rutland
@ 2016-01-06 11:05 ` Mark Rutland
  2016-01-06 11:12   ` Ard Biesheuvel
  2016-01-06 11:15 ` [PATCH 1/2] arm64: add macro to handle large immediates Ard Biesheuvel
  1 sibling, 1 reply; 9+ messages in thread
From: Mark Rutland @ 2016-01-06 11:05 UTC (permalink / raw)
  To: linux-arm-kernel

Currently we use an open-coded memzero to clear the BSS. As it is a
trivial implementation, it is sub-optimal.

Our optimised memset doesn't use the stack, is position-independent, and
for the memzero case can use of DC ZVA to clear large blocks
efficiently. In __mmap_switched the MMU is on and there are no live
caller-saved registers, so we can safely call an uninstrumented memset.

This patch changes __mmap_switched to use memset when clearing the BSS.
We use the __pi_memset alias so as to avoid any instrumentation in all
kernel configurations. As with the head symbols, we must get the linker
to generate __bss_size, as there is no ELF relocation for the
subtraction of two symbols.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/kernel/head.S  | 14 ++++++--------
 arch/arm64/kernel/image.h |  2 ++
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 23cfc08..247a97b 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -415,14 +415,12 @@ ENDPROC(__create_page_tables)
  */
 	.set	initial_sp, init_thread_union + THREAD_START_SP
 __mmap_switched:
-	adr_l	x6, __bss_start
-	adr_l	x7, __bss_stop
-
-1:	cmp	x6, x7
-	b.hs	2f
-	str	xzr, [x6], #8			// Clear BSS
-	b	1b
-2:
+	// clear BSS
+	adr_l	x0, __bss_start
+	mov	x1, xzr
+	mov_l	x2, __bss_size
+	bl	__pi_memset
+
 	adr_l	sp, initial_sp, x4
 	str_l	x21, __fdt_pointer, x5		// Save FDT pointer
 	str_l	x24, memstart_addr, x6		// Save PHYS_OFFSET
diff --git a/arch/arm64/kernel/image.h b/arch/arm64/kernel/image.h
index bc2abb8..5fd76b5 100644
--- a/arch/arm64/kernel/image.h
+++ b/arch/arm64/kernel/image.h
@@ -95,4 +95,6 @@ __efistub__edata		= _edata;
 
 #endif
 
+__bss_size			= __bss_stop - __bss_start;
+
 #endif /* __ASM_IMAGE_H */
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 2/2] arm64: use memset to clear BSS
  2016-01-06 11:05 ` [PATCH 2/2] arm64: use memset to clear BSS Mark Rutland
@ 2016-01-06 11:12   ` Ard Biesheuvel
  2016-01-06 11:40     ` Mark Rutland
  0 siblings, 1 reply; 9+ messages in thread
From: Ard Biesheuvel @ 2016-01-06 11:12 UTC (permalink / raw)
  To: linux-arm-kernel

On 6 January 2016 at 12:05, Mark Rutland <mark.rutland@arm.com> wrote:
> Currently we use an open-coded memzero to clear the BSS. As it is a
> trivial implementation, it is sub-optimal.
>
> Our optimised memset doesn't use the stack, is position-independent, and
> for the memzero case can use of DC ZVA to clear large blocks
> efficiently. In __mmap_switched the MMU is on and there are no live
> caller-saved registers, so we can safely call an uninstrumented memset.
>
> This patch changes __mmap_switched to use memset when clearing the BSS.
> We use the __pi_memset alias so as to avoid any instrumentation in all
> kernel configurations. As with the head symbols, we must get the linker
> to generate __bss_size, as there is no ELF relocation for the
> subtraction of two symbols.
>
> Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Marc Zyngier <marc.zyngier@arm.com>
> Cc: Will Deacon <will.deacon@arm.com>
> ---
>  arch/arm64/kernel/head.S  | 14 ++++++--------
>  arch/arm64/kernel/image.h |  2 ++
>  2 files changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
> index 23cfc08..247a97b 100644
> --- a/arch/arm64/kernel/head.S
> +++ b/arch/arm64/kernel/head.S
> @@ -415,14 +415,12 @@ ENDPROC(__create_page_tables)
>   */
>         .set    initial_sp, init_thread_union + THREAD_START_SP
>  __mmap_switched:
> -       adr_l   x6, __bss_start
> -       adr_l   x7, __bss_stop
> -
> -1:     cmp     x6, x7
> -       b.hs    2f
> -       str     xzr, [x6], #8                   // Clear BSS
> -       b       1b
> -2:
> +       // clear BSS
> +       adr_l   x0, __bss_start
> +       mov     x1, xzr
> +       mov_l   x2, __bss_size

Is it such a big deal to do

adr_l x2, __bss_stop
sub x2, x2, x0

instead?

Either way:
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>


> +       bl      __pi_memset
> +
>         adr_l   sp, initial_sp, x4
>         str_l   x21, __fdt_pointer, x5          // Save FDT pointer
>         str_l   x24, memstart_addr, x6          // Save PHYS_OFFSET
> diff --git a/arch/arm64/kernel/image.h b/arch/arm64/kernel/image.h
> index bc2abb8..5fd76b5 100644
> --- a/arch/arm64/kernel/image.h
> +++ b/arch/arm64/kernel/image.h
> @@ -95,4 +95,6 @@ __efistub__edata              = _edata;
>
>  #endif
>
> +__bss_size                     = __bss_stop - __bss_start;
> +
>  #endif /* __ASM_IMAGE_H */
> --
> 1.9.1
>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/2] arm64: add macro to handle large immediates
  2016-01-06 11:05 [PATCH 1/2] arm64: add macro to handle large immediates Mark Rutland
  2016-01-06 11:05 ` [PATCH 2/2] arm64: use memset to clear BSS Mark Rutland
@ 2016-01-06 11:15 ` Ard Biesheuvel
  2016-01-06 12:21   ` Mark Rutland
  1 sibling, 1 reply; 9+ messages in thread
From: Ard Biesheuvel @ 2016-01-06 11:15 UTC (permalink / raw)
  To: linux-arm-kernel

On 6 January 2016 at 12:05, Mark Rutland <mark.rutland@arm.com> wrote:
> Sometimes we want to be able to load values greater than 0xff into a
> register, without placing said values in a literal pool. Arranging for
> the value to be split up across a number of movz and movk instructions
> is tedious and error-prone.
>
> Following the example of {adr,str,ldr}_l, this patch adds a new mov_l
> macro which can be used to load immediate values of up to 64 bits into a
> register.
>
> Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Marc Zyngier <marc.zyngier@arm.com>
> Cc: Will Deacon <will.deacon@arm.com>
> ---
>  arch/arm64/include/asm/assembler.h | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
>
> diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
> index 12eff92..64fd0a2 100644
> --- a/arch/arm64/include/asm/assembler.h
> +++ b/arch/arm64/include/asm/assembler.h
> @@ -193,6 +193,19 @@ lr .req    x30             // link register
>         str     \src, [\tmp, :lo12:\sym]
>         .endm
>
> +       /*
> +        * Move a large immediate up to 64-bits.
> +        *
> +        * @dst: destination register (64 bit wide)
> +        * @val: value
> +        */
> +       .macro  mov_l, dst, val
> +       movz    \dst, :abs_g0_nc:\val
> +       movk    \dst, :abs_g1_nc:\val
> +       movk    \dst, :abs_g2_nc:\val
> +       movk    \dst, :abs_g3:\val
> +       .endm
> +

Ack for the general idea, but for correctness, you should pair the
movk instructions with the _nc relocations (i.e., keep movz first, but
invert the order of the relocs)

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 2/2] arm64: use memset to clear BSS
  2016-01-06 11:12   ` Ard Biesheuvel
@ 2016-01-06 11:40     ` Mark Rutland
  2016-01-06 12:34       ` Mark Rutland
  0 siblings, 1 reply; 9+ messages in thread
From: Mark Rutland @ 2016-01-06 11:40 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Jan 06, 2016 at 12:12:45PM +0100, Ard Biesheuvel wrote:
> On 6 January 2016 at 12:05, Mark Rutland <mark.rutland@arm.com> wrote:
> > Currently we use an open-coded memzero to clear the BSS. As it is a
> > trivial implementation, it is sub-optimal.
> >
> > Our optimised memset doesn't use the stack, is position-independent, and
> > for the memzero case can use of DC ZVA to clear large blocks
> > efficiently. In __mmap_switched the MMU is on and there are no live
> > caller-saved registers, so we can safely call an uninstrumented memset.
> >
> > This patch changes __mmap_switched to use memset when clearing the BSS.
> > We use the __pi_memset alias so as to avoid any instrumentation in all
> > kernel configurations. As with the head symbols, we must get the linker
> > to generate __bss_size, as there is no ELF relocation for the
> > subtraction of two symbols.
> >
> > Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> > Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> > Cc: Catalin Marinas <catalin.marinas@arm.com>
> > Cc: Marc Zyngier <marc.zyngier@arm.com>
> > Cc: Will Deacon <will.deacon@arm.com>
> > ---
> >  arch/arm64/kernel/head.S  | 14 ++++++--------
> >  arch/arm64/kernel/image.h |  2 ++
> >  2 files changed, 8 insertions(+), 8 deletions(-)
> >
> > diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
> > index 23cfc08..247a97b 100644
> > --- a/arch/arm64/kernel/head.S
> > +++ b/arch/arm64/kernel/head.S
> > @@ -415,14 +415,12 @@ ENDPROC(__create_page_tables)
> >   */
> >         .set    initial_sp, init_thread_union + THREAD_START_SP
> >  __mmap_switched:
> > -       adr_l   x6, __bss_start
> > -       adr_l   x7, __bss_stop
> > -
> > -1:     cmp     x6, x7
> > -       b.hs    2f
> > -       str     xzr, [x6], #8                   // Clear BSS
> > -       b       1b
> > -2:
> > +       // clear BSS
> > +       adr_l   x0, __bss_start
> > +       mov     x1, xzr
> > +       mov_l   x2, __bss_size
> 
> Is it such a big deal to do
> 
> adr_l x2, __bss_stop
> sub x2, x2, x0
> 
> instead?

I'm happy either way.

It no-one else has a use for mov_l I'll drop it and move to that.

> Either way:
> Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>

Thanks!

Mark.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/2] arm64: add macro to handle large immediates
  2016-01-06 11:15 ` [PATCH 1/2] arm64: add macro to handle large immediates Ard Biesheuvel
@ 2016-01-06 12:21   ` Mark Rutland
  2016-01-06 12:26     ` Ard Biesheuvel
  0 siblings, 1 reply; 9+ messages in thread
From: Mark Rutland @ 2016-01-06 12:21 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Jan 06, 2016 at 12:15:14PM +0100, Ard Biesheuvel wrote:
> On 6 January 2016 at 12:05, Mark Rutland <mark.rutland@arm.com> wrote:
> > Sometimes we want to be able to load values greater than 0xff into a
> > register, without placing said values in a literal pool. Arranging for
> > the value to be split up across a number of movz and movk instructions
> > is tedious and error-prone.
> >
> > Following the example of {adr,str,ldr}_l, this patch adds a new mov_l
> > macro which can be used to load immediate values of up to 64 bits into a
> > register.
> >
> > Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> > Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> > Cc: Catalin Marinas <catalin.marinas@arm.com>
> > Cc: Marc Zyngier <marc.zyngier@arm.com>
> > Cc: Will Deacon <will.deacon@arm.com>
> > ---
> >  arch/arm64/include/asm/assembler.h | 13 +++++++++++++
> >  1 file changed, 13 insertions(+)
> >
> > diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
> > index 12eff92..64fd0a2 100644
> > --- a/arch/arm64/include/asm/assembler.h
> > +++ b/arch/arm64/include/asm/assembler.h
> > @@ -193,6 +193,19 @@ lr .req    x30             // link register
> >         str     \src, [\tmp, :lo12:\sym]
> >         .endm
> >
> > +       /*
> > +        * Move a large immediate up to 64-bits.
> > +        *
> > +        * @dst: destination register (64 bit wide)
> > +        * @val: value
> > +        */
> > +       .macro  mov_l, dst, val
> > +       movz    \dst, :abs_g0_nc:\val
> > +       movk    \dst, :abs_g1_nc:\val
> > +       movk    \dst, :abs_g2_nc:\val
> > +       movk    \dst, :abs_g3:\val
> > +       .endm
> > +
> 
> Ack for the general idea, but for correctness, you should pair the
> movk instructions with the _nc relocations (i.e., keep movz first, but
> invert the order of the relocs)

Ah, I hadn't spotted the restriction. I'll change that to:

	movz	\dst, :abs_g3:\val
	movk	\dst, :abs_g2:\val
	movk	\dst, :abs_g1:\val
	movk	\dst, :abs_g0:\val

That raises a related question. Is it the linker's responsibility to
fill in the shift encoding in the hw field as part of the g{3,2,1}
relocs?

Mine seems to, but I don't know if that's strictly required or correct
as the AArrch64 ELF spec only mentions the immediate field for *ABS_G*,
and the shift is encoded in hw rather than imm16.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/2] arm64: add macro to handle large immediates
  2016-01-06 12:21   ` Mark Rutland
@ 2016-01-06 12:26     ` Ard Biesheuvel
  2016-01-06 12:37       ` Mark Rutland
  0 siblings, 1 reply; 9+ messages in thread
From: Ard Biesheuvel @ 2016-01-06 12:26 UTC (permalink / raw)
  To: linux-arm-kernel

On 6 January 2016 at 13:21, Mark Rutland <mark.rutland@arm.com> wrote:
> On Wed, Jan 06, 2016 at 12:15:14PM +0100, Ard Biesheuvel wrote:
>> On 6 January 2016 at 12:05, Mark Rutland <mark.rutland@arm.com> wrote:
>> > Sometimes we want to be able to load values greater than 0xff into a
>> > register, without placing said values in a literal pool. Arranging for
>> > the value to be split up across a number of movz and movk instructions
>> > is tedious and error-prone.
>> >
>> > Following the example of {adr,str,ldr}_l, this patch adds a new mov_l
>> > macro which can be used to load immediate values of up to 64 bits into a
>> > register.
>> >
>> > Signed-off-by: Mark Rutland <mark.rutland@arm.com>
>> > Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>> > Cc: Catalin Marinas <catalin.marinas@arm.com>
>> > Cc: Marc Zyngier <marc.zyngier@arm.com>
>> > Cc: Will Deacon <will.deacon@arm.com>
>> > ---
>> >  arch/arm64/include/asm/assembler.h | 13 +++++++++++++
>> >  1 file changed, 13 insertions(+)
>> >
>> > diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
>> > index 12eff92..64fd0a2 100644
>> > --- a/arch/arm64/include/asm/assembler.h
>> > +++ b/arch/arm64/include/asm/assembler.h
>> > @@ -193,6 +193,19 @@ lr .req    x30             // link register
>> >         str     \src, [\tmp, :lo12:\sym]
>> >         .endm
>> >
>> > +       /*
>> > +        * Move a large immediate up to 64-bits.
>> > +        *
>> > +        * @dst: destination register (64 bit wide)
>> > +        * @val: value
>> > +        */
>> > +       .macro  mov_l, dst, val
>> > +       movz    \dst, :abs_g0_nc:\val
>> > +       movk    \dst, :abs_g1_nc:\val
>> > +       movk    \dst, :abs_g2_nc:\val
>> > +       movk    \dst, :abs_g3:\val
>> > +       .endm
>> > +
>>
>> Ack for the general idea, but for correctness, you should pair the
>> movk instructions with the _nc relocations (i.e., keep movz first, but
>> invert the order of the relocs)
>
> Ah, I hadn't spotted the restriction. I'll change that to:
>
>         movz    \dst, :abs_g3:\val
>         movk    \dst, :abs_g2:\val
>         movk    \dst, :abs_g1:\val
>         movk    \dst, :abs_g0:\val
>

Yes, but with the _nc suffix on the latter three.

> That raises a related question. Is it the linker's responsibility to
> fill in the shift encoding in the hw field as part of the g{3,2,1}
> relocs?
>

This

movz x0, :abs_g3:val
movk x0, :abs_g2_nc:val
movk x0, :abs_g1_nc:val
movk x0, :abs_g0_nc:val

assembles to

0000000000000000 <.text>:
   0: d2e00000 movz x0, #0x0, lsl #48
   4: f2c00000 movk x0, #0x0, lsl #32
   8: f2a00000 movk x0, #0x0, lsl #16
   c: f2800000 movk x0, #0x0

so it is in fact the assembler that sets the hw field.

> Mine seems to, but I don't know if that's strictly required or correct
> as the AArrch64 ELF spec only mentions the immediate field for *ABS_G*,
> and the shift is encoded in hw rather than imm16.
>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 2/2] arm64: use memset to clear BSS
  2016-01-06 11:40     ` Mark Rutland
@ 2016-01-06 12:34       ` Mark Rutland
  0 siblings, 0 replies; 9+ messages in thread
From: Mark Rutland @ 2016-01-06 12:34 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Jan 06, 2016 at 11:40:39AM +0000, Mark Rutland wrote:
> On Wed, Jan 06, 2016 at 12:12:45PM +0100, Ard Biesheuvel wrote:
> > On 6 January 2016 at 12:05, Mark Rutland <mark.rutland@arm.com> wrote:
> > > Currently we use an open-coded memzero to clear the BSS. As it is a
> > > trivial implementation, it is sub-optimal.
> > >
> > > Our optimised memset doesn't use the stack, is position-independent, and
> > > for the memzero case can use of DC ZVA to clear large blocks
> > > efficiently. In __mmap_switched the MMU is on and there are no live
> > > caller-saved registers, so we can safely call an uninstrumented memset.
> > >
> > > This patch changes __mmap_switched to use memset when clearing the BSS.
> > > We use the __pi_memset alias so as to avoid any instrumentation in all
> > > kernel configurations. As with the head symbols, we must get the linker
> > > to generate __bss_size, as there is no ELF relocation for the
> > > subtraction of two symbols.
> > >
> > > Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> > > Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> > > Cc: Catalin Marinas <catalin.marinas@arm.com>
> > > Cc: Marc Zyngier <marc.zyngier@arm.com>
> > > Cc: Will Deacon <will.deacon@arm.com>
> > > ---
> > >  arch/arm64/kernel/head.S  | 14 ++++++--------
> > >  arch/arm64/kernel/image.h |  2 ++
> > >  2 files changed, 8 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
> > > index 23cfc08..247a97b 100644
> > > --- a/arch/arm64/kernel/head.S
> > > +++ b/arch/arm64/kernel/head.S
> > > @@ -415,14 +415,12 @@ ENDPROC(__create_page_tables)
> > >   */
> > >         .set    initial_sp, init_thread_union + THREAD_START_SP
> > >  __mmap_switched:
> > > -       adr_l   x6, __bss_start
> > > -       adr_l   x7, __bss_stop
> > > -
> > > -1:     cmp     x6, x7
> > > -       b.hs    2f
> > > -       str     xzr, [x6], #8                   // Clear BSS
> > > -       b       1b
> > > -2:
> > > +       // clear BSS
> > > +       adr_l   x0, __bss_start
> > > +       mov     x1, xzr
> > > +       mov_l   x2, __bss_size
> > 
> > Is it such a big deal to do
> > 
> > adr_l x2, __bss_stop
> > sub x2, x2, x0
> > 
> > instead?
> 
> I'm happy either way.
> 
> It no-one else has a use for mov_l I'll drop it and move to that.

>From a discussion with Will, it sounds like the sub form is preferable,
so I'll drop mov_l for now.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/2] arm64: add macro to handle large immediates
  2016-01-06 12:26     ` Ard Biesheuvel
@ 2016-01-06 12:37       ` Mark Rutland
  0 siblings, 0 replies; 9+ messages in thread
From: Mark Rutland @ 2016-01-06 12:37 UTC (permalink / raw)
  To: linux-arm-kernel

> >> Ack for the general idea, but for correctness, you should pair the
> >> movk instructions with the _nc relocations (i.e., keep movz first, but
> >> invert the order of the relocs)
> >
> > Ah, I hadn't spotted the restriction. I'll change that to:
> >
> >         movz    \dst, :abs_g3:\val
> >         movk    \dst, :abs_g2:\val
> >         movk    \dst, :abs_g1:\val
> >         movk    \dst, :abs_g0:\val
> >
> 
> Yes, but with the _nc suffix on the latter three.

Yup.

> > That raises a related question. Is it the linker's responsibility to
> > fill in the shift encoding in the hw field as part of the g{3,2,1}
> > relocs?
> >
> 
> This
> 
> movz x0, :abs_g3:val
> movk x0, :abs_g2_nc:val
> movk x0, :abs_g1_nc:val
> movk x0, :abs_g0_nc:val
> 
> assembles to
> 
> 0000000000000000 <.text>:
>    0: d2e00000 movz x0, #0x0, lsl #48
>    4: f2c00000 movk x0, #0x0, lsl #32
>    8: f2a00000 movk x0, #0x0, lsl #16
>    c: f2800000 movk x0, #0x0
> 
> so it is in fact the assembler that sets the hw field.

Interesting!

As I mentioned in another reply, for the moment I'm going to drop mov_l
unless we have another need for it.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2016-01-06 12:37 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-01-06 11:05 [PATCH 1/2] arm64: add macro to handle large immediates Mark Rutland
2016-01-06 11:05 ` [PATCH 2/2] arm64: use memset to clear BSS Mark Rutland
2016-01-06 11:12   ` Ard Biesheuvel
2016-01-06 11:40     ` Mark Rutland
2016-01-06 12:34       ` Mark Rutland
2016-01-06 11:15 ` [PATCH 1/2] arm64: add macro to handle large immediates Ard Biesheuvel
2016-01-06 12:21   ` Mark Rutland
2016-01-06 12:26     ` Ard Biesheuvel
2016-01-06 12:37       ` Mark Rutland

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).