From: Bill Pringlemeir <bpringlemeir@nbsps.com>
To: u-boot@lists.denx.de
Subject: [U-Boot] [PATCH 2/3] arm: relocation: clear .bss section with arch memset if defined
Date: Mon, 02 Feb 2015 12:04:37 -0500
Message-ID: <87wq40qm6y.fsf@nbsps.com>
In-Reply-To: Bill Pringlemeir's message of "Mon\, 02 Feb 2015 12\:01\:16 -0500"
On 2 Feb 2015, bpringlemeir at nbsps.com wrote:
> On 31 Jan 2015, albert.u.boot at aribaud.net wrote:
>
>> Hello Przemyslaw,
>>
>> On Wed, 28 Jan 2015 13:55:42 +0100, Przemyslaw Marczak
>> <p.marczak@samsung.com> wrote:
>>> For the ARM architecture, enabling CONFIG_USE_ARCH_MEMSET/MEMCPY
>>> greatly increases memset/memcpy performance. This is possible
>>> thanks to the ARM multiple-register instructions.
>>>
>>> Unfortunately, the relocation is done without the cache enabled,
>>> so it takes some time, but zeroing the BSS memory takes much
>>> longer, especially for configs with big static buffers.
>>>
>>> A quick test confirms that using the arch memcpy for relocation
>>> gives no significant boot time improvement.
>>> The same test confirms that enabling the arch memset for zeroing
>>> the BSS reduces the boot time.
>>>
>>> So this patch enables the arch memset for zeroing the BSS after
>>> the relocation process. For ARM boards, this can be enabled
>>> in board configs by defining: 'CONFIG_USE_ARCH_MEMSET'.
>
>> Since the issue is that zeroing is done one word at a time, could we
>> not simply clear r3 as well as r2 (possibly even r4 and r5 too) and
>> do a double (possibly quadruple) write loop? That would avoid calling
>> a libc routine from almost the only file in U-Boot where a C
>> environment is not necessarily guaranteed.
I thought the same thing...
> diff --git a/arch/arm/lib/crt0.S b/arch/arm/lib/crt0.S
> index 22df3e5..fab3d2c 100644
> --- a/arch/arm/lib/crt0.S
> +++ b/arch/arm/lib/crt0.S
> @@ -115,14 +115,22 @@ here:
> bl c_runtime_cpu_setup /* we still call old routine here */
>
> ldr r0, =__bss_start /* this is auto-relocated! */
> - ldr r1, =__bss_end /* this is auto-relocated! */
>
> +#ifdef CONFIG_USE_ARCH_MEMSET
> + ldr r3, =__bss_end /* this is auto-relocated! */
> + mov r1, #0x00000000 /* prepare zero to clear BSS */
> +
> + subs r2, r3, r0 /* r2 = memset len */
> + bl memset
> +#else
> + ldr r1, =__bss_end /* this is auto-relocated! */
> mov r2, #0x00000000 /* prepare zero to clear BSS */
>
> clbss_l:cmp r0, r1 /* while not at end of BSS */
> strlo r2, [r0] /* clear 32-bit BSS word */
> addlo r0, r0, #4 /* move to next */
> blo clbss_l
> +#endif
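For readers less used to ARM assembly, the two branches of the quoted patch can be modelled roughly in C as below. This is only a sketch: the function names are made up for illustration, and the real code runs in crt0.S, where a full C environment is not necessarily available.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the #else branch: one 32-bit store per iteration,
 * like the cmp/strlo/addlo/blo loop at clbss_l. */
void clear_bss_words(uint32_t *start, uint32_t *end)
{
    assert(start <= end);
    while (start < end)      /* while not at end of BSS */
        *start++ = 0;        /* clear one 32-bit word, move to next */
}

/* Hypothetical model of the CONFIG_USE_ARCH_MEMSET branch:
 * the length is end - start in bytes, the fill value is zero. */
void clear_bss_memset(uint32_t *start, uint32_t *end)
{
    assert(start <= end);
    memset(start, 0, (size_t)((char *)end - (char *)start));
}
```

Both functions leave the same result in memory; the only difference is how many bytes each store instruction inside the loop can cover.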
This is great news (increased boot speed). Maybe this could be done
without making the file conditional? Assuming the 'BSS' is aligned (LDS enforced),
ldr r0, =__bss_start /* this is auto-relocated! */
ldr r1, =__bss_end /* this is auto-relocated! */
+ mov r2, #0 /* prepare zero to clear BSS */
+ mov r3, #0
+ mov r4, #0
+ mov r5, #0
+ mov r6, #0
+ mov r7, #0
+ mov r8, #0
+ mov lr, #0
+ b 1f
+ .align 4
clbss_l:
+ stmia r0!, {r2-r8,lr} /* clear 32 BSS bytes */
+ stmia r0!, {r2-r8,lr} /* easy to unroll */
+ 1: cmp r0, r1
blo clbss_l
The code should work on all ARM versions? Then every platform would
benefit. I think the larger portion of the 'ARCH memset' is there to
handle alignment issues. Sometimes the head/tail portion can be handled
efficiently with 'strd', etc., which are only available on some CPUs. It
is easy to set up the BSS so that both the head and tail are aligned...
but I think the above code only requires the total BSS size to be a
multiple of 32 bytes (it is easy to jump into the middle of an unrolled
loop with 'add pc, rn<<2, #constant'). The size per iteration can be
easily tweaked (8/16/32 bytes). At least in principle, *if* there is some
size alignment on the BSS, it is fairly easy to write some generic ARM
code to quickly clear the BSS that will be just as competitive as any
ARCH memset version. The above code adds about 13 words of code.
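
To make the size requirement concrete, here is a rough C model of a single 8-register store per iteration. It is a sketch only: the names are hypothetical, and it assumes the linker script pads the BSS to a whole number of 32-byte blocks, which the assert checks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WORDS_PER_BLOCK 8u   /* models one stmia of {r2-r8,lr}: 8 words, 32 bytes */

/* Hypothetical helper: nonzero when the region is a whole number of blocks. */
int bss_is_block_multiple(const uint32_t *start, const uint32_t *end)
{
    return ((size_t)(end - start) % WORDS_PER_BLOCK) == 0;
}

/* Hypothetical model of the block-wise clear. Without the size assumption
 * the last block would write past 'end', so it is checked up front. */
void clear_bss_blocks(uint32_t *start, uint32_t *end)
{
    assert(bss_is_block_multiple(start, end));
    while (start < end) {                        /* 1: cmp r0, r1 / blo clbss_l */
        for (unsigned i = 0; i < WORDS_PER_BLOCK; i++)
            start[i] = 0;                        /* one multi-register store */
        start += WORDS_PER_BLOCK;                /* the r0! write-back */
    }
}
```

With two stores per iteration, as in the assembly above, the same reasoning applies with a 64-byte step unless the loop is entered in the middle.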
Fwiw,
Bill Pringlemeir.