From mboxrd@z Thu Jan 1 00:00:00 1970
From: Przemyslaw Marczak
Date: Fri, 13 Feb 2015 17:23:31 +0100
Subject: [U-Boot] [PATCH 2/3] arm: relocation: clear .bss section with
arch memset if defined
In-Reply-To: <20150212153715.GD7086@bill-the-cat>
References: <1422449743-10119-1-git-send-email-p.marczak@samsung.com>
<1422449743-10119-3-git-send-email-p.marczak@samsung.com>
<20150201033842.130a86ac@lilith> <20150212153715.GD7086@bill-the-cat>
Message-ID: <54DE2503.6020200@samsung.com>
List-Id:
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: u-boot@lists.denx.de
Hello Tom,
On 02/12/2015 04:37 PM, Tom Rini wrote:
> On Sun, Feb 01, 2015 at 03:38:42AM +0100, Albert ARIBAUD wrote:
>> Hello Przemyslaw,
>>
>> On Wed, 28 Jan 2015 13:55:42 +0100, Przemyslaw Marczak
>> wrote:
>>> For the ARM architecture, enabling CONFIG_USE_ARCH_MEMSET/MEMCPY
>>> greatly increases memset/memcpy performance. This is possible
>>> thanks to the ARM multiple-register instructions.
>>>
>>> Unfortunately the relocation is done without the cache enabled,
>>> so it takes some time, but zeroing the BSS memory takes much
>>> longer, especially for configs with big static buffers.
>>>
>>> A quick test confirms that using the arch memcpy for relocation
>>> gives no significant boot time improvement.
>>> The same test confirms that using the arch memset for zeroing the
>>> BSS reduces the boot time.
>>>
>>> So this patch enables the arch memset for zeroing the BSS after
>>> the relocation process. For ARM boards, this can be enabled
>>> in board configs by defining: 'CONFIG_USE_ARCH_MEMSET'.
>>
>> Since the issue is that zeroing is done one word at a time, could we
>> not simply clear r3 as well as r2 (possibly even r4 and r5 too) and do
>> a double (possibly quadruple) write loop? That would avoid calling a
>> libc routine from the almost sole file in U-Boot where a C environment
>> is not necessarily granted.
>
> I want to jump up here again. Note that the arch memset/memcpy routines
> are in asm and I don't believe they require a C environment. Why don't we
> simply use the asm versions for everyone and backport whatever we need
> from the kernel to re-sync there as it's not a choice there and it's a
> performance win too?
>
Right, on ARM the arch memset/memcpy routines don't require a C
environment. But if we can get some improvement in this place, then maybe
it makes sense to add some new code just for clearing the bss.

I will try to put something together and run some timing tests on Monday.
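
The two variants under discussion can be sketched in C roughly as below.
This is only an illustration of the word-at-a-time loop versus the
quadruple-write loop suggested above, not the actual relocation code,
which is assembly. The function names clear_bss_word and clear_bss_quad
are made up for this example, and the sketch assumes the bss start and end
addresses are word-aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: clear [start, end) one 32-bit word per iteration. */
static void clear_bss_word(uint32_t *start, uint32_t *end)
{
	assert(start <= end);
	while (start < end)
		*start++ = 0;
}

/*
 * Hypothetical helper: clear [start, end) four words per iteration,
 * then finish any remaining one to three words with single stores.
 */
static void clear_bss_quad(uint32_t *start, uint32_t *end)
{
	assert(start <= end);
	while ((size_t)(end - start) >= 4) {
		start[0] = 0;
		start[1] = 0;
		start[2] = 0;
		start[3] = 0;
		start += 4;
	}
	while (start < end)
		*start++ = 0;
}
```

In assembly the four stores of the second loop would roughly correspond to
one multiple-register store, which is where any gain would come from.
Whether that gain is measurable with the cache disabled is what the timing
tests need to show.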
Best regards,
--
Przemyslaw Marczak
Samsung R&D Institute Poland
Samsung Electronics
p.marczak at samsung.com