From: Przemyslaw Marczak
Date: Fri, 13 Feb 2015 17:23:31 +0100
Subject: [U-Boot] [PATCH 2/3] arm: relocation: clear .bss section with arch memset if defined
In-Reply-To: <20150212153715.GD7086@bill-the-cat>
References: <1422449743-10119-1-git-send-email-p.marczak@samsung.com>
 <1422449743-10119-3-git-send-email-p.marczak@samsung.com>
 <20150201033842.130a86ac@lilith>
 <20150212153715.GD7086@bill-the-cat>
Message-ID: <54DE2503.6020200@samsung.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: u-boot@lists.denx.de

Hello Tom,

On 02/12/2015 04:37 PM, Tom Rini wrote:
> On Sun, Feb 01, 2015 at 03:38:42AM +0100, Albert ARIBAUD wrote:
>> Hello Przemyslaw,
>>
>> On Wed, 28 Jan 2015 13:55:42 +0100, Przemyslaw Marczak
>> wrote:
>>> For the ARM architecture, enabling CONFIG_USE_ARCH_MEMSET/MEMCPY
>>> greatly increases memset/memcpy performance. This is possible
>>> thanks to the ARM multiple-register instructions.
>>>
>>> Unfortunately, relocation is done without the cache enabled,
>>> so it takes some time, but zeroing the BSS memory takes much
>>> longer, especially for configs with big static buffers.
>>>
>>> A quick test shows that using the arch memcpy for relocation
>>> gives no significant boot time improvement. The same test shows
>>> that enabling the arch memset for zeroing the BSS does reduce
>>> the boot time.
>>>
>>> So this patch enables the arch memset for zeroing the BSS after
>>> the relocation process. For ARM boards, this can be enabled
>>> in board configs by defining 'CONFIG_USE_ARCH_MEMSET'.
>>
>> Since the issue is that zeroing is done one word at a time, could we
>> not simply clear r3 as well as r2 (possibly even r4 and r5 too) and do
>> a double (possibly quadruple) write loop? That would avoid calling a
>> libc routine from almost the only file in U-Boot where a C environment
>> is not necessarily guaranteed.
>
> I want to jump in here again. Note that the arch memset/memcpy routines
> are in asm and, I believe, don't require a C environment. Why don't we
> simply use the asm versions for everyone and backport whatever we need
> from the kernel to re-sync, since it's not optional there and it's a
> performance win too?
>

Right, on ARM these routines don't require a C environment. But if we
can still get some improvement in this place, then maybe it makes sense
to add some new code just for the BSS. I will try to combine the
approaches and run some timing tests on Monday.

Best regards,
--
Przemyslaw Marczak
Samsung R&D Institute Poland
Samsung Electronics
p.marczak at samsung.com
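For readers following the thread, Albert's suggestion (keep zero in more registers and store several words per loop iteration instead of one) can be modelled with a small C sketch. This is not U-Boot code: the existing BSS-clearing loop is ARM assembly using r2 as the zero register, and the names clear_bss_words and clear_bss_quad are hypothetical. The sketch assumes a word-aligned BSS and simply unrolls the store loop four times, with a tail loop for leftover words.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Baseline: one 32-bit zero store per iteration, mirroring the
 * word-at-a-time loop described in the thread.
 */
static void clear_bss_words(uint32_t *start, uint32_t *end)
{
	while (start < end)
		*start++ = 0;
}

/*
 * Hypothetical unrolled variant: four zero stores per iteration, the
 * C analogue of keeping zero in r2..r5 and storing them together.
 * A compiler may merge the four stores into one STM on ARM, but that
 * is not guaranteed. Word alignment of [start, end) is assumed, which
 * the uint32_t pointer type already implies.
 */
static void clear_bss_quad(uint32_t *start, uint32_t *end)
{
	assert(start <= end);

	/* Main loop: four words (16 bytes) per iteration. */
	while (end - start >= 4) {
		start[0] = 0;
		start[1] = 0;
		start[2] = 0;
		start[3] = 0;
		start += 4;
	}

	/* Tail: at most three leftover words. */
	while (start < end)
		*start++ = 0;
}
```

As a caveat, an optimizing compiler such as GCC may recognise either loop as a memset idiom and emit a library call instead, so a C version only models the register-level change; the actual gain with caches off is what the planned timing tests would have to show.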