From mboxrd@z Thu Jan 1 00:00:00 1970
From: Heiko Stuebner
Date: Thu, 30 Mar 2017 13:14:02 +0200
Subject: [U-Boot] [PATCH v2] string: Provide a slimmed-down memset()
In-Reply-To: <20170326233817.8834-3-sjg@chromium.org>
References: <20170326233817.8834-1-sjg@chromium.org> <20170326233817.8834-3-sjg@chromium.org>
Message-ID: <4143670.kux4ZBcVD6@phil>
List-Id:
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: u-boot@lists.denx.de

Most of the time the optimised memset() is what we want. For extreme
situations such as TPL it may be too large. For example on the 'rock'
board, using a simple loop saves a useful 48 bytes. With gcc 4.9 and
the rodata bug, this patch is enough to reduce the TPL image below the
limit.

Signed-off-by: Simon Glass
Signed-off-by: Heiko Stuebner
---

Hi Simon,

a bit bikesheddy, but might it make more sense to structure the options
like below? That way it matches USE_ARCH_MEMSET and might make the intent
visible better, as you get

USE_ARCH_MEMSET=y = biggest but also fastest
(nothing)         = default from libgeneric
USE_TINY_MEMSET=y = optimize for size over speed

Also might make reading defconfigs easier as you would have
	CONFIG_USE_TINY_MEMSET=y
instead of
	# CONFIG_FAST_MEMSET is not set
when needing that option.

Anyway, I've tested both variants on a live rk3188-rock now and everything
of course still works, even when built with gcc-4.9, so both variants also
Tested-by: Heiko Stuebner

Heiko

 lib/Kconfig  | 20 ++++++++++++++++++++
 lib/string.c |  5 ++++-
 2 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/lib/Kconfig b/lib/Kconfig
index 65c01573e1..ab42413839 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -52,6 +52,26 @@ config LIB_RAND
 	help
 	  This library provides pseudo-random number generator functions.
 
+config USE_TINY_MEMSET
+	bool "Use a size-optimized memset()"
+	help
+	  This makes memset prefer code size over speed optimizations.
+	  The fastest memset() is the arch-specific one (if available) enabled
+	  by CONFIG_USE_ARCH_MEMSET. If that is not enabled, we can still get
+	  better performance by writing a word at a time at the cost of
+	  slightly bigger memset code, but in some special cases size might
+	  be more important than speed.
+
+config SPL_USE_TINY_MEMSET
+	bool "Use a size-optimized memset()"
+	help
+	  This makes memset prefer code size over speed optimizations.
+	  The fastest memset() is the arch-specific one (if available) enabled
+	  by CONFIG_USE_ARCH_MEMSET. If that is not enabled, we can still get
+	  better performance by writing a word at a time at the cost of
+	  slightly bigger memset code, but in some special cases size might
+	  be more important than speed.
+
 source lib/dhry/Kconfig
 
 source lib/rsa/Kconfig
diff --git a/lib/string.c b/lib/string.c
index 67d5f6a421..edae997fa6 100644
--- a/lib/string.c
+++ b/lib/string.c
@@ -437,8 +437,10 @@ char *strswab(const char *s)
 void * memset(void * s,int c,size_t count)
 {
 	unsigned long *sl = (unsigned long *) s;
-	unsigned long cl = 0;
 	char *s8;
+
+#if !CONFIG_IS_ENABLED(USE_TINY_MEMSET)
+	unsigned long cl = 0;
 	int i;
 
 	/* do it one word at a time (32 bits or 64 bits) while possible */
@@ -452,6 +454,7 @@ void * memset(void * s,int c,size_t count)
 			count -= sizeof(*sl);
 		}
 	}
+#endif
 	/* fill 8 bits at a time */
 	s8 = (char *)sl;
 	while (count--)
-- 
2.11.0
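
For anyone reading the lib/string.c hunks without the full file at hand, here is a rough standalone sketch of what the function might look like with the guard in place. It is only an illustration: SKETCH_TINY_MEMSET stands in for CONFIG_IS_ENABLED(USE_TINY_MEMSET), sketch_memset is a made-up name, and the fast-path lines that fall between the two hunks (the alignment check and the loop that builds the fill word) are my assumption of a typical word-at-a-time fill, not a copy of the U-Boot source.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for CONFIG_IS_ENABLED(USE_TINY_MEMSET). */
#ifndef SKETCH_TINY_MEMSET
#define SKETCH_TINY_MEMSET 0
#endif

/* Illustrative name only; this is not the U-Boot symbol. */
void *sketch_memset(void *s, int c, size_t count)
{
	unsigned long *sl = (unsigned long *)s;
	char *s8;

#if !SKETCH_TINY_MEMSET
	unsigned long cl = 0;
	size_t i;

	/* Assumed fast path: only taken when s is word-aligned. */
	if (((uintptr_t)s & (sizeof(*sl) - 1)) == 0) {
		/* Replicate the fill byte into every byte of one word. */
		for (i = 0; i < sizeof(*sl); i++) {
			cl <<= 8;
			cl |= c & 0xff;
		}
		/* Store whole words while at least one word remains. */
		while (count >= sizeof(*sl)) {
			*sl++ = cl;
			count -= sizeof(*sl);
		}
	}
#endif
	/*
	 * Byte loop: the entire fill in the tiny build, and only the
	 * unaligned or trailing bytes otherwise.
	 */
	s8 = (char *)sl;
	while (count--)
		*s8++ = c;

	return s;
}
```

Building with -DSKETCH_TINY_MEMSET=1 drops the word loop entirely, which is where the size saving would come from; the result should be the same either way, just slower for large aligned buffers.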