public inbox for u-boot@lists.denx.de
 help / color / mirror / Atom feed
* [U-Boot] [PATCH 0/3] make memcpy and memset 32-bit copies
@ 2009-10-07  8:44 Alessandro Rubini
  2009-10-07  8:44 ` [U-Boot] [PATCH 1/3] memcpy: use 32-bit copies if possible Alessandro Rubini
                   ` (3 more replies)
  0 siblings, 4 replies; 19+ messages in thread
From: Alessandro Rubini @ 2009-10-07  8:44 UTC (permalink / raw)
  To: u-boot

I've added 32-bit lcd to the Nomadik (not submitted yet), and I found
the scroll to be very slow, as the screen is big.

Instead of activating the "if 0" stanza for 32-bit scroll in lcd.c,
I'd better have a faster memcpy/memset globally. So this patch set
adds 32-bit memcpy and memset and removes the "#if 0" part in the
scroll function. For me it's 4 times faster ("help" from 14s to 3.5s).

I agree I should use 8-bits in u-boot, but the speedup of 32bit
memcpy/memset is there regardless, as most users are 32-bit aligned
anyways.

Alessandro Rubini (3):
  memcpy: use 32-bit copies if possible
  memset: use 32-bit copies if possible
  lcd: remove '#if 0' 32-bit scroll, now memcpy does it

 common/lcd.c         |   21 ---------------------
 lib_generic/string.c |   27 +++++++++++++++++++++++++--
 2 files changed, 25 insertions(+), 23 deletions(-)

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [U-Boot] [PATCH 1/3] memcpy: use 32-bit copies if possible
  2009-10-07  8:44 [U-Boot] [PATCH 0/3] make memcpy and memset 32-bit copies Alessandro Rubini
@ 2009-10-07  8:44 ` Alessandro Rubini
  2009-10-07  8:52   ` Mike Frysinger
  2009-10-07 14:35   ` Peter Tyser
  2009-10-07  8:44 ` [U-Boot] [PATCH 2/3] memset: " Alessandro Rubini
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 19+ messages in thread
From: Alessandro Rubini @ 2009-10-07  8:44 UTC (permalink / raw)
  To: u-boot

From: Alessandro Rubini <rubini@unipv.it>

Signed-off-by: Alessandro Rubini <rubini@unipv.it>
Acked-by: Andrea Gallo <andrea.gallo@stericsson.com>
---
 lib_generic/string.c |   11 ++++++++++-
 1 files changed, 10 insertions(+), 1 deletions(-)

diff --git a/lib_generic/string.c b/lib_generic/string.c
index 181eda6..fdccab6 100644
--- a/lib_generic/string.c
+++ b/lib_generic/string.c
@@ -449,7 +449,16 @@ char * bcopy(const char * src, char * dest, int count)
 void * memcpy(void * dest,const void *src,size_t count)
 {
 	char *tmp = (char *) dest, *s = (char *) src;
+	u32 *d32 = (u32 *)dest, *s32 = (u32 *) src;
 
+	/* if both are aligned, use 32-bit copy */
+	if ( (((int)dest & 3) | ((int)src & 3) | (count & 3)) == 0 ) {
+		count /= 4;
+		while (count--)
+			*d32++ = *s32++;
+		return dest;
+	}
+	/* else, use 1-byte copy */
 	while (count--)
 		*tmp++ = *s++;
 
 -- 
1.5.6.5

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [U-Boot] [PATCH 2/3] memset: use 32-bit copies if possible
  2009-10-07  8:44 [U-Boot] [PATCH 0/3] make memcpy and memset 32-bit copies Alessandro Rubini
  2009-10-07  8:44 ` [U-Boot] [PATCH 1/3] memcpy: use 32-bit copies if possible Alessandro Rubini
@ 2009-10-07  8:44 ` Alessandro Rubini
  2009-10-07  9:27   ` Wolfgang Denk
  2009-10-07  8:44 ` [U-Boot] [PATCH 3/3] lcd: remove '#if 0' 32-bit scroll, now memcpy does it Alessandro Rubini
       [not found] ` <b03f28d4c4afa0c5c9f2421b7d260a0f3890cdcd.1254904388.git.rubini@unipv.it>
  3 siblings, 1 reply; 19+ messages in thread
From: Alessandro Rubini @ 2009-10-07  8:44 UTC (permalink / raw)
  To: u-boot

From: Alessandro Rubini <rubini@unipv.it>

Signed-off-by: Alessandro Rubini <rubini@unipv.it>
Acked-by: Andrea Gallo <andrea.gallo@stericsson.com>
---
 lib_generic/string.c |   16 +++++++++++++++-
 1 files changed, 15 insertions(+), 1 deletions(-)

diff --git a/lib_generic/string.c b/lib_generic/string.c
index fdccab6..68e0255 100644
--- a/lib_generic/string.c
+++ b/lib_generic/string.c
@@ -404,7 +404,21 @@ char *strswab(const char *s)
 void * memset(void * s,int c,size_t count)
 {
 	char *xs = (char *) s;
+	u32 *s32 = (u32 *) s;
+	int c32 = 0; /* most common case */
 
+	/* do it 32 bits at a time if possible */
+	if ( ((count & 3) | ((int)s & 3)) == 0) {
+		count /= 4;
+		if (c) { /* not 0: build 32-bit value */
+			c32 = c | (c<<8);
+			c32 |= c32 << 16;
+		}
+		while (count--)
+			*s32++ = c32;
+		return s;
+	}
+	/* else, fill 8 bits@a time */
 	while (count--)
 		*xs++ = c;
 
-- 
1.5.6.5

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [U-Boot] [PATCH 3/3] lcd: remove '#if 0' 32-bit scroll, now memcpy does it
  2009-10-07  8:44 [U-Boot] [PATCH 0/3] make memcpy and memset 32-bit copies Alessandro Rubini
  2009-10-07  8:44 ` [U-Boot] [PATCH 1/3] memcpy: use 32-bit copies if possible Alessandro Rubini
  2009-10-07  8:44 ` [U-Boot] [PATCH 2/3] memset: " Alessandro Rubini
@ 2009-10-07  8:44 ` Alessandro Rubini
       [not found] ` <b03f28d4c4afa0c5c9f2421b7d260a0f3890cdcd.1254904388.git.rubini@unipv.it>
  3 siblings, 0 replies; 19+ messages in thread
From: Alessandro Rubini @ 2009-10-07  8:44 UTC (permalink / raw)
  To: u-boot

From: Alessandro Rubini <rubini@unipv.it>

Signed-off-by: Alessandro Rubini <rubini@unipv.it>
Acked-by: Andrea Gallo <andrea.gallo@stericsson.com>
---
 common/lcd.c |   21 ---------------------
 1 files changed, 0 insertions(+), 21 deletions(-)

diff --git a/common/lcd.c b/common/lcd.c
index dc8fea6..4e31618 100644
--- a/common/lcd.c
+++ b/common/lcd.c
@@ -99,32 +99,11 @@ static int lcd_getfgcolor (void);
 
 static void console_scrollup (void)
 {
-#if 1
 	/* Copy up rows ignoring the first one */
 	memcpy (CONSOLE_ROW_FIRST, CONSOLE_ROW_SECOND, CONSOLE_SCROLL_SIZE);
 
 	/* Clear the last one */
 	memset (CONSOLE_ROW_LAST, COLOR_MASK(lcd_color_bg), CONSOLE_ROW_SIZE);
-#else
-	/*
-	 * Poor attempt to optimize speed by moving "long"s.
-	 * But the code is ugly, and not a bit faster :-(
-	 */
-	ulong *t = (ulong *)CONSOLE_ROW_FIRST;
-	ulong *s = (ulong *)CONSOLE_ROW_SECOND;
-	ulong    l = CONSOLE_SCROLL_SIZE / sizeof(ulong);
-	uchar  c = lcd_color_bg & 0xFF;
-	ulong val= (c<<24) | (c<<16) | (c<<8) | c;
-
-	while (l--)
-		*t++ = *s++;
-
-	t = (ulong *)CONSOLE_ROW_LAST;
-	l = CONSOLE_ROW_SIZE / sizeof(ulong);
-
-	while (l-- > 0)
-		*t++ = val;
-#endif
 }
 
 /*----------------------------------------------------------------------*/
-- 
1.5.6.5

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [U-Boot] [PATCH 1/3] memcpy: use 32-bit copies if possible
  2009-10-07  8:44 ` [U-Boot] [PATCH 1/3] memcpy: use 32-bit copies if possible Alessandro Rubini
@ 2009-10-07  8:52   ` Mike Frysinger
  2009-10-07  8:59     ` Alessandro Rubini
                       ` (2 more replies)
  2009-10-07 14:35   ` Peter Tyser
  1 sibling, 3 replies; 19+ messages in thread
From: Mike Frysinger @ 2009-10-07  8:52 UTC (permalink / raw)
  To: u-boot

On Wednesday 07 October 2009 04:44:26 Alessandro Rubini wrote:
> --- a/lib_generic/string.c
> +++ b/lib_generic/string.c
> @@ -449,7 +449,16 @@ char * bcopy(const char * src, char * dest, int count)
>  void * memcpy(void * dest,const void *src,size_t count)
>  {
>  	char *tmp = (char *) dest, *s = (char *) src;
> +	u32 *d32 = (u32 *)dest, *s32 = (u32 *) src;
> 
> +	/* if both are aligned, use 32-bit copy */
> +	if ( (((int)dest & 3) | ((int)src & 3) | (count & 3)) == 0 ) {

while 64bit isnt in today, might as well avoid unclean code from the start 
when possible.  in other words, used "unsigned int" rather than "u32" and cast 
to "unsigned long" rather than "int".

> +		count /= 4;

count >>= 2 ?  although gcc probably optimizes this properly.
-mike
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 836 bytes
Desc: This is a digitally signed message part.
Url : http://lists.denx.de/pipermail/u-boot/attachments/20091007/05b69532/attachment.pgp 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [U-Boot] [PATCH 1/3] memcpy: use 32-bit copies if possible
  2009-10-07  8:52   ` Mike Frysinger
@ 2009-10-07  8:59     ` Alessandro Rubini
  2009-10-07  9:30     ` Wolfgang Denk
  2009-10-08  7:41     ` Alessandro Rubini
  2 siblings, 0 replies; 19+ messages in thread
From: Alessandro Rubini @ 2009-10-07  8:59 UTC (permalink / raw)
  To: u-boot

> while 64bit isnt in today, might as well avoid unclean code from the
> start when possible.  in other words, used "unsigned int" rather
> than "u32" and cast to "unsigned long" rather than "int".

You are right. Will do. 

>> +		count /= 4;

> count >>= 2 ?  although gcc probably optimizes this properly.

gcc-4.0 for arm already turns it into a shift. I'll use
"sizeof(unsigned long)" in version 2.

Thanks
/alessandro

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [U-Boot] [PATCH 2/3] memset: use 32-bit copies if possible
  2009-10-07  8:44 ` [U-Boot] [PATCH 2/3] memset: " Alessandro Rubini
@ 2009-10-07  9:27   ` Wolfgang Denk
  0 siblings, 0 replies; 19+ messages in thread
From: Wolfgang Denk @ 2009-10-07  9:27 UTC (permalink / raw)
  To: u-boot

Dear Alessandro Rubini,

In message <d10837d7edf963ce9b0e87773c9771e451f9f9ac.1254904388.git.rubini@ unipv.it> you wrote:
> From: Alessandro Rubini <rubini@unipv.it>
> 
> Signed-off-by: Alessandro Rubini <rubini@unipv.it>
> Acked-by: Andrea Gallo <andrea.gallo@stericsson.com>

Please fix the Subject: memset() does not copy data.

>  lib_generic/string.c |   16 +++++++++++++++-
>  1 files changed, 15 insertions(+), 1 deletions(-)
> 
> diff --git a/lib_generic/string.c b/lib_generic/string.c
> index fdccab6..68e0255 100644
> --- a/lib_generic/string.c
> +++ b/lib_generic/string.c
> @@ -404,7 +404,21 @@ char *strswab(const char *s)
>  void * memset(void * s,int c,size_t count)
>  {
>  	char *xs = (char *) s;
> +	u32 *s32 = (u32 *) s;
> +	int c32 = 0; /* most common case */
>  
> +	/* do it 32 bits at a time if possible */
> +	if ( ((count & 3) | ((int)s & 3)) == 0) {
> +		count /= 4;
> +		if (c) { /* not 0: build 32-bit value */
> +			c32 = c | (c<<8);
> +			c32 |= c32 << 16;

I reject this code, as it changes behaviour. You may argument that
it's only for special cases, but anyway:

Fill pattern	for standard memset()	for your code:
c = -2		0xFEFEFEFE		0xFFFFFFFE
c = -3		0xFDFDFDFD		0xFFFFFFFD
...
c = -2147483648	0x00000000		0x80000000

> +		}

I suggest to omit the special case for zero and handle it just the
same. This results in smaller code size and is easier to read.

> +		while (count--)
> +			*s32++ = c32;
> +		return s;
> +	}
> +	/* else, fill 8 bits at a time */

Blank line here, please.

>  	while (count--)
>  		*xs++ = c;

Please fix and resubmit.

Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH,     MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd at denx.de
God made machine language; all the rest is the work of man.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [U-Boot] [PATCH 1/3] memcpy: use 32-bit copies if possible
  2009-10-07  8:52   ` Mike Frysinger
  2009-10-07  8:59     ` Alessandro Rubini
@ 2009-10-07  9:30     ` Wolfgang Denk
  2009-10-08  7:41     ` Alessandro Rubini
  2 siblings, 0 replies; 19+ messages in thread
From: Wolfgang Denk @ 2009-10-07  9:30 UTC (permalink / raw)
  To: u-boot

Dear Mike Frysinger,

In message <200910070452.02225.vapier@gentoo.org> you wrote:
>
> > --- a/lib_generic/string.c
> > +++ b/lib_generic/string.c
> > @@ -449,7 +449,16 @@ char * bcopy(const char * src, char * dest, int count)
> >  void * memcpy(void * dest,const void *src,size_t count)
> >  {
> >  	char *tmp = (char *) dest, *s = (char *) src;
> > +	u32 *d32 = (u32 *)dest, *s32 = (u32 *) src;
> > 
> > +	/* if both are aligned, use 32-bit copy */
> > +	if ( (((int)dest & 3) | ((int)src & 3) | (count & 3)) == 0 ) {
>
> while 64bit isnt in today, might as well avoid unclean code from the start 
> when possible.  in other words, used "unsigned int" rather than "u32" and cast 
> to "unsigned long" rather than "int".
>
> > +		count /= 4;
>
> count >>= 2 ?  although gcc probably optimizes this properly.

Neither one nor the other.

This should become

	count /= sizof(int);

then.

Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH,     MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd at denx.de
When all is said and done, more is said than done.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [U-Boot] [PATCH 1/3] memcpy: use 32-bit copies if possible
  2009-10-07  8:44 ` [U-Boot] [PATCH 1/3] memcpy: use 32-bit copies if possible Alessandro Rubini
  2009-10-07  8:52   ` Mike Frysinger
@ 2009-10-07 14:35   ` Peter Tyser
  2009-10-07 17:04     ` Mike Frysinger
  1 sibling, 1 reply; 19+ messages in thread
From: Peter Tyser @ 2009-10-07 14:35 UTC (permalink / raw)
  To: u-boot

Hi Alessandro,

> --- a/lib_generic/string.c
> +++ b/lib_generic/string.c
> @@ -449,7 +449,16 @@ char * bcopy(const char * src, char * dest, int count)
>  void * memcpy(void * dest,const void *src,size_t count)
>  {
>  	char *tmp = (char *) dest, *s = (char *) src;
> +	u32 *d32 = (u32 *)dest, *s32 = (u32 *) src;
>  
> +	/* if both are aligned, use 32-bit copy */
> +	if ( (((int)dest & 3) | ((int)src & 3) | (count & 3)) == 0 ) {
> +		count /= 4;
> +		while (count--)
> +			*d32++ = *s32++;
> +		return dest;
> +	}
> +	/* else, use 1-byte copy */
>  	while (count--)
>  		*tmp++ = *s++;

If we're adding this logic, what about adding it such that:

if (src/dest are 32-bit aligned and count > 3) {
	perform 32-bit copies till count <= 3
}
perform remaining 8-bit copies till count == 0

You'd still get the performance boost but not have the requirement that
count is evenly divisible by 4.  You could do byte copies before the
32-bit copies to align the src/dest in some cases, but that might be
overkill...

Same comment goes for the memset implementation.

Best,
Peter

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [U-Boot] [PATCH 1/3] memcpy: use 32-bit copies if possible
  2009-10-07 14:35   ` Peter Tyser
@ 2009-10-07 17:04     ` Mike Frysinger
  0 siblings, 0 replies; 19+ messages in thread
From: Mike Frysinger @ 2009-10-07 17:04 UTC (permalink / raw)
  To: u-boot

On Wednesday 07 October 2009 10:35:19 Peter Tyser wrote:
> > --- a/lib_generic/string.c
> > +++ b/lib_generic/string.c
> > @@ -449,7 +449,16 @@ char * bcopy(const char * src, char * dest, int
> > count) void * memcpy(void * dest,const void *src,size_t count)
> >  {
> >  	char *tmp = (char *) dest, *s = (char *) src;
> > +	u32 *d32 = (u32 *)dest, *s32 = (u32 *) src;
> >
> > +	/* if both are aligned, use 32-bit copy */
> > +	if ( (((int)dest & 3) | ((int)src & 3) | (count & 3)) == 0 ) {
> > +		count /= 4;
> > +		while (count--)
> > +			*d32++ = *s32++;
> > +		return dest;
> > +	}
> > +	/* else, use 1-byte copy */
> >  	while (count--)
> >  		*tmp++ = *s++;
> 
> If we're adding this logic, what about adding it such that:
> 
> if (src/dest are 32-bit aligned and count > 3) {
> 	perform 32-bit copies till count <= 3
> }
> perform remaining 8-bit copies till count == 0
> 
> You'd still get the performance boost but not have the requirement that
> count is evenly divisible by 4.  You could do byte copies before the
> 32-bit copies to align the src/dest in some cases, but that might be
> overkill...

i thought the same but didnt feel like sending out another e-mail ;)
-mike
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 836 bytes
Desc: This is a digitally signed message part.
Url : http://lists.denx.de/pipermail/u-boot/attachments/20091007/10ed4a62/attachment.pgp 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [U-Boot] [PATCH 1/3] memcpy: use 32-bit copies if possible
       [not found] ` <b03f28d4c4afa0c5c9f2421b7d260a0f3890cdcd.1254904388.git.rubini@unipv.it>
@ 2009-10-08  5:23   ` Chris Moore
  2009-10-08  6:05     ` Mike Frysinger
  0 siblings, 1 reply; 19+ messages in thread
From: Chris Moore @ 2009-10-08  5:23 UTC (permalink / raw)
  To: u-boot

Alessandro Rubini a ?crit :
> From: Alessandro Rubini <rubini@unipv.it>
>
> Signed-off-by: Alessandro Rubini <rubini@unipv.it>
> Acked-by: Andrea Gallo <andrea.gallo@stericsson.com>
> ---
>  lib_generic/string.c |   11 ++++++++++-
>  1 files changed, 10 insertions(+), 1 deletions(-)
>
> diff --git a/lib_generic/string.c b/lib_generic/string.c
> index 181eda6..fdccab6 100644
> --- a/lib_generic/string.c
> +++ b/lib_generic/string.c
> @@ -449,7 +449,16 @@ char * bcopy(const char * src, char * dest, int count)
>  void * memcpy(void * dest,const void *src,size_t count)
>  {
>  	char *tmp = (char *) dest, *s = (char *) src;
> +	u32 *d32 = (u32 *)dest, *s32 = (u32 *) src;
>  
> +	/* if both are aligned, use 32-bit copy */
> +	if ( (((int)dest & 3) | ((int)src & 3) | (count & 3)) == 0 ) {
>   

This can be factorized as (a & 3) | (b & 3) | (c & 3) is equivalent to 
(a | b | c) & 3.
GCC is pretty smart but I doubt that it will pick this up :(

> +		count /= 4;
> +		while (count--)
> +			*d32++ = *s32++;
> +		return dest;
> +	}
> +	/* else, use 1-byte copy */
>  	while (count--)
>  		*tmp++ = *s++;
>  
>   

Cheers,
Chris

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [U-Boot] [PATCH 1/3] memcpy: use 32-bit copies if possible
  2009-10-08  5:23   ` [U-Boot] [PATCH 1/3] memcpy: use 32-bit copies if possible Chris Moore
@ 2009-10-08  6:05     ` Mike Frysinger
  0 siblings, 0 replies; 19+ messages in thread
From: Mike Frysinger @ 2009-10-08  6:05 UTC (permalink / raw)
  To: u-boot

On Thursday 08 October 2009 01:23:05 Chris Moore wrote:
> Alessandro Rubini a ?crit :
> > --- a/lib_generic/string.c
> > +++ b/lib_generic/string.c
> > @@ -449,7 +449,16 @@ char * bcopy(const char * src, char * dest, int
> > count) void * memcpy(void * dest,const void *src,size_t count)
> >  {
> >  	char *tmp = (char *) dest, *s = (char *) src;
> > +	u32 *d32 = (u32 *)dest, *s32 = (u32 *) src;
> >
> > +	/* if both are aligned, use 32-bit copy */
> > +	if ( (((int)dest & 3) | ((int)src & 3) | (count & 3)) == 0 ) {
> 
> This can be factorized as (a & 3) | (b & 3) | (c & 3) is equivalent to
> (a | b | c) & 3.
> GCC is pretty smart but I doubt that it will pick this up :(

last time i looked (gcc-4.1 and gcc-4.3), it worked fine
-mike
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 836 bytes
Desc: This is a digitally signed message part.
Url : http://lists.denx.de/pipermail/u-boot/attachments/20091008/6b8468e2/attachment.pgp 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [U-Boot] [PATCH 1/3] memcpy: use 32-bit copies if possible
  2009-10-07  8:52   ` Mike Frysinger
  2009-10-07  8:59     ` Alessandro Rubini
  2009-10-07  9:30     ` Wolfgang Denk
@ 2009-10-08  7:41     ` Alessandro Rubini
  2009-10-08  8:37       ` Wolfgang Denk
  2009-10-10 16:09       ` Eric Lammerts
  2 siblings, 2 replies; 19+ messages in thread
From: Alessandro Rubini @ 2009-10-08  7:41 UTC (permalink / raw)
  To: u-boot

I was making my v2, and I found a problem wrt:

> while 64bit isnt in today, might as well avoid unclean code from the start
> when possible.  in other words, used "unsigned int" rather than "u32" and cast
> to "unsigned long" rather than "int".

Since int is 32 also on 64bit systems, I used unsigned long.
For memcpy all is well, for memset I have this problem:

    void * memset(void * s,int c,size_t count)
    {
    	char *xs = (char *) s;
    	unsigned long *sl = (unsigned long *) s;
    	unsigned long cl;

    	/* do it one word at a time (32 bits or 64 bits) if possible */
    	if ( ((count | (int)s) & (sizeof(long) - 1)) == 0) {
    		count /= sizeof(long);
    		cl = (c & 0xff) | ((c & 0xff) << 8);
    		cl |= cl << 16;
    		if (sizeof(long) > 4)
    			cl |= cl << 32;
    		while (count--)
    			*sl++ = cl;
    		return s;
    	}
    	/* else, fill 8 bits@a time */
    	while (count--)
    		*xs++ = c;

    	return s;
    }

string.c:416: warning: left shift count >= width of type

(obviously there is no such shift in the generated code,
since the condition is false at compile time).

I think I'll stick to u32 for memset, unless I get better suggestions.

/alessandro

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [U-Boot] [PATCH 1/3] memcpy: use 32-bit copies if possible
  2009-10-08  7:41     ` Alessandro Rubini
@ 2009-10-08  8:37       ` Wolfgang Denk
  2009-10-08  9:05         ` Joakim Tjernlund
  2009-10-10 16:09       ` Eric Lammerts
  1 sibling, 1 reply; 19+ messages in thread
From: Wolfgang Denk @ 2009-10-08  8:37 UTC (permalink / raw)
  To: u-boot

Dear Alessandro Rubini,

In message <20091008074114.GA30203@mail.gnudd.com> you wrote:
>
> Since int is 32 also on 64bit systems, I used unsigned long.

Note that this is not guaranteed, though. It could be 64 bit as well.


>     	/* do it one word at a time (32 bits or 64 bits) if possible */
>     	if ( ((count | (int)s) & (sizeof(long) - 1)) == 0) {
>     		count /= sizeof(long);
>     		cl = (c & 0xff) | ((c & 0xff) << 8);
>     		cl |= cl << 16;
>     		if (sizeof(long) > 4)
>     			cl |= cl << 32;

How about:

		cl = 0;
		for (i=0; i<sizeof(long); ++i) {
			cl <<= 8;
			cl |= c & 0xff;
		}

GCC optimization will do the rest...

Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH,     MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd at denx.de
If I don't document something, it's usually either for a good reason,
or a bad reason.  In this case it's a good reason.  :-)
                 - Larry Wall in <1992Jan17.005405.16806@netlabs.com>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [U-Boot] [PATCH 1/3] memcpy: use 32-bit copies if possible
  2009-10-08  8:37       ` Wolfgang Denk
@ 2009-10-08  9:05         ` Joakim Tjernlund
  2009-10-08 12:49           ` Wolfgang Denk
  0 siblings, 1 reply; 19+ messages in thread
From: Joakim Tjernlund @ 2009-10-08  9:05 UTC (permalink / raw)
  To: u-boot

>
> Dear Alessandro Rubini,
>
> In message <20091008074114.GA30203@mail.gnudd.com> you wrote:
> >
> > Since int is 32 also on 64bit systems, I used unsigned long.
>
> Note that this is not guaranteed, though. It could be 64 bit as well.
>
>
> >        /* do it one word at a time (32 bits or 64 bits) if possible */
> >        if ( ((count | (int)s) & (sizeof(long) - 1)) == 0) {
> >           count /= sizeof(long);
> >           cl = (c & 0xff) | ((c & 0xff) << 8);
> >           cl |= cl << 16;
> >           if (sizeof(long) > 4)
> >              cl |= cl << 32;
>
> How about:
>
>       cl = 0;
>       for (i=0; i<sizeof(long); ++i) {
>          cl <<= 8;
>          cl |= c & 0xff;
>       }
>
> GCC optimization will do the rest...

If you want gcc to optimise well, make it easy to do so.
Changing the for loop into:
      for (i=sizeof(long); i; --i)
makes it easier for gcc and more likely to result in optimal code.

Similarly for copy ops
 for(--top,--fromp, i = len/4; i; --i)
     *++top = *++fromp; /* pre incr can be one op on some archs */

   Jocke

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [U-Boot] [PATCH 1/3] memcpy: use 32-bit copies if possible
  2009-10-08  9:05         ` Joakim Tjernlund
@ 2009-10-08 12:49           ` Wolfgang Denk
  2009-10-08 13:29             ` Joakim Tjernlund
  0 siblings, 1 reply; 19+ messages in thread
From: Wolfgang Denk @ 2009-10-08 12:49 UTC (permalink / raw)
  To: u-boot

Dear Joakim Tjernlund,

In message <OF26FEA2CB.36E102EE-ONC1257649.0031301D-C1257649.0031EDAF@transmode.se> you wrote:
>
> > How about:
> >
> >       cl = 0;
> >       for (i=0; i<sizeof(long); ++i) {
> >          cl <<= 8;
> >          cl |= c & 0xff;
> >       }
> >
> > GCC optimization will do the rest...
> 
> If you want gcc to optimise well, make it easy to do so.
> Changing the for loop into:
>       for (i=sizeof(long); i; --i)
> makes it easier for gcc and more likely to result in optimal code.

Did you actually _check_ the code? (I did).

It does not matter. The generated code is identical.

What matters might be compiler options - "-Os" generates a small loop,
and "-O3" and higher will unroll the loop, which is way more code.

Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH,     MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd at denx.de
When you die, the first thing you lose is your life. The  next  thing
is the illusions.                       - Terry Pratchett, _Pyramids_

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [U-Boot] [PATCH 1/3] memcpy: use 32-bit copies if possible
  2009-10-08 12:49           ` Wolfgang Denk
@ 2009-10-08 13:29             ` Joakim Tjernlund
  2009-10-08 13:47               ` Wolfgang Denk
  0 siblings, 1 reply; 19+ messages in thread
From: Joakim Tjernlund @ 2009-10-08 13:29 UTC (permalink / raw)
  To: u-boot

Wolfgang Denk <wd@denx.de> wrote on 08/10/2009 14:49:15:
>
> Dear Joakim Tjernlund,
>
> In message <OF26FEA2CB.36E102EE-ONC1257649.0031301D-C1257649.
> 0031EDAF at transmode.se> you wrote:
> >
> > > How about:
> > >
> > >       cl = 0;
> > >       for (i=0; i<sizeof(long); ++i) {
> > >          cl <<= 8;
> > >          cl |= c & 0xff;
> > >       }
> > >
> > > GCC optimization will do the rest...
> >
> > If you want gcc to optimise well, make it easy to do so.
> > Changing the for loop into:
> >       for (i=sizeof(long); i; --i)
> > makes it easier for gcc and more likely to result in optimal code.
>
> Did you actually _check_ the code? (I did).

No, but I have done this in the past and found that
gcc does stupid things sometimes, especially loops, and
it depends on arch and gcc version.

>
> It does not matter. The generated code is identical.

So my question is: Did you check all gcc versions and
arches?

 Jocke

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [U-Boot] [PATCH 1/3] memcpy: use 32-bit copies if possible
  2009-10-08 13:29             ` Joakim Tjernlund
@ 2009-10-08 13:47               ` Wolfgang Denk
  0 siblings, 0 replies; 19+ messages in thread
From: Wolfgang Denk @ 2009-10-08 13:47 UTC (permalink / raw)
  To: u-boot

Dear Joakim Tjernlund,

In message <OF15EDFD3F.769F0248-ONC1257649.0049C8AE-C1257649.004A1776@transmode.se> you wrote:
>
> So my question is: Did you check all gcc versions and
> arches?

Of course not :-)

Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH,     MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd at denx.de
Even if you can deceive people about  a  product  through  misleading
statements, sooner or later the product will speak for itself.
- Hajime Karatsu

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [U-Boot] [PATCH 1/3] memcpy: use 32-bit copies if possible
  2009-10-08  7:41     ` Alessandro Rubini
  2009-10-08  8:37       ` Wolfgang Denk
@ 2009-10-10 16:09       ` Eric Lammerts
  1 sibling, 0 replies; 19+ messages in thread
From: Eric Lammerts @ 2009-10-10 16:09 UTC (permalink / raw)
  To: u-boot

On 10/08/09 03:41, Alessandro Rubini wrote:
> For memcpy all is well, for memset I have this problem:

>     		if (sizeof(long) > 4)
>     			cl |= cl << 32;

> string.c:416: warning: left shift count >= width of type
> 
> (obviously there is no such shift in the generated code,
> since the condition is false at compile time).
> 
> I think I'll stick to u32 for memset, unless I get better suggestions.

What about
      		if (sizeof(long) > 4)
      			cl |= (unsigned long long)cl << 32;
?

Eric

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2009-10-10 16:09 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-10-07  8:44 [U-Boot] [PATCH 0/3] make memcpy and memset 32-bit copies Alessandro Rubini
2009-10-07  8:44 ` [U-Boot] [PATCH 1/3] memcpy: use 32-bit copies if possible Alessandro Rubini
2009-10-07  8:52   ` Mike Frysinger
2009-10-07  8:59     ` Alessandro Rubini
2009-10-07  9:30     ` Wolfgang Denk
2009-10-08  7:41     ` Alessandro Rubini
2009-10-08  8:37       ` Wolfgang Denk
2009-10-08  9:05         ` Joakim Tjernlund
2009-10-08 12:49           ` Wolfgang Denk
2009-10-08 13:29             ` Joakim Tjernlund
2009-10-08 13:47               ` Wolfgang Denk
2009-10-10 16:09       ` Eric Lammerts
2009-10-07 14:35   ` Peter Tyser
2009-10-07 17:04     ` Mike Frysinger
2009-10-07  8:44 ` [U-Boot] [PATCH 2/3] memset: " Alessandro Rubini
2009-10-07  9:27   ` Wolfgang Denk
2009-10-07  8:44 ` [U-Boot] [PATCH 3/3] lcd: remove '#if 0' 32-bit scroll, now memcpy does it Alessandro Rubini
     [not found] ` <b03f28d4c4afa0c5c9f2421b7d260a0f3890cdcd.1254904388.git.rubini@unipv.it>
2009-10-08  5:23   ` [U-Boot] [PATCH 1/3] memcpy: use 32-bit copies if possible Chris Moore
2009-10-08  6:05     ` Mike Frysinger

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox