linux-fbdev.vger.kernel.org archive mirror
* Fw: framebuffer blitting performance loss  2.6.12 -> 2.6.13-rc3
@ 2005-07-15 10:39 Andrew Morton
  2005-07-22  3:58 ` Antonino A. Daplas
  0 siblings, 1 reply; 14+ messages in thread
From: Andrew Morton @ 2005-07-15 10:39 UTC (permalink / raw)
  To: linux-fbdev-devel; +Cc: Knut Petersen



Begin forwarded message:

Date: Fri, 15 Jul 2005 12:14:37 +0200
From: Knut Petersen <Knut_Petersen@t-online.de>
To: linux-kernel@vger.kernel.org
Subject: framebuffer blitting performance loss  2.6.12 -> 2.6.13-rc3


Hi everybody!

There is a serious performance loss between 2.6.12 and 2.6.13-rc3
affecting _all_ framebuffer devices, especially those with fast
bitblit functions.

System: Via Epia 5000
CPU: Via Samuel 2, 533MHz
Graphics core: Cyberblade/i1 (Blade 3D core integrated in 8601A)
Framebuffer driver: Not yet released fully accelerated framebuffer
                    driver cyblafb


Test setup
==========

video mode: 1280x1024, vyres=2662, bpp=8, 8x16 font, ypan scrollmode
kernel 2.6.13-rc3 is compiled with HZ==1000

Measurement 1: Compile framebuffer modules
       Result: 2.6.13-rc3 is slightly slower, but this is an almost
               invisible performance loss of about 1%

Measurement 2: time cat of file consisting of 2000 empty lines
       Result:
                                          |  2.6.12 / 2.6.13-rc3
------------------------------------------+----------------------
total time                                |  0.182s / 0.220s


Measurement 3: time cat of file consisting of 2000 full lines of
               160 characters each.

       Result:
                                          |  2.6.12 / 2.6.13-rc3
------------------------------------------+----------------------
total time                                |  0.853s / 1.062s
time spent in framebuffer bitblit routine |  0.256s / 0.257s
time spent for kernel bitblit overhead    |  0.426s / 0.623s !!!
other time (scrolling, disk io etc)       |  0.171s / 0.182s


Discussion of measurements
==========================

Framebuffer compiling shows that the general kernel performance is
more or less unchanged between 2.6.12 and 2.6.13-rc3.

Cat-ing of the file consisting of 2000 empty lines takes about 20.9%
more time, cat-ing of the file consisting of 2000 full lines takes about
24% more time.

As the time spent in the bitblit function of the framebuffer driver
does not change, I assume that the data sent to the framebuffer
driver has not changed either. But the kernel-side bitblit overhead
takes about 46% longer (0.426s -> 0.623s).
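The percentages quoted in this discussion can be rechecked from the tables above. The following is only a small sketch of that arithmetic; the helper name slowdown_percent is made up for illustration and is not part of any kernel code.

```c
#include <assert.h>

/*
 * Relative slowdown of new_s compared with old_s, in percent.
 * Hypothetical helper, used only to recheck the figures in the tables.
 */
static double slowdown_percent(double old_s, double new_s)
{
	assert(old_s > 0.0);
	return (new_s - old_s) / old_s * 100.0;
}
```

With the measured values, slowdown_percent(0.182, 0.220) is about 20.9, slowdown_percent(0.853, 1.062) is about 24.5, and slowdown_percent(0.426, 0.623) is about 46.2, which matches the figures given in the text.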

All framebuffer drivers should be affected by this performance loss,
but the faster the driver's bitblit is, the more the overhead will
affect overall performance. You will not see such a great difference
if e.g. vesafb is used.

Please have a serious look at the changed code of fbcon/fbmem etc
or switch back to the old routines.

cu,
 Knut
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


-------------------------------------------------------
SF.Net email is sponsored by: Discover Easy Linux Migration Strategies
from IBM. Find simple to follow Roadmaps, straightforward articles,
informative Webcasts and more! Get everything you need to get up to
speed, fast. http://ads.osdn.com/?ad_id=7477&alloc_id=16492&op=click

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Fw: framebuffer blitting performance loss  2.6.12 -> 2.6.13-rc3
  2005-07-15 10:39 Fw: framebuffer blitting performance loss 2.6.12 -> 2.6.13-rc3 Andrew Morton
@ 2005-07-22  3:58 ` Antonino A. Daplas
  2005-07-29  7:17   ` Andrew Morton
  0 siblings, 1 reply; 14+ messages in thread
From: Antonino A. Daplas @ 2005-07-22  3:58 UTC (permalink / raw)
  To: linux-fbdev-devel, Andrew Morton; +Cc: Knut Petersen

On Friday 15 July 2005 18:39, Andrew Morton wrote:
> Begin forwarded message:
>
> Date: Fri, 15 Jul 2005 12:14:37 +0200
> From: Knut Petersen <Knut_Petersen@t-online.de>
> To: linux-kernel@vger.kernel.org
> Subject: framebuffer blitting performance loss  2.6.12 -> 2.6.13-rc3
>
>
> Hi everybody!
>
> There is a serious performance loss between 2.6.12 and 2.6.13-rc3
> affecting _all_ framebuffer devices, especially those with fast
> bitblit functions.
>

I haven't seen any significant performance penalty between 2.6.12-rc5-mm1
and 2.6.13-rc3-mm1.

Based on your results, I would pinpoint the culprit to be in
video/console/bitblit.c.  However, the changes there are minor, and should not
alter the performance.

Tony




* Re: Fw: framebuffer blitting performance loss  2.6.12 -> 2.6.13-rc3
  2005-07-22  3:58 ` Antonino A. Daplas
@ 2005-07-29  7:17   ` Andrew Morton
  2005-07-29 14:54     ` Knut Petersen
  0 siblings, 1 reply; 14+ messages in thread
From: Andrew Morton @ 2005-07-29  7:17 UTC (permalink / raw)
  To: linux-fbdev-devel; +Cc: adaplas, Knut_Petersen

"Antonino A. Daplas" <adaplas@gmail.com> wrote:
>
> On Friday 15 July 2005 18:39, Andrew Morton wrote:
> > Begin forwarded message:
> >
> > Date: Fri, 15 Jul 2005 12:14:37 +0200
> > From: Knut Petersen <Knut_Petersen@t-online.de>
> > To: linux-kernel@vger.kernel.org
> > Subject: framebuffer blitting performance loss  2.6.12 -> 2.6.13-rc3
> >
> >
> > Hi everybody!
> >
> > There is a serious performance loss between 2.6.12 and 2.6.13-rc3
> > affecting _all_ framebuffer devices, especially those with fast
> > bitblit functions.
> >
> 
> I haven't seen any significant performance penalty, between 2.6.12-rc5-mm1
> and 2.6.13-rc3-mm1.
> 
> Based on your results, I would pinpoint the culprit to be in
> video/console/bitblit.c.  However, the changes there are minor, and should not
> alter the performance.
> 

So.. what happened here?  Is the problem still present in 2.6.13-rc4?


-------------------------------------------------------
SF.Net email is Sponsored by the Better Software Conference & EXPO September
19-22, 2005 * San Francisco, CA * Development Lifecycle Practices
Agile & Plan-Driven Development * Managing Projects & Teams * Testing & QA
Security * Process Improvement & Measurement * http://www.sqe.com/bsce5sf


* Re: Fw: framebuffer blitting performance loss 2.6.12 -> 2.6.13-rc3
  2005-07-29  7:17   ` Andrew Morton
@ 2005-07-29 14:54     ` Knut Petersen
  2005-07-29 15:42       ` Antonino A. Daplas
  0 siblings, 1 reply; 14+ messages in thread
From: Knut Petersen @ 2005-07-29 14:54 UTC (permalink / raw)
  To: linux-fbdev-devel; +Cc: Andrew Morton, adaplas

Hi everybody!

>>I haven't seen any significant performance penalty, between 2.6.12-rc5-mm1
>>and 2.6.13-rc3-mm1.
>>
>>Based on your results, I would pinpoint the culprit to be in
>>video/console/bitblit.c.  However, the changes there are minor, and should not
>>alter the performance.
>>
>>
>
>So.. what happened here?  Is the problem still present in 2.6.13-rc4?
>
>
Yes, the problem still is present in 2.6.13-rc4.
================================================

There is only an insignificant difference of max +/- 2ms between
2.6.13-rc3 and 2.6.13-rc4 for all measurements.

Test 1:   reset;time cat scrolltest0
Test 2:   reset;time cat scrolltest80
Test 3:   reset;time cat scrolltest160

scrolltest0 is a file with 2000 empty lines.
scrolltest80 is a file with 2000 lines of 80 characters each.
scrolltest160 is a file with 2000 lines of 160 characters each.

vesafb tests are made with the original vesafb of the respective kernel
versions. cyblafb tests all use the same source file, with these
accelerations: fillrect, bitblit, copyarea. The 2.6.13-rc* kernels are
compiled with a 1000Hz system timer, the same as 2.6.12.

chipset: trident cyberblade/i1
video mode: vesa 0x307 (1280x1024@75hz)
8x16 font

Nothing but the kernel changed between the tests,
the time values given are system time in seconds.


 vga=0x307 | test 1   test 2   test 3  | test 1   test 2    test 3
           |    video=vesafb:ypan      |       video=vesafb
-----------+---------------------------+---------------------------
2.6.12     | 3.753s   4.825s   5.936s  | 4.258s  65.645s  126.898s
2.6.13-rc4 | 3.937s   5.135s   6.302s  | 4.304s  71.515s  138.674s
           |  +4.9%   +6.42%   +6.17%  | +1.08%   +8.94%    +9.28%


 vga=0x307 | test 1   test 2   test 3  | test 1   test 2    test 3
           |    video=cyblafb          |   video=cyblafb:noypan
-----------+---------------------------+---------------------------
2.6.12     | 0.228s   0.549s   0.870s  | 7.692s   8.015s    8.335s
2.6.13-rc4 | 0.235s   0.654s   1.072s  | 7.699s   8.120s    8.549s
           | +3.07%  +19.13%  +23.22%  | +0.09%   +1.31%    +2.57%


The numbers show very clearly that 2.6.13-rc* blitting is much slower than
the blitting of 2.6.12. For cyblafb the time spent on the actual blitting is
about 257ms for test 3, so the actual performance loss for the pre-driver
part is above 30%.
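The "above 30%" figure follows from subtracting the driver's own blit time from both totals before comparing them. A minimal sketch of that calculation is below; the function name is hypothetical, and it assumes the driver blit time is the same (about 257ms) under both kernels, as the earlier measurement suggested.

```c
#include <assert.h>

/*
 * Percent increase of the part of the run that is NOT spent inside the
 * driver's bitblit routine. Assumes driver_blit is identical for both
 * kernels (an assumption taken from the earlier 0.256s / 0.257s numbers).
 */
static double predriver_slowdown_percent(double total_old, double total_new,
					 double driver_blit)
{
	double old_rest = total_old - driver_blit;
	double new_rest = total_new - driver_blit;

	assert(old_rest > 0.0);
	return (new_rest - old_rest) / old_rest * 100.0;
}
```

For test 3 with cyblafb and ypan, predriver_slowdown_percent(0.870, 1.072, 0.257) compares 0.613s with 0.815s and comes out at roughly 33.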

Now for a real world example:

      reset; time cat patch-2.6.13-rc4

cyblafb, kernel 2.6.12     : 173.013s
cyblafb, kernel 2.6.13-rc4 : 196.181s
   difference              :  23.168s ( +13.4% )

Could anyone take the time to measure performance of some other drivers?
Those using ypan scrolling and hardware accelerated bitblit should be
most affected.


cu,
 Knut






* Re: Fw: framebuffer blitting performance loss 2.6.12 -> 2.6.13-rc3
  2005-07-29 14:54     ` Knut Petersen
@ 2005-07-29 15:42       ` Antonino A. Daplas
  2005-07-29 19:02         ` Andrew Morton
                           ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Antonino A. Daplas @ 2005-07-29 15:42 UTC (permalink / raw)
  To: Knut Petersen; +Cc: linux-fbdev-devel, Andrew Morton

Knut Petersen wrote:
> Hi everybody!
> 
>>> I haven't seen any significant performance penalty, between 
>>> 2.6.12-rc5-mm1
>>> and 2.6.13-rc3-mm1.
>>>
>>> Based on your results, I would pinpoint the culprit to be in
>>> video/console/bitblit.c.  However, the changes there are minor, and 
>>> should not
>>> alter the performance.
>>>
>>>
>>
>> So.. what happened here?  Is the problem still present in 2.6.13-rc4?
>>
>>
> Yes, the problem still is present in 2.6.13-rc4.
> ================================================

Thank you for your persistence.  I think I know the culprit.  Someone
insisted on using memcpy in fb_pad_aligned_buffer().  I have already
fixed this before, but apparently, the memcpy was brought back.  Try
the attached patch and let me know.

Tony

   fbdev: Replace memcpy with for-loop when preparing bitmap

    Do not use memcpy in fb_pad_aligned_buffer. It is suboptimal because only
    a few bytes are moved at a time. Replace with a for-loop.

    From: Antonino Daplas <adaplas@pol.net>
    Signed-off-by: Antonino Daplas <adaplas@pol.net>
---

 fbmem.c |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

--- a/drivers/video/fbmem.c
+++ b/drivers/video/fbmem.c
@@ -80,10 +80,12 @@ EXPORT_SYMBOL(fb_get_color_depth);
  */
 void fb_pad_aligned_buffer(u8 *dst, u32 d_pitch, u8 *src, u32 s_pitch, u32 height)
 {
-	int i;
+	int i, j;
 
 	for (i = height; i--; ) {
-		memcpy(dst, src, s_pitch);
+		/* s_pitch is a few bytes at the most, memcpy is suboptimal */
+		for (j = 0; j < s_pitch; j++)
+			dst[j] = src[j];
 		src += s_pitch;
 		dst += d_pitch;
 	}
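
For anyone who wants to experiment outside the kernel, the two variants of the copy loop can be reproduced in userspace. The sketch below is not the kernel source: the u8/u32 typedefs are stand-ins, the function names are invented, and variants_agree() only checks that both versions produce the same bytes. It makes no claim about which one is faster on a given CPU; a timing loop would have to be added around the calls for that.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef unsigned char u8;	/* userspace stand-ins for the kernel types */
typedef unsigned int u32;

/* Row copy using memcpy, as in the code the patch removes. */
static void pad_rows_memcpy(u8 *dst, u32 d_pitch, const u8 *src,
			    u32 s_pitch, u32 height)
{
	u32 i;

	assert(s_pitch <= d_pitch);
	for (i = 0; i < height; i++) {
		memcpy(dst, src, s_pitch);
		src += s_pitch;
		dst += d_pitch;
	}
}

/* Row copy using a plain byte loop, as in the code the patch adds. */
static void pad_rows_loop(u8 *dst, u32 d_pitch, const u8 *src,
			  u32 s_pitch, u32 height)
{
	u32 i, j;

	assert(s_pitch <= d_pitch);
	for (i = 0; i < height; i++) {
		for (j = 0; j < s_pitch; j++)
			dst[j] = src[j];
		src += s_pitch;
		dst += d_pitch;
	}
}

/* Returns 1 if both variants write identical output for this geometry. */
static int variants_agree(u32 s_pitch, u32 d_pitch, u32 height)
{
	u8 src[4 * 32], a[8 * 32], b[8 * 32];
	size_t k;

	assert(s_pitch <= 4 && d_pitch <= 8 && height <= 32);
	for (k = 0; k < sizeof(src); k++)
		src[k] = (u8)(k * 37 + 11);	/* arbitrary test pattern */
	memset(a, 0, sizeof(a));
	memset(b, 0, sizeof(b));
	pad_rows_memcpy(a, d_pitch, src, s_pitch, height);
	pad_rows_loop(b, d_pitch, src, s_pitch, height);
	return memcmp(a, b, sizeof(a)) == 0;
}
```

For an 8x16 font, variants_agree(1, 4, 16) exercises the one-byte-per-row case that dominates console output; the d_pitch of 4 is just an example value.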



* Re: Fw: framebuffer blitting performance loss 2.6.12 -> 2.6.13-rc3
  2005-07-29 15:42       ` Antonino A. Daplas
@ 2005-07-29 19:02         ` Andrew Morton
  2005-07-29 19:52           ` James Simmons
  2005-07-29 19:59           ` James Simmons
  2005-07-29 19:51         ` James Simmons
  2005-07-29 20:10         ` Knut Petersen
  2 siblings, 2 replies; 14+ messages in thread
From: Andrew Morton @ 2005-07-29 19:02 UTC (permalink / raw)
  To: Antonino A. Daplas; +Cc: Knut_Petersen, linux-fbdev-devel

"Antonino A. Daplas" <adaplas@gmail.com> wrote:
>
>     fbdev: Replace memcpy with for-loop when preparing bitmap

Whee, progress.  Please let me know if/when you want this sent to Linus.



* Re: Fw: framebuffer blitting performance loss  2.6.12 -> 2.6.13-rc3
  2005-07-29 15:42       ` Antonino A. Daplas
  2005-07-29 19:02         ` Andrew Morton
@ 2005-07-29 19:51         ` James Simmons
  2005-07-29 20:21           ` Jon Smirl
  2005-07-29 22:45           ` Luca
  2005-07-29 20:10         ` Knut Petersen
  2 siblings, 2 replies; 14+ messages in thread
From: James Simmons @ 2005-07-29 19:51 UTC (permalink / raw)
  To: linux-fbdev-devel; +Cc: Knut Petersen, Andrew Morton


> Thank you for your persistence.  I think I know the culprit.  Someone
> insisted on using memcpy in fb_pad_aligned_buffer().  I have already
> fixed this before, but apparently, the memcpy was brought back.  Try
> the attached patch and let me know.

Yipes, I did that. The memcpy function is supposed to be optimized for the
platform. See string.h in the include/asm directory. I have seen, for example,
that the Athlon would use the 3DNow instruction set to copy data. Something
is really wrong with memcpy if moving byte by byte is faster!
A lot of drivers use memcpy. If memcpy sucks, then drivers should be copying
byte by byte. The question I have is whether this is the case for non-Intel
platforms as well. Could someone run the numbers on other platforms?

> Tony
> 
>    fbdev: Replace memcpy with for-loop when preparing bitmap
> 
>     Do not use memcpy in fb_pad_aligned_buffer. It is suboptimal because only
>     a few bytes are moved at a time. Replace with a for-loop.
> 
>     From: Antonino Daplas <adaplas@pol.net>
>     Signed-off-by: Antonino Daplas <adaplas@pol.net>
> ---
> 
>  fbmem.c |    6 ++++--
>  1 files changed, 4 insertions(+), 2 deletions(-)
> 
> --- a/drivers/video/fbmem.c
> +++ b/drivers/video/fbmem.c
> @@ -80,10 +80,12 @@ EXPORT_SYMBOL(fb_get_color_depth);
>   */
>  void fb_pad_aligned_buffer(u8 *dst, u32 d_pitch, u8 *src, u32 s_pitch, u32 height)
>  {
> -	int i;
> +	int i, j;
>  
>  	for (i = height; i--; ) {
> -		memcpy(dst, src, s_pitch);
> +		/* s_pitch is a few bytes at the most, memcpy is suboptimal */
> +		for (j = 0; j < s_pitch; j++)
> +			dst[j] = src[j];
>  		src += s_pitch;
>  		dst += d_pitch;
>  	}

* Re: Fw: framebuffer blitting performance loss 2.6.12 -> 2.6.13-rc3
  2005-07-29 19:02         ` Andrew Morton
@ 2005-07-29 19:52           ` James Simmons
  2005-07-29 19:59           ` James Simmons
  1 sibling, 0 replies; 14+ messages in thread
From: James Simmons @ 2005-07-29 19:52 UTC (permalink / raw)
  To: linux-fbdev-devel; +Cc: Antonino A. Daplas, Knut_Petersen


> "Antonino A. Daplas" <adaplas@gmail.com> wrote:
> >
> >     fbdev: Replace memcpy with for-loop when preparing bitmap
> 
> Whee, progress.  Please let me know if/when you want this sent to Linus.

Before you do, I'd like to know whether memcpy really is slower than
byte-by-byte copying. This just seems wrong!



* Re: Fw: framebuffer blitting performance loss 2.6.12 -> 2.6.13-rc3
  2005-07-29 19:02         ` Andrew Morton
  2005-07-29 19:52           ` James Simmons
@ 2005-07-29 19:59           ` James Simmons
  1 sibling, 0 replies; 14+ messages in thread
From: James Simmons @ 2005-07-29 19:59 UTC (permalink / raw)
  To: Knut_Petersen
  Cc: Antonino A. Daplas, Linux Fbdev development list, Andrew Morton


Can you do some performance measurements with this patch instead? I have
a theory. I bet that because we didn't include the Linux version of string.h,
we are using the glibc version instead, which is slower. In fact, I bet memcpy
will then be faster than the byte-by-byte copy. Give it a try.

--- /usr/src/linus-2.6/drivers/video/fbmem.c	2005-07-28 10:24:11.000000000 -0700
+++ fbmem.c	2005-07-29 12:53:30.000000000 -0700
@@ -15,6 +15,7 @@
 #include <linux/module.h>
 
 #include <linux/types.h>
+#include <linux/string.h>
 #include <linux/errno.h>
 #include <linux/sched.h>
 #include <linux/smp_lock.h>



* Re: Fw: framebuffer blitting performance loss 2.6.12 -> 2.6.13-rc3
  2005-07-29 15:42       ` Antonino A. Daplas
  2005-07-29 19:02         ` Andrew Morton
  2005-07-29 19:51         ` James Simmons
@ 2005-07-29 20:10         ` Knut Petersen
  2 siblings, 0 replies; 14+ messages in thread
From: Knut Petersen @ 2005-07-29 20:10 UTC (permalink / raw)
  To: linux-fbdev-devel

Hi Tony,

>
> Thank you for your persistence.  I think I know the culprit.  Someone
> insisted on using memcpy in fb_pad_aligned_buffer().  I have already
> fixed this before, but apparently, the memcpy was brought back.  Try
> the attached patch and let me know.
>
> Tony

Replacing memcpy() with this inline code helps. Performance is slightly
slower than it was in 2.6.12, but this is hardly measurable and could be
caused by other changes in the kernel.

The most affected test (test 3, cyblafb, ypan) is now about 7ms slower
than it was in 2.6.12. Without your patch the performance penalty was 202ms!

Yes, please send the patch to Linus asap, it's a must for 2.6.13.

Someone should look at memcpy ;-)

cu,
 Knut



* Re: Fw: framebuffer blitting performance loss 2.6.12 -> 2.6.13-rc3
  2005-07-29 19:51         ` James Simmons
@ 2005-07-29 20:21           ` Jon Smirl
  2005-07-29 22:45             ` Antonino A. Daplas
  2005-07-29 22:45           ` Luca
  1 sibling, 1 reply; 14+ messages in thread
From: Jon Smirl @ 2005-07-29 20:21 UTC (permalink / raw)
  To: linux-fbdev-devel; +Cc: Knut Petersen, Andrew Morton

On 7/29/05, James Simmons <jsimmons@infradead.org> wrote:
> 
> > Thank you for your persistence.  I think I know the culprit.  Someone
> > insisted on using memcpy in fb_pad_aligned_buffer().  I have already
> > fixed this before, but apparently, the memcpy was brought back.  Try
> > the attached patch and let me know.
> 
> Yipes, I did that. The memcpy function is supposed to be optimized for the
> platform. See string.h in the include/asm directory. I have seen, for example,
> that the Athlon would use the 3DNow instruction set to copy data. Something
> is really wrong with memcpy if moving byte by byte is faster!
> A lot of drivers use memcpy. If memcpy sucks, then drivers should be copying
> byte by byte. The question I have is whether this is the case for non-Intel
> platforms as well. Could someone run the numbers on other platforms?

memmove/memcpy is faster. memcpy is faster than memmove, so use it if
you can. But there is a lower limit, probably around 16 bytes or so,
below which the loop becomes faster.  So if you know that you will always be
copying small fragments, use the loop.  The compiler can't decide
between loop/memcpy for you since it doesn't know the upper limit on
the length; it is forced to use memcpy since you told it to.

For small things it is even better to use a structure assignment if
possible. That lets the compiler decide whether to use a loop or memcpy,
since the length is known.

In this case if we could figure out how to give the compiler an upper
bound on the loop it might decide to unroll it and use multiple moves.
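
To make the idea above concrete, here is a rough sketch of two ways to hand the compiler a known size. Everything in it is illustrative: struct glyph_row and both function names are invented, and the 4-byte limit is taken from the 32-pixel maximum font width mentioned elsewhere in this thread. Whether a given compiler actually turns these into plain moves depends on the compiler and its flags, so treat it as something to measure rather than a guarantee.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical fixed-size row: 4 bytes covers a 32-pixel-wide glyph row. */
struct glyph_row {
	unsigned char b[4];
};

/* Structure assignment: the size is known at compile time. */
static void copy_row_struct(struct glyph_row *dst, const struct glyph_row *src)
{
	*dst = *src;
}

/*
 * Bounded copy: dispatch on the few possible lengths so that every
 * memcpy call has a constant size the compiler can reason about.
 */
static void copy_row_bounded(unsigned char *dst, const unsigned char *src,
			     unsigned int len)
{
	assert(len >= 1 && len <= 4);
	switch (len) {
	case 1:
		dst[0] = src[0];
		break;
	case 2:
		memcpy(dst, src, 2);
		break;
	case 3:
		memcpy(dst, src, 3);
		break;
	default:
		memcpy(dst, src, 4);
		break;
	}
}
```

The switch trades a branch for constant-size copies; for the common 8-pixel font only the first case is ever taken.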

> 
> > Tony
> >
> >    fbdev: Replace memcpy with for-loop when preparing bitmap
> >
> >     Do not use memcpy in fb_pad_aligned_buffer. It is suboptimal because only
> >     a few bytes are moved at a time. Replace with a for-loop.
> >
> >     From: Antonino Daplas <adaplas@pol.net>
> >     Signed-off-by: Antonino Daplas <adaplas@pol.net>
> > ---
> >
> >  fbmem.c |    6 ++++--
> >  1 files changed, 4 insertions(+), 2 deletions(-)
> >
> > --- a/drivers/video/fbmem.c
> > +++ b/drivers/video/fbmem.c
> > @@ -80,10 +80,12 @@ EXPORT_SYMBOL(fb_get_color_depth);
> >   */
> >  void fb_pad_aligned_buffer(u8 *dst, u32 d_pitch, u8 *src, u32 s_pitch, u32 height)
> >  {
> > -     int i;
> > +     int i, j;
> >
> >       for (i = height; i--; ) {
> > -             memcpy(dst, src, s_pitch);
> > +             /* s_pitch is a few bytes at the most, memcpy is suboptimal */
> > +             for (j = 0; j < s_pitch; j++)
> > +                     dst[j] = src[j];
> >               src += s_pitch;
> >               dst += d_pitch;
> >       }


-- 
Jon Smirl
jonsmirl@gmail.com



* Re: Fw: framebuffer blitting performance loss 2.6.12 -> 2.6.13-rc3
  2005-07-29 20:21           ` Jon Smirl
@ 2005-07-29 22:45             ` Antonino A. Daplas
  2005-08-03 17:29               ` James Simmons
  0 siblings, 1 reply; 14+ messages in thread
From: Antonino A. Daplas @ 2005-07-29 22:45 UTC (permalink / raw)
  To: linux-fbdev-devel; +Cc: Knut Petersen, Andrew Morton

Jon Smirl wrote:
> On 7/29/05, James Simmons <jsimmons@infradead.org> wrote:
>>> Thank you for your persistence.  I think I know the culprit.  Someone
>>> insisted on using memcpy in fb_pad_aligned_buffer().  I have already
>>> fixed this before, but apparently, the memcpy was brought back.  Try
>>> the attached patch and let me know.
>> Yipes, I did that. The memcpy function is supposed to be optimized for the
>> platform. See string.h in the include/asm directory. I have seen, for example,
>> that the Athlon would use the 3DNow instruction set to copy data. Something
>> is really wrong with memcpy if moving byte by byte is faster!
>> A lot of drivers use memcpy. If memcpy sucks, then drivers should be copying
>> byte by byte. The question I have is whether this is the case for non-Intel
>> platforms as well. Could someone run the numbers on other platforms?
> 
> memmove/memcpy is faster. memcpy is faster than memmove so use it if
> you can. But, there is a lower limit probably around 16 bytes or so
> where the loop becomes faster.  So if you know that you will always be
> copying small fragments use the loop.  The compiler can't decide

Yes, the loop copies each row of a font character.  For an 8x16 font
that's 1 byte. The maximum fontwidth is 32. A 12x22 font does not pass
through this function because the width is not a multiple of 8.  So,
currently, it's used mostly for 8x16 fonts. 

I already know people using 16x30 fonts. There are probably others bigger
than that. 

Of course, we can always use Duff's device to unroll that particular
loop, but even at 4 bytes, I don't know if it's worth the effort. Does anyone
know people using 32-pixel-wide fonts?
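
The row sizes mentioned above, and the kind of unrolling being considered, can be sketched as follows. This is an illustration only: row_bytes() and duff_copy() are invented names, and the 4-way unroll factor is an arbitrary choice made because 4 bytes is the largest row discussed here.

```c
#include <assert.h>

/* Bytes per glyph row for a font whose width is a multiple of 8 pixels. */
static unsigned int row_bytes(unsigned int font_width)
{
	assert(font_width >= 8 && font_width <= 32 && font_width % 8 == 0);
	return font_width / 8;
}

/*
 * Byte copy unrolled four ways with Duff's device: the switch jumps
 * into the middle of the loop body to handle the count % 4 remainder.
 */
static void duff_copy(unsigned char *dst, const unsigned char *src,
		      unsigned int count)
{
	unsigned int n;

	if (count == 0)
		return;
	n = (count + 3) / 4;	/* number of passes through the loop */
	switch (count % 4) {
	case 0: do { *dst++ = *src++;
	case 3:      *dst++ = *src++;
	case 2:      *dst++ = *src++;
	case 1:      *dst++ = *src++;
		} while (--n > 0);
	}
}
```

row_bytes(8) is 1, row_bytes(16) is 2 and row_bytes(32) is 4, so even the widest supported font never runs the unrolled loop body more than once per row.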

Tony




* Re: Fw: framebuffer blitting performance loss 2.6.12 -> 2.6.13-rc3
  2005-07-29 19:51         ` James Simmons
  2005-07-29 20:21           ` Jon Smirl
@ 2005-07-29 22:45           ` Luca
  1 sibling, 0 replies; 14+ messages in thread
From: Luca @ 2005-07-29 22:45 UTC (permalink / raw)
  To: linux-fbdev-devel

On Fri, Jul 29, 2005 at 08:51:34PM +0100, James Simmons wrote:
> > Thank you for your persistence.  I think I know the culprit.  Someone
> > insisted on using memcpy in fb_pad_aligned_buffer().  I have already
> > fixed this before, but apparently, the memcpy was brought back.  Try
> > the attached patch and let me know.
> 
> Yipes, I did that. The memcpy function is supposed to be optimized for the
> platform. See string.h in the include/asm directory. I have seen, for example,
> that the Athlon would use the 3DNow instruction set to copy data. Something
> is really wrong with memcpy if moving byte by byte is faster!

For small copies MMX/3DNow are not used at all. In the current kernel, the
MMX/3DNow memcpy is used only when the data size is greater than 512 bytes.
Remember that MMX/3DNow uses the FPU, so the kernel must save/restore its
state, and this overhead would make the copy slow for small chunks.
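
The size cutoff described above amounts to a simple dispatch. The sketch below is not the kernel's implementation: the enum, the macro and both function names are made up, both paths fall back to the standard memcpy, and treating exactly 512 bytes as the plain path is an assumption read from the wording "greater than 512 bytes".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum copy_path { COPY_PLAIN, COPY_WIDE };

#define WIDE_COPY_THRESHOLD 512	/* bytes; the cutoff cited in the text */

/* Pick a path by size; small copies never take the wide path. */
static enum copy_path choose_copy_path(size_t n)
{
	return n > WIDE_COPY_THRESHOLD ? COPY_WIDE : COPY_PLAIN;
}

/*
 * Copy n bytes and report which path was chosen. In a real kernel the
 * wide path would save FPU state, run an MMX/3DNow loop and restore the
 * state; here both paths simply call memcpy.
 */
static enum copy_path copy_with_threshold(void *dst, const void *src, size_t n)
{
	enum copy_path path = choose_copy_path(n);

	assert(dst != NULL && src != NULL);
	memcpy(dst, src, n);
	return path;
}
```

A glyph row of 1 to 4 bytes always lands on COPY_PLAIN, so the wide path is irrelevant to fb_pad_aligned_buffer().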

Luca
-- 
Home: http://kronoz.cjb.net
If a man is destined to drown, he will drown even in a glass of
water.
Yiddish proverb



* Re: Fw: framebuffer blitting performance loss 2.6.12 -> 2.6.13-rc3
  2005-07-29 22:45             ` Antonino A. Daplas
@ 2005-08-03 17:29               ` James Simmons
  0 siblings, 0 replies; 14+ messages in thread
From: James Simmons @ 2005-08-03 17:29 UTC (permalink / raw)
  To: linux-fbdev-devel; +Cc: Knut Petersen, Andrew Morton


> Yes, the loop copies each row of a font character.  For an 8x16 font
> that's 1 byte. The maximum fontwidth is 32. A 12x22 font does not pass
> through this function because the width is not a multiple of 8.  So,
> currently, it's used mostly for 8x16 fonts. 
> 
> I already know people using 16x30 fonts. There are probably others bigger
> than that. 
> 
> Of course, we can always use Duff's device to unroll that particular
> loop, but even at 4 bytes, I don't know if it's worth the effort. Does anyone
> know people using 32-pixel-wide fonts?

The console system supports up to 32 pixel wide fonts. Even at that 
maximum size we only copy 4 bytes of data at a time. Unrolling the loop 
is right.



