* Fw: framebuffer blitting performance loss 2.6.12 -> 2.6.13-rc3
@ 2005-07-15 10:39 Andrew Morton
From: Andrew Morton @ 2005-07-15 10:39 UTC
To: linux-fbdev-devel; +Cc: Knut Petersen
Begin forwarded message:
Date: Fri, 15 Jul 2005 12:14:37 +0200
From: Knut Petersen <Knut_Petersen@t-online.de>
To: linux-kernel@vger.kernel.org
Subject: framebuffer blitting performance loss 2.6.12 -> 2.6.13-rc3
Hi everybody!
There is a serious performance loss between 2.6.12 and 2.6.13-rc3
affecting _all_ framebuffer devices, especially those with fast
bitblit functions.
System: Via Epia 5000
CPU: Via Samuel 2, 533MHz
Graphics core: Cyberblade/i1 (Blade 3D core integrated in 8601A)
Framebuffer driver: not yet released, fully accelerated framebuffer
driver cyblafb
Test setup
==========
video mode: 1280x1024, vyres=2662, bpp=8, 8x16 font, ypan scrollmode
kernel 2.6.13-rc3 is compiled with HZ==1000
Measurement 1: Compile framebuffer modules
Result: 2.6.13-rc3 is slightly slower, but this is an almost
invisible performance loss of about 1%
Measurement 2: time cat of file consisting of 2000 empty lines
Result:
                                          | 2.6.12 / 2.6.13-rc3
------------------------------------------+----------------------
total time                                | 0.182s / 0.220s
Measurement 3: time cat of file consisting of 2000 full lines of
160 characters each.
Result:
                                          | 2.6.12 / 2.6.13-rc3
------------------------------------------+----------------------
total time                                | 0.853s / 1.062s
time spent in framebuffer bitblit routine | 0.256s / 0.257s
time spent for kernel bitblit overhead    | 0.426s / 0.623s !!!
other time (scrolling, disk io etc)       | 0.171s / 0.182s
Discussion of measurements
==========================
Compiling the framebuffer modules shows that the general kernel performance is
more or less unchanged between 2.6.12 and 2.6.13-rc3.
Cat-ing of the file consisting of 2000 empty lines takes about 20.9%
more time, cat-ing of the file consisting of 2000 full lines takes about
24% more time.
As the time spent in the bitblit function of the framebuffer driver
does not change, I assume that the data sent to the framebuffer
driver has not changed. But the new kernel routines take about 46% longer
(0.623s instead of 0.426s of kernel bitblit overhead).
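In the Measurement 3 table, the three component rows add up to the total time. The percentages quoted above can therefore be recomputed from the measured values. The C helpers below are a small sketch for checking that arithmetic. Their names are made up for illustration and do not come from the kernel.

```c
/* Percentage by which `after` exceeds `before`. */
static double pct_increase(double before, double after)
{
	return 100.0 * (after - before) / before;
}

/*
 * Kernel-side bitblit overhead as split up in the table above: what
 * remains of the total once the driver's own bitblit time and the
 * "other" time (scrolling, disk I/O, etc.) are subtracted.
 */
static double kernel_overhead(double total, double driver_blit, double other)
{
	return total - driver_blit - other;
}
```

With these helpers, kernel_overhead(0.853, 0.256, 0.171) gives 0.426, and pct_increase(0.426, 0.623) comes out at about 46.2. Likewise, pct_increase(0.182, 0.220) is about 20.9, matching the figures in the text.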
All framebuffer drivers should be affected by this performance loss,
but the faster the bitblit of the framebuffer driver in use, the
more the loss will affect overall performance. You will not see such
a large difference if, e.g., vesafb is used.
Please have a serious look at the changed code of fbcon/fbmem etc
or switch back to the old routines.
cu,
Knut
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
-------------------------------------------------------
SF.Net email is sponsored by: Discover Easy Linux Migration Strategies
from IBM. Find simple to follow Roadmaps, straightforward articles,
informative Webcasts and more! Get everything you need to get up to
speed, fast. http://ads.osdn.com/?ad_id=7477&alloc_id=16492&op=click

* Re: Fw: framebuffer blitting performance loss 2.6.12 -> 2.6.13-rc3
From: Antonino A. Daplas @ 2005-07-22 3:58 UTC
To: linux-fbdev-devel, Andrew Morton; +Cc: Knut Petersen

On Friday 15 July 2005 18:39, Andrew Morton wrote:
> Begin forwarded message:
>
> Date: Fri, 15 Jul 2005 12:14:37 +0200
> From: Knut Petersen <Knut_Petersen@t-online.de>
> To: linux-kernel@vger.kernel.org
> Subject: framebuffer blitting performance loss 2.6.12 -> 2.6.13-rc3
>
>
> Hi everybody!
>
> There is a serious performance loss between 2.6.12 and 2.6.13-rc3
> affecting _all_ framebuffer devices, especially those with fast
> bitblit functions.
>

I haven't seen any significant performance penalty between 2.6.12-rc5-mm1
and 2.6.13-rc3-mm1.

Based on your results, I would pinpoint the culprit to be in
video/console/bitblit.c. However, the changes there are minor and should not
alter the performance.

Tony

* Re: Fw: framebuffer blitting performance loss 2.6.12 -> 2.6.13-rc3
From: Andrew Morton @ 2005-07-29 7:17 UTC
To: linux-fbdev-devel; +Cc: adaplas, Knut_Petersen

"Antonino A. Daplas" <adaplas@gmail.com> wrote:
>
> On Friday 15 July 2005 18:39, Andrew Morton wrote:
> > Begin forwarded message:
> >
> > Date: Fri, 15 Jul 2005 12:14:37 +0200
> > From: Knut Petersen <Knut_Petersen@t-online.de>
> > To: linux-kernel@vger.kernel.org
> > Subject: framebuffer blitting performance loss 2.6.12 -> 2.6.13-rc3
> >
> >
> > Hi everybody!
> >
> > There is a serious performance loss between 2.6.12 and 2.6.13-rc3
> > affecting _all_ framebuffer devices, especially those with fast
> > bitblit functions.
> >
>
> I haven't seen any significant performance penalty, between 2.6.12-rc5-mm1
> and 2.6.13-rc3-mm1.
>
> Based on your results, I would pinpoint the culprit to be in
> video/console/bitblit.c. However, the changes there are minor, and should not
> alter the peformance.
>

So.. what happened here? Is the problem still present in 2.6.13-rc4?

* Re: Fw: framebuffer blitting performance loss 2.6.12 -> 2.6.13-rc3
From: Knut Petersen @ 2005-07-29 14:54 UTC
To: linux-fbdev-devel; +Cc: Andrew Morton, adaplas

Hi everybody!

>>I haven't seen any significant performance penalty, between 2.6.12-rc5-mm1
>>and 2.6.13-rc3-mm1.
>>
>>Based on your results, I would pinpoint the culprit to be in
>>video/console/bitblit.c. However, the changes there are minor, and should not
>>alter the peformance.
>
>So.. what happened here? Is the problem still present in 2.6.13-rc4?

Yes, the problem is still present in 2.6.13-rc4.
================================================

There is only an insignificant difference of at most +/- 2 ms between
2.6.13-rc3 and 2.6.13-rc4 for all measurements.

Test 1: reset; time cat scrolltest0
Test 2: reset; time cat scrolltest80
Test 3: reset; time cat scrolltest160

scrolltest0 is a file with 2000 empty lines.
scrolltest80 is a file with 2000 lines of 80 characters each.
scrolltest160 is a file with 2000 lines of 160 characters each.

The vesafb tests use the original vesafb of the respective kernel version.
The cyblafb tests all use the same source file, with accelerated fillrect,
bitblit and copyarea.

The 2.6.13-rc* kernels are compiled with a 1000 Hz system timer, as was
2.6.12.

chipset: trident cyberblade/i1
video mode: vesa 0x307 (1280x1024@75hz)
font: 8x16

Nothing but the kernel changed between the tests. The time values given
are system time in seconds.
vga=0x307  |       video=vesafb:ypan        |         video=vesafb
           |    test 1    test 2    test 3  |    test 1    test 2    test 3
-----------+--------------------------------+------------------------------
2.6.12     |    3.753s    4.825s    5.936s  |    4.258s   65.645s  126.898s
2.6.13-rc4 |    3.937s    5.135s    6.302s  |    4.304s   71.515s  138.674s
change     |     +4.9%    +6.42%    +6.17%  |    +1.08%    +8.94%    +9.28%

vga=0x307  |         video=cyblafb          |     video=cyblafb:noypan
           |    test 1    test 2    test 3  |    test 1    test 2    test 3
-----------+--------------------------------+------------------------------
2.6.12     |    0.228s    0.549s    0.870s  |    7.692s    8.015s    8.335s
2.6.13-rc4 |    0.235s    0.654s    1.072s  |    7.699s    8.120s    8.549s
change     |    +3.07%   +19.13%   +23.22%  |    +0.09%    +1.31%    +2.57%

The numbers show very clearly that blitting in 2.6.13-rc* is much slower
than in 2.6.12.

For cyblafb, the time spent in the actual blitting is about 257 ms for
test 3. The performance loss in the part before the driver is therefore
above 30%: (1.072s - 0.257s) / (0.870s - 0.257s) = 0.815 / 0.613, i.e.
about 33% more time.

Now for a real-world example: reset; time cat patch-2.6.13-rc4

cyblafb, kernel 2.6.12     : 173.013s
cyblafb, kernel 2.6.13-rc4 : 196.181s
difference                 :  23.168s (+13.4%)

Could anyone take the time to measure the performance of some other
drivers? Those using ypan scrolling and hardware-accelerated bitblit
should be the most affected.

cu,
Knut
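The three test inputs are fully specified by line count and line length, so they are easy to regenerate when repeating the tests. A minimal C sketch follows. The function name and the fill character ('x') are assumptions, because the message does not say which characters the lines contained. In the 1280x1024 mode with an 8x16 font, the console is 1280 / 8 = 160 columns wide, so each line of scrolltest160 fills a whole console row.

```c
#include <stdio.h>

/*
 * Write `lines` lines of `width` copies of `fill`, each followed by '\n',
 * to `path`. width == 0 yields empty lines (like scrolltest0).
 * Returns 0 on success, -1 on any I/O error.
 */
static int write_scrolltest(const char *path, int lines, int width, char fill)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -1;
	for (int i = 0; i < lines; i++) {
		for (int j = 0; j < width; j++)
			if (fputc(fill, f) == EOF)
				goto fail;
		if (fputc('\n', f) == EOF)
			goto fail;
	}
	return fclose(f) == 0 ? 0 : -1;
fail:
	fclose(f);
	return -1;
}
```

For example, write_scrolltest("scrolltest160", 2000, 160, 'x') would produce the Test 3 input: 2000 lines of 161 bytes each, counting the newline. Each test is then run as "reset; time cat scrolltest160" on the framebuffer console.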

* Re: Fw: framebuffer blitting performance loss 2.6.12 -> 2.6.13-rc3
From: Antonino A. Daplas @ 2005-07-29 15:42 UTC
To: Knut Petersen; +Cc: linux-fbdev-devel, Andrew Morton

Knut Petersen wrote:
> Hi everybody!
>
>>> I haven't seen any significant performance penalty, between
>>> 2.6.12-rc5-mm1 and 2.6.13-rc3-mm1.
>>>
>>> Based on your results, I would pinpoint the culprit to be in
>>> video/console/bitblit.c. However, the changes there are minor, and
>>> should not alter the peformance.
>>
>> So.. what happened here? Is the problem still present in 2.6.13-rc4?
>
> Yes, the problem still is present in 2.6.13-rc4.
> ================================================

Thank you for your persistence. I think I know the culprit. Someone
insisted on using memcpy in fb_pad_aligned_buffer(). I have already
fixed this before, but apparently the memcpy was brought back. Try
the attached patch and let me know.

Tony

fbdev: Replace memcpy with for-loop when preparing bitmap

Do not use memcpy in fb_pad_aligned_buffer. It is suboptimal because only
a few bytes are moved at a time. Replace with a for-loop.

From: Antonino Daplas <adaplas@pol.net>
Signed-off-by: Antonino Daplas <adaplas@pol.net>
---

 fbmem.c |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

--- a/drivers/video/fbmem.c
+++ b/drivers/video/fbmem.c
@@ -80,10 +80,12 @@ EXPORT_SYMBOL(fb_get_color_depth);
  */
 void fb_pad_aligned_buffer(u8 *dst, u32 d_pitch, u8 *src, u32 s_pitch, u32 height)
 {
-	int i;
+	int i, j;
 
 	for (i = height; i--; ) {
-		memcpy(dst, src, s_pitch);
+		/* s_pitch is a few bytes at the most, memcpy is suboptimal */
+		for (j = 0; j < s_pitch; j++)
+			dst[j] = src[j];
 		src += s_pitch;
 		dst += d_pitch;
 	}

* Re: Fw: framebuffer blitting performance loss 2.6.12 -> 2.6.13-rc3
From: Andrew Morton @ 2005-07-29 19:02 UTC
To: Antonino A. Daplas; +Cc: Knut_Petersen, linux-fbdev-devel

"Antonino A. Daplas" <adaplas@gmail.com> wrote:
>
> fbdev: Replace memcpy with for-loop when preparing bitmap

Whee, progress. Please let me know if/when you want this sent to Linus.

* Re: Fw: framebuffer blitting performance loss 2.6.12 -> 2.6.13-rc3
From: James Simmons @ 2005-07-29 19:52 UTC
To: linux-fbdev-devel; +Cc: Antonino A. Daplas, Knut_Petersen

> "Antonino A. Daplas" <adaplas@gmail.com> wrote:
> >
> > fbdev: Replace memcpy with for-loop when preparing bitmap
>
> Whee, progress. Please let me know if/when you want this sent to Linus.

Before you do, I'd like to know whether memcpy really is slower than
byte-by-byte copying. This just seems wrong!

* Re: Fw: framebuffer blitting performance loss 2.6.12 -> 2.6.13-rc3
From: James Simmons @ 2005-07-29 19:59 UTC
To: Knut_Petersen; +Cc: Antonino A. Daplas, Linux Fbdev development list, Andrew Morton

Can you do some performance measurements with this patch instead? I have a
theory: I bet that because we didn't include the Linux version of string.h,
we are using the glibc version instead, which is slower. In fact, I bet that
with this patch memcpy will be faster than the byte-by-byte copy. Give it a
try.

--- /usr/src/linus-2.6/drivers/video/fbmem.c	2005-07-28 10:24:11.000000000 -0700
+++ fbmem.c	2005-07-29 12:53:30.000000000 -0700
@@ -15,6 +15,7 @@
 
 #include <linux/module.h>
 #include <linux/types.h>
+#include <linux/string.h>
 #include <linux/errno.h>
 #include <linux/sched.h>
 #include <linux/smp_lock.h>

* Re: Fw: framebuffer blitting performance loss 2.6.12 -> 2.6.13-rc3
From: James Simmons @ 2005-07-29 19:51 UTC
To: linux-fbdev-devel; +Cc: Knut Petersen, Andrew Morton

> Thank you for your persistence. I think I know the culprit. Someone
> insisted on using memcpy in fb_pad_aligned_buffer(). I have already
> fixed this before, but apparently, the memcpy was brought back. Try
> the attached patch and let me know.

Yipes, I did that. The memcpy function is supposed to be optimized for the
platform. See string.h in the include/asm directory. For example, I have
seen the Athlon use the 3DNow instruction set to copy data. Something is
really wrong with memcpy if moving byte by byte is faster!!!!
A lot of drivers use memcpy. If memcpy sucks, then drivers should be
copying byte by byte instead. The question I have is whether this is also
the case for non-Intel platforms. Could someone run the numbers on other
platforms?
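James asks for numbers. The sketch below is a rough user-space harness for comparing the two row-copy strategies. The names pad_memcpy, pad_loop and time_pad are hypothetical. The harness measures the C library's memcpy as compiled in user space, not the kernel's include/asm string.h version, and an optimizing compiler may rewrite either loop. Treat any timings it produces as a hint, not as a kernel measurement. The while (height--) loops do the same work as the kernel's for (i = height; i--; ) loop.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <string.h>
#include <time.h>

typedef void (*pad_fn)(uint8_t *dst, uint32_t d_pitch, const uint8_t *src,
		       uint32_t s_pitch, uint32_t height);

/* Row copy via memcpy, as in the 2.6.13-rc fb_pad_aligned_buffer(). */
static void pad_memcpy(uint8_t *dst, uint32_t d_pitch, const uint8_t *src,
		       uint32_t s_pitch, uint32_t height)
{
	while (height--) {
		memcpy(dst, src, s_pitch);
		src += s_pitch;
		dst += d_pitch;
	}
}

/* Row copy byte by byte, as in the proposed patch. */
static void pad_loop(uint8_t *dst, uint32_t d_pitch, const uint8_t *src,
		     uint32_t s_pitch, uint32_t height)
{
	while (height--) {
		for (uint32_t j = 0; j < s_pitch; j++)
			dst[j] = src[j];
		src += s_pitch;
		dst += d_pitch;
	}
}

/* Elapsed nanoseconds (monotonic clock) for `iters` calls of `fn`. */
static long long time_pad(pad_fn fn, uint8_t *dst, uint32_t d_pitch,
			  const uint8_t *src, uint32_t s_pitch,
			  uint32_t height, long iters)
{
	struct timespec t0, t1;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (long n = 0; n < iters; n++)
		fn(dst, d_pitch, src, s_pitch, height);
	clock_gettime(CLOCK_MONOTONIC, &t1);
	return (long long)(t1.tv_sec - t0.tv_sec) * 1000000000LL
	       + (t1.tv_nsec - t0.tv_nsec);
}
```

For the 8x16 font case discussed in this thread, s_pitch is 1 and height is 16. The destination buffer must hold at least d_pitch * height bytes. Calling time_pad once with pad_memcpy and once with pad_loop, for a few million iterations each, gives a rough comparison on the machine at hand.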

* Re: Fw: framebuffer blitting performance loss 2.6.12 -> 2.6.13-rc3
From: Jon Smirl @ 2005-07-29 20:21 UTC
To: linux-fbdev-devel; +Cc: Knut Petersen, Andrew Morton

On 7/29/05, James Simmons <jsimmons@infradead.org> wrote:
>
> > Thank you for your persistence. I think I know the culprit. Someone
> > insisted on using memcpy in fb_pad_aligned_buffer(). I have already
> > fixed this before, but apparently, the memcpy was brought back. Try
> > the attached patch and let me know.
>
> Yipes, I did that. The memcpy function is suppose to be optimized for the
> platform. See string.h in the include/asm directory. I seen for example
> the Athlon would use the 3DNow instruction set to copy data. Something
> is really wrong with memcpy if moving byte by byte is faster !!!!
> Alot of drivers use memcpy. If memcpy sucks then drivers should be copying
> byte by byte then. The question I have is this the case for non intel
> platforms as well. Could someone run the numbers on other platforms?

memmove/memcpy is faster in general, and memcpy is faster than memmove,
so use it if you can. But there is a lower limit, probably around 16
bytes or so, below which the loop becomes faster. So if you know that you
will always be copying small fragments, use the loop. The compiler can't
decide between a loop and memcpy for you, since it doesn't know the upper
limit on the length; it is forced to use memcpy because you told it so.

For small things it is even better to use a structure assignment if
possible. That lets the compiler decide whether to do a loop or a memcpy,
since the length is known.

In this case, if we could figure out how to give the compiler an upper
bound on the loop, it might decide to unroll it and use multiple moves.

--
Jon Smirl
jonsmirl@gmail.com
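Jon's point is that the compiler should be able to see the copy length. One way to achieve this is to switch on the row width, so that every memcpy call gets a compile-time-constant size. Optimizing compilers commonly expand such calls into plain loads and stores. This is a swapped-in technique, not the upper-bound hint Jon mentions and not code from the thread. The names copy_glyph_row and pad_rows_const are hypothetical. The case list assumes, as Tony explains in his reply, that rows are whole bytes and fonts are at most 32 pixels wide, so s_pitch is 1 to 4.

```c
#include <stdint.h>
#include <string.h>

/*
 * Copy one glyph row. Each case gives memcpy a compile-time-constant
 * size; only unexpected widths fall back to a variable-length call.
 */
static inline void copy_glyph_row(uint8_t *dst, const uint8_t *src,
				  uint32_t s_pitch)
{
	switch (s_pitch) {
	case 1: memcpy(dst, src, 1); break;	/* 8-pixel-wide fonts */
	case 2: memcpy(dst, src, 2); break;	/* 16-pixel-wide fonts */
	case 3: memcpy(dst, src, 3); break;	/* 24-pixel-wide fonts */
	case 4: memcpy(dst, src, 4); break;	/* 32 pixels, the maximum */
	default: memcpy(dst, src, s_pitch); break;
	}
}

/* Same row walk as fb_pad_aligned_buffer(), using the dispatch above. */
static void pad_rows_const(uint8_t *dst, uint32_t d_pitch, const uint8_t *src,
			   uint32_t s_pitch, uint32_t height)
{
	while (height--) {
		copy_glyph_row(dst, src, s_pitch);
		src += s_pitch;
		dst += d_pitch;
	}
}
```

Whether this actually beats the plain byte loop depends on the compiler and the CPU, so it would need the same kind of measurement Knut did.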

* Re: Fw: framebuffer blitting performance loss 2.6.12 -> 2.6.13-rc3
From: Antonino A. Daplas @ 2005-07-29 22:45 UTC
To: linux-fbdev-devel; +Cc: Knut Petersen, Andrew Morton

Jon Smirl wrote:
> On 7/29/05, James Simmons <jsimmons@infradead.org> wrote:
>>> Thank you for your persistence. I think I know the culprit. Someone
>>> insisted on using memcpy in fb_pad_aligned_buffer(). I have already
>>> fixed this before, but apparently, the memcpy was brought back. Try
>>> the attached patch and let me know.
>> Yipes, I did that. The memcpy function is suppose to be optimized for the
>> platform. See string.h in the include/asm directory. I seen for example
>> the Athlon would use the 3DNow instruction set to copy data. Something
>> is really wrong with memcpy if moving byte by byte is faster !!!!
>> Alot of drivers use memcpy. If memcpy sucks then drivers should be copying
>> byte by byte then. The question I have is this the case for non intel
>> platforms as well. Could someone run the numbers on other platforms?
>
> memmove/memcpy is faster. memcpy is faster than memmove so use it if
> you can. But, there is a lower limit probably around 16 bytes or so
> where the loop becomes faster. So if you know that you will always be
> copying small fragments use the loop. The compiler can't decide

Yes, the loop copies each row of a font character. For an 8x16 font
that's 1 byte. The maximum font width is 32. A 12x22 font does not pass
through this function because its width is not a multiple of 8. So,
currently, it's used mostly for 8x16 fonts.

I already know people using 16x30 fonts. There are probably others bigger
than that.

Of course, we can always use Duff's device to unroll that particular
loop, but even at 4 bytes, I don't know if it's worth the effort. Does
anyone know people using 32-pixel-wide fonts?

Tony
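Duff's device, which Tony mentions, unrolls a copy loop by jumping into the middle of an unrolled body. The sketch below shows the memory-to-memory variant; Duff's original wrote every byte to a single output register. It also includes a hypothetical row wrapper. The names duff_copy and pad_rows_duff are illustrative, and neither function is code from the thread.

```c
#include <stdint.h>

/* Duff's device: copy n bytes with the loop body unrolled four times. */
static void duff_copy(uint8_t *dst, const uint8_t *src, uint32_t n)
{
	uint32_t rounds;

	if (n == 0)
		return;
	rounds = (n + 3) / 4;		/* passes through the unrolled body */
	switch (n % 4) {		/* jump in to handle the remainder first */
	case 0:	do {	*dst++ = *src++;
	case 3:		*dst++ = *src++;
	case 2:		*dst++ = *src++;
	case 1:		*dst++ = *src++;
		} while (--rounds > 0);
	}
}

/* fb_pad_aligned_buffer()-style row walk using duff_copy() per row. */
static void pad_rows_duff(uint8_t *dst, uint32_t d_pitch, const uint8_t *src,
			  uint32_t s_pitch, uint32_t height)
{
	while (height--) {
		duff_copy(dst, src, s_pitch);
		src += s_pitch;
		dst += d_pitch;
	}
}
```

For rows of at most 4 bytes, the do/while body runs exactly once. The construct then reduces to a single computed jump into at most four byte moves, so the gain over the plain loop may well be small, in line with Tony's doubt.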

* Re: Fw: framebuffer blitting performance loss 2.6.12 -> 2.6.13-rc3
From: James Simmons @ 2005-08-03 17:29 UTC
To: linux-fbdev-devel; +Cc: Knut Petersen, Andrew Morton

> Yes, the loop copies each row of a font character. For an 8x16 font
> that's 1 byte. The maximum fontwidth is 32. A 12x22 font does not pass
> through this function because the width is not a multiple of 8. So,
> currently, it's used mostly for 8x16 fonts.
>
> I already know people using 16x30 fonts. There are probably others bigger
> than that.
>
> Of course, we can always use Duff's version to loop-unroll that particular
> section, but even at 4 bytes, I don't know if it's worth the effort. Anyone
> knows people using 32 wide fonts?

The console system supports fonts up to 32 pixels wide. Even at that
maximum size we only copy 4 bytes of data at a time. Unrolling the loop
is the right approach.

* Re: Fw: framebuffer blitting performance loss 2.6.12 -> 2.6.13-rc3
From: Luca @ 2005-07-29 22:45 UTC
To: linux-fbdev-devel

On Fri, Jul 29, 2005 at 08:51:34PM +0100, James Simmons wrote:
> > Thank you for your persistence. I think I know the culprit. Someone
> > insisted on using memcpy in fb_pad_aligned_buffer(). I have already
> > fixed this before, but apparently, the memcpy was brought back. Try
> > the attached patch and let me know.
>
> Yipes, I did that. The memcpy function is suppose to be optimized for the
> platform. See string.h in the include/asm directory. I seen for example
> the Athlon would use the 3DNow instruction set to copy data. Something
> is really wrong with memcpy if moving byte by byte is faster !!!!

For small copies, MMX/3DNow is not used at all. In the current kernel,
the MMX/3DNow memcpy is used only when the data size is greater than 512
bytes. Remember that MMX/3DNow uses the FPU, so the kernel must save and
restore the FPU state. That overhead would make the copy slow for small
chunks.

Luca
--
Home: http://kronoz.cjb.net
If a man's destiny is to drown, he will drown even in a glass of water.
(Yiddish proverb)
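Luca's point is that the MMX/3DNow path sits behind a size threshold. The sketch below shows that kind of dispatch. The 512-byte cutoff is Luca's figure. The names choose_copy_path, simd_copy and threshold_copy are hypothetical, and simd_copy simply calls memcpy so that the example runs anywhere.

```c
#include <stddef.h>
#include <string.h>

/* Luca's figure: the MMX/3DNow copy is only used above 512 bytes. */
#define SIMD_MIN_BYTES 512

enum copy_path { COPY_PLAIN, COPY_SIMD };

static enum copy_path choose_copy_path(size_t len)
{
	return len > SIMD_MIN_BYTES ? COPY_SIMD : COPY_PLAIN;
}

/*
 * Hypothetical stand-in for an MMX/3DNow copy. A kernel version would have
 * to save and restore FPU state around it (kernel_fpu_begin() and
 * kernel_fpu_end() on x86); that fixed cost is what makes it a loss for
 * small buffers. Here it just calls memcpy.
 */
static void simd_copy(void *dst, const void *src, size_t len)
{
	memcpy(dst, src, len);
}

static void *threshold_copy(void *dst, const void *src, size_t len)
{
	if (choose_copy_path(len) == COPY_SIMD) {
		simd_copy(dst, src, len);
	} else {
		unsigned char *d = dst;
		const unsigned char *s = src;

		while (len--)		/* small copy: plain byte moves */
			*d++ = *s++;
	}
	return dst;
}
```

With a dispatch like this, a 1- to 4-byte glyph row always takes the plain path. Any SIMD tuning in memcpy would therefore never come into play for fb_pad_aligned_buffer().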

* Re: Fw: framebuffer blitting performance loss 2.6.12 -> 2.6.13-rc3
From: Knut Petersen @ 2005-07-29 20:10 UTC
To: linux-fbdev-devel

Hi Tony,

>
> Thank you for your persistence. I think I know the culprit. Someone
> insisted on using memcpy in fb_pad_aligned_buffer(). I have already
> fixed this before, but apparently, the memcpy was brought back. Try
> the attached patch and let me know.
>
> Tony

Replacing memcpy() with this inline code helps. Performance is slightly
lower than in 2.6.12, but the difference is hardly measurable and could be
caused by other changes in the kernel.

The most affected test (test 3, cyblafb, ypan) is now about 7 ms slower
than it was in 2.6.12. Without your patch the performance penalty was
202 ms!

Yes, please send the patch to Linus asap; it's a must for 2.6.13.

Someone should look at memcpy ;-)

cu,
Knut