From: Knut Petersen <Knut_Petersen@t-online.de>
To: Roman Zippel <zippel@linux-m68k.org>
Cc: Andrew Morton <akpm@osdl.org>,
linux-fbdev-devel@lists.sourceforge.net,
"Antonino A. Daplas" <adaplas@gmail.com>,
Linux Kernel Development <linux-kernel@vger.kernel.org>,
Jochen Hein <jochen@jochen.org>,
Geert Uytterhoeven <geert@linux-m68k.org>
Subject: Re: [Linux-fbdev-devel] [PATCH 1/1 2.6.13] framebuffer: bit_putcs() optimization for 8x* fonts
Date: Wed, 31 Aug 2005 21:19:40 +0200 [thread overview]
Message-ID: <431602CC.1030008@t-online.de> (raw)
In-Reply-To: <Pine.LNX.4.61.0508311750140.3728@scrub.home>
Hi Roman!
>>+static inline void __fb_pad_aligned_buffer(u8 *dst, u32 d_pitch, u8 *src, +
>>u32 s_pitch, u32 height)
>>+{
>>+ int i, j;
>>+
>>+ if (likely(s_pitch==1))
>>+ for(i=0; i < height; i++)
>>+ dst[d_pitch*i] = src[i];
>>
>>
I added the multiply back because gcc (v. 3.3.4) does generate the
fastest code
if I write it this way. I compiled, inspected the generated assembly and
benchmarked
about a dozend variations of the code (benchmark as previously described).
The special case for s_pitch == 1 saves about 10 ms system time (770 ms
-> 760 ms)
The special case for s_pitch == 2 saves about 270 ms system time (2120
-> 1850ms)
with a 16x30 font.
The third case is for even bigger fonts ... I believe that it will not
often be used but
something like that must be present.
>You have now 3 slightly different variants of the same, which isn't really
>an improvement. In my example I showed you how to generate the first and
>last version from the same source.
>
The first version will only be generated when gcc can be sure that
s_pitch is 1.
Therefore you had to explicitly call __fb_pad_aligned_buffer with that
value:
if (likely(idx == 1))
__fb_pad_aligned_buffer(dst, pitch, src, 1, image.height);
else
fb_pad_aligned_buffer(dst, pitch, src, idx, image.height);
With the version I propose it큦 enough to write
__fb_pad_aligned_buffer(dst, pitch, src, idx, image.height);
instead, and you will get good performance for all cases. If the value of
idx/s_pitch is know at compile time, the compiler can and will ignore the
other cases.
>If you also want to optimize for other sizes, you might want to always
>inline the function, if the function call overhead is the largest part
>anyway, the special case for 2 bytes might not be needed anymore.
>
>
fb_pad_aligned_buffer() is usefull to save some space in cases like
softcursor.
It큦 also used by some drivers (nvidia and riva), but the authors of
those drivers
have to decide if they prefer the inlined version or the version fixed
in fbmem.
>BTW this version saves another condition:
>
>static inline void __fb_pad_aligned_buffer(u8 *dst, u32 d_pitch, u8 *src, u32 s_pitch, u32 height)
>{
> int i, j;
>
> d_pitch -= s_pitch;
> i = height;
> do {
> /* s_pitch is a few bytes at the most, memcpy is suboptimal */
> j = s_pitch;
> do
> *dst++ = *src++;
> while (--j > 0);
> dst += d_pitch;
> } while (--i > 0);
>}
>
I tested that code, together with the followig code in but_putcs():
if(idx==1)
__fb_pad_aligned_buffer(dst, pitch, src, 1,
image.height);
else if (idx==2)
__fb_pad_aligned_buffer(dst, pitch, src, 2,
image.height);
else
fb_pad_aligned_buffer(dst, pitch, src, idx,
image.height);
dst += width;
It큦 as fast/slow as your previous version, the measurements are almost
identical.
Let큦 summarize:
Your version of __fb_pad_aligned_buffer looks much better, but it needs
not so nice
conditionals when used. My version looks bad, but it is easier to use
and it is
_faster_.
cu,
knut
WARNING: multiple messages have this Message-ID (diff)
From: Knut Petersen <Knut_Petersen@t-online.de>
To: Roman Zippel <zippel@linux-m68k.org>
Cc: Andrew Morton <akpm@osdl.org>,
linux-fbdev-devel@lists.sourceforge.net,
"Antonino A. Daplas" <adaplas@gmail.com>,
Linux Kernel Development <linux-kernel@vger.kernel.org>,
Jochen Hein <jochen@jochen.org>,
Geert Uytterhoeven <geert@linux-m68k.org>
Subject: Re: [Linux-fbdev-devel] [PATCH 1/1 2.6.13] framebuffer: bit_putcs() optimization for 8x* fonts
Date: Wed, 31 Aug 2005 21:19:40 +0200 [thread overview]
Message-ID: <431602CC.1030008@t-online.de> (raw)
In-Reply-To: <Pine.LNX.4.61.0508311750140.3728@scrub.home>
Hi Roman!
>>+static inline void __fb_pad_aligned_buffer(u8 *dst, u32 d_pitch, u8 *src, +
>>u32 s_pitch, u32 height)
>>+{
>>+ int i, j;
>>+
>>+ if (likely(s_pitch==1))
>>+ for(i=0; i < height; i++)
>>+ dst[d_pitch*i] = src[i];
>>
>>
I added the multiply back because gcc (v. 3.3.4) does generate the
fastest code
if I write it this way. I compiled, inspected the generated assembly and
benchmarked
about a dozend variations of the code (benchmark as previously described).
The special case for s_pitch == 1 saves about 10 ms system time (770 ms
-> 760 ms)
The special case for s_pitch == 2 saves about 270 ms system time (2120
-> 1850ms)
with a 16x30 font.
The third case is for even bigger fonts ... I believe that it will not
often be used but
something like that must be present.
>You have now 3 slightly different variants of the same, which isn't really
>an improvement. In my example I showed you how to generate the first and
>last version from the same source.
>
The first version will only be generated when gcc can be sure that
s_pitch is 1.
Therefore you had to explicitly call __fb_pad_aligned_buffer with that
value:
if (likely(idx == 1))
__fb_pad_aligned_buffer(dst, pitch, src, 1, image.height);
else
fb_pad_aligned_buffer(dst, pitch, src, idx, image.height);
With the version I propose it´s enough to write
__fb_pad_aligned_buffer(dst, pitch, src, idx, image.height);
instead, and you will get good performance for all cases. If the value of
idx/s_pitch is know at compile time, the compiler can and will ignore the
other cases.
>If you also want to optimize for other sizes, you might want to always
>inline the function, if the function call overhead is the largest part
>anyway, the special case for 2 bytes might not be needed anymore.
>
>
fb_pad_aligned_buffer() is usefull to save some space in cases like
softcursor.
It´s also used by some drivers (nvidia and riva), but the authors of
those drivers
have to decide if they prefer the inlined version or the version fixed
in fbmem.
>BTW this version saves another condition:
>
>static inline void __fb_pad_aligned_buffer(u8 *dst, u32 d_pitch, u8 *src, u32 s_pitch, u32 height)
>{
> int i, j;
>
> d_pitch -= s_pitch;
> i = height;
> do {
> /* s_pitch is a few bytes at the most, memcpy is suboptimal */
> j = s_pitch;
> do
> *dst++ = *src++;
> while (--j > 0);
> dst += d_pitch;
> } while (--i > 0);
>}
>
I tested that code, together with the followig code in but_putcs():
if(idx==1)
__fb_pad_aligned_buffer(dst, pitch, src, 1,
image.height);
else if (idx==2)
__fb_pad_aligned_buffer(dst, pitch, src, 2,
image.height);
else
fb_pad_aligned_buffer(dst, pitch, src, idx,
image.height);
dst += width;
It´s as fast/slow as your previous version, the measurements are almost
identical.
Let´s summarize:
Your version of __fb_pad_aligned_buffer looks much better, but it needs
not so nice
conditionals when used. My version looks bad, but it is easier to use
and it is
_faster_.
cu,
knut
next prev parent reply other threads:[~2005-08-31 19:19 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2005-08-30 16:15 [PATCH 1/1 2.6.13] framebuffer: bit_putcs() optimization for 8x* fonts Knut Petersen
2005-08-30 16:18 ` Geert Uytterhoeven
2005-08-30 16:18 ` [Linux-fbdev-devel] " Geert Uytterhoeven
2005-08-30 17:58 ` Knut Petersen
2005-08-30 17:58 ` [Linux-fbdev-devel] " Knut Petersen
2005-08-30 19:13 ` Roman Zippel
2005-08-30 19:13 ` [Linux-fbdev-devel] " Roman Zippel
2005-08-30 22:26 ` Knut Petersen
2005-08-30 22:26 ` [Linux-fbdev-devel] " Knut Petersen
2005-08-31 0:51 ` Roman Zippel
2005-08-31 6:42 ` Antonino A. Daplas
2005-08-31 15:49 ` Roman Zippel
2005-08-31 15:49 ` [Linux-fbdev-devel] " Roman Zippel
2005-08-31 12:46 ` Knut Petersen
2005-08-31 12:46 ` [Linux-fbdev-devel] " Knut Petersen
2005-08-31 17:15 ` Roman Zippel
2005-08-31 19:19 ` Knut Petersen [this message]
2005-08-31 19:19 ` Knut Petersen
2005-08-31 19:34 ` Roman Zippel
2005-08-31 19:52 ` Knut Petersen
2005-08-30 19:59 ` Geert Uytterhoeven
2005-08-30 19:59 ` Geert Uytterhoeven
2005-08-31 1:14 ` Antonino A. Daplas
2005-08-31 1:14 ` Antonino A. Daplas
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=431602CC.1030008@t-online.de \
--to=knut_petersen@t-online.de \
--cc=adaplas@gmail.com \
--cc=akpm@osdl.org \
--cc=geert@linux-m68k.org \
--cc=jochen@jochen.org \
--cc=linux-fbdev-devel@lists.sourceforge.net \
--cc=linux-kernel@vger.kernel.org \
--cc=zippel@linux-m68k.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.