All of lore.kernel.org
 help / color / mirror / Atom feed
From: Knut Petersen <Knut_Petersen@t-online.de>
To: Roman Zippel <zippel@linux-m68k.org>
Cc: Andrew Morton <akpm@osdl.org>,
	linux-fbdev-devel@lists.sourceforge.net,
	"Antonino A. Daplas" <adaplas@gmail.com>,
	Linux Kernel Development <linux-kernel@vger.kernel.org>,
	Jochen Hein <jochen@jochen.org>,
	Geert Uytterhoeven <geert@linux-m68k.org>
Subject: Re: [Linux-fbdev-devel] [PATCH 1/1 2.6.13] framebuffer: bit_putcs() optimization for 8x* fonts
Date: Wed, 31 Aug 2005 21:19:40 +0200	[thread overview]
Message-ID: <431602CC.1030008@t-online.de> (raw)
In-Reply-To: <Pine.LNX.4.61.0508311750140.3728@scrub.home>

  Hi Roman!

>>+static inline void __fb_pad_aligned_buffer(u8 *dst, u32 d_pitch, u8 *src, +
>>u32 s_pitch, u32 height)
>>+{
>>+	int i, j;
>>+
>>+	if (likely(s_pitch==1))
>>+		for(i=0; i < height; i++)
>>+			dst[d_pitch*i] = src[i];
>>    
>>

I added the multiply back because gcc (v. 3.3.4) does generate the 
fastest code
if I write it this way. I compiled, inspected the generated assembly and 
benchmarked
about a  dozend variations of the code (benchmark as previously described).

The special case for s_pitch == 1 saves about 10 ms system time (770 ms 
-> 760 ms)

The special case for s_pitch == 2 saves about 270 ms system time (2120 
-> 1850ms)
with a 16x30 font.

The third case is for even bigger fonts ... I believe that it will not 
often be used but
something like that must be present.

>You have now 3 slightly different variants of the same, which isn't really 
>an improvement. In my example I showed you how to generate the first and 
>last version from the same source.
>
The first version will only be generated when gcc can be sure that 
s_pitch is 1.
Therefore you had to explicitly call __fb_pad_aligned_buffer with that 
value:

	if (likely(idx == 1))
		__fb_pad_aligned_buffer(dst, pitch, src, 1, image.height);
	else
		fb_pad_aligned_buffer(dst, pitch, src, idx, image.height);
 
With the version I propose it큦 enough to write

		__fb_pad_aligned_buffer(dst, pitch, src, idx, image.height);

instead, and you will get good performance for all cases. If the value of
idx/s_pitch is know at compile time, the compiler can and will ignore the
other cases.

>If you also want to optimize for other sizes, you might want to always 
>inline the function, if the function call overhead is the largest part 
>anyway, the special case for 2 bytes might not be needed anymore.
>  
>
fb_pad_aligned_buffer() is  usefull to save some space in cases like 
softcursor.
It큦 also used by some  drivers (nvidia and riva), but the authors of 
those drivers
have to decide if they prefer the inlined version or the version fixed 
in fbmem.

>BTW this version saves another condition:
>
>static inline void __fb_pad_aligned_buffer(u8 *dst, u32 d_pitch, u8 *src, u32 s_pitch, u32 height)
>{
>	int i, j;
>
>	d_pitch -= s_pitch;
>	i = height;
>	do {
>		/* s_pitch is a few bytes at the most, memcpy is suboptimal */
>		j = s_pitch;
>		do
>			*dst++ = *src++;
>		while (--j > 0);
>		dst += d_pitch;
>	} while (--i > 0);
>}
>
I tested that code, together with the followig code in but_putcs():

             if(idx==1)
                    __fb_pad_aligned_buffer(dst, pitch, src, 1, 
image.height);
             else if (idx==2)
                    __fb_pad_aligned_buffer(dst, pitch, src, 2, 
image.height);
             else
                    fb_pad_aligned_buffer(dst, pitch, src, idx, 
image.height);
             dst += width;

It큦 as fast/slow as your previous version, the measurements are almost 
identical.

Let큦 summarize:

Your version of __fb_pad_aligned_buffer looks much better, but it needs 
not so nice
conditionals when used. My version looks bad, but it is easier to use 
and it is
_faster_.

cu,
 knut

WARNING: multiple messages have this Message-ID (diff)
From: Knut Petersen <Knut_Petersen@t-online.de>
To: Roman Zippel <zippel@linux-m68k.org>
Cc: Andrew Morton <akpm@osdl.org>,
	linux-fbdev-devel@lists.sourceforge.net,
	"Antonino A. Daplas" <adaplas@gmail.com>,
	Linux Kernel Development <linux-kernel@vger.kernel.org>,
	Jochen Hein <jochen@jochen.org>,
	Geert Uytterhoeven <geert@linux-m68k.org>
Subject: Re: [Linux-fbdev-devel] [PATCH 1/1 2.6.13] framebuffer: bit_putcs() optimization for 8x* fonts
Date: Wed, 31 Aug 2005 21:19:40 +0200	[thread overview]
Message-ID: <431602CC.1030008@t-online.de> (raw)
In-Reply-To: <Pine.LNX.4.61.0508311750140.3728@scrub.home>

  Hi Roman!

>>+static inline void __fb_pad_aligned_buffer(u8 *dst, u32 d_pitch, u8 *src, +
>>u32 s_pitch, u32 height)
>>+{
>>+	int i, j;
>>+
>>+	if (likely(s_pitch==1))
>>+		for(i=0; i < height; i++)
>>+			dst[d_pitch*i] = src[i];
>>    
>>

I added the multiply back because gcc (v. 3.3.4) does generate the 
fastest code
if I write it this way. I compiled, inspected the generated assembly and 
benchmarked
about a  dozend variations of the code (benchmark as previously described).

The special case for s_pitch == 1 saves about 10 ms system time (770 ms 
-> 760 ms)

The special case for s_pitch == 2 saves about 270 ms system time (2120 
-> 1850ms)
with a 16x30 font.

The third case is for even bigger fonts ... I believe that it will not 
often be used but
something like that must be present.

>You have now 3 slightly different variants of the same, which isn't really 
>an improvement. In my example I showed you how to generate the first and 
>last version from the same source.
>
The first version will only be generated when gcc can be sure that 
s_pitch is 1.
Therefore you had to explicitly call __fb_pad_aligned_buffer with that 
value:

	if (likely(idx == 1))
		__fb_pad_aligned_buffer(dst, pitch, src, 1, image.height);
	else
		fb_pad_aligned_buffer(dst, pitch, src, idx, image.height);
 
With the version I propose it´s enough to write

		__fb_pad_aligned_buffer(dst, pitch, src, idx, image.height);

instead, and you will get good performance for all cases. If the value of
idx/s_pitch is know at compile time, the compiler can and will ignore the
other cases.

>If you also want to optimize for other sizes, you might want to always 
>inline the function, if the function call overhead is the largest part 
>anyway, the special case for 2 bytes might not be needed anymore.
>  
>
fb_pad_aligned_buffer() is  usefull to save some space in cases like 
softcursor.
It´s also used by some  drivers (nvidia and riva), but the authors of 
those drivers
have to decide if they prefer the inlined version or the version fixed 
in fbmem.

>BTW this version saves another condition:
>
>static inline void __fb_pad_aligned_buffer(u8 *dst, u32 d_pitch, u8 *src, u32 s_pitch, u32 height)
>{
>	int i, j;
>
>	d_pitch -= s_pitch;
>	i = height;
>	do {
>		/* s_pitch is a few bytes at the most, memcpy is suboptimal */
>		j = s_pitch;
>		do
>			*dst++ = *src++;
>		while (--j > 0);
>		dst += d_pitch;
>	} while (--i > 0);
>}
>
I tested that code, together with the followig code in but_putcs():

             if(idx==1)
                    __fb_pad_aligned_buffer(dst, pitch, src, 1, 
image.height);
             else if (idx==2)
                    __fb_pad_aligned_buffer(dst, pitch, src, 2, 
image.height);
             else
                    fb_pad_aligned_buffer(dst, pitch, src, idx, 
image.height);
             dst += width;

It´s as fast/slow as your previous version, the measurements are almost 
identical.

Let´s summarize:

Your version of __fb_pad_aligned_buffer looks much better, but it needs 
not so nice
conditionals when used. My version looks bad, but it is easier to use 
and it is
_faster_.

cu,
 knut




  reply	other threads:[~2005-08-31 19:19 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2005-08-30 16:15 [PATCH 1/1 2.6.13] framebuffer: bit_putcs() optimization for 8x* fonts Knut Petersen
2005-08-30 16:18 ` Geert Uytterhoeven
2005-08-30 16:18   ` [Linux-fbdev-devel] " Geert Uytterhoeven
2005-08-30 17:58   ` Knut Petersen
2005-08-30 17:58     ` [Linux-fbdev-devel] " Knut Petersen
2005-08-30 19:13     ` Roman Zippel
2005-08-30 19:13       ` [Linux-fbdev-devel] " Roman Zippel
2005-08-30 22:26       ` Knut Petersen
2005-08-30 22:26         ` [Linux-fbdev-devel] " Knut Petersen
2005-08-31  0:51         ` Roman Zippel
2005-08-31  6:42           ` Antonino A. Daplas
2005-08-31 15:49             ` Roman Zippel
2005-08-31 15:49               ` [Linux-fbdev-devel] " Roman Zippel
2005-08-31 12:46           ` Knut Petersen
2005-08-31 12:46             ` [Linux-fbdev-devel] " Knut Petersen
2005-08-31 17:15             ` Roman Zippel
2005-08-31 19:19               ` Knut Petersen [this message]
2005-08-31 19:19                 ` Knut Petersen
2005-08-31 19:34                 ` Roman Zippel
2005-08-31 19:52                   ` Knut Petersen
2005-08-30 19:59     ` Geert Uytterhoeven
2005-08-30 19:59       ` Geert Uytterhoeven
2005-08-31  1:14     ` Antonino A. Daplas
2005-08-31  1:14 ` Antonino A. Daplas

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=431602CC.1030008@t-online.de \
    --to=knut_petersen@t-online.de \
    --cc=adaplas@gmail.com \
    --cc=akpm@osdl.org \
    --cc=geert@linux-m68k.org \
    --cc=jochen@jochen.org \
    --cc=linux-fbdev-devel@lists.sourceforge.net \
    --cc=linux-kernel@vger.kernel.org \
    --cc=zippel@linux-m68k.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.