* [PATCH] Make shr to divide by power of 2
@ 2009-08-06 18:09 Sergey Senozhatsky
2009-08-07 6:50 ` Andi Kleen
0 siblings, 1 reply; 6+ messages in thread
From: Sergey Senozhatsky @ 2009-08-06 18:09 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel
[-- Attachment #1: Type: text/plain, Size: 918 bytes --]
Hello Andrew,
drivers/video/console/bitblit.c
'shr' is used to divide by 8 except 'static void bit_putcs'.
Of course, compiler supposed to do something like
add $0x7,%eax
shr $0x3,%eax
instead of div. Should we rely on this?
============
Make an arithmetic right shift to divide by power of 2.
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
diff --git a/drivers/video/console/bitblit.c b/drivers/video/console/bitblit.c
index 69864b1..b816151 100644
--- a/drivers/video/console/bitblit.c
+++ b/drivers/video/console/bitblit.c
@@ -144,7 +144,7 @@ static void bit_putcs(struct vc_data *vc, struct fb_info *info,
int fg, int bg)
{
struct fb_image image;
- u32 width = (vc->vc_font.width + 7)/8;
+ u32 width = (vc->vc_font.width + 7) >> 3;
u32 cellsize = width * vc->vc_font.height;
u32 maxcnt = info->pixmap.size/cellsize;
u32 scan_align = info->pixmap.scan_align - 1;
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 315 bytes --]
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH] Make shr to divide by power of 2
2009-08-06 18:09 [PATCH] Make shr to divide by power of 2 Sergey Senozhatsky
@ 2009-08-07 6:50 ` Andi Kleen
2009-08-07 7:42 ` Sergey Senozhatsky
2009-08-08 3:09 ` Robert Hancock
0 siblings, 2 replies; 6+ messages in thread
From: Andi Kleen @ 2009-08-07 6:50 UTC (permalink / raw)
To: Sergey Senozhatsky; +Cc: Andrew Morton, linux-kernel
Sergey Senozhatsky <sergey.senozhatsky@gmail.com> writes:
>
> Of course, compiler supposed to do something like
> add $0x7,%eax
> shr $0x3,%eax
> instead of div. Should we rely on this?
Yes. We should rely on this.
-Andi
--
ak@linux.intel.com -- Speaking for myself only.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] Make shr to divide by power of 2
2009-08-07 6:50 ` Andi Kleen
@ 2009-08-07 7:42 ` Sergey Senozhatsky
2009-08-08 3:09 ` Robert Hancock
1 sibling, 0 replies; 6+ messages in thread
From: Sergey Senozhatsky @ 2009-08-07 7:42 UTC (permalink / raw)
To: Andi Kleen; +Cc: Andrew Morton, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 357 bytes --]
On (08/07/09 08:50), Andi Kleen wrote:
> Sergey Senozhatsky <sergey.senozhatsky@gmail.com> writes:
> >
> > Of course, compiler supposed to do something like
> > add $0x7,%eax
> > shr $0x3,%eax
> > instead of div. Should we rely on this?
>
> Yes. We should rely on this.
>
grep -cR '>> 3' bitblit.c
5
and only 1 '/8'.
Sergey
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 315 bytes --]
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] Make shr to divide by power of 2
2009-08-07 6:50 ` Andi Kleen
2009-08-07 7:42 ` Sergey Senozhatsky
@ 2009-08-08 3:09 ` Robert Hancock
2009-08-08 7:35 ` Andi Kleen
1 sibling, 1 reply; 6+ messages in thread
From: Robert Hancock @ 2009-08-08 3:09 UTC (permalink / raw)
To: Andi Kleen; +Cc: Sergey Senozhatsky, Andrew Morton, linux-kernel
On 08/07/2009 12:50 AM, Andi Kleen wrote:
> Sergey Senozhatsky<sergey.senozhatsky@gmail.com> writes:
>> Of course, compiler supposed to do something like
>> add $0x7,%eax
>> shr $0x3,%eax
>> instead of div. Should we rely on this?
>
> Yes. We should rely on this.
It may depend on the selected CPU type that gcc is optimizing for - I
believe that on some P4s the shift may actually be slower than the divide..
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] Make shr to divide by power of 2
2009-08-08 3:09 ` Robert Hancock
@ 2009-08-08 7:35 ` Andi Kleen
2009-08-09 9:40 ` Sergey Senozhatsky
0 siblings, 1 reply; 6+ messages in thread
From: Andi Kleen @ 2009-08-08 7:35 UTC (permalink / raw)
To: Robert Hancock
Cc: Andi Kleen, Sergey Senozhatsky, Andrew Morton, linux-kernel
On Fri, Aug 07, 2009 at 09:09:36PM -0600, Robert Hancock wrote:
> On 08/07/2009 12:50 AM, Andi Kleen wrote:
>> Sergey Senozhatsky<sergey.senozhatsky@gmail.com> writes:
>>> Of course, compiler supposed to do something like
>>> add $0x7,%eax
>>> shr $0x3,%eax
>>> instead of div. Should we rely on this?
>>
>> Yes. We should rely on this.
>
> It may depend on the selected CPU type that gcc is optimizing for - I
> believe that on some P4s the shift may actually be slower than the divide..
DIV should be always slower than a SHIFT.
But it has nothing really to do with the CPU. The point is that the compiler
always selects a suitable one by itself. Rewriting x / 2 to x >> 1 is
one of the easiest exercises in compiler optimizations.
The only case when the compiler cannot do this easily by itself is
when the dividend is not a constant.
That said -Os sometimes screws us up on this, but it's still not worth
doing this change manually.
-Andi
--
ak@linux.intel.com -- Speaking for myself only.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] Make shr to divide by power of 2
2009-08-08 7:35 ` Andi Kleen
@ 2009-08-09 9:40 ` Sergey Senozhatsky
0 siblings, 0 replies; 6+ messages in thread
From: Sergey Senozhatsky @ 2009-08-09 9:40 UTC (permalink / raw)
To: Andi Kleen; +Cc: Robert Hancock, Andrew Morton, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 2086 bytes --]
On (08/08/09 10:22), Robert Hancock wrote:
> Actually, the Intel Architecture Optimization Reference Manual doesn't
> say divide may be faster, but it does say that "On processors based on
> Intel NetBurst microarchitecture, latencies of some instructions are
> relatively significant (including shifts, rotates, integer multiplies,
> and moves from memory with sign extension)." and that "The SHIFT and
> ROTATE instructions have a longer latency on processor with a CPUID
> signature corresponding to family 15 and model encoding of 0, 1, or 2.
> The latency of a sequence of adds will be shorter for left shifts of three or less."
Intel Architecture Optimization Reference Manual does say about latency:
Table C-13a. General Purpose Instructions
Instruction Latency Throughput
IDIV | 11-21 13-23 17-41 22 | 5-13 5-14 12-36 22
SAL/SAR/SHL/SHR | 1 1 1 | 0.33 0.33 0.33
For example,
Table 12-2. Intel® Atom™ Microarchitecture Instructions Latency Data
Instruction Latency Throughput
IDIV r/m8; IDIV r/m16; | 33;42; | 32;41;56;196
IDIV r/m32; IDIV r/m64; | 57;197 |
| |
ROL; ROR; SAL; | 1 | 1
SAR; SHL; SHR | |
*SHLD/SHRD |4;2-11 |3;1-10
On (08/08/09 09:35), Andi Kleen wrote:
> DIV should be always slower than a SHIFT.
>
> But it has nothing really to do with the CPU. The point is that the compiler
> always selects a suitable one by itself. Rewriting x / 2 to x >> 1 is
> one of the easiest exercises in compiler optimizations.
>
> The only case when the compiler cannot do this easily by itself is
> when the dividend is not a constant.
>
int width = (vc->vc_font.width + 7) >> 3;
> That said -Os sometimes screws us up on this, but it's still not worth
> doing this change manually.
>
My point is that it should 'look the same'.
I mean there are 5
int width = (vc->vc_font.width + 7) >> 3;
*not exactly this one, but vc->vc_font.width (+ 7)? >> 3
and _only_ one
int width = (vc->vc_font.width + 7) / 8;
P.S.
Sorry, hit "reply", not "reply to all".
Sergey
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 315 bytes --]
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2009-08-09 9:37 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-08-06 18:09 [PATCH] Make shr to divide by power of 2 Sergey Senozhatsky
2009-08-07 6:50 ` Andi Kleen
2009-08-07 7:42 ` Sergey Senozhatsky
2009-08-08 3:09 ` Robert Hancock
2009-08-08 7:35 ` Andi Kleen
2009-08-09 9:40 ` Sergey Senozhatsky
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox