public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] Make shr to divide by power of 2
@ 2009-08-06 18:09 Sergey Senozhatsky
  2009-08-07  6:50 ` Andi Kleen
  0 siblings, 1 reply; 6+ messages in thread
From: Sergey Senozhatsky @ 2009-08-06 18:09 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 918 bytes --]

Hello Andrew,

drivers/video/console/bitblit.c
'shr' is used to divide by 8 except 'static void bit_putcs'.

Of course, compiler supposed to do something like
	add    $0x7,%eax
	shr    $0x3,%eax
instead of div. Should we rely on this?
============

Make an arithmetic right shift to divide by power of 2.

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
diff --git a/drivers/video/console/bitblit.c b/drivers/video/console/bitblit.c
index 69864b1..b816151 100644
--- a/drivers/video/console/bitblit.c
+++ b/drivers/video/console/bitblit.c
@@ -144,7 +144,7 @@ static void bit_putcs(struct vc_data *vc, struct fb_info *info,
 		      int fg, int bg)
 {
 	struct fb_image image;
-	u32 width = (vc->vc_font.width + 7)/8;
+	u32 width = (vc->vc_font.width + 7) >> 3;
 	u32 cellsize = width * vc->vc_font.height;
 	u32 maxcnt = info->pixmap.size/cellsize;
 	u32 scan_align = info->pixmap.scan_align - 1;

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 315 bytes --]

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] Make shr to divide by power of 2
  2009-08-06 18:09 [PATCH] Make shr to divide by power of 2 Sergey Senozhatsky
@ 2009-08-07  6:50 ` Andi Kleen
  2009-08-07  7:42   ` Sergey Senozhatsky
  2009-08-08  3:09   ` Robert Hancock
  0 siblings, 2 replies; 6+ messages in thread
From: Andi Kleen @ 2009-08-07  6:50 UTC (permalink / raw)
  To: Sergey Senozhatsky; +Cc: Andrew Morton, linux-kernel

Sergey Senozhatsky <sergey.senozhatsky@gmail.com> writes:
>
> Of course, compiler supposed to do something like
> 	add    $0x7,%eax
> 	shr    $0x3,%eax
> instead of div. Should we rely on this?

Yes. We should rely on this. 

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] Make shr to divide by power of 2
  2009-08-07  6:50 ` Andi Kleen
@ 2009-08-07  7:42   ` Sergey Senozhatsky
  2009-08-08  3:09   ` Robert Hancock
  1 sibling, 0 replies; 6+ messages in thread
From: Sergey Senozhatsky @ 2009-08-07  7:42 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Andrew Morton, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 357 bytes --]

On (08/07/09 08:50), Andi Kleen wrote:
> Sergey Senozhatsky <sergey.senozhatsky@gmail.com> writes:
> >
> > Of course, compiler supposed to do something like
> > 	add    $0x7,%eax
> > 	shr    $0x3,%eax
> > instead of div. Should we rely on this?
> 
> Yes. We should rely on this. 
> 

grep -cR '>> 3' bitblit.c 
5

and only 1 '/8'.

	Sergey

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 315 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] Make shr to divide by power of 2
  2009-08-07  6:50 ` Andi Kleen
  2009-08-07  7:42   ` Sergey Senozhatsky
@ 2009-08-08  3:09   ` Robert Hancock
  2009-08-08  7:35     ` Andi Kleen
  1 sibling, 1 reply; 6+ messages in thread
From: Robert Hancock @ 2009-08-08  3:09 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Sergey Senozhatsky, Andrew Morton, linux-kernel

On 08/07/2009 12:50 AM, Andi Kleen wrote:
> Sergey Senozhatsky<sergey.senozhatsky@gmail.com>  writes:
>> Of course, compiler supposed to do something like
>> 	add    $0x7,%eax
>> 	shr    $0x3,%eax
>> instead of div. Should we rely on this?
>
> Yes. We should rely on this.

It may depend on the selected CPU type that gcc is optimizing for - I 
believe that on some P4s the shift may actually be slower than the divide..

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] Make shr to divide by power of 2
  2009-08-08  3:09   ` Robert Hancock
@ 2009-08-08  7:35     ` Andi Kleen
  2009-08-09  9:40       ` Sergey Senozhatsky
  0 siblings, 1 reply; 6+ messages in thread
From: Andi Kleen @ 2009-08-08  7:35 UTC (permalink / raw)
  To: Robert Hancock
  Cc: Andi Kleen, Sergey Senozhatsky, Andrew Morton, linux-kernel

On Fri, Aug 07, 2009 at 09:09:36PM -0600, Robert Hancock wrote:
> On 08/07/2009 12:50 AM, Andi Kleen wrote:
>> Sergey Senozhatsky<sergey.senozhatsky@gmail.com>  writes:
>>> Of course, compiler supposed to do something like
>>> 	add    $0x7,%eax
>>> 	shr    $0x3,%eax
>>> instead of div. Should we rely on this?
>>
>> Yes. We should rely on this.
>
> It may depend on the selected CPU type that gcc is optimizing for - I 
> believe that on some P4s the shift may actually be slower than the divide..

DIV should be always slower than a SHIFT.

But it has nothing really to do with the CPU. The point is that the compiler
always selects a suitable one by itself. Rewriting x / 2 to x >> 1 is
one of the easiest exercises in compiler optimizations.

The only case when the compiler cannot do this easily by itself is 
when the dividend is not a constant.

That said -Os sometimes screws us up on this, but it's still not worth
doing this change manually.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] Make shr to divide by power of 2
  2009-08-08  7:35     ` Andi Kleen
@ 2009-08-09  9:40       ` Sergey Senozhatsky
  0 siblings, 0 replies; 6+ messages in thread
From: Sergey Senozhatsky @ 2009-08-09  9:40 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Robert Hancock, Andrew Morton, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2086 bytes --]

On (08/08/09 10:22), Robert Hancock wrote:
> Actually, the Intel Architecture Optimization Reference Manual doesn't
> say divide may be faster, but it does say that "On processors based on
> Intel NetBurst microarchitecture, latencies of some instructions are
> relatively significant (including shifts, rotates, integer multiplies,
> and moves from memory with sign extension)." and that "The SHIFT and
> ROTATE instructions have a longer latency on processor with a CPUID
> signature corresponding to family 15 and model encoding of 0, 1, or 2.
> The latency of a sequence of adds will be shorter for left shifts of three or less."

Intel Architecture Optimization Reference Manual does say about latency:

Table C-13a. General Purpose Instructions
Instruction		Latency				Throughput
IDIV 		| 11-21	13-23	17-41	22	| 5-13	5-14	12-36	22
SAL/SAR/SHL/SHR	| 1	1	1		| 0.33	0.33	0.33

For example,
Table 12-2. Intel® Atom™ Microarchitecture Instructions Latency Data
Instruction		Latency		Throughput
IDIV r/m8; IDIV r/m16;	| 33;42;	| 32;41;56;196
IDIV r/m32; IDIV r/m64;	| 57;197	|
			|		|
ROL; ROR; SAL; 		| 1		| 1
SAR; SHL; SHR		|		|

*SHLD/SHRD		|4;2-11	|3;1-10



On (08/08/09 09:35), Andi Kleen wrote:
> DIV should be always slower than a SHIFT.
>
> But it has nothing really to do with the CPU. The point is that the compiler
> always selects a suitable one by itself. Rewriting x / 2 to x >> 1 is
> one of the easiest exercises in compiler optimizations.
>
> The only case when the compiler cannot do this easily by itself is
> when the dividend is not a constant.
>

        int width = (vc->vc_font.width + 7) >> 3;

> That said -Os sometimes screws us up on this, but it's still not worth
> doing this change manually.
>

My point is that it should 'look the same'.
I mean there are 5
        int width = (vc->vc_font.width + 7) >> 3;
*not exactly this one, but vc->vc_font.width (+ 7)? >> 3

and _only_ one
        int width = (vc->vc_font.width + 7) / 8;

P.S.
Sorry, hit "reply", not "reply to all".

        Sergey

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 315 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2009-08-09  9:37 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-08-06 18:09 [PATCH] Make shr to divide by power of 2 Sergey Senozhatsky
2009-08-07  6:50 ` Andi Kleen
2009-08-07  7:42   ` Sergey Senozhatsky
2009-08-08  3:09   ` Robert Hancock
2009-08-08  7:35     ` Andi Kleen
2009-08-09  9:40       ` Sergey Senozhatsky

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox