linuxppc-dev.lists.ozlabs.org archive mirror
* question on inline assembly and long long values
@ 2005-04-06 18:26 Chris Friesen
  2005-04-06 19:16 ` Kumar Gala
  2005-04-06 23:19 ` Benjamin Herrenschmidt
  0 siblings, 2 replies; 11+ messages in thread
From: Chris Friesen @ 2005-04-06 18:26 UTC (permalink / raw)
  To: linuxppc-dev

I want to retrieve the MSR (which is 64 bits wide) on a 970 when running in
32-bit mode.  I have the following bit of code that seems to work, but 
when looking at the code it always seems to use a suboptimal register 
for the low word, and then it ends up having to copy it to the right 
register to create a long long register pair.

static inline unsigned long long get_msr()
{
	union {
		struct {
			unsigned long low;
			unsigned long high;
		} words;
		unsigned long long val;
	} val;
         asm volatile( \
                 "mfmsr  %0               \n\t" \
                 "rldicl %1,%0,32,32       \n\t" \
                 "rldicl %0,%0,0,32        \n\t" \
                 : "=r" (val.words.low), "=r" (val.words.high));
	return val.val;
}

Using this code, the optimised assembly output of

	unsigned long long a = asdf();
	unsigned long long b = asdf();

is

	mfmsr  5
	rldicl 0,5,32,32
	rldicl 5,5,0,32
	mr 6,0
	mfmsr  7
	rldicl 9,7,32,32
	rldicl 7,7,0,32
	mr 8,9

I figure it should have been able to use registers 6/8 in the first 
place, and save the extra moves.

Is there any way to help gcc optimise this?


Chris
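
For readers unfamiliar with rldicl, the two forms used above can be
modelled in portable C. This is only a sketch: the helper name
rldicl_model is hypothetical, and it assumes the usual definition of the
instruction (rotate the 64-bit source left by SH bits, then keep bits
MB..63 in IBM bit numbering, where bit 0 is the most significant bit).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: models "rldicl rA,rS,SH,MB" on a 64-bit value. */
static uint64_t rldicl_model(uint64_t rs, unsigned sh, unsigned mb)
{
	uint64_t rot;

	assert(sh < 64 && mb < 64);

	/* Rotate left by sh; sh == 0 is special-cased to avoid a 64-bit shift. */
	rot = sh ? (rs << sh) | (rs >> (64 - sh)) : rs;

	/* Clear the mb most significant bits, keep the rest. */
	return rot & (UINT64_MAX >> mb);
}

/* "rldicl %1,%0,32,32": the high word moved down into the low 32 bits. */
static uint32_t msr_high_word(uint64_t msr)
{
	return (uint32_t)rldicl_model(msr, 32, 32);
}

/* "rldicl %0,%0,0,32": the low word with the upper 32 bits cleared. */
static uint32_t msr_low_word(uint64_t msr)
{
	return (uint32_t)rldicl_model(msr, 0, 32);
}
```

So the first rldicl yields the upper 32 bits of the MSR and the second
yields the lower 32 bits, each as a value that fits a 32-bit register.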


* Re: question on inline assembly and long long values
  2005-04-06 18:26 question on inline assembly and long long values Chris Friesen
@ 2005-04-06 19:16 ` Kumar Gala
  2005-04-06 20:01   ` Chris Friesen
  2005-04-06 23:19 ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 11+ messages in thread
From: Kumar Gala @ 2005-04-06 19:16 UTC (permalink / raw)
  To: Chris Friesen; +Cc: linuxppc-dev

I'm having a similar need (for a different purpose), this post might be
useful:

http://gcc.gnu.org/ml/gcc/2005-04/msg00283.html

- kumar

On Apr 6, 2005, at 1:26 PM, Chris Friesen wrote:

> I want to retrieve the msr (which is 64-bits) on a 970 when running in
> 32-bit mode.  I have the following bit of code that seems to work, but
> when looking at the code it always seems to use a suboptimal register
> for the low word, and then it ends up having to copy it to the right
> register to create a long long register pair.
>
> static inline unsigned long long get_msr()
> {
> 	union {
> 		struct {
> 			unsigned long low;
> 			unsigned long high;
> 		} words;
> 		unsigned long long val;
> 	} val;
>          asm volatile( \
>                  "mfmsr  %0               \n\t" \
>                  "rldicl %1,%0,32,32       \n\t" \
>                  "rldicl %0,%0,0,32        \n\t" \
>                  : "=r" (val.words.low), "=r" (val.words.high));
> 	return val.val;
> }
>
> Using this code, the optimised assembly output of
>
> 	unsigned long long a = asdf();
> 	unsigned long long b = asdf();
>
> is
>
> 	mfmsr  5
> 	rldicl 0,5,32,32
> 	rldicl 5,5,0,32
> 	mr 6,0
> 	mfmsr  7
> 	rldicl 9,7,32,32
> 	rldicl 7,7,0,32
> 	mr 8,9
>
> I figure it should have been able to use registers 6/8 in the first
> place, and save the extra moves.
>
> Is there any way to help gcc optimise this?
>
> Chris


* Re: question on inline assembly and long long values
  2005-04-06 19:16 ` Kumar Gala
@ 2005-04-06 20:01   ` Chris Friesen
  2005-04-06 21:18     ` Gabriel Paubert
  0 siblings, 1 reply; 11+ messages in thread
From: Chris Friesen @ 2005-04-06 20:01 UTC (permalink / raw)
  To: Kumar Gala; +Cc: linuxppc-dev

Kumar Gala wrote:
> I'm having a similar need (for a different purpose), this post might be 
> useful:
> 
> http://gcc.gnu.org/ml/gcc/2005-04/msg00283.html

Sweet!  My new code is now shorter *and* more efficient:

static inline unsigned long long getmsr()
{
	unsigned long long val;
         asm volatile( \
                 "mfmsr  %L0               \n\t" \
                 "rldicl %0,%L0,32,32       \n\t" \
                 "rldicl %L0,%L0,0,32        \n\t" \
                 : "=r" (val));
	return val;
}

This results in

unsigned long long a = asdf3();
unsigned long long b = asdf3();

being compiled to

	mfmsr  6
	rldicl 5,6,32,32
	rldicl 6,6,0,32
	
	mfmsr  8
	rldicl 7,8,32,32
	rldicl 8,8,0,32

Which is about as good as it gets...

Thanks!

Chris
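
A side note on why the two listings put the halves in different
registers. In the second version, %0 names the first register of the
pair (5 or 7) and %L0 the second (6 or 8), and the high word ends up in
the first one. In the union version the member called "low" is declared
first, and on a big-endian target such as 32-bit PowerPC the first
member overlays the high word of the long long. That reading is an
inference from the listings, not something stated in the thread. The
sketch below uses neutral, hypothetical names (pair64, first, second)
and shows how to check the overlay on any host, along with a shift-based
split that does not depend on byte order.

```c
#include <assert.h>
#include <stdint.h>

/* Same layout idea as the union in the first message, with neutral names. */
union pair64 {
	struct {
		uint32_t first;		/* lower address */
		uint32_t second;	/* higher address */
	} words;
	uint64_t val;
};

/*
 * Returns 1 if the first struct member overlays the high word of val
 * (big-endian hosts), 0 if it overlays the low word (little-endian).
 */
static int first_member_is_high_word(void)
{
	union pair64 u;

	assert(sizeof u.words == sizeof u.val);
	u.val = 0x0123456789abcdefULL;
	return u.words.first == 0x01234567u;
}

/* Byte-order-independent split using shifts. */
static uint32_t high_word(uint64_t v) { return (uint32_t)(v >> 32); }
static uint32_t low_word(uint64_t v)  { return (uint32_t)v; }
```

Letting the compiler own the whole long long operand, as the %0/%L0
version does, avoids having to get this member order right by hand.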


* Re: question on inline assembly and long long values
  2005-04-06 20:01   ` Chris Friesen
@ 2005-04-06 21:18     ` Gabriel Paubert
  2005-04-06 21:37       ` Chris Friesen
  0 siblings, 1 reply; 11+ messages in thread
From: Gabriel Paubert @ 2005-04-06 21:18 UTC (permalink / raw)
  To: Chris Friesen; +Cc: linuxppc-dev

On Wed, Apr 06, 2005 at 02:01:20PM -0600, Chris Friesen wrote:
> Kumar Gala wrote:
> >I'm having a similar need (for a different purpose), this post might be 
> >useful:
> >
> >http://gcc.gnu.org/ml/gcc/2005-04/msg00283.html
> 
> Sweet!  My new code is now shorter *and* more efficient:
> 
> static inline unsigned long long getmsr()
> {
> 	unsigned long long val;
>         asm volatile( \
>                 "mfmsr  %L0               \n\t" \
>                 "rldicl %0,%L0,32,32       \n\t" \
>                 "rldicl %L0,%L0,0,32        \n\t" \
>                 : "=r" (val));
> 	return val;
> }
> 
> This results in
> 
> unsigned long long a = asdf3();
> unsigned long long b = asdf3();
> 
> being compiled to
> 
> 	mfmsr  6
> 	rldicl 5,6,32,32
> 	rldicl 6,6,0,32
> 	
> 	mfmsr  8
> 	rldicl 7,8,32,32
> 	rldicl 8,8,0,32

If you are running in 32 bit mode, you don't need
to clear the upper half of %L0. The architecture is 
designed so that pure 32 bit code runs unmodified despite
the fact that you can access the upper 32 bits using
64 bit instructions.

	Regards,
	Gabriel


* Re: question on inline assembly and long long values
  2005-04-06 21:18     ` Gabriel Paubert
@ 2005-04-06 21:37       ` Chris Friesen
  0 siblings, 0 replies; 11+ messages in thread
From: Chris Friesen @ 2005-04-06 21:37 UTC (permalink / raw)
  To: Gabriel Paubert; +Cc: linuxppc-dev

Gabriel Paubert wrote:
> On Wed, Apr 06, 2005 at 02:01:20PM -0600, Chris Friesen wrote:

>>static inline unsigned long long getmsr()
>>{
>>	unsigned long long val;
>>        asm volatile( \
>>                "mfmsr  %L0               \n\t" \
>>                "rldicl %0,%L0,32,32       \n\t" \
>>                "rldicl %L0,%L0,0,32        \n\t" \
>>                : "=r" (val));
>>	return val;
>>}

> If you are running in 32 bit mode, you don't need
> to clear the upper half of %L0.

Thanks for the tip.  That cuts out another instruction.

Chris
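
With that suggestion applied, the function would presumably shrink to
two instructions. The sketch below is an assumption about the end
result, since the final code was not posted. The asm part is guarded so
that it only builds for 32-bit PowerPC (mfmsr is privileged, so it is
kernel-only), and getmsr_model is a hypothetical portable stand-in that
shows why leaving the upper half of %L0 untouched is harmless: 32-bit
code only ever looks at the low 32 bits of each register.

```c
#include <assert.h>
#include <stdint.h>

/* What 32-bit code observes from a 64-bit GPR: only the low 32 bits. */
static uint32_t seen_by_32bit_code(uint64_t gpr)
{
	return (uint32_t)gpr;
}

/* Hypothetical model of the two-instruction sequence on 64-bit registers. */
static uint64_t getmsr_model(uint64_t msr)
{
	uint64_t lo_reg = msr;		/* mfmsr  %L0 (upper half left as is) */
	uint64_t hi_reg = msr >> 32;	/* rldicl %0,%L0,32,32 */

	return ((uint64_t)seen_by_32bit_code(hi_reg) << 32) |
	       seen_by_32bit_code(lo_reg);
}

#if defined(__powerpc__) && !defined(__powerpc64__)
/* Assumed final form; intended to be called with interrupts off. */
static inline unsigned long long getmsr(void)
{
	unsigned long long val;

	__asm__ __volatile__(
		"mfmsr  %L0\n\t"
		"rldicl %0,%L0,32,32"
		: "=r" (val));
	return val;
}
#endif
```

The model returns the original 64-bit value unchanged, which is the
point: the extra "rldicl %L0,%L0,0,32" only cleared bits that 32-bit
code never reads.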


* Re: question on inline assembly and long long values
  2005-04-06 18:26 question on inline assembly and long long values Chris Friesen
  2005-04-06 19:16 ` Kumar Gala
@ 2005-04-06 23:19 ` Benjamin Herrenschmidt
  2005-04-06 23:30   ` Chris Friesen
  1 sibling, 1 reply; 11+ messages in thread
From: Benjamin Herrenschmidt @ 2005-04-06 23:19 UTC (permalink / raw)
  To: Chris Friesen; +Cc: linuxppc-dev list

On Wed, 2005-04-06 at 12:26 -0600, Chris Friesen wrote:
> I want to retrieve the msr (which is 64-bits) on a 970 when running in 
> 32-bit mode.  I have the following bit of code that seems to work, but 
> when looking at the code it always seems to use a suboptimal register 
> for the low word, and then it ends up having to copy it to the right 
> register to create a long long register pair.
> 
> static inline unsigned long long get_msr()
> {
> 	union {
> 		struct {
> 			unsigned long low;
> 			unsigned long high;
> 		} words;
> 		unsigned long long val;
> 	} val;
>          asm volatile( \
>                  "mfmsr  %0               \n\t" \
>                  "rldicl %1,%0,32,32       \n\t" \
>                  "rldicl %0,%0,0,32        \n\t" \
>                  : "=r" (val.words.low), "=r" (val.words.high));
> 	return val.val;
> }

Is this in kernel mode ? Why are you doing that ? 

> Using this code, the optimised assembly output of
> 
> 	unsigned long long a = asdf();
> 	unsigned long long b = asdf();
> 
> is
> 
> 	mfmsr  5
> 	rldicl 0,5,32,32
> 	rldicl 5,5,0,32
> 	mr 6,0
> 	mfmsr  7
> 	rldicl 9,7,32,32
> 	rldicl 7,7,0,32
> 	mr 8,9
> 
> I figure it should have been able to use registers 6/8 in the first 
> place, and save the extra moves.
> 
> Is there any way to help gcc optimise this?
> 
> 
> Chris
-- 
Benjamin Herrenschmidt <benh@kernel.crashing.org>


* Re: question on inline assembly and long long values
  2005-04-06 23:19 ` Benjamin Herrenschmidt
@ 2005-04-06 23:30   ` Chris Friesen
  2005-04-06 23:49     ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 11+ messages in thread
From: Chris Friesen @ 2005-04-06 23:30 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev list

Benjamin Herrenschmidt wrote:
> On Wed, 2005-04-06 at 12:26 -0600, Chris Friesen wrote:
> 
>>I want to retrieve the msr (which is 64-bits) on a 970 when running in 
>>32-bit mode.

> Is this in kernel mode ? Why are you doing that ? 

Yes it is.

We wanted to manipulate the high word of the msr on the 970, from C 
code, while running in 32-bit mode.  I'll be doing something similar to 
get at the high word of HID1 (to enable the en_icbi bit).

Is there some other way to do this?

Chris


* Re: question on inline assembly and long long values
  2005-04-06 23:30   ` Chris Friesen
@ 2005-04-06 23:49     ` Benjamin Herrenschmidt
  2005-04-06 23:56       ` Chris Friesen
  0 siblings, 1 reply; 11+ messages in thread
From: Benjamin Herrenschmidt @ 2005-04-06 23:49 UTC (permalink / raw)
  To: Chris Friesen; +Cc: linuxppc-dev list

On Wed, 2005-04-06 at 17:30 -0600, Chris Friesen wrote:
> Benjamin Herrenschmidt wrote:
> > On Wed, 2005-04-06 at 12:26 -0600, Chris Friesen wrote:
> > 
> >>I want to retrieve the msr (which is 64-bits) on a 970 when running in 
> >>32-bit mode.
> 
> > Is this in kernel mode ? Why are you doing that ? 
> 
> Yes it is.
> 
> We wanted to manipulate the high word of the msr on the 970, from C 
> code, while running in 32-bit mode.  I'll be doing something similar to 
> get at the high word of HID1 (to enable the en_icbi bit).
> 
> Is there some other way to do this?

But the kernel code always runs in 64-bit mode... or are you trying to
pass that down to userland? In the latter case, your optimisations are
pretty much nothing compared to the cost of a syscall.

Ben.


* Re: question on inline assembly and long long values
  2005-04-06 23:49     ` Benjamin Herrenschmidt
@ 2005-04-06 23:56       ` Chris Friesen
  2005-04-06 23:57         ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 11+ messages in thread
From: Chris Friesen @ 2005-04-06 23:56 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev list

Benjamin Herrenschmidt wrote:

> But the kernel code always runs in 64 bits mode...

The ppc kernel (not ppc64) runs in 64-bit mode?  An unsigned long long 
fits in a single register in C code?

Yeah, the optimisations really don't matter much...the ugly resulting 
code just bothered me.

Chris


* Re: question on inline assembly and long long values
  2005-04-06 23:56       ` Chris Friesen
@ 2005-04-06 23:57         ` Benjamin Herrenschmidt
  2005-04-07  0:09           ` Chris Friesen
  0 siblings, 1 reply; 11+ messages in thread
From: Benjamin Herrenschmidt @ 2005-04-06 23:57 UTC (permalink / raw)
  To: Chris Friesen; +Cc: linuxppc-dev list

On Wed, 2005-04-06 at 17:56 -0600, Chris Friesen wrote:
> Benjamin Herrenschmidt wrote:
> 
> > But the kernel code always runs in 64 bits mode...
> 
> The ppc kernel (not ppc64) runs in 64-bit mode?  An unsigned long long 
> fits in a single register in C code?

Ah no, it doesn't, but then, be careful to have interrupts off when you
do that, as the interrupt handler won't save the high half of the
registers between the mtmsrd and your "splitting" of it.

> Yeah, the optimisations really don't matter much...the ugly resulting 
> code just bothered me.
> 
> Chris
-- 
Benjamin Herrenschmidt <benh@kernel.crashing.org>


* Re: question on inline assembly and long long values
  2005-04-06 23:57         ` Benjamin Herrenschmidt
@ 2005-04-07  0:09           ` Chris Friesen
  0 siblings, 0 replies; 11+ messages in thread
From: Chris Friesen @ 2005-04-07  0:09 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev list

Benjamin Herrenschmidt wrote:
> On Wed, 2005-04-06 at 17:56 -0600, Chris Friesen wrote:
>>Benjamin Herrenschmidt wrote:
>>>But the kernel code always runs in 64 bits mode...
>>
>>The ppc kernel (not ppc64) runs in 64-bit mode?  An unsigned long long 
>>fits in a single register in C code?
> 
> Ah no, it doesn't, but then, be careful to have interrupts off when you
> do that, as the interrupt handler won't save the high half of the
> registers between the mtmsrd and your "splitting" of it.

Right.  All this stuff is running with interrupts off.

Chris

