public inbox for linux-kernel@vger.kernel.org
* gcc feature request / RFC: extra clobbered regs
@ 2015-06-30 21:22 Andy Lutomirski
  2015-06-30 21:32 ` H. Peter Anvin
  2015-06-30 21:37 ` Jakub Jelinek
  0 siblings, 2 replies; 19+ messages in thread
From: Andy Lutomirski @ 2015-06-30 21:22 UTC (permalink / raw)
  To: gcc, linux-kernel@vger.kernel.org, Linus Torvalds, H. Peter Anvin,
	Ingo Molnar, Thomas Gleixner

Hi all-

I'm working on a massive set of cleanups to Linux's syscall handling.
We currently have a nasty optimization in which we don't save rbx,
rbp, r12, r13, r14, and r15 on x86_64 before calling C functions.
This works, but it makes the code a huge mess.  I'd rather save all
regs in asm and then call C code.

Unfortunately, this will add five cycles (on SNB) to one of the
hottest paths in the kernel.  To counteract it, I have a gcc feature
request that might not be all that crazy.  When writing C functions
intended to be called from asm, what if we could do:

__attribute__((extra_clobber("rbx", "rbp", "r12", "r13", "r14",
"r15"))) void func(void);

This will save enough pushes and pops that it could easily give us our
five cycles back and then some.  It's also easy to be compatible with
old GCC versions -- we could just omit the attribute, since preserving
a register is always safe.

Thoughts?  Is this totally crazy?  Is it easy to implement?

(I'm not necessarily suggesting that we do this for the syscall bodies
themselves.  I want to do it for the entry and exit helpers, so we'd
still lose the five cycles in the full fast-path case, but we'd do
better in the slower paths, and the slower paths are becoming
increasingly important in real workloads.)

Thanks,
Andy
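
A minimal sketch of the backward-compatibility point above, assuming the
requested attribute were spelled extra_clobber as in the proposal.  The
attribute is only a proposal in this thread; EXTRA_CLOBBER and
entry_helper are hypothetical names used for illustration.  On a
compiler that lacks the attribute the macro expands to nothing and the
function preserves the registers as usual, which is always safe.

```c
/*
 * Hypothetical: extra_clobber is the attribute proposed in this thread,
 * not an attribute GCC is known to provide.
 */
#if defined(__has_attribute)
#  if __has_attribute(extra_clobber)
#    define EXTRA_CLOBBER(...) __attribute__((extra_clobber(__VA_ARGS__)))
#  endif
#endif
#ifndef EXTRA_CLOBBER
#  define EXTRA_CLOBBER(...)	/* fallback: keep the normal ABI */
#endif

/* Hypothetical helper intended to be called only from asm entry code. */
EXTRA_CLOBBER("rbx", "rbp", "r12", "r13", "r14", "r15")
int entry_helper(int nr);

int entry_helper(int nr)
{
	return nr + 1;	/* stand-in for the real entry work */
}
```

Because the fallback only makes the callee more conservative, asm
callers that already saved every register keep working either way.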

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: gcc feature request / RFC: extra clobbered regs
  2015-06-30 21:22 gcc feature request / RFC: extra clobbered regs Andy Lutomirski
@ 2015-06-30 21:32 ` H. Peter Anvin
  2015-06-30 21:37 ` Jakub Jelinek
  1 sibling, 0 replies; 19+ messages in thread
From: H. Peter Anvin @ 2015-06-30 21:32 UTC (permalink / raw)
  To: Andy Lutomirski, gcc, linux-kernel@vger.kernel.org,
	Linus Torvalds, Ingo Molnar, Thomas Gleixner

On 06/30/2015 02:22 PM, Andy Lutomirski wrote:
> Hi all-
> 
> I'm working on a massive set of cleanups to Linux's syscall handling.
> We currently have a nasty optimization in which we don't save rbx,
> rbp, r12, r13, r14, and r15 on x86_64 before calling C functions.
> This works, but it makes the code a huge mess.  I'd rather save all
> regs in asm and then call C code.
> 
> Unfortunately, this will add five cycles (on SNB) to one of the
> hottest paths in the kernel.  To counteract it, I have a gcc feature
> request that might not be all that crazy.  When writing C functions
> intended to be called from asm, what if we could do:
> 
> __attribute__((extra_clobber("rbx", "rbp", "r12", "r13", "r14",
> "r15"))) void func(void);
> 
> This will save enough pushes and pops that it could easily give us our
> five cycles back and then some.  It's also easy to be compatible with
> old GCC versions -- we could just omit the attribute, since preserving
> a register is always safe.
> 
> Thoughts?  Is this totally crazy?  Is it easy to implement?
> 
> (I'm not necessarily suggesting that we do this for the syscall bodies
> themselves.  I want to do it for the entry and exit helpers, so we'd
> still lose the five cycles in the full fast-path case, but we'd do
> better in the slower paths, and the slower paths are becoming
> increasingly important in real workloads.)
> 

Some gcc targets have done this in the past.  There are command-line
options to do that, but using attributes you have to handle cross-ABI
compilation.

However, I don't see this being done in the upstream gcc.

Keep in mind the runway that we'll need, though.

	-hpa




* Re: gcc feature request / RFC: extra clobbered regs
  2015-06-30 21:22 gcc feature request / RFC: extra clobbered regs Andy Lutomirski
  2015-06-30 21:32 ` H. Peter Anvin
@ 2015-06-30 21:37 ` Jakub Jelinek
  2015-06-30 21:41   ` H. Peter Anvin
  2015-07-01 15:23   ` Vladimir Makarov
  1 sibling, 2 replies; 19+ messages in thread
From: Jakub Jelinek @ 2015-06-30 21:37 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: gcc, linux-kernel@vger.kernel.org, Linus Torvalds, H. Peter Anvin,
	Ingo Molnar, Thomas Gleixner, Vladimir Makarov

On Tue, Jun 30, 2015 at 02:22:33PM -0700, Andy Lutomirski wrote:
> I'm working on a massive set of cleanups to Linux's syscall handling.
> We currently have a nasty optimization in which we don't save rbx,
> rbp, r12, r13, r14, and r15 on x86_64 before calling C functions.
> This works, but it makes the code a huge mess.  I'd rather save all
> regs in asm and then call C code.
> 
> Unfortunately, this will add five cycles (on SNB) to one of the
> hottest paths in the kernel.  To counteract it, I have a gcc feature
> request that might not be all that crazy.  When writing C functions
> intended to be called from asm, what if we could do:
> 
> __attribute__((extra_clobber("rbx", "rbp", "r12", "r13", "r14",
> "r15"))) void func(void);
> 
> This will save enough pushes and pops that it could easily give us our
> five cycles back and then some.  It's also easy to be compatible with
> old GCC versions -- we could just omit the attribute, since preserving
> a register is always safe.
> 
> Thoughts?  Is this totally crazy?  Is it easy to implement?
> 
> (I'm not necessarily suggesting that we do this for the syscall bodies
> themselves.  I want to do it for the entry and exit helpers, so we'd
> still lose the five cycles in the full fast-path case, but we'd do
> better in the slower paths, and the slower paths are becoming
> increasingly important in real workloads.)

GCC already supports the -ffixed-REG, -fcall-used-REG and -fcall-saved-REG
options, which allow tweaking the calling conventions, but only per
translation unit right now.  It isn't clear which of these options
you mean with extra_clobber.
I assume you are looking for a way to make this per-function, with a
caller that uses a different calling convention having to adjust for
the callee's ABI.  To some extent, recent GCC versions do that
automatically with -fipa-ra already: if some call-used registers are
not clobbered by a call and the caller can analyze the callee, it can
keep values in those registers across the call.
I'd say the most natural API for this would be to allow
f{fixed,call-{used,saved}}-REG in the target attribute.

	Jakub
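
For reference, a sketch of how the existing per-translation-unit options
are applied today.  The file name, function, and build line are
illustrative assumptions; the build line presumes an x86-64 GCC that
accepts these register names.

```c
/*
 * helpers.c (hypothetical file name).
 *
 * Possible build line (an assumption, not taken from the thread):
 *
 *   gcc -O2 -c -fcall-used-rbx -fcall-used-r12 helpers.c
 *
 * With these options every function in this file may clobber rbx and
 * r12 without saving them.  That is only safe if all callers are asm
 * that expects this, or C code built with the same options.
 */
int helper_sum(const int *values, int count)
{
	int total = 0;

	for (int i = 0; i < count; i++)
		total += values[i];
	return total;
}
```

The options cover the whole file, which is exactly the granularity
problem discussed above: there is no way to limit them to one function.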


* Re: gcc feature request / RFC: extra clobbered regs
  2015-06-30 21:37 ` Jakub Jelinek
@ 2015-06-30 21:41   ` H. Peter Anvin
  2015-06-30 21:48     ` Andy Lutomirski
  2015-07-01 15:23   ` Vladimir Makarov
  1 sibling, 1 reply; 19+ messages in thread
From: H. Peter Anvin @ 2015-06-30 21:41 UTC (permalink / raw)
  To: Jakub Jelinek, Andy Lutomirski
  Cc: gcc, linux-kernel@vger.kernel.org, Linus Torvalds, Ingo Molnar,
	Thomas Gleixner, Vladimir Makarov

On 06/30/2015 02:37 PM, Jakub Jelinek wrote:
> I'd say the most natural API for this would be to allow
> f{fixed,call-{used,saved}}-REG in target attribute.

Either that or

	__attribute__((fixed(rbp,rcx),used(rax,rbx),saved(r11)))

... just to be shorter.  Either way, I would consider this to be
desirable -- I have myself used this to good effect in a past life
(*cough* Transmeta *cough*) -- but not a high priority feature.

	-hpa




* Re: gcc feature request / RFC: extra clobbered regs
  2015-06-30 21:41   ` H. Peter Anvin
@ 2015-06-30 21:48     ` Andy Lutomirski
  2015-06-30 21:52       ` H. Peter Anvin
  0 siblings, 1 reply; 19+ messages in thread
From: Andy Lutomirski @ 2015-06-30 21:48 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Jakub Jelinek, Andy Lutomirski, gcc, linux-kernel@vger.kernel.org,
	Linus Torvalds, Ingo Molnar, Thomas Gleixner, Vladimir Makarov

On Tue, Jun 30, 2015 at 2:41 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 06/30/2015 02:37 PM, Jakub Jelinek wrote:
>> I'd say the most natural API for this would be to allow
>> f{fixed,call-{used,saved}}-REG in target attribute.
>
> Either that or
>
>         __attribute__((fixed(rbp,rcx),used(rax,rbx),saved(r11)))
>
> ... just to be shorter.  Either way, I would consider this to be
> desirable -- I have myself used this to good effect in a past life
> (*cough* Transmeta *cough*) -- but not a high priority feature.

I think I mean the per-function equivalent of -fcall-used-reg, so
hpa's "used" suggestion would do the trick.

I guess that clobbering the frame pointer is a non-starter, but five
out of six isn't so bad.  It would be nice to error out instead of
producing "disastrous results", though, if another bad reg is chosen.
(Presumably the PIC register on PIC builds would be an example of
that.)

--Andy


* Re: gcc feature request / RFC: extra clobbered regs
  2015-06-30 21:48     ` Andy Lutomirski
@ 2015-06-30 21:52       ` H. Peter Anvin
  2015-06-30 21:55         ` Andy Lutomirski
  0 siblings, 1 reply; 19+ messages in thread
From: H. Peter Anvin @ 2015-06-30 21:52 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Jakub Jelinek, Andy Lutomirski, gcc, linux-kernel@vger.kernel.org,
	Linus Torvalds, Ingo Molnar, Thomas Gleixner, Vladimir Makarov

On 06/30/2015 02:48 PM, Andy Lutomirski wrote:
> On Tue, Jun 30, 2015 at 2:41 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>> On 06/30/2015 02:37 PM, Jakub Jelinek wrote:
>>> I'd say the most natural API for this would be to allow
>>> f{fixed,call-{used,saved}}-REG in target attribute.
>>
>> Either that or
>>
>>         __attribute__((fixed(rbp,rcx),used(rax,rbx),saved(r11)))
>>
>> ... just to be shorter.  Either way, I would consider this to be
>> desirable -- I have myself used this to good effect in a past life
>> (*cough* Transmeta *cough*) -- but not a high priority feature.
> 
> I think I mean the per-function equivalent of -fcall-used-reg, so
> hpa's "used" suggestion would do the trick.
> 
> I guess that clobbering the frame pointer is a non-starter, but five
> out of six isn't so bad.  It would be nice to error out instead of
> producing "disastrous results", though, if another bad reg is chosen.
> (Presumably the PIC register on PIC builds would be an example of
> that.)
> 

Clobbering the frame pointer is perfectly fine, as is the PIC register.
However, gcc might need to handle them as "fixed" rather than "clobbered".

	-hpa




* Re: gcc feature request / RFC: extra clobbered regs
  2015-06-30 21:52       ` H. Peter Anvin
@ 2015-06-30 21:55         ` Andy Lutomirski
  2015-06-30 22:02           ` H. Peter Anvin
  0 siblings, 1 reply; 19+ messages in thread
From: Andy Lutomirski @ 2015-06-30 21:55 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Jakub Jelinek, Andy Lutomirski, gcc, linux-kernel@vger.kernel.org,
	Linus Torvalds, Ingo Molnar, Thomas Gleixner, Vladimir Makarov

On Tue, Jun 30, 2015 at 2:52 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 06/30/2015 02:48 PM, Andy Lutomirski wrote:
>> On Tue, Jun 30, 2015 at 2:41 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>>> On 06/30/2015 02:37 PM, Jakub Jelinek wrote:
>>>> I'd say the most natural API for this would be to allow
>>>> f{fixed,call-{used,saved}}-REG in target attribute.
>>>
>>> Either that or
>>>
>>>         __attribute__((fixed(rbp,rcx),used(rax,rbx),saved(r11)))
>>>
>>> ... just to be shorter.  Either way, I would consider this to be
>>> desirable -- I have myself used this to good effect in a past life
>>> (*cough* Transmeta *cough*) -- but not a high priority feature.
>>
>> I think I mean the per-function equivalent of -fcall-used-reg, so
>> hpa's "used" suggestion would do the trick.
>>
>> I guess that clobbering the frame pointer is a non-starter, but five
>> out of six isn't so bad.  It would be nice to error out instead of
>> producing "disastrous results", though, if another bad reg is chosen.
>> (Presumably the PIC register on PIC builds would be an example of
>> that.)
>>
>
> Clobbering the frame pointer is perfectly fine, as is the PIC register.
>  However, gcc might need to handle them as "fixed" rather than "clobbered".

Hmm.  True, I guess, although I wouldn't necessarily expect gcc to be
able to generate code to call a function like that.

--Andy


* Re: gcc feature request / RFC: extra clobbered regs
  2015-06-30 21:55         ` Andy Lutomirski
@ 2015-06-30 22:02           ` H. Peter Anvin
  2015-07-01  4:20             ` Jeff Law
  0 siblings, 1 reply; 19+ messages in thread
From: H. Peter Anvin @ 2015-06-30 22:02 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Jakub Jelinek, Andy Lutomirski, gcc, linux-kernel@vger.kernel.org,
	Linus Torvalds, Ingo Molnar, Thomas Gleixner, Vladimir Makarov

On 06/30/2015 02:55 PM, Andy Lutomirski wrote:
> On Tue, Jun 30, 2015 at 2:52 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>> On 06/30/2015 02:48 PM, Andy Lutomirski wrote:
>>> On Tue, Jun 30, 2015 at 2:41 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>>>> On 06/30/2015 02:37 PM, Jakub Jelinek wrote:
>>>>> I'd say the most natural API for this would be to allow
>>>>> f{fixed,call-{used,saved}}-REG in target attribute.
>>>>
>>>> Either that or
>>>>
>>>>         __attribute__((fixed(rbp,rcx),used(rax,rbx),saved(r11)))
>>>>
>>>> ... just to be shorter.  Either way, I would consider this to be
>>>> desirable -- I have myself used this to good effect in a past life
>>>> (*cough* Transmeta *cough*) -- but not a high priority feature.
>>>
>>> I think I mean the per-function equivalent of -fcall-used-reg, so
>>> hpa's "used" suggestion would do the trick.
>>>
>>> I guess that clobbering the frame pointer is a non-starter, but five
>>> out of six isn't so bad.  It would be nice to error out instead of
>>> producing "disastrous results", though, if another bad reg is chosen.
>>> (Presumably the PIC register on PIC builds would be an example of
>>> that.)
>>>
>>
>> Clobbering the frame pointer is perfectly fine, as is the PIC register.
>>  However, gcc might need to handle them as "fixed" rather than "clobbered".
> 
> Hmm.  True, I guess, although I wouldn't necessarily expect gcc to be
> able to generate code to call a function like that.
> 

No, but you need to be able to call other functions, or you just push
the issue down one level.

	-hpa




* Re: gcc feature request / RFC: extra clobbered regs
  2015-06-30 22:02           ` H. Peter Anvin
@ 2015-07-01  4:20             ` Jeff Law
  0 siblings, 0 replies; 19+ messages in thread
From: Jeff Law @ 2015-07-01  4:20 UTC (permalink / raw)
  To: H. Peter Anvin, Andy Lutomirski
  Cc: Jakub Jelinek, Andy Lutomirski, gcc, linux-kernel@vger.kernel.org,
	Linus Torvalds, Ingo Molnar, Thomas Gleixner, Vladimir Makarov

On 06/30/2015 04:02 PM, H. Peter Anvin wrote:
> On 06/30/2015 02:55 PM, Andy Lutomirski wrote:
>> On Tue, Jun 30, 2015 at 2:52 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>>> On 06/30/2015 02:48 PM, Andy Lutomirski wrote:
>>>> On Tue, Jun 30, 2015 at 2:41 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>>>>> On 06/30/2015 02:37 PM, Jakub Jelinek wrote:
>>>>>> I'd say the most natural API for this would be to allow
>>>>>> f{fixed,call-{used,saved}}-REG in target attribute.
>>>>>
>>>>> Either that or
>>>>>
>>>>>          __attribute__((fixed(rbp,rcx),used(rax,rbx),saved(r11)))
>>>>>
>>>>> ... just to be shorter.  Either way, I would consider this to be
>>>>> desirable -- I have myself used this to good effect in a past life
>>>>> (*cough* Transmeta *cough*) -- but not a high priority feature.
>>>>
>>>> I think I mean the per-function equivalent of -fcall-used-reg, so
>>>> hpa's "used" suggestion would do the trick.
>>>>
>>>> I guess that clobbering the frame pointer is a non-starter, but five
>>>> out of six isn't so bad.  It would be nice to error out instead of
>>>> producing "disastrous results", though, if another bad reg is chosen.
>>>> (Presumably the PIC register on PIC builds would be an example of
>>>> that.)
>>>>
>>>
>>> Clobbering the frame pointer is perfectly fine, as is the PIC register.
>>>   However, gcc might need to handle them as "fixed" rather than "clobbered".
>>
>> Hmm.  True, I guess, although I wouldn't necessarily expect gcc to be
>> able to generate code to call a function like that.
>>
>
> No, but you need to be able to call other functions, or you just push
> the issue down one level.
For ia32, the PIC register really isn't special anymore.  I'd be 
surprised if you couldn't clobber it.

jeff


* Re: gcc feature request / RFC: extra clobbered regs
  2015-06-30 21:37 ` Jakub Jelinek
  2015-06-30 21:41   ` H. Peter Anvin
@ 2015-07-01 15:23   ` Vladimir Makarov
  2015-07-01 15:27     ` Andy Lutomirski
  2015-07-01 15:31     ` Jakub Jelinek
  1 sibling, 2 replies; 19+ messages in thread
From: Vladimir Makarov @ 2015-07-01 15:23 UTC (permalink / raw)
  To: Jakub Jelinek, Andy Lutomirski
  Cc: gcc, linux-kernel@vger.kernel.org, Linus Torvalds, H. Peter Anvin,
	Ingo Molnar, Thomas Gleixner



On 06/30/2015 05:37 PM, Jakub Jelinek wrote:
> On Tue, Jun 30, 2015 at 02:22:33PM -0700, Andy Lutomirski wrote:
>> I'm working on a massive set of cleanups to Linux's syscall handling.
>> We currently have a nasty optimization in which we don't save rbx,
>> rbp, r12, r13, r14, and r15 on x86_64 before calling C functions.
>> This works, but it makes the code a huge mess.  I'd rather save all
>> regs in asm and then call C code.
>>
>> Unfortunately, this will add five cycles (on SNB) to one of the
>> hottest paths in the kernel.  To counteract it, I have a gcc feature
>> request that might not be all that crazy.  When writing C functions
>> intended to be called from asm, what if we could do:
>>
>> __attribute__((extra_clobber("rbx", "rbp", "r12", "r13", "r14",
>> "r15"))) void func(void);
>>
>> This will save enough pushes and pops that it could easily give us our
>> five cycles back and then some.  It's also easy to be compatible with
>> old GCC versions -- we could just omit the attribute, since preserving
>> a register is always safe.
>>
>> Thoughts?  Is this totally crazy?  Is it easy to implement?
>>
>> (I'm not necessarily suggesting that we do this for the syscall bodies
>> themselves.  I want to do it for the entry and exit helpers, so we'd
>> still lose the five cycles in the full fast-path case, but we'd do
>> better in the slower paths, and the slower paths are becoming
>> increasingly important in real workloads.)
> GCC already supports -ffixed-REG, -fcall-used-REG and -fcall-saved-REG
> options, which allow to tweak the calling conventions; but it is per
> translation unit right now.  It isn't clear which of these options
> you mean with the extra_clobber.
> I assume you are looking for a possibility to change this to be
> per-function, with caller with a different calling convention having to
> adjust for different ABI callee.  To some extent, recent GCC versions
> do that automatically with -fipa-ra already - if some call used registers
> are not clobbered by some call and the caller can analyze that callee,
> it can stick values in such registers across the call.
> I'd say the most natural API for this would be to allow
> f{fixed,call-{used,saved}}-REG in target attribute.
>
>
One consequence of frequently changing the calling convention or register
usage per function could be a GCC slowdown.  RA (the register allocator)
calculates a lot of data, and recalculating it takes a lot of time after
something in the register usage convention changes.

Another consequence would be that RA fails to generate code in some
cases and, even worse, the failure might depend on the GCC version (I
have already seen PRs where RA worked for an asm in one GCC version
because a pseudo was replaced by an equivalent constant, and failed in
another GCC version where that did not happen).

Other than that, I don't see other complications with implementing such
a feature.



* Re: gcc feature request / RFC: extra clobbered regs
  2015-07-01 15:23   ` Vladimir Makarov
@ 2015-07-01 15:27     ` Andy Lutomirski
  2015-07-01 17:57       ` Vladimir Makarov
  2015-07-01 15:31     ` Jakub Jelinek
  1 sibling, 1 reply; 19+ messages in thread
From: Andy Lutomirski @ 2015-07-01 15:27 UTC (permalink / raw)
  To: Vladimir Makarov
  Cc: Jakub Jelinek, Andy Lutomirski, gcc, linux-kernel@vger.kernel.org,
	Linus Torvalds, H. Peter Anvin, Ingo Molnar, Thomas Gleixner

On Wed, Jul 1, 2015 at 8:23 AM, Vladimir Makarov <vmakarov@redhat.com> wrote:
>
>
> On 06/30/2015 05:37 PM, Jakub Jelinek wrote:
>>
>> On Tue, Jun 30, 2015 at 02:22:33PM -0700, Andy Lutomirski wrote:
>>>
>>> I'm working on a massive set of cleanups to Linux's syscall handling.
>>> We currently have a nasty optimization in which we don't save rbx,
>>> rbp, r12, r13, r14, and r15 on x86_64 before calling C functions.
>>> This works, but it makes the code a huge mess.  I'd rather save all
>>> regs in asm and then call C code.
>>>
>>> Unfortunately, this will add five cycles (on SNB) to one of the
>>> hottest paths in the kernel.  To counteract it, I have a gcc feature
>>> request that might not be all that crazy.  When writing C functions
>>> intended to be called from asm, what if we could do:
>>>
>>> __attribute__((extra_clobber("rbx", "rbp", "r12", "r13", "r14",
>>> "r15"))) void func(void);
>>>
>>> This will save enough pushes and pops that it could easily give us our
>>> five cycles back and then some.  It's also easy to be compatible with
>>> old GCC versions -- we could just omit the attribute, since preserving
>>> a register is always safe.
>>>
>>> Thoughts?  Is this totally crazy?  Is it easy to implement?
>>>
>>> (I'm not necessarily suggesting that we do this for the syscall bodies
>>> themselves.  I want to do it for the entry and exit helpers, so we'd
>>> still lose the five cycles in the full fast-path case, but we'd do
>>> better in the slower paths, and the slower paths are becoming
>>> increasingly important in real workloads.)
>>
>> GCC already supports -ffixed-REG, -fcall-used-REG and -fcall-saved-REG
>> options, which allow to tweak the calling conventions; but it is per
>> translation unit right now.  It isn't clear which of these options
>> you mean with the extra_clobber.
>> I assume you are looking for a possibility to change this to be
>> per-function, with caller with a different calling convention having to
>> adjust for different ABI callee.  To some extent, recent GCC versions
>> do that automatically with -fipa-ra already - if some call used registers
>> are not clobbered by some call and the caller can analyze that callee,
>> it can stick values in such registers across the call.
>> I'd say the most natural API for this would be to allow
>> f{fixed,call-{used,saved}}-REG in target attribute.
>>
>>
> One consequence of frequent changing calling convention per function or
> register usage could be GCC slowdown.  RA calculates too many data and it
> requires a lot of time to recalculate them after something in the register
> usage convention is changed.

Do you mean that RA precalculates things based on the calling
convention and saves it across functions?  Hmm.  I don't think this
would be a big problem in my intended use case -- there would only be
a handful of functions using this extension, and they'd have very few
non-asm callers.

>
> Another consequence would be that RA fails generate the code in some cases
> and even worse the failure might depend on version of GCC (I already saw PRs
> where RA worked for an asm in one GCC version because a pseudo was changed
> by equivalent constant and failed in another GCC version where it did not
> happen).
>

Would this be a problem generating code for a function with extra
"used" regs, or just a problem generating code to call such a function?
I imagine that, in the former case, RA's job would be easier, not
harder, since there would be more registers to work with.  In
practice, though, I think it would just end up changing the prologue
and epilogue.

--Andy


* Re: gcc feature request / RFC: extra clobbered regs
  2015-07-01 15:23   ` Vladimir Makarov
  2015-07-01 15:27     ` Andy Lutomirski
@ 2015-07-01 15:31     ` Jakub Jelinek
  2015-07-01 17:35       ` Vladimir Makarov
  1 sibling, 1 reply; 19+ messages in thread
From: Jakub Jelinek @ 2015-07-01 15:31 UTC (permalink / raw)
  To: Vladimir Makarov
  Cc: Andy Lutomirski, gcc, linux-kernel@vger.kernel.org,
	Linus Torvalds, H. Peter Anvin, Ingo Molnar, Thomas Gleixner

On Wed, Jul 01, 2015 at 11:23:17AM -0400, Vladimir Makarov wrote:
> >>(I'm not necessarily suggesting that we do this for the syscall bodies
> >>themselves.  I want to do it for the entry and exit helpers, so we'd
> >>still lose the five cycles in the full fast-path case, but we'd do
> >>better in the slower paths, and the slower paths are becoming
> >>increasingly important in real workloads.)
> >GCC already supports -ffixed-REG, -fcall-used-REG and -fcall-saved-REG
> >options, which allow to tweak the calling conventions; but it is per
> >translation unit right now.  It isn't clear which of these options
> >you mean with the extra_clobber.
> >I assume you are looking for a possibility to change this to be
> >per-function, with caller with a different calling convention having to
> >adjust for different ABI callee.  To some extent, recent GCC versions
> >do that automatically with -fipa-ra already - if some call used registers
> >are not clobbered by some call and the caller can analyze that callee,
> >it can stick values in such registers across the call.
> >I'd say the most natural API for this would be to allow
> >f{fixed,call-{used,saved}}-REG in target attribute.
> >
> >
> One consequence of frequent changing calling convention per function or
> register usage could be GCC slowdown.  RA calculates too many data and it
> requires a lot of time to recalculate them after something in the register
> usage convention is changed.

That is true.  i?86/x86_64 is a switchable target, so at least for the case
of info computed for the callee with non-standard calling convention such
info can be computed just once when the function with such a target
attribute would be seen first.  But for the caller side, I agree not
everything can be precomputed, if we can't use e.g. regsets saved in the
callee; as a single function can call different functions with different
ABIs.  But to some extent we have that already with -fipa-ra, don't we?

	Jakub


* Re: gcc feature request / RFC: extra clobbered regs
  2015-07-01 15:31     ` Jakub Jelinek
@ 2015-07-01 17:35       ` Vladimir Makarov
  2015-07-01 17:38         ` Andy Lutomirski
  2015-07-01 17:43         ` Jakub Jelinek
  0 siblings, 2 replies; 19+ messages in thread
From: Vladimir Makarov @ 2015-07-01 17:35 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Andy Lutomirski, gcc, linux-kernel@vger.kernel.org,
	Linus Torvalds, H. Peter Anvin, Ingo Molnar, Thomas Gleixner



On 07/01/2015 11:31 AM, Jakub Jelinek wrote:
> On Wed, Jul 01, 2015 at 11:23:17AM -0400, Vladimir Makarov wrote:
>>>> (I'm not necessarily suggesting that we do this for the syscall bodies
>>>> themselves.  I want to do it for the entry and exit helpers, so we'd
>>>> still lose the five cycles in the full fast-path case, but we'd do
>>>> better in the slower paths, and the slower paths are becoming
>>>> increasingly important in real workloads.)
>>> GCC already supports -ffixed-REG, -fcall-used-REG and -fcall-saved-REG
>>> options, which allow to tweak the calling conventions; but it is per
>>> translation unit right now.  It isn't clear which of these options
>>> you mean with the extra_clobber.
>>> I assume you are looking for a possibility to change this to be
>>> per-function, with caller with a different calling convention having to
>>> adjust for different ABI callee.  To some extent, recent GCC versions
>>> do that automatically with -fipa-ra already - if some call used registers
>>> are not clobbered by some call and the caller can analyze that callee,
>>> it can stick values in such registers across the call.
>>> I'd say the most natural API for this would be to allow
>>> f{fixed,call-{used,saved}}-REG in target attribute.
>>>
>>>
>> One consequence of frequent changing calling convention per function or
>> register usage could be GCC slowdown.  RA calculates too many data and it
>> requires a lot of time to recalculate them after something in the register
>> usage convention is changed.
> That is true.  i?86/x86_64 is a switchable target, so at least for the case
> of info computed for the callee with non-standard calling convention such
> info can be computed just once when the function with such a target
> attribute would be seen first.
Yes, a more clever approach could be used.  We can calculate the info for
a specific calling convention, save it, and reuse it for functions with
the same attributes.  The compilation speed will be OK even with the
current implementation if there are few calling convention changes.
>    But for the caller side, I agree not
> everything can be precomputed, if we can't use e.g. regsets saved in the
> callee; as a single function can call different functions with different
> ABIs.  But to some extent we have that already with -fipa-ra, don't we?
>
>
Yes, with -fipa-ra, if we have seen the function, we know which registers
it actually clobbers.  If we have not processed it yet, we use the worst
case scenario (clobbering all registers that the calling convention
treats as clobbered).
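
The fallback rule described here can be sketched as follows.  This is an
illustration only, not GCC's actual data structures: regset, struct
callee_info, and call_clobbers are invented names, and the bit numbering
of registers is arbitrary.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One bit per hard register; the numbering is invented for this sketch. */
typedef uint32_t regset;

struct callee_info {
	bool   already_compiled;	/* has the callee's body been processed? */
	regset actually_clobbered;	/* meaningful only if already_compiled */
};

/* Registers a caller must assume are destroyed by a call to 'callee'. */
regset call_clobbers(const struct callee_info *callee, regset abi_call_used)
{
	if (callee != NULL && callee->already_compiled)
		return callee->actually_clobbered;

	/* Unknown or not-yet-processed callee: worst case per the ABI. */
	return abi_call_used;
}
```

An indirect call through a plain function pointer falls into the second
branch, which is why a callee that clobbers more than the ABI allows
becomes a problem once its identity is lost.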

Actually, this raises a question for me.  Suppose we declare that a
function clobbers more than the calling convention allows, then use it as
a value (assigning it to a variable or passing it as an argument), lose
track of it, and then call it.  How can RA know what the call actually
clobbers?  So for a function with these attributes we should either
prohibit using it as a value, make the attributes part of the function
type, or at least say it is unsafe.  So now I see this as a *bigger
problem* with this extension.  Although I guess it already exists, as we
already describe different ABIs as an extension.



* Re: gcc feature request / RFC: extra clobbered regs
  2015-07-01 17:35       ` Vladimir Makarov
@ 2015-07-01 17:38         ` Andy Lutomirski
  2015-07-01 17:43         ` Jakub Jelinek
  1 sibling, 0 replies; 19+ messages in thread
From: Andy Lutomirski @ 2015-07-01 17:38 UTC (permalink / raw)
  To: Vladimir Makarov
  Cc: Jakub Jelinek, Andy Lutomirski, gcc, linux-kernel@vger.kernel.org,
	Linus Torvalds, H. Peter Anvin, Ingo Molnar, Thomas Gleixner

On Wed, Jul 1, 2015 at 10:35 AM, Vladimir Makarov <vmakarov@redhat.com> wrote:
> Actually it raise a question for me.  If we describe that a function
> clobbers more than calling convention and then use it as a value (assigning
> a variable or passing as an argument) and loosing a track of it and than
> call it.  How can RA know what the call clobbers actually.  So for the
> function with the attributes we should prohibit use it as a value or make
> the attributes as a part of the function type, or at least say it is unsafe.

I think it should be part of the type.  This shouldn't compile:

void func(void) __attribute__((used_reg("r12")));
void (*x)(void);
x = func;

--Andy
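
An existing ABI attribute gives a rough precedent for "part of the
type".  The sketch below assumes GCC or Clang on x86-64, where ms_abi is
available; on other targets the macro expands to nothing.  ALT_ABI,
add_one, alt_abi_fn, and call_through_pointer are illustrative names,
and ms_abi stands in for the proposed attribute, which does not exist.

```c
#if defined(__GNUC__) && defined(__x86_64__)
#  define ALT_ABI __attribute__((ms_abi))	/* existing x86-64 attribute */
#else
#  define ALT_ABI				/* no alternative ABI here */
#endif

/* A function that uses the non-default calling convention. */
ALT_ABI int add_one(int v)
{
	return v + 1;
}

/* The pointer type carries the attribute, so the convention is known
 * at the indirect call site. */
typedef int (ALT_ABI *alt_abi_fn)(int);

int call_through_pointer(alt_abi_fn fn, int v)
{
	return fn(v);
}
```

Storing add_one in a plain int (*)(int) instead is expected to draw an
incompatible-pointer-type diagnostic on x86-64, which is the behavior
asked for above.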


* Re: gcc feature request / RFC: extra clobbered regs
  2015-07-01 17:35       ` Vladimir Makarov
  2015-07-01 17:38         ` Andy Lutomirski
@ 2015-07-01 17:43         ` Jakub Jelinek
  2015-07-01 18:12           ` Vladimir Makarov
                             ` (2 more replies)
  1 sibling, 3 replies; 19+ messages in thread
From: Jakub Jelinek @ 2015-07-01 17:43 UTC (permalink / raw)
  To: Vladimir Makarov
  Cc: Andy Lutomirski, gcc, linux-kernel@vger.kernel.org,
	Linus Torvalds, H. Peter Anvin, Ingo Molnar, Thomas Gleixner

On Wed, Jul 01, 2015 at 01:35:16PM -0400, Vladimir Makarov wrote:
> Actually, this raises a question for me.  Suppose we declare that a function
> clobbers more registers than the calling convention allows, then use it as
> a value (assign it to a variable or pass it as an argument), lose track of
> it, and later call it.  How can RA know what the call actually clobbers?
> So for functions with these attributes we should either prohibit using them
> as values, make the attributes part of the function type, or at least say
> that doing so is unsafe.  So now I see this as a *bigger problem* with this
> extension, although I guess it already exists, since we already describe
> different ABIs as an extension.

Unfortunately, the target attribute is a function decl attribute rather than
a function type attribute.  And having more attributes affect switchable
targets will be no fun.

	Jakub

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: gcc feature request / RFC: extra clobbered regs
  2015-07-01 15:27     ` Andy Lutomirski
@ 2015-07-01 17:57       ` Vladimir Makarov
  0 siblings, 0 replies; 19+ messages in thread
From: Vladimir Makarov @ 2015-07-01 17:57 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Jakub Jelinek, Andy Lutomirski, gcc, linux-kernel@vger.kernel.org,
	Linus Torvalds, H. Peter Anvin, Ingo Molnar, Thomas Gleixner



On 07/01/2015 11:27 AM, Andy Lutomirski wrote:
> On Wed, Jul 1, 2015 at 8:23 AM, Vladimir Makarov <vmakarov@redhat.com> wrote:
>>
>> On 06/30/2015 05:37 PM, Jakub Jelinek wrote:
>>> On Tue, Jun 30, 2015 at 02:22:33PM -0700, Andy Lutomirski wrote:
>>>> I'm working on a massive set of cleanups to Linux's syscall handling.
>>>> We currently have a nasty optimization in which we don't save rbx,
>>>> rbp, r12, r13, r14, and r15 on x86_64 before calling C functions.
>>>> This works, but it makes the code a huge mess.  I'd rather save all
>>>> regs in asm and then call C code.
>>>>
>>>> Unfortunately, this will add five cycles (on SNB) to one of the
>>>> hottest paths in the kernel.  To counteract it, I have a gcc feature
>>>> request that might not be all that crazy.  When writing C functions
>>>> intended to be called from asm, what if we could do:
>>>>
>>>> __attribute__((extra_clobber("rbx", "rbp", "r12", "r13", "r14",
>>>> "r15"))) void func(void);
>>>>
>>>> This will save enough pushes and pops that it could easily give us our
>>>> five cycles back and then some.  It's also easy to be compatible with
>>>> old GCC versions -- we could just omit the attribute, since preserving
>>>> a register is always safe.
>>>>
>>>> Thoughts?  Is this totally crazy?  Is it easy to implement?
>>>>
>>>> (I'm not necessarily suggesting that we do this for the syscall bodies
>>>> themselves.  I want to do it for the entry and exit helpers, so we'd
>>>> still lose the five cycles in the full fast-path case, but we'd do
>>>> better in the slower paths, and the slower paths are becoming
>>>> increasingly important in real workloads.)
>>> GCC already supports -ffixed-REG, -fcall-used-REG and -fcall-saved-REG
>>> options, which allow to tweak the calling conventions; but it is per
>>> translation unit right now.  It isn't clear which of these options
>>> you mean with the extra_clobber.
>>> I assume you are looking for a possibility to change this to be
>>> per-function, with caller with a different calling convention having to
>>> adjust for different ABI callee.  To some extent, recent GCC versions
>>> do that automatically with -fipa-ra already - if some call used registers
>>> are not clobbered by some call and the caller can analyze that callee,
>>> it can stick values in such registers across the call.
>>> I'd say the most natural API for this would be to allow
>>> f{fixed,call-{used,saved}}-REG in target attribute.
>>>
>>>
>> One consequence of frequent changing calling convention per function or
>> register usage could be GCC slowdown.  RA calculates too many data and it
>> requires a lot of time to recalculate them after something in the register
>> usage convention is changed.
> Do you mean that RA precalculates things based on the calling
> convention and saves it across functions?
RA calculates a lot of info (register classes, class x class relations, etc.)
based on the register usage convention (fixed regs, call-used registers,
etc.).  If the register usage convention has not changed since the previous
function was compiled, RA reuses the info.  Otherwise, RA recalculates it.
>    Hmm.  I don't think this
> would be a big problem in my intended use case -- there would only be
> a handful of functions using this extension, and they'd have very few
> non-asm callers.
Good.  I guess it will be rarely used and people will tolerate some 
extra compilation time.
>> Another consequence would be that RA fails generate the code in some cases
>> and even worse the failure might depend on version of GCC (I already saw PRs
>> where RA worked for an asm in one GCC version because a pseudo was changed
>> by equivalent constant and failed in another GCC version where it did not
>> happen).
>>
> Would this be a problem generating code for a function with extra
> "used" regs or just a problem generating code to call such a function.
> I imagine that, in the former case, RA's job would be easier, not
> harder, since there would be more registers to work with.
Sorry, I meant that the problem will mostly arise when the attributes
describe more fixed regs.  If you describe more clobbered regs, they
can still be used by the allocator, which can spill/restore them (around
calls) when they cannot be used.  Still, I think there will be some rare
and complicated cases where even describing only clobbered regs can make
RA fail in a function calling the function with additional clobbered regs.
>    In
> practice, though, I think it would just end up changing the prologue
> and epilogue.
>


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: gcc feature request / RFC: extra clobbered regs
  2015-07-01 17:43         ` Jakub Jelinek
@ 2015-07-01 18:12           ` Vladimir Makarov
  2015-07-01 20:09           ` Andy Lutomirski
  2015-07-02  6:16           ` H. Peter Anvin
  2 siblings, 0 replies; 19+ messages in thread
From: Vladimir Makarov @ 2015-07-01 18:12 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Andy Lutomirski, gcc, linux-kernel@vger.kernel.org,
	Linus Torvalds, H. Peter Anvin, Ingo Molnar, Thomas Gleixner



On 07/01/2015 01:43 PM, Jakub Jelinek wrote:
> On Wed, Jul 01, 2015 at 01:35:16PM -0400, Vladimir Makarov wrote:
>> Actually, this raises a question for me.  Suppose we declare that a function
>> clobbers more registers than the calling convention allows, then use it as
>> a value (assign it to a variable or pass it as an argument), lose track of
>> it, and later call it.  How can RA know what the call actually clobbers?
>> So for functions with these attributes we should either prohibit using them
>> as values, make the attributes part of the function type, or at least say
>> that doing so is unsafe.  So now I see this as a *bigger problem* with this
>> extension, although I guess it already exists, since we already describe
>> different ABIs as an extension.
> Unfortunately, the target attribute is a function decl attribute rather than
> a function type attribute.  And having more attributes affect switchable
> targets will be no fun.
>
>
Making the attributes part of the type would probably create a lot of
issues too.

Although I am not a front-end developer, I still think it would be hard to
implement in the front end.  Sticking fully to this approach, it would be
logical to describe this in the debug info as well (I am not sure that is
even possible).

Portability would be an issue too.  It is hard to prevent a regular
C developer from assigning such a function to a variable, because it is OK
on his system, while compilation of such code may fail on another system.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: gcc feature request / RFC: extra clobbered regs
  2015-07-01 17:43         ` Jakub Jelinek
  2015-07-01 18:12           ` Vladimir Makarov
@ 2015-07-01 20:09           ` Andy Lutomirski
  2015-07-02  6:16           ` H. Peter Anvin
  2 siblings, 0 replies; 19+ messages in thread
From: Andy Lutomirski @ 2015-07-01 20:09 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Vladimir Makarov, Andy Lutomirski, gcc,
	linux-kernel@vger.kernel.org, Linus Torvalds, H. Peter Anvin,
	Ingo Molnar, Thomas Gleixner

On Wed, Jul 1, 2015 at 10:43 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Wed, Jul 01, 2015 at 01:35:16PM -0400, Vladimir Makarov wrote:
>> Actually, this raises a question for me.  Suppose we declare that a function
>> clobbers more registers than the calling convention allows, then use it as
>> a value (assign it to a variable or pass it as an argument), lose track of
>> it, and later call it.  How can RA know what the call actually clobbers?
>> So for functions with these attributes we should either prohibit using them
>> as values, make the attributes part of the function type, or at least say
>> that doing so is unsafe.  So now I see this as a *bigger problem* with this
>> extension, although I guess it already exists, since we already describe
>> different ABIs as an extension.
>
> Unfortunately, the target attribute is a function decl attribute rather than
> a function type attribute.  And having more attributes affect switchable
> targets will be no fun.

Just to make sure we're on the same page here, if I write:

extern void normal_func(void);

__attribute__((used_regs("r12"))) void weird_func(void)
{
  // do something
  normal_func();
  // do something
}

I'd want the code that calls normal_func() to understand that
normal_func() *will* preserve r12 despite the fact that weird_func is
allowed to clobber r12.  I think this means that the attribute would
have to be an attribute of a function, not of the RA while compiling
the function.

--Andy

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: gcc feature request / RFC: extra clobbered regs
  2015-07-01 17:43         ` Jakub Jelinek
  2015-07-01 18:12           ` Vladimir Makarov
  2015-07-01 20:09           ` Andy Lutomirski
@ 2015-07-02  6:16           ` H. Peter Anvin
  2 siblings, 0 replies; 19+ messages in thread
From: H. Peter Anvin @ 2015-07-02  6:16 UTC (permalink / raw)
  To: Jakub Jelinek, Vladimir Makarov
  Cc: Andy Lutomirski, gcc, linux-kernel@vger.kernel.org,
	Linus Torvalds, Ingo Molnar, Thomas Gleixner

On 07/01/2015 10:43 AM, Jakub Jelinek wrote:
> On Wed, Jul 01, 2015 at 01:35:16PM -0400, Vladimir Makarov wrote:
>> Actually, this raises a question for me.  Suppose we declare that a function
>> clobbers more registers than the calling convention allows, then use it as
>> a value (assign it to a variable or pass it as an argument), lose track of
>> it, and later call it.  How can RA know what the call actually clobbers?
>> So for functions with these attributes we should either prohibit using them
>> as values, make the attributes part of the function type, or at least say
>> that doing so is unsafe.  So now I see this as a *bigger problem* with this
>> extension, although I guess it already exists, since we already describe
>> different ABIs as an extension.
> 
> Unfortunately, the target attribute is a function decl attribute rather than
> a function type attribute.  And having more attributes affect switchable
> targets will be no fun.
> 
> 

How on Earth does that work with existing switchable ABIs?  Keep in mind
that we already support multiple ABIs...

	-hpa



^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2015-07-02  6:17 UTC | newest]

Thread overview: 19+ messages
2015-06-30 21:22 gcc feature request / RFC: extra clobbered regs Andy Lutomirski
2015-06-30 21:32 ` H. Peter Anvin
2015-06-30 21:37 ` Jakub Jelinek
2015-06-30 21:41   ` H. Peter Anvin
2015-06-30 21:48     ` Andy Lutomirski
2015-06-30 21:52       ` H. Peter Anvin
2015-06-30 21:55         ` Andy Lutomirski
2015-06-30 22:02           ` H. Peter Anvin
2015-07-01  4:20             ` Jeff Law
2015-07-01 15:23   ` Vladimir Makarov
2015-07-01 15:27     ` Andy Lutomirski
2015-07-01 17:57       ` Vladimir Makarov
2015-07-01 15:31     ` Jakub Jelinek
2015-07-01 17:35       ` Vladimir Makarov
2015-07-01 17:38         ` Andy Lutomirski
2015-07-01 17:43         ` Jakub Jelinek
2015-07-01 18:12           ` Vladimir Makarov
2015-07-01 20:09           ` Andy Lutomirski
2015-07-02  6:16           ` H. Peter Anvin
