* Re: [Qemu-devel] [PATCH v2 00/13] tcg/sparc v8plus code generation
[not found] <1398369715-16102-1-git-send-email-rth@twiddle.net>
@ 2015-07-15 20:54 ` Aurelien Jarno
2015-07-16 21:29 ` Richard Henderson
0 siblings, 1 reply; 6+ messages in thread
From: Aurelien Jarno @ 2015-07-15 20:54 UTC
To: Richard Henderson; +Cc: Paolo Bonzini, qemu-devel
On 2014-04-24 13:01, Richard Henderson wrote:
>
> Our 32-bit build for sparc has been requiring a 64-bit capable chip
> for about 2 years now, by way of requiring move-conditional and LE
> memory instructions. But we've mostly been generating 32-bit code
> otherwise.
>
> This patch set changes things so that we make full use of the cpu.
>
> The sparcv8plus code model requires that 64-bit data be kept only
> in the %g and %o registers, which the kernel saves in full 64-bit
> slots. The %i and %l registers, by contrast, are saved via the
> register window mechanism, and as part of the 32-bit ABI we've only
> allocated 32 bits of stack for storing each of them. Since the
> register window can roll at any time, due to signals and interrupts,
> we must consider the high bits of %i and %l to be garbage.
>
> This implies that we must treat 32-bit and 64-bit quantities differently.
> For the most part, TCG is good with that. The one case where that falls
> down, however, is when we frob data between widths. Thus the addition
> of the trunc_shr_i32 opcode.
>
> This new opcode, or something like it, would have been required if
> we ever got around to supporting MIPS64 code generation, where 32-bit
> quantities must remain sign-extended in the 64-bit register at all times.
>
> In the case of sparcv8plus, we can get what we need out of the opcode
> merely by setting its register constraints properly.
I am currently trying to review how we handle 32 to 64 and 64 to 32-bit
conversions in QEMU and I have a question about the (now not so) new
trunc_shr_i32 opcode. Sorry for answering such an old email.
While I understand why we need the new trunc_shr_i32 opcode for MIPS64
(the 32-bit values must be kept sign-extended), I currently fail to
see why it is needed for SPARC. I understand only some registers can be
used to store a 64-bit value, but this is not the case for 32-bit
values. I therefore don't see why we would need any specific constraint
for the 64 -> 32 bit conversion (I understand for ext32u and ext32s).
Does it mean that SPARC needs to keep 32-bit values zero-extended? It
doesn't make sense either given the high bits of some of these registers
can become garbage at any moment.
Can you please give some more details about this so that I can add SPARC
target support to the "tcg: improve size changing ops" series? Thanks.
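To make the question concrete, here is a minimal C sketch of the three conventions under discussion for holding a 32-bit value in a 64-bit host register. The function names are hypothetical and are not QEMU code; they only model the invariants described above.

```c
#include <stdint.h>

/* MIPS64-style convention (as described above): a 32-bit value is kept
 * sign-extended in its 64-bit register. The cast to int32_t assumes the
 * usual two's-complement behaviour. */
static inline uint64_t canon_sext32(uint64_t reg)
{
    return (uint64_t)(int64_t)(int32_t)(uint32_t)reg;
}

/* Zero-extended convention: the high 32 bits are always zero. */
static inline uint64_t canon_zext32(uint64_t reg)
{
    return (uint64_t)(uint32_t)reg;
}

/* "High bits are garbage" convention: two registers hold the same
 * 32-bit value whenever their low halves agree; the high halves are
 * never inspected. */
static inline int same_i32(uint64_t a, uint64_t b)
{
    return (uint32_t)a == (uint32_t)b;
}
```

Under the third convention no canonicalization is needed on truncation, which is why a dedicated opcode does not look required for correctness there.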
Aurelien
--
Aurelien Jarno GPG: 4096R/1DDD8C9B
aurelien@aurel32.net http://www.aurel32.net
* Re: [Qemu-devel] [PATCH v2 00/13] tcg/sparc v8plus code generation
2015-07-15 20:54 ` [Qemu-devel] [PATCH v2 00/13] tcg/sparc v8plus code generation Aurelien Jarno
@ 2015-07-16 21:29 ` Richard Henderson
2015-07-17 10:23 ` Aurelien Jarno
0 siblings, 1 reply; 6+ messages in thread
From: Richard Henderson @ 2015-07-16 21:29 UTC
To: Aurelien Jarno; +Cc: Paolo Bonzini, qemu-devel
On 07/15/2015 09:54 PM, Aurelien Jarno wrote:
> While I understand why we need the new trunc_shr_i32 opcode for MIPS64
> (the 32-bit values must be kept sign-extended), I currently fail to
> see why it is needed for SPARC.
As far as I recall, it improves code for extracting high parts of 64-bit
quantities. Without this, we wind up with a 64-bit shift, requiring a 64-bit
temp register, followed by the "real" truncate which can copy the data to a
32-bit destination register.
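As an illustration only, the following C sketch models the two paths described above. The function names are hypothetical; they are a reference model of the semantics, not the TCG implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Combined operation: shift the 64-bit input right by `count` and keep
 * only the low 32 bits of the result. */
static uint32_t trunc_shr_i32_model(uint64_t x, unsigned count)
{
    assert(count < 64);
    return (uint32_t)(x >> count);
}

/* Two-step path: the intermediate result is a full 64-bit quantity, so
 * on sparcv8plus it would have to live in a 64-bit capable register
 * before being copied to the 32-bit destination. */
static uint32_t shr_then_trunc_model(uint64_t x, unsigned count)
{
    assert(count < 64);
    uint64_t tmp64 = x >> count;   /* 64-bit temporary */
    return (uint32_t)tmp64;        /* the "real" truncate */
}
```

Both functions compute the same value; the difference is only in whether a 64-bit temporary is required along the way.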
r~
* Re: [Qemu-devel] [PATCH v2 00/13] tcg/sparc v8plus code generation
2015-07-16 21:29 ` Richard Henderson
@ 2015-07-17 10:23 ` Aurelien Jarno
2015-07-17 13:42 ` Aurelien Jarno
0 siblings, 1 reply; 6+ messages in thread
From: Aurelien Jarno @ 2015-07-17 10:23 UTC
To: Richard Henderson; +Cc: Paolo Bonzini, qemu-devel
On 2015-07-16 22:29, Richard Henderson wrote:
> On 07/15/2015 09:54 PM, Aurelien Jarno wrote:
> >While I understand why we need the new trunc_shr_i32 opcode for MIPS64
> >(the 32-bit values must be kept sign-extended), I currently fail to
> >see why it is needed for SPARC.
>
> As far as I recall, it improves code for extracting high parts of 64-bit
> quantities. Without this, we wind up with a 64-bit shift, requiring a
> 64-bit temp register, followed by the "real" truncate which can copy the
> data to a 32-bit destination register.
Ok, I understand the use case now. So it's not for correctness, but
rather to generate more optimized code.
--
Aurelien Jarno GPG: 4096R/1DDD8C9B
aurelien@aurel32.net http://www.aurel32.net
* Re: [Qemu-devel] [PATCH v2 00/13] tcg/sparc v8plus code generation
2015-07-17 10:23 ` Aurelien Jarno
@ 2015-07-17 13:42 ` Aurelien Jarno
2015-07-18 7:21 ` Richard Henderson
0 siblings, 1 reply; 6+ messages in thread
From: Aurelien Jarno @ 2015-07-17 13:42 UTC
To: Richard Henderson; +Cc: Paolo Bonzini, qemu-devel
On 2015-07-17 12:23, Aurelien Jarno wrote:
> On 2015-07-16 22:29, Richard Henderson wrote:
> > On 07/15/2015 09:54 PM, Aurelien Jarno wrote:
> > >While I understand why we need the new trunc_shr_i32 opcode for MIPS64
> > >(the 32-bit values must be kept sign-extended), I currently fail to
> > >see why it is needed for SPARC.
> >
> > As far as I recall, it improves code for extracting high parts of 64-bit
> > quantities. Without this, we wind up with a 64-bit shift, requiring a
> > 64-bit temp register, followed by the "real" truncate which can copy the
> > data to a 32-bit destination register.
>
> Ok, I understand the use case now. So it's not for correctness, but
> rather to generate more optimized code.
OTOH, it means that we always have to go through a 32-bit register first
when truncating a 64-bit value.
I mean we gain in the following case:
shr_i64 t64, t64, i
trunc_i64_i32 t32, t64
...
But we lose in the following case:
trunc_i64_i32 t32, t64
neg t32, t32
...
Overall I guess the advantages far outweigh the disadvantages.
--
Aurelien Jarno GPG: 4096R/1DDD8C9B
aurelien@aurel32.net http://www.aurel32.net
* Re: [Qemu-devel] [PATCH v2 00/13] tcg/sparc v8plus code generation
2015-07-17 13:42 ` Aurelien Jarno
@ 2015-07-18 7:21 ` Richard Henderson
2015-07-18 21:18 ` Aurelien Jarno
0 siblings, 1 reply; 6+ messages in thread
From: Richard Henderson @ 2015-07-18 7:21 UTC
To: Aurelien Jarno; +Cc: Paolo Bonzini, qemu-devel
On 07/17/2015 02:42 PM, Aurelien Jarno wrote:
> On 2015-07-17 12:23, Aurelien Jarno wrote:
>> On 2015-07-16 22:29, Richard Henderson wrote:
>>> On 07/15/2015 09:54 PM, Aurelien Jarno wrote:
>>>> While I understand why we need the new trunc_shr_i32 opcode for MIPS64
>>>> (the 32-bit values must be kept sign-extended), I currently fail to
>>>> see why it is needed for SPARC.
>>>
>>> As far as I recall, it improves code for extracting high parts of 64-bit
>>> quantities. Without this, we wind up with a 64-bit shift, requiring a
>>> 64-bit temp register, followed by the "real" truncate which can copy the
>>> data to a 32-bit destination register.
>>
>> Ok, I understand the use case now. So it's not for correctness, but
>> rather to generate more optimized code.
>
> OTOH, it means that we always have to go through a 32-bit register first
> when truncating a 64-bit value.
>
> I mean we gain in the following case:
> shr_i64 t64, t64, i
> trunc_i64_i32 t32, t64
> ...
>
> But we lose in the following case:
> trunc_i64_i32 t32, t64
> neg t32, t32
> ...
Why do you believe we're using an extra temp here? Certainly you can't "neg
t32, t64" in any circumstance.
Anyway, this comes up most often with interfacing with the sparcv8plus calling
convention, in which 64-bit quantities must be passed in 2 registers. Before,
we'd emit code like
shrx %g2, 32, %g1
mov %g1, %o0
mov %g2, %o1
After, we're able to put the shift output directly to %o0.
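A rough C model of that argument split, for illustration: the struct and function names are hypothetical, and the register assignments in the comments simply follow the assembly example above.

```c
#include <stdint.h>

/* A 64-bit argument passed as two 32-bit halves. */
struct arg_pair {
    uint32_t hi;   /* goes to %o0 in the example above */
    uint32_t lo;   /* goes to %o1 in the example above */
};

/* Caller side: split a 64-bit value into its two halves. */
static struct arg_pair split_u64(uint64_t v)
{
    struct arg_pair p;
    p.hi = (uint32_t)(v >> 32);   /* shift and truncate in one step */
    p.lo = (uint32_t)v;           /* plain truncate (a move) */
    return p;
}

/* Callee side: rebuild the 64-bit value from the two halves. */
static uint64_t join_u64(struct arg_pair p)
{
    return ((uint64_t)p.hi << 32) | p.lo;
}
```

The high half is exactly the shift-then-truncate pattern, which is where writing the shift result straight into the output register saves a move.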
r~
* Re: [Qemu-devel] [PATCH v2 00/13] tcg/sparc v8plus code generation
2015-07-18 7:21 ` Richard Henderson
@ 2015-07-18 21:18 ` Aurelien Jarno
0 siblings, 0 replies; 6+ messages in thread
From: Aurelien Jarno @ 2015-07-18 21:18 UTC
To: Richard Henderson; +Cc: Paolo Bonzini, qemu-devel
On 2015-07-18 08:21, Richard Henderson wrote:
> On 07/17/2015 02:42 PM, Aurelien Jarno wrote:
> >On 2015-07-17 12:23, Aurelien Jarno wrote:
> >>On 2015-07-16 22:29, Richard Henderson wrote:
> >>>On 07/15/2015 09:54 PM, Aurelien Jarno wrote:
> >>>>While I understand why we need the new trunc_shr_i32 opcode for MIPS64
> >>>>(the 32-bit values must be kept sign-extended), I currently fail to
> >>>>see why it is needed for SPARC.
> >>>
> >>>As far as I recall, it improves code for extracting high parts of 64-bit
> >>>quantities. Without this, we wind up with a 64-bit shift, requiring a
> >>>64-bit temp register, followed by the "real" truncate which can copy the
> >>>data to a 32-bit destination register.
> >>
> >>Ok, I understand the use case now. So it's not for correctness, but
> >>rather to generate more optimized code.
> >
> >OTOH, it means that we always have to go through a 32-bit register first
> >when truncating a 64-bit value.
> >
> >I mean we gain in the following case:
> > shr_i64 t64, t64, i
> > trunc_i64_i32 t32, t64
> > ...
> >
> >But we lose in the following case:
> > trunc_i64_i32 t32, t64
> > neg t32, t32
> > ...
>
> Why do you believe we're using an extra temp here? Certainly you can't "neg
> t32, t64" in any circumstance.
I haven't tried it and I am not familiar with SPARC assembly, but I
guess the above code would be translated this way with a real
trunc op:
shr %g2, 32, %o0
sub %g0, %o0, %o1
With a trunc op translated into a move, we can directly get:
sub %g0, %g2, %o1
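The reasoning behind treating the truncation as a plain move can be sketched in C: the low 32 bits of a negation depend only on the low 32 bits of the input, so garbage in the high half does not affect a 32-bit result. The function names below are hypothetical and only model the arithmetic.

```c
#include <stdint.h>

/* Truncate first, then negate in 32 bits. */
static uint32_t neg_after_trunc(uint64_t reg)
{
    uint32_t t32 = (uint32_t)reg;
    return (uint32_t)(0u - t32);
}

/* Negate the full 64-bit register, then look only at the low half. */
static uint32_t neg_without_trunc(uint64_t reg)
{
    uint64_t wide = 0 - reg;
    return (uint32_t)wide;
}
```

For any input the two functions agree, whatever the high 32 bits of the register contain.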
> Anyway, this comes up most often with interfacing with the sparcv8plus
> calling convention, in which 64-bit quantities must be passed in 2
> registers. Before, we'd emit code like
>
> shrx %g2, 32, %g1
> mov %g1, %o0
> mov %g2, %o1
>
> After, we're able to put the shift output directly to %o0.
What is important is to get more optimized code in general, which is
the case. I believe that given that TCG supports multiple architectures,
it's difficult to always get the best possible code.
--
Aurelien Jarno GPG: 4096R/1DDD8C9B
aurelien@aurel32.net http://www.aurel32.net