From: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
To: Christopher Li <sparse@chrisli.org>
Cc: Linux-Sparse <linux-sparse@vger.kernel.org>,
Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [RFC] rationale for systematic elimination of OP_SYMADDR instructions
Date: Wed, 26 Apr 2017 04:49:02 +0200
Message-ID: <CAMHZB6EE93YSm0gVUr=dhNpsTZ1vU8acJ9msyh3VSdTOTVUq-w@mail.gmail.com>
In-Reply-To: <CANeU7QkhQJrGifw5zvRoN27yVEfmU23B+HG5KQy3R9G3aKWUgw@mail.gmail.com>

On Tue, Apr 25, 2017 at 9:20 PM, Christopher Li <sparse@chrisli.org> wrote:
> On Thu, Mar 9, 2017 at 10:20 PM, Luc Van Oostenryck
> <luc.vanoostenryck@gmail.com> wrote:
>> While investigating some problems related to code generation
>> I realized that OP_SYMADDR instructions are systematically eliminated:
>> the target addresses are simply replaced by the symbol itself.
>>
>> While it's not wrong per se, as it all depends on the semantics
>> we want to give to pseudos and instructions and on how high-
>> or low-level we want the IR to be, I don't think it was the intention
>> to remove them and, more importantly, I don't think it's desirable.
>>
>> Those OP_SYMADDR allowed a clear separation between a symbol
>> (a name with a type and info for storage & linkage) and its address
>> (which can be stored in memory or in a register and on which
>> arithmetic operations can then be done). Once these addresses
>> are replaced by the symbol itself, those symbols can appear almost
>> everywhere in the linearized code:
>> - in calls' arguments,
>> - in adds and subs (while doing pointer arithmetic),
>> - in casts,
>> - in loads & stores,
>> - ...
>> and they complicate things considerably once you begin to be
>> concretely interested in things after linearization & simplification,
>> since sooner or later you will need the address anyway.
>>
>> So my question is:
>> "is there a good reason to eliminate those instructions?",
>
> This change was introduced in 962279e8 by Linus:
>
> Remove OP_SETVAL after symbol-pseudo simplification.
>
> We can just replace all users with the symbol pseudo
> directly.
>
> This means that we can no longer re-do symbol simplification
> after CSE, and we need to rely on the generic memop simplification.
>
> I can see the reason for doing that: it simplifies CSE. Before this change,
> every reference to the symbol would do an OP_SETVAL (or OP_SYMADDR
> nowadays) to get the address into a new pseudo. That is extra work for
> CSE to discover that: "Oh, all those different pseudos are actually the same
> address for the same symbol. Let's replace them with the same pseudo."
>
> I don't understand why things are more complicated after linearization
> if we replace all the symbol pseudos with one. Even if we don't do it here,
> wouldn't CSE do that anyway?
>
> The way I see it, the pseudo of the symbol *is* the address of the symbol,
> I don't see a problem using the address of the symbol.
>
> Maybe you have some specific use case in mind. Can you give an
> example?

Roughly, once you begin to play with code generation, something like
OP_SYMADDR is an operation that you will really have to do.
Depending on the relocation, it can even be something relatively costly:
I'm thinking of static code on a 64-bit machine where you can only generate
16-bit constants; other cases may not be any cheaper.
So it's something that sooner or later will need to be exposed, and
doing CSE on the address is good.

For example, with code as simple as

        extern int a;

        void foo(void)
        {
                a = a + 1;
        }

compiled for ARM with GCC:

        foo:
                movw    r3, #:lower16:a
                movt    r3, #:upper16:a
                ldr     r2, [r3]
                add     r2, r2, #1
                str     r2, [r3]
                bx      lr

The first 2 instructions correspond to taking the address of 'a';
it would be the very direct translation of sparse's:

        foo:
        .L0:
                <entry-point>
                symaddr %r1
                load.32 %r2 <- 0[%r1]
                add.32 %r3 <- %r2, $1
                store.32 %r3 -> 0[%r1]
                ret

if we had kept the OP_SYMADDR and done CSE on it.

But for now we have:

        foo:
        .L0:
                <entry-point>
                load.32 %r2 <- 0[a]
                add.32 %r3 <- %r2, $1
                store.32 %r3 -> 0[a]
                ret

whose translation would be:

                movw    r3, #:lower16:a
                movt    r3, #:upper16:a
                ldr     r2, [r3]
                add     r2, r2, #1
        !       movw    r3, #:lower16:a
        !       movt    r3, #:upper16:a
                str     r2, [r3]
                bx      lr