Some thoughts on the SSA debate

linux-sparse.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Some thoughts on the SSA debate
@ 2017-09-08 10:28 Dibyendu Majumdar
  2017-09-09  2:06 ` Luc Van Oostenryck
  0 siblings, 1 reply; 8+ messages in thread
From: Dibyendu Majumdar @ 2017-09-08 10:28 UTC (permalink / raw)
  To: Linux-Sparse

Hi,

Having tried out the newssa approach for a bit, and after working on
another backend that doesn't use SSA phi nodes, I would like to share
some of my personal thoughts with regards to the changes:

1. Firstly I think it is important to retain the existing "pre-phi"
linear code. This is fine for backends such as LLVM which do a good
job of transforming code anyway. It is also usually less buggy and
serves as a baseline for correctness. So after careful consideration I
would vote against adding phi instructions in the initial phase.

2. It also appears to me that the original phi instructions, while not
textbook correct in terms of placement, are internally consistent, in
the sense that the only issue I have found so far is the inability to
handle variables that are initialized in a branch of the code. Perhaps
this is just a bug? The other issue with the original implementation
appears to be related to the patches that attempted to kill things -
but I get the feeling that the original design did not envision
removing unused things physically, but just marked them logically
dead?

3. It also appears that the SSA construction approach by Matthias
Braun et al is inadequate in the sense that it is not a full solution
and requires additional hacks. This is unfortunate as I understood the
aim was to avoid hacks and ad-hoc changes and go with a more published
algorithm. In view of this, perhaps the current experiment has failed
and the effort should be directed to building the industry standard
approach?

4. Final thought is that although I was previously suggesting changes
should go into master, I think for such a large piece of fundamental
work maybe the changes should be done on a dedicated branch and only
merged when completely tested and validated.

I hesitated putting forth my thoughts here as I still have the freedom
in my project to accept / reject changes happening in Sparse, and so I
am able to take a different approach. Also, I am not really qualified
to judge the full implications of the new SSA construction approach -
hence I am going by my impressions after testing and using the
approach, and observing the issues that have been discussed so far on
the list.

Regards
Dibyendu

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Some thoughts on the SSA debate
  2017-09-08 10:28 Some thoughts on the SSA debate Dibyendu Majumdar
@ 2017-09-09  2:06 ` Luc Van Oostenryck
  2017-09-12 10:06   ` Dibyendu Majumdar
  0 siblings, 1 reply; 8+ messages in thread
From: Luc Van Oostenryck @ 2017-09-09  2:06 UTC (permalink / raw)
  To: Dibyendu Majumdar; +Cc: Linux-Sparse

On Fri, Sep 08, 2017 at 11:28:57AM +0100, Dibyendu Majumdar wrote:
> Hi,
> 
> Having tried out the newssa approach for a bit, and after working on
> another backend that doesn't use SSA phi nodes, I would like to share
> some of my personal thoughts with regards to the changes:
> 
> 1. Firstly I think it is important to retain the existing "pre-phi"
> linear code. This is fine for backends such as LLVM which do a good
> job of transforming code anyway. It is also usually less buggy and
> serves as a baseline for correctness. So after careful consideration I
> would vote against adding phi instructions in the initial phase.

That's forgetting that even with the "old/current" method create
phi-nodes during linearization:
- for the return mechanism
- for some conditionals

This also doesn't really take in account the fact that the SSA form
is pretty fundamental to sparse (like it's most modern compilers).

If you have a backend that doesn't understand phi-nodes, it means
that this backend doesn't know and doesn't care about SSA and in
this case you have to call unssa() to put back  the IR in 'normal'
form.
 
> 2. It also appears to me that the original phi instructions, while not
> textbook correct in terms of placement, are internally consistent,

No.
For *some* purposes/aspect, they are somehow consistent and the current
code can do something relatively sensical with them (and often it
seems to be more because of simple coincidence).
But this doesn't change the fact that a lof ot things can't possibly
be done correctly with the actual phi-nodes.

I already explain what are the problems and here it is again:
- broken because the phi-nodes are put not at the meet point but
  at a place where the value is needed, where there was a load.
  this place *may* be in the right BB but often it's in a 'later'
  BB. Real problems then arise when:
  * changing some BB edges (like removing a dead parent, BB merge, ...)
  * some undefined values are (conceptually) present (and the current
    code ignore them).
- broken because when a phi-node is created it somehow inherit the
  phi-sources of any 'parent' phi-nodes when there are some. This
  is even more broken than the previous situation. You can't possibly
  do anything sensical with this.

> 3. It also appears that the SSA construction approach by Matthias
> Braun et al is inadequate in the sense that it is not a full solution
> and requires additional hacks.

This is a misrepresentation of the situation. And what are those 'hacks'
are you talking about?
There are 2 things with their method:
1) It only converts 'purely local variables', the ones that can't possibly
   be aliased to anything.
   It can be adapted to also convert more variables but this would,
   of course, requires doing some sort of alias analysis before the
   conversion and thus before the linearization which is absurd.

   The whole advantage of this method is to be able to do a quick
   conversion of the most essential variables and this with a very
   low cost. This can be very useful when:
   - you're in a lightweight environment and can't afford heavier
     methods (think JIT for example)
   - as a preliminary step for doing alias analysis with code already
     in SSA form or to be able to do some fast optimization/simplification
     before doing alais analysis (and then convert the others variables,
     when possible).

   In case, it is not clear, the current method do this in two steps too:
   - first in simplify_symbol_usage()
   - later in simplify_memops()/simplify_loads()
   but for sure the first step convert more variables than the SSSA method
   (and is not only the SSA conversion is broken, but the detection of
   aliasing is also wrong (but I do't know yet if it is just a small bug
   or if it is more fundamental).

2) The article doesn't explain how to deal with gotos.
   The article also doesn't explain how to deal with ifs, do-whiles,
   switches, ... The article is just that, an article presenting a method.
   It's not a step-by-step guide on how to write a compiler which will
   explain all the practical difficulties you will have.


> 4. Final thought is that although I was previously suggesting changes
> should go into master, I think for such a large piece of fundamental
> work maybe the changes should be done on a dedicated branch and only
> merged when completely tested and validated.

Isn't what is done? Is the code for SSSA already upstreamed?


-- Luc Van Oostenryck

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Some thoughts on the SSA debate
  2017-09-09  2:06 ` Luc Van Oostenryck
@ 2017-09-12 10:06   ` Dibyendu Majumdar
  2017-09-12 10:48     ` Luc Van Oostenryck
  0 siblings, 1 reply; 8+ messages in thread
From: Dibyendu Majumdar @ 2017-09-12 10:06 UTC (permalink / raw)
  To: Luc Van Oostenryck; +Cc: Linux-Sparse

H Luc,

On 9 September 2017 at 03:06, Luc Van Oostenryck
<luc.vanoostenryck@gmail.com> wrote:
> On Fri, Sep 08, 2017 at 11:28:57AM +0100, Dibyendu Majumdar wrote:
>>
>> 1. Firstly I think it is important to retain the existing "pre-phi"
>> linear code. This is fine for backends such as LLVM which do a good
>> job of transforming code anyway. It is also usually less buggy and
>> serves as a baseline for correctness. So after careful consideration I
>> would vote against adding phi instructions in the initial phase.
>
> That's forgetting that even with the "old/current" method create
> phi-nodes during linearization:
> - for the return mechanism
> - for some conditionals
>
> This also doesn't really take in account the fact that the SSA form
> is pretty fundamental to sparse (like it's most modern compilers).
>

I do not agree that the phi nodes "must" be present in the initial
linear output. In fact if you look at clang output, initial linear
output has allocas, and loads/stores. Later on these get converted to
phi nodes. As long as the SSA property is maintained, which it is for
Sparse, then this is fine in my view.

I have some practical reasons for it:

a) In the JIT world, we often want fast compilation. I already disable
all subsequent phases, and generate code from the initial linear
output.
b) Introducing phi nodes and then taking them out doesn't help because
it creates extra steps. I haven't yet tried the unssa option yet - I
may have a look at this at some point.
c) As both LLVM backend and my new backend use load/stores and allocas
anyway, the presence of phi nodes just makes the output more
convoluted as I showed earlier.
d) You are right that some phi nodes are output already in the initial
linear output. I would have preferred if this wasn't the case of
course.

Regards
Dibyendu

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Some thoughts on the SSA debate
  2017-09-12 10:06   ` Dibyendu Majumdar
@ 2017-09-12 10:48     ` Luc Van Oostenryck
  2017-09-12 11:47       ` Dibyendu Majumdar
  0 siblings, 1 reply; 8+ messages in thread
From: Luc Van Oostenryck @ 2017-09-12 10:48 UTC (permalink / raw)
  To: Dibyendu Majumdar; +Cc: Linux-Sparse

On Tue, Sep 12, 2017 at 12:06 PM, Dibyendu Majumdar
<mobile@majumdar.org.uk> wrote:
> H Luc,
>
> On 9 September 2017 at 03:06, Luc Van Oostenryck
> <luc.vanoostenryck@gmail.com> wrote:
>> On Fri, Sep 08, 2017 at 11:28:57AM +0100, Dibyendu Majumdar wrote:
>>>
>>> 1. Firstly I think it is important to retain the existing "pre-phi"
>>> linear code. This is fine for backends such as LLVM which do a good
>>> job of transforming code anyway. It is also usually less buggy and
>>> serves as a baseline for correctness. So after careful consideration I
>>> would vote against adding phi instructions in the initial phase.
>>
>> That's forgetting that even with the "old/current" method create
>> phi-nodes during linearization:
>> - for the return mechanism
>> - for some conditionals
>>
>> This also doesn't really take in account the fact that the SSA form
>> is pretty fundamental to sparse (like it's most modern compilers).
>>
>
> I do not agree that the phi nodes "must" be present in the initial
> linear output.

I fail to see where someone said they must.

It's a fact though, that currently, during linearization, sparse already
create phi-nodes for return values and for some conditional expressions.

> I have some practical reasons for it:
>
> a) In the JIT world, we often want fast compilation. I already disable
> all subsequent phases, and generate code from the initial linear
> output.

Amusingly, the Simple SSA method was (partially) created for such uses.
I think you can easily understand why.

Anyway, I'm more than a bit tired of this 'debate', mainly based by half-truths,
assumptions and misunderstandings.

As I have already stated twice, I've put the code for SSSA on hold, focusing
instead on the *correct* conversion of the *others variables* (since SSSA is
only concerned by local variables which can never be aliased to anything,
depending on your code that can be most variables or only a small number).

When I'll have something I'm happy with, I'll evaluate (read: "based
on numbers")
if there is any value left to the SSSA method or not. If I see there
is a significant
value for some uses, I'll submit it and we can have this 'debate'.
*If* the code reaches upstream and bothers you too much, as you had
already asked
for, you will be able to disable it by simply doing so that the call
to 'is_promotable()'
always returns false and you will have all your nice stores & loads.

Meanwhile, the current code is broken and the existing code with SSSA
*can* be used to test, experiment, evaluate, compare, ...

-- Luc

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Some thoughts on the SSA debate
  2017-09-12 10:48     ` Luc Van Oostenryck
@ 2017-09-12 11:47       ` Dibyendu Majumdar
  2017-09-12 13:06         ` Luc Van Oostenryck
  0 siblings, 1 reply; 8+ messages in thread
From: Dibyendu Majumdar @ 2017-09-12 11:47 UTC (permalink / raw)
  To: Luc Van Oostenryck; +Cc: Linux-Sparse

Hi Luc,

On 12 September 2017 at 11:48, Luc Van Oostenryck
<luc.vanoostenryck@gmail.com> wrote:
>> I have some practical reasons for it:
>>
>> a) In the JIT world, we often want fast compilation. I already disable
>> all subsequent phases, and generate code from the initial linear
>> output.
>
> Amusingly, the Simple SSA method was (partially) created for such uses.
> I think you can easily understand why.

Yes, I think the SSA form is important for simplifications. And for
some back-ends such as NanoJIT which do not have optimizations that
LLVM has, the Sparse layer can act as the optimizer. But the issue
right now is that the simplifications are not always correct even with
the new SSA so that I have to disable everything anyway. And having
the phi nodes then actually complicates things.

>
>
> Anyway, I'm more than a bit tired of this 'debate', mainly based by half-truths,
> assumptions and misunderstandings.

Given that there could be major changes in the output (at least for
the initial one) then it is good to make sure that there is a proper
consideration of pros and cons. I am not arguing against the new
implementation at all - just saying it should be a later phase. I also
accept your point that currently there are issues with the SSA
construction, and accept that changes are necessary.

>
> As I have already stated twice, I've put the code for SSSA on hold, focusing
> instead on the *correct* conversion of the *others variables* (since SSSA is
> only concerned by local variables which can never be aliased to anything,
> depending on your code that can be most variables or only a small number).
>
> When I'll have something I'm happy with, I'll evaluate (read: "based
> on numbers")
> if there is any value left to the SSSA method or not. If I see there
> is a significant
> value for some uses, I'll submit it and we can have this 'debate'.
> *If* the code reaches upstream and bothers you too much, as you had
> already asked
> for, you will be able to disable it by simply doing so that the call
> to 'is_promotable()'
> always returns false and you will have all your nice stores & loads.
>
> Meanwhile, the current code is broken and the existing code with SSSA
> *can* be used to test, experiment, evaluate, compare, ...
>

Okay that is exactly what I am giving you - feedback based on using
the new implementation. In fact I started with the assumption that it
solved all the problems, only after using and testing it found that it
didn't, i.e. later simplifications are still causing issues.

Regards
Dibyendu

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Some thoughts on the SSA debate
  2017-09-12 11:47       ` Dibyendu Majumdar
@ 2017-09-12 13:06         ` Luc Van Oostenryck
  2017-09-13 18:25           ` Dibyendu Majumdar
  0 siblings, 1 reply; 8+ messages in thread
From: Luc Van Oostenryck @ 2017-09-12 13:06 UTC (permalink / raw)
  To: Dibyendu Majumdar; +Cc: Linux-Sparse

On Tue, Sep 12, 2017 at 1:47 PM, Dibyendu Majumdar
<mobile@majumdar.org.uk> wrote:
> Hi Luc,
>
> On 12 September 2017 at 11:48, Luc Van Oostenryck
> <luc.vanoostenryck@gmail.com> wrote:
>>> I have some practical reasons for it:
>>>
>>> a) In the JIT world, we often want fast compilation. I already disable
>>> all subsequent phases, and generate code from the initial linear
>>> output.
>>
>> Amusingly, the Simple SSA method was (partially) created for such uses.
>> I think you can easily understand why.
>
> Yes, I think the SSA form is important for simplifications.

Sure, but the point here is about the Simple SSA method (which can
do the conversion of local variables on the fly, during linearization).
The advantage is that you have then converted the variables
which matter the most and this:
- very early
- you can easily combine simple optimization during the conversion
- with a very low cost (no need for another pass, no need to calculate
  the dominance frontiers).

> And for
> some back-ends such as NanoJIT which do not have optimizations that
> LLVM has, the Sparse layer can act as the optimizer.

And those optimizations are critical for sparse current usage as semantic
checker for the kernel:
- to be able to detect some situations
- to not have false warnings in the context checking.

> But the issue
> right now is that the simplifications are not always correct even with
> the new SSA so that I have to disable everything anyway.

It's what you do but it's not what I do.
The other part that is broken is simplify_loads() which try to convert
more variables after each simplification/CSE steps.
Since the code is copied from the code for the initial conversion
it's broken in the same ways.
I just disabled it and things are then correct.

Now, be assured that I understand very well that the way you use
sparse is quite different. Things like trying to reach a fix-point while
doing simplification is at the opposite spectrum from what people
would like for JIT.

> And having
> the phi nodes then actually complicates things.

I also want to insist once more that SSA is fundamental to sparse.
I won't do anything that purposely makes sparse more difficult to use
without SSA but you must understand that it's the least of my priorities.
Even more so because even without conversion in itself the 'normal'
already create phi-nodes for returns & some conditionals.
On the contrary, my priority is to make sparse to produce and keep
correct SSA.

>> Anyway, I'm more than a bit tired of this 'debate', mainly based by half-truths,
>> assumptions and misunderstandings.
>
> Given that there could be major changes in the output (at least for
> the initial one) then it is good to make sure that there is a proper
> consideration of pros and cons. I am not arguing against the new
> implementation at all - just saying it should be a later phase.

It's where we disagree. What I'm saying is:
- we want to also convert pointers & non-local variables decently
  (the general case is undecidable, anyway).
- you can't convert all the non-local variables at once. If you have
  several optimization steps/rounds then each of them can create
  more conversion opportunities.
- let see what is possible to have at which cost
- *if* there is some good enough advantage to do a first quick
  conversion on-the-fly let's do it (and it can be a very good thing,
  for some usage, to just do that and stop there).
- the extreme cases are not really concerned anyway as they don't
  want to run any of this
- not doing the conversion (of local variables) has also its cost since
  then you:
  * have to deal in the IR with stores and loads instructions for *every* access
  * are unable to do most simplifications, even the cheapest ones
  * have to deal at run-time with these memory accesses instead of being able
     to use registers.

>>
>> Meanwhile, the current code is broken and the existing code with SSSA
>> *can* be used to test, experiment, evaluate, compare, ...
>
> Okay that is exactly what I am giving you - feedback based on using
> the new implementation. In fact, I started with the assumption that it
> solved all the problems, only after using and testing it found that it
> didn't, i.e. later simplifications are still causing issues.

Be assured that it is something I didn't directly realize but it doesn't
change anything to what can be done at the initial stage. And as I said
above, you can put the single call to simplify_loads() in comment
and be done with it, just like I do here.

-- Luc

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Some thoughts on the SSA debate
  2017-09-12 13:06         ` Luc Van Oostenryck
@ 2017-09-13 18:25           ` Dibyendu Majumdar
  2017-09-13 19:10             ` Luc Van Oostenryck
  0 siblings, 1 reply; 8+ messages in thread
From: Dibyendu Majumdar @ 2017-09-13 18:25 UTC (permalink / raw)
  To: Luc Van Oostenryck; +Cc: Linux-Sparse

Hi Luc,

On 12 September 2017 at 14:06, Luc Van Oostenryck
<luc.vanoostenryck@gmail.com> wrote:
> On Tue, Sep 12, 2017 at 1:47 PM, Dibyendu Majumdar
>> But the issue
>> right now is that the simplifications are not always correct even with
>> the new SSA so that I have to disable everything anyway.
>
> It's what you do but it's not what I do.
> The other part that is broken is simplify_loads() which try to convert
> more variables after each simplification/CSE steps.
> Since the code is copied from the code for the initial conversion
> it's broken in the same ways.
> I just disabled it and things are then correct.
>

I probably ought to update my project with the latest version you
have. Please let me know if you have an updated minimal changes branch
that I can apply.

Thanks and Regards
Dibyendu

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Some thoughts on the SSA debate
  2017-09-13 18:25           ` Dibyendu Majumdar
@ 2017-09-13 19:10             ` Luc Van Oostenryck
  0 siblings, 0 replies; 8+ messages in thread
From: Luc Van Oostenryck @ 2017-09-13 19:10 UTC (permalink / raw)
  To: Dibyendu Majumdar; +Cc: Linux-Sparse

On Wed, Sep 13, 2017 at 8:25 PM, Dibyendu Majumdar
<mobile@majumdar.org.uk> wrote:
> Hi Luc,
>
> On 12 September 2017 at 14:06, Luc Van Oostenryck
> <luc.vanoostenryck@gmail.com> wrote:
>> On Tue, Sep 12, 2017 at 1:47 PM, Dibyendu Majumdar
>>> But the issue
>>> right now is that the simplifications are not always correct even with
>>> the new SSA so that I have to disable everything anyway.
>>
>> It's what you do but it's not what I do.
>> The other part that is broken is simplify_loads() which try to convert
>> more variables after each simplification/CSE steps.
>> Since the code is copied from the code for the initial conversion
>> it's broken in the same ways.
>> I just disabled it and things are then correct.
>>
>
> I probably ought to update my project with the latest version you
> have. Please let me know if you have an updated minimal changes branch
> that I can apply.

For SSSA, I have what you should have already or something quite close:
* https://github.com/lucvoo/sparse/tree/sssa-mini-v3

And the same but with all LLVM fixes and others fixes or optimizations:
* https://github.com/lucvoo/sparse/tree/sssa-next

I also have something I use only as a sort of reference which I have
absolutely no intention to ever submit. It doesn't do the SSA conversion
during linearization and converts more vars. It also seems to be quite
correct and so is useful to me for doing comparisons but that's all (and it's
not tested for against the kernel, for example):
* https://github.com/lucvoo/sparse/tree/mem2ssa-next

None of these trees are stable as I often rebuild them while working
on the base.
So, I'm not sure how useful they can be for you but if you find a problem
with linearization, SSA conversion or optimization, you can always check
with one of these (probably the last one) to see if it's already fixed or not.

-- Luc

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2017-09-13 19:10 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-09-08 10:28 Some thoughts on the SSA debate Dibyendu Majumdar
2017-09-09  2:06 ` Luc Van Oostenryck
2017-09-12 10:06   ` Dibyendu Majumdar
2017-09-12 10:48     ` Luc Van Oostenryck
2017-09-12 11:47       ` Dibyendu Majumdar
2017-09-12 13:06         ` Luc Van Oostenryck
2017-09-13 18:25           ` Dibyendu Majumdar
2017-09-13 19:10             ` Luc Van Oostenryck

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).