* master merge plans
From: Christopher Li @ 2017-08-21 19:39 UTC (permalink / raw)
To: Luc Van Oostenryck; +Cc: Linux-Sparse
Here is the stuff I am considering:
1) gcc attribute update (trivial and safe to merge)
2) Makefile update to enable debug builds.
I want to merge this first because it touches the Makefile a lot,
while other patches typically just add some source files; it is
easier to resolve the conflicts this way than the other way around.
I will send out the update soon.
3) Luc's SSSA
Luc has sssa-mini-v2 now. Is that the latest one to be merged?
I still want an option to disable SSSA and an option to enable
the old code path. That can be done after the merge, though.
Anything else I missed? Please let me know.
Chris

* Re: master merge plans
From: Luc Van Oostenryck @ 2017-08-22 13:32 UTC (permalink / raw)
To: Christopher Li; +Cc: Linux-Sparse

On Mon, Aug 21, 2017 at 9:39 PM, Christopher Li <sparse@chrisli.org> wrote:
> Here is the stuff I am considering:
>
> 3) Luc's SSSA
> Luc has sssa-mini-v2 now. Is that the latest one to be merged?
> I still want an option to disable SSSA and an option to enable
> the old code path. That can be done after the merge, though.

I'm busy making significant changes. I'll resubmit everything
in a few days.

> Anything else I missed? Please let me know.

Nicolai's constexpr series.

-- Luc

* Re: master merge plans
From: Christopher Li @ 2017-08-22 14:47 UTC (permalink / raw)
To: Luc Van Oostenryck; +Cc: Linux-Sparse

On Tue, Aug 22, 2017 at 9:32 AM, Luc Van Oostenryck
<luc.vanoostenryck@gmail.com> wrote:
>
> Nicolai's constexpr series.

Is that the constexpr-v4 branch on your git repository you want me
to merge? If you have a newer one, please post the git url.

Chris

* Re: master merge plans
From: Christopher Li @ 2017-08-22 15:51 UTC (permalink / raw)
To: Luc Van Oostenryck; +Cc: Linux-Sparse

On Tue, Aug 22, 2017 at 10:47 AM, Christopher Li <sparse@chrisli.org> wrote:
> Is that the constexpr-v4 branch on your git repository you want me
> to merge?

There might be a tiny slowdown due to the constexpr series. I will run
more tests to see if I can pinpoint it. The last three groups in each
section below were started by a script. To be continued; the current
numbers are not conclusive.

================ master merge constexpr-v4 ====================
1202.35user 446.83system 1:16.27elapsed 2162%CPU (0avgtext+0avgdata 238104maxresident)k 352inputs+12768outputs (0major+128889305minor)pagefaults 0swaps
1204.87user 447.25system 1:16.28elapsed 2165%CPU (0avgtext+0avgdata 238080maxresident)k 0inputs+12768outputs (0major+128888977minor)pagefaults 0swaps
1202.92user 447.03system 1:16.42elapsed 2158%CPU (0avgtext+0avgdata 238048maxresident)k 0inputs+12768outputs (0major+128888805minor)pagefaults 0swaps
1206.48user 446.75system 1:16.36elapsed 2164%CPU (0avgtext+0avgdata 238072maxresident)k 0inputs+12768outputs (0major+128888959minor)pagefaults 0swaps
1207.37user 447.35system 1:16.57elapsed 2161%CPU (0avgtext+0avgdata 238072maxresident)k 0inputs+12768outputs (0major+128888909minor)pagefaults 0swaps
1204.73user 446.87system 1:16.39elapsed 2161%CPU (0avgtext+0avgdata 238040maxresident)k 0inputs+12776outputs (0major+128889143minor)pagefaults 0swaps
1205.96user 447.91system 1:16.39elapsed 2164%CPU (0avgtext+0avgdata 238080maxresident)k 0inputs+12768outputs (0major+128889529minor)pagefaults 0swaps
1207.49user 448.00system 1:16.37elapsed 2167%CPU (0avgtext+0avgdata 238044maxresident)k 0inputs+12784outputs (0major+128888947minor)pagefaults 0swaps

================ rc5 ==========================================
1201.56user 448.22system 1:16.24elapsed 2163%CPU (0avgtext+0avgdata 238048maxresident)k 0inputs+12768outputs (0major+128883583minor)pagefaults 0swaps
1203.27user 447.90system 1:16.22elapsed 2166%CPU (0avgtext+0avgdata 238036maxresident)k 0inputs+12768outputs (0major+128882925minor)pagefaults 0swaps
1201.65user 448.79system 1:16.20elapsed 2165%CPU (0avgtext+0avgdata 238044maxresident)k 0inputs+12768outputs (0major+128883529minor)pagefaults 0swaps
1202.19user 448.79system 1:16.24elapsed 2165%CPU (0avgtext+0avgdata 238192maxresident)k 0inputs+12768outputs (0major+128883477minor)pagefaults 0swaps
1203.38user 449.73system 1:16.33elapsed 2165%CPU (0avgtext+0avgdata 238044maxresident)k 0inputs+12776outputs (0major+128883089minor)pagefaults 0swaps
1203.37user 449.05system 1:16.23elapsed 2167%CPU (0avgtext+0avgdata 238080maxresident)k 0inputs+12768outputs (0major+128883199minor)pagefaults 0swaps

Chris

* Re: master merge plans
From: Luc Van Oostenryck @ 2017-08-22 20:05 UTC (permalink / raw)
To: Christopher Li; +Cc: Linux-Sparse

On Tue, Aug 22, 2017 at 5:51 PM, Christopher Li <sparse@chrisli.org> wrote:
>>
>> Is that the constexpr-v4 branch on your git repository you want me
>> to merge?

Yes.

> There might be a tiny slowdown due to the constexpr series. I will run
> more tests to see if I can pinpoint it. The last three groups in each
> section were started by a script. To be continued; the current numbers
> are not conclusive.

At worst, these numbers show something like 0.1-0.3%.

-- Luc

* Re: master merge plans
From: Christopher Li @ 2017-08-23 20:50 UTC (permalink / raw)
To: Luc Van Oostenryck; +Cc: Linux-Sparse

On Tue, Aug 22, 2017 at 4:05 PM, Luc Van Oostenryck
<luc.vanoostenryck@gmail.com> wrote:
>
> At worst, these numbers show something like 0.1-0.3%.

I am doing some final-stage testing and review of the constexpr
patches. I found out that my server actually is affected by CPU
temperature, so I turned off Intel Turbo stepping and allowed some
rest between tests. I then finished writing a script to run the test
for each commit; the log is included in this email.

I can't narrow it down to one commit causing a big jump. That is not
a reason against the patches; I was just interested to see whether
the measurement could catch such a change in the numbers.

Chris

BTW, for testing purposes I rebased against master. That is *not* how
I am going to merge it.

exp~0 17c0bae constexpr: flag __builtin_bswap() as constexpr
real 1m24.786s user 20m18.346s sys 7m24.193s
real 1m18.803s user 20m55.562s sys 7m36.994s
real 1m18.827s user 20m55.731s sys 7m36.688s
real 1m18.848s user 20m57.298s sys 7m35.525s
real 1m18.922s user 20m56.519s sys 7m36.264s

exp~1 58d888a give default return type in evaluate_call()
real 1m18.989s user 20m59.896s sys 7m35.895s
real 1m18.934s user 20m58.567s sys 7m36.808s
real 1m19.037s user 20m58.249s sys 7m37.441s
real 1m18.993s user 20m59.233s sys 7m36.422s
real 1m19.092s user 20m58.818s sys 7m37.009s

exp~2 ba12c3c return an error if too few args
real 1m18.984s user 20m59.210s sys 7m37.683s
real 1m18.993s user 20m58.759s sys 7m37.619s
real 1m19.133s user 20m59.426s sys 7m37.658s
real 1m19.067s user 21m0.375s sys 7m36.571s
real 1m19.040s user 20m59.331s sys 7m37.687s

exp~3 ace5869 constexpr: treat comparisons between types as integer constexpr
real 1m18.853s user 20m57.653s sys 7m37.073s
real 1m18.870s user 20m56.782s sys 7m38.441s
real 1m18.949s user 20m56.683s sys 7m37.473s
real 1m19.101s user 20m57.496s sys 7m37.183s
real 1m18.889s user 20m57.439s sys 7m37.235s

exp~4 82f78d2 constexpr: support compound literals as address constants
real 1m18.966s user 20m56.740s sys 7m37.477s
real 1m18.882s user 20m55.245s sys 7m38.047s
real 1m18.872s user 20m56.149s sys 7m37.610s
real 1m19.015s user 20m54.708s sys 7m38.447s
real 1m18.831s user 20m55.296s sys 7m37.978s

exp~5 df2d126 constexpr: relax some constant expression rules for pointer expressions
real 1m18.831s user 20m55.178s sys 7m37.580s
real 1m18.802s user 20m55.791s sys 7m36.577s
real 1m18.920s user 20m55.301s sys 7m37.593s
real 1m18.801s user 20m54.810s sys 7m37.699s
real 1m18.814s user 20m55.836s sys 7m36.516s

exp~6 96f5c7e constexpr: flag builtins constant_p, safe_p and warning as constexprs
real 1m19.001s user 20m54.861s sys 7m39.497s
real 1m18.894s user 20m54.176s sys 7m39.225s
real 1m18.955s user 20m55.225s sys 7m38.273s
real 1m18.837s user 20m55.395s sys 7m38.423s
real 1m19.038s user 20m56.132s sys 7m37.417s

exp~7 82ba05e constexpr: examine constness of __builtin_offsetof at evaluation only
real 1m18.844s user 20m52.640s sys 7m38.045s
real 1m18.796s user 20m52.641s sys 7m38.324s
real 1m18.908s user 20m53.584s sys 7m37.451s
real 1m18.842s user 20m52.688s sys 7m38.812s
real 1m18.806s user 20m53.336s sys 7m37.414s

exp~8 f585495 constexpr: recognize references to labels as address constants
real 1m18.921s user 20m53.766s sys 7m37.205s
real 1m18.784s user 20m53.527s sys 7m37.331s
real 1m18.846s user 20m54.222s sys 7m36.130s
real 1m18.900s user 20m53.766s sys 7m36.899s
real 1m18.676s user 20m53.485s sys 7m37.334s

exp~9 c19757f constexpr: recognize string literals as address constants
real 1m19.184s user 20m54.707s sys 7m37.612s
real 1m18.782s user 20m54.507s sys 7m37.990s
real 1m18.830s user 20m54.008s sys 7m38.648s
real 1m18.984s user 20m53.784s sys 7m38.611s
real 1m18.770s user 20m53.433s sys 7m39.090s

exp~10 a1ba3b8 constexpr: recognize members of static compound objects as address constants
real 1m19.012s user 20m53.325s sys 7m37.647s
real 1m18.815s user 20m53.541s sys 7m38.065s
real 1m18.716s user 20m52.065s sys 7m39.047s
real 1m18.762s user 20m52.854s sys 7m38.205s
real 1m18.867s user 20m53.234s sys 7m38.251s

exp~11 89f03a1 constexpr: recognize address constants created through pointer arithmetic
real 1m18.891s user 20m56.707s sys 7m37.824s
real 1m18.944s user 20m56.198s sys 7m37.925s
real 1m18.980s user 20m57.025s sys 7m37.688s
real 1m18.861s user 20m56.359s sys 7m38.265s
real 1m18.927s user 20m56.978s sys 7m37.942s

exp~12 add9d6b constexpr: recognize address constants created through casts
real 1m18.827s user 20m54.727s sys 7m37.786s
real 1m18.823s user 20m55.051s sys 7m37.619s
real 1m18.840s user 20m55.395s sys 7m37.611s
real 1m18.828s user 20m54.439s sys 7m37.358s
real 1m18.996s user 20m55.342s sys 7m37.761s

exp~13 1c5d817 constexpr: recognize static objects as address constants
real 1m18.849s user 20m55.353s sys 7m37.570s
real 1m18.951s user 20m55.800s sys 7m36.969s
real 1m18.901s user 20m56.035s sys 7m37.285s
real 1m19.008s user 20m54.143s sys 7m38.469s
real 1m18.851s user 20m54.583s sys 7m38.325s

exp~14 5931437 constexpr: check static storage duration objects' intializers' constness
real 1m18.963s user 20m57.428s sys 7m36.810s
real 1m18.992s user 20m57.136s sys 7m37.428s
real 1m18.990s user 20m57.323s sys 7m36.503s
real 1m18.889s user 20m57.453s sys 7m37.012s
real 1m18.916s user 20m56.250s sys 7m37.578s

exp~15 3d1f615 constexpr: collect storage modifiers of initializers
real 1m18.620s user 20m50.093s sys 7m38.680s
real 1m18.710s user 20m52.010s sys 7m37.180s
real 1m18.658s user 20m51.324s sys 7m37.858s
real 1m18.708s user 20m50.750s sys 7m38.064s
real 1m18.729s user 20m50.268s sys 7m38.261s

exp~16 c5b00b9 constexpr: rename handle_simple_initializer() to handle_initializer()
real 1m18.762s user 20m53.856s sys 7m39.205s
real 1m19.032s user 20m53.479s sys 7m39.081s
real 1m18.952s user 20m53.434s sys 7m39.073s
real 1m18.841s user 20m53.849s sys 7m38.662s
real 1m18.876s user 20m53.638s sys 7m39.069s

exp~17 ce1284b constexpr: add support for tagging address constants
real 1m18.841s user 20m53.490s sys 7m38.805s
real 1m18.808s user 20m53.957s sys 7m39.191s
real 1m18.979s user 20m54.010s sys 7m38.661s
real 1m18.923s user 20m53.627s sys 7m39.547s
real 1m18.837s user 20m54.008s sys 7m38.673s

exp~18 ca44c48 constexpr: add support for tagging arithmetic constant expressions
real 1m18.732s user 20m52.681s sys 7m37.328s
real 1m18.795s user 20m52.313s sys 7m37.950s
real 1m18.755s user 20m52.972s sys 7m37.638s
real 1m18.964s user 20m52.372s sys 7m38.090s
real 1m18.906s user 20m52.971s sys 7m37.603s

exp~19 1d14279 constexpr: examine constness of conditionals at evaluation only
real 1m18.778s user 20m52.130s sys 7m38.598s
real 1m18.781s user 20m53.484s sys 7m37.828s
real 1m18.917s user 20m53.524s sys 7m37.803s
real 1m18.855s user 20m52.568s sys 7m38.649s
real 1m18.880s user 20m52.250s sys 7m38.596s

exp~20 d032f28 constexpr: examine constness of preops at evaluation only
real 1m18.670s user 20m52.232s sys 7m37.297s
real 1m18.740s user 20m51.975s sys 7m37.008s
real 1m18.690s user 20m51.828s sys 7m37.761s
real 1m18.874s user 20m50.968s sys 7m38.967s
real 1m18.689s user 20m51.353s sys 7m37.997s

exp~21 2322e92 constexpr: examine constness of binops and alike at evaluation only
real 1m18.848s user 20m52.420s sys 7m38.549s
real 1m18.964s user 20m53.504s sys 7m37.359s
real 1m18.687s user 20m52.094s sys 7m39.136s
real 1m18.730s user 20m52.640s sys 7m38.264s
real 1m18.738s user 20m52.135s sys 7m38.986s

exp~22 63db9df constexpr: examine constness of casts at evaluation only
real 1m18.931s user 20m52.036s sys 7m38.964s
real 1m18.781s user 20m51.949s sys 7m38.190s
real 1m18.822s user 20m52.409s sys 7m38.207s
real 1m18.819s user 20m51.696s sys 7m39.160s
real 1m18.939s user 20m51.307s sys 7m39.398s

exp~23 3a4d6eb constexpr: init flags at expression allocation
real 1m18.735s user 20m51.678s sys 7m37.659s
real 1m18.763s user 20m51.829s sys 7m37.649s
real 1m18.878s user 20m51.538s sys 7m37.923s
real 1m18.764s user 20m51.689s sys 7m37.561s
real 1m18.842s user 20m50.505s sys 7m38.146s

exp~24 6b1e4ad constexpr: introduce additional expression constness tracking flags
real 1m18.821s user 20m52.272s sys 7m38.586s
real 1m18.829s user 20m50.961s sys 7m39.365s
real 1m18.821s user 20m52.493s sys 7m38.452s
real 1m18.850s user 20m51.705s sys 7m39.267s
real 1m18.923s user 20m53.136s sys 7m37.625s

exp~25 9151ef4 gcc attr: add nonstring warn_if_not_aligned
real 1m18.813s user 20m51.490s sys 7m38.610s
real 1m18.703s user 20m50.492s sys 7m38.937s
real 1m18.723s user 20m50.613s sys 7m38.524s
real 1m18.654s user 20m52.010s sys 7m37.222s
real 1m18.790s user 20m51.441s sys 7m38.248s

* Re: master merge plans
From: Christopher Li @ 2017-08-29 11:27 UTC (permalink / raw)
To: Luc Van Oostenryck; +Cc: Linux-Sparse

On Wed, Aug 23, 2017 at 4:50 PM, Christopher Li <sparse@chrisli.org> wrote:
> On Tue, Aug 22, 2017 at 4:05 PM, Luc Van Oostenryck
> <luc.vanoostenryck@gmail.com> wrote:
>>
>> At worst, these numbers show something like 0.1-0.3%.
>
> I am doing some final-stage testing and review of the constexpr patches.

The constexpr branch has been merged to master.

Chris

* Re: master merge plans
From: Luc Van Oostenryck @ 2017-09-03 19:24 UTC (permalink / raw)
To: Christopher Li; +Cc: Linux-Sparse

On Tue, Aug 29, 2017 at 1:27 PM, Christopher Li <sparse@chrisli.org> wrote:
> The constexpr branch has been merged to master.

That's really great!
Thanks.

-- Luc

* Re: master merge plans
From: Dibyendu Majumdar @ 2017-08-22 14:53 UTC (permalink / raw)
To: Luc Van Oostenryck; +Cc: Christopher Li, Linux-Sparse

Hi,

On 22 August 2017 at 14:32, Luc Van Oostenryck
<luc.vanoostenryck@gmail.com> wrote:
> On Mon, Aug 21, 2017 at 9:39 PM, Christopher Li <sparse@chrisli.org> wrote:
>> Here is the stuff I am considering:
>>
>> 3) Luc's SSSA
>> Luc has sssa-mini-v2 now. Is that the latest one to be merged?
>> Anything else I missed? Please let me know.
>
> Nicolai's constexpr series.

It would be great if the SSSA series were merged and tested a bit
before merging other changes.

Regards
Dibyendu

* Re: master merge plans
From: Christopher Li @ 2017-08-22 14:59 UTC (permalink / raw)
To: Dibyendu Majumdar; +Cc: Luc Van Oostenryck, Linux-Sparse

On Tue, Aug 22, 2017 at 10:53 AM, Dibyendu Majumdar
<mobile@majumdar.org.uk> wrote:
> It would be great if the SSSA series were merged and tested a bit
> before merging other changes.

I can create a topic branch for SSSA on the sparse repository.

Merge it when it is ready?

Right now I do wish the SSSA series had the two options I requested
before.

Chris

* Simple SSA status
From: Luc Van Oostenryck @ 2017-09-03 20:26 UTC (permalink / raw)
To: Christopher Li; +Cc: Dibyendu Majumdar, Linux-Sparse, Linus Torvalds

On Tue, Aug 22, 2017 at 10:59:54AM -0400, Christopher Li wrote:
> On Tue, Aug 22, 2017 at 10:53 AM, Dibyendu Majumdar
> <mobile@majumdar.org.uk> wrote:
> > It would be great if the SSSA series were merged and tested a bit
> > before merging other changes.
>
> I can create a topic branch for SSSA on the sparse repository.
>
> Merge it when it is ready?

I've put it on hold for the moment.
The conversion of purely local vars was the easy job.
The real challenge is to do it for the other loads & stores
(in simplify_symbol_usage() & simplify_memops()).
I'm working on it and have some encouraging results:
- correct phi-nodes
- passes the test-suite
- no crashes on the GCC test-suite and others
- no infinite loops
- no catastrophic performance on exceptional workloads
- converts roughly as many stores & loads as currently
- good performance (within 2-5% of rc5 on some workloads)
- in some cases, a surprisingly good effect on optimization

I don't know yet whether keeping the Simple SSA during linearization
will be worth it or not.

> Right now I do wish the SSSA series had the two options I requested
> before.

I don't remember what the other option you asked for was, but keeping
both the old and the new method is not something I'm interested in.
We know that the current method is broken. In fact, it's broken in
two different ways:
- the phi-nodes are misplaced (they are at the use site instead of
  the meet point);
- each phi-node has as many sources as there are definitions for the
  variable (more or less) instead of one for each parent.

I also recently discovered that the logic for aliases is broken.
For example, code like:

        int foo(void)
        {
                int i = 1;
                int *a;
                int **p;

                a = &i;
                p = &a;
                *p[0] = 0;
                return i;
        }

is mis-compiled as it always returns 1 instead of 0
(it fails to see that *p[0] is aliased to i).

What I'm interested in is, in this order:
1) produce correct IR
2) convert as many loads & stores as currently
   (but only when it's correct to do so, of course)
3) have performance similar to what we currently have

Point 2) is needed to avoid more false warnings during context
checking (and has a very significant effect on performance).

-- Luc
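
[An illustrative sketch, not from the original mail; names and bodies
are made up. It shows the distinction drawn above: the first local
can be converted to a register with no alias analysis at all, the
second cannot, because its address escapes into a pointer.]

        extern int cond;

        int safe(void)
        {
                int x = 0;              /* address never taken: safe to convert */
                if (cond)
                        x = 1;
                return x;               /* pure register dataflow, no memory needed */
        }

        int unsafe(void)
        {
                int x = 0;
                int *p = &x;            /* the address of x escapes into p */
                if (cond)
                        *p = 1;         /* may modify x through the alias */
                return x;               /* this load cannot be blindly replaced */
        }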

* Re: Simple SSA status
From: Christopher Li @ 2017-09-04 18:29 UTC (permalink / raw)
To: Luc Van Oostenryck; +Cc: Dibyendu Majumdar, Linux-Sparse, Linus Torvalds

On Sun, Sep 3, 2017 at 4:26 PM, Luc Van Oostenryck
<luc.vanoostenryck@gmail.com> wrote:
>>
>> Merge it when it is ready?
>
> I've put it on hold for the moment.

Thanks for the update. Glad to hear back from you.

> The conversion of purely local vars was the easy job.
> The real challenge is to do it for the other loads & stores
> (in simplify_symbol_usage() & simplify_memops()).
> I'm working on it and have some encouraging results:
> - correct phi-nodes
> - passes the test-suite
> - no crashes on the GCC test-suite and others
> - no infinite loops
> - no catastrophic performance on exceptional workloads
> - converts roughly as many stores & loads as currently
> - good performance (within 2-5% of rc5 on some workloads)
> - in some cases, a surprisingly good effect on optimization

That is great. Very exciting news.

> I don't know yet whether keeping the Simple SSA during linearization
> will be worth it or not.

I have the same question. When I found out that Simple SSA can't
handle memory-to-register conversion through pointers, it already
meant we need to keep the other, non-local-variable path alive.

We can do some benchmarking to find out. If the performance is very
similar, removing Simple SSA would actually simplify the code,
because we would only need to care about one code path.

>> Right now I do wish the SSSA series had the two options I requested
>> before.
>
> I don't remember what the other option you asked for was, but keeping
> both the old and the new method is not something I'm interested in.
> We know that the current method is broken. In fact, it's broken in
> two different ways:

I have no intention of keeping the broken behavior. My wish list for
the two options, rephrased:
1) An option to bypass the Simple SSA conversion, generating the raw
   load/store instructions before converting memory to registers.
2) An option to do the memory-to-register conversion not covered by
   the Simple SSA conversion. It sounds like you have implemented
   this already.

> - the phi-nodes are misplaced (they are at the use site instead of
>   the meet point);

Right. The meet point is also called the Dominance Frontier in the
books :-)

> - each phi-node has as many sources as there are definitions for the
>   variable (more or less) instead of one for each parent.

I am not sure I understand this. If you always place phi-nodes at the
DF, each will have as many sources as incoming edges. That is part of
the common SSA dominance property. If we do it right, I can't see why
we still need the phi-source instruction.

> I also recently discovered that the logic for aliases is broken.
> [...]
> is mis-compiled as it always returns 1 instead of 0
> (it fails to see that *p[0] is aliased to i).

That is exactly why we need instruction-level alias analysis. The SSA
conversion should be done after alias analysis.

> What I'm interested in is, in this order:
> 1) produce correct IR

Agree.

> 2) convert as many loads & stores as currently
>    (but only when it's correct to do so, of course)

Agree.

> 3) have performance similar to what we currently have

Agree.

> Point 2) is needed to avoid more false warnings during context
> checking (and has a very significant effect on performance).

It seems we are all in agreement.

Chris

* Re: Simple SSA status
From: Luc Van Oostenryck @ 2017-09-04 20:07 UTC (permalink / raw)
To: Christopher Li; +Cc: Dibyendu Majumdar, Linux-Sparse, Linus Torvalds

On Mon, Sep 04, 2017 at 02:29:46PM -0400, Christopher Li wrote:
> I have the same question. When I found out that Simple SSA can't
> handle memory-to-register conversion through pointers, it already
> meant we need to keep the other, non-local-variable path alive.

Well ... we've already talked about this, but:
- identifying which memory accesses can be converted or not is
  orthogonal to the conversion itself;
- the current implementation also has two different conversions:
  * simplify_symbol_usage(), called early
  * simplify_loads(), called later, several times
  Both differ in what they can handle or not.

The more I've looked at this, the more I realize that what *can* be
done on local vars is most of the time unsuited for non-local ones,
and what is *needed* for non-local ones is unneeded for local ones.
Doing a quick, simple conversion of all local vars can be a good
thing in order to easily and efficiently *identify*, in a later pass,
which other accesses can be converted or not.

> I have no intention of keeping the broken behavior. My wish list for
> the two options, rephrased:
> 1) An option to bypass the Simple SSA conversion, generating the raw
>    load/store instructions before converting memory to registers.
> 2) An option to do the memory-to-register conversion not covered by
>    the Simple SSA conversion. It sounds like you have implemented
>    this already.
>
>> - the phi-nodes are misplaced (they are at the use site instead of
>>   the meet point);
>
> Right. The meet point is also called the Dominance Frontier in the
> books :-)

Not necessarily. The dominance frontier is where the phi-nodes are
strictly needed to guarantee the SSA property. You can put some other
phi-nodes, trivial or not, at some other meet points, and this will
give you correct SSA too (like dumb-SSA: put a phi-node for each
variable at every meet point). So no, the DF is only a subset of all
the meet points.

Here I was saying that some phi-nodes are not even put at a meet
point. For example:

        .L1:
                ...
                phisrc  %phi1, %r1
                br      .L3
        .L2:
                ...
                phisrc  %phi2, %r2
                br      .L3
        .L3:
                // the phi-node should be here, it's the meet point
                ...
                br      .L4
        .L4:
                # lnop ...
                phi     %r4 <- %phi1, %phi2

The method puts the phi-node at .L4 because it was in this block that
the load/use was present, but it should have been put at .L3, the
meet point. Often it's not a problem, which is why it hasn't been
seen earlier (and probably few people took much time to look at the
generated IR). It creates problems, though, once you:
- merge some blocks, do phi-conversion, try_to_simplify_bb(), ...
- have undefined vars in the way.

>> - each phi-node has as many sources as there are definitions for the
>>   variable (more or less) instead of one for each parent.
>
> I am not sure I understand this. If you always place phi-nodes at the
> DF, each will have as many sources as incoming edges. That is part of
> the common SSA dominance property. If we do it right, I can't see why
> we still need the phi-source instruction.

Hehe, note how I said 'sources' and not 'phi-sources'. I just meant
the arguments of the phi-nodes, their operands.

Note that I was saying that, currently, even when the phi-node is
correctly placed, it sometimes has more operands than predecessors,
because the current method sort of accumulates the operands of other
dominating phi-nodes related to this variable.

About the phi-sources, I agree strongly with this. To me these
phi-sources are just a nuisance. Once things are in better shape,
I'll see what can be done to get rid of them (quite a bit of the
logic depends on them for the moment, though).

To be fair, these phi-sources have their use for the moment, but as
far as I can see it's only because we don't have a one-to-one mapping
between the phi-node operands and the block's predecessors. In most
notations and articles you will see phi-nodes written like:

        phi     %r4 <- [.L1:%r1, .L2:%r2]

to explicitly show the correspondence between each operand and each
predecessor (internally, things can be implicit; it's what I have
now, in fact).

>> That is exactly why we need instruction-level alias analysis. The SSA
>> conversion should be done after alias analysis.

And it's here that the chicken & egg problem begins. To efficiently
do the alias analysis, you'd better already have the IR in SSA form,
with constant propagation already done, dead code elimination already
done, ... And it's here that doing the simple SSA first can be a good
thing.

Currently we have:
1) linearization
2) simple alias analysis + conversion of local loads +
   conversion of other loads + simplification of
   unneeded stores (simplify_symbol_usage())
3) simplification & CSE
4) more conversion of memory accesses (simplify_memops())
5) goto 3)

With the Simple SSA code we have:
1) linearization & conversion of local loads & stores
2) reuse or not the broken simplify_symbol_usage() to convert loads
3) simplification & CSE
4) reuse or not the broken simplify_memops(),
   or reuse only the non-broken part of it
5) goto 3)

What we would like to have (more or less):
1) linearization
1') conversion of local vars (maybe via Simple SSA)
2) first pass of simplifications & CSE
3) alias analysis
3') conversion of memory accesses using 3)
5) more simplification & CSE
6) goto 3)

This corresponds to a very classical architecture.

Note: here above, I use 'local var' to mean '(local) var which can
safely be converted because it cannot be aliased to anything'.
Note: there are a lot of possible alias analyses, varying greatly in
precision, complexity & run-time resources. It's still very OK to me
to have (first) something quite simple, as we currently have. It's
also OK to do the conversion while doing the analysis if it's
advantageous to do so (no need for memory to hold the info).
Note: simplification & CSE should include Dead Code Elimination, and
it's not clear to me what exactly we're doing about it.

-- Luc
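
[An illustrative sketch, not from the original mail: a hypothetical C
source that linearizes to roughly the shape of the .L1-.L4 example
above. The two definitions of r meet before the call, while the use
sits in a later block.]

        extern void g(void);

        int f(int c, int d)
        {
                int r;
                if (c)
                        r = 1;  /* .L1: phisrc %phi1, $1 */
                else
                        r = 2;  /* .L2: phisrc %phi2, $2 */
                /* .L3: the meet point - the phi-node for r belongs here */
                if (d)
                        g();
                /* .L4: the use site - where the phi is currently emitted */
                return r;
        }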

* Re: Simple SSA status
From: Dibyendu Majumdar @ 2017-09-04 20:37 UTC (permalink / raw)
To: Luc Van Oostenryck; +Cc: Christopher Li, Linux-Sparse, Linus Torvalds

Hi Luc,

On 4 September 2017 at 21:07, Luc Van Oostenryck
<luc.vanoostenryck@gmail.com> wrote:
> About the phi-sources, I agree strongly with this. To me these
> phi-sources are just a nuisance. Once things are in better shape,
> I'll see what can be done to get rid of them (quite a bit of the
> logic depends on them for the moment, though).

As you know, the backend relies on the phisrc instruction and treats
it like a 'store', while treating the phi instructions as 'loads'.
Selfishly I would like this to remain, as changing it seems not
necessary - more of a nice-to-have?

Regards
Dibyendu

* Re: Simple SSA status
From: Luc Van Oostenryck @ 2017-09-04 20:55 UTC (permalink / raw)
To: Dibyendu Majumdar; +Cc: Christopher Li, Linux-Sparse, Linus Torvalds

On Mon, Sep 04, 2017 at 09:37:17PM +0100, Dibyendu Majumdar wrote:
> As you know, the backend relies on the phisrc instruction and treats
> it like a 'store', while treating the phi instructions as 'loads'.
> Selfishly I would like this to remain, as changing it seems not
> necessary - more of a nice-to-have?

Well, I'm not very sure how you use these phisrcs, but from what I've
understood, in sparse-llvm it's done like this because LLVM's phi
can't be used *because* of the problems described here above. If I
remove these phi-sources, I will of course adapt sparse-llvm to do
the right thing.

And no, removing these phi-sources is not just a nice-to-have. Like I
said, they're a real nuisance and make a lot of things much more
complicated than needed. Once the phi-nodes are correct, you can
trivially recreate these phi-sources if needed. But the simple fact
that you can do this trivially is a clear sign that you don't really
*need* them. Among other things, you can do the alloca/store/load
lowering just as easily without them: instead of putting the store at
the place where the phisrc was, you put the store at the end of each
predecessor, which is where these phisrcs should have been and where
they are now in my code.

Anyway, we're not yet at the point where it would be possible to
remove them. We're even still far from it.

-- Luc

* Re: Simple SSA status
From: Dibyendu Majumdar @ 2017-09-04 21:24 UTC (permalink / raw)
To: Luc Van Oostenryck; +Cc: Christopher Li, Linux-Sparse, Linus Torvalds

Hi Luc,

On 4 September 2017 at 21:55, Luc Van Oostenryck
<luc.vanoostenryck@gmail.com> wrote:
> Well, I'm not very sure how you use these phisrcs, but from what I've
> understood, in sparse-llvm it's done like this because LLVM's phi
> can't be used *because* of the problems described here above. If I
> remove these phi-sources, I will of course adapt sparse-llvm to do
> the right thing.

Well, I have an alternative backend that does not have phi
instructions. That also makes me think now that maybe it is better to
keep the introduction of phi instructions as a separate phase rather
than doing it at linearization time.

Regards
Dibyendu

* Re: Simple SSA status
From: Christopher Li @ 2017-09-04 23:31 UTC (permalink / raw)
To: Luc Van Oostenryck; +Cc: Dibyendu Majumdar, Linux-Sparse, Linus Torvalds

On Mon, Sep 4, 2017 at 4:07 PM, Luc Van Oostenryck
<luc.vanoostenryck@gmail.com> wrote:
> Well ... we've already talked about this, but:
> - identifying which memory accesses can be converted or not is
>   orthogonal to the conversion itself;

I am not sure that Simple SSA can work on the CFG after the CFG has
been fully generated. Maybe it can.

> - the current implementation also has two different conversions:
>   * simplify_symbol_usage(), called early
>   * simplify_loads(), called later, several times
>   Both differ in what they can handle or not.

I am thinking that at some point we might just do a proper Cytron et
al., and then we might be able to discard those simplifications. They
are ad hoc anyway.

> The more I've looked at this, the more I realize that what *can* be
> done on local vars is most of the time unsuited for non-local ones,
> and what is *needed* for non-local ones is unneeded for local ones.

That is because you have already bought into the Simple SSA. Cytron
et al. gives very similar treatment to local and non-local variables:
both need to find the DF and then insert the phi-nodes. Things like
finding the dominator tree will be the same for local and non-local
variables.

>> Right. The meet point is also called the Dominance Frontier in the
>> books :-)
>
> Not necessarily. The dominance frontier is where the phi-nodes are
> strictly needed to guarantee the SSA property. You can put some other
> phi-nodes, trivial or not, at some other meet points, and this will
> give you correct SSA too (like dumb-SSA: put a phi-node for each
> variable at every meet point).

SSA, yes. Minimal SSA? No. I assume we want minimal SSA?

> So no, the DF is only a subset of all the meet points.
> Here I was saying that some phi-nodes are not even put at a meet
> point. For example:
> [...]
> The method puts the phi-node at .L4 because it was in this block that
> the load/use was present, but it should have been put at .L3,

I think I gave an example very similar to this before. That is where
the current sparse does it wrong.

> the meet point. Often it's not a problem, which is why it hasn't been

It is a problem already, because at .L4 there is no incoming-edge
information like there is at .L3. At .L3 we can build the phi-node
properly. You need to place a phi-node at .L3 anyway; you can't place
one at .L4 without placing one at .L3. The one at .L3 is the minimal
requirement; the one at .L4 is not required for minimal SSA.

> Note that I was saying that, currently, even when the phi-node is
> correctly placed, it sometimes has more operands than predecessors,
> because the current method sort of accumulates the operands of other
> dominating phi-nodes related to this variable.

Are you saying current sparse does that, or that it is the right
thing to do? I think sparse currently might do that, but that is
wrong. It conflicts with the SSA dominance property: every incoming
edge should have exactly one definition, not more than one definition
per incoming edge. Do you have an example?

> About the phi-sources, I agree strongly with this. To me these
> phi-sources are just a nuisance. Once things are in better shape,
> I'll see what can be done to get rid of them (quite a bit of the
> logic depends on them for the moment, though).
>
> To be fair, these phi-sources have their use for the moment, but as
> far as I can see it's only because we don't have a one-to-one mapping
> between the phi-node operands and the block's predecessors. In most
> notations and articles you will see phi-nodes written like:
>         phi     %r4 <- [.L1:%r1, .L2:%r2]
> to explicitly show the correspondence between each operand and each
> predecessor (internally, things can be implicit; it's what I have
> now, in fact).

Seems all good then.

>> That is exactly why we need instruction-level alias analysis. The SSA
>> conversion should be done after alias analysis.
>
> And it's here that the chicken & egg problem begins. To efficiently
> do the alias analysis, you'd better already have the IR in SSA form,
> with constant propagation already done, dead code elimination already
> done, ...

That just means it needs more than one pass. Ideally, if the
optimization passes can be driven by incremental changes, we don't
need to run too many unnecessary passes.

> And it's here that doing the simple SSA first can be a good thing.
>
> Currently we have:
> [...]
> What we would like to have (more or less):
> 1) linearization
> 1') conversion of local vars (maybe via Simple SSA)
> 2) first pass of simplifications & CSE
> 3) alias analysis
> 3') conversion of memory accesses using 3)
> 5) more simplification & CSE
> 6) goto 3)
>
> This corresponds to a very classical architecture.

I think it is a good direction to pursue. I also think constant
propagation should be separated out from 2) and 5), because there is
a classic SSA algorithm that does it very effectively.

> Note: here above, I use 'local var' to mean '(local) var which can
> safely be converted because it cannot be aliased to anything'.
> Note: there are a lot of possible alias analyses, varying greatly in
> precision, complexity & run-time resources. It's still very OK to me
> to have (first) something quite simple, as we currently have.

Sure. It depends on how much this fast path can save, and how complex
the fast path is.

Chris
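
[An illustrative sketch, not from the original mail, of the
chicken-and-egg point discussed above: before constant propagation,
the store below looks like arbitrary pointer arithmetic; after it,
p is provably &x and the alias is exact, so the load can then be
converted. The function and values are made up.]

        int f(void)
        {
                int x = 0;
                int i = 1;
                int *p = &x + (i - 1);  /* folds to &x once i is known to be 1 */

                *p = 2;                 /* now provably a store to x ... */
                return x;               /* ... so this folds to 'return 2' */
        }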

* Re: Simple SSA status
From: Luc Van Oostenryck @ 2017-09-05 0:55 UTC (permalink / raw)
To: Christopher Li; +Cc: Dibyendu Majumdar, Linux-Sparse, Linus Torvalds

On Mon, Sep 04, 2017 at 07:31:02PM -0400, Christopher Li wrote:
> I am not sure that Simple SSA can work on the CFG after the CFG has
> been fully generated. Maybe it can.

It can, and it has pros & cons:
- pro: the CFG is known in advance, so there is no complication with
  gotos;
- con: for non-goto blocks, during linearization we know when all the
  predecessors have already been processed. If done after
  linearization, this info needs to be somehow deduced (but it's just
  keeping a flag).

> I am thinking that at some point we might just do a proper Cytron et
> al., and then we might be able to discard those simplifications. They
> are ad hoc anyway.

If you look at the literature, you will see two things:
- either the article doesn't talk at all about which variables are
  processed, so the alias problem is ignored and implicitly they talk
  only about local vars (whose address is never taken in any way);
- or they explicitly say that, for this or that reason, SSA is only
  well suited for local vars (whose address is never taken in any
  way);
- only rarely is there any mention that some alias analysis must be
  done *before* the conversion, to select which variables can be
  converted.

So, if it is to do Cytron or anything else but only on local vars, I
largely prefer the Simple SSA way. And this still leaves open the
problem of the other variables.

> That is because you have already bought into the Simple SSA. Cytron
> et al. gives very similar treatment to local and non-local variables:
> both need to find the DF and then insert the phi-nodes. Things like
> finding the dominator tree will be the same for local and non-local
> variables.

Not really. I'm appealing to the Simple SSA because with it we don't
have to create those phony stores & loads for local variables and
destroy them just after. Local variables are not memory, don't behave
like memory, and are not concerned by aliasing, so to me it's the
right thing not to use stores & loads on them.

When you're talking about the dominator tree, it's just that: the
dominance of the BBs in the CFG. What really matters for the SSA
conversion is not this dominance; it's the dominance that variable
definitions have over variable uses. Of course, the latter depends on
the former, but the problem for non-local variables is not the same
as the one for local variables: you need to take into account the
presence of calls, stores to unrelated vars (and non-vars), ... You
can consider this as being part of the alias analysis if you wish,
but ...

> SSA, yes. Minimal SSA? No. I assume we want minimal SSA?

We certainly don't want dumb-SSA. Some superfluous phi-nodes in
irreducible regions are not a problem. Anyway, this 'minimal' SSA is
only minimal in a very narrow sense, and you can have lots of
unneeded phi-nodes and still be minimal-SSA.

> It is a problem already, because at .L4 there is no incoming-edge
> information like there is at .L3. At .L3 we can build the phi-node
> properly. You need to place a phi-node at .L3 anyway; you can't place
> one at .L4 without placing one at .L3. The one at .L3 is the minimal
> requirement; the one at .L4 is not required for minimal SSA.

Yes yes yes, it's wrong. What I meant by "often it's not a problem"
is that currently, with the help of the phi-sources, the code can
often deal with it and do something sensical. This doesn't make it
less wrong, and there are also enough situations where the code can't
possibly do anything sensical with it.

> Are you saying current sparse does that, or that it is the right
> thing to do? I think sparse currently might do that, but that is
> wrong.

Yes, currently sparse does that, and it's very, very wrong. More
wrong than the misplacement of phi-nodes. The code can never do
something sensical with it.

> It conflicts with the SSA dominance property: every incoming edge
> should have exactly one definition, not more than one definition per
> incoming edge.

Currently, there is no relation between phi-node operands and
incoming edges. The only relation that exists (and that the code
needs and cares about) is the one between phi-nodes and their
phi-sources.

> Do you have an example?

I have lots of examples :)
In a few days I'll submit a series of test cases, and some of them
will exhibit this. A simple example is:

        #define TEST(N)                 \
                do {                    \
                        d = b + a[N];   \
                        if (d < b)      \
                                c++;    \
                        b = d;          \
                } while (0)

        int foo(int *a, int b, int c)
        {
                int d;

                TEST(0);
                TEST(1);
                TEST(2);

                return d + c;
        }

Here you will have 3 phi-nodes (OK, there are 3 ifs): the first with
2 sources (OK), the second with 3, and the third with 4 sources,
while all nodes have 2 parents or fewer. Even more interestingly, if
you add another TEST(x), you will get another phi-node with ... 5
sources, and so on: one more source for each TEST(x) invocation. So
yes, this gives a nice quadratic processing time and memory
consumption.

> That just means it needs more than one pass. Ideally, if the
> optimization passes can be driven by incremental changes, we don't
> need to run too many unnecessary passes.
>
> I think it is a good direction to pursue. I also think constant
> propagation should be separated out from 2) and 5), because there is
> a classic SSA algorithm that does it very effectively.

It was implicitly included in 'simplification', which can contain all
sorts of optimizations. And yes, sure, we want SCCP for this.

> Sure. It depends on how much this fast path can save, and how complex
> the fast path is.

Sure.

-- Luc
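
[An illustrative sketch, not from the original mail, of what SCCP
buys over running constant propagation and dead-code elimination as
separate passes: constness and reachability are solved together, so
the optimistic assumption about x survives the loop. The function is
hypothetical.]

        extern int g(void);

        int f(void)
        {
                int x = 0;

                while (g()) {
                        if (x > 0)              /* SCCP keeps x at 0, so this test ... */
                                x = x + 1;      /* ... stays false and this def stays dead */
                }
                return x;                       /* folds to 'return 0' */
        }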
* Re: Simple SSA status 2017-09-05 0:55 ` Luc Van Oostenryck @ 2017-09-05 3:28 ` Christopher Li 2017-09-07 2:03 ` Luc Van Oostenryck 0 siblings, 1 reply; 33+ messages in thread From: Christopher Li @ 2017-09-05 3:28 UTC (permalink / raw) To: Luc Van Oostenryck; +Cc: Dibyendu Majumdar, Linux-Sparse, Linus Torvalds On Mon, Sep 4, 2017 at 8:55 PM, Luc Van Oostenryck <luc.vanoostenryck@gmail.com> wrote: > > It can and it has pro & cons > - pro: the CFG is know in advance, so no complication with gotos > - cons: for non-goto blocks during linearization we know when > all the predecessors have been already processed > If done after lineariation, this info need to be somehow > deduced (but it's just keeping a flag). For non-goto blocks the dominator tree is easy. >> >> I am thinking at some point we might just do a proper Cytron et al then >> we might be able to discard those simplify. Those simplify are ad hoc >> any way. > > If you look at the literature you will see two things: > - either the article doesn't talk at all about which variable are > processed, so the alias problem is ignored and implicitly they > talk only about local vars (which address is never taken in any way) > - or they explicitly say that for this or that reason SSA is only > well suited for local vars (which address is never taken in any way) > - only rarely is there any mention that some alias analysis must be > done *before* conversion to select which variable can be converted > > So, if it is to do Cytron or else but only on local var, I largely > prefer the Simple SSA way. And this still leave open the problem > for the others variables. I mean do the Cytron et al for all memory store/load, not just limit to local variable. Yes, it will need the help of alias analyze. > Of course, the later depends on the former but the problem for non-local > variables is not the same as the one for local variables, you need to take > in account the presence of calls, stores to unrelated vars (and non-var), > ... You can consider this as being part of the alias analysis if you > wish but ... Local variable can have address taken and pass around as well. If you only look at local variable has never been taken address, yes, that is different than the rest. But that different is largely due to how you select the set as well. If you consider the more general case, local variable can have address taken, it is just like other memory storage. > Yes yes yes, it's wrong. What I meant by "often it's not a problem" > is that currently, with the help of the phisources, often the code > can deal with it and do something sensical. > This doesn't make it less wrong and there is also enough situations > where the code can't possibly do anything sensical with it. Yes, I think we have the same view on this. >> It is conflict with the SSA dominance property. Every incoming edge >> should have one definition. It should not have more than one >> define per incoming edge. > > Currently, there is no relation between phi-nodes operands and > incoming edges. The only relation that exist (and that the code > need and care about) is the one between phi-nodes and their > phi-sources. > >> Do you have example? > > I have lots of examples :) > In a few days I'll submit a series of test cases and some will > be exhibit this. 
> A simple example is:
>
> 	#define TEST(N)			\
> 		do {			\
> 			d = b + a[N];	\
> 			if (d < b)	\
> 				c++;	\
> 			b = d;		\
> 		} while (0)
>
> 	int foo(int *a, int b, int c)
> 	{
> 		int d;
>
> 		TEST(0);
> 		TEST(1);
> 		TEST(2);
>
> 		return d + c;
> 	}
>
> Where you will have 3 phi-nodes (OK, there are 3 ifs), the first with
> 2 sources (OK), the second with 3 and the third with 4 sources, while
> all nodes have 2 parents or less.
> Even more interestingly, if you add another TEST(x), you will have
> another phi-node with ... 5 sources and so on. One more source for
> each TEST(x) invocation, so yes this gives a nice quadratic processing
> time and memory consumption.

Yes, we are talking about how things are done wrong right now,
not how things should be.

Chris
^ permalink raw reply [flat|nested] 33+ messages in thread
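For contrast, here is roughly what the minimal form of the example above
should look like. This is only a schematic SSA rendering with illustrative
value numbering, not sparse's actual output: with b and d promoted to
registers, only c needs phi-nodes, and every join block has exactly two
predecessors, so every phi should have exactly two operands.

	/* schematic: one operand per incoming edge */
	c1 = phi(c0, c0 + 1);	/* join after the first  if (d < b) */
	c2 = phi(c1, c1 + 1);	/* join after the second if (d < b) */
	c3 = phi(c2, c2 + 1);	/* join after the third  if (d < b) */
	/* one more TEST(x) adds exactly one more two-operand phi:
	   linear growth, not quadratic */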
* Re: Simple SSA status 2017-09-05 3:28 ` Christopher Li @ 2017-09-07 2:03 ` Luc Van Oostenryck 2017-09-07 2:15 ` Linus Torvalds 2017-09-07 2:20 ` Christopher Li 0 siblings, 2 replies; 33+ messages in thread From: Luc Van Oostenryck @ 2017-09-07 2:03 UTC (permalink / raw) To: Christopher Li; +Cc: Dibyendu Majumdar, Linux-Sparse, Linus Torvalds On Mon, Sep 04, 2017 at 11:28:07PM -0400, Christopher Li wrote: > On Mon, Sep 4, 2017 at 8:55 PM, Luc Van Oostenryck wrote: > > > > It can and it has pro & cons > > - pro: the CFG is know in advance, so no complication with gotos > > - cons: for non-goto blocks during linearization we know when > > all the predecessors have been already processed > > If done after lineariation, this info need to be somehow > > deduced (but it's just keeping a flag). > > For non-goto blocks the dominator tree is easy. Mmmm, I think you're confused. The dominator tree is a function of the whole graph, not just some blocks. Also, after linearization, there is no more for(), while() & do-while() or gotos: all CFG edges are alike. What was written here above is that, *during linearization*, when you have a branch from a structured-programming statement, you know what are all the possible parents of the current block. For branches coming from goto, it's not the case. This has nothing to do with the dominance tree as such. What is trivial about the dominator tree is when it is done on an acyclic graph (you just have do to a DFS and voilà) but that's independent of the presence or absence of gotos. [Of course, the presence of gotos *can* create irreducible graphs and for those calculating their dominance tree *can* take a bit more time, but that's all.] > > Of course, the later depends on the former but the problem for non-local > > variables is not the same as the one for local variables, you need to take > > in account the presence of calls, stores to unrelated vars (and non-var), > > ... You can consider this as being part of the alias analysis if you > > wish but ... > > Local variable can have address taken and pass around as well. I suppose you hadn't seen the note I added: > > Note: here above, I use 'local var' to mean '(local) var which can > > safely be converted because it cannot be aliased to anything' This, of course, excludes variables which had their address (explicitly) taken but also things like: - arrays - structures & unions - global variables - EQUIVALENCE declarations in fortran - variables with alias attribute (as gcc extension) - ... -- Luc ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Simple SSA status
  2017-09-07  2:03               ` Luc Van Oostenryck
@ 2017-09-07  2:15                 ` Linus Torvalds
  2017-09-07  2:55                   ` Christopher Li
  2017-09-07  3:05                   ` Luc Van Oostenryck
  0 siblings, 2 replies; 33+ messages in thread
From: Linus Torvalds @ 2017-09-07 2:15 UTC (permalink / raw)
To: Luc Van Oostenryck; +Cc: Christopher Li, Dibyendu Majumdar, Linux-Sparse

On Wed, Sep 6, 2017 at 7:03 PM, Luc Van Oostenryck
<luc.vanoostenryck@gmail.com> wrote:
>
> What was written here above is that, *during linearization*,
> when you have a branch from a structured-programming statement,
> you know what all the possible parents of the current block are.
> For branches coming from goto, it's not the case.

Even this is not the case.

Not for a C compiler, it isn't.

Sure, if you do compilers for toy languages, it might be, but a C
compiler has things like "switch()" statements that may not nest with
loops etc. So even without a goto, you can pretty much have arbitrary
flow.

So I think people should entirely stop thinking that "goto" is somehow
special. It isn't. It is only special in toy languages like Pascal,
not in the real world.

It's much better to get rid of that "goto is special" mindset, and
instead say "all control flow is just a goto in the end".

So yes, get rid of the *real* special cases - those structured flow
constructs. They are nice special cases, but they are nice for
*humans*, they aren't actually relevant to computers, and they
shouldn't be.

So the real special case is those simple "structured" loops. You
should never design around them, because it will just hurt later.

Turn those "structured loop" constructs into goto's as quickly as you
can, and just forget about the structure that they really never
enforced anyway.

              Linus
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Simple SSA status
  2017-09-07  2:15                 ` Linus Torvalds
@ 2017-09-07  2:55                   ` Christopher Li
  2017-09-07  3:17                     ` Luc Van Oostenryck
  0 siblings, 1 reply; 33+ messages in thread
From: Christopher Li @ 2017-09-07 2:55 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Luc Van Oostenryck, Dibyendu Majumdar, Linux-Sparse

On Wed, Sep 6, 2017 at 10:15 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> Even this is not the case.
>
> Not for a C compiler, it isn't.
>
> Sure, if you do compilers for toy languages, it might be, but a C
> compiler has things like "switch()" statements that may not nest with
> loops etc. So even without a goto, you can pretty much have arbitrary
> flow.

I am trying to think about whether, without goto statements, we can
create an irreducible graph. For a reducible graph, finding dominators
has a fast path.

One way to look at the Simple SSA paper is that it is taking advantage
of that fast path. The slow path is actually worse than other
algorithms.

I was originally thinking we have to use goto to create an irreducible
graph. Now that you mention it, you are right: by creatively mixing
switch statements and while loops, someone can create an irreducible
graph without goto.

> So I think people should entirely stop thinking that "goto" is somehow
> special. It isn't. It is only special in toy languages like Pascal,
> not in the real world.

Goto can introduce irreducible graphs. That is the only special part.
The compiler of course needs to handle all kinds of CFGs.

> It's much better to get rid of that "goto is special" mindset, and
> instead say "all control flow is just a goto in the end".
>
> So yes, get rid of the *real* special cases - those structured flow
> constructs. They are nice special cases, but they are nice for
> *humans*, they aren't actually relevant to computers, and they
> shouldn't be.

The only part relevant to the computer is whether a fast path exists
or not.

> So the real special case is those simple "structured" loops. You
> should never design around them, because it will just hurt later.

In a sense that is why I like Cytron et al. for the SSA conversion
over Simple SSA. Cytron et al. does not have a special case for
non-structured CFGs. If the CFG is reducible, the algorithm will run
faster naturally.

Chris
^ permalink raw reply [flat|nested] 33+ messages in thread
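A concrete example of the construct being alluded to, using nothing
beyond standard C (it is the same trick as Duff's device): the default
label jumps from the switch straight into the loop body, giving the loop
a second entry, so the loop header no longer dominates the body and the
CFG is irreducible.

	int f(int n, int x)
	{
		int sum = 0;

		switch (x) {
		case 0:
			while (n > 0) {
		default:	/* second entry into the loop body */
				sum += n;
				n--;
			}
		}
		return sum;
	}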
* Re: Simple SSA status
  2017-09-07  2:55                   ` Christopher Li
@ 2017-09-07  3:17                     ` Luc Van Oostenryck
  2017-09-07  4:04                       ` Christopher Li
  0 siblings, 1 reply; 33+ messages in thread
From: Luc Van Oostenryck @ 2017-09-07 3:17 UTC (permalink / raw)
To: Christopher Li; +Cc: Linus Torvalds, Dibyendu Majumdar, Linux-Sparse

On Wed, Sep 06, 2017 at 10:55:43PM -0400, Christopher Li wrote:
> On Wed, Sep 6, 2017 at 10:15 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> > Even this is not the case.
> >
> > Not for a C compiler, it isn't.
> >
> > Sure, if you do compilers for toy languages, it might be, but a C
> > compiler has things like "switch()" statements that may not nest with
> > loops etc. So even without a goto, you can pretty much have arbitrary
> > flow.
>
> I am trying to think about whether, without goto statements, we can
> create an irreducible graph. For a reducible graph, finding dominators
> has a fast path.

Bah ...
Depending on the algorithm you use, irreducible graphs may indeed be a bit
faster to calculate but since you don't know in advance if your graph is
reducible or not, you can't take advantage of that. So, at least to me,
this doesn't make it a fast path.

> One way to look at the Simple SSA paper is that it is taking advantage
> of that fast path.

No. It takes advantage at every point where it is known that all the parents
have already been processed. That's all.

The part where reducibility matters is after the conversion is done, if
you wish to simplify what they call trivial phi-nodes. And again, since
you don't know in advance if your graph is reducible or not, there is no
fast path.

-- Luc
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Simple SSA status
  2017-09-07  3:17                     ` Luc Van Oostenryck
@ 2017-09-07  4:04                       ` Christopher Li
  2017-09-07  4:49                         ` Luc Van Oostenryck
  0 siblings, 1 reply; 33+ messages in thread
From: Christopher Li @ 2017-09-07 4:04 UTC (permalink / raw)
To: Luc Van Oostenryck; +Cc: Linus Torvalds, Dibyendu Majumdar, Linux-Sparse

On Wed, Sep 6, 2017 at 11:17 PM, Luc Van Oostenryck
<luc.vanoostenryck@gmail.com> wrote:
>
> Bah ...
> Depending on the algorithm you use, irreducible graphs may indeed be a bit
> faster to calculate but since you don't know in advance if your graph is
  ^^^^^^^^
You mean slower here.

> reducible or not, you can't take advantage of that. So, at least to me,
> this doesn't make it a fast path.

Well, if the graph is not reducible, it will need an extra step to get
minimal SSA; that is what I consider the slow path.

For Cytron et al., the slowest part is finding the dominator tree.
If using the Keith Cooper paper, a reducible graph will converge
faster naturally. You don't need to know in advance whether it is
reducible to benefit from it.

>
>> One way to look at the Simple SSA paper is that it is taking advantage
>> of that fast path.
>
> No. It takes advantage at every point where it is known that all the parents
> have already been processed. That's all.

That is only half of the job. To get minimal SSA there is an extra step
to remove duplicate phi-nodes. That is quadratic, if I recall correctly.

> The part where reducibility matters is after the conversion is done, if
> you wish to simplify what they call trivial phi-nodes. And again, since
> you don't know in advance if your graph is reducible or not, there is no
> fast path.

So you are saying that in order to get minimal SSA, you need to run the
extra phi-node removal even if the result is already minimal? Because we
don't know whether the graph is reducible or not.

Chris
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Simple SSA status
  2017-09-07  4:04                       ` Christopher Li
@ 2017-09-07  4:49                         ` Luc Van Oostenryck
  2017-09-07  6:17                           ` Christopher Li
  0 siblings, 1 reply; 33+ messages in thread
From: Luc Van Oostenryck @ 2017-09-07 4:49 UTC (permalink / raw)
To: Christopher Li; +Cc: Linus Torvalds, Dibyendu Majumdar, Linux-Sparse

On Thu, Sep 7, 2017 at 6:04 AM, Christopher Li <sparse@chrisli.org> wrote:
> On Wed, Sep 6, 2017 at 11:17 PM, Luc Van Oostenryck
> <luc.vanoostenryck@gmail.com> wrote:
>>
>> Bah ...
>> Depending on the algorithm you use, irreducible graphs may indeed be a bit
>> faster to calculate but since you don't know in advance if your graph is
>   ^^^^^^^^
> You mean slower here.

Of course.

>> reducible or not, you can't take advantage of that. So, at least to me,
>> this doesn't make it a fast path.
>
> Well, if the graph is not reducible, it will need an extra step to get
> minimal SSA; that is what I consider the slow path.
>
> For Cytron et al., the slowest part is finding the dominator tree.
> If using the Keith Cooper paper, a reducible graph will converge
> faster naturally. You don't need to know in advance whether it is
> reducible to benefit from it.

OK, it's just faster on reducible graphs.

>>
>>> One way to look at the Simple SSA paper is that it is taking advantage
>>> of that fast path.
>>
>> No. It takes advantage at every point where it is known that all the parents
>> have already been processed. That's all.
>
> That is only half of the job.

No, it's the main job.

> To get minimal SSA there is an extra step
> to remove duplicate phi-nodes. That is quadratic, if I recall correctly.

Quadratic in what?
If you have plenty of phi-nodes with plenty of operands, you will have
more work to do, irrespective of the size of the graph and its
reducibility.

Also, even if reducibility is a property of the whole graph, it's not
really what matters here because each phi-chain is to be taken
independently. In other words, what matters here is whether the
*subgraph* involved in the phi-chain is reducible or not. So even if the
graph is irreducible, you can have 99% of the phis as if the graph was
reducible.

Anyway, this minimal SSA and irreducibility is a nice theoretical
question you seem a bit obsessed with. I absolutely don't care about it
because in practice it won't make any noticeable difference.
OTOH, there is a huge number of things that can be done on sparse
which *make* a lot of difference.

>> The part where reducibility matters is after the conversion is done, if
>> you wish to simplify what they call trivial phi-nodes. And again, since
>> you don't know in advance if your graph is reducible or not, there is no
>> fast path.
>
> So you are saying that in order to get minimal SSA, you need to run the
> extra phi-node removal even if the result is already minimal? Because we
> don't know whether the graph is reducible or not.

Yes and no.
There are a lot of cases where you can simplify away the phi-nodes
in simple ways. For the remaining ones, yes, most probably.

[Warning, I'm only talking here about the situation with Simple SSA;
with the classic methods you have minimal SSA almost by definition, but
again 'minimal SSA' is 'minimal' only in this very narrow sense and it
doesn't mean that no phi-nodes can be removed. For example, I'm pretty
sure that what the "Simple SSA" article calls 'trivial phi' *can* also
happen even in minimal SSA].

-- Luc
^ permalink raw reply [flat|nested] 33+ messages in thread
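Since the discussion keeps returning to the removal of trivial phis, a
minimal sketch of it, following the Simple SSA (Braun et al.) pseudocode,
may be useful. The types and helpers here (snapshot_users,
replace_all_uses, is_phi, to_phi, ...) are hypothetical, not sparse's.

	/* A phi is trivial if, ignoring self-references, all its operands
	   are the same value.  Replacing such a phi can make its phi users
	   trivial in turn, hence the recursion over the users. */
	struct value *try_remove_trivial_phi(struct phi *phi)
	{
		struct value *same = NULL;
		int i;

		for (i = 0; i < phi->nr_operands; i++) {
			struct value *op = phi->operands[i];
			if (op == same || op == &phi->val)
				continue;		/* duplicate or self-reference */
			if (same)
				return &phi->val;	/* merges two values: not trivial */
			same = op;
		}
		/* same == NULL would mean the phi is unreachable or reads an
		   undefined value; that case is omitted here. */

		/* Snapshot the users (minus the phi itself) before rewriting. */
		int nr_users = phi->nr_users;
		struct value **users = snapshot_users(phi);

		replace_all_uses(&phi->val, same);

		/* Users which are themselves phis may have become trivial. */
		for (i = 0; i < nr_users; i++)
			if (is_phi(users[i]))
				try_remove_trivial_phi(to_phi(users[i]));
		return same;
	}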
* Re: Simple SSA status
  2017-09-07  4:49                         ` Luc Van Oostenryck
@ 2017-09-07  6:17                           ` Christopher Li
  2017-09-07  7:38                             ` Luc Van Oostenryck
  0 siblings, 1 reply; 33+ messages in thread
From: Christopher Li @ 2017-09-07 6:17 UTC (permalink / raw)
To: Luc Van Oostenryck; +Cc: Linus Torvalds, Dibyendu Majumdar, Linux-Sparse

On Thu, Sep 7, 2017 at 12:49 AM, Luc Van Oostenryck
<luc.vanoostenryck@gmail.com> wrote:
>>
>> That is only half of the job.
>
> No, it's the main job.

From the time-complexity point of view, it is not the main job.
The later part is.

>> To get minimal SSA there is an extra step
>> to remove duplicate phi-nodes. That is quadratic, if I recall correctly.
>
> Quadratic in what?

Simple SSA paper, section 4.1 Time Complexity: O(P + B*(B + E)*V*V),
that is, O(B*B*V*V + B*E*V*V), where B is the number of basic blocks,
E the number of edges and V the number of variables. That is why I
consider it quadratic. The first half, the part you consider the main
job, is not nearly as bad.

> If you have plenty of phi-nodes with plenty of operands, you will have
> more work to do, irrespective of the size of the graph and its
> reducibility.

The Keith Cooper et al. paper's bound is related to reducibility.
The Simple SSA bound, in the O() form above, is not.

> Also, even if reducibility is a property of the whole graph, it's not
> really what matters here because each phi-chain is to be taken
> independently. In other words, what matters here is whether the
> *subgraph* involved in the phi-chain is reducible or not. So even if the
> graph is irreducible, you can have 99% of the phis as if the graph was
> reducible.
>
> Anyway, this minimal SSA and irreducibility is a nice theoretical
> question you seem a bit obsessed with. I absolutely don't care about it
> because in

Minimal SSA is actually nice to have. Non-minimal SSA will make the
optimizations run less effectively. Cytron et al. gives the minimal form.

> practice it won't make any noticeable difference.

I think it will make some difference there in terms of optimizations.
If we implement Cytron et al. in the future, we can compare the results.

> OTOH, there is a huge number of things that can be done on sparse
> which *make* a lot of difference.

Sure, that is a separate patch.

> [Warning, I'm only talking here about the situation with Simple SSA;
> with the classic methods you have minimal SSA almost by definition, but
> again 'minimal SSA' is 'minimal' only in this very narrow sense and it
> doesn't mean that no phi-nodes can be removed. For example, I'm pretty
> sure that what the "Simple SSA" article calls 'trivial phi' *can* also
> happen even in minimal SSA].

I think a trivial phi can't happen in minimal SSA. If you place
phi-nodes at the DF, a phi will need to have more than one parent and
more than one definition.

OTOH, minimal SSA can have *dead* phi-nodes. Liveness is independent
of the SSA form.

Chris
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Simple SSA status
  2017-09-07  6:17                           ` Christopher Li
@ 2017-09-07  7:38                             ` Luc Van Oostenryck
  0 siblings, 0 replies; 33+ messages in thread
From: Luc Van Oostenryck @ 2017-09-07 7:38 UTC (permalink / raw)
To: Christopher Li; +Cc: Linus Torvalds, Dibyendu Majumdar, Linux-Sparse

On Thu, Sep 7, 2017 at 8:17 AM, Christopher Li <sparse@chrisli.org> wrote:
> On Thu, Sep 7, 2017 at 12:49 AM, Luc Van Oostenryck
> <luc.vanoostenryck@gmail.com> wrote:
>
> Minimal SSA is actually nice to have. Non-minimal SSA will make the
> optimizations run less effectively. Cytron et al. gives the minimal form.
>
>> practice it won't make any noticeable difference.
>
> I think it will make some difference there in terms of optimizations.
> If we implement Cytron et al. in the future, we can compare the results.

That's the difference between 'in practice' and '(in theory) it will
make some'.

In practice, it will, of course, make a big difference *if* what you
have is not at all minimal, like dumb-SSA. But if 99% of your phis are
in fact needed and only 1% are part of the non-minimality, you won't
see any *significant* differences. It's not as if the presence of a
single superfluous phi-node will totally collapse your performance.

In practice, it also won't make any significant difference because
*even* if you use gotos, this won't automatically make your graph
irreducible. To make your graph irreducible you need a goto that jumps
*into* a loop, and normally it's not something people write (think
about it: when could it make sense?). So in practice, even with gotos,
you will rarely have irreducible graphs, and so even without
implementing the full phi-node simplification, you will have minimal
SSA.

In practice, it also won't make a difference because the difference in
complexity will only show up if you have a very large number of nodes,
and in this case you have plenty of other problems to worry about
(like memory consumption).

>> [Warning, I'm only talking here about the situation with Simple SSA;
>> with the classic methods you have minimal SSA almost by definition, but
>> again 'minimal SSA' is 'minimal' only in this very narrow sense and it
>> doesn't mean that no phi-nodes can be removed. For example, I'm pretty
>> sure that what the "Simple SSA" article calls 'trivial phi' *can* also
>> happen even in minimal SSA].
>
> I think a trivial phi can't happen in minimal SSA. If you place
> phi-nodes at the DF, a phi will need to have more than one parent and
> more than one definition.

The thing is that these trivial/redundant phis (as defined in the SSSA
article) can also happen when phi-nodes are placed at the DF. They
happen naturally in some loops.
In the article, when talking about these, they explicitly say "This
implies a definition of minimality [that is independent of the source
program and] *more strict* than the definition by Cytron et al."
So it's pretty clear that even with minimal SSA (Cytron's minimality)
you can have those redundant phis. In other words, if you eliminate all
redundant phi-nodes as described in SSSA, not only will you not have
more phi-nodes than Cytron, but you *can*, in some cases, have fewer!

Anyway, the advantages of SSSA are not here.

-- Luc
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Simple SSA status
  2017-09-07  2:15                 ` Linus Torvalds
  2017-09-07  2:55                   ` Christopher Li
@ 2017-09-07  3:05                   ` Luc Van Oostenryck
  1 sibling, 0 replies; 33+ messages in thread
From: Luc Van Oostenryck @ 2017-09-07 3:05 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Christopher Li, Dibyendu Majumdar, Linux-Sparse

On Wed, Sep 06, 2017 at 07:15:13PM -0700, Linus Torvalds wrote:
> On Wed, Sep 6, 2017 at 7:03 PM, Luc Van Oostenryck
> <luc.vanoostenryck@gmail.com> wrote:
> >
> > What was written here above is that, *during linearization*,
> > when you have a branch from a structured-programming statement,
> > you know what all the possible parents of the current block are.
> > For branches coming from goto, it's not the case.
>
> Even this is not the case.
>
> Not for a C compiler, it isn't.
>
> Sure, if you do compilers for toy languages, it might be, but a C
> compiler has things like "switch()" statements that may not nest with
> loops etc. So even without a goto, you can pretty much have arbitrary
> flow.

Yes, sure. But what I wrote still holds because, even for such cases,
the linearization creates trivial and useless BBs, and for each of
those you know where the possible parents are and when their
linearization is finished, while doing the linearization of each
structured construct of the AST. For gotos & computed gotos, it's not
the case.

It doesn't matter in itself, but it matters for the Simple SSA method
because the processing of BBs for which you don't yet know all the
parents must be delayed (it's the price you have to pay for being able
to do things mostly on the fly). This delaying happens everywhere you
have some kind of back-edge, but knowing when all parents are done
means that you also know when you can 'un-delay' things (the article
calls this 'unseal') and it's advantageous to do that as soon as
possible.

So for this Simple SSA method, gotos are kinda annoying and computed
gotos even more so.

Anyway, I've put this method on hold for the moment, focusing on the
conversion of non-local variables (which Simple SSA can't do) and
several related issues.

> So I think people should entirely stop thinking that "goto" is somehow
> special. It isn't. It is only special in toy languages like Pascal,
> not in the real world.
>
> It's much better to get rid of that "goto is special" mindset, and
> instead say "all control flow is just a goto in the end".
>
> So yes, get rid of the *real* special cases - those structured flow
> constructs. They are nice special cases, but they are nice for
> *humans*, they aren't actually relevant to computers, and they
> shouldn't be.
>
> So the real special case is those simple "structured" loops. You
> should never design around them, because it will just hurt later.
>
> Turn those "structured loop" constructs into goto's as quickly as you
> can, and just forget about the structure that they really never
> enforced anyway.

Totally agree and it's what's done (I suppose by you and Chris, years
ago) and it will stay as such: directly after linearization you just
have BBs with the list of parents & children.

-- Luc Van Oostenryck
^ permalink raw reply [flat|nested] 33+ messages in thread
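The delay/'unseal' mechanism described above is compact in the Braun et
al. paper; here is a rough sketch of it, again with hypothetical types
and helpers (local_def, new_incomplete_phi, write_variable, ...) rather
than anything in sparse.

	/* A block is 'sealed' once no further predecessors can be added.
	   Reads in an unsealed block get a placeholder phi whose operands
	   are filled in only when the block is finally sealed. */
	void seal_block(struct bblock *bb)
	{
		for (int i = 0; i < bb->nr_incomplete_phis; i++)
			add_phi_operands(bb->incomplete_phis[i]);	/* one per parent */
		bb->sealed = 1;
	}

	struct value *read_variable(struct variable *var, struct bblock *bb)
	{
		struct value *val = local_def(var, bb);	/* defined in this block? */
		if (val)
			return val;
		if (!bb->sealed) {
			/* Some parents are still unknown (e.g. a label that
			   later gotos may target): record an incomplete phi. */
			val = new_incomplete_phi(var, bb);
		} else if (bb->nr_parents == 1) {
			val = read_variable(var, bb->parents[0]);
		} else {
			/* Write the phi first, to break cycles through loops. */
			struct phi *phi = new_phi(var, bb);
			write_variable(var, bb, &phi->val);
			add_phi_operands(phi);	/* recursive reads in the parents */
			return try_remove_trivial_phi(phi);	/* see earlier sketch */
		}
		write_variable(var, bb, val);
		return val;
	}

Structured constructs can be sealed as soon as their linearization
completes; a block reachable by gotos can only be sealed once the whole
function has been seen, which is the annoyance being described.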
* Re: Simple SSA status
  2017-09-07  2:03               ` Luc Van Oostenryck
  2017-09-07  2:15                 ` Linus Torvalds
@ 2017-09-07  2:20                 ` Christopher Li
  2017-09-07  3:46                   ` Luc Van Oostenryck
  1 sibling, 1 reply; 33+ messages in thread
From: Christopher Li @ 2017-09-07 2:20 UTC (permalink / raw)
To: Luc Van Oostenryck; +Cc: Dibyendu Majumdar, Linux-Sparse, Linus Torvalds

On Wed, Sep 6, 2017 at 10:03 PM, Luc Van Oostenryck
<luc.vanoostenryck@gmail.com> wrote:
>
> Mmmm, I think you're confused.
> The dominator tree is a function of the whole graph, not just
> some blocks. Also, after linearization, there is no more for(),
> while() & do-while() or gotos: all CFG edges are alike.

I am not confused. I may have jumped ahead too fast in my sentence.

> What was written here above is that, *during linearization*,
> when you have a branch from a structured-programming statement,
> you know what all the possible parents of the current block are.
> For branches coming from goto, it's not the case.
> This has nothing to do with the dominance tree as such.

I am referring to "A Simple, Fast Dominance Algorithm".
If there is no goto, the function's CFG is a reducible graph.
For a reducible graph, that algorithm converges in a constant number
of iterations. So finding the dominator tree is pretty much linear if
the graph is reducible.

> [Of course, the presence of gotos *can* create irreducible graphs
> and for those calculating their dominance tree *can* take a bit
> more time, but that's all.]

Right, irreducible graphs are the troublemakers that account for
that nasty worst-case performance.

Chris
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Simple SSA status
  2017-09-07  2:20                 ` Christopher Li
@ 2017-09-07  3:46                   ` Luc Van Oostenryck
  2017-09-07  4:02                     ` Christopher Li
  0 siblings, 1 reply; 33+ messages in thread
From: Luc Van Oostenryck @ 2017-09-07 3:46 UTC (permalink / raw)
To: Christopher Li; +Cc: Dibyendu Majumdar, Linux-Sparse, Linus Torvalds

On Thu, Sep 7, 2017 at 4:20 AM, Christopher Li <sparse@chrisli.org> wrote:
>
> I am referring to "A Simple, Fast Dominance Algorithm".
> ...
> For a reducible graph, that algorithm converges in a constant number
> of iterations.

Where the 'constant' is a function of the loop complexity (maximum
nesting level) ...

-- Luc Van Oostenryck
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Simple SSA status
  2017-09-07  3:46                   ` Luc Van Oostenryck
@ 2017-09-07  4:02                     ` Christopher Li
  2017-09-07  4:24                       ` Luc Van Oostenryck
  0 siblings, 1 reply; 33+ messages in thread
From: Christopher Li @ 2017-09-07 4:02 UTC (permalink / raw)
To: Luc Van Oostenryck; +Cc: Dibyendu Majumdar, Linux-Sparse, Linus Torvalds

On Wed, Sep 6, 2017 at 11:46 PM, Luc Van Oostenryck
<luc.vanoostenryck@gmail.com> wrote:

>> For a reducible graph, that algorithm converges in a constant number
>> of iterations.
>
> Where the 'constant' is a function of the loop complexity (maximum
> nesting level) ...

I think we are talking about different things. By iteration I mean one
pass over the reverse postorder. The Keith Cooper et al. paper says, at
the end of the first page, quote:

"show that reverse postorder iterative scheme solves these equations
in a single pass over the CFG for reducible graphs."

Page 2, reference 19, also mentions solving it in linear time on
reducible graphs.

So it is just linear in the size of the CFG if the graph is reducible.
I don't recall it being related to the maximum nesting level of loops.
For the Simple SSA paper, maybe.

Chris
^ permalink raw reply [flat|nested] 33+ messages in thread
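For reference, the core of the Cooper-Harvey-Kennedy scheme being quoted
is tiny. A rough sketch, assuming blocks are identified by their postorder
number; the iteration helpers (for_each_block_rpo_except_entry,
first_processed_pred, for_each_other_pred) are hypothetical, and this is
not sparse code.

	/* 'Two-finger' intersection: walk both fingers up the dominator
	   tree built so far until they meet.  With postorder numbering,
	   the entry block has the highest number and every idom[] entry
	   points to a higher number than the node itself. */
	static int intersect(int b1, int b2, const int *idom)
	{
		while (b1 != b2) {
			while (b1 < b2)
				b1 = idom[b1];
			while (b2 < b1)
				b2 = idom[b2];
		}
		return b1;
	}

	void compute_idoms(struct cfg *cfg, int *idom)
	{
		int changed = 1;

		idom[cfg->entry] = cfg->entry;
		while (changed) {
			changed = 0;
			/* One pass in reverse postorder; repeat to a fixed
			   point (how fast this converges is exactly what is
			   being debated in this thread). */
			for_each_block_rpo_except_entry(cfg, b) {
				int new_idom = first_processed_pred(cfg, b, idom);
				for_each_other_pred(cfg, b, p) {
					if (idom[p] != UNDEF)
						new_idom = intersect(p, new_idom, idom);
				}
				if (idom[b] != new_idom) {
					idom[b] = new_idom;
					changed = 1;
				}
			}
		}
	}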
* Re: Simple SSA status
  2017-09-07  4:02                     ` Christopher Li
@ 2017-09-07  4:24                       ` Luc Van Oostenryck
  2017-09-07  4:33                         ` Christopher Li
  0 siblings, 1 reply; 33+ messages in thread
From: Luc Van Oostenryck @ 2017-09-07 4:24 UTC (permalink / raw)
To: Christopher Li; +Cc: Dibyendu Majumdar, Linux-Sparse, Linus Torvalds

On Thu, Sep 7, 2017 at 6:02 AM, Christopher Li <sparse@chrisli.org> wrote:
> On Wed, Sep 6, 2017 at 11:46 PM, Luc Van Oostenryck
> <luc.vanoostenryck@gmail.com> wrote:
>
>>> For a reducible graph, that algorithm converges in a constant number
>>> of iterations.
>>
>> Where the 'constant' is a function of the loop complexity (maximum
>> nesting level) ...
>
> I think we are talking about different things.

Since you're talking performance and complexity, I suggest that you read
carefully the section "Complexity Analysis" of the article and look where
it's written 'd(G)' and 'loop connectedness'.

-- Luc
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Simple SSA status
  2017-09-07  4:24                       ` Luc Van Oostenryck
@ 2017-09-07  4:33                         ` Christopher Li
  0 siblings, 0 replies; 33+ messages in thread
From: Christopher Li @ 2017-09-07 4:33 UTC (permalink / raw)
To: Luc Van Oostenryck; +Cc: Dibyendu Majumdar, Linux-Sparse, Linus Torvalds

On Thu, Sep 7, 2017 at 12:24 AM, Luc Van Oostenryck
<luc.vanoostenryck@gmail.com> wrote:
> On Thu, Sep 7, 2017 at 6:02 AM, Christopher Li <sparse@chrisli.org> wrote:
>> On Wed, Sep 6, 2017 at 11:46 PM, Luc Van Oostenryck
>> <luc.vanoostenryck@gmail.com> wrote:
>>
>>>> For a reducible graph, that algorithm converges in a constant number
>>>> of iterations.

My statement is regarding *reducible* graphs.

>>>
>>> Where the 'constant' is a function of the loop complexity (maximum
>>> nesting level) ...
>>
>> I think we are talking about different things.
>
> Since you're talking performance and complexity, I suggest that you read
> carefully the section "Complexity Analysis" of the article and look where
> it's written 'd(G)' and 'loop connectedness'.

I assume you are talking about page 7. The 'd(G)' and the 'loop
connectedness' are actually referring to *irreducible* graphs. Figure 4
is a classic *irreducible* graph. That is why I feel we are talking
about different things.

Chris
^ permalink raw reply [flat|nested] 33+ messages in thread