linux-sparse.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* RFC: ptrlist insert during iteration
@ 2017-08-10  3:56 Chris Li
  2017-08-10 23:28 ` Luc Van Oostenryck
  0 siblings, 1 reply; 4+ messages in thread
From: Chris Li @ 2017-08-10  3:56 UTC (permalink / raw)
  To: Linux-Sparse; +Cc: Luc Van Oostenryck, Dibyendu Majumdar

I have give this a little bit more thought.
Find more bugs in the previous proposal.

Data structure:
occupier mask:
     indicate which ptrlist entry is valid.
insert mask:
     Indicate which ptrlist entry is new insert during current loop.

I find out for each ptr iteration. The insert mask need to restart
with fresh zero.
e.g. if the insert mask start with 00010B, and after the current
ptr iteration, the insert mask become 00110B. The loop iterator
has no way to tell if the new bit was insert before or after the
original bit.

That is the reason insert mask will need to start fresh zero
for each pointer iteration.

If there is nest loops, the inner loop will need to maintain and
merge the insert mask for the outer loop.

It need some thing similar to the ptrlist refcount to do some
save/merge on the insert mask context when iterator
enter/exit the ptrlist node.

It is pretty complicate, but do able if we really want to do it.

Chris

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: RFC: ptrlist insert during iteration
  2017-08-10  3:56 RFC: ptrlist insert during iteration Chris Li
@ 2017-08-10 23:28 ` Luc Van Oostenryck
  2017-08-11  0:27   ` Chris Li
  0 siblings, 1 reply; 4+ messages in thread
From: Luc Van Oostenryck @ 2017-08-10 23:28 UTC (permalink / raw)
  To: Chris Li; +Cc: Linux-Sparse, Dibyendu Majumdar

On Wed, Aug 09, 2017 at 11:56:48PM -0400, Chris Li wrote:
> I have give this a little bit more thought.
> Find more bugs in the previous proposal.
> 
> Data structure:
> occupier mask:
>      indicate which ptrlist entry is valid.
> insert mask:
>      Indicate which ptrlist entry is new insert during current loop.
> 
> I find out for each ptr iteration. The insert mask need to restart
> with fresh zero.
> e.g. if the insert mask start with 00010B, and after the current
> ptr iteration, the insert mask become 00110B. The loop iterator
> has no way to tell if the new bit was insert before or after the
> original bit.
> 
> That is the reason insert mask will need to start fresh zero
> for each pointer iteration.
> 
> If there is nest loops, the inner loop will need to maintain and
> merge the insert mask for the outer loop.
> 
> It need some thing similar to the ptrlist refcount to do some
> save/merge on the insert mask context when iterator
> enter/exit the ptrlist node.
> 
> It is pretty complicate, but do able if we really want to do it.

Yes, it's pretty complicated. Too complicate for something that should
be very very simple: just add an element to a list.

Happily, I think we don't need this. Here is why:

First, we must realize that we use ptrlist for two different things:
1) just as container for object, in other words: a set.
2) to store a sequence of objects, an ordered set.
Each of these have different properties & operations:
1) for sets:
   - is-empty
   - size
   - add
   - lookup
   - delete
   - iterate
2) for sequences:
   - is-empty?
   - size
   - add-at-end
   - insert(-in-front)
   - delete
   - iterate
   - concat

Soon we may use them for another use too: fifo, aka stack
   - push
   - pop
   - [is-empty]
but that is another story.

I think that sequences are only needed for instructions: bb->insns.
All other current use just need a 'set'. 

== For instructions ==
* delete is already safe (insns are just marked as ->bb = NULL)
* insert-at-front is only used to add death notes and this shouldn't
  be a problem for nested ptrlist walking.
* concat is only used when packing BB and should be a problem
* add-at-end is used during linearization and some BB simplifications
  and shouldn't be a problem either

== For sets ==
* 'delete' should be done by marking the element as deleted if there
  is a possiblity of nested ptrlist walking
* if 'add' is only done at the end, I don't see well how we can have
  a problem, even with backward walking.
* even if somehow safe, 'add' or 'delete' operations should never be
  done in inner ptrlist walking because the outer will become meaningless
  if the set change (*any kind of change*) while iterating on the set
  In other words, what would mean "do this on all elements of the set"
  if elements are added or removed from the set while iterating on them?
  [It may be that is some specific cases, it is meaningful but in general
   it's not].

== Conclusion ==
We should audit the code much better that I did here but:
1) modification of sets in inner loops must be avoided in all case
   if we want to do well defined operations on them
2) most cases should be already safe
3) maybe we should use another API for sets & sequences
4) maybe it would even be good to use separate implementation for them

-- Luc

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: RFC: ptrlist insert during iteration
  2017-08-10 23:28 ` Luc Van Oostenryck
@ 2017-08-11  0:27   ` Chris Li
  2017-08-11 11:10     ` Luc Van Oostenryck
  0 siblings, 1 reply; 4+ messages in thread
From: Chris Li @ 2017-08-11  0:27 UTC (permalink / raw)
  To: Luc Van Oostenryck; +Cc: Linux-Sparse, Dibyendu Majumdar

On Thu, Aug 10, 2017 at 7:28 PM, Luc Van Oostenryck
<luc.vanoostenryck@gmail.com> wrote:
>> It is pretty complicate, but do able if we really want to do it.
>
> Yes, it's pretty complicated. Too complicate for something that should
> be very very simple: just add an element to a list.

It is also a dare. I don't think there is much simpler way to do insert
while looping properly in ptrlist. I welcome any one to give it a try.
Report back if you have super clever way to do it. To define properly,
the ptrlist looping while modify should behave as if it is a traditional
double link list. New item add to before the current position,
it will not show up in the loop. Insert after the current position, it will
show up in the loop. That is for the normal direction looping.

>
> I think that sequences are only needed for instructions: bb->insns.
> All other current use just need a 'set'.
>
> == For instructions ==
> * delete is already safe (insns are just marked as ->bb = NULL)
> * insert-at-front is only used to add death notes and this shouldn't
>   be a problem for nested ptrlist walking.
> * concat is only used when packing BB and should be a problem
> * add-at-end is used during linearization and some BB simplifications
>   and shouldn't be a problem either
>
> == For sets ==
> * 'delete' should be done by marking the element as deleted if there
>   is a possiblity of nested ptrlist walking
> * if 'add' is only done at the end, I don't see well how we can have
>   a problem, even with backward walking.
> * even if somehow safe, 'add' or 'delete' operations should never be
>   done in inner ptrlist walking because the outer will become meaningless
>   if the set change (*any kind of change*) while iterating on the set
>   In other words, what would mean "do this on all elements of the set"
>   if elements are added or removed from the set while iterating on them?
>   [It may be that is some specific cases, it is meaningful but in general
>    it's not].

I think that is exactly the problem with set. When you have looping
and modify (insert for example), it is very hard to define consistent
behavior.  Should the new inserted element shows in the loop or
not? The double link list is very consistent. Consistent is good.

> == Conclusion ==
> We should audit the code much better that I did here but:
> 1) modification of sets in inner loops must be avoided in all case
>    if we want to do well defined operations on them

Agree.

> 2) most cases should be already safe
> 3) maybe we should use another API for sets & sequences

I think the consistent behavior will be very hard to address in
set. If the compiler get different result run to run, that is bad.

> 4) maybe it would even be good to use separate implementation for them

I do have some plan(in my head) to use work queue to avoid the
looping. I believe the looping while modify can be avoided. Most
of the classic compiler algorithm, in their pseudo code form, already
avoid looping while modify. But that is getting off topic here.

Chris

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: RFC: ptrlist insert during iteration
  2017-08-11  0:27   ` Chris Li
@ 2017-08-11 11:10     ` Luc Van Oostenryck
  0 siblings, 0 replies; 4+ messages in thread
From: Luc Van Oostenryck @ 2017-08-11 11:10 UTC (permalink / raw)
  To: Chris Li; +Cc: Linux-Sparse, Dibyendu Majumdar

On Thu, Aug 10, 2017 at 08:27:22PM -0400, Chris Li wrote:
> On Thu, Aug 10, 2017 at 7:28 PM, Luc Van Oostenryck
> <luc.vanoostenryck@gmail.com> wrote:
> >> It is pretty complicate, but do able if we really want to do it.
> >
> > Yes, it's pretty complicated. Too complicate for something that should
> > be very very simple: just add an element to a list.
> 
> It is also a dare. I don't think there is much simpler way to do insert
> while looping properly in ptrlist. I welcome any one to give it a try.
> Report back if you have super clever way to do it. To define properly,
> the ptrlist looping while modify should behave as if it is a traditional
> double link list. New item add to before the current position,
> it will not show up in the loop. Insert after the current position, it will
> show up in the loop. That is for the normal direction looping.
> 
> >
> > I think that sequences are only needed for instructions: bb->insns.
> > All other current use just need a 'set'.
> >
> > == For instructions ==
> > * delete is already safe (insns are just marked as ->bb = NULL)
> > * insert-at-front is only used to add death notes and this shouldn't
> >   be a problem for nested ptrlist walking.
> > * concat is only used when packing BB and should be a problem
> > * add-at-end is used during linearization and some BB simplifications
> >   and shouldn't be a problem either
> >
> > == For sets ==
> > * 'delete' should be done by marking the element as deleted if there
> >   is a possiblity of nested ptrlist walking
> > * if 'add' is only done at the end, I don't see well how we can have
> >   a problem, even with backward walking.
> > * even if somehow safe, 'add' or 'delete' operations should never be
> >   done in inner ptrlist walking because the outer will become meaningless
> >   if the set change (*any kind of change*) while iterating on the set
> >   In other words, what would mean "do this on all elements of the set"
> >   if elements are added or removed from the set while iterating on them?
> >   [It may be that is some specific cases, it is meaningful but in general
> >    it's not].
> 
> I think that is exactly the problem with set. When you have looping
> and modify (insert for example), it is very hard to define consistent
> behavior.  Should the new inserted element shows in the loop or
> not?

Yes, but again, if we see things strictly from the mathematical
point-of-view, it's not that it is hard to define behaviour, it's
simply that things are *always* inconsistent.
If you write your outer loop in mathematical language as:
  for all x in <this set>: ...
things become clearer as the set is modified while the operations
are done/the properties are checked/...

*Probably* we want to have something like:
  X' = { x in X ! .... } U ...
for the inner loop and thinsg are in fact OK there but for
the outer loop things are simply and plainly wrong.

> The double link list is very consistent. Consistent is good.

Double linked list is an implementation detail. It is easier to
maintain *local consistency* at the structure level. *But* ...
if this list is used the represent a set then the problem above
remains, it's not a question of implementation.
 
> ...
> If the compiler get different result run to run, that is bad.

It's not because a set has no ordering that the implementation
*must* create indeterminism (from run to run) when giving an order
to its elements, using an order when accessing its elements.
It just means that we can choose the order.

And yes, it's really much better if sparse can give exactly the
same behaviour from run to run, in other words: reproducible results.
It currently doesn't (for reasons I haven't yet understood) and
it is *very* annoying when comparing results between versions
at instruction level as I very often do.

-- Luc

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2017-08-11 11:10 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-08-10  3:56 RFC: ptrlist insert during iteration Chris Li
2017-08-10 23:28 ` Luc Van Oostenryck
2017-08-11  0:27   ` Chris Li
2017-08-11 11:10     ` Luc Van Oostenryck

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).