linux-rt-users.vger.kernel.org archive mirror
* [patch 0/6] 3.14-rt1 fixes
@ 2014-05-02 11:12 Mike Galbraith
  2014-05-02 11:39 ` Pavel Vasilyev
  2014-05-02 13:44 ` Sebastian Andrzej Siewior
  0 siblings, 2 replies; 12+ messages in thread
From: Mike Galbraith @ 2014-05-02 11:12 UTC (permalink / raw)
  To: RT; +Cc: Sebastian Andrzej Siewior, Steven Rostedt, Thomas Gleixner

The following patches are fixes that fell out of testing rt1.  Patches
1-4 are intended to be folded into existing patches, 5-6 are
replacements.

fold:
1/6 - preempt-lazy-support.patch
2/6 - x86-preempt-lazy.patch
3/6 - hotplug-light-get-online-cpus.patch
4/6 - stomp-machine-raw-lock.patch

replace:
5/6 - (prep)
6/6 - stomp-machine-deal-clever-with-stopper-lock.patch

drop: (buggy - calls migrate_disable() _after_ maybe blocking)
migrate_disable-pushd-down-in-atomic_dec_and_spin_lo.patch
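
For reference, the shape of the offending ordering - a sketch reconstructed
from the patch name, not the verbatim queue code (rt_spin_lock() being the
raw -rt sleeping lock, with no implicit migrate_disable() of its own):

  int atomic_dec_and_spin_lock(atomic_t *atomic, spinlock_t *lock)
  {
          /* fast path: decrement unless that would drop the count to 0 */
          if (atomic_add_unless(atomic, -1, 1))
                  return 0;
          rt_spin_lock(lock);     /* may block, and we may migrate while blocked */
          migrate_disable();      /* the bug: pinned only after the possible block */
          if (atomic_dec_and_test(atomic))
                  return 1;       /* lock held; caller unlocks and unpins */
          migrate_enable();
          rt_spin_unlock(lock);
          return 0;
  }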

-Mike



* Re: [patch 0/6] 3.14-rt1 fixes
  2014-05-02 11:12 [patch 0/6] 3.14-rt1 fixes Mike Galbraith
@ 2014-05-02 11:39 ` Pavel Vasilyev
  2014-05-02 11:43   ` Mike Galbraith
  2014-05-02 13:44 ` Sebastian Andrzej Siewior
  1 sibling, 1 reply; 12+ messages in thread
From: Pavel Vasilyev @ 2014-05-02 11:39 UTC (permalink / raw)
  To: Mike Galbraith, RT
  Cc: Sebastian Andrzej Siewior, Steven Rostedt, Thomas Gleixner

02.05.2014 15:12, Mike Galbraith wrote:
> The following patches are fixes that fell out of testing rt1.  Patches
> 1-4 are intended to be folded into existing patches, 5-6 are
> replacements.
>
> fold:
> 1/6 - preempt-lazy-support.patch
> 2/6 - x86-preempt-lazy.patch
> 3/6 - hotplug-light-get-online-cpus.patch
> 4/6 - stomp-machine-raw-lock.patch
>
> replace:
> 5/6 - (prep)
> 6/6 - stomp-machine-deal-clever-with-stopper-lock.patch
>
> drop: (buggy - calls migrate_disable() _after_ maybe blocking)
> migrate_disable-pushd-down-in-atomic_dec_and_spin_lo.patch


Cool, but can you do the same for 3.14.2?


-- 

                                                          Pavel.


* Re: [patch 0/6] 3.14-rt1 fixes
  2014-05-02 11:39 ` Pavel Vasilyev
@ 2014-05-02 11:43   ` Mike Galbraith
  2014-05-03  8:32     ` Nicholas Mc Guire
  0 siblings, 1 reply; 12+ messages in thread
From: Mike Galbraith @ 2014-05-02 11:43 UTC (permalink / raw)
  To: pavel; +Cc: RT, Sebastian Andrzej Siewior, Steven Rostedt, Thomas Gleixner

On Fri, 2014-05-02 at 15:39 +0400, Pavel Vasilyev wrote: 
> 02.05.2014 15:12, Mike Galbraith wrote:
> > The following patches are fixes that fell out of testing rt1.  Patches
> > 1-4 are intended to be folded into existing patches, 5-6 are
> > replacements.
> >
> > fold:
> > 1/6 - preempt-lazy-support.patch
> > 2/6 - x86-preempt-lazy.patch
> > 3/6 - hotplug-light-get-online-cpus.patch
> > 4/6 - stomp-machine-raw-lock.patch
> >
> > replace:
> > 5/6 - (prep)
> > 6/6 - stomp-machine-deal-clever-with-stopper-lock.patch
> >
> > drop: (buggy - calls migrate_disable() _after_ maybe blocking)
> > migrate_disable-pushd-down-in-atomic_dec_and_spin_lo.patch
> 
> 
> Cool, but can you do the same for 3.14.2?

They'll apply fine.  You'll just have to comment out the two patches in
the rt1 tree which are no longer needed.  I added these on top of
3.14.2-rt1 with those two commented out.
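
(Assuming the rt1 queue is the usual quilt series, commenting a patch out
just means prefixing its entry in the series file with '#' - the name below
is a placeholder, not one of the actual two:

  # patches/series
  preempt-lazy-support.patch
  #no-longer-needed.patch
  x86-preempt-lazy.patch
)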

-Mike




* Re: [patch 0/6] 3.14-rt1 fixes
  2014-05-02 11:12 [patch 0/6] 3.14-rt1 fixes Mike Galbraith
  2014-05-02 11:39 ` Pavel Vasilyev
@ 2014-05-02 13:44 ` Sebastian Andrzej Siewior
  2014-05-02 13:45   ` Mike Galbraith
  1 sibling, 1 reply; 12+ messages in thread
From: Sebastian Andrzej Siewior @ 2014-05-02 13:44 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: RT, Steven Rostedt, Thomas Gleixner

* Mike Galbraith | 2014-05-02 13:12:39 [+0200]:

>drop: (buggy - calls migrate_disable() _after_ maybe blocking)
>migrate_disable-pushd-down-in-atomic_dec_and_spin_lo.patch

The thread you quoted also mentioned the delayed migrate_disable() in
rt_read_lock(). Is that one no longer required?

>-Mike

Sebastian


* Re: [patch 0/6] 3.14-rt1 fixes
  2014-05-02 13:44 ` Sebastian Andrzej Siewior
@ 2014-05-02 13:45   ` Mike Galbraith
  2014-05-02 13:49     ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 12+ messages in thread
From: Mike Galbraith @ 2014-05-02 13:45 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: RT, Steven Rostedt, Thomas Gleixner

On Fri, 2014-05-02 at 15:44 +0200, Sebastian Andrzej Siewior wrote: 
> * Mike Galbraith | 2014-05-02 13:12:39 [+0200]:
> 
> >drop: (buggy - calls migrate_disable() _after_ maybe blocking)
> >migrate_disable-pushd-down-in-atomic_dec_and_spin_lo.patch
> 
> The thread you quoted also mentioned the delayed migrate_disable() in
> rt_read_lock(). Is that one no longer required?

Steven submitted a patch for that.

-Mike



* Re: [patch 0/6] 3.14-rt1 fixes
  2014-05-02 13:45   ` Mike Galbraith
@ 2014-05-02 13:49     ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 12+ messages in thread
From: Sebastian Andrzej Siewior @ 2014-05-02 13:49 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: RT, Steven Rostedt, Thomas Gleixner

On 05/02/2014 03:45 PM, Mike Galbraith wrote:
> On Fri, 2014-05-02 at 15:44 +0200, Sebastian Andrzej Siewior wrote: 
>> * Mike Galbraith | 2014-05-02 13:12:39 [+0200]:
>>
>>> drop: (buggy - calls migrate_disable() _after_ maybe blocking)
>>> migrate_disable-pushd-down-in-atomic_dec_and_spin_lo.patch
>>
>> The thread you quoted also mentioned the delayed migrate_disable() in
>> rt_read_lock(). Is that one no longer required?
> 
> Steven submitted a patch for that.

Ah, I see. Let me look at this. It looks like we revert the three
push-down patches, since they break hotplug. Thanks.

> 
> -Mike
> 

Sebastian


* Re: [patch 0/6] 3.14-rt1 fixes
  2014-05-02 11:43   ` Mike Galbraith
@ 2014-05-03  8:32     ` Nicholas Mc Guire
  2014-05-03 10:40       ` Mike Galbraith
  0 siblings, 1 reply; 12+ messages in thread
From: Nicholas Mc Guire @ 2014-05-03  8:32 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: pavel, RT, Sebastian Andrzej Siewior, Steven Rostedt,
	Thomas Gleixner

On Fri, 02 May 2014, Mike Galbraith wrote:

> On Fri, 2014-05-02 at 15:39 +0400, Pavel Vasilyev wrote: 
> > 02.05.2014 15:12, Mike Galbraith wrote:
> > > The following patches are fixes that fell out of testing rt1.  Patches
> > > 1-4 are intended to be folded into existing patches, 5-6 are
> > > replacements.
> > >
> > > fold:
> > > 1/6 - preempt-lazy-support.patch
> > > 2/6 - x86-preempt-lazy.patch
> > > 3/6 - hotplug-light-get-online-cpus.patch
> > > 4/6 - stomp-machine-raw-lock.patch
> > >
> > > replace:
> > > 5/6 - (prep)
> > > 6/6 - stomp-machine-deal-clever-with-stopper-lock.patch
> > >
> > > drop: (buggy - calls migrate_disable() _after_ maybe blocking)
> > > migrate_disable-pushd-down-in-atomic_dec_and_spin_lo.patch
> > 
> > 
Why would calling migrate_disable() be buggy after blocking?
At that point no per-CPU data is being accessed yet, so it is perfectly fine
to:

  lock->block
   -> migrate 
    -> unblock->lock 
     -> migrate_disable on a new CPU
      -> access per_cpu object
     -> migrate_enable
  -> unlock
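
or, as a rough code sketch (spin_lock() standing in for an -rt sleeping
lock; lock and my_obj are hypothetical):

  static DEFINE_SPINLOCK(lock);
  static DEFINE_PER_CPU(int, my_obj);

  spin_lock(&lock);                  /* may block; we may migrate while blocked */
  migrate_disable();                 /* pin to whichever CPU we woke up on */
  int *p = this_cpu_ptr(&my_obj);    /* per-CPU reference is stable from here */
  /* ... access the per_cpu object through p ... */
  migrate_enable();
  spin_unlock(&lock);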

What problems would there be?

thx!
hofrat



* Re: [patch 0/6] 3.14-rt1 fixes
  2014-05-03  8:32     ` Nicholas Mc Guire
@ 2014-05-03 10:40       ` Mike Galbraith
  2014-05-03 12:31         ` Nicholas Mc Guire
  0 siblings, 1 reply; 12+ messages in thread
From: Mike Galbraith @ 2014-05-03 10:40 UTC (permalink / raw)
  To: Nicholas Mc Guire
  Cc: pavel, RT, Sebastian Andrzej Siewior, Steven Rostedt,
	Thomas Gleixner

On Sat, 2014-05-03 at 10:32 +0200, Nicholas Mc Guire wrote: 
> On Fri, 02 May 2014, Mike Galbraith wrote:
> 
> > On Fri, 2014-05-02 at 15:39 +0400, Pavel Vasilyev wrote: 
> > > 02.05.2014 15:12, Mike Galbraith wrote:
> > > > The following patches are fixes that fell out of testing rt1.  Patches
> > > > 1-4 are intended to be folded into existing patches, 5-6 are
> > > > replacements.
> > > >
> > > > fold:
> > > > 1/6 - preempt-lazy-support.patch
> > > > 2/6 - x86-preempt-lazy.patch
> > > > 3/6 - hotplug-light-get-online-cpus.patch
> > > > 4/6 - stomp-machine-raw-lock.patch
> > > >
> > > > replace:
> > > > 5/6 - (prep)
> > > > 6/6 - stomp-machine-deal-clever-with-stopper-lock.patch
> > > >
> > > > drop: (buggy - calls migrate_disable() _after_ maybe blocking)
> > > > migrate_disable-pushd-down-in-atomic_dec_and_spin_lo.patch
> > > 
> > > 
> Why would calling migrate_disable() be buggy after blocking?
> At that point no per-CPU data is being accessed yet, so it is perfectly fine
> to:
> 
>   lock->block
>    -> migrate 
>     -> unblock->lock 
>      -> migrate_disable on a new CPU
>       -> access per_cpu object
>      -> migrate_enable
>   -> unlock
> 
> What problems would there be?

You tell me.   Why do we bother at all if it's ok to migrate with a lock
in your pocket, or the CPU can go away on you while you're acquiring?
If this is in fact safe, you should be able to move each and every
migrate_disable() to post acquisition.  I have a virtual nickel that
says your box will have a severe allergic reaction to such a patch.

-Mike



* Re: [patch 0/6] 3.14-rt1 fixes
  2014-05-03 10:40       ` Mike Galbraith
@ 2014-05-03 12:31         ` Nicholas Mc Guire
  2014-05-03 14:03           ` Mike Galbraith
  0 siblings, 1 reply; 12+ messages in thread
From: Nicholas Mc Guire @ 2014-05-03 12:31 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: pavel, RT, Sebastian Andrzej Siewior, Steven Rostedt,
	Thomas Gleixner

On Sat, 03 May 2014, Mike Galbraith wrote:

> On Sat, 2014-05-03 at 10:32 +0200, Nicholas Mc Guire wrote: 
> > On Fri, 02 May 2014, Mike Galbraith wrote:
> > 
> > > On Fri, 2014-05-02 at 15:39 +0400, Pavel Vasilyev wrote: 
> > > > 02.05.2014 15:12, Mike Galbraith wrote:
> > > > > The following patches are fixes that fell out of testing rt1.  Patches
> > > > > 1-4 are intended to be folded into existing patches, 5-6 are
> > > > > replacements.
> > > > >
> > > > > fold:
> > > > > 1/6 - preempt-lazy-support.patch
> > > > > 2/6 - x86-preempt-lazy.patch
> > > > > 3/6 - hotplug-light-get-online-cpus.patch
> > > > > 4/6 - stomp-machine-raw-lock.patch
> > > > >
> > > > > replace:
> > > > > 5/6 - (prep)
> > > > > 6/6 - stomp-machine-deal-clever-with-stopper-lock.patch
> > > > >
> > > > > drop: (buggy - calls migrate_disable() _after_ maybe blocking)
> > > > > migrate_disable-pushd-down-in-atomic_dec_and_spin_lo.patch
> > > > 
> > > > 
> > Why would calling migrate_disable() be buggy after blocking?
> > At that point no per-CPU data is being accessed yet, so it is perfectly fine
> > to:
> > 
> >   lock->block
> >    -> migrate 
> >     -> unblock->lock 
> >      -> migrate_disable on a new CPU
> >       -> access per_cpu object
> >      -> migrate_enable
> >   -> unlock
> > 
> > What problems would there be?
> 
> You tell me.   Why do we bother at all if it's ok to migrate with a lock
> in your pocket, or the CPU can go away on you while you're acquiring?

The lock in your pocket is not the issue (unless it's a per_cpu lock, in
which case we are pinned already anyway), and in my opinion the CPU can go
away with a global lock held - why not?

> If this is in fact safe, you should be able to move each and every
> migrate_disable() to post acquisition.

yup

> I have a virtual nickel that
> says your box will have a severe allergic reaction to such a patch.
>
Actually that is what the pushdowns in the read_lock/write_lock API did and
I did not notice any of the systems having problems with that.

Not saying that there is no case in the pushdown that is wrongly
handled, but before looking at that, let's clarify whether the assumptions
made for the migrate pushdowns were right in the first place.

migrate_disable() provides referential consistency for per_cpu objects, so
that preemption is safe with a per_cpu reference held - nothing else.
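
E.g., as a sketch (x a hypothetical per-CPU variable, use() a stand-in):

  migrate_disable();
  int *p = this_cpu_ptr(&x);   /* reference to this CPU's instance of x */
  /* preemption - even sleeping - is fine here: we resume on the same CPU */
  use(p);                      /* p still refers to the local instance */
  migrate_enable();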

So:
 if the lock is global, then it is safe to migrate while blocked on it: the
  referential consistency of the lock is not affected, and the reference to
  a protected per_cpu object is not yet set (if it were, that would be a bug,
  unless whatever obtained the per_cpu object's reference had already called
  migrate_disable()).
 if it is a per-cpu lock, then migrate_disable() would need to be called
  before accessing the lock object anyway, and the problem is not related to
  blocking on a lock - the get_ primitives would have disabled migration
  before you got access to the lock object, and thus you are safe.

Regarding the CPU going away while blocked on a lock, we have:
 if the lock is global, then the CPU going away would push you off first,
   without invalidating the reference to the lock object - so that is safe.
 if the lock is per_cpu, then get_cpu_var() or whatever you use to get that
   per_cpu lock object would have disabled migration BEFORE you got access to
   the per_cpu lock object, so once you have access to the per_cpu lock you
   would again be safe against the CPU going away as well, via
   pin_current_cpu().
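
As a sketch (the explicit pin standing in for whatever the get_ primitive
does internally; my_lock is a hypothetical per-CPU lock):

  static DEFINE_PER_CPU(spinlock_t, my_lock);

  migrate_disable();                      /* pinned BEFORE we ever see the lock */
  spin_lock(this_cpu_ptr(&my_lock));      /* lock object and CPU stay put */
  /* ... access this CPU's protected state ... */
  spin_unlock(this_cpu_ptr(&my_lock));
  migrate_enable();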

From my understanding, for a globally visible lock:

            per_cpu
 lock pin   access   unpin unlock
 pin lock   access   unlock unpin

are semantically equivalent and safe
even
            per_cpu
 lock pin   access   unlock unpin
 pin lock   access   unpin unlock

is safe (if strange) - in no constellation would you lose
referential consistency of the per_cpu object within the critical
section (access).

If code relies on these two (semantically orthogonal) functions being
in a particular order, I would suspect that the code has a bug.

If one now looks at the cases of
            per_cpu
 lock pin   access   ...

and assume that you got migrated away while acquiring the lock,
or actually simply before you got temporarily pinned to the CPU,
then this is safe for global locks - if the CPU went away and
we got pushed to another CPU, then the per_cpu object access
happens on the CPU we got pushed to, but stays consistent and safe.

For a per_cpu lock, the pin/unpin is implied in the access to the per_cpu
lock object itself, so the migrate_disable() done in the locking API is
not needed - but does no harm either - and its position (before or
after acquisition of the lock) is irrelevant.

So what case did I miss?

thx!
hofrat


* Re: [patch 0/6] 3.14-rt1 fixes
  2014-05-03 12:31         ` Nicholas Mc Guire
@ 2014-05-03 14:03           ` Mike Galbraith
  2014-05-03 14:19             ` Nicholas Mc Guire
  0 siblings, 1 reply; 12+ messages in thread
From: Mike Galbraith @ 2014-05-03 14:03 UTC (permalink / raw)
  To: Nicholas Mc Guire
  Cc: pavel, RT, Sebastian Andrzej Siewior, Steven Rostedt,
	Thomas Gleixner

On Sat, 2014-05-03 at 14:31 +0200, Nicholas Mc Guire wrote: 
> On Sat, 03 May 2014, Mike Galbraith wrote:

> > If this is in fact safe, you should be able to move each and every
> > migrate_disable() to post acquisition.
> 
> yup

Having just seen working -> brick transition, color me skeptical.

> > I have a virtual nickel that
> > says your box will have a severe allergic reaction to such a patch.
> >
> Actually that is what the pushdowns in the read_lock/write_lock API did and
> I did not notice any of the systems having problems with that.

If you had tested hotplug, you would have met the deadlock, and would
have verified that the change to read_lock() was the culprit instead of
me doing that.  Steven also verified that.  You too can flip back and
forth, drive boxen into the wall as many times as it takes to convince
yourself that that change really really did induce the breakage.

I'll read the rest later, I'm getting all grumpy.

-Mike



* Re: [patch 0/6] 3.14-rt1 fixes
  2014-05-03 14:03           ` Mike Galbraith
@ 2014-05-03 14:19             ` Nicholas Mc Guire
  2014-05-03 15:24               ` Mike Galbraith
  0 siblings, 1 reply; 12+ messages in thread
From: Nicholas Mc Guire @ 2014-05-03 14:19 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: pavel, RT, Sebastian Andrzej Siewior, Steven Rostedt,
	Thomas Gleixner

On Sat, 03 May 2014, Mike Galbraith wrote:

> On Sat, 2014-05-03 at 14:31 +0200, Nicholas Mc Guire wrote: 
> > On Sat, 03 May 2014, Mike Galbraith wrote:
> 
> > > If this is in fact safe, you should be able to move each and every
> > > migrate_disable() to post acquisition.
> > 
> > yup
> 
> Having just seen working -> brick transition, color me skeptical.
> 
> > > I have a virtual nickel that
> > > says your box will have a severe allergic reaction to such a patch.
> > >
> > Actually that is what the pushdowns in the read_lock/write_lock API did and
> > I did not notice any of the systems having problems with that.
> 
> If you had tested hotplug, you would have met the deadlock, and would
> have verified that the change to read_lock() was the culprit instead of
> me doing that.  Steven also verified that.  You too can flip back and
> forth, drive boxen into the wall as many times as it takes to convince
> yourself that that change really really did induce the breakage.
>

I did not test hotplug - I did try to understand the code to
verify the assumptions - but before looking at the details of some code
path (which in my opinion has a few problems of its own), I think it
would be best to clarify first whether the assumptions made for the
migrate pushdown patches are right or not. If they are not, there is no point
in discussing individual code paths - that patch set simply needs to
go out; if the assumptions are right, then we can discuss where the fix is
needed.

thx!
hofrat


* Re: [patch 0/6] 3.14-rt1 fixes
  2014-05-03 14:19             ` Nicholas Mc Guire
@ 2014-05-03 15:24               ` Mike Galbraith
  0 siblings, 0 replies; 12+ messages in thread
From: Mike Galbraith @ 2014-05-03 15:24 UTC (permalink / raw)
  To: Nicholas Mc Guire
  Cc: pavel, RT, Sebastian Andrzej Siewior, Steven Rostedt,
	Thomas Gleixner

On Sat, 2014-05-03 at 16:19 +0200, Nicholas Mc Guire wrote: 
> On Sat, 03 May 2014, Mike Galbraith wrote:
> 
> > On Sat, 2014-05-03 at 14:31 +0200, Nicholas Mc Guire wrote: 
> > > On Sat, 03 May 2014, Mike Galbraith wrote:
> > 
> > > > If this is in fact safe, you should be able to move each and every
> > > > migrate_disable() to post acquisition.
> > > 
> > > yup
> > 
> > Having just seen working -> brick transition, color me skeptical.
> > 
> > > > I have a virtual nickel that
> > > > says your box will have a severe allergic reaction to such a patch.
> > > >
> > > Actually that is what the pushdowns in the read_lock/write_lock API did and
> > > I did not notice any of the systems having problems with that.
> > 
> > If you had tested hotplug, you would have met the deadlock, and would
> > have verified that the change to read_lock() was the culprit instead of
> > me doing that.  Steven also verified that.  You too can flip back and
> > forth, drive boxen into the wall as many times as it takes to convince
> > yourself that that change really really did induce the breakage.
> >
> 
> I did not test hotplug - I did try to understand the code to

Not testing hotplug was obviously a mistake given what you were
changing.  Nor is that a tiny change, it's a glaringly huge order
change... for read_lock(), with no order change to write_lock().

Things that make ya go hmmm.

> verify the assumptions - but before looking at the details of some code
> path (which in my opinion has a few problems of its own), I think it
> would be best to clarify first whether the assumptions made for the
> migrate pushdown patches are right or not. If they are not, there is no point
> in discussing individual code paths - that patch set simply needs to
> go out; if the assumptions are right, then we can discuss where the fix is
> needed.

If your assumptions were correct, there would have been no breakage.

-Mike


