From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: linux-pm@lists.linux-foundation.org,
Alan Stern <stern@rowland.harvard.edu>,
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
Pierre Tardy <tardyp@gmail.com>,
Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@elte.hu>,
Jean Pihet <jean.pihet@newoldbits.com>,
Steven Rostedt <rostedt@goodmis.org>,
linux-trace-users@vger.kernel.org, Frank Eigler <fche@redhat.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
Frederic Weisbecker <fweisbec@gmail.com>,
Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>,
Tejun Heo <tj@kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
linux-omap@vger.kernel.org,
Arjan van de Ven <arjan@linux.intel.com>,
Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [linux-pm] [PATCH] PERF(kernel): Cleanup power events V2
Date: Wed, 27 Oct 2010 08:21:15 -0400 [thread overview]
Message-ID: <20101027122115.GA10722@Krystal> (raw)
In-Reply-To: <201010271222.08921.rjw@sisk.pl>
* Rafael J. Wysocki (rjw@sisk.pl) wrote:
> On Wednesday, October 27, 2010, Mathieu Desnoyers wrote:
> > * Rafael J. Wysocki (rjw@sisk.pl) wrote:
> > > On Tuesday, October 26, 2010, Mathieu Desnoyers wrote:
> > > > * Alan Stern (stern@rowland.harvard.edu) wrote:
> > > > > On Tue, 26 Oct 2010, Mathieu Desnoyers wrote:
> > > > >
> > > > > > * Peter Zijlstra (peterz@infradead.org) wrote:
> > > > > > > On Tue, 2010-10-26 at 11:56 -0500, Pierre Tardy wrote:
> > > > > > > >
> > > > > > > > + trace_runtime_pm_usage(dev, atomic_read(&dev->power.usage_count)+1);
> > > > > > > > atomic_inc(&dev->power.usage_count);
> > > > > > >
> > > > > > > That's terribly racy..
> > > > > >
> > > > > > Looking at the original code, it looks racy even without considering the
> > > > > > tracepoint:
> > > > > >
> > > > > > int __pm_runtime_get(struct device *dev, bool sync)
> > > > > > {
> > > > > > int retval;
> > > > > >
> > > > > > + trace_runtime_pm_usage(dev, atomic_read(&dev->power.usage_count)+1);
> > > > > > atomic_inc(&dev->power.usage_count);
> > > > > > retval = sync ? pm_runtime_resume(dev) : pm_request_resume(dev);
> > > > > >
> > > > > > There is no implied memory barrier after "atomic_inc". So either all these
> > > > > > inc/dec are protected with mutexes or spinlocks, in which case one might wonder
> > > > > > why atomic operations are used at all, or it's a racy mess. (I vote for the
> > > > > > second option)
> > > > >
> > > > > I don't understand. What's the problem? The inc/dec are atomic
> > > > > because they are not protected by spinlocks, but everything else is
> > > > > (aside from the tracepoint, which is new).
> > > > >
> > > > > > kref should certainly be used there.
> > > > >
> > > > > What for?
> > > >
> > > > kref has the following "get":
> > > >
> > > > atomic_inc(&kref->refcount);
> > > > smp_mb__after_atomic_inc();
> > > >
> > > > What seems to be missing in __pm_runtime_get() and pm_runtime_get_noresume() is
> > > > the memory barrier after the atomic increment. The atomic increment is free to
> > > > be reordered into the following spinlock (within pm_request_resume or pm_request
> > > > resume execution) because taking a spinlock only acts as a memory barrier with
> > > > acquire semantic, not a full memory barrier.
> > > >
> > > > So AFAIU, the failure scenario would be as follows (sorry for the 80+ columns):
> > > >
> > > > initial conditions: usage_count = 1
> > > >
> > > > CPU A CPU B
> > > > 1) __pm_runtime_get() (sync = true)
> > > > 2) atomic_inc(&usage_count) (not committed to memory yet)
> > > > 3) pm_runtime_resume()
> > > > 4) spin_lock_irqsave(&dev->power.lock, flags);
> > > > 5) retval = __pm_request_resume(dev);
> > >
> > > If sync = true this is
> > > retval = __pm_runtime_resume(dev);
> > > which drops and reacquires the spinlock.
> >
> > Let's see. Upon entry in __pm_runtime_resume, the following condition holds
> > (remember, the initial condition is that usage_count == 1):
> >
> > dev->power.runtime_status == RPM_ACTIVE
> >
> > so retval is set to 1, which goto directly to "out", without setting "parent".
> > So there does not seem to be any spinlock reacquire on this path, or am I
> > misunderstanding how the "runtime_status" works ?
>
> No, you're not I think, the above is correct. I was referring to the scenario
> in which the device was RPM_SUSPENDED initially.
Good to know I'm not losing it. ;-)
>
> > > In the meantime it sets
> > > ->power.runtime_status so that __pm_runtime_idle() will fail if run at this
> > > point.
> >
> > runtime_status will be left at "RPM_ACTIVE", which is the appropriate value
> > expected by __pm_runtime_idle.
> >
> > >
> > > > 6) (execute the body of __pm_request_resume and return)
> > > > 7) __pm_runtime_put() (sync = true)
> > > > 8) if (atomic_dec_and_test(&dev->power.usage_count))
> > > > (still see usage_count == 1 before decrement,
> > > > thus decrement to 0)
> > > > 9) pm_runtime_idle()
> > > > 10) spin_unlock_irqrestore(&dev->power.lock, flags)
> > > > 11) spin_lock_irq(&dev->power.lock);
> > > > 12) retval = __pm_runtime_idle(dev);
> > >
> > > Moreover, __pm_runtime_idle() checks ->power.usage_count under the spinlock,
> > > so it will see it's been incremented in the meantime and it will back off.
> >
> > This is a subtle but important point. Yes, my scenario seems to be dealt with by
> > the extra usage_count check while the spinlock is held.
> >
> > How about adding a comment under this atomic_inc() stating that the memory
> > barriers are implicitely dealt with by the following spinlock release and the
> > extra check while spinlock is held ?
> >
> > Commenting memory barriers is important, but commenting why memory barriers are
> > not needed due to a subtle corner-case looks even more important.
>
> Well, given that this discussion is taking place at all, I admit that it would
> be good to document this somehow. :-)
Yep, it's astonishing how a few comments can end up saving lots of emails from
confused reviewers. ;-)
>
> I'll take care of that.
>
> > (hrm, but more below considering pm_runtime_get_noresume())
> >
> > >
> > > > 13) spin_unlock_irq(&dev->power.lock);
> > > >
> > > > So we end up in a situation where CPU A expects the device to be resumed, but
> > > > the last action performed has been to bring it to idle.
> > > >
> > > > A smp_mb__after_atomic_inc() between lines 2 and 3 would fix this.
> > >
> > > I don't think this particular race is possible. However, there is another one
> > > that seems to be possible (in a different function) that an explicit barrier
> > > will prevent from happening.
> > >
> > > It's related to pm_runtime_get_noresume(), but I think it's better to put the
> > > barrier where it's necessary rather than into pm_runtime_get_noresume() itself.
> >
> > Quoting your following mail:
> >
> > > Actually, no. Since rpm_idle() and rpm_suspend() both check usage_count under
> > > the spinlock, the race I was thinking about doesn't appear to be possible
> > > after all.
> >
> > Hrm, for the extra-usage_count-under-spinlock check to work, all
> > pm_runtime_get_noresume() callers should grab and release the dev->power.lock
> > after incrementing the usage_count. This does not seem to be the case though. So
> > you might really have a race there.
> >
> > So every code path that does:
> >
> > 1) pm_runtime_get_noresume(dev);
> >
> > 2) ...
> >
> > 3) pm_runtime_put_noidle(dev);
> >
> > expecting that the device state cannot be changed between 1 and 3 might be
> > surprised by a concurrent call to __pm_runtime_idle() that would put a device to
> > idle (or similarly with suspend) due to lack of memory barrier after the atomic
> > increment.
> >
> > Or am I missing something else ?
>
> First of all, the device can always be resumed regardless of the usage_count
> value. usage_count is only used to block attempts to suspend the device and
> execute its driver's ->runtime_idle() callback after it has been resumed.
> That's why the "normal" pm_runtime_get() queues up a resume request.
>
> IOW, the _get() only becomes meaningful after attempting to resume the device
> (which is what I tried to tell Arjan in one of the previous messages).
OK
>
> Second, there's no synchronization between pm_runtime_get_noresume() and
> pm_runtime_suspend/idle() etc., so calling pm_runtime_get_noresume() is
> certainly insufficient to block pm_runtime_suspend/idle() regardless of memory
> barriers (there may be one already in progress when _get_noresume() is called).
Agreed, I was wondering how this was expected to work.
> To limit possible status changes from happening one should (at least) run
> pm_runtime_barrier() (surprise, no? ;-)) after pm_runtime_get_noresume().
Hrm, then why export pm_runtime_get_noresume() at all ? I don't feel comfortable
with some of the pm_runtime_get_noresume() callers.
>
> So if you don't want to resume the device immediately after increasing its
> usage_count (in which case it's better to use pm_runtime_get_sync()), you
> should do something like this:
>
> 1) pm_runtime_get_noresume(dev);
> 1a) pm_runtime_barrier(dev); // That takes care of all pending requests etc.
>
> 2) ...
>
> 3) pm_runtime_put_noidle(dev);
>
> [The meaning of pm_runtime_barrier() is that all of the runtime PM activity
> started before the barrier has been completed when it returns.]
>
> There's one place in the PM core where that really is necessary, but I wouldn't
> recommend anyone doing anything like it in a driver.
grep -r pm_runtime_get_noresume drivers/ hands out very interesting info.
e.g.:
drivers/usb/core/drivers.c: usb_autopm_get_interface_async()
pm_runtime_get_noresume(&intf->dev);
s = ACCESS_ONCE(intf->dev.power.runtime_status);
if (s == RPM_SUSPENDING || s == RPM_SUSPENDED)
status = pm_request_resume(&intf->dev);
How is this supposed to work ?
If the ACCESS_ONCE can be reordered before the atomic_inc(), then I fear the
device can be suspended even after the check.
My point is that a get/put semantic should imply memory barriers, especially if
these are exported APIs.
Thanks,
Mathieu
>
> Thanks,
> Rafael
--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
next prev parent reply other threads:[~2010-10-27 12:21 UTC|newest]
Thread overview: 135+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <1287488171-25303-1-git-send-email-trenn@suse.de>
2010-10-19 11:36 ` [PATCH 1/3] PERF: Do not export power_frequency, but power_start event Thomas Renninger
2010-10-19 11:36 ` Thomas Renninger
2010-10-19 11:36 ` [PATCH 2/3] PERF(kernel): Cleanup power events Thomas Renninger
2010-10-19 11:36 ` Thomas Renninger
2010-10-25 6:54 ` Arjan van de Ven
2010-10-25 9:41 ` Thomas Renninger
2010-10-25 9:41 ` Thomas Renninger
2010-10-25 13:55 ` Arjan van de Ven
2010-10-25 14:36 ` Thomas Renninger
2010-10-25 14:45 ` Arjan van de Ven
2010-10-25 14:56 ` Ingo Molnar
2010-10-25 14:56 ` Ingo Molnar
2010-10-25 15:48 ` Thomas Renninger
2010-10-25 15:48 ` Thomas Renninger
2010-10-25 16:00 ` Arjan van de Ven
2010-10-25 23:32 ` Thomas Renninger
2010-10-25 23:32 ` Thomas Renninger
2010-10-25 16:00 ` Arjan van de Ven
2010-10-25 14:45 ` Arjan van de Ven
2010-10-25 14:36 ` Thomas Renninger
2010-10-25 13:55 ` Arjan van de Ven
2010-10-25 6:54 ` Arjan van de Ven
2010-10-25 6:58 ` Arjan van de Ven
2010-10-25 6:58 ` Arjan van de Ven
2010-10-25 10:04 ` Ingo Molnar
2010-10-25 11:03 ` Thomas Renninger
2010-10-25 11:03 ` Thomas Renninger
2010-10-25 11:55 ` Ingo Molnar
2010-10-25 12:55 ` Thomas Renninger
2010-10-25 12:55 ` Thomas Renninger
2010-10-25 14:11 ` Arjan van de Ven
2010-10-25 14:51 ` Thomas Renninger
2010-10-25 14:51 ` Thomas Renninger
2010-10-25 14:11 ` Arjan van de Ven
2010-10-25 12:58 ` Mathieu Desnoyers
2010-10-25 20:29 ` Rafael J. Wysocki
2010-10-25 20:29 ` Rafael J. Wysocki
2010-10-25 12:58 ` Mathieu Desnoyers
2010-10-25 11:55 ` Ingo Molnar
2010-10-25 13:58 ` Arjan van de Ven
2010-10-25 20:33 ` Rafael J. Wysocki
2010-10-25 20:33 ` Rafael J. Wysocki
2010-10-25 13:58 ` Arjan van de Ven
2010-10-25 10:04 ` Ingo Molnar
2010-10-25 23:33 ` [PATCH] PERF(kernel): Cleanup power events V2 Thomas Renninger
2010-10-25 23:33 ` Thomas Renninger
2010-10-26 1:09 ` Arjan van de Ven
2010-10-26 1:09 ` Arjan van de Ven
2010-10-26 7:10 ` Ingo Molnar
2010-10-26 7:10 ` Ingo Molnar
2010-10-26 8:08 ` Jean Pihet
2010-10-26 8:08 ` Jean Pihet
2010-10-26 11:21 ` Ingo Molnar
2010-10-26 11:21 ` Ingo Molnar
2010-10-26 11:48 ` Thomas Renninger
2010-10-26 11:48 ` Thomas Renninger
2010-10-26 11:54 ` Ingo Molnar
2010-10-26 11:54 ` Ingo Molnar
2010-10-26 13:17 ` Thomas Renninger
2010-10-26 13:35 ` Thomas Renninger
2010-10-26 13:35 ` Thomas Renninger
2010-10-26 13:17 ` Thomas Renninger
2010-10-26 18:57 ` Rafael J. Wysocki
2010-10-26 18:57 ` Rafael J. Wysocki
2010-10-27 0:00 ` Thomas Renninger
2010-10-27 0:00 ` Thomas Renninger
2010-10-27 9:16 ` Rafael J. Wysocki
2010-10-27 9:16 ` Rafael J. Wysocki
2010-10-26 9:58 ` Arjan van de Ven
2010-10-26 10:19 ` Ingo Molnar
2010-10-26 10:19 ` Ingo Molnar
2010-10-26 9:58 ` Arjan van de Ven
2010-10-26 10:37 ` Thomas Renninger
2010-10-26 11:19 ` Ingo Molnar
2010-10-26 19:01 ` Rafael J. Wysocki
2010-10-26 19:01 ` Rafael J. Wysocki
2010-10-26 11:19 ` Ingo Molnar
2010-10-26 10:37 ` Thomas Renninger
2010-10-26 15:32 ` Pierre Tardy
2010-10-26 16:04 ` Arjan van de Ven
2010-10-26 16:56 ` Pierre Tardy
2010-10-26 17:58 ` Peter Zijlstra
2010-10-26 18:14 ` Mathieu Desnoyers
2010-10-26 18:14 ` Mathieu Desnoyers
2010-10-26 18:50 ` Alan Stern
2010-10-26 18:50 ` [linux-pm] " Alan Stern
2010-10-26 21:33 ` Mathieu Desnoyers
2010-10-26 22:20 ` Rafael J. Wysocki
2010-10-26 22:39 ` Rafael J. Wysocki
2010-10-26 22:39 ` [linux-pm] " Rafael J. Wysocki
2010-10-27 0:46 ` Mathieu Desnoyers
2010-10-27 10:22 ` Rafael J. Wysocki
2010-10-27 12:21 ` Mathieu Desnoyers
2010-10-27 12:21 ` Mathieu Desnoyers [this message]
2010-10-27 14:32 ` [linux-pm] " Alan Stern
2010-10-28 15:22 ` Alan Stern
2010-10-28 15:22 ` Alan Stern
2010-10-27 14:32 ` Alan Stern
2010-10-27 14:32 ` Alan Stern
2010-10-27 21:43 ` Rafael J. Wysocki
2010-10-27 21:43 ` [linux-pm] " Rafael J. Wysocki
2010-10-27 10:22 ` Rafael J. Wysocki
2010-10-27 0:46 ` Mathieu Desnoyers
2010-10-26 22:20 ` Rafael J. Wysocki
2010-10-26 21:33 ` Mathieu Desnoyers
2010-10-26 19:04 ` Rafael J. Wysocki
2010-10-26 21:38 ` Mathieu Desnoyers
2010-10-26 21:38 ` Mathieu Desnoyers
2010-10-26 22:22 ` Rafael J. Wysocki
2010-10-26 22:22 ` Rafael J. Wysocki
2010-10-26 19:04 ` Rafael J. Wysocki
2010-10-26 18:15 ` Pierre Tardy
2010-10-26 19:08 ` Rafael J. Wysocki
2010-10-26 20:23 ` Pierre Tardy
2010-10-26 20:23 ` Pierre Tardy
2010-10-26 20:38 ` Rafael J. Wysocki
2010-10-26 20:52 ` Arjan van de Ven
2010-10-26 20:52 ` Arjan van de Ven
2010-10-26 21:17 ` Rafael J. Wysocki
2010-10-26 21:17 ` Rafael J. Wysocki
2010-10-26 20:38 ` Rafael J. Wysocki
2010-10-26 19:08 ` Rafael J. Wysocki
2010-10-26 18:15 ` Pierre Tardy
2010-10-26 17:58 ` Peter Zijlstra
2010-10-26 16:56 ` Pierre Tardy
2010-10-26 16:04 ` Arjan van de Ven
2010-10-26 15:32 ` Pierre Tardy
2010-10-26 7:59 ` Jean Pihet
2010-10-26 7:59 ` Jean Pihet
2010-10-26 18:52 ` Rafael J. Wysocki
2010-10-26 18:52 ` Rafael J. Wysocki
2010-10-19 11:36 ` [PATCH 3/3] PERF(userspace): Adjust perf timechart to the new power events Thomas Renninger
2010-10-19 11:36 ` Thomas Renninger
2010-10-26 0:18 ` [PATCH] PERF(userspace): Adjust perf timechart to the new power events V2 Thomas Renninger
2010-10-26 0:18 ` Thomas Renninger
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20101027122115.GA10722@Krystal \
--to=mathieu.desnoyers@efficios.com \
--cc=akpm@linux-foundation.org \
--cc=arjan@linux.intel.com \
--cc=fche@redhat.com \
--cc=fweisbec@gmail.com \
--cc=jean.pihet@newoldbits.com \
--cc=linux-omap@vger.kernel.org \
--cc=linux-pm@lists.linux-foundation.org \
--cc=linux-trace-users@vger.kernel.org \
--cc=masami.hiramatsu.pt@hitachi.com \
--cc=mingo@elte.hu \
--cc=paulmck@linux.vnet.ibm.com \
--cc=peterz@infradead.org \
--cc=rjw@sisk.pl \
--cc=rostedt@goodmis.org \
--cc=stern@rowland.harvard.edu \
--cc=tardyp@gmail.com \
--cc=tglx@linutronix.de \
--cc=tj@kernel.org \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.