public inbox for linux-omap@vger.kernel.org
 help / color / mirror / Atom feed
From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Alan Stern <stern@rowland.harvard.edu>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Greg Kroah-Hartman <greg@kroah.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Pierre Tardy <tardyp@gmail.com>,
	Arjan van de Ven <arjan@linux.intel.com>,
	Frederic Weisbecker <fweisbec@gmail.com>,
	Jean Pihet <jean.pihet@newoldbits.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	linux-trace-users@vger.kernel.org, Frank Eigler <fche@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	linux-pm@lists.linux-foundation.org,
	Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>,
	Tejun Heo <tj@kernel.org>, Ingo Molnar <mingo@elte.hu>,
	linux-omap@vger.kernel.org,
	Linus Torvalds <torvalds@linux-foundation.org>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Subject: Re: [linux-pm] [PATCH] PERF(kernel): Cleanup power events V2
Date: Tue, 26 Oct 2010 17:33:56 -0400	[thread overview]
Message-ID: <20101026213356.GA21495@Krystal> (raw)
In-Reply-To: <Pine.LNX.4.44L0.1010261447090.1634-100000@iolanthe.rowland.org>

* Alan Stern (stern@rowland.harvard.edu) wrote:
> On Tue, 26 Oct 2010, Mathieu Desnoyers wrote:
> 
> > * Peter Zijlstra (peterz@infradead.org) wrote:
> > > On Tue, 2010-10-26 at 11:56 -0500, Pierre Tardy wrote:
> > > > 
> > > > +       trace_runtime_pm_usage(dev, atomic_read(&dev->power.usage_count)+1);
> > > >         atomic_inc(&dev->power.usage_count); 
> > > 
> > > That's terribly racy..
> > 
> > Looking at the original code, it looks racy even without considering the
> > tracepoint:
> > 
> > int __pm_runtime_get(struct device *dev, bool sync)
> >  {
> >         int retval;
> > 
> > +       trace_runtime_pm_usage(dev, atomic_read(&dev->power.usage_count)+1);
> >         atomic_inc(&dev->power.usage_count);
> >         retval = sync ? pm_runtime_resume(dev) : pm_request_resume(dev);
> > 
> > There is no implied memory barrier after "atomic_inc". So either all these
> > inc/dec are protected with mutexes or spinlocks, in which case one might wonder
> > why atomic operations are used at all, or it's a racy mess. (I vote for the
> > second option)
> 
> I don't understand.  What's the problem?  The inc/dec are atomic 
> because they are not protected by spinlocks, but everything else is 
> (aside from the tracepoint, which is new).
> 
> > kref should certainly be used there.
> 
> What for?

kref has the following "get":

        atomic_inc(&kref->refcount);
        smp_mb__after_atomic_inc();

What seems to be missing in __pm_runtime_get() and pm_runtime_get_noresume() is
the memory barrier after the atomic increment. The atomic increment is free to
be reordered into the following spinlock (within pm_request_resume or pm_request
resume execution) because taking a spinlock only acts as a memory barrier with
acquire semantic, not a full memory barrier.

So AFAIU, the failure scenario would be as follows (sorry for the 80+ columns):

initial conditions: usage_count = 1

CPU A                                                       CPU B
1) __pm_runtime_get() (sync = true)
2)   atomic_inc(&usage_count) (not committed to memory yet)
3)   pm_runtime_resume()
4)     spin_lock_irqsave(&dev->power.lock, flags);
5)     retval = __pm_request_resume(dev);
6)     (execute the body of __pm_request_resume and return)
7)                                                          __pm_runtime_put() (sync = true) 
8)                                                          if (atomic_dec_and_test(&dev->power.usage_count))
                                                              (still see usage_count == 1 before decrement,
                                                               thus decrement to 0)
9)                                                             pm_runtime_idle()
10)  spin_unlock_irqrestore(&dev->power.lock, flags)
11)                                                            spin_lock_irq(&dev->power.lock);
12)                                                            retval = __pm_runtime_idle(dev);
13)                                                            spin_unlock_irq(&dev->power.lock);

So we end up in a situation where CPU A expects the device to be resumed, but
the last action performed has been to bring it to idle.

A smp_mb__after_atomic_inc() between lines 2 and 3 would fix this.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

  reply	other threads:[~2010-10-26 21:33 UTC|newest]

Thread overview: 65+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <1287488171-25303-1-git-send-email-trenn@suse.de>
2010-10-19 11:36 ` [PATCH 1/3] PERF: Do not export power_frequency, but power_start event Thomas Renninger
2010-10-19 11:36 ` [PATCH 2/3] PERF(kernel): Cleanup power events Thomas Renninger
2010-10-25  6:54   ` Arjan van de Ven
2010-10-25  9:41     ` Thomas Renninger
2010-10-25 13:55       ` Arjan van de Ven
2010-10-25 14:36         ` Thomas Renninger
2010-10-25 14:45           ` Arjan van de Ven
2010-10-25 14:56             ` Ingo Molnar
2010-10-25 15:48               ` Thomas Renninger
2010-10-25 16:00                 ` Arjan van de Ven
2010-10-25 23:32                   ` Thomas Renninger
2010-10-25  6:58   ` Arjan van de Ven
2010-10-25 10:04   ` Ingo Molnar
2010-10-25 11:03     ` Thomas Renninger
2010-10-25 11:55       ` Ingo Molnar
2010-10-25 12:55         ` Thomas Renninger
2010-10-25 14:11           ` Arjan van de Ven
2010-10-25 14:51             ` Thomas Renninger
2010-10-25 12:58         ` Mathieu Desnoyers
2010-10-25 20:29           ` Rafael J. Wysocki
2010-10-25 13:58       ` Arjan van de Ven
2010-10-25 20:33         ` Rafael J. Wysocki
2010-10-25 23:33   ` [PATCH] PERF(kernel): Cleanup power events V2 Thomas Renninger
2010-10-26  1:09     ` Arjan van de Ven
2010-10-26  7:10     ` Ingo Molnar
2010-10-26  8:08       ` Jean Pihet
2010-10-26 11:21         ` Ingo Molnar
2010-10-26 11:48           ` Thomas Renninger
2010-10-26 11:54             ` Ingo Molnar
2010-10-26 13:17               ` Thomas Renninger
2010-10-26 13:35                 ` Thomas Renninger
2010-10-26 18:57             ` Rafael J. Wysocki
2010-10-27  0:00               ` Thomas Renninger
2010-10-27  9:16                 ` Rafael J. Wysocki
2010-10-26  9:58       ` Arjan van de Ven
2010-10-26 10:19         ` Ingo Molnar
2010-10-26 10:37       ` Thomas Renninger
2010-10-26 11:19         ` Ingo Molnar
2010-10-26 19:01           ` Rafael J. Wysocki
2010-10-26 15:32       ` Pierre Tardy
2010-10-26 16:04         ` Arjan van de Ven
2010-10-26 16:56           ` Pierre Tardy
2010-10-26 17:58             ` Peter Zijlstra
2010-10-26 18:14               ` Mathieu Desnoyers
2010-10-26 18:50                 ` [linux-pm] " Alan Stern
2010-10-26 21:33                   ` Mathieu Desnoyers [this message]
2010-10-26 22:20                     ` Rafael J. Wysocki
2010-10-26 22:39                       ` Rafael J. Wysocki
2010-10-27  0:46                       ` Mathieu Desnoyers
2010-10-27 10:22                         ` Rafael J. Wysocki
2010-10-27 12:21                           ` Mathieu Desnoyers
2010-10-27 21:43                             ` Rafael J. Wysocki
2010-10-26 19:04                 ` Rafael J. Wysocki
2010-10-26 21:38                   ` Mathieu Desnoyers
2010-10-26 22:22                     ` Rafael J. Wysocki
2010-10-26 18:15               ` Pierre Tardy
2010-10-26 19:08                 ` Rafael J. Wysocki
2010-10-26 20:23                   ` Pierre Tardy
2010-10-26 20:38                     ` Rafael J. Wysocki
2010-10-26 20:52                       ` Arjan van de Ven
2010-10-26 21:17                         ` Rafael J. Wysocki
2010-10-26  7:59     ` Jean Pihet
2010-10-26 18:52     ` Rafael J. Wysocki
2010-10-19 11:36 ` [PATCH 3/3] PERF(userspace): Adjust perf timechart to the new power events Thomas Renninger
2010-10-26  0:18   ` [PATCH] PERF(userspace): Adjust perf timechart to the new power events V2 Thomas Renninger

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20101026213356.GA21495@Krystal \
    --to=mathieu.desnoyers@efficios.com \
    --cc=akpm@linux-foundation.org \
    --cc=arjan@linux.intel.com \
    --cc=fche@redhat.com \
    --cc=fweisbec@gmail.com \
    --cc=greg@kroah.com \
    --cc=jean.pihet@newoldbits.com \
    --cc=linux-omap@vger.kernel.org \
    --cc=linux-pm@lists.linux-foundation.org \
    --cc=linux-trace-users@vger.kernel.org \
    --cc=masami.hiramatsu.pt@hitachi.com \
    --cc=mingo@elte.hu \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=stern@rowland.harvard.edu \
    --cc=tardyp@gmail.com \
    --cc=tglx@linutronix.de \
    --cc=tj@kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox