Re: [RFC PATCH] cpu hotplug: rework cpu_hotplug locking (was [LOCKDEP] cpufreq: possible circular locking dependency detected)


From: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
To: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
	Viresh Kumar <viresh.kumar@linaro.org>,
	Michael Wang <wangyun@linux.vnet.ibm.com>,
	Jiri Kosina <jkosina@suse.cz>, Borislav Petkov <bp@alien8.de>,
	"Rafael J. Wysocki" <rjw@sisk.pl>,
	linux-kernel@vger.kernel.org, cpufreq@vger.kernel.org,
	linux-pm@vger.kernel.org
Subject: Re: [RFC PATCH] cpu hotplug: rework cpu_hotplug locking (was [LOCKDEP] cpufreq: possible circular locking dependency detected)
Date: Sat, 29 Jun 2013 10:35:39 +0300
Message-ID: <20130629073539.GA2227@swordfish>
In-Reply-To: <51CD9A1A.1060908@linux.vnet.ibm.com>

On (06/28/13 19:43), Srivatsa S. Bhat wrote:
> On 06/28/2013 01:14 PM, Sergey Senozhatsky wrote:
> > Hello,
> > Yes, this helps, of course, but at the same time it brings back the
> > previous problem -- preventing cpu_hotplug in some places.
> > 
> > 
> > I have a bit different (perhaps naive) RFC patch and would like to hear
> > comments.
> > 
> > 
> > 
> > The idea is to break the existing lock dependency chain by not holding
> > the cpu_hotplug lock mutex across the calls. In order to detect active
> > refcount readers or an active writer, refcount may now have the following
> > values:
> > 
> > -1: active writer -- only one writer may be active, readers are blocked
> >  0: no readers/writer
> > >0: active readers -- many readers may be active, writer is blocked
> > 
> > A "blocked" reader or writer goes to the wait_queue. As soon as the writer
> > finishes (refcount becomes 0), it wakes up all processes waiting in the
> > wait_queue. A reader performs the wakeup call only when it sees that a
> > pending writer is present (active_writer is not NULL).
> > 
> > The cpu_hotplug lock is now only required to protect the refcount cmp, inc
> > and dec operations, so it can be changed to a spinlock.
> > 
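
The refcount scheme described above can be modelled in userspace roughly as
follows. This is only a sketch of the idea, not the RFC patch: the class name
HotplugRefcount is hypothetical, a threading.Condition stands in for the
spinlock plus wait_queue, and writers are assumed to be serialized by the
caller (only one thread at a time calls cpu_hotplug_begin()).

```python
import threading


class HotplugRefcount:
    """Userspace model: refcount is -1 (writer), 0 (idle) or >0 (readers)."""

    def __init__(self):
        # The Condition plays the role of the spinlock and the wait_queue.
        self._cond = threading.Condition()
        self.refcount = 0
        self.active_writer = None

    def get_online_cpus(self):
        with self._cond:
            # Readers are blocked while a writer is active.
            while self.refcount < 0:
                self._cond.wait()
            self.refcount += 1

    def put_online_cpus(self):
        with self._cond:
            self.refcount -= 1
            # The last reader wakes waiters only if a writer is pending.
            if self.refcount == 0 and self.active_writer is not None:
                self._cond.notify_all()

    def cpu_hotplug_begin(self):
        with self._cond:
            self.active_writer = threading.current_thread()
            # The writer is blocked while any reader is active.
            while self.refcount != 0:
                self._cond.wait()
            self.refcount = -1

    def cpu_hotplug_done(self):
        with self._cond:
            self.active_writer = None
            self.refcount = 0
            # Wake every blocked reader (and any waiting writer).
            self._cond.notify_all()
```

Note that a reader arriving while the writer is active still blocks until
cpu_hotplug_done(), exactly as it would with the mutex held.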
> 
> Hmm, now that I actually looked at your patch, I see that it is completely
> wrong! I'm sure you intended to fix the *bug*, but instead you ended
> up merely silencing the *warning* (and also left lockdep blind), leaving
> the actual bug as it is!
>

Thank you for your time and review.


> So let me summarize what the actual bug is and what it is that actually
> needs fixing:
> 
> Basically you have 2 things -
> 1. A worker item (cs_dbs_timer in this case) that can re-arm itself
>    using gov_queue_work(). And gov_queue_work() uses get/put_online_cpus().
> 
> 2. In the cpu_down() path, you want to cancel the worker item and destroy
>    and clean up its resources (the timer_mutex).
> 
> So the problem is that you can deadlock like this:
> 
>     CPU 3                                  CPU 4
> 
>    cpu_down()
>    -> acquire hotplug.lock
> 
> 				      cs_dbs_timer()
> 				        -> get_online_cpus()
> 					   //wait on hotplug.lock
> 
> 
>    try to cancel cs_dbs_timer()
>    synchronously.
> 
> That leads to a deadlock because cs_dbs_timer() is waiting to
> get the hotplug lock which CPU 3 is holding, whereas CPU 3 is
> waiting for cs_dbs_timer() to finish. So they can end up mutually
> waiting for each other, forever. (Yeah, the lockdep splat might have
> been a bit too cryptic to decode this from, but here it is).
> 
> So to fix the *bug*, you need to avoid waiting synchronously while
> holding the hotplug lock. Possibly by using cancel_delayed_work_sync()
> under CPU_POST_DEAD or something like that. That would remove the deadlock
> possibility.
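
The ordering problem and the suggested fix can be illustrated with a small
userspace model. This is a sketch under assumptions, not cpufreq code: a plain
threading.Lock stands in for the hotplug lock, a thread stands in for
cs_dbs_timer(), join() stands in for the synchronous cancel, and the names
cpu_down_buggy() and cpu_down_fixed() are made up for the example. The buggy
variant uses a timeout so that it reports the stall instead of hanging.

```python
import threading

hotplug_lock = threading.Lock()  # stands in for the hotplug lock


def worker(started):
    # Models cs_dbs_timer(): it cannot finish without the hotplug lock.
    started.set()
    with hotplug_lock:
        pass


def cpu_down_buggy(timeout=0.2):
    """Wait for the worker while holding the lock; return True if it stalls."""
    started = threading.Event()
    t = threading.Thread(target=worker, args=(started,))
    with hotplug_lock:
        t.start()
        started.wait()
        t.join(timeout)        # "synchronous cancel" under the lock
        stalled = t.is_alive() # without the timeout this would never return
    t.join()                   # lock dropped, the worker can finish now
    return stalled


def cpu_down_fixed(timeout=2.0):
    """Wait for the worker only after dropping the lock."""
    started = threading.Event()
    t = threading.Thread(target=worker, args=(started,))
    with hotplug_lock:
        t.start()
        started.wait()
        # Hotplug work happens here; no waiting on the worker.
    t.join(timeout)            # analogous to cancelling in a later stage
    return t.is_alive()
```

In this model cpu_down_buggy() reports a stall and cpu_down_fixed() does not,
which is the same shape as moving the synchronous cancel out from under the
hotplug lock.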

I will take a look. Thank you!

	-ss

> Your patch, on the other hand, doesn't remove the deadlock possibility:
> just because you don't hold the lock throughout the hotplug operation
> doesn't mean that the task calling get_online_cpus() can sneak in and
> finish its work in-between a hotplug operation (because the refcount
> won't allow it to). Also, it should *not* be allowed to sneak in like
> that, since that constitutes *racing* with CPU hotplug, which it was
> meant to avoid!
> 
> Also, as a side effect of not holding the lock throughout the hotplug
> operation, lockdep goes blind, and doesn't complain, even though the
> actual bug is still there! Effectively, this is nothing but papering
> over the bug and silencing the warning, which we should never do.
> 
> So, please, fix the _cpufreq_ code to resolve the deadlock.
> 
> Regards,
> Srivatsa S. Bhat
>


Thread overview: 34+ messages
2013-06-25 21:15 [LOCKDEP] cpufreq: possible circular locking dependency detected Sergey Senozhatsky
2013-06-28  4:43 ` Viresh Kumar
2013-06-28  7:44   ` [RFC PATCH] cpu hotplug: rework cpu_hotplug locking (was [LOCKDEP] cpufreq: possible circular locking dependency detected) Sergey Senozhatsky
2013-06-28  9:31     ` Srivatsa S. Bhat
2013-06-28 10:04       ` Sergey Senozhatsky
2013-06-28 14:13     ` Srivatsa S. Bhat
2013-06-29  7:35       ` Sergey Senozhatsky [this message]
2013-07-01  4:42 ` [LOCKDEP] cpufreq: possible circular locking dependency detected Michael Wang
2013-07-10 23:13   ` Sergey Senozhatsky
2013-07-11  2:43     ` Michael Wang
2013-07-11  8:22       ` Sergey Senozhatsky
2013-07-11  8:47         ` Michael Wang
2013-07-11  8:48           ` Michael Wang
2013-07-11 11:47             ` Bartlomiej Zolnierkiewicz
2013-07-12  2:19               ` Michael Wang
2013-07-11  9:01           ` Sergey Senozhatsky
2013-07-14 11:47       ` Sergey Senozhatsky
2013-07-14 12:06         ` Sergey Senozhatsky
2013-07-15  3:50           ` Michael Wang
2013-07-15  7:52             ` Michael Wang
2013-07-15  8:29               ` Sergey Senozhatsky
2013-07-15 13:19                 ` Srivatsa S. Bhat
2013-07-15 13:32                   ` Srivatsa S. Bhat
2013-07-15 20:49                   ` Peter Wu
2013-07-16  8:29                     ` Srivatsa S. Bhat
2013-07-15 23:20                   ` Sergey Senozhatsky
2013-07-16  8:33                     ` Srivatsa S. Bhat
2013-07-16 10:44                       ` Sergey Senozhatsky
2013-07-16 15:19                         ` Srivatsa S. Bhat
2013-07-16 21:29                           ` Rafael J. Wysocki
2013-07-16  2:19                   ` Michael Wang
2013-07-15  2:42         ` Michael Wang
2013-07-14 15:56       ` Rafael J. Wysocki
2013-07-15  2:46         ` Michael Wang
