All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
To: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
	Viresh Kumar <viresh.kumar@linaro.org>,
	Michael Wang <wangyun@linux.vnet.ibm.com>,
	Jiri Kosina <jkosina@suse.cz>, Borislav Petkov <bp@alien8.de>,
	"Rafael J. Wysocki" <rjw@sisk.pl>,
	linux-kernel@vger.kernel.org, cpufreq@vger.kernel.org,
	linux-pm@vger.kernel.org
Subject: Re: [RFC PATCH] cpu hotplug: rework cpu_hotplug locking (was [LOCKDEP] cpufreq: possible circular locking dependency detected)
Date: Sat, 29 Jun 2013 10:35:39 +0300	[thread overview]
Message-ID: <20130629073539.GA2227@swordfish> (raw)
In-Reply-To: <51CD9A1A.1060908@linux.vnet.ibm.com>

On (06/28/13 19:43), Srivatsa S. Bhat wrote:
> On 06/28/2013 01:14 PM, Sergey Senozhatsky wrote:
> > Hello,
> > Yes, this helps, of course, but at the same time it returns the previous
> > problem -- preventing cpu_hotplug in some places.
> > 
> > 
> > I have a bit different (perhaps naive) RFC patch and would like to hear
> > comments.
> > 
> > 
> > 
> > The idead is to brake existing lock dependency chain by not holding
> > cpu_hotplug lock mutex across the calls. In order to detect active
> > refcount readers or active writer, refcount now may have the following
> > values:
> > 
> > -1: active writer -- only one writer may be active, readers are blocked
> >  0: no readers/writer
> >> 0: active readers -- many readers may be active, writer is blocked
> > 
> > "blocked" reader or writer goes to wait_queue. as soon as writer finishes
> > (refcount becomes 0), it wakeups all existing processes in a wait_queue.
> > reader perform wakeup call only when it sees that pending writer is present
> > (active_writer is not NULL).
> > 
> > cpu_hotplug lock now only required to protect refcount cmp, inc, dec
> > operations so it can be changed to spinlock.
> > 
> 
> Hmm, now that I actually looked at your patch, I see that it is completely
> wrong! I'm sure you intended to fix the *bug*, but instead you ended
> up merely silencing the *warning* (and also left lockdep blind), leaving
> the actual bug as it is!
>

Thank you for your time and review.


> So let me summarize what the actual bug is and what is it that actually
> needs fixing:
> 
> Basically you have 2 things -
> 1. A worker item (cs_dbs_timer in this case) that can re-arm itself
>    using gov_queue_work(). And gov_queue_work() uses get/put_online_cpus().
> 
> 2. In the cpu_down() path, you want to cancel the worker item and destroy
>    and cleanup its resources (the timer_mutex).
> 
> So the problem is that you can deadlock like this:
> 
>     CPU 3                                  CPU 4
> 
>    cpu_down()
>    -> acquire hotplug.lock
> 
> 				      cs_dbs_timer()
> 				        -> get_online_cpus()
> 					   //wait on hotplug.lock
> 
> 
>    try to cancel cs_dbs_timer()
>    synchronously.
> 
> That leads to a deadlock, because, cs_dbs_timer() is waiting to
> get the hotplug lock which CPU 3 is holding, whereas CPU 3 is
> waiting for cs_dbs_timer() to finish. So they can end up mutually
> waiting for each other, forever. (Yeah, the lockdep splat might have
> been a bit cryptic to decode this, but here it is).
> 
> So to fix the *bug*, you need to avoid waiting synchronously while
> holding the hotplug lock. Possibly by using cancel_delayed_work_sync()
> under CPU_POST_DEAD or something like that. That would remove the deadlock
> possibility.

will take a look. Thank you!

	-ss

> Your patch, on the other hand, doesn't remove the deadlock possibility:
> just because you don't hold the lock throughout the hotplug operation
> doesn't mean that the task calling get_online_cpus() can sneak in and
> finish its work in-between a hotplug operation (because the refcount
> won't allow it to). Also, it should *not* be allowed to sneak in like
> that, since that constitutes *racing* with CPU hotplug, which it was
> meant to avoid!.
> 
> Also, as a side effect of not holding the lock throughout the hotplug
> operation, lockdep goes blind, and doesn't complain, even though the
> actual bug is still there! Effectively, this is nothing but papering
> over the bug and silencing the warning, which we should never do.
> 
> So, please, fix the _cpufreq_ code to resolve the deadlock.
> 
> Regards,
> Srivatsa S. Bhat
> 

  reply	other threads:[~2013-06-29  7:35 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-06-25 21:15 [LOCKDEP] cpufreq: possible circular locking dependency detected Sergey Senozhatsky
2013-06-28  4:43 ` Viresh Kumar
2013-06-28  7:44   ` [RFC PATCH] cpu hotplug: rework cpu_hotplug locking (was [LOCKDEP] cpufreq: possible circular locking dependency detected) Sergey Senozhatsky
2013-06-28  9:31     ` Srivatsa S. Bhat
2013-06-28 10:04       ` Sergey Senozhatsky
2013-06-28 14:13     ` Srivatsa S. Bhat
2013-06-29  7:35       ` Sergey Senozhatsky [this message]
2013-07-01  4:42 ` [LOCKDEP] cpufreq: possible circular locking dependency detected Michael Wang
2013-07-10 23:13   ` Sergey Senozhatsky
2013-07-11  2:43     ` Michael Wang
2013-07-11  8:22       ` Sergey Senozhatsky
2013-07-11  8:47         ` Michael Wang
2013-07-11  8:48           ` Michael Wang
2013-07-11 11:47             ` Bartlomiej Zolnierkiewicz
2013-07-12  2:19               ` Michael Wang
2013-07-11  9:01           ` Sergey Senozhatsky
2013-07-14 11:47       ` Sergey Senozhatsky
2013-07-14 12:06         ` Sergey Senozhatsky
2013-07-15  3:50           ` Michael Wang
2013-07-15  7:52             ` Michael Wang
2013-07-15  8:29               ` Sergey Senozhatsky
2013-07-15 13:19                 ` Srivatsa S. Bhat
2013-07-15 13:32                   ` Srivatsa S. Bhat
2013-07-15 20:49                   ` Peter Wu
2013-07-16  8:29                     ` Srivatsa S. Bhat
2013-07-15 23:20                   ` Sergey Senozhatsky
2013-07-16  8:33                     ` Srivatsa S. Bhat
2013-07-16 10:44                       ` Sergey Senozhatsky
2013-07-16 15:19                         ` Srivatsa S. Bhat
2013-07-16 21:29                           ` Rafael J. Wysocki
2013-07-16  2:19                   ` Michael Wang
2013-07-15  2:42         ` Michael Wang
2013-07-14 15:56       ` Rafael J. Wysocki
2013-07-15  2:46         ` Michael Wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130629073539.GA2227@swordfish \
    --to=sergey.senozhatsky@gmail.com \
    --cc=bp@alien8.de \
    --cc=cpufreq@vger.kernel.org \
    --cc=jkosina@suse.cz \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=rjw@sisk.pl \
    --cc=srivatsa.bhat@linux.vnet.ibm.com \
    --cc=viresh.kumar@linaro.org \
    --cc=wangyun@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.