public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
To: Borislav Petkov <bp@alien8.de>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>,
	Jacob Pan <jacob.jun.pan@linux.intel.com>,
	LKML <linux-kernel@vger.kernel.org>, Borislav Petkov <bp@suse.de>,
	Ingo Molnar <mingo@kernel.org>,
	"Rafael J. Wysocki" <rafael.j.wysocki@intel.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	"ego@linux.vnet.ibm.com" <ego@linux.vnet.ibm.com>,
	Oleg Nesterov <oleg@redhat.com>
Subject: Re: [PATCH] intel_rapl: Correct hotplug correction
Date: Thu, 22 May 2014 17:24:33 +0530	[thread overview]
Message-ID: <537DE579.6000505@linux.vnet.ibm.com> (raw)
In-Reply-To: <20140522100820.GE4383@pd.tnic>

On 05/22/2014 03:38 PM, Borislav Petkov wrote:
> On Thu, May 22, 2014 at 03:13:46PM +0530, Srivatsa S. Bhat wrote:
>> On 05/22/2014 02:53 PM, Borislav Petkov wrote:
>>> From: Borislav Petkov <bp@suse.de>
>>>
>>> So 009f225ef050 ("powercap, intel-rapl: Fix CPU hotplug callback
>>> registration") says how get_/put_online_cpus() should be replaced with
>>> this cpu_notifier_register_begin/_done().
>>>
>>> But they're still there so what's up?
>>>
>>
>> Ok, so I retained that because the comments in the code said that
>> the caller of rapl_cleanup_data() should hold the hotplug lock.
>>
>> Here is the snippet from the patch's changelog:
>>
>>     ...
>>     Fix the intel-rapl code in the powercap driver by using this latter form
>>     of callback registration. But retain the calls to get/put_online_cpus(),
>>     since they also protect the function rapl_cleanup_data(). By nesting
>>     get/put_online_cpus() *inside* cpu_notifier_register_begin/done(), we avoid
>>     the ABBA deadlock possibility mentioned above.
> 
> My bad, I missed that part.
> 
>> But looking closer at the code, I think the only requirement is that
>> rapl_cleanup_data() should be protected against CPU hotplug, and we
>> don't actually need to hold the cpu_hotplug.lock per-se.
> 
> What is the difference between "CPU hotplug" and cpu_hotplug.lock?
> From looking at the code those are two different mutexes with
> cpu_hotplug.lock, i.e. get_online_cpus() having a preemption point.
> 
> And yet, you want to replace get_/put_online_cpus() with
> cpu_notifier_register_begin/_done()

Its not a plain replacement; it is only where both get/put_online_cpus()
as well as notifier-[un]registration is involved. I was trying to overcome
the problems with the existing notifier registration APIs in cpu hotplug,
which were simply impossible to use in a race-free way. But we can certainly
make this much better with a fresh redesign of the whole thing.

I had proposed some other ways of fixing this here:
http://www.kernelhub.org/?msg=26421&p=2

> which is kinda the same thing but
> not really. The one protects against hotplug operations and the other
> protects against cpu hotplug notifier registration.
> 
> Oh, and there's a third one, aliased to the notifier one, which
> is "attempting to serialize the updates to cpu_online_mask &
> cpu_present_mask."
> 
> So why, oh why do we need three? This is absolutely insane. Do we have
> at least one sensible reason why cpu hotplug users should need to know
> all that gunk?
> 

Yeah, its complicated and perhaps we can do much better than that. But I'll
try to explain why there are so many different locks in the existing code.

get/put_online_cpus() uses cpu_hotplug.lock underneath. Although that's just
a mutex, its not used in the usual way (more about that below).

cpu_maps_update_begin(), cpu_notifier_register_begin/done,
register/unregister_cpu_notifier  -- all of these use the cpu_add_remove_lock.

The fundamental difference between these 2 categories is the concurrency
allowed with the lock :-
get_online_cpus() is like a read_lock(). That is, it allows any number
of concurrent CPU hotplug readers (tasks that want to block hotplug), but it
ensures that a writer won't run concurrently with any reader.

Hence, this won't work for notifier registrations because register/unregister
has to _mutate_ the notifier linked-list, and obviously we can't have multiple
tasks trying to do this at the same time. So we need a proper mutex that
allows only 1 task at a time into the critical section. That's why the
cpu_add_remove_lock is used in all the register/unregister paths.

Now, let's look at why the CPU hotplug writer path (the task that actually
does hotplug) takes cpu_add_remove_lock in addition to the cpu_hotplug.lock.
First reason is simple: you don't want tasks that are doing notifier
[un]registrations to race with hotplug. But the second, although subtle reason
is that put_online_cpus() and cpu_hotplug_writer_begin()/done() require that
there is only 1 CPU hotplug writer at a time. The cpu_add_remove_lock provides
this guarantee, since it is taken in the writer path. (The long comment above
cpu_hotplug_begin() mentions this as well).

And then there is powerpc which uses cpu_maps_update_begin/done() to protect
dlpar operations (operations that change the cpu_present_mask etc). But ideally
it should have just used cpu_hotplug_begin() itself, like what driver/acpi/
processor_driver.c does, but then it would have to hold cpu_add_remove_lock
as well, due to the current design :(

That was just me trying to explain the current mess, not justifying it! :-/

I think Oleg had a proposed patch to use per-cpu rwsem in CPU hotplug to
drastically simplify this whole locking scheme. I think we could look at
that again.

Regards,
Srivatsa S. Bhat



  reply	other threads:[~2014-05-22 11:55 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-05-22  9:23 [PATCH] intel_rapl: Correct hotplug correction Borislav Petkov
2014-05-22  9:43 ` Srivatsa S. Bhat
2014-05-22 10:08   ` Borislav Petkov
2014-05-22 11:54     ` Srivatsa S. Bhat [this message]
2014-05-22 12:13       ` Borislav Petkov
2014-05-22 12:32       ` Peter Zijlstra
2014-05-22 15:30         ` [PATCH] x86, MCE: Kill CPU_POST_DEAD Borislav Petkov
2014-05-22 15:50           ` Luck, Tony
2014-05-22 19:55             ` Borislav Petkov
2014-05-22 21:13               ` Srivatsa S. Bhat
2014-05-22 21:31                 ` Borislav Petkov
2014-05-22 21:40                   ` Srivatsa S. Bhat
2014-05-22 21:43               ` Srivatsa S. Bhat
2014-05-26 20:01               ` Borislav Petkov
2014-05-22 21:31         ` [PATCH] intel_rapl: Correct hotplug correction Srivatsa S. Bhat

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=537DE579.6000505@linux.vnet.ibm.com \
    --to=srivatsa.bhat@linux.vnet.ibm.com \
    --cc=bp@alien8.de \
    --cc=bp@suse.de \
    --cc=ego@linux.vnet.ibm.com \
    --cc=jacob.jun.pan@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=oleg@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rafael.j.wysocki@intel.com \
    --cc=srinivas.pandruvada@linux.intel.com \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox