From: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
To: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Tejun Heo <tj@kernel.org>, LKML <linux-kernel@vger.kernel.org>,
gregkh@linuxfoundation.org, toshi.kani@hp.com
Subject: Re: [RFC PATCH v5 2/2] Use kernfs_break_active_protection() for device online store callbacks
Date: Fri, 25 Apr 2014 14:47:11 +0200 [thread overview]
Message-ID: <535A594F.2010604@intel.com> (raw)
In-Reply-To: <1398390415.2805.129.camel@ThinkPad-T5421.cn.ibm.com>
On 4/25/2014 3:46 AM, Li Zhong wrote:
> On Thu, 2014-04-24 at 12:02 +0200, Rafael J. Wysocki wrote:
>> On 4/24/2014 10:59 AM, Li Zhong wrote:
>>> On Wed, 2014-04-23 at 18:12 +0200, Rafael J. Wysocki wrote:
>>>> On 4/23/2014 4:23 PM, Tejun Heo wrote:
>>>>> Hello, Rafael.
>>>> Hi,
>>>>
>>>>> On Wed, Apr 23, 2014 at 12:21:33AM +0200, Rafael J. Wysocki wrote:
>>>>>> Can you please elaborate a bit?
>>>>> Because it can get involved in larger locking dependency issues by
>>>>> joining dependency graphs of two otherwise largely disjoint
>>>>> subsystems. It has potential to create possible deadlocks which don't
>>>>> need to exist.
>>>> Well, I do my best not to add unnecessary locks if that's what you mean.
>>>>
>>>>>> It is there to protect hotplug operations involving multiple devices
>>>>>> (in different subsystems) from racing with each other. Why exactly
>>>>>> is it bad?
>>>>> But why would different subsystems, say cpu and memory, use the same
>>>>> lock? Wouldn't those subsystems already have proper locking inside
>>>>> their own subsystems?
>>>> That locking is not sufficient.
>>>>
>>>>> Why add this additional global lock across multiple subsystems?
>>>> That basically is because of how eject works when it is triggered via ACPI.
>>>>
>>>> It is signaled for a device at the top of a subtree. It may be a
>>>> container of some sort and the eject involves everything below that
>>>> device in the ACPI namespace. That may involve multiple subsystem
>>>> (CPUs, memory, PCI host bridge, etc.).
>>>>
>>>> We do that in two steps, offline (which can fail) and eject proper
>>>> (which can't fail and makes all of the involved devices go away). All
>>>> that has to be done in one go with respect to the sysfs-triggered
>>>> offline/online and that's why the lock is there.
>>> Thank you for the education. It's more clear to me now why we need this
>>> lock.
>>>
>>> I still have some small questions about when this lock is needed:
>>>
>>> I could understand sysfs-triggered online is not acceptable when
>>> removing devices in multiple subsystems. But maybe concurrent offline
>>> and remove(with proper per subsystem locks) seems not harmful?
>>>
>>> And if we just want to remove some devices in a specific subsystem, e.g.
>>> like writing /cpu/release, if it just wants to offline and remove some
>>> cpus, then maybe we don't require the device_hotplug_lock to be taken?
>> No and no.
>>
>> If the offline phase fails for any device in the subtree, we roll back
>> the operation
>> and online devices that have already been offlined by it. Also the ACPI
>> hot-addition
>> needs to acquire device_hotplug_lock so that it doesn't race with ejects
>> and so
>> that lock needs to be taken by sysfs-triggered offline too.
> I can understand that hot-addition needs the device_hotplug lock, but
> still not very clear about the offline.
>
> I guess your are describing following scenario:
>
> user A: (trying remove cpu 1 and memory 1-10)
>
> lock_device_hotplug
> offline cpu with cpu locks -- successful
> offline memories with memory locks -- failed, e.g. for memory8
> online cpu and memory with their locks
> unlock_device_hotplug
What about if all is successful and CPU1 is gone before
device_hotplug_lock is released?
> user B: (trying offline cpu 1)
>
> offline cpu with cpu locks
>
> But I don't see any problem for user B not taking the device_hotplug
> lock. The result may be different for user B to take or not take the
> lock. But I think it could be seen as concurrent online/offline for cpu1
> under cpu hotplug locks, which just depends on which is executed last?
>
> Or did I miss something here?
Yes, you could do offline in parallel with user A without taking
device_hotplug_lock, but the result may be surprising to user B then.
With device _hotplug_lock user B will always see CPU1 off line (or gone)
after his offline in this scenario, while without taking the lock user B
may sometimes see CPU1 on line after his offline. I don't think that's
a good thing.
Rafael
next prev parent reply other threads:[~2014-04-25 12:47 UTC|newest]
Thread overview: 49+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-04-10 9:18 [RFC PATCH] Suppress a device hot remove related lockdep warning Li Zhong
2014-04-10 13:31 ` Tejun Heo
2014-04-11 4:10 ` [RFC PATCH v2] Use kernfs_break_active_protection() for device online store callbacks Li Zhong
2014-04-11 10:26 ` Tejun Heo
2014-04-14 7:47 ` [RFC PATCH v3] " Li Zhong
2014-04-14 20:13 ` Tejun Heo
2014-04-15 2:44 ` Li Zhong
2014-04-15 14:50 ` Tejun Heo
2014-04-16 1:41 ` Li Zhong
2014-04-16 15:17 ` Tejun Heo
2014-04-17 3:05 ` Li Zhong
2014-04-17 15:06 ` Tejun Heo
2014-04-17 6:50 ` [RFC PATCH v4] " Li Zhong
2014-04-17 15:17 ` Tejun Heo
2014-04-18 8:33 ` Li Zhong
2014-04-21 9:20 ` [RFC PATCH v5 1/2] Use lock_device_hotplug() in cpu_probe_store() and cpu_release_store() Li Zhong
2014-04-21 9:23 ` [RFC PATCH v5 2/2] Use kernfs_break_active_protection() for device online store callbacks Li Zhong
2014-04-21 22:46 ` Tejun Heo
2014-04-22 3:34 ` Li Zhong
2014-04-22 10:11 ` Rafael J. Wysocki
2014-04-23 1:50 ` Li Zhong
2014-04-23 10:54 ` Rafael J. Wysocki
2014-04-24 1:13 ` Li Zhong
2014-04-22 20:44 ` Tejun Heo
2014-04-22 22:21 ` Rafael J. Wysocki
2014-04-23 14:23 ` Tejun Heo
2014-04-23 16:12 ` Rafael J. Wysocki
2014-04-23 16:52 ` Tejun Heo
2014-04-24 8:59 ` Li Zhong
2014-04-24 10:02 ` Rafael J. Wysocki
2014-04-25 1:46 ` Li Zhong
2014-04-25 12:47 ` Rafael J. Wysocki [this message]
2014-04-28 1:49 ` Li Zhong
2014-04-23 5:03 ` Li Zhong
2014-04-23 10:58 ` Rafael J. Wysocki
2014-04-24 1:33 ` Li Zhong
2014-05-09 8:35 ` Li Zhong
2014-05-09 8:40 ` [RFC PATCH v6 1/2 ] Use lock_device_hotplug() in cpu_probe_store() and cpu_release_store() Li Zhong
2014-05-09 8:40 ` [RFC PATCH v6 2/2] Implement lock_device_hotplug_sysfs() by breaking active protection Li Zhong
2014-04-21 22:38 ` [RFC PATCH v5 1/2] Use lock_device_hotplug() in cpu_probe_store() and cpu_release_store() Tejun Heo
2014-04-22 2:29 ` Li Zhong
2014-04-22 20:40 ` Tejun Heo
2014-04-23 2:00 ` Li Zhong
2014-04-23 14:39 ` Tejun Heo
2014-04-24 8:37 ` Li Zhong
2014-04-24 14:32 ` Tejun Heo
2014-04-25 1:56 ` Li Zhong
2014-04-25 12:28 ` Tejun Heo
2014-04-28 0:51 ` Li Zhong
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=535A594F.2010604@intel.com \
--to=rafael.j.wysocki@intel.com \
--cc=gregkh@linuxfoundation.org \
--cc=linux-kernel@vger.kernel.org \
--cc=tj@kernel.org \
--cc=toshi.kani@hp.com \
--cc=zhong@linux.vnet.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox