From: David Hildenbrand <david@redhat.com>
To: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Linux Memory Management List <linux-mm@kvack.org>,
linuxppc-dev <linuxppc-dev@lists.ozlabs.org>,
ACPI Devel Maling List <linux-acpi@vger.kernel.org>,
devel@linuxdriverproject.org, linux-s390@vger.kernel.org,
xen-devel@lists.xenproject.org,
Andrew Morton <akpm@linux-foundation.org>,
Michal Hocko <mhocko@suse.com>, Vlastimil Babka <vbabka@suse.cz>,
Dan Williams <dan.j.williams@intel.com>,
Pavel Tatashin <pasha.tatashin@oracle.com>,
osalvador@suse.de, Vitaly Kuznetsov <vkuznets@redhat.com>,
Martin Schwidefsky <schwidefsky@de.ibm.com>,
Heiko Carstens <heiko.carstens@de.ibm.com>,
kys@microsoft.com, haiyangz@microsoft.com,
sthemmin@microsoft.com,
Benjamin Herrenschmidt <benh@kernel.crashing.org>,
Paul Mackerras <paulus@samba.org>,
"Rafael J. Wysocki" <rjw@rjwysocki.net>,
Len Brown <lenb@kernel.org>,
Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
David Rientjes <rientjes@google.com>
Subject: Re: [PATCH RFC 2/2] mm/memory_hotplug: fix online/offline_pages called w.o. mem_hotplug_lock
Date: Fri, 17 Aug 2018 10:27:50 +0200 [thread overview]
Message-ID: <ce6e0944-11bb-a36b-eac1-aaf123b8be8a@redhat.com> (raw)
In-Reply-To: <CAJZ5v0jpx_XOmkSkMiSxcECxNHGXPGZHtgqRy_Q7iKnf-C55pg@mail.gmail.com>
On 17.08.2018 10:20, Rafael J. Wysocki wrote:
> On Fri, Aug 17, 2018 at 9:59 AM David Hildenbrand <david@redhat.com> wrote:
>>
>> There seem to be some problems as result of 30467e0b3be ("mm, hotplug:
>> fix concurrent memory hot-add deadlock"), which tried to fix a possible
>> lock inversion reported and discussed in [1] due to the two locks
>> a) device_lock()
>> b) mem_hotplug_lock
>>
>> While add_memory() first takes b), followed by a) during
>> bus_probe_device(), onlining of memory from user space first took b),
>> followed by a), exposing a possible deadlock.
>>
>> In [1], and it was decided to not make use of device_hotplug_lock, but
>> rather to enforce a locking order. Looking at 1., this order is not always
>> satisfied when calling device_online() - essentially we simply don't take
>> one of both locks anymore - and fixing this would require us to
>> take the mem_hotplug_lock in core driver code (online_store()), which
>> sounds wrong.
>>
>> The problems I spotted related to this:
>>
>> 1. Memory block device attributes: While .state first calls
>> mem_hotplug_begin() and the calls device_online() - which takes
>> device_lock() - .online does no longer call mem_hotplug_begin(), so
>> effectively calls online_pages() without mem_hotplug_lock. onlining/
>> offlining of pages is no longer serialised across different devices.
>>
>> 2. device_online() should be called under device_hotplug_lock, however
>> onlining memory during add_memory() does not take care of that. (I
>> didn't follow how strictly this is needed, but there seems to be a
>> reason because it is documented at device_online() and
>> device_offline()).
>>
>> In addition, I think there is also something wrong about the locking in
>>
>> 3. arch/powerpc/platforms/powernv/memtrace.c calls offline_pages()
>> (and device_online()) without locks. This was introduced after
>> 30467e0b3be. And skimming over the code, I assume it could need some
>> more care in regards to locking.
>>
>> ACPI code already holds the device_hotplug_lock, and as we are
>> effectively hotplugging memory block devices, requiring to hold that
>> lock does not sound too wrong, although not chosen in [1], as
>> "I don't think resolving a locking dependency is appropriate by
>> just serializing them with another lock."
>> I think this is the cleanest solution.
>>
>> Requiring add_memory()/add_memory_resource() to be called under
>> device_hotplug_lock fixes 2., taking the mem_hotplug_lock in
>> online_pages/offline_pages() fixes 1. and 3.
>>
>> Fixup all callers of add_memory/add_memory_resource to hold the lock if
>> not already done.
>>
>> So this is essentially a revert of 30467e0b3be, implementation of what
>> was suggested in [1] by Vitaly, applied to the current tree.
>>
>> [1] http://driverdev.linuxdriverproject.org/pipermail/ driverdev-devel/
>> 2015-February/065324.html
>>
>> This patch is partly based on a patch by Vitaly Kuznetsov.
>>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>> arch/powerpc/platforms/powernv/memtrace.c | 3 ++
>> drivers/acpi/acpi_memhotplug.c | 1 +
>> drivers/base/memory.c | 18 +++++-----
>> drivers/hv/hv_balloon.c | 4 +++
>> drivers/s390/char/sclp_cmd.c | 3 ++
>> drivers/xen/balloon.c | 3 ++
>> mm/memory_hotplug.c | 42 ++++++++++++++++++-----
>> 7 files changed, 55 insertions(+), 19 deletions(-)
>>
>> diff --git a/arch/powerpc/platforms/powernv/memtrace.c b/arch/powerpc/platforms/powernv/memtrace.c
>> index 51dc398ae3f7..4c2737a33020 100644
>> --- a/arch/powerpc/platforms/powernv/memtrace.c
>> +++ b/arch/powerpc/platforms/powernv/memtrace.c
>> @@ -206,6 +206,8 @@ static int memtrace_online(void)
>> int i, ret = 0;
>> struct memtrace_entry *ent;
>>
>> + /* add_memory() requires device_hotplug_lock */
>> + lock_device_hotplug();
>> for (i = memtrace_array_nr - 1; i >= 0; i--) {
>> ent = &memtrace_array[i];
>>
>> @@ -244,6 +246,7 @@ static int memtrace_online(void)
>> pr_info("Added trace memory back to node %d\n", ent->nid);
>> ent->size = ent->start = ent->nid = -1;
>> }
>> + unlock_device_hotplug();
>> if (ret)
>> return ret;
>>
>> diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
>> index 6b0d3ef7309c..e7a4c7900967 100644
>> --- a/drivers/acpi/acpi_memhotplug.c
>> +++ b/drivers/acpi/acpi_memhotplug.c
>> @@ -228,6 +228,7 @@ static int acpi_memory_enable_device(struct acpi_memory_device *mem_device)
>> if (node < 0)
>> node = memory_add_physaddr_to_nid(info->start_addr);
>>
>> + /* we already hold the device_hotplug lock at this point */
>> result = add_memory(node, info->start_addr, info->length);
>>
>> /*
>
> A very minor nit here: I would say "device_hotplug_lock is already
> held at this point" in the comment (I sort of don't like to say "we"
> in code comments as it is not particularly clear what group of people
> is represented by that and the lock is actually called
> device_hotplug_lock).
Easy to fix, thanks!
>
> Otherwise the approach is fine by me.
>
> BTW, the reason why device_hotplug_lock is acquired by the ACPI memory
> hotplug is because it generally needs to be synchronized with respect
> CPU hot-remove and similar. I believe that this may be the case in
> non-ACPI setups as well.
Yes, and that lock is the reason why we didn't have real problems with
ACPI memory hotplug in this respect so far. (as user triggered
online/offline also takes that lock already)
>
> Thanks,
> Rafael
>
--
Thanks,
David / dhildenb
prev parent reply other threads:[~2018-08-17 8:27 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-08-17 7:58 [PATCH RFC 0/2] mm: online/offline_pages called w.o. mem_hotplug_lock David Hildenbrand
2018-08-17 7:59 ` [PATCH RFC 1/2] drivers/base: export lock_device_hotplug/unlock_device_hotplug David Hildenbrand
2018-08-17 8:41 ` Greg Kroah-Hartman
2018-08-17 8:56 ` David Hildenbrand
2018-08-17 9:03 ` Rafael J. Wysocki
2018-08-17 9:41 ` David Hildenbrand
2018-08-17 10:06 ` Greg Kroah-Hartman
2018-08-17 11:04 ` David Hildenbrand
2018-08-17 11:28 ` Heiko Carstens
2018-08-17 11:56 ` David Hildenbrand
2018-08-17 17:02 ` Greg Kroah-Hartman
2018-08-17 8:57 ` Rafael J. Wysocki
2018-08-17 7:59 ` [PATCH RFC 2/2] mm/memory_hotplug: fix online/offline_pages called w.o. mem_hotplug_lock David Hildenbrand
2018-08-17 8:20 ` Rafael J. Wysocki
2018-08-17 8:27 ` David Hildenbrand [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ce6e0944-11bb-a36b-eac1-aaf123b8be8a@redhat.com \
--to=david@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=benh@kernel.crashing.org \
--cc=dan.j.williams@intel.com \
--cc=devel@linuxdriverproject.org \
--cc=gregkh@linuxfoundation.org \
--cc=haiyangz@microsoft.com \
--cc=heiko.carstens@de.ibm.com \
--cc=kys@microsoft.com \
--cc=lenb@kernel.org \
--cc=linux-acpi@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux-s390@vger.kernel.org \
--cc=linuxppc-dev@lists.ozlabs.org \
--cc=mhocko@suse.com \
--cc=osalvador@suse.de \
--cc=pasha.tatashin@oracle.com \
--cc=paulus@samba.org \
--cc=rafael@kernel.org \
--cc=rientjes@google.com \
--cc=rjw@rjwysocki.net \
--cc=schwidefsky@de.ibm.com \
--cc=sthemmin@microsoft.com \
--cc=vbabka@suse.cz \
--cc=vkuznets@redhat.com \
--cc=xen-devel@lists.xenproject.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).