Subject: Re: [PATCH RFC] mm/memory_hotplug: Introduce memory block types
From: David Hildenbrand
Date: Thu, 4 Oct 2018 10:13:48 +0200
To: Michal Hocko
Cc: Vitaly Kuznetsov, Dave Hansen, Kate Stewart, Rich Felker,
 linux-ia64@vger.kernel.org, linux-sh@vger.kernel.org, Peter Zijlstra,
 Benjamin Herrenschmidt, Balbir Singh, Heiko Carstens, linux-mm@kvack.org,
 Pavel Tatashin, Paul Mackerras, "H. Peter Anvin", Rashmica Gupta,
 Boris Ostrovsky, linux-s390@vger.kernel.org, Michael Neuling,
 Stephen Hemminger, Yoshinori Sato, Michael Ellerman,
 linux-acpi@vger.kernel.org, Ingo Molnar, xen-devel@lists.xenproject.org,
 Rob Herring, Len Brown, Fenghua Yu, Stephen Rothwell, mike.travis@hpe.com,
 Haiyang Zhang, Dan Williams, Jonathan Neuschäfer, Nicholas Piggin,
 Joe Perches, Jérôme Glisse, Mike Rapoport, Borislav Petkov,
 Andy Lutomirski, Thomas Gleixner, Joonsoo Kim, Oscar Salvador,
 Juergen Gross, Tony Luck, Mathieu Malaterre, Greg Kroah-Hartman,
 "Rafael J. Wysocki", linux-kernel@vger.kernel.org,
 Mauricio Faria de Oliveira, Philippe Ombredanne, Martin Schwidefsky,
 devel@linuxdriverproject.org, Andrew Morton,
 linuxppc-dev@lists.ozlabs.org, "Kirill A. Shutemov"
In-Reply-To: <20181004061938.GB22173@dhcp22.suse.cz>
References: <20181001084038.GD18290@dhcp22.suse.cz>
 <20181002134734.GT18290@dhcp22.suse.cz>
 <98fb8d65-b641-2225-f842-8804c6f79a06@redhat.com>
 <8736tndubn.fsf@vitty.brq.redhat.com>
 <20181003134444.GH4714@dhcp22.suse.cz>
 <87zhvvcf3b.fsf@vitty.brq.redhat.com>
 <49456818-238e-2d95-9df6-d1934e9c8b53@linux.intel.com>
 <87tvm3cd5w.fsf@vitty.brq.redhat.com>
 <06a35970-e478-18f8-eae6-4022925a5192@redhat.com>
 <20181004061938.GB22173@dhcp22.suse.cz>

On 04/10/2018 08:19, Michal Hocko wrote:
> On Wed 03-10-18 19:14:05, David Hildenbrand wrote:
>> On 03/10/2018 16:34, Vitaly Kuznetsov wrote:
>>> Dave Hansen writes:
>>>
>>>> On 10/03/2018 06:52 AM, Vitaly Kuznetsov wrote:
>>>>> It is more than just memmaps (e.g. forking the udev process doing
>>>>> the memory onlining also needs memory), but yes, the main idea is
>>>>> to make the onlining synchronous with hotplug.
>>>>
>>>> That's a good theoretical concern.
>>>>
>>>> But is it a problem we need to solve in practice?
>>>
>>> Yes, unfortunately. It was previously discovered that when we try to
>>> hotplug tons of memory to a low-memory system (a common scenario with
>>> VMs), we end up with OOM, because for all the new memory blocks we
>>> need to allocate page tables, struct pages, ... and we need memory to
>>> do that. The user space program doing the memory onlining also needs
>>> memory to run, and in case it prefers to fork to handle hundreds of
>>> notifications ... well, it may get OOM-killed before it manages to
>>> online anything.
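
(A side note to make the "forking udev" part concrete: the user space
side of onlining is usually nothing more than a udev rule that flips
the state of every new memory block device via sysfs. A minimal sketch,
where the rule file name is made up and the details vary per distro:

  # /etc/udev/rules.d/40-memory-online.rules (hypothetical file name)
  # Online every hot-added memory block as soon as the device appears.
  SUBSYSTEM=="memory", ACTION=="add", ATTR{state}=="offline", ATTR{state}="online"

Per memory block, that is equivalent to:

  echo online > /sys/devices/system/memory/memory<N>/state

So every hotplugged block generates a uevent, udev has to process it,
and that processing itself needs memory, which is exactly where the OOM
described above can bite. Setting
/sys/devices/system/memory/auto_online_blocks to "online" lets the
kernel online the blocks itself, without the round trip to user space.)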
>>>
>>> Allocating all kernel objects from the newly hotplugged blocks would
>>> definitely help to manage the situation, but as I said, this won't
>>> solve the 'forking udev' problem completely (it will likely remain in
>>> 'extreme' cases only; we can probably work around it by onlining with
>>> a dedicated process which doesn't do memory allocation).
>>>
>>
>> I guess the problem is even worse. We always have two phases:
>>
>> 1. add memory - requires memory allocations (e.g. for the memmap)
>> 2. online memory - might require memory allocations, e.g. for slab/slub
>>
>> So if we just added memory but don't have sufficient memory to start a
>> user space process to trigger onlining, then we most likely also don't
>> have sufficient memory to online the memory right away (in some
>> scenarios).
>>
>> We would have to allocate all new memory for 1 and 2 from the memory
>> to be onlined. I guess the latter part is less trivial.
>>
>> So while onlining the memory from the kernel might make things a
>> little more robust, we would still have a chance of OOM / onlining
>> failing.
>
> Yes, _theoretically_. Is this a practical problem for reasonable
> configurations, though? I mean, this will never be perfect and we
> simply cannot support all possible configurations; we should focus on
> a reasonable subset of them. From my practical experience, the vast
> majority of the memory consumed is for the memmaps (roughly 1.5% of
> the hotplugged memory: one 64-byte struct page per 4 KiB base page).
> That is not a lot, but I agree that allocating it from ZONE_NORMAL and
> off-node is not great, especially the off-node part, which is
> noticeable for whole-node hotplug.
>
> I have a feeling that arguing about fork not being able to proceed, or
> OOMing during memory hotplug, is a bit of a stretch and a sign of
> misconfiguration.

Just to rephrase: I have the same opinion. Something is already messed
up if we cannot even fork anymore; we will already see OOMs all over
the place before/during/after forking.

-- 

Thanks,

David / dhildenb