From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: [PATCH RFC] mm/memory_hotplug: Introduce memory block types
To: Michal Hocko
References: <20180928150357.12942-1-david@redhat.com>
 <20181001084038.GD18290@dhcp22.suse.cz>
 <20181002134734.GT18290@dhcp22.suse.cz>
 <98fb8d65-b641-2225-f842-8804c6f79a06@redhat.com>
 <20181003135407.GI4714@dhcp22.suse.cz>
From: David Hildenbrand
Organization: Red Hat GmbH
Message-ID: <9fef1f7d-2d7c-03f1-00e3-5fa657eda019@redhat.com>
Date: Wed, 3 Oct 2018 19:00:29 +0200
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.0
MIME-Version: 1.0
In-Reply-To: <20181003135407.GI4714@dhcp22.suse.cz>
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: 8bit
List-Id: Linux on PowerPC Developers Mail List
Cc: Kate Stewart, Rich Felker, linux-ia64@vger.kernel.org,
 linux-sh@vger.kernel.org, Peter Zijlstra, Dave Hansen, Heiko Carstens,
 linux-mm@kvack.org, Pavel Tatashin, Paul Mackerras, "H. Peter Anvin",
 Rashmica Gupta, "K. Y. Srinivasan", Boris Ostrovsky,
 linux-s390@vger.kernel.org, Michael Neuling, Stephen Hemminger,
 Yoshinori Sato, linux-acpi@vger.kernel.org, Ingo Molnar,
 xen-devel@lists.xenproject.org, Rob Herring, Len Brown, Fenghua Yu,
 Stephen Rothwell, "mike.travis@hpe.com", Haiyang Zhang, Dan Williams,
 =?UTF-8?Q?Jonathan_Neusch=c3=a4fer?=, Nicholas Piggin, Joe Perches,
 =?UTF-8?B?SsOpcsO0bWUgR2xpc3Nl?=, Mike Rapoport, Borislav Petkov,
 Andy Lutomirski, Thomas Gleixner, Joonsoo Kim, Oscar Salvador,
 Juergen Gross, Tony Luck, Mathieu Malaterre, Greg Kroah-Hartman,
 "Rafael J. Wysocki", linux-kernel@vger.kernel.org,
 Mauricio Faria de Oliveira, Philippe Ombredanne, Martin Schwidefsky,
 devel@linuxdriverproject.org, Andrew Morton,
 linuxppc-dev@lists.ozlabs.org, "Kirill A. Shutemov"

On 03/10/2018 15:54, Michal Hocko wrote:
> On Tue 02-10-18 17:25:19, David Hildenbrand wrote:
>> On 02/10/2018 15:47, Michal Hocko wrote:
> [...]
>>> Zone imbalance is an inherent problem of the highmem zone. It is
>>> essentially the highmem zone we all loved so much back in 32b days.
>>> Yes the movable zone doesn't have any addressing limitations so it is a
>>> bit more relaxed but considering the hotplug scenarios I have seen so
>>> far people just want to have full NUMA nodes movable to allow replacing
>>> DIMMs. And then we are back to square one and the zone imbalance issue.
>>> You have those regardless where memmaps are allocated from.
>>
>> Unfortunately yes. And things get more complicated as you are adding a
>> whole DIMMs and get notifications in the granularity of memory blocks.
>> Usually you are not interested in onlining any memory block of that DIMM
>> as MOVABLE as soon as you would have to online one memory block of that
>> DIMM as NORMAL - because that can already block the whole DIMM.
>
> For the purpose of the hotremove, yes. But as Dave has noted people are
> (ab)using zone movable for other purposes - e.g. large pages.

That might be right for some very special use cases. For most users
this is not the case (meaning it should be the default, but if users
want to change it, they should be allowed to).

>
> [...]
>>> Then the immediate question would be why to use memory hotplug for that
>>> at all? Why don't you simply start with a huge pre-allocated physical
>>> address space and balloon memory in an out per demand. Why do you want
>>> to inject new memory during the runtime?
>>
>> Let's assume you have a guest with 20GB size and eventually want to
>> allow to grow it to 4TB. You would have to allocate metadata for 4TB
>> right from the beginning. That's definitely now what we want. That is
>> why memory hotplug is used by e.g. XEN or Hyper-V. With Hyper-V, the
>> hypervisor even tells you at which places additional memory has been
>> made available.
>
> Then you have to live with the fact that your hot added memory will be
> self hosted and find a way for ballooning to work with that. The price
> would be that some part of the memory is not really balloonable in the
> end.
>
>>>> 1. is a reason why distributions usually don't configure
>>>> "MEMORY_HOTPLUG_DEFAULT_ONLINE", because you really want the option for
>>>> MOVABLE zone. That however implies, that e.g. for x86, you have to
>>>> handle all new memory in user space, especially also HyperV memory.
>>>> There, you then have to check for things like "isHyperV()" to decide
>>>> "oh, yes, this should definitely not go to the MOVABLE zone".
>>>
>>> Why do you need a generic hotplug rule in the first place? Why don't you
>>> simply provide different set of rules for different usecases? Let users
>>> decide which usecase they prefer rather than try to be clever which
>>> almost always hits weird corner cases.
>>>
>>
>> Memory hotplug has to work as reliable as we can out of the box. Letting
>> the user make simple decisions like "oh, I am on hyper-V, I want to
>> online memory to the normal zone" does not feel right.
>
> Users usually know what is their usecase and then it is just a matter of
> plumbing (e.g. distribution can provide proper tools to deploy those
> usecases) to chose the right and for user obscure way to make it work.

I disagree. If we can ship sane defaults, we should do that and allow
changes later on. This is how distributions have always worked. But
yes, allowing modifications is always a good idea, to tailor things to
special-case user scenarios (tuned or whatever we have in place).

>
>> But yes, we
>> should definitely allow to make modifications. So some sane default rule
>> + possible modification is usually a good idea.
>>
>> I think Dave has a point with using MOVABLE for huge page use cases. And
>> there might be other corner cases as you correctly state.
>>
>> I wonder if this patch itself minus modifying online/offline might make
>> sense. We can then implement simple rules in user space
>>
>> if (normal) {
>> 	/* customers expect hotplugged DIMMs to be unpluggable */
>> 	online_movable();
>> } else if (paravirt) {
>> 	/* paravirt memory should as default always go to the NORMAL */
>> 	online();
>> } else {
>> 	/* standby memory will never get onlined automatically */
>> }
>>
>> Compared to having to guess what is to be done (isKVM(), isHyperV,
>> isS390 ...) and failing once this is no longer unique (e.g. virtio-mem
>> and ACPI support for x86 KVM).
>
> I am worried that exporing a type will just push us even further to the
> corner. The current design is really simple and 2 stage and that is good
> because it allows for very different usecases. The more specific the API
> be the more likely we are going to hit "I haven't even dreamed somebody
> would be using hotplug for this thing". And I would bet this will happen
> sooner or later.

Exposing the type of memory is, from my point of view, just forwarding
facts to user space. We should not export arbitrary information, that
is true.

>
> Just look at how the whole auto onlining screwed the API to workaround
> an implementation detail. It has created a one purpose behavior that
> doesn't suite many usecases. Yet we have to live with that because
> somebody really relies on it. Let's not repeat same errors.
>

Let me rephrase: You state that user space has to make the decision and
that users should be able to set/reconfigure rules. That is perfectly
fine.

But then we should give user space access to sufficient information to
make a decision. This might be the type of memory as we learned (which
is what part of this patch proposes), but maybe later more, e.g. to
which physical device the memory belongs (e.g. to hotplug it all
movable or all normal) ...

-- 
Thanks,

David / dhildenb
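
P.S. To make the quoted user-space rule a bit more concrete, here is a
rough sketch of how such a policy could look. It is only an
illustration under assumptions: the per-block "type" file and its
values ("normal", "paravirt", "standby") are hypothetical stand-ins for
what the RFC would export, and the function names are made up. The
"state" file and the strings "online" / "online_movable" are the
existing sysfs memory block interface.

```python
from pathlib import Path
from typing import Optional

# Existing sysfs directory that holds one sub-directory per memory block.
SYSFS_MEMORY = Path("/sys/devices/system/memory")


def choose_online_action(block_type: str) -> Optional[str]:
    """Map a (hypothetical) memory block type to a value for 'state'.

    Returns None when the block should be left offline.
    """
    if block_type == "normal":
        # Customers expect hotplugged DIMMs to be unpluggable.
        return "online_movable"
    if block_type == "paravirt":
        # Paravirtualized memory goes to the default (NORMAL) zone.
        return "online"
    # Standby or unknown memory is never onlined automatically.
    return None


def online_block(block_dir: Path) -> Optional[str]:
    """Apply the policy to one memory block directory.

    'type' is an assumed attribute (not an existing sysfs file);
    a missing file is treated as an unknown type.
    """
    type_file = block_dir / "type"
    block_type = type_file.read_text().strip() if type_file.exists() else "unknown"
    action = choose_online_action(block_type)
    if action is not None:
        (block_dir / "state").write_text(action)
    return action
```

A udev rule or a small daemon could call online_block() for each new
memoryN directory under SYSFS_MEMORY. The point of the sketch is that
the decision depends only on facts exported by the kernel, so nothing
like isKVM() or isHyperV() has to be guessed, and an administrator can
still swap in a different mapping.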