Date: Wed, 26 Jul 2017 13:45:39 +0200
From: Heiko Carstens
To: Michal Hocko
Cc: linux-mm@kvack.org, Andrew Morton, Mel Gorman, Vlastimil Babka, Andrea Arcangeli, Jerome Glisse, Reza Arbab, Yasuaki Ishimatsu, qiuxishi@huawei.com, Kani Toshimitsu, slaoub@gmail.com, Joonsoo Kim, Andi Kleen, Daniel Kiper, Igor Mammedov, Vitaly Kuznetsov, LKML, Michal Hocko, Benjamin Herrenschmidt, Dan Williams, "H. Peter Anvin", Ingo Molnar, Michael Ellerman, Paul Mackerras, Thomas Gleixner, Gerald Schaefer
Subject: Re: [RFC PATCH 3/5] mm, memory_hotplug: allocate memmap from the added memory range for sparse-vmemmap
In-Reply-To: <20170726083333.17754-4-mhocko@kernel.org>
Message-Id: <20170726114539.GG3218@osiris>

On Wed, Jul 26, 2017
at 10:33:31AM +0200, Michal Hocko wrote:
> From: Michal Hocko
>
> Physical memory hotadd has to allocate a memmap (struct page array) for
> the newly added memory section. kmalloc is currently used for those
> allocations.
>
> This has some disadvantages: a) existing memory is consumed for
> that purpose (~2MB per 128MB memory section) and b) if the whole node
> is movable then we have off-node struct pages, which have performance
> drawbacks.
>
> a) has turned out to be a problem for memory hotplug based ballooning
> because the userspace might not react in time to online memory while
> the memory consumed during physical hotadd is enough to push the
> system to OOM. 31bc3858ea3e ("memory-hotplug: add automatic onlining
> policy for the newly added memory") has been added to work around that
> problem.
>
> We can do much better when CONFIG_SPARSEMEM_VMEMMAP=y because vmemmap
> page tables can map arbitrary memory. That means that we can simply
> use the beginning of each memory section and map struct pages there.
> struct pages which back the allocated space then just need to be treated
> carefully so that we know they are not usable.
>
> Add {_Set,_Clear}PageVmemmap helpers to distinguish those pages in pfn
> walkers. We do not have any spare page flag for this purpose, so use the
> combination of the PageReserved bit, which already tells that the page
> should be ignored by the core mm code, and store VMEMMAP_PAGE (which sets
> all bits but PAGE_MAPPING_FLAGS) into page->mapping.
>
> On the memory hotplug front reuse the vmem_altmap infrastructure to
> override the default allocator used by __vmemmap_populate. Once the
> memmap is allocated we need a way to mark altmap pfns used for the
> allocation, and this is done by a new vmem_altmap::flush_alloc_pfns
> callback. The mark_vmemmap_pages implementation then simply
> __SetPageVmemmap()s all struct pages backing those pfns. The callback is
> called from sparse_add_one_section after the memmap has been initialized
> to 0.
>
> We also have to be careful about those pages during online and offline
> operations. They are simply ignored.
>
> Finally __ClearPageVmemmap is called when the vmemmap page tables are
> torn down.
>
> Please note that only memory hotplug is currently using this allocation
> scheme. The boot time memmap allocation could use the same trick as
> well, but this is not done yet.

Which kernel are these patches based on? I tried linux-next and Linus'
vanilla tree, however the series does not apply.

In general I do like your idea. However, if I understand your patches
correctly, we might have an ordering problem on s390: it is not possible
to access hot-added memory on s390 before it is online (MEM_GOING_ONLINE
succeeded).

On MEM_GOING_ONLINE we ask the hypervisor to back the potentially
available hot-added memory region with physical pages. Accessing those
ranges before that will result in an exception.

However, with your approach the memory is still allocated (and the
struct pages within the new range are written) when add_memory() is
being called, correct? That wouldn't be a change to the current
behaviour, except for the ordering problem outlined above.

Just trying to make sure I get this right :)