From: "Aneesh Kumar K.V"
To: Vishal Verma , Andrew Morton , David Hildenbrand , Oscar Salvador , Dan Williams , Dave Jiang
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, nvdimm@lists.linux.dev, linux-cxl@vger.kernel.org, Huang Ying , Dave Hansen , Jonathan Cameron , Jeff Moyer , Vishal Verma
Subject: Re: [PATCH v2 2/3] mm/memory_hotplug: split memmap_on_memory requests across memblocks
In-Reply-To: <20230720-vv-kmem_memmap-v2-2-88bdaab34993@intel.com>
References: <20230720-vv-kmem_memmap-v2-0-88bdaab34993@intel.com> <20230720-vv-kmem_memmap-v2-2-88bdaab34993@intel.com>
Date: Fri, 21 Jul 2023 17:50:41 +0530
Message-ID: <87edl1a21y.fsf@linux.ibm.com>

Vishal Verma writes:

> The MHP_MEMMAP_ON_MEMORY flag for hotplugged memory is currently
> restricted to 'memblock_size' chunks of memory being added. Adding a
> larger span of memory precludes memmap_on_memory semantics.
>
> For users of hotplug such as kmem, large amounts of memory might get
> added from the CXL subsystem. In some cases, this amount may exceed the
> available 'main memory' to store the memmap for the memory being added.
> In this case, it is useful to have a way to place the memmap on the
> memory being added, even if it means splitting the addition into
> memblock-sized chunks.
>
> Change add_memory_resource() to loop over memblock-sized chunks of
> memory if caller requested memmap_on_memory, and if other conditions for
> it are met. Teach try_remove_memory() to also expect that a memory
> range being removed might have been split up into memblock sized chunks,
> and to loop through those as needed.
>

This conflicts with https://lore.kernel.org/linux-mm/20230718024409.95742-1-aneesh.kumar@linux.ibm.com/

IIUC, Andrew was planning to add that series to -mm. That patchset also
makes some of the related changes in this patch unnecessary. Can you
rebase this series on top of it?

>
> Cc: Andrew Morton
> Cc: David Hildenbrand
> Cc: Oscar Salvador
> Cc: Dan Williams
> Cc: Dave Jiang
> Cc: Dave Hansen
> Cc: Huang Ying
> Suggested-by: David Hildenbrand
> Signed-off-by: Vishal Verma
> ---
>  mm/memory_hotplug.c | 154 +++++++++++++++++++++++++++++++---------------------
>  1 file changed, 91 insertions(+), 63 deletions(-)
>
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index e9bcacbcbae2..20456f0d28e6 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1286,6 +1286,35 @@ bool mhp_supports_memmap_on_memory(unsigned long size)
>  }
>  EXPORT_SYMBOL_GPL(mhp_supports_memmap_on_memory);
>
> +static int add_memory_create_devices(int nid, struct memory_group *group,
> +				     u64 start, u64 size, mhp_t mhp_flags)
> +{
> +	struct mhp_params params = { .pgprot = pgprot_mhp(PAGE_KERNEL) };
> +	struct vmem_altmap mhp_altmap = {};
> +	int ret;
> +
> +	if ((mhp_flags & MHP_MEMMAP_ON_MEMORY)) {
> +		mhp_altmap.free = PHYS_PFN(size);
> +		mhp_altmap.base_pfn = PHYS_PFN(start);
> +		params.altmap = &mhp_altmap;
> +	}
> +
> +	/* call arch's memory hotadd */
> +	ret = arch_add_memory(nid, start, size, &params);
> +	if (ret < 0)
> +		return ret;
> +
> +	/* create memory block devices after memory was added */
> +	ret = create_memory_block_devices(start, size, mhp_altmap.alloc,
> +					  group);
> +	if (ret) {
> +		arch_remove_memory(start, size, NULL);
> +		return ret;
> +	}
> +
> +	return 0;
> +}
> +
>  /*
>   * NOTE: The caller must call lock_device_hotplug() to serialize hotplug
>   * and online/offline operations (triggered e.g. by sysfs).
> @@ -1294,11 +1323,10 @@ EXPORT_SYMBOL_GPL(mhp_supports_memmap_on_memory);
>   */
>  int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
>  {
> -	struct mhp_params params = { .pgprot = pgprot_mhp(PAGE_KERNEL) };
> +	unsigned long memblock_size = memory_block_size_bytes();
>  	enum memblock_flags memblock_flags = MEMBLOCK_NONE;
> -	struct vmem_altmap mhp_altmap = {};
>  	struct memory_group *group = NULL;
> -	u64 start, size;
> +	u64 start, size, cur_start;
>  	bool new_node = false;
>  	int ret;
>
> @@ -1339,27 +1367,20 @@ int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
>  	/*
>  	 * Self hosted memmap array
>  	 */
> -	if (mhp_flags & MHP_MEMMAP_ON_MEMORY) {
> -		if (!mhp_supports_memmap_on_memory(size)) {
> -			ret = -EINVAL;
> +	if ((mhp_flags & MHP_MEMMAP_ON_MEMORY) &&
> +	    mhp_supports_memmap_on_memory(memblock_size)) {
> +		for (cur_start = start; cur_start < start + size;
> +		     cur_start += memblock_size) {
> +			ret = add_memory_create_devices(nid, group, cur_start,
> +							memblock_size,
> +							mhp_flags);
> +			if (ret)
> +				goto error;
> +		}
> +	} else {
> +		ret = add_memory_create_devices(nid, group, start, size, mhp_flags);
> +		if (ret)
>  			goto error;
> -		}
> -		mhp_altmap.free = PHYS_PFN(size);
> -		mhp_altmap.base_pfn = PHYS_PFN(start);
> -		params.altmap = &mhp_altmap;
> -	}
> -
> -	/* call arch's memory hotadd */
> -	ret = arch_add_memory(nid, start, size, &params);
> -	if (ret < 0)
> -		goto error;
> -
> -	/* create memory block devices after memory was added */
> -	ret = create_memory_block_devices(start, size, mhp_altmap.alloc,
> -					  group);
> -	if (ret) {
> -		arch_remove_memory(start, size, NULL);
> -		goto error;
>  	}
>
>  	if (new_node) {
> @@ -2035,12 +2056,38 @@ void try_offline_node(int nid)
>  }
>  EXPORT_SYMBOL(try_offline_node);
>
> -static int __ref try_remove_memory(u64 start, u64 size)
> +static void __ref __try_remove_memory(int nid, u64 start, u64 size,
> +				     struct vmem_altmap *altmap)
>  {
> -	struct vmem_altmap mhp_altmap = {};
> -	struct vmem_altmap *altmap = NULL;
> -	unsigned long nr_vmemmap_pages;
> -	int rc = 0, nid = NUMA_NO_NODE;
> +	/* remove memmap entry */
> +	firmware_map_remove(start, start + size, "System RAM");
> +
> +	/*
> +	 * Memory block device removal under the device_hotplug_lock is
> +	 * a barrier against racing online attempts.
> +	 */
> +	remove_memory_block_devices(start, size);
> +
> +	mem_hotplug_begin();
> +
> +	arch_remove_memory(start, size, altmap);
> +
> +	if (IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK)) {
> +		memblock_phys_free(start, size);
> +		memblock_remove(start, size);
> +	}
> +
> +	release_mem_region_adjustable(start, size);
> +
> +	if (nid != NUMA_NO_NODE)
> +		try_offline_node(nid);
> +
> +	mem_hotplug_done();
> +}
> +
> +static int try_remove_memory(u64 start, u64 size)
> +{
> +	int rc, nid = NUMA_NO_NODE;
>
>  	BUG_ON(check_hotplug_memory_range(start, size));
>
> @@ -2058,20 +2105,21 @@ static int __ref try_remove_memory(u64 start, u64 size)
>  		return rc;
>
>  	/*
> -	 * We only support removing memory added with MHP_MEMMAP_ON_MEMORY in
> -	 * the same granularity it was added - a single memory block.
> +	 * For memmap_on_memory, the altmaps could have been added on
> +	 * a per-memblock basis. Loop through the entire range if so,
> +	 * and remove each memblock and its altmap
>  	 */
>  	if (mhp_memmap_on_memory()) {
> -		nr_vmemmap_pages = walk_memory_blocks(start, size, NULL,
> -						      get_nr_vmemmap_pages_cb);
> -		if (nr_vmemmap_pages) {
> -			if (size != memory_block_size_bytes()) {
> -				pr_warn("Refuse to remove %#llx - %#llx,"
> -					"wrong granularity\n",
> -					start, start + size);
> -				return -EINVAL;
> -			}
> +		unsigned long memblock_size = memory_block_size_bytes();
> +		struct vmem_altmap mhp_altmap = {};
> +		struct vmem_altmap *altmap;
> +		u64 cur_start;
>
> +		for (cur_start = start; cur_start < start + size;
> +		     cur_start += memblock_size) {
> +			unsigned long nr_vmemmap_pages =
> +				walk_memory_blocks(start, memblock_size, NULL,
> +						   get_nr_vmemmap_pages_cb);
>  			/*
>  			 * Let remove_pmd_table->free_hugepage_table do the
>  			 * right thing if we used vmem_altmap when hot-adding
> @@ -2079,33 +2127,13 @@ static int __ref try_remove_memory(u64 start, u64 size)
>  			 */
>  			mhp_altmap.alloc = nr_vmemmap_pages;
>  			altmap = &mhp_altmap;
> +			__try_remove_memory(nid, cur_start, memblock_size,
> +					    altmap);
>  		}
> +	} else {
> +		__try_remove_memory(nid, start, size, NULL);
>  	}
>
> -	/* remove memmap entry */
> -	firmware_map_remove(start, start + size, "System RAM");
> -
> -	/*
> -	 * Memory block device removal under the device_hotplug_lock is
> -	 * a barrier against racing online attempts.
> -	 */
> -	remove_memory_block_devices(start, size);
> -
> -	mem_hotplug_begin();
> -
> -	arch_remove_memory(start, size, altmap);
> -
> -	if (IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK)) {
> -		memblock_phys_free(start, size);
> -		memblock_remove(start, size);
> -	}
> -
> -	release_mem_region_adjustable(start, size);
> -
> -	if (nid != NUMA_NO_NODE)
> -		try_offline_node(nid);
> -
> -	mem_hotplug_done();
>  	return 0;
>  }
>
>
> --
> 2.41.0