Date: Tue, 6 Jan 2026 16:05:48 +0100
From: Michal Hocko
To: Gregory Price
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, kernel-team@meta.com,
	david@redhat.com, osalvador@suse.de, gregkh@linuxfoundation.org,
	rafael@kernel.org, dakr@kernel.org, akpm@linux-foundation.org,
	lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, vbabka@suse.cz,
	rppt@kernel.org, surenb@google.com, hare@suse.de
Subject: Re: [RFC PATCH] memory,memory_hotplug: allow restricting memory blocks to zone movable
References: <20260105203611.4079743-1-gourry@gourry.net>
In-Reply-To: <20260105203611.4079743-1-gourry@gourry.net>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon 05-01-26 15:36:11, Gregory Price wrote:
> It was reported (LPC 2025) that userland services which monitor memory
> blocks can cause hot-unplug to fail permanently.
>
> This can occur when drivers attempt to hot-remove memory in two phases
> (offline, remove), while a userland service detects the memory offline
> and re-onlines the memory into a zone which may prevent removal.

Are there more details about this?

> This patch allows a driver to specify that a given memory block is
> intended as ZONE_MOVABLE memory only (i.e. the system should try to
> protect its hot-unpluggability). This is done via an MHP flag and a new
> "movable_only" bool in `struct memory_block`.
>
> Attempts to online a memory block with movable_only=true with any value
> other than MMOP_ONLINE_MOVABLE will fail with -EINVAL.
>
> It is hard to catch all possible ways to implement offline/remove
> process, so a race condition here can clearly still occur if the
> userland service onlines the memory back into ZONE_MOVABLE, but it at
> least will not prevent the removal of a block at a later time.

Irrespective of the userspace behavior noted above (which looks like a
policy that should probably be re-evaluated, or at least allow for finer
tuning), I can see some sense in giving drivers better control over
which zones (kernel vs. movable) their managed memory can fall into.

That being said, rather than movable_only, should we have a mask of
online types supported for the mem block?

> Suggested-by: Hannes Reinecke
> Signed-off-by: Gregory Price
> ---
>  drivers/base/memory.c          | 15 +++++++++++----
>  include/linux/memory.h         |  4 +++-
>  include/linux/memory_hotplug.h | 13 +++++++++++++
>  mm/memory_hotplug.c            | 12 +++++++++---
>  4 files changed, 36 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> index 6d84a02cfa5d..59512e4b8d62 100644
> --- a/drivers/base/memory.c
> +++ b/drivers/base/memory.c
> @@ -374,6 +374,8 @@ static int memory_block_change_state(struct memory_block *mem,
>
>  	if (to_state == MEM_OFFLINE)
>  		mem->state = MEM_GOING_OFFLINE;
> +	else if (mem->movable_only && to_state != MMOP_ONLINE_MOVABLE)
> +		return -EINVAL;
>
>  	ret = memory_block_action(mem, to_state);
>  	mem->state = ret ? from_state_req : to_state;
> @@ -811,7 +813,8 @@ void memory_block_add_nid_early(struct memory_block *mem, int nid)
>
>  static int add_memory_block(unsigned long block_id, int nid, unsigned long state,
>  			    struct vmem_altmap *altmap,
> -			    struct memory_group *group)
> +			    struct memory_group *group,
> +			    bool movable_only)
>  {
>  	struct memory_block *mem;
>  	int ret = 0;
> @@ -829,6 +832,7 @@ static int add_memory_block(unsigned long block_id, int nid, unsigned long state
>  	mem->state = state;
>  	mem->nid = nid;
>  	mem->altmap = altmap;
> +	mem->movable_only = movable_only;
>  	INIT_LIST_HEAD(&mem->group_next);
>
>  #ifndef CONFIG_NUMA
> @@ -880,7 +884,8 @@ static void remove_memory_block(struct memory_block *memory)
>   */
>  int create_memory_block_devices(unsigned long start, unsigned long size,
>  				int nid, struct vmem_altmap *altmap,
> -				struct memory_group *group)
> +				struct memory_group *group,
> +				bool movable_only)
>  {
>  	const unsigned long start_block_id = pfn_to_block_id(PFN_DOWN(start));
>  	unsigned long end_block_id = pfn_to_block_id(PFN_DOWN(start + size));
> @@ -893,7 +898,8 @@ int create_memory_block_devices(unsigned long start, unsigned long size,
>  		return -EINVAL;
>
>  	for (block_id = start_block_id; block_id != end_block_id; block_id++) {
> -		ret = add_memory_block(block_id, nid, MEM_OFFLINE, altmap, group);
> +		ret = add_memory_block(block_id, nid, MEM_OFFLINE, altmap, group,
> +				       movable_only);
>  		if (ret)
>  			break;
>  	}
> @@ -998,7 +1004,8 @@ void __init memory_dev_init(void)
>  			continue;
>
>  		block_id = memory_block_id(nr);
> -		ret = add_memory_block(block_id, NUMA_NO_NODE, MEM_ONLINE, NULL, NULL);
> +		ret = add_memory_block(block_id, NUMA_NO_NODE, MEM_ONLINE, NULL, NULL,
> +				       false);
>  		if (ret) {
>  			panic("%s() failed to add memory block: %d\n",
>  			      __func__, ret);
> diff --git a/include/linux/memory.h b/include/linux/memory.h
> index 43d378038ce2..bab24f796d3d 100644
> --- a/include/linux/memory.h
> +++ b/include/linux/memory.h
> @@ -80,6 +80,7 @@ struct memory_block {
>  	struct vmem_altmap *altmap;
>  	struct memory_group *group;	/* group (if any) for this block */
>  	struct list_head group_next;	/* next block inside memory group */
> +	bool movable_only;		/* If set, only ZONE_MOVABLE is valid */
>  #if defined(CONFIG_MEMORY_FAILURE) && defined(CONFIG_MEMORY_HOTPLUG)
>  	atomic_long_t nr_hwpoison;
>  #endif
> @@ -160,7 +161,8 @@ extern int register_memory_notifier(struct notifier_block *nb);
>  extern void unregister_memory_notifier(struct notifier_block *nb);
>  int create_memory_block_devices(unsigned long start, unsigned long size,
>  				int nid, struct vmem_altmap *altmap,
> -				struct memory_group *group);
> +				struct memory_group *group,
> +				bool movable_only);
>  void remove_memory_block_devices(unsigned long start, unsigned long size);
>  extern void memory_dev_init(void);
>  extern int memory_notify(unsigned long val, void *v);
> diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
> index 23f038a16231..ca51ef2ad0cf 100644
> --- a/include/linux/memory_hotplug.h
> +++ b/include/linux/memory_hotplug.h
> @@ -75,6 +75,19 @@ typedef int __bitwise mhp_t;
>   */
>  #define MHP_OFFLINE_INACCESSIBLE	((__force mhp_t)BIT(3))
>
> +/*
> + * Restrict hotplugged memory blocks to ZONE_MOVABLE only.
> + *
> + * During offlining of hotplugged memory which was originally onlined
> + * as ZONE_MOVABLE, userland services may detect blocks going offline
> + * and automatically re-online them into ZONE_NORMAL or lower. When
> + * this happens it may become permanently incapable of being removed.
> + *
> + * Allow driver-managed memory sources to restrict memory blocks to
> + * ZONE_MOVABLE only, so that the truly degenerate case can be mitigated.
> + */
> +#define MHP_MOVABLE_ONLY		((__force mhp_t)BIT(4))
> +
>  /*
>   * Extended parameters for memory hotplug:
>   * altmap: alternative allocator for memmap array (optional)
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 81ba5b019926..1a184bfd87f6 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1346,7 +1346,9 @@ static int check_hotplug_memory_range(u64 start, u64 size)
>
>  static int online_memory_block(struct memory_block *mem, void *arg)
>  {
> -	mem->online_type = mhp_get_default_online_type();
> +	mem->online_type = mem->movable_only ?
> +				MMOP_ONLINE_MOVABLE :
> +				mhp_get_default_online_type();
>  	return device_online(&mem->dev);
>  }
>
> @@ -1449,6 +1451,7 @@ static int create_altmaps_and_memory_blocks(int nid, struct memory_group *group,
>  	unsigned long memblock_size = memory_block_size_bytes();
>  	u64 cur_start;
>  	int ret;
> +	bool movable_only = mhp_flags & MHP_MOVABLE_ONLY;
>
>  	for (cur_start = start; cur_start < start + size;
>  	     cur_start += memblock_size) {
> @@ -1478,7 +1481,8 @@ static int create_altmaps_and_memory_blocks(int nid, struct memory_group *group,
>
>  		/* create memory block devices after memory was added */
>  		ret = create_memory_block_devices(cur_start, memblock_size, nid,
> -						  params.altmap, group);
> +						  params.altmap, group,
> +						  movable_only);
>  		if (ret) {
>  			arch_remove_memory(cur_start, memblock_size, NULL);
>  			kfree(params.altmap);
> @@ -1506,6 +1510,7 @@ int add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
>  	struct memory_group *group = NULL;
>  	u64 start, size;
>  	bool new_node = false;
> +	bool movable_only = mhp_flags & MHP_MOVABLE_ONLY;
>  	int ret;
>
>  	start = res->start;
> @@ -1564,7 +1569,8 @@ int add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
>  		goto error;
>
>  	/* create memory block devices after memory was added */
> -	ret = create_memory_block_devices(start, size, nid, NULL, group);
> +	ret = create_memory_block_devices(start, size, nid, NULL, group,
> +					  movable_only);
>  	if (ret) {
>  		arch_remove_memory(start, size, params.altmap);
>  		goto error;
> --
> 2.52.0

--
Michal Hocko
SUSE Labs
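
Illustration only (not part of the posted patch or of this reply's
author's code): the "mask of online types" question above could be
sketched roughly as below. This is a userspace mock under stated
assumptions. The struct name, the `online_type_mask` field, the
MMOP_BIT() macro and memory_block_online_allowed() are all hypothetical
names; only the MMOP_* values are meant to mirror the enum in
include/linux/memory_hotplug.h.

```c
#include <errno.h>

/* Online types, mirroring the MMOP_* enum in include/linux/memory_hotplug.h. */
enum mmop {
	MMOP_OFFLINE = 0,
	MMOP_ONLINE,		/* zone picked by the kernel's default policy */
	MMOP_ONLINE_KERNEL,	/* online into a kernel zone */
	MMOP_ONLINE_MOVABLE,	/* online into ZONE_MOVABLE */
};

/* Hypothetical: one bit per online type. */
#define MMOP_BIT(type)	(1u << (type))

/* Hypothetical stand-in for struct memory_block; only the mask matters here. */
struct memory_block_sketch {
	unsigned int online_type_mask;	/* set of permitted MMOP_ONLINE* types */
};

/*
 * Hypothetical helper: return 0 if the requested online type is permitted
 * for this block, -EINVAL otherwise. Offlining is never restricted.
 */
static int memory_block_online_allowed(const struct memory_block_sketch *mem,
				       enum mmop online_type)
{
	if (online_type == MMOP_OFFLINE)
		return 0;

	return (mem->online_type_mask & MMOP_BIT(online_type)) ? 0 : -EINVAL;
}
```

Under these assumptions, the movable_only behaviour is simply a mask
holding only MMOP_BIT(MMOP_ONLINE_MOVABLE), and a driver could express
"kernel zones only" the same way. Whether plain MMOP_ONLINE (default
zone selection) should be accepted for a movable-only block is a policy
choice this sketch leaves to the mask's owner.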