From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail-m49221.qiye.163.com (mail-m49221.qiye.163.com [45.254.49.221]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7B4A2381AFD for ; Mon, 11 May 2026 12:29:55 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.254.49.221 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778502600; cv=none; b=cM4hPUrPRyfU1zn3CcJabmPSv9Mo/ILhaCrlIRX95KuYutImfBgCBPsvNkisESGf5sEEB/TrVwRsVATwcomjIffxTog+vcUSPAOJ7iPy7TCGmE27mYoGrKR6FU9Iq0ijHyV9HNAbDUgfdGTaSUM4MrpPsSHGYuMPQvDKJBxzs9g= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778502600; c=relaxed/simple; bh=4824ogFrNWJOorCtVKKvWmVe/v+D6pwQaK/wPjIfBH0=; h=Message-ID:Date:MIME-Version:Subject:To:Cc:References:From: In-Reply-To:Content-Type; b=d4iVKPB+92IIVHpUHTswvyTvKgWmtZBTQvZgRGg3qFHioVkLD8A5Orp5alOgUkSSrbsF7Zrws9eQEU+QAaVa64vCZM9nRLC0ph9TBY2xZlpSNEQ9owjHoJmU98gZbArF55ho0mUOMNZofoVvz74/xbm/Aex6oowkSnUL7SrYX/0= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=easystack.cn; spf=pass smtp.mailfrom=easystack.cn; arc=none smtp.client-ip=45.254.49.221 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=easystack.cn Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=easystack.cn Received: from [192.168.0.59] (unknown [218.94.118.90]) by smtp.qiye.163.com (Hmail) with ESMTP id 19f9af269; Mon, 11 May 2026 20:24:41 +0800 (GMT+08:00) Message-ID: <2a70be08-ea9c-473b-80f2-9852e2f6fc51@easystack.cn> Date: Mon, 11 May 2026 20:24:40 +0800 Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH v6 2/3] mm/page_owner: add NUMA node filter with nodelist support To: Oscar Salvador Cc: akpm@linux-foundation.org, vbabka@kernel.org, surenb@google.com, mhocko@suse.com, jackmanb@google.com, hannes@cmpxchg.org, ziy@nvidia.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org References: <20260511033017.747781-1-zhen.ni@easystack.cn> <20260511033017.747781-3-zhen.ni@easystack.cn> From: "zhen.ni" In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-HM-Tid: 0a9e16ff27410229kunm419f9bb3303784 X-HM-MType: 1 X-HM-Spam-Status: e1kfGhgUHx5ZQUpXWQgPGg8OCBgUHx5ZQUlOS1dZFg8aDwILHllBWSg2Ly tZV1koWUFJQjdXWRgWCB1ZQUpXWS1ZQUlXWQ8JGhUIEh9ZQVkaSk4dVkhIGUtNTR0ZTE8ZT1YVFA kWGhdVGRETFhoSFyQUDg9ZV1kYEgtZQVlJSkNVQk9VSkpDVUJLWVdZFhoPEhUdFFlBWU9LSFVCQk lOS1VKS0tVSkJLQlkG 在 2026/5/11 16:54, Oscar Salvador 写道: > On Mon, May 11, 2026 at 11:30:16AM +0800, Zhen Ni wrote: >> Add NUMA node filtering functionality to page_owner to allow filtering >> pages by specific NUMA node(s). This is useful for NUMA-aware memory >> allocation analysis and debugging. >> >> The filter supports flexible nodelist input formats: >> - Single node: echo "0" > nid >> - Multiple nodes: echo "0,2,3" > nid >> - Node range: echo "0-3" > nid >> - Mixed format: echo "0,2-4,7" > nid >> - Clear filter: echo > nid (empty string) >> >> The implementation uses nodemask_t for efficient multi-node filtering >> and nodelist_parse() for flexible input parsing. Empty input clears >> the filter. >> >> Note: Access to nid_mask uses plain load/store without locking because >> nodemask_t is too large (128 bytes) for READ_ONCE/WRITE_ONCE. This is >> safe for debug use: low-frequency changes and torn reads would only >> cause temporary inconsistency in debug output. >> >> Signed-off-by: Zhen Ni >> --- > ... >> --- >> mm/page_owner.c | 92 +++++++++++++++++++++++++++++++++++++++++++++++++ >> 1 file changed, 92 insertions(+) >> >> diff --git a/mm/page_owner.c b/mm/page_owner.c >> index 27a412c52d41..8a38005539ff 100644 >> --- a/mm/page_owner.c >> +++ b/mm/page_owner.c > ... >> @@ -700,6 +708,9 @@ read_page_owner(struct file *file, char __user *buf, size_t count, loff_t *ppos) >> while (!pfn_valid(pfn) && (pfn & (MAX_ORDER_NR_PAGES - 1)) != 0) >> pfn++; >> >> + mask = owner_filter.nid_mask; >> + filter_by_nid = !nodes_empty(mask); >> + >> /* Find an allocated page */ >> for (; pfn < max_pfn; pfn++) { >> /* >> @@ -732,6 +743,14 @@ read_page_owner(struct file *file, char __user *buf, size_t count, loff_t *ppos) >> if (unlikely(!page_ext)) >> continue; >> >> + /* NUMA node filter using bitmask */ >> + if (filter_by_nid) { > > This comment is kinda pointless because it explains something that the code makes it > quite clear. > Either drop it, or just go with "NUMA node filter", but "using bitmask" > does not really add much. > I'll just remove it entirely. > >> + int nid = page_to_nid(page); >> + >> + if (!node_isset(nid, mask)) >> + goto ext_put_continue; >> + } >> + >> /* >> * Some pages could be missed by concurrent allocation or free, >> * because we don't hold the zone lock. >> @@ -1043,6 +1062,77 @@ static const struct file_operations page_owner_print_mode_fops = { >> .llseek = default_llseek, >> }; >> >> +static ssize_t nid_filter_write(struct file *file, >> + const char __user *buf, >> + size_t count, loff_t *ppos) >> +{ >> + char *kbuf; >> + nodemask_t mask; >> + int ret; >> + >> + /* >> + * Limit input size to handle worst-case nodelist (all nodes). >> + * Worst case per node: ",NNNNN" (comma + 5-digit node number) = 6 bytes. >> + */ >> + if (count > (6 * MAX_NUMNODES)) >> + return -EINVAL; >> + >> + kbuf = kmalloc_objs(*kbuf, count + 1); >> + if (!kbuf) >> + return -ENOMEM; >> + >> + if (strncpy_from_user(kbuf, buf, count) < 0) { >> + ret = -EFAULT; >> + goto out_free; >> + } >> + kbuf[count] = '\0'; >> + >> + /* Support nodelist format like "0", "0,2", "0-3", or empty to clear */ >> + if (nodelist_parse(kbuf, mask)) { >> + ret = -EINVAL; >> + goto out_free; >> + } > > nodelist_parse() can also return other return values besides EINVAL. > Something like > > ret = nodelist_parse(...) > if (ret < 0) > return ret > > might be cleaner. > I'll change it. >> + >> + /* Validate that all specified nodes actually exist in the system */ >> + if (!nodes_subset(mask, node_states[N_MEMORY])) { >> + ret = -EINVAL; >> + goto out_free; >> + } > > Ok, I get that since you want to filter allocations by numa nodes, you > want to make sure that those nodes have memory. > Although that might change due to concurrent memory-hotplug operations, > but that is a different story. > > I do not like the comment though, because we can have other nodes > existing in the system with no memory (e.g: memoryless nodes only having > cpus, or none of them), so I would make that clearer: > > " > /* > * We want to filter memory allocations by numa nodes, so make sure > * that the specified nodes have memory. > */ > " > > or something along those lines. > > I'll update the comment to be more precise about filtering nodes with memory. >> + >> + owner_filter.nid_mask = mask; >> + ret = count; >> + >> +out_free: >> + kfree(kbuf); >> + return ret; >> +} >> + >> +static int nid_filter_show(struct seq_file *m, void *v) >> +{ >> + nodemask_t mask = owner_filter.nid_mask; >> + >> + if (nodes_empty(mask)) >> + seq_puts(m, "\n"); >> + else >> + seq_printf(m, "%*pbl\n", nodemask_pr_args(&mask)); > > is not nodemask_pr_args clever enough to not print anything or print "0" > if the nmask is NODE_MASK_NONE? > > Looking at lib/vsprintf.c:bitmap_list_string(), the %*pbl format doesn't print anything when the bitmap is empty (the for_each_set_bitrange loopdoesn't execute). So we can simplify this to just remove check nodes_empty(). Thanks for the thorough review! Best regards, Zhen