From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail-m49202.qiye.163.com (mail-m49202.qiye.163.com [45.254.49.202]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0715F53E0B for ; Thu, 30 Apr 2026 05:12:24 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.254.49.202 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777525950; cv=none; b=uNkbdvuOboY0g50x+x4BN5DePdhJ3Mjotvoaj2LGEiHmmEYUDfJUFYAjvOVXZ97cLJ5xo22VV8rkyMN7P8KPqvg84cvYFmA3axKx0JbVo3fMG6lP0BhFW0rSrwU+6TnX2LkadbJxd2pl6B3Mp3UI/5G6w1AYu6aE9cvyEPsBQms= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777525950; c=relaxed/simple; bh=RNZjxD9qSzlQYCPG/rxlW4fW9s09NHA7kTALpik4/So=; h=Message-ID:Date:MIME-Version:Subject:To:Cc:References:From: In-Reply-To:Content-Type; b=Y+UaxKyCCjcrLb1elBP0CFm5hvyWaHE/2aNOvQn34mDlDc4FiqZrfDgSKuuVUs8Z678WA/QNvdQNGFS7XjKszUoF1zfguBHjettPKpnLF4nQxBbOgF2u3fyS3SBMWGQHEh4warAedo9hnnx8dODGStB4Bw+iRsRvYtWEYxGj32s= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=easystack.cn; spf=pass smtp.mailfrom=easystack.cn; arc=none smtp.client-ip=45.254.49.202 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=easystack.cn Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=easystack.cn Received: from [192.168.0.18] (unknown [218.94.118.90]) by smtp.qiye.163.com (Hmail) with ESMTP id 19931ff17; Thu, 30 Apr 2026 11:56:33 +0800 (GMT+08:00) Message-ID: Date: Thu, 30 Apr 2026 11:56:33 +0800 Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH v3 3/4] mm/page_owner: add NUMA node filter with nodelist support To: SeongJae Park Cc: akpm@linux-foundation.org, vbabka@kernel.org, surenb@google.com, mhocko@suse.com, jackmanb@google.com, hannes@cmpxchg.org, ziy@nvidia.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org References: <20260429145645.80675-1-sj@kernel.org> From: "zhen.ni" In-Reply-To: <20260429145645.80675-1-sj@kernel.org> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-HM-Tid: 0a9ddc88001b0229kunm611f0e101ee38e X-HM-MType: 1 X-HM-Spam-Status: e1kfGhgUHx5ZQUpXWQgPGg8OCBgUHx5ZQUlOS1dZFg8aDwILHllBWSg2Ly tZV1koWUFJQjdXWRgWCB1ZQUpXWS1ZQUlXWQ8JGhUIEh9ZQVkaSE5MVkpDT0lKS0wdGEoYTFYVFA kWGhdVGRETFhoSFyQUDg9ZV1kYEgtZQVlJSkNVQk9VSkpDVUJLWVdZFhoPEhUdFFlBWU9LSFVCQk lOS1VKS0tVSkJLQlkG 在 2026/4/29 22:56, SeongJae Park 写道: > On Wed, 29 Apr 2026 17:03:56 +0800 "zhen.ni" wrote: > >> >> >> 在 2026/4/29 09:28, SeongJae Park 写道: >>> On Tue, 28 Apr 2026 15:11:11 +0800 Zhen Ni wrote: > [...] >>>> @@ -685,6 +685,7 @@ read_page_owner(struct file *file, char __user *buf, size_t count, loff_t *ppos) >>>> struct page_ext *page_ext; >>>> struct page_owner *page_owner; >>>> depot_stack_handle_t handle; >>>> + nodemask_t mask; >>>> >>>> if (!static_branch_unlikely(&page_owner_inited)) >>>> return -EINVAL; >>>> @@ -698,6 +699,8 @@ read_page_owner(struct file *file, char __user *buf, size_t count, loff_t *ppos) >>>> while (!pfn_valid(pfn) && (pfn & (MAX_ORDER_NR_PAGES - 1)) != 0) >>>> pfn++; >>>> >>>> + mask = owner_filter.nid_mask; >>>> + >>> >>> READ_ONCE() was used for owner_filter.print_mode. Should nid_mask also read >>> using READ_ONCE()? >>> >> The reason is that `owner_filter.nid_mask` is a nodemask_t, which is a >> 128-byte structure. READ_ONCE() only supports types up to 8 bytes and >> will trigger a compile-time assertion failure for larger structures. >> >> This was actually an issue in v2 - the AI review tool (sashiko.dev) and >> Andrew both caught the compilation error with READ_ONCE/WRITE_ONCE on >> nodemask_t, so v3 removed them. > > Thank you for kindly sharing the context. Now I understand why READ_ONCE() > cannot be used. But, is plain load/store safe enough for nodemask_t? > Shouldn't it still be protected against races? > Concurrency Safety: I considered spinlock and RCU, but decided against them: - Spinlock: Adds overhead on every read, overkill for a debug facility - RCU: Requires dynamic allocation of 128-byte nodemask_t, too complex - READ_ONCE/WRITE_ONCE: Not possible, exceeds 8-byte limit Plain load/store is safe here because: 1. page_owner is debug code with low-frequency filter changes 2. Worst case of torn read is temporary inconsistency in debug output 3. Similar debugfs interfaces use the same approach The overhead of locking doesn't justify the benefit for this debug use case. Do you think this is acceptable, or would you prefer I add locking? > [...] >>>> +static ssize_t nid_filter_write(struct file *file, >>>> + const char __user *buf, >>>> + size_t count, loff_t *ppos) >>>> +{ >>>> + char *kbuf; >>>> + nodemask_t mask; >>>> + int ret; >>>> + int val; >>>> + >>>> + /* >>>> + * Limit input size to handle worst-case nodelist (all nodes). >>>> + * Worst case per node: ",NNNNN" (comma + 5-digit node number) = 6 bytes. >>>> + * Formula: 100 bytes overhead + 6 * MAX_NUMNODES >>>> + */ >>>> + if (count > (100 + 6 * MAX_NUMNODES)) >>>> + return -EINVAL; >>>> + >>>> + kbuf = kmalloc(count + 1, GFP_KERNEL); >>>> + if (!kbuf) >>>> + return -ENOMEM; >>>> + >>>> + if (copy_from_user(kbuf, buf, count)) { >>>> + ret = -EFAULT; >>>> + goto out_free; >>>> + } >>>> + kbuf[count] = '\0'; >>>> + >>>> + /* Support: "-1" to clear, or nodelist format like "0", "0,2", "0-3" */ >>>> + if (kstrtoint(kbuf, 10, &val) == 0 && val == -1) >>>> + nodes_clear(mask); >>>> + else if (nodelist_parse(kbuf, mask)) { >>>> + ret = -EINVAL; >>>> + goto out_free; >>>> + } >>> >>> Doesn't empty string input to nodelist_parse() clears the mask? Can't it be >>> reused? >>> >> Yes, empty input (echo > nid) works because nodelist_parse() handles it >> correctly. However, nodelist_parse() - which is implemented via >> bitmap_parselist() - cannot handle "-1" as it's not a valid range format >> and would return an error. The explicit "-1" check is necessary to >> support `echo "-1" > nid` without returning an error. >> >> So the "-1" check handles a case that nodelist_parse() cannot handle. > > Thank you for kindly explaining the reason. But, do we really need to support > "-1" input? Couldn't we just redefine the interface? > I chose "-1" to clearly differentiate from valid NUMA node IDs (0, 1, 2, 3...).Since node IDs are non-negative integers, "-1" naturally means "invalid" or "no filter", which is an intuitive convention in Linux (e.g., pid -1, signal -1). Do you have a better suggestion for how to represent "clear filter"? > [...] >>>> + debugfs_create_file("nid", 0600, filter_dir, NULL, >>>> + &nid_filter_fops); >>> >>> Why don't you use 'page_owner_' prefix like other fops, for consistency? >>> >> For consistency with the other file_operations >> in this module (page_owner_fops, page_owner_threshold_fops, >> page_owner_print_mode_fops), I'll rename nid_filter_fops to >> page_owner_nid_filter_fops. >> >> I'll incorporate these improvements in the next version. > > Thank you for kindly accepting my humble suggestions. > >> >> Thanks for the detailed review! > > Thank you for sharing this nice work, too! > > > Thanks, > SJ > > [...] > >