Date: Tue, 28 Apr 2026 07:15:47 -0700
From: Andrew Morton
To: Zhen Ni
Cc: vbabka@kernel.org, surenb@google.com, mhocko@suse.com, jackmanb@google.com, hannes@cmpxchg.org, ziy@nvidia.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3 0/4] mm/page_owner: add filter infrastructure for print_mode and NUMA filtering
Message-Id: <20260428071547.790d37de2e13716717abf022@linux-foundation.org>
In-Reply-To: <20260428071112.1420380-1-zhen.ni@easystack.cn>
References: <20260428071112.1420380-1-zhen.ni@easystack.cn>

On Tue, 28 Apr 2026 15:11:08 +0800 Zhen Ni wrote:

> This patch series introduces filtering capabilities to the page_owner
> feature to address storage and performance challenges in production
> environments.

Thanks, I updated mm.git's mm-new branch to this version.

> Changes from v2:
> - Remove READ_ONCE/WRITE_ONCE for nodemask_t (fixes compilation errors)
>   * nodemask_t is a large structure (128 bytes) that triggers compile-time asserts
>   * Direct assignment is safe for this use case
> - Add comment explaining input length calculation formula
>   * 6 bytes = ",NNNNN" (comma + 5-digit node number)
> - Simplify "-1" check using kstrtoint() instead of dual strcmp()
> - Move nodemask_t mask read outside PFN iteration loop for performance
>   * Avoids 128-byte structure copy on each iteration
> - Add documentation for filter features (patch 4/4)

Here's how v3 altered mm.git:


 Documentation/mm/page_owner.rst |   55 +++++++++++++++++++++++++++++-
 mm/page_owner.c                 |   14 +++++--
 2 files changed, 64 insertions(+), 5 deletions(-)

--- a/Documentation/mm/page_owner.rst~b
+++ a/Documentation/mm/page_owner.rst
@@ -74,7 +74,17 @@ Usage
 
 3) Do the job that you want to debug.
 
-4) Analyze information from page owner::
+4) (Optional) Use filters to focus on specific memory allocations::
+
+	cd /sys/kernel/debug/page_owner_filter
+
+	# Print only stack handles instead of full traces
+	echo 1 > print_mode
+
+	# Filter by NUMA nodes
+	echo "0,2-3" > nid
+
+5) Analyze information from page owner::
 
 	cat /sys/kernel/debug/page_owner_stacks/show_stacks > stacks.txt
 	cat stacks.txt
@@ -238,6 +248,49 @@ Usage
 	./page_owner_sort --tgid=1,2,3
 	./page_owner_sort --name name1,name2
 
+Page Owner Filters
+==================
+
+The page_owner feature provides filtering capabilities to focus on specific
+memory allocations (e.g., by NUMA node). Filters are controlled through debugfs
+files in ``/sys/kernel/debug/page_owner_filter/``.
+
+Print Mode Filter
+-----------------
+
+The ``print_mode`` file controls the level of detail in stack trace output.
+
+Available modes:
+
+- ``0`` (default): Print full stack traces
+- ``1``: Print only stack handles
+
+The ``print_mode=1`` output format::
+
+	Page allocated via order 0, mask 0x42800(GFP_NOWAIT|__GFP_COMP),
+	pid 1, tgid 1 (systemd), ts 349667370 ns
+	PFN 0xa00a2 type Unmovable Block 1280 type Unmovable
+	Flags 0x33fffe0000004124(...)
+	handle: 17432583
+
+To retrieve the full stack trace for a handle, use::
+
+	cat /sys/kernel/debug/page_owner_stacks/show_stacks_handles
+
+NUMA Node Filter
+----------------
+
+The ``nid`` file filters pages by NUMA node. This is useful for NUMA-aware
+environments to analyze node-specific memory allocation.
+
+Supported input formats:
+
+- Single node: ``echo "2" > nid``
+- Multiple nodes: ``echo "0,2,3" > nid``
+- Node range: ``echo "0-3" > nid``
+- Mixed format: ``echo "0,2-4,7" > nid``
+- Disable filter: ``echo "-1" > nid``
+
 STANDARD FORMAT SPECIFIERS
 ==========================
 ::
--- a/mm/page_owner.c~b
+++ a/mm/page_owner.c
@@ -685,6 +685,7 @@ read_page_owner(struct file *file, char
 	struct page_ext *page_ext;
 	struct page_owner *page_owner;
 	depot_stack_handle_t handle;
+	nodemask_t mask;
 
 	if (!static_branch_unlikely(&page_owner_inited))
 		return -EINVAL;
@@ -698,6 +699,8 @@ read_page_owner(struct file *file, char
 	while (!pfn_valid(pfn) && (pfn & (MAX_ORDER_NR_PAGES - 1)) != 0)
 		pfn++;
 
+	mask = owner_filter.nid_mask;
+
 	/* Find an allocated page */
 	for (; pfn < max_pfn; pfn++) {
 		/*
@@ -707,7 +710,6 @@ read_page_owner(struct file *file, char
 		 * user through copy_to_user() or GFP_KERNEL allocations.
 		 */
 		struct page_owner page_owner_tmp;
-		nodemask_t mask;
 
 		/*
 		 * If the new page is in a new MAX_ORDER_NR_PAGES area,
@@ -732,7 +734,6 @@ read_page_owner(struct file *file, char
 			continue;
 
 		/* NUMA node filter using bitmask */
-		mask = owner_filter.nid_mask;
 		if (!nodes_empty(mask)) {
 			int nid = page_to_nid(page);
 
@@ -1026,8 +1027,13 @@ static ssize_t nid_filter_write(struct f
 	char *kbuf;
 	nodemask_t mask;
 	int ret;
+	int val;
 
-	/* Limit input size to handle worst-case nodelist (all nodes) */
+	/*
+	 * Limit input size to handle worst-case nodelist (all nodes).
+	 * Worst case per node: ",NNNNN" (comma + 5-digit node number) = 6 bytes.
+	 * Formula: 100 bytes overhead + 6 * MAX_NUMNODES
+	 */
 	if (count > (100 + 6 * MAX_NUMNODES))
 		return -EINVAL;
 
@@ -1042,7 +1048,7 @@ static ssize_t nid_filter_write(struct f
 	kbuf[count] = '\0';
 
 	/* Support: "-1" to clear, or nodelist format like "0", "0,2", "0-3" */
-	if (strcmp(kbuf, "-1\n") == 0 || strcmp(kbuf, "-1") == 0)
+	if (kstrtoint(kbuf, 10, &val) == 0 && val == -1)
 		nodes_clear(mask);
 	else if (nodelist_parse(kbuf, mask)) {
 		ret = -EINVAL;
_
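To make the nid filter semantics in this series concrete, here is a minimal userspace sketch in C. It assumes node IDs fit in a 64-bit mask (SKETCH_MAX_NODES = 64) in place of the kernel's nodemask_t, and parse_nid_filter(), page_passes_filter(), count_matching(), nid_filter_max_input() and struct sketch_filter are hypothetical names, not kernel APIs. It models the documented inputs ("-1" to clear, or a nodelist such as "0,2-4,7"), treats an empty mask as "no filter" like the read_page_owner() hunk, copies the mask once before the scan loop as v3 does, and evaluates the 100 + 6 * MAX_NUMNODES input bound from the patch comment. The kernel's nodelist_parse() and kstrtoint() may accept or reject edge cases differently (kstrtoint() may, for example, accept other spellings of -1).

```c
/*
 * Userspace sketch of the page_owner nid filter semantics.
 * Assumption: node IDs fit in a 64-bit mask instead of nodemask_t.
 * All names here are hypothetical, not kernel APIs.
 */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_MAX_NODES 64

/* Hypothetical stand-in for the kernel's owner_filter; 0 = no filter */
struct sketch_filter {
	uint64_t nid_mask;
};

/*
 * Parse "-1" (clear the filter) or a nodelist like "0,2-4,7" into *mask.
 * A single trailing newline, as written by echo, is accepted.
 * Returns 0 on success, -1 on malformed or out-of-range input.
 */
static int parse_nid_filter(const char *buf, uint64_t *mask)
{
	const char *p = buf;
	uint64_t m = 0;

	if (strcmp(buf, "-1") == 0 || strcmp(buf, "-1\n") == 0) {
		*mask = 0;
		return 0;
	}

	for (;;) {
		char *end;
		long lo, hi;

		if (*p < '0' || *p > '9')
			return -1;		/* each element starts with a digit */
		lo = hi = strtol(p, &end, 10);
		p = end;
		if (*p == '-') {		/* range "lo-hi" */
			p++;
			if (*p < '0' || *p > '9')
				return -1;
			hi = strtol(p, &end, 10);
			p = end;
		}
		if (lo > hi || hi >= SKETCH_MAX_NODES)
			return -1;
		for (long n = lo; n <= hi; n++)
			m |= UINT64_C(1) << n;

		if (*p == ',') {		/* another element follows */
			p++;
			continue;
		}
		if (*p == '\0' || (*p == '\n' && p[1] == '\0'))
			break;			/* end of input */
		return -1;			/* unexpected character */
	}
	*mask = m;
	return 0;
}

/* An empty mask means "no filter", as in the read_page_owner() hunk */
static bool page_passes_filter(uint64_t mask, int nid)
{
	if (mask == 0)
		return true;
	if (nid < 0 || nid >= SKETCH_MAX_NODES)
		return false;
	return (mask >> nid) & 1;
}

/* Count matching pages, copying the mask once before the loop (as v3 does) */
static size_t count_matching(const struct sketch_filter *f,
			     const int *page_nids, size_t npages)
{
	uint64_t mask = f->nid_mask;
	size_t hits = 0;

	for (size_t i = 0; i < npages; i++)
		if (page_passes_filter(mask, page_nids[i]))
			hits++;
	return hits;
}

/* Input-length bound from the patch comment: 100 + 6 * MAX_NUMNODES */
static size_t nid_filter_max_input(size_t max_numnodes)
{
	return 100 + 6 * max_numnodes;
}
```

With these definitions, parse_nid_filter("0,2-4,7\n", &m) sets m to 0x9d (nodes 0, 2, 3, 4 and 7), page_passes_filter(m, 5) is false, and nid_filter_max_input(1024) returns 6244 (100 + 6 * 1024).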