From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <977dc43d-622c-411d-99a6-4204fa26c21e@linux.dev>
Date: Sun, 8 Mar 2026 21:31:27 -0700
Subject: Re: [PATCH v2] mm/mempolicy: track page allocations per mempolicy
From: "JP Kobryn (Meta)"
To: "Huang, Ying"
Cc: linux-mm@kvack.org, akpm@linux-foundation.org, mhocko@suse.com,
 vbabka@suse.cz, apopple@nvidia.com, axelrasmussen@google.com,
 byungchul@sk.com, cgroups@vger.kernel.org, david@kernel.org,
 eperezma@redhat.com, gourry@gourry.net, jasowang@redhat.com,
 hannes@cmpxchg.org, joshua.hahnjy@gmail.com, Liam.Howlett@oracle.com,
 linux-kernel@vger.kernel.org, lorenzo.stoakes@oracle.com,
 matthew.brost@intel.com, mst@redhat.com, rppt@kernel.org,
 muchun.song@linux.dev, zhengqi.arch@bytedance.com, rakie.kim@sk.com,
 roman.gushchin@linux.dev, shakeel.butt@linux.dev, surenb@google.com,
 virtualization@lists.linux.dev, weixugc@google.com,
 xuanzhuo@linux.alibaba.com, yuanchu@google.com, ziy@nvidia.com,
 kernel-team@meta.com
References: <20260307045520.247998-1-jp.kobryn@linux.dev>
 <87seabu8np.fsf@DESKTOP-5N7EMDA>
In-Reply-To: <87seabu8np.fsf@DESKTOP-5N7EMDA>

On 3/7/26 4:27 AM, Huang, Ying wrote:
> "JP Kobryn (Meta)" writes:
>
>> When investigating pressure on a NUMA node, there is no
>> straightforward way to determine which policies are driving
>> allocations to it.
>>
>> Add per-policy page allocation counters as new node stat items.
>> These counters track allocations to nodes and also whether the
>> allocations were intentional or fallbacks.
>>
>> The new stats follow the existing numa hit/miss/foreign style and
>> have the following meanings:
>>
>> hit
>>   - for BIND and PREFERRED_MANY, allocation succeeded on a node in
>>     the nodemask
>>   - for other policies, allocation succeeded on the intended node
>>   - counted on the node of the allocation
>> miss
>>   - allocation intended for another node, but happened on this one
>>   - counted on the other node
>> foreign
>>   - allocation intended for this node, but happened on another node
>>   - counted on this node
>>
>> Counters are exposed per-memcg, per-node in memory.numa_stat and
>> globally in /proc/vmstat.
>
> IMHO, it may be better to describe your workflow as an example to use
> the newly added statistics. That can describe why we need them. For
> example, what you have described in
>
> https://lore.kernel.org/linux-mm/9ae80317-f005-474c-9da1-95462138f3c6@gmail.com/
>
>> 1) Pressure/OOMs reported while system-wide memory is free.
>> 2) Check per-node pgscan/pgsteal stats (provided by patch 2) to
>>    narrow down node(s) under pressure. They become available in
>>    /sys/devices/system/node/nodeN/vmstat.
>> 3) Check per-policy allocation counters (this patch) on that node to
>>    find what policy was driving it. Same readout at nodeN/vmstat.
>> 4) Now use /proc/*/numa_maps to identify tasks using the policy.
>

Good call. I'll add a workflow adapted for the current approach in the
next revision. I included it in another response in this thread, but
I'll repeat it here because it will make it easier to answer your
question below.

1) Pressure/OOMs reported while system-wide memory is free.
2) Check /proc/zoneinfo or per-node stats in .../nodeN/vmstat to
   narrow down the node(s) under pressure.
3) Check the per-policy hit/miss/foreign counters (added by this
   patch) on those node(s) to see which policy is driving allocations
   there (intentional vs. fallback).
4) Use /proc/*/numa_maps to identify tasks using the policy.

> One question. If we have to search /proc/*/numa_maps, why can't we
> find all necessary information via /proc/*/numa_maps? For example,
> which VMA uses the most pages on the node? Which policy is used in
> the VMA? ...
>

There's a gap in the flow of information if we go straight from the
node in question to numa_maps. Without step 3 above, we can't
distinguish whether pages landed there intentionally, as a fallback,
or were migrated there sometime after the allocation. The new counters
record the result of each allocation at the time it happens,
preserving that information regardless of what may happen to the pages
later on.
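In case it helps, here is a rough shell sketch of step 3. The mpol_*
counter names in the sample data are placeholders of my own (the quoted
description only pins down the hit/miss/foreign naming style), so
substitute whatever names actually land in nodeN/vmstat:

```shell
# Sketch of step 3: pull only the per-policy allocation counters out of
# a "name value" dump such as /sys/devices/system/node/nodeN/vmstat.
# The mpol_* names below are hypothetical, not what the patch exports.
filter_policy_stats() {
    grep -E '^mpol_[a-z_]+_(hit|miss|foreign) '
}

# Demo on fabricated sample data; on a real system you would run
#   filter_policy_stats < /sys/devices/system/node/node1/vmstat
printf '%s\n' \
    'numa_hit 123456' \
    'mpol_bind_hit 104857' \
    'mpol_bind_miss 9812' \
    'mpol_preferred_foreign 77' |
    filter_policy_stats
```

The point of the filter is just to separate the new per-policy counters
from the pre-existing numa_hit/numa_miss/numa_foreign lines when eyeballing
a node under pressure.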