From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID:
Date: Fri, 13 Feb 2026 11:56:15 -0800
Precedence: bulk
X-Mailing-List: cgroups@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: [PATCH 1/2] mm/mempolicy: track page allocations per mempolicy
To: Vlastimil Babka , linux-mm@kvack.org
Cc: apopple@nvidia.com, akpm@linux-foundation.org, axelrasmussen@google.com,
 byungchul@sk.com, cgroups@vger.kernel.org, david@kernel.org,
 eperezma@redhat.com, gourry@gourry.net, jasowang@redhat.com,
 hannes@cmpxchg.org, joshua.hahnjy@gmail.com, Liam.Howlett@oracle.com,
 linux-kernel@vger.kernel.org, lorenzo.stoakes@oracle.com,
 matthew.brost@intel.com, mst@redhat.com, mhocko@suse.com, rppt@kernel.org,
 muchun.song@linux.dev, zhengqi.arch@bytedance.com, rakie.kim@sk.com,
 roman.gushchin@linux.dev, shakeel.butt@linux.dev, surenb@google.com,
 virtualization@lists.linux.dev, weixugc@google.com,
 xuanzhuo@linux.alibaba.com, ying.huang@linux.alibaba.com,
 yuanchu@google.com, ziy@nvidia.com, kernel-team@meta.com
References: <20260212045109.255391-1-inwardvessel@gmail.com>
 <20260212045109.255391-2-inwardvessel@gmail.com>
 <96b63efb-551f-4dd5-b4a2-ac67da577431@suse.cz>
Content-Language: en-US
From: "JP Kobryn (Meta)"
In-Reply-To:
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit

On 2/13/26 12:54 AM, Vlastimil Babka wrote:
> On 2/12/26 22:25, JP Kobryn wrote:
>> On 2/12/26 7:24 AM, Vlastimil Babka wrote:
>>> On 2/12/26 05:51, JP Kobryn
wrote:
>>>> It would be useful to see a breakdown of allocations to understand
>>>> which NUMA policies are driving them. For example, when investigating
>>>> memory pressure, having policy-specific counts could show that
>>>> allocations were bound to the affected node (via MPOL_BIND).
>>>>
>>>> Add per-policy page allocation counters as new node stat items. These
>>>> counters can provide correlation between a mempolicy and pressure on
>>>> a given node.
>>>>
>>>> Signed-off-by: JP Kobryn
>>>> Suggested-by: Johannes Weiner
>>>
>>> Are the numa_{hit,miss,etc.} counters insufficient? Could they be
>>> extended in a way that would capture any missing important details? A
>>> counter per policy type seems exhaustive, but on one hand it might not
>>> be important to distinguish between some of them, and on the other
>>> hand it doesn't track the nodemask anyway.
>>
>> The two patches of the series should complement each other. When
>> investigating memory pressure, we could identify the affected nodes
>> (patch 2). Then we can cross-reference the policy-specific stats to
>> find any correlation (this patch).
>>
>> I think extending the numa_* counters would call for more permutations
>> to account for the numa stat per policy. I think distinguishing between
>> MPOL_DEFAULT and MPOL_BIND is meaningful, for example. Am I
>
> Are there other useful examples or would it be enough to add e.g. a
> numa_bind counter to the numa_hit/miss/etc?

Aside from bind, it's worth emphasizing that with default-policy tracking
we could see whether the local node is the source of pressure. In the
interleave case, we could see whether loads are being balanced across
nodes or, in the weighted case, distributed in the intended proportions.

On extending the numa stats instead: I looked into this some more, and
I'm not sure they're a good fit. They seem to be more about whether the
allocator succeeded at placement than about which policy drove the
allocation. Thoughts?
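As a rough illustration of the interleave point above (counter names and
values here are hypothetical, not the patch's actual stat item names): with
per-node, per-policy counts in hand, checking whether interleaving is
spreading allocations evenly is a one-liner.

```python
def interleave_imbalance(per_node_counts):
    """Max relative deviation from a perfectly even spread across nodes.

    per_node_counts: dict mapping node id -> pages allocated under
    MPOL_INTERLEAVE on that node (illustrative data, not real counters).
    """
    total = sum(per_node_counts.values())
    if total == 0:
        return 0.0
    expected = total / len(per_node_counts)
    return max(abs(c - expected) / expected for c in per_node_counts.values())

# Hypothetical sample: a well-balanced 4-node interleave.
counts = {0: 50_000, 1: 50_200, 2: 49_900, 3: 49_900}
print(f"imbalance: {interleave_imbalance(counts):.3f}")  # imbalance: 0.004
```

A value near zero suggests the interleave (or weighted interleave) policy
is distributing load as intended; a large value points at skew worth
investigating.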
> What I'm trying to say is that the level of detail you are trying to add
> to the always-on counters seems more suitable for tracepoints. The
> counters should be limited to what's known to be useful, not "everything
> we are able to track and possibly could need one day".

In a triage scenario, having the stats already collected up to the time of
the reported issue is better. We use the tool below[0], which periodically
samples the system and lets us view the historical state leading up to an
issue. If we only attached tracepoints at the time of the incident, it
would be too late.

The triage workflow would look like this:

1) Pressure/OOMs reported while system-wide memory is free.
2) Check per-node pgscan/pgsteal stats (provided by patch 2) to narrow
   down the node(s) under pressure.
3) Check the per-policy allocation counters (this patch) on that node to
   find which policy was driving it.

[0] https://github.com/facebookincubator/below
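A minimal sketch of steps 2 and 3 of that workflow, assuming sampled
per-node pgscan stats and per-policy allocation counters are available
(the counter names pgalloc_default/pgalloc_bind/pgalloc_interleave are
assumptions for illustration, not the series' actual names):

```python
def find_pressured_node(pgscan_by_node):
    """Step 2: pick the node with the highest reclaim scan activity."""
    return max(pgscan_by_node, key=pgscan_by_node.get)

def dominant_policy(alloc_by_policy):
    """Step 3: pick the mempolicy that drove the most allocations."""
    return max(alloc_by_policy, key=alloc_by_policy.get)

# Hypothetical sampled data: node 1 is scanning heavily, and most of its
# allocations came in under MPOL_BIND.
pgscan = {0: 1_200, 1: 980_000, 2: 3_400}
per_policy = {
    1: {"pgalloc_default": 10_000,
        "pgalloc_bind": 750_000,
        "pgalloc_interleave": 2_000},
}

node = find_pressured_node(pgscan)
print(node, dominant_policy(per_policy[node]))  # 1 pgalloc_bind
```

With periodic sampling (as below does), the same comparison can be run
against the snapshot taken just before the incident rather than live data.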