From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <35272f7a-4c1e-48f8-8e99-82bf3baffab3@linux.dev>
Date: Tue, 10 Mar 2026 10:01:53 -0700
X-Mailing-List: virtualization@lists.linux.dev
Subject: Re: [PATCH v2] mm/mempolicy: track page allocations per mempolicy
To: Shakeel Butt
Cc: linux-mm@kvack.org, akpm@linux-foundation.org, mhocko@suse.com,
 vbabka@suse.cz, apopple@nvidia.com, axelrasmussen@google.com,
 byungchul@sk.com, cgroups@vger.kernel.org, david@kernel.org,
 eperezma@redhat.com, gourry@gourry.net, jasowang@redhat.com,
 hannes@cmpxchg.org, joshua.hahnjy@gmail.com, Liam.Howlett@oracle.com,
 linux-kernel@vger.kernel.org, lorenzo.stoakes@oracle.com,
 matthew.brost@intel.com, mst@redhat.com, rppt@kernel.org,
 muchun.song@linux.dev, zhengqi.arch@bytedance.com, rakie.kim@sk.com,
 roman.gushchin@linux.dev, surenb@google.com, virtualization@lists.linux.dev,
 weixugc@google.com, xuanzhuo@linux.alibaba.com, ying.huang@linux.alibaba.com,
 yuanchu@google.com, ziy@nvidia.com, kernel-team@meta.com
References: <20260307045520.247998-1-jp.kobryn@linux.dev>
From: "JP Kobryn (Meta)"

On 3/10/26 7:53 AM, Shakeel Butt wrote:
> On Mon, Mar 09, 2026 at 09:17:43PM -0700, JP Kobryn (Meta) wrote:
>> On 3/9/26 4:43 PM, Shakeel Butt wrote:
>>> On Fri, Mar 06, 2026 at 08:55:20PM -0800, JP Kobryn (Meta) wrote:
> [...]
>>>
>>> These seem like monotonically increasing metrics, and I think you
>>> don't care about their absolute value but rather the rate of change.
>>> Any reason this cannot be achieved through a tracepoints and BPF
>>> combination?
>>
>> We have the per-node reclaim stats (pg{steal,scan,refill}) in
>> nodeN/vmstat and memory.numa_stat now. The new stats in this patch
>> would be collected from the same source. They were meant to be used
>> together, so it seemed like a reasonable location. I think the
>> advantage over tracepoints is that we get the observability from the
>> start, and it would be simple to extend existing programs that
>> already read stats from the cgroup dir files.
>
> Convenience does not really justify the cost of adding 18 counters,
> particularly in memcg. We can argue about adding these as system-level
> metrics only, but not for memcg.
>
> counter_cost = nr_cpus * nr_nodes * nr_memcg * 16 (struct lruvec_stats_percpu)
>
> On a typical prod machine, we can see 1000s of memcgs, 100s of cpus,
> and a couple of NUMA nodes. So a single counter's cost can range from
> 200KiB to MiBs. This does not seem like a cost we should force
> everyone to pay.
>
> If you really want these per-memcg, and assuming these metrics are
> updated in a non-performance-critical path, we can try to decouple
> these and other reclaim-related stats from the rstat infra. That would
> at least reduce the nr_cpus factor in the above equation to 1. Though
> we will need to actually evaluate the performance of the change before
> committing to it.

I could trade off the per-cgroup granularity and change these stats to
become global per-node stats.
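[Editorial note: the counter_cost arithmetic quoted above can be sanity-checked with a short back-of-the-envelope sketch. The 16-byte entry size is the struct lruvec_stats_percpu figure cited in the thread; the two machine shapes below are illustrative assumptions chosen to bracket the "200KiB to MiBs" range, not measurements.]

```python
# Back-of-the-envelope cost of ONE per-memcg, per-node, per-cpu counter.
# BYTES_PER_ENTRY is the 16-byte struct lruvec_stats_percpu entry size
# cited in the thread; the host shapes are illustrative assumptions.

BYTES_PER_ENTRY = 16

def counter_cost(nr_cpus: int, nr_nodes: int, nr_memcg: int) -> int:
    """Bytes consumed by a single counter across all cpus/nodes/memcgs."""
    return nr_cpus * nr_nodes * nr_memcg * BYTES_PER_ENTRY

# A smaller host: 64 cpus, 1 NUMA node, 200 memcgs -> exactly 200 KiB.
small = counter_cost(64, 1, 200)

# A larger host: 256 cpus, 2 NUMA nodes, 2000 memcgs -> 15.625 MiB.
large = counter_cost(256, 2, 2000)

print(f"small host: {small / 1024:.0f} KiB per counter")
print(f"large host: {large / (1024 * 1024):.3f} MiB per counter")
```

Multiplying either figure by the 18 proposed counters shows why the per-memcg cost was the sticking point, and why dropping the nr_memcg (global per-node stats) or nr_cpus (decoupling from rstat) factor changes the picture.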