From: Roman Gushchin <roman.gushchin@linux.dev>
To: Dave Chinner
Cc: Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    Yang Shi, Kent Overstreet, Hillf Danton
Subject: Re: [PATCH v2 0/7] mm: introduce shrinker debugfs interface
Date: Tue, 26 Apr 2022 19:18:53 -0700
References: <20220422202644.799732-1-roman.gushchin@linux.dev>

On Wed, Apr 27, 2022 at 11:22:55AM +1000, Dave Chinner wrote:
> On Tue, Apr 26, 2022 at 09:41:34AM -0700, Roman Gushchin wrote:
> > Can you, please, summarize your position, because it's a bit unclear.
> > You made a lot of good points about some details (e.g. shrinker naming,
> > and I totally agree there; machines with hundreds of nodes, etc.), then
> > you said that active scanning is useless, and then you said the whole
> > thing is useless and we're fine with what we have regarding shrinker
> > debugging.
>
> Better introspection is the first thing we need. Work on improving
> that. I've been making suggestions to help improve introspection
> infrastructure.
>
> Before anything else, we need to improve introspection so we can
> gain better insight into the problems we have. Once we understand
> the problems better and have evidence to back up where the problems
> lie and we have a plan to solve them, then we can talk about whether
> we need other user accessible shrinker APIs.

Ok, at least we do agree here. This is exactly why I've started with
this debugfs stuff.

> For the moment, exposing shrinker control interfaces to userspace
> could potentially be very bad because it exposes internal
> architectural and implementation details to a user API. Just
> because it is in /sys/kernel/debug it doesn't mean applications
> won't start to use it and build dependencies on it.
>
> That doesn't mean I'm opposed to exposing a shrinker control
> mechanism to debugfs - I'm still on the fence on that one. However,
> I definitely think that an API that directly exposes the internal
> implementation to userspace is the wrong way to go about this.

Ok, if it's about having memcg-aware and other interfaces, I can agree
here as well. I actually made an attempt to unify memcg-aware and
system-wide shrinker scanning; it hasn't been very successful yet, but
it's definitely on my todo list. I'm pretty sure we're iterating over
and over some empty root-level shrinkers without benefiting from the
bitmap infrastructure which works for memory cgroups.
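
Roughly, the shape of the two paths in mm/vmscan.c is something like
this (heavily simplified fragments, not the exact code):

/*
 * memcg-aware path (shrink_slab_memcg()): only shrinkers whose bit is
 * set in the per-memcg bitmap are visited; a shrinker reporting
 * SHRINK_EMPTY gets its bit cleared and is skipped on subsequent
 * calls, until set_shrinker_bit() re-arms it.
 */
for_each_set_bit(i, info->map, shrinker_nr_max) {
	struct shrinker *shrinker = idr_find(&shrinker_idr, i);

	ret = do_shrink_slab(&sc, shrinker, priority);
	if (ret == SHRINK_EMPTY)
		clear_bit(i, info->map);
}

/*
 * root-level path (shrink_slab()): every shrinker on the global list
 * is called, whether it has anything to reclaim or not.
 */
list_for_each_entry(shrinker, &shrinker_list, list)
	ret = do_shrink_slab(&sc, shrinker, priority);

That's what I mean by the root-level scan not benefiting from the
bitmap infrastructure.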
> Fine grained shrinker control is not necessary to improve shrinker
> introspection and OOM debugging capability, so if you want/need
> control interfaces then I think you should separate those out into a
> separate line of development where it doesn't derail the discussion
> on how to improve shrinker/OOM introspection.

Ok, no problems here. Btw, the OOM debugging is a separate topic brought
in by Kent; I'd keep it separate too, as it comes with many OOM-specific
complications.

From your other email:

> So, yeah, you need to think about how to do fine-grained access to
> shrinker stats effectively. That might require a complete change of
> presentation API. For example, changing the filesystem layout to be
> memcg centric rather than shrinker instance centric would make an
> awful lot of this file parsing problem go away.
>
> e.g.:
>
> /sys/kernel/debug/mm/memcg/<memcg>/shrinker/<shrinker>/stats

The problem with this approach (I thought about it) is that it comes
with a high memory overhead, especially on machines with thousands of
cgroups and mount points. And besides the memory overhead, it's really
expensive to collect system-wide data and get the big picture, as it
requires opening and reading thousands of files (see the P.S. below
for a rough illustration).

Actually, you wrote recently:

"I've thought about it, too, and can see where it could be useful.
However, when I consider the list_lru memcg integration, I suspect
it becomes a "can't see the forest for the trees" problem. We're
going to end up with millions of sysfs objects with no obvious way
to navigate, iterate or search them if we just take the naive
"sysfs object + stats per list_lru instance" approach."

It all makes me think we need both: a way to iterate over all memcgs
and dump all the numbers at once, and a way to get a specific per-memcg
(per-node) count.

Thanks!
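
P.S. To illustrate the file-parsing cost of the memcg-centric layout:
below is a rough userspace sketch that aggregates a system-wide total
over your example layout. The directory names and the single-number
"stats" format are assumptions (that interface doesn't exist); the
point is only that it has to open and read one file per
(memcg, shrinker) pair, so with thousands of cgroups and mount points
that's millions of syscalls to get a single big-picture number.

/* hypothetical aggregator for
 * /sys/kernel/debug/mm/memcg/<memcg>/shrinker/<shrinker>/stats */
#include <dirent.h>
#include <stdio.h>

#define BASE "/sys/kernel/debug/mm/memcg"

int main(void)
{
	DIR *memcgs = opendir(BASE);
	struct dirent *m, *s;
	unsigned long long total = 0, files = 0;
	char path[4096];

	if (!memcgs)
		return 1;

	while ((m = readdir(memcgs))) {
		if (m->d_name[0] == '.')
			continue;
		snprintf(path, sizeof(path), BASE "/%s/shrinker", m->d_name);
		DIR *shrinkers = opendir(path);
		if (!shrinkers)
			continue;
		while ((s = readdir(shrinkers))) {
			if (s->d_name[0] == '.')
				continue;
			snprintf(path, sizeof(path), BASE "/%s/shrinker/%s/stats",
				 m->d_name, s->d_name);
			/* one open + one read per (memcg, shrinker) pair */
			FILE *f = fopen(path, "r");
			unsigned long long count;

			if (f) {
				if (fscanf(f, "%llu", &count) == 1)
					total += count;
				fclose(f);
				files++;
			}
		}
		closedir(shrinkers);
	}
	closedir(memcgs);

	printf("opened %llu files, total objects: %llu\n", files, total);
	return 0;
}

With a shrinker-centric file that dumps the per-memcg (per-node) counts
in one go, the same big picture is a single open and a sequential read,
and the specific per-memcg lookup can still be served by a
finer-grained file.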