Date: Sun, 24 Aug 2025 23:19:28 +0900
From: YoungJun Park <youngjun.park@lge.com>
To: Chris Li
Cc: Michal Koutný, akpm@linux-foundation.org, hannes@cmpxchg.org,
    mhocko@kernel.org, roman.gushchin@linux.dev, shakeel.butt@linux.dev,
    muchun.song@linux.dev, shikemeng@huaweicloud.com, kasong@tencent.com,
    nphamcs@gmail.com, bhe@redhat.com, baohua@kernel.org,
    cgroups@vger.kernel.org, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, gunho.lee@lge.com,
    iamjoonsoo.kim@lge.com, taejoon.song@lge.com, Matthew Wilcox,
    David Hildenbrand, Kairui Song
Subject: Re: [PATCH 1/4] mm/swap, memcg: Introduce infrastructure for cgroup-based swap priority

> How do you express the default tier who shall not name? There are
> actually 3 states associated with default. It is not binary.
> 1) default not specified: look up parent chain for default.
> 2) default specified as on. Override parent default.
> 3) default specified as off. Override parent default.

As I understand, your intention is to define inheritance semantics depending
on the default value, and allow children to override this freely with
`-` and `+` semantics. Is that correct?

When I originally proposed the swap cgroup priority mechanism,
Michal Koutný commented that it is unnatural for cgroups if a parent
attribute is not inherited by its child:
(https://lore.kernel.org/linux-mm/rivwhhhkuqy7p4r6mmuhpheaj3c7vcw4w4kavp42avpz7es5vp@hbnvrmgzb5tr/)

Therefore, my current thinking is:

* The global swap setting itself is tier 1 (if nothing is configured).
* If a cgroup has no setting:
    - Top-level cgroups follow the global swap.
    - Child cgroups follow their parent’s setting.
* If a cgroup has its own setting, that setting is applied.
  (child cgroups can only select tiers that the parent has allowed.)

This seems natural because most cgroup resource distribution mechanisms
follow a subset inheritance model.

Thus, in my concept, there is no notion of a “default” value that controls
inheritance.

> How are you going to store the list of ranges? Just a bitmask integer
> or a list?

They can be represented as increasing integers, up to 32, and stored as a
bitmask.

> I feel the tier name is more readable. The number to which actual
> device mapping is non trivial to track for humans.

Using increasing integers has three advantages in my view: it makes it
simpler for the kernel to accept a uniform interface format, it is identical
to the existing cpuset interface, and it expresses the meaning of
“tiers of swap by speed hierarchy” more clearly.

So my feeling is still that this approach is clearer both in terms of
implementation and conceptual expression. I would appreciate it if you could
reconsider it once more. If after reconsideration you still prefer your
direction, I will follow your decision.

> I want to add another usage case into consideration. The swap.tiers
> does not have to be per cgroup. It can be per VMA. [...]

I understand this as a potential extension use case for swap.tiers.
I will keep this in mind when implementing. If I have further ideas here,
I will share them for discussion.

> Sounds fine. Maybe we can have "ssd:100 zswap:40 hdd" [...]

Yes, this alignment looks good to me!

> Can you elaborate on that. Just brainstorming, can we keep the
> swap.tiers and assign NUMA autobind range to tier as well? [...]

That is actually the same idea I had in mind for the NUMA use case.
However, I doubt there is any real workload using this in practice, so I
thought it may be better to leave it out for now. If NUMA autobind is truly
needed later, it could be implemented then. This point can also be revisited
during review or patch writing, so I will keep thinking about it.

> I feel that that has the risk of premature optimization. I suggest
> just going with the simplest bitmask check first then optimize as
> follow up when needed. [...]

Yes, I agree with you. Starting with the bitmask implementation seems to be
the right approach.

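For illustration only, here is a minimal userspace sketch of how the bitmask
storage and the subset inheritance rules described above might fit together.
Everything in it is an assumption made for the example, not existing kernel
code: the names `struct tier_cgroup`, `tier_bit`, `tier_effective`,
`tier_set` and `tier_allowed` are hypothetical, tiers are assumed to be
numbered 1..32 (tier 1 in bit 0), and the global setting is passed in as a
plain mask. Rejecting a non-subset mask at write time and additionally
clamping at lookup time is one possible policy, chosen here only to keep the
example short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SWAP_TIER_MAX 32u	/* assumed limit: one bit per tier */

/* Hypothetical per-cgroup state; not an existing kernel structure. */
struct tier_cgroup {
	const struct tier_cgroup *parent;	/* NULL for a top-level cgroup */
	bool has_setting;			/* false: nothing configured here */
	uint32_t mask;				/* configured tiers, valid if has_setting */
};

/* Bit for tier n, with tiers numbered 1..32 (tier 1 -> bit 0). */
static uint32_t tier_bit(unsigned int tier)
{
	assert(tier >= 1 && tier <= SWAP_TIER_MAX);
	return UINT32_C(1) << (tier - 1);
}

/*
 * Effective tiers of a cgroup. A cgroup without its own setting follows
 * its parent (or the global mask at the top level). A configured cgroup
 * is clamped to what its parent currently allows.
 */
static uint32_t tier_effective(const struct tier_cgroup *cg, uint32_t global_mask)
{
	uint32_t allowed = cg->parent ?
		tier_effective(cg->parent, global_mask) : global_mask;

	return cg->has_setting ? (cg->mask & allowed) : allowed;
}

/* Configure a cgroup; refuse tiers that the parent does not allow. */
static bool tier_set(struct tier_cgroup *cg, uint32_t mask, uint32_t global_mask)
{
	uint32_t allowed = cg->parent ?
		tier_effective(cg->parent, global_mask) : global_mask;

	if (mask & ~allowed)
		return false;
	cg->mask = mask;
	cg->has_setting = true;
	return true;
}

/* The simple bitmask check: may this cgroup swap to the given tier? */
static bool tier_allowed(const struct tier_cgroup *cg, unsigned int tier,
			 uint32_t global_mask)
{
	return (tier_effective(cg, global_mask) & tier_bit(tier)) != 0;
}
```

With a global mask covering tiers 1 to 3, an unconfigured child simply sees
all three tiers. Once the top-level cgroup is restricted to tiers 1 and 2,
the child follows that, and an attempt to give the child tier 3 is refused.
The recursive walk up the parent chain is the unoptimized form; caching the
effective mask per cgroup would be a possible follow-up if needed.
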
By the way, while thinking about possible implementation, I would like to ask
your opinion on the following situation:

Suppose a tier has already been defined and cgroups are configured to use it.
Should we allow the tier definition itself to be modified afterwards?

There seem to be two possible choices:

1. Once a cgroup references a tier, modifying that tier should be disallowed.
2. Allow tier re-definition even if cgroups are already referencing it.

Personally, I prefer option (1), since it avoids unexpected changes for
cgroups that already rely on a particular tier definition.

What is your opinion on this?

Best Regards,
Youngjun Park