Date: Sun, 14 Jun 2026 18:23:03 +0900
From: YoungJun Park <youngjun.park@lge.com>
To: Shakeel Butt
Cc: Yosry Ahmed, Hao Jia, Johannes Weiner, mhocko@kernel.org, tj@kernel.org,
    mkoutny@suse.com, roman.gushchin@linux.dev, Nhat Pham,
    akpm@linux-foundation.org, chengming.zhou@linux.dev, muchun.song@linux.dev,
    cgroups@vger.kernel.org,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
    Hao Jia, chrisl@kernel.org, kasong@tencent.com, baoquan.he@linux.dev,
    joshua.hahnjy@gmail.com
Subject: Re: [swap tier discussion] Re: [PATCH v3 2/4] mm/zswap: Implement proactive writeback

....

> > Based on the memcg interface currently proposed in swap_tier
> > (memory.swap.tiers, memory.swap.tiers.effective), I think it aligns well
> > with the current direction. It provides a foundation for selectively
> > targeting devices in tier order.
>
> Here instead of cpuset like interface, we may want more zswap like interface
> where you can put limit on the usage i.e. memory.swap.tier*.max. We can start
> with allowing only two values i.e. 0 and max which effectively will be the
> same as what you need.
>

Good idea, and it's certainly feasible. When I considered this a while ago,
I didn't take this direction for the following reasons:

1. There's no real-world use case for adjusting the amount per swap tier
   (it's either 0 or max). That said, your suggestion to initially allow
   only 0 and max is the key point, and it's making me reconsider.

2. The implementation cost seems high. The current implementation handles
   this at runtime with simple masking.

3. Relationship with swap.max:
   - With the current interface, isn't limiting the swap amount within the
     selected tiers already possible (via memory.swap.max)? I wonder if
     that alone is enough.
   - If we add tier.max, it would need to be a subset of swap.max, i.e.
     bounded by it. (Are there other complexities here?)

4. vswap enable/disable: vswap doesn't seem to have an amount-control
   aspect, so an on/off semantic would be clearer.
   https://lore.kernel.org/linux-mm/ai5kOOmR1LPTWs1J@yjaykim-PowerEdge-T330/T/#m8831ec057bf9387978d3bd698f51920600e09a04

If we allow only 0 and max, the internal logic could stay roughly the same,
instead of tracking usage with a page counter. Something like:

1. Change only the interface shell: tier.*.max, accepting only 0 and max.
2. Keep the internal logic as is: 0 clears the tier from the mask (child
   memcgs are off too), and max sets it (child memcgs are on too).
3. memory.zswap.max integrates naturally (it is simply
   memory."tier_name".max for the zswap tier).
4. Extend it later if use cases arise.

On balance I still lean toward the current interface, but if a per-tier max
is the better fit for memcg's direction and others feel the same, I'm happy
to switch. I'd like to hear Shakeel's thoughts again, and I'm curious about
others' opinions too.

A few more thoughts on the remaining points below.

> I will respond to your other points later when I have time.
>
> >
> > To summarize the discussions so far, the following points align well.
> >
> > - Per-cgroup swap control, as I suggested.
> > - Proactive zswap writeback (Hao's usecase)
> > - Swap device target demotion(if it wants selective, then it is more better), as you mentioned:
> >   https://lore.kernel.org/linux-mm/aicZ-5GX9De3MAU7@linux.dev/
> > - Virtual Swap on/off in the future, as Nhat mentioned:
> >   https://lore.kernel.org/linux-mm/20260528212955.1912856-1-nphamcs@gmail.com/
> > - The memory.zswap.writeback alternative (no hierarchy model conflict)
> > - zswap is first swap tier.
> > - Promotion. (Also better for selectve usage)
> > - tier based swap policy (e.g round-robin...)
> >
> > To accelerate this work, I believe we should reach a consensus and
> > merge the currently proposed swap_tier interface :)
> >
> > If the above approach is difficult, I would like to suggest an
> > alternative for progress with the memcg interfaces removed:
> >
> > 1) We could make zswap the first tier and create
> > a use case where memory.zswap.writeback internally is handled by tier logic.
> >
> > 2) Or simply merge the swap_tier infrastructure itself first.
> >
> > This would allow the swap_tier infrastructure to be merged and discussed
> > more easily.
> >
> > If it takes longer to adopt swap_tier anyway, by doing so we progress next step
> > as a experimental feature.
> >
> > - Apply per-cgroup swap as an experimental (debugfs) feature.
> > - Apply Hao's use case experimentally or as it is as Yosry suggested.
> >   (future migration to swap tier)
> >
> > How do you think?
> >
> > (FYI: My emails to kernel.org are failing due to internal server issues.)
> >
> > Thank you
> > Youngjun Park

Let me clarify a part I wrote confusingly. Handling memory.zswap.writeback
via tiers is possible, but I don't think the interface itself would be
replaced even if memory.swap.tiers is adopted.

Selecting only zswap in memory.swap.tiers would not just disable writeback;
it would also block regular swap entirely, which differs slightly from the
current semantics. From the cgroup v2 docs for memory.zswap.writeback:

  "... Note that this is subtly different from setting memory.swap.max to 0,
  as it still allows for pages to be written to the zswap pool. This
  setting has no effect if zswap is disabled, and swapping is allowed
  unless memory.swap.max is set to 0."

So the interface itself needs to be retained, and it could be extended
toward selective writeback, e.g. passing a desired tier into
memory.zswap.writeback so that writeback targets only that tier. Currently
it only controls on/off.

Other tiers probably don't need this; demotion based on the selected tier
should be enough.

Thanks,
Youngjun Park
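A minimal userspace sketch of the "only 0 and max" tier.*.max idea discussed above, assuming each memcg keeps a bitmask of allowed tiers (one bit per tier, e.g. bit 0 for zswap) and that a tier is usable only if the memcg and all of its ancestors allow it. Everything here (struct memcg_sketch, tier_max_write(), tier_mask_effective(), tier_usable(), NR_SWAP_TIERS and the 32-tier limit) is hypothetical and is not the swap_tier patch code; it only illustrates how a mask can back the interface without a page counter.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical: at most 32 tiers, one bit per tier (e.g. bit 0 = zswap). */
#define NR_SWAP_TIERS 32

/*
 * Hypothetical stand-in for struct mem_cgroup, reduced to what the sketch
 * needs. Bit N of tier_allowed is set when the (hypothetical) tier N max
 * file holds "max" and clear when it holds "0". A new cgroup starts with
 * every bit set, i.e. every tier at "max".
 */
struct memcg_sketch {
	struct memcg_sketch *parent;
	uint32_t tier_allowed;
};

/* Handle a write to the tier N max file: accept only "0" and "max". */
static int tier_max_write(struct memcg_sketch *memcg, unsigned int tier,
			  const char *buf)
{
	assert(tier < NR_SWAP_TIERS);
	if (strcmp(buf, "0") == 0)
		memcg->tier_allowed &= ~(UINT32_C(1) << tier);
	else if (strcmp(buf, "max") == 0)
		memcg->tier_allowed |= UINT32_C(1) << tier;
	else
		return -EINVAL; /* byte limits are left for a later extension */
	return 0;
}

/*
 * Effective mask: AND of this memcg's mask and every ancestor's mask, so
 * "0" on a parent also turns the tier off for all of its descendants.
 * No page counter is involved; usability is a plain mask check.
 */
static uint32_t tier_mask_effective(const struct memcg_sketch *memcg)
{
	uint32_t mask = UINT32_MAX;

	for (; memcg != NULL; memcg = memcg->parent)
		mask &= memcg->tier_allowed;
	return mask;
}

/* True if swap-out from this memcg may target the given tier. */
static bool tier_usable(const struct memcg_sketch *memcg, unsigned int tier)
{
	assert(tier < NR_SWAP_TIERS);
	return (tier_mask_effective(memcg) >> tier) & 1u;
}
```

For example, writing "0" for tier 1 on a parent hides tier 1 from every descendant, even if a child writes "max"; writing "max" on the parent again re-enables it for children that still hold "max". Any other value, such as a byte count, is rejected, which leaves room to extend the interface later if use cases arise.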