Date: Tue, 5 Mar 2024 11:23:42 +0800
Subject: Re: [LSF/MM/BPF TOPIC] Swap Abstraction "the pony"
To: Matthew Wilcox, Nhat Pham
Cc: Chris Li, lsf-pc@lists.linux-foundation.org, linux-mm, ryan.roberts@arm.com, David Hildenbrand, Barry Song <21cnbao@gmail.com>, Chuanhua Han
From: Chengming Zhou

On 2024/3/5 06:58, Matthew Wilcox wrote:
> On Fri, Mar 01, 2024 at 04:53:43PM +0700, Nhat Pham wrote:
>> IMHO, one thing this new abstraction should support is seamless
>> transfer/migration of pages from one backend to another (perhaps from
>> high to low priority backends, i.e writeback).
>>
>> I think this will require some careful redesigns.
>> The closest thing we have right now is zswap -> backing swapfile.
>> But it is currently handled in a rather peculiar manner - the
>> underlying swap slot has already been reserved for the zswap entry.
>> But there's a couple of problems with this:
>>
>> a) This is wasteful. We're essentially having the same piece of data
>> occupying spaces in two levels in the hierarchies.
>> b) How do we generalize to a multi-tier hierarchy?
>> c) This is a bit too backend-specific. It'd be nice if we can make
>> this as backend-agnostic as possible (if possible).
>>
>> Motivation: I'm currently working/thinking about decoupling zswap and
>> swap, and this is one of the more challenging aspects (as I can't seem
>> to find a precedent in the swap world for inter-swap backends pages
>> migration), and especially with respect to concurrent loads (and
>> swapcache interactions).
>
> Have you considered (and already rejected?) the opposite approach --
> coupling zswap and swap more tightly? That is, we always write out
> the original pages today. Why don't we write out the compressed pages
> instead? For the same amount of I/O, we'd free up more memory! That
> sounds like a win to me.

Right, I also thought about this direction for some time.
Apart from less I/O, there are more advantages:

1. No need to allocate a page when writing out compressed data.
   The current method has its own problem[1]: it allocates a new page,
   puts it on the LRU list, and waits for writeback and reclaim.
   If we write out compressed data directly, no page needs to be
   allocated, so these problems are avoided.

2. No need to decompress when writing out compressed data.

[1] https://lore.kernel.org/all/20240209115950.3885183-1-chengming.zhou@linux.dev/

>
> I'm sure it'd be a big redesign, but that seems to be what we're talking
> about anyway.
>

Yes, we need to modify several parts:

1. zsmalloc: compressed objects can be migrated at any time, so we need
   to support pinning.

2. swapout: need to support non-folio writeout.

3. zswap: zswap needs to handle synchronization between compressed
   writeout and swapin, since they share the same swap entry.

I must have missed something; more discussion is welcome if others are
interested too.

Thanks!
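
To put rough numbers on the "same amount of I/O, more memory freed" point quoted above, here is a small back-of-the-envelope model. It is only a sketch, not kernel code: the function names are made up for illustration, a 4096-byte page is assumed, every object is assumed to compress to the same size, and packing or alignment overhead on the swap device is ignored.

```python
PAGE_SIZE = 4096  # bytes; assumed page size for this sketch


def pool_bytes_freed_uncompressed(io_bytes, compressed_size):
    """Today's writeback: each object is decompressed into a full page
    before it is written, so one page of I/O frees one compressed object."""
    objects_written = io_bytes // PAGE_SIZE
    return objects_written * compressed_size


def pool_bytes_freed_compressed(io_bytes, compressed_size):
    """Hypothetical compressed writeout: objects are written as they are,
    so the I/O budget is spent only on compressed bytes (overhead ignored)."""
    objects_written = io_bytes // compressed_size
    return objects_written * compressed_size


def temp_pages_uncompressed(io_bytes):
    """Pages that must be allocated to hold decompressed data today."""
    return io_bytes // PAGE_SIZE


def temp_pages_compressed(io_bytes):
    """Compressed writeout would not need a temporary page per object."""
    return 0


if __name__ == "__main__":
    io_budget = 1 << 20  # 1 MiB of writeback I/O
    size = 1024          # assumed compressed size per object (4:1 ratio)
    print("freed today:   ", pool_bytes_freed_uncompressed(io_budget, size))
    print("freed proposed:", pool_bytes_freed_compressed(io_budget, size))
    print("temp pages today:   ", temp_pages_uncompressed(io_budget))
    print("temp pages proposed:", temp_pages_compressed(io_budget))
```

Under these assumptions, the gain in freed pool memory per unit of I/O is simply the compression ratio, and the temporary page allocations of the current path disappear. Real numbers would depend on how compressed objects are laid out on the swap device.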