From: Chris Li
Date: Tue, 5 Mar 2024 11:20:18 -0800
Subject: Re: [LSF/MM/BPF TOPIC] Swap Abstraction "the pony"
To: Nhat Pham
Cc: Chengming Zhou, Matthew Wilcox, lsf-pc@lists.linux-foundation.org,
 linux-mm, ryan.roberts@arm.com, David Hildenbrand,
 Barry Song <21cnbao@gmail.com>, Chuanhua Han
References: <97e95dc3-bdc0-4dfd-aca9-2d2880e1fdf5@linux.dev>

On Tue, Mar 5, 2024 at 2:55 AM Nhat Pham wrote:
>
> On Tue, Mar 5, 2024 at 4:52 PM Chengming Zhou wrote:
> >
> > Looks sensible. Now the zswap middle layer is transparent to frontend users,
> > which just allocate swap entry and swap out, don't care about whether it's
> > swapped out to the zswap or swap file.
> >
> > By decoupling, the frontend users need to know it want to allocate zswap entry
> > instead of a swap entry, right? Which becomes not transparent to users.
>
> Hmm for now, I was just thinking that it should always try zswap
> first, and only fall back to swap if it fails to store to zswap, to
> maintain the overall LRU ordering (best effort).
>
> The minimal viable implementation I'm thinking right now for this is
> basically the "ghost swapfile" approach - i.e represent zswap as a
> swapfile.

Google has been using the ghost swapfile in production for many years.
If it helps, I can rebase the ghost swap file patches onto mm-unstable
and send them out as an RFC. I do not expect them to be merged as is;
they would just be a starting point for anyone interested in the ghost
swap file.

I think zswap with a ghost swap file will make zswap behave more like
other swap back ends. With the ghost swap file, migrating from zswap
to another swap device is very similar to migrating from an SSD to a
hard drive, for example.

> Writeback becomes quite hairy though, because there might be two
> "swap" entries of the same object (the zswap swap entry and the newly
> reserved swap entry) lying around near the end of the writeback step,
> so gotta be careful with synchronization (read: juggling the swap
> cache) to make sure concurrent swap-ins get something that makes
> sense.

Dealing with two swap device entries while writing back from one
device to another is unavoidable. I consider it a necessary evil.

If the swap offset lookup can return different swap entry types, one
idea is to introduce a migration type of swap entry that stores both
the source and the destination swap entry. Writeback then just reads
the source swap entry data (compressed or not) and writes it to the
destination entry. Every swap-in of the source swap entry will notice
the migration swap entry type and ask the destination swap device to
perform the IO. The same folio will exist in both the source and the
destination swap cache.

The limitation of this approach is that the source swap entry stays
occupied until its usage count drops to zero (that is, until every
user has swapped the entry in). Until then it can't be reused for
other data.

Chris
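
To make the migration entry idea above a bit more concrete, here is a
rough userspace sketch in C. It is not kernel code and not taken from
any existing patch: every name in it (swap_ref, swap_slot, store,
migrate, swap_in, slot_is_free) is hypothetical, the "data" is a plain
int standing in for the page contents, and the choice to give the
destination slot the same usage count as the source is an assumption
made only to keep the model small. It shows the redirect on swap-in and
why the source slot stays occupied until its last user is gone.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model only; none of these names are existing kernel API. */

enum slot_kind { SLOT_FREE, SLOT_DATA, SLOT_MIGRATION };

/* (device, offset) pair identifying one swap slot. */
struct swap_ref {
    int dev;
    unsigned long offset;
};

struct swap_slot {
    enum slot_kind kind;
    int usage;            /* users still referencing this slot */
    int data;             /* stand-in for page contents (SLOT_DATA) */
    struct swap_ref dst;  /* where the data now lives (SLOT_MIGRATION) */
};

#define NR_DEVS  2
#define NR_SLOTS 8UL

/* Zero-initialized, so every slot starts out as SLOT_FREE. */
static struct swap_slot slots[NR_DEVS][NR_SLOTS];

struct swap_slot *slot_of(struct swap_ref ref)
{
    assert(ref.dev >= 0 && ref.dev < NR_DEVS && ref.offset < NR_SLOTS);
    return &slots[ref.dev][ref.offset];
}

bool slot_is_free(struct swap_ref ref)
{
    return slot_of(ref)->kind == SLOT_FREE;
}

/* Swap out: place data in a free slot that "usage" users will reference. */
void store(struct swap_ref ref, int data, int usage)
{
    struct swap_slot *s = slot_of(ref);

    assert(s->kind == SLOT_FREE && usage > 0);
    s->kind = SLOT_DATA;
    s->usage = usage;
    s->data = data;
}

/*
 * Writeback from src to dst: copy the data to the destination, then turn
 * the source into a migration entry that points at the destination.
 * Assumption: the destination inherits the source's usage count.
 */
void migrate(struct swap_ref src, struct swap_ref dst)
{
    struct swap_slot *s = slot_of(src);

    assert(s->kind == SLOT_DATA);
    store(dst, s->data, s->usage);
    s->kind = SLOT_MIGRATION;
    s->dst = dst;
}

/*
 * Swap in on behalf of one user. A migration entry redirects the read to
 * the destination device. A slot is freed only when its usage count
 * reaches zero, so a migrated source stays occupied until then.
 */
int swap_in(struct swap_ref ref)
{
    struct swap_slot *s = slot_of(ref);
    int data;

    if (s->kind == SLOT_MIGRATION) {
        data = swap_in(s->dst);  /* destination performs the IO */
    } else {
        assert(s->kind == SLOT_DATA);
        data = s->data;
    }
    if (--s->usage == 0)
        s->kind = SLOT_FREE;
    return data;
}
```

In this model, storing a value at a source slot with two users,
calling migrate(), and then swapping in through the source twice
returns the same value both times. The source slot is still occupied
after the first swap-in and becomes free only after the second, which
is the limitation described above.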