From: Yosry Ahmed
Date: Thu, 18 Jan 2024 09:30:12 -0800
Subject: Re: [PATCH 0/2] mm/zswap: optimize the scalability of zswap rb-tree
To: Johannes Weiner
Cc: Chengming Zhou, Andrew Morton, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, Chris Li, Nhat Pham
In-Reply-To: <20240118153425.GI939255@cmpxchg.org>
References: <20240117-b4-zswap-lock-optimize-v1-0-23f6effe5775@bytedance.com>
 <20240118153425.GI939255@cmpxchg.org>

On Thu, Jan 18, 2024 at 7:34 AM Johannes Weiner wrote:
>
> On Wed, Jan 17, 2024 at 10:37:22AM -0800, Yosry Ahmed wrote:
> > On Wed, Jan 17, 2024 at 1:23 AM Chengming Zhou wrote:
> > >
> > > When testing the zswap performance by using kernel build -j32 in a tmpfs
> > > directory, I found the scalability of the zswap rb-tree is not good, which
> > > is protected by a single spinlock. That would cause heavy lock contention
> > > if multiple tasks zswap_store/load concurrently.
> > >
> > > So a simple solution is to split the single zswap rb-tree into multiple
> > > rb-trees, each corresponding to SWAP_ADDRESS_SPACE_PAGES (64M). This idea is
> > > from the commit 4b3ef9daa4fc ("mm/swap: split swap cache into 64MB trunks").
> > >
> > > Although this method can't solve the spinlock contention completely, it
> > > can mitigate much of that contention. Below are the results of a kernel build
> > > in tmpfs with zswap shrinker enabled:
> > >
> > >          linux-next    zswap-lock-optimize
> > > real     1m9.181s      1m3.820s
> > > user     17m44.036s    17m40.100s
> > > sys      7m37.297s     4m54.622s
> > >
> > > So there are clearly improvements. And it's complementary with the ongoing
> > > zswap xarray conversion by Chris. Anyway, I think we can also merge this
> > > first, it's complementary IMHO. So I just refresh and resend this for
> > > further discussion.
> >
> > The reason why I think we should wait for the xarray patch(es) is that
> > there is a chance we may see smaller improvements from splitting the tree
> > if it were an xarray. If we merge this series first, there is no way to
> > know.
>
> I mentioned this before, but I disagree quite strongly with this
> general sentiment.
>
> Chengming's patches are simple, mature, and have convincing
> numbers. IMO it's poor form to hold something like that for "let's see
> how our other experiment works out". The only exception would be if we
> all agree that the earlier change flies in the face of the overall
> direction we want to pursue, which I don't think is the case here.

My intention was not to delay merging these patches until the xarray patches are merged in. It was only to wait until the xarray patches are *posted*, so that we can redo the testing on top of them and verify that the gains are still there. That should have happened around now, but the xarray patches were posted in a form that does not allow this testing (because we still have a lock on the read path), so I am less inclined to wait.

My rationale was that if the gains from splitting the tree become minimal after we switch to an xarray, we won't know. It's more difficult to remove optimizations than to add them, because we may cause a regression. I am kind of paranoid about keeping code around without full information about how much it is actually needed.

In this case, I suppose we can redo the testing (1 tree vs. split trees) once the xarray patches are in a testable form, and before we have formed any strong dependencies on the split trees (we have time until v6.9 is released, I assume). How about that?

>
> With the xarray we'll still have a per-swapfile lock for writes. That
> lock is the reason SWAP_ADDRESS_SPACE segmentation was introduced for
> the swapcache in the first place. Lockless reads help of course, but
> read-only accesses to swap are in the minority - stores will write, and
> loads are commonly followed by invalidations. Somebody already went
> through the trouble of proving that xarrays + segmentation are worth
> it for swap load and store access patterns. Why dismiss that?
Fair point, although I think the swapcache lock may be more contended than the zswap tree lock.

> So my vote is that we follow the usual upstreaming process here:
> merge the ready patches now, and rebase future work on top of it.

No objections given the current state of the xarray patches as I mentioned earlier, but I prefer we redo the testing once possible with the xarray.
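For readers skimming the thread: below is a rough, self-contained userspace sketch of the 64MB-range segmentation being discussed, the same idea behind both the swapcache's SWAP_ADDRESS_SPACE split and the proposed per-range zswap trees. This is not the actual patch; the struct and helper names are made up for illustration, and only the SWAP_ADDRESS_SPACE_SHIFT arithmetic (1 << 14 pages, i.e. 64MB with 4K pages) mirrors the kernel.

/*
 * Illustrative only: map a swap offset to one of several independent
 * per-range trees, each covering SWAP_ADDRESS_SPACE_PAGES slots, so
 * that stores/loads touching different 64MB ranges contend on
 * different locks instead of one global lock.
 */
#include <assert.h>
#include <stdio.h>

#define SWAP_ADDRESS_SPACE_SHIFT 14
#define SWAP_ADDRESS_SPACE_PAGES (1UL << SWAP_ADDRESS_SPACE_SHIFT)

/* stand-in for a per-range tree and its lock */
struct range_tree {
	int lock_placeholder;	/* spinlock_t in the kernel */
	void *root_placeholder;	/* struct rb_root or struct xarray */
};

/* hypothetical helper: which per-range tree does this offset live in? */
static unsigned long tree_index(unsigned long swp_offset)
{
	return swp_offset >> SWAP_ADDRESS_SPACE_SHIFT;
}

int main(void)
{
	/* four independent 64MB ranges, each with its own tree and lock */
	struct range_tree trees[4];

	/* offsets in the same 64MB range map to the same tree... */
	assert(&trees[tree_index(0)] ==
	       &trees[tree_index(SWAP_ADDRESS_SPACE_PAGES - 1)]);
	/* ...while the next range gets its own tree, hence its own lock */
	assert(tree_index(SWAP_ADDRESS_SPACE_PAGES) == 1);

	printf("each range covers %lu pages (64MB with 4K pages)\n",
	       SWAP_ADDRESS_SPACE_PAGES);
	return 0;
}

Whether each range ends up holding an rb-tree or an xarray, the open question in this thread is how much of the win comes from splitting the write-side lock per range versus from lockless xarray reads, which is exactly what redoing the 1-tree vs. split-trees test should show.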