From: Nhat Pham
To: akpm@linux-foundation.org
Cc: hannes@cmpxchg.org, cerasuolodomenico@gmail.com, yosryahmed@google.com,
    sjenning@redhat.com, ddstreet@ieee.org, vitaly.wool@konsulko.com,
    mhocko@kernel.org, roman.gushchin@linux.dev, shakeelb@google.com,
    muchun.song@linux.dev, linux-mm@kvack.org, kernel-team@meta.com,
    linux-kernel@vger.kernel.org, cgroups@vger.kernel.org
Subject: [PATCH 0/2] workload-specific and memory pressure-driven zswap writeback
Date: Mon, 11 Sep 2023 09:40:22 -0700
Message-Id: <20230911164024.2541401-1-nphamcs@gmail.com>

There are currently several issues with zswap writeback:

1. There is only a single global LRU for zswap. This makes it impossible
   to perform workload-specific shrinking - a memcg under memory
   pressure cannot determine which pages in the pool it owns, and often
   ends up writing back pages from other memcgs.
   This issue has been previously observed in practice and mitigated by
   simply disabling memcg-initiated shrinking:

   https://lore.kernel.org/all/20230530232435.3097106-1-nphamcs@gmail.com/T/#u

   But this solution leaves a lot to be desired, as we still do not have
   an avenue for a memcg to free up its own memory locked up in zswap.

2. We only shrink the zswap pool when the user-defined limit is hit.
   This means that if we set the limit too high, cold data that are
   unlikely to be used again will reside in the pool, wasting precious
   memory. It is hard to predict how much zswap space will be needed
   ahead of time, as this depends on the workload (specifically, on
   factors such as memory access patterns and compressibility of the
   memory pages).

This patch series solves these issues by separating the global zswap LRU
into per-memcg and per-NUMA LRUs, and by performing workload-specific
(i.e. memcg- and NUMA-aware) zswap writeback under memory pressure. The
new shrinker does not have any parameter that must be tuned by the user,
and can be opted in or out on a per-memcg basis.

On a benchmark that we have run:

(without the shrinker)
real -- mean: 153.27s, median: 153.199s
sys -- mean: 541.652s, median: 541.903s
user -- mean: 4384.9674s, median: 4385.471s

(with the shrinker)
real -- mean: 151.4956s, median: 151.456s
sys -- mean: 461.1464s, median: 465.656s
user -- mean: 4384.7118s, median: 4384.675s

We observed a 14-15% reduction in kernel CPU time, which translated to
a reduction of over 1% in real time.
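As a quick sanity check, the percentages quoted for this first benchmark
can be recomputed from the reported means and medians. The snippet below
is only an illustrative sketch: the helper name `reduction_pct` and the
dictionary layout are assumptions made for this example and are not part
of the series.

```python
def reduction_pct(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, in percent."""
    return (before - after) / before * 100.0


# (mean, median) in seconds, copied from the first benchmark above.
without_shrinker = {"real": (153.27, 153.199), "sys": (541.652, 541.903)}
with_shrinker = {"real": (151.4956, 151.456), "sys": (461.1464, 465.656)}

for metric in ("real", "sys"):
    base_mean, base_median = without_shrinker[metric]
    new_mean, new_median = with_shrinker[metric]
    print(
        f"{metric}: mean reduced by {reduction_pct(base_mean, new_mean):.2f}%, "
        f"median reduced by {reduction_pct(base_median, new_median):.2f}%"
    )
```

Both the mean and the median are compared because the two can diverge
when a few runs are outliers; here they land in the same range.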

On another benchmark, where there was a lot more cold memory residing in
zswap, we observed even more pronounced gains:

(without the shrinker)
real -- mean: 157.5252s, median: 157.281s
sys -- mean: 769.3082s, median: 780.545s
user -- mean: 4378.1622s, median: 4378.286s

(with the shrinker)
real -- mean: 152.9608s, median: 152.845s
sys -- mean: 517.4446s, median: 506.749s
user -- mean: 4387.694s, median: 4387.935s

Here, we saw around a 32-35% reduction in kernel CPU time, which
translated to a 2.8% reduction in real time. These results confirm our
hypothesis that the shrinker is more helpful the more cold memory we
have.

Domenico Cerasuolo (1):
  zswap: make shrinking memcg-aware

Nhat Pham (1):
  zswap: shrinks zswap pool based on memory pressure

 Documentation/admin-guide/mm/zswap.rst |  12 +
 include/linux/list_lru.h               |  39 +++
 include/linux/memcontrol.h             |   1 +
 include/linux/mmzone.h                 |  14 +
 include/linux/zswap.h                  |   9 +
 mm/list_lru.c                          |  46 ++-
 mm/memcontrol.c                        |  33 +++
 mm/swap_state.c                        |  50 +++-
 mm/zswap.c                             | 369 ++++++++++++++++++++++---
 9 files changed, 518 insertions(+), 55 deletions(-)

--
2.34.1