From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Tue, 06 Aug 2024 12:44:49 -0700
To:
	mm-commits@vger.kernel.org,yosryahmed@google.com,shakeel.butt@linux.dev,hannes@cmpxchg.org,flintglass@gmail.com,chengming.zhou@linux.dev,nphamcs@gmail.com,akpm@linux-foundation.org
From: Andrew Morton
Subject: + zswap-track-swapins-from-disk-more-accurately.patch added to mm-unstable branch
Message-Id: <20240806194450.2BF4AC32786@smtp.kernel.org>
Precedence: bulk
X-Mailing-List: mm-commits@vger.kernel.org


The patch titled
     Subject: zswap: track swapins from disk more accurately
has been added to the -mm mm-unstable branch.  Its filename is
     zswap-track-swapins-from-disk-more-accurately.patch

This patch will shortly appear at
     https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/zswap-track-swapins-from-disk-more-accurately.patch

This patch will later appear in the mm-unstable branch at
    git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days

------------------------------------------------------
From: Nhat Pham
Subject: zswap: track swapins from disk more accurately
Date: Mon, 5 Aug 2024 16:22:43 -0700

Currently, there are a couple of issues with our disk swapin tracking for
dynamic zswap shrinker heuristics:

1. We only increment the swapin counter on pivot pages. This means we
   are not taking into account pages that also need to be swapped in,
   but are already taken care of as part of the readahead window.

2. We are also incrementing when the pages are read from the zswap pool,
   which is inaccurate.

This patch rectifies these issues by incrementing the counter whenever we
need to perform a non-zswap read.  Note that we are slightly overcounting,
as a page might be read into memory by the readahead algorithm even though
it will not be needed by users - however, this is an acceptable
inaccuracy, as the readahead logic itself will adapt to these kinds of
scenarios.

To test this change, I built the kernel under a cgroup with its memory.max
set to 2 GB:

real: 236.66s
user: 4286.06s
sys: 652.86s
swapins: 81552

For comparison, with just the new second chance algorithm, the build time
is as follows:

real: 244.85s
user: 4327.22s
sys: 664.39s
swapins: 94663

With neither:

real: 263.89s
user: 4318.11s
sys: 673.29s
swapins: 227300.5

(average over 5 runs)

With this change, the kernel CPU time reduces by a further 1.7%, and the
real time is reduced by another 3.3%, compared to just the second chance
algorithm by itself.  The swapins count also reduces by another 13.85%.

Combining the two changes, we reduce the real time by 10.32%, kernel CPU
time by 3%, and number of swapins by 64.12%.

To gauge the new scheme's ability to offload cold data, I ran another
benchmark, in which the kernel was built under a cgroup with memory.max
set to 3 GB, but with 0.5 GB worth of cold data allocated before each
build (in a shmem file).

Under the old scheme:

real: 197.18s
user: 4365.08s
sys: 289.02s
zswpwb: 72115.2

Under the new scheme:

real: 195.8s
user: 4362.25s
sys: 290.14s
zswpwb: 87277.8

(average over 5 runs)

Notice that we actually observe a 21% increase in the number of written
back pages - so the new scheme is just as good, if not better, at
offloading pages from the zswap pool when they are cold.  Build time
reduces by around 0.7% as a result.
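
The percentages quoted in the changelog can be re-derived from the raw
figures above.  What follows is a minimal userspace sketch and not part of
the patch: struct build_stats, pct_reduction() and
print_changelog_reductions() are names invented here for illustration, and
the numbers are copied from the 2 GB memory.max benchmark.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical container for one benchmark result (illustration only). */
struct build_stats {
	double real_s;	/* wall-clock build time, seconds */
	double sys_s;	/* kernel CPU time, seconds */
	double swapins;	/* swapins count reported in the changelog */
};

/* Relative reduction from `before` to `after`, in percent. */
double pct_reduction(double before, double after)
{
	assert(before > 0.0);
	return (before - after) / before * 100.0;
}

/* Figures copied from the changelog (memory.max = 2 GB). */
static const struct build_stats both_changes       = { 236.66, 652.86, 81552.0 };
static const struct build_stats second_chance_only = { 244.85, 664.39, 94663.0 };
static const struct build_stats neither_change     = { 263.89, 673.29, 227300.5 };

/* Print how much `cur` improves on `base` for each metric. */
static void print_row(const char *label, const struct build_stats *base,
		      const struct build_stats *cur)
{
	printf("%-24s real %.2f%%  sys %.2f%%  swapins %.2f%%\n", label,
	       pct_reduction(base->real_s, cur->real_s),
	       pct_reduction(base->sys_s, cur->sys_s),
	       pct_reduction(base->swapins, cur->swapins));
}

/* Recompute both comparisons made in the changelog. */
void print_changelog_reductions(void)
{
	print_row("vs second chance only:", &second_chance_only, &both_changes);
	print_row("vs neither change:", &neither_change, &both_changes);
}
```

Calling print_changelog_reductions() from a main() prints the relative
reductions for both comparisons; they should agree with the changelog's
figures up to rounding.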

Link: https://lkml.kernel.org/r/20240805232243.2896283-3-nphamcs@gmail.com
Fixes: b5ba474f3f51 ("zswap: shrink zswap pool based on memory pressure")
Signed-off-by: Nhat Pham
Suggested-by: Johannes Weiner
Acked-by: Yosry Ahmed
Cc: Chengming Zhou
Cc: Shakeel Butt
Cc: Takero Funaki
Signed-off-by: Andrew Morton
---

 mm/page_io.c    |   11 ++++++++++-
 mm/swap_state.c |    8 ++------
 2 files changed, 12 insertions(+), 7 deletions(-)

--- a/mm/page_io.c~zswap-track-swapins-from-disk-more-accurately
+++ a/mm/page_io.c
@@ -521,7 +521,15 @@ void swap_read_folio(struct folio *folio
 
 	if (zswap_load(folio)) {
 		folio_unlock(folio);
-	} else if (data_race(sis->flags & SWP_FS_OPS)) {
+		goto finish;
+	}
+
+	/*
+	 * We have to read the page from slower devices. Increase zswap protection.
+	 */
+	zswap_folio_swapin(folio);
+
+	if (data_race(sis->flags & SWP_FS_OPS)) {
 		swap_read_folio_fs(folio, plug);
 	} else if (synchronous) {
 		swap_read_folio_bdev_sync(folio, sis);
@@ -529,6 +537,7 @@ void swap_read_folio(struct folio *folio
 		swap_read_folio_bdev_async(folio, sis);
 	}
 
+finish:
 	if (workingset) {
 		delayacct_thrashing_end(&in_thrashing);
 		psi_memstall_leave(&pflags);
--- a/mm/swap_state.c~zswap-track-swapins-from-disk-more-accurately
+++ a/mm/swap_state.c
@@ -702,10 +702,8 @@ skip:
 	/* The page was likely read above, so no need for plugging here */
 	folio = __read_swap_cache_async(entry, gfp_mask, mpol, ilx,
 					&page_allocated, false);
-	if (unlikely(page_allocated)) {
-		zswap_folio_swapin(folio);
+	if (unlikely(page_allocated))
 		swap_read_folio(folio, NULL);
-	}
 	return folio;
 }
 
@@ -854,10 +852,8 @@ skip:
 	/* The folio was likely read above, so no need for plugging here */
 	folio = __read_swap_cache_async(targ_entry, gfp_mask, mpol, targ_ilx,
 					&page_allocated, false);
-	if (unlikely(page_allocated)) {
-		zswap_folio_swapin(folio);
+	if (unlikely(page_allocated))
 		swap_read_folio(folio, NULL);
-	}
 	return folio;
 }
 
_

Patches currently in -mm which might be from nphamcs@gmail.com are

zswap-implement-a-second-chance-algorithm-for-dynamic-zswap-shrinker.patch
zswap-implement-a-second-chance-algorithm-for-dynamic-zswap-shrinker-fix.patch
zswap-track-swapins-from-disk-more-accurately.patch
zswap-track-swapins-from-disk-more-accurately-fix.patch
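
For readers following the mm/page_io.c hunk, the accounting rule it
introduces can be modelled in userspace.  This is only a sketch under
stated assumptions, not kernel code: struct model_folio, struct
model_counters, model_swap_read_folio() and model_readahead() are
hypothetical stand-ins, and the model ignores locking, plugging and the
page_allocated check that the real readahead paths perform before calling
swap_read_folio().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct folio: only records where the data lives. */
struct model_folio {
	bool in_zswap;
};

/* Hypothetical counters; the kernel keeps its own state behind zswap_folio_swapin(). */
struct model_counters {
	unsigned long nr_disk_swapins;	/* reads that had to go to the backing device */
	unsigned long nr_zswap_loads;	/* reads satisfied from the zswap pool */
};

/*
 * Models the post-patch shape of swap_read_folio(): a zswap hit returns
 * early (the patch's "goto finish"), and only a non-zswap read is counted.
 */
void model_swap_read_folio(const struct model_folio *folio,
			   struct model_counters *c)
{
	assert(folio && c);

	if (folio->in_zswap) {		/* corresponds to zswap_load() succeeding */
		c->nr_zswap_loads++;
		return;
	}

	c->nr_disk_swapins++;		/* where zswap_folio_swapin() is now called */
	/* The real function issues the filesystem or block-device read here. */
}

/*
 * Models a readahead window: pivot and non-pivot folios take the same read
 * path, so every disk read in the window is counted, not just the pivot.
 */
void model_readahead(const struct model_folio *window, size_t n,
		     struct model_counters *c)
{
	for (size_t i = 0; i < n; i++)
		model_swap_read_folio(&window[i], c);
}
```

In this model, a window that mixes zswap-resident and disk-resident folios
bumps nr_disk_swapins once per disk-resident folio and never for a zswap
hit, which mirrors the two issues the changelog sets out to fix.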