From: Bagas Sanjaya <bagasdotme@gmail.com>
To: Nhat Pham <nphamcs@gmail.com>, akpm@linux-foundation.org
Cc: hannes@cmpxchg.org, cerasuolodomenico@gmail.com,
	yosryahmed@google.com, sjenning@redhat.com, ddstreet@ieee.org,
	vitaly.wool@konsulko.com, mhocko@kernel.org,
	roman.gushchin@linux.dev, shakeelb@google.com,
	muchun.song@linux.dev, chrisl@kernel.org, linux-mm@kvack.org,
	kernel-team@meta.com,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Linux CGroups <cgroups@vger.kernel.org>,
	Linux Documentation <linux-doc@vger.kernel.org>,
	Linux Kernel Selftests <linux-kselftest@vger.kernel.org>,
	shuah@kernel.org
Subject: Re: [PATCH v7 0/6] workload-specific and memory pressure-driven zswap writeback
Date: Sat, 2 Dec 2023 11:44:18 +0700
Message-ID: <ZWq2IqMMJesqenGK@archie.me>
In-Reply-To: <20231127234600.2971029-1-nphamcs@gmail.com>

On Mon, Nov 27, 2023 at 03:45:54PM -0800, Nhat Pham wrote:
> Changelog:
> v7:
>    * Added the mem_cgroup_iter_online() function to the API for the new
>      behavior (suggested by Andrew Morton) (patch 2)
>    * Fixed a missing list_lru_del -> list_lru_del_obj (patch 1)
> v6:
>    * Rebase on top of latest mm-unstable.
>    * Fix/improve the in-code documentation of the new list_lru
>      manipulation functions (patch 1)
> v5:
>    * Replace reference getting with an rcu_read_lock() section for
>      zswap lru modifications (suggested by Yosry)
>    * Add a new prep patch that allows mem_cgroup_iter() to return
>      online cgroup.
>    * Add a callback that updates pool->next_shrink when the cgroup is
>      offlined (suggested by Yosry Ahmed, Johannes Weiner)
> v4:
>    * Rename list_lru_add to list_lru_add_obj and __list_lru_add to
>      list_lru_add (patch 1) (suggested by Johannes Weiner and
>      Yosry Ahmed)
>    * Some cleanups on the memcg aware LRU patch (patch 2)
>      (suggested by Yosry Ahmed)
>    * Use event interface for the new per-cgroup writeback counters.
>      (patch 3) (suggested by Yosry Ahmed)
>    * Abstract zswap's lruvec states and handling into 
>      zswap_lruvec_state (patch 5) (suggested by Yosry Ahmed)
> v3:
>    * Add a patch to export per-cgroup zswap writeback counters
>    * Add a patch to update zswap's kselftest
>    * Separate the new list_lru functions into their own prep patch
>    * Do not start from the top of the hierarchy when encountering a
>      memcg that is not online for the global limit zswap writeback
>      (patch 2)
>      (suggested by Yosry Ahmed)
>    * Do not remove the swap entry from list_lru in
>      __read_swapcache_async() (patch 2) (suggested by Yosry Ahmed)
>    * Removed a redundant zswap pool getting (patch 2)
>      (reported by Ryan Roberts)
>    * Use atomic for the nr_zswap_protected (instead of lruvec's lock)
>      (patch 5) (suggested by Yosry Ahmed)
>    * Remove the per-cgroup zswap shrinker knob (patch 5)
>      (suggested by Yosry Ahmed)
> v2:
>    * Fix loongarch compiler errors
>    * Use pool stats instead of memcg stats when !CONFIG_MEMCG_KMEM
> 
> There are currently several issues with zswap writeback:
> 
> 1. There is only a single global LRU for zswap, making it impossible to
>    perform workload-specific shrinking - a memcg under memory pressure
>    cannot determine which pages in the pool it owns, and often ends up
>    writing pages from other memcgs. This issue has been previously
>    observed in practice and mitigated by simply disabling
>    memcg-initiated shrinking:
> 
>    https://lore.kernel.org/all/20230530232435.3097106-1-nphamcs@gmail.com/T/#u
> 
>    But this solution leaves a lot to be desired, as we still do not
>    have an avenue for a memcg to free up its own memory locked up in
>    the zswap pool.
> 
> 2. We only shrink the zswap pool when the user-defined limit is hit.
>    This means that if we set the limit too high, cold data that are
>    unlikely to be used again will reside in the pool, wasting precious
>    memory. It is hard to predict how much zswap space will be needed
>    ahead of time, as this depends on the workload (specifically, on
>    factors such as memory access patterns and compressibility of the
>    memory pages).
> 
> This patch series solves these issues by separating the global zswap
> LRU into per-memcg and per-NUMA LRUs, and performs workload-specific
> (i.e., memcg- and NUMA-aware) zswap writeback under memory pressure.
> The new shrinker does not have any parameter that must be tuned by the
> user, and can be opted in or out on a per-memcg basis.
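
[Illustration: the difference between one global LRU and per-memcg,
per-node LRUs can be sketched with a toy user-space model. This is not
the kernel implementation; the class names GlobalLRU and PerMemcgLRU,
the demo() helper, and the string keys are all hypothetical and exist
only for this sketch. It assumes "shrinking" simply means evicting the
oldest entries of whichever list is consulted.]

```python
from collections import OrderedDict, defaultdict


class GlobalLRU:
    """Toy model: one LRU shared by every memcg (issue 1 in the cover letter)."""

    def __init__(self):
        self.entries = OrderedDict()  # key -> owning memcg, oldest first

    def add(self, key, memcg, nid=0):
        self.entries[key] = memcg

    def shrink(self, memcg, nr, nid=0):
        # The requesting memcg cannot tell which entries it owns, so the
        # oldest entries are written back regardless of their owner.
        victims = []
        for _ in range(min(nr, len(self.entries))):
            victims.append(self.entries.popitem(last=False))
        return victims  # list of (key, owner)


class PerMemcgLRU:
    """Toy model: one LRU per (memcg, NUMA node) pair."""

    def __init__(self):
        self.lists = defaultdict(OrderedDict)  # (memcg, nid) -> keys, oldest first

    def add(self, key, memcg, nid=0):
        self.lists[(memcg, nid)][key] = None

    def shrink(self, memcg, nr, nid=0):
        # Only the list owned by the memcg under pressure is touched.
        lru = self.lists[(memcg, nid)]
        victims = []
        for _ in range(min(nr, len(lru))):
            key, _unused = lru.popitem(last=False)
            victims.append((key, memcg))
        return victims


def demo(lru):
    """Memcg A stores entries first, then B; B then comes under pressure."""
    for i in range(3):
        lru.add(f"A{i}", "A")
    for i in range(3):
        lru.add(f"B{i}", "B")
    return lru.shrink("B", 2)


if __name__ == "__main__":
    print("global   :", demo(GlobalLRU()))
    print("per-memcg:", demo(PerMemcgLRU()))
```

[In this model, pressure in memcg B evicts A's entries under the global
LRU, while the per-memcg variant evicts only B's own entries.]
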
> 
> As a proof of concept, we ran the following synthetic benchmark:
> build the Linux kernel in a memory-limited cgroup, and allocate some
> cold data in tmpfs to see if the shrinker could write them out and
> improve the overall performance. Depending on the amount of cold data
> generated, we observe a 14% to 35% reduction in kernel CPU time used
> in the kernel builds.
> 
> Domenico Cerasuolo (3):
>   zswap: make shrinking memcg-aware
>   mm: memcg: add per-memcg zswap writeback stat
>   selftests: cgroup: update per-memcg zswap writeback selftest
> 
> Nhat Pham (3):
>   list_lru: allows explicit memcg and NUMA node selection
>   memcontrol: add a new function to traverse online-only memcg hierarchy
>   zswap: shrinks zswap pool based on memory pressure
> 
>  Documentation/admin-guide/mm/zswap.rst      |   7 +
>  drivers/android/binder_alloc.c              |   7 +-
>  fs/dcache.c                                 |   8 +-
>  fs/gfs2/quota.c                             |   6 +-
>  fs/inode.c                                  |   4 +-
>  fs/nfs/nfs42xattr.c                         |   8 +-
>  fs/nfsd/filecache.c                         |   4 +-
>  fs/xfs/xfs_buf.c                            |   6 +-
>  fs/xfs/xfs_dquot.c                          |   2 +-
>  fs/xfs/xfs_qm.c                             |   2 +-
>  include/linux/list_lru.h                    |  54 ++-
>  include/linux/memcontrol.h                  |  18 +
>  include/linux/mmzone.h                      |   2 +
>  include/linux/vm_event_item.h               |   1 +
>  include/linux/zswap.h                       |  27 +-
>  mm/list_lru.c                               |  48 ++-
>  mm/memcontrol.c                             |  32 +-
>  mm/mmzone.c                                 |   1 +
>  mm/swap.h                                   |   3 +-
>  mm/swap_state.c                             |  26 +-
>  mm/vmstat.c                                 |   1 +
>  mm/workingset.c                             |   4 +-
>  mm/zswap.c                                  | 426 +++++++++++++++++---
>  tools/testing/selftests/cgroup/test_zswap.c |  74 ++--
>  24 files changed, 641 insertions(+), 130 deletions(-)
> 
> 
> base-commit: 5cdba94229e58a39ca389ad99763af29e6b0c5a5

No regressions when booting kernel with series applied.

Tested-by: Bagas Sanjaya <bagasdotme@gmail.com>

-- 
An old man doll... just what I always wanted! - Clara
