* [PATCH 0/2] mm/zswap: optimize the scalability of zswap rb-tree
@ 2024-01-17  9:23 Chengming Zhou
  To: Andrew Morton
  Cc: Yosry Ahmed, Chengming Zhou, linux-kernel, Johannes Weiner, linux-mm,
      Chris Li, Nhat Pham

When testing zswap performance with a kernel build -j32 in a tmpfs
directory, I found that the scalability of the zswap rb-tree, which is
protected by a single spinlock, is not good: there is heavy lock
contention when multiple tasks zswap_store()/zswap_load() concurrently.

A simple solution is to split the single zswap rb-tree into multiple
rb-trees, each corresponding to SWAP_ADDRESS_SPACE_PAGES (64M) of swap
space. The idea comes from commit 4b3ef9daa4fc ("mm/swap: split swap
cache into 64MB trunks").

Although this method can't eliminate the spinlock contention completely,
it mitigates much of it. Below are the results of a kernel build in
tmpfs with the zswap shrinker enabled:

                 linux-next    zswap-lock-optimize
real             1m9.181s      1m3.820s
user             17m44.036s    17m40.100s
sys              7m37.297s     4m54.622s

So there are clear improvements. The series is also complementary to the
ongoing zswap xarray conversion by Chris, so IMHO we can merge this
first. I have refreshed and resent it for further discussion. Thanks for
review and comments!

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
---
Chengming Zhou (2):
      mm/zswap: make sure each swapfile always have zswap rb-tree
      mm/zswap: split zswap rb-tree

 include/linux/zswap.h |  7 +++--
 mm/swapfile.c         | 10 ++++---
 mm/zswap.c            | 74 ++++++++++++++++++++++++++++++++-------------------
 3 files changed, 59 insertions(+), 32 deletions(-)
---
base-commit: ab27740f76654ed58dd32ac0ba0031c18a6dea3b
change-id: 20240117-b4-zswap-lock-optimize-44e071c13427

Best regards,
-- 
Chengming Zhou <zhouchengming@bytedance.com>
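
[Editorial aside: the 64M segmentation described above boils down to an
index computation on the swap offset. The standalone sketch below shows
that arithmetic; the shift value (14, i.e. 16384 4K pages = 64M per tree,
mirroring the swap cache's SWAP_ADDRESS_SPACE_SHIFT) and the helper name
are written out here only for illustration and are not taken from the
patches themselves.]

#include <stdio.h>

/* 2^14 pages * 4K/page = 64M of swap space covered per tree (assumed
 * constant, mirroring the swap cache segmentation). */
#define SWAP_ADDRESS_SPACE_SHIFT 14

/* Which of the per-swapfile trees owns a given swap offset. */
static unsigned long tree_index(unsigned long offset)
{
        return offset >> SWAP_ADDRESS_SPACE_SHIFT;
}

int main(void)
{
        /* Offsets within the same 64M chunk hit the same tree and the same
         * lock; neighbouring chunks get independent locks. */
        printf("%lu %lu %lu\n",
               tree_index(0), tree_index(16383), tree_index(16384));
        /* prints: 0 0 1 */
        return 0;
}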
* [PATCH 1/2] mm/zswap: make sure each swapfile always have zswap rb-tree
@ 2024-01-17  9:23 Chengming Zhou
  To: Andrew Morton
  Cc: Yosry Ahmed, Chengming Zhou, linux-kernel, Johannes Weiner, linux-mm,
      Chris Li, Nhat Pham

Not all zswap interfaces can handle the absence of the zswap rb-tree;
actually only zswap_store() handles it for now.

To keep things simple, make sure each swapfile always has its zswap
rb-tree prepared before being enabled and used. The preparation is
unlikely to fail in practice; this patch just makes it explicit.

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
---
 include/linux/zswap.h |  7 +++++--
 mm/swapfile.c         | 10 +++++++---
 mm/zswap.c            |  7 ++++---
 3 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/include/linux/zswap.h b/include/linux/zswap.h
index 0b709f5bc65f..eca388229d9a 100644
--- a/include/linux/zswap.h
+++ b/include/linux/zswap.h
@@ -30,7 +30,7 @@ struct zswap_lruvec_state {
 bool zswap_store(struct folio *folio);
 bool zswap_load(struct folio *folio);
 void zswap_invalidate(int type, pgoff_t offset);
-void zswap_swapon(int type);
+int zswap_swapon(int type);
 void zswap_swapoff(int type);
 void zswap_memcg_offline_cleanup(struct mem_cgroup *memcg);
 void zswap_lruvec_state_init(struct lruvec *lruvec);
@@ -51,7 +51,10 @@ static inline bool zswap_load(struct folio *folio)
 }
 
 static inline void zswap_invalidate(int type, pgoff_t offset) {}
-static inline void zswap_swapon(int type) {}
+static inline int zswap_swapon(int type)
+{
+	return 0;
+}
 static inline void zswap_swapoff(int type) {}
 static inline void zswap_memcg_offline_cleanup(struct mem_cgroup *memcg) {}
 static inline void zswap_lruvec_state_init(struct lruvec *lruvec) {}
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 3eec686484ef..6c53ea06626b 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -2347,8 +2347,6 @@ static void enable_swap_info(struct swap_info_struct *p, int prio,
 				unsigned char *swap_map,
 				struct swap_cluster_info *cluster_info)
 {
-	zswap_swapon(p->type);
-
 	spin_lock(&swap_lock);
 	spin_lock(&p->lock);
 	setup_swap_info(p, prio, swap_map, cluster_info);
@@ -3166,6 +3164,10 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
 	if (error)
 		goto bad_swap_unlock_inode;
 
+	error = zswap_swapon(p->type);
+	if (error)
+		goto free_swap_address_space;
+
 	/*
 	 * Flush any pending IO and dirty mappings before we start using this
 	 * swap device.
@@ -3174,7 +3176,7 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
 	error = inode_drain_writes(inode);
 	if (error) {
 		inode->i_flags &= ~S_SWAPFILE;
-		goto free_swap_address_space;
+		goto free_swap_zswap;
 	}
 
 	mutex_lock(&swapon_mutex);
@@ -3198,6 +3200,8 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
 
 	error = 0;
 	goto out;
+free_swap_zswap:
+	zswap_swapoff(p->type);
 free_swap_address_space:
 	exit_swap_address_space(p->type);
 bad_swap_unlock_inode:
diff --git a/mm/zswap.c b/mm/zswap.c
index ca25b676048e..d88faea85978 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -1519,7 +1519,7 @@ bool zswap_store(struct folio *folio)
 	if (folio_test_large(folio))
 		return false;
 
-	if (!zswap_enabled || !tree)
+	if (!zswap_enabled)
 		return false;
 
 	/*
@@ -1772,19 +1772,20 @@ void zswap_invalidate(int type, pgoff_t offset)
 	spin_unlock(&tree->lock);
 }
 
-void zswap_swapon(int type)
+int zswap_swapon(int type)
 {
 	struct zswap_tree *tree;
 
 	tree = kzalloc(sizeof(*tree), GFP_KERNEL);
 	if (!tree) {
 		pr_err("alloc failed, zswap disabled for swap type %d\n", type);
-		return;
+		return -ENOMEM;
 	}
 
 	tree->rbroot = RB_ROOT;
 	spin_lock_init(&tree->lock);
 	zswap_trees[type] = tree;
+	return 0;
 }
 
 void zswap_swapoff(int type)

-- 
b4 0.10.1
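
[Editorial aside: the kind of interface the commit message alludes to can
be seen in the mm/zswap.c context above. Below is a simplified sketch of
zswap_invalidate(), abridged from the visible hunks and not the verbatim
kernel function, showing why the tree must exist for every enabled
swapfile.]

void zswap_invalidate(int type, pgoff_t offset)
{
        struct zswap_tree *tree = zswap_trees[type];
        struct zswap_entry *entry;

        /* If swapon had silently skipped the tree allocation, "tree" would
         * be NULL here and this spin_lock() would crash -- hence the
         * explicit error handling added by this patch. */
        spin_lock(&tree->lock);
        entry = zswap_rb_search(&tree->rbroot, offset);
        if (entry)
                zswap_invalidate_entry(tree, entry);
        spin_unlock(&tree->lock);
}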
* Re: [PATCH 1/2] mm/zswap: make sure each swapfile always have zswap rb-tree 2024-01-17 9:23 ` [PATCH 1/2] mm/zswap: make sure each swapfile always have " Chengming Zhou @ 2024-01-18 15:05 ` Johannes Weiner 2024-01-18 17:37 ` Yosry Ahmed 2024-01-18 18:16 ` Nhat Pham 2 siblings, 0 replies; 23+ messages in thread From: Johannes Weiner @ 2024-01-18 15:05 UTC (permalink / raw) To: Chengming Zhou Cc: Andrew Morton, Yosry Ahmed, linux-kernel, linux-mm, Chris Li, Nhat Pham On Wed, Jan 17, 2024 at 09:23:18AM +0000, Chengming Zhou wrote: > Not all zswap interfaces can handle the absence of the zswap rb-tree, > actually only zswap_store() has handled it for now. > > To make things simple, we make sure each swapfile always have the > zswap rb-tree prepared before being enabled and used. The preparation > is unlikely to fail in practice, this patch just make it explicit. > > Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 1/2] mm/zswap: make sure each swapfile always have zswap rb-tree 2024-01-17 9:23 ` [PATCH 1/2] mm/zswap: make sure each swapfile always have " Chengming Zhou 2024-01-18 15:05 ` Johannes Weiner @ 2024-01-18 17:37 ` Yosry Ahmed 2024-01-18 18:16 ` Nhat Pham 2 siblings, 0 replies; 23+ messages in thread From: Yosry Ahmed @ 2024-01-18 17:37 UTC (permalink / raw) To: Chengming Zhou Cc: Andrew Morton, linux-kernel, Johannes Weiner, linux-mm, Chris Li, Nhat Pham On Wed, Jan 17, 2024 at 1:23 AM Chengming Zhou <zhouchengming@bytedance.com> wrote: > > Not all zswap interfaces can handle the absence of the zswap rb-tree, > actually only zswap_store() has handled it for now. > > To make things simple, we make sure each swapfile always have the > zswap rb-tree prepared before being enabled and used. The preparation > is unlikely to fail in practice, this patch just make it explicit. > > Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Acked-by: Yosry Ahmed <yosryahmed@google.com> ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 1/2] mm/zswap: make sure each swapfile always have zswap rb-tree 2024-01-17 9:23 ` [PATCH 1/2] mm/zswap: make sure each swapfile always have " Chengming Zhou 2024-01-18 15:05 ` Johannes Weiner 2024-01-18 17:37 ` Yosry Ahmed @ 2024-01-18 18:16 ` Nhat Pham 2 siblings, 0 replies; 23+ messages in thread From: Nhat Pham @ 2024-01-18 18:16 UTC (permalink / raw) To: Chengming Zhou Cc: Andrew Morton, Yosry Ahmed, linux-kernel, Johannes Weiner, linux-mm, Chris Li On Wed, Jan 17, 2024 at 1:23 AM Chengming Zhou <zhouchengming@bytedance.com> wrote: > > Not all zswap interfaces can handle the absence of the zswap rb-tree, > actually only zswap_store() has handled it for now. > > To make things simple, we make sure each swapfile always have the > zswap rb-tree prepared before being enabled and used. The preparation > is unlikely to fail in practice, this patch just make it explicit. > > Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> This seems fine to me. IIUC, zswap_swapon() only fails when the rbtree allocation fails, and the tree's memory footprint is small so that's unlikely anyway. Acked-by: Nhat Pham <nphamcs@gmail.com> > --- > include/linux/zswap.h | 7 +++++-- > mm/swapfile.c | 10 +++++++--- > mm/zswap.c | 7 ++++--- > 3 files changed, 16 insertions(+), 8 deletions(-) > > diff --git a/include/linux/zswap.h b/include/linux/zswap.h > index 0b709f5bc65f..eca388229d9a 100644 > --- a/include/linux/zswap.h > +++ b/include/linux/zswap.h > @@ -30,7 +30,7 @@ struct zswap_lruvec_state { > bool zswap_store(struct folio *folio); > bool zswap_load(struct folio *folio); > void zswap_invalidate(int type, pgoff_t offset); > -void zswap_swapon(int type); > +int zswap_swapon(int type); > void zswap_swapoff(int type); > void zswap_memcg_offline_cleanup(struct mem_cgroup *memcg); > void zswap_lruvec_state_init(struct lruvec *lruvec); > @@ -51,7 +51,10 @@ static inline bool zswap_load(struct folio *folio) > } > > static inline void zswap_invalidate(int type, pgoff_t offset) {} > -static inline void zswap_swapon(int type) {} > +static inline int zswap_swapon(int type) > +{ > + return 0; > +} > static inline void zswap_swapoff(int type) {} > static inline void zswap_memcg_offline_cleanup(struct mem_cgroup *memcg) {} > static inline void zswap_lruvec_state_init(struct lruvec *lruvec) {} > diff --git a/mm/swapfile.c b/mm/swapfile.c > index 3eec686484ef..6c53ea06626b 100644 > --- a/mm/swapfile.c > +++ b/mm/swapfile.c > @@ -2347,8 +2347,6 @@ static void enable_swap_info(struct swap_info_struct *p, int prio, > unsigned char *swap_map, > struct swap_cluster_info *cluster_info) > { > - zswap_swapon(p->type); > - > spin_lock(&swap_lock); > spin_lock(&p->lock); > setup_swap_info(p, prio, swap_map, cluster_info); > @@ -3166,6 +3164,10 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags) > if (error) > goto bad_swap_unlock_inode; > > + error = zswap_swapon(p->type); > + if (error) > + goto free_swap_address_space; > + > /* > * Flush any pending IO and dirty mappings before we start using this > * swap device. 
> @@ -3174,7 +3176,7 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags) > error = inode_drain_writes(inode); > if (error) { > inode->i_flags &= ~S_SWAPFILE; > - goto free_swap_address_space; > + goto free_swap_zswap; > } > > mutex_lock(&swapon_mutex); > @@ -3198,6 +3200,8 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags) > > error = 0; > goto out; > +free_swap_zswap: > + zswap_swapoff(p->type); > free_swap_address_space: > exit_swap_address_space(p->type); > bad_swap_unlock_inode: > diff --git a/mm/zswap.c b/mm/zswap.c > index ca25b676048e..d88faea85978 100644 > --- a/mm/zswap.c > +++ b/mm/zswap.c > @@ -1519,7 +1519,7 @@ bool zswap_store(struct folio *folio) > if (folio_test_large(folio)) > return false; > > - if (!zswap_enabled || !tree) > + if (!zswap_enabled) > return false; > > /* > @@ -1772,19 +1772,20 @@ void zswap_invalidate(int type, pgoff_t offset) > spin_unlock(&tree->lock); > } > > -void zswap_swapon(int type) > +int zswap_swapon(int type) > { > struct zswap_tree *tree; > > tree = kzalloc(sizeof(*tree), GFP_KERNEL); > if (!tree) { > pr_err("alloc failed, zswap disabled for swap type %d\n", type); > - return; > + return -ENOMEM; > } > > tree->rbroot = RB_ROOT; > spin_lock_init(&tree->lock); > zswap_trees[type] = tree; > + return 0; > } > > void zswap_swapoff(int type) > > -- > b4 0.10.1 ^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH 2/2] mm/zswap: split zswap rb-tree
@ 2024-01-17  9:23 Chengming Zhou
  To: Andrew Morton
  Cc: Yosry Ahmed, Chengming Zhou, linux-kernel, Johannes Weiner, linux-mm,
      Chris Li, Nhat Pham

Each swapfile has one rb-tree that maps swp_entry_t to zswap_entry,
protected by a single spinlock, which can cause heavy lock contention
if multiple tasks zswap_store()/zswap_load() concurrently.

Address the scalability problem by splitting the zswap rb-tree into
multiple rb-trees, each corresponding to SWAP_ADDRESS_SPACE_PAGES (64M),
just like the swap cache address_space splitting.

Although this method can't eliminate the spinlock contention completely,
it mitigates much of it. Below are the results of a kernel build in
tmpfs with the zswap shrinker enabled:

                 linux-next    zswap-lock-optimize
real             1m9.181s      1m3.820s
user             17m44.036s    17m40.100s
sys              7m37.297s     4m54.622s

So there are clear improvements.

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
---
 include/linux/zswap.h |  4 +--
 mm/swapfile.c         |  2 +-
 mm/zswap.c            | 69 ++++++++++++++++++++++++++++++++-------------------
 3 files changed, 47 insertions(+), 28 deletions(-)

diff --git a/include/linux/zswap.h b/include/linux/zswap.h
index eca388229d9a..91895ce1fdbc 100644
--- a/include/linux/zswap.h
+++ b/include/linux/zswap.h
@@ -30,7 +30,7 @@ struct zswap_lruvec_state {
 bool zswap_store(struct folio *folio);
 bool zswap_load(struct folio *folio);
 void zswap_invalidate(int type, pgoff_t offset);
-int zswap_swapon(int type);
+int zswap_swapon(int type, unsigned long nr_pages);
 void zswap_swapoff(int type);
 void zswap_memcg_offline_cleanup(struct mem_cgroup *memcg);
 void zswap_lruvec_state_init(struct lruvec *lruvec);
@@ -51,7 +51,7 @@ static inline bool zswap_load(struct folio *folio)
 }
 
 static inline void zswap_invalidate(int type, pgoff_t offset) {}
-static inline int zswap_swapon(int type)
+static inline int zswap_swapon(int type, unsigned long nr_pages)
 {
 	return 0;
 }
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 6c53ea06626b..35aa17b2a2fa 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -3164,7 +3164,7 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
 	if (error)
 		goto bad_swap_unlock_inode;
 
-	error = zswap_swapon(p->type);
+	error = zswap_swapon(p->type, maxpages);
 	if (error)
 		goto free_swap_address_space;
 
diff --git a/mm/zswap.c b/mm/zswap.c
index d88faea85978..4a6dbc620c7c 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -239,6 +239,7 @@ struct zswap_tree {
 };
 
 static struct zswap_tree *zswap_trees[MAX_SWAPFILES];
+static unsigned int nr_zswap_trees[MAX_SWAPFILES];
 
 /* RCU-protected iteration */
 static LIST_HEAD(zswap_pools);
@@ -265,6 +266,10 @@ static bool zswap_has_pool;
  * helpers and fwd declarations
 **********************************/
 
+#define swap_zswap_tree(entry)					\
+	(&zswap_trees[swp_type(entry)][swp_offset(entry)	\
+		>> SWAP_ADDRESS_SPACE_SHIFT])
+
 #define zswap_pool_debug(msg, p)				\
 	pr_debug("%s pool %s/%s\n", msg, (p)->tfm_name,		\
 		 zpool_get_type((p)->zpools[0]))
@@ -865,7 +870,7 @@ static enum lru_status shrink_memcg_cb(struct list_head *item, struct list_lru_o
 	 * until the entry is verified to still be alive in the tree.
 	 */
 	swpoffset = swp_offset(entry->swpentry);
-	tree = zswap_trees[swp_type(entry->swpentry)];
+	tree = swap_zswap_tree(entry->swpentry);
 	list_lru_isolate(l, item);
 	/*
 	 * It's safe to drop the lock here because we return either
@@ -1494,10 +1499,9 @@ static void zswap_fill_page(void *ptr, unsigned long value)
 bool zswap_store(struct folio *folio)
 {
 	swp_entry_t swp = folio->swap;
-	int type = swp_type(swp);
 	pgoff_t offset = swp_offset(swp);
 	struct page *page = &folio->page;
-	struct zswap_tree *tree = zswap_trees[type];
+	struct zswap_tree *tree = swap_zswap_tree(swp);
 	struct zswap_entry *entry, *dupentry;
 	struct scatterlist input, output;
 	struct crypto_acomp_ctx *acomp_ctx;
@@ -1569,7 +1573,7 @@ bool zswap_store(struct folio *folio)
 	src = kmap_local_page(page);
 	if (zswap_is_page_same_filled(src, &value)) {
 		kunmap_local(src);
-		entry->swpentry = swp_entry(type, offset);
+		entry->swpentry = swp;
 		entry->length = 0;
 		entry->value = value;
 		atomic_inc(&zswap_same_filled_pages);
@@ -1651,7 +1655,7 @@ bool zswap_store(struct folio *folio)
 	mutex_unlock(&acomp_ctx->mutex);
 
 	/* populate entry */
-	entry->swpentry = swp_entry(type, offset);
+	entry->swpentry = swp;
 	entry->handle = handle;
 	entry->length = dlen;
 
@@ -1711,10 +1715,9 @@ bool zswap_store(struct folio *folio)
 bool zswap_load(struct folio *folio)
 {
 	swp_entry_t swp = folio->swap;
-	int type = swp_type(swp);
 	pgoff_t offset = swp_offset(swp);
 	struct page *page = &folio->page;
-	struct zswap_tree *tree = zswap_trees[type];
+	struct zswap_tree *tree = swap_zswap_tree(swp);
 	struct zswap_entry *entry;
 	u8 *dst;
 
@@ -1757,7 +1760,7 @@ bool zswap_load(struct folio *folio)
 
 void zswap_invalidate(int type, pgoff_t offset)
 {
-	struct zswap_tree *tree = zswap_trees[type];
+	struct zswap_tree *tree = swap_zswap_tree(swp_entry(type, offset));
 	struct zswap_entry *entry;
 
 	/* find */
@@ -1772,37 +1775,53 @@ void zswap_invalidate(int type, pgoff_t offset)
 	spin_unlock(&tree->lock);
 }
 
-int zswap_swapon(int type)
+int zswap_swapon(int type, unsigned long nr_pages)
 {
-	struct zswap_tree *tree;
+	struct zswap_tree *trees, *tree;
+	unsigned int nr, i;
 
-	tree = kzalloc(sizeof(*tree), GFP_KERNEL);
-	if (!tree) {
+	nr = DIV_ROUND_UP(nr_pages, SWAP_ADDRESS_SPACE_PAGES);
+	trees = kvcalloc(nr, sizeof(*tree), GFP_KERNEL);
+	if (!trees) {
 		pr_err("alloc failed, zswap disabled for swap type %d\n", type);
 		return -ENOMEM;
 	}
 
-	tree->rbroot = RB_ROOT;
-	spin_lock_init(&tree->lock);
-	zswap_trees[type] = tree;
+	for (i = 0; i < nr; i++) {
+		tree = trees + i;
+		tree->rbroot = RB_ROOT;
+		spin_lock_init(&tree->lock);
+	}
+
+	nr_zswap_trees[type] = nr;
+	zswap_trees[type] = trees;
 	return 0;
 }
 
 void zswap_swapoff(int type)
 {
-	struct zswap_tree *tree = zswap_trees[type];
-	struct zswap_entry *entry, *n;
+	struct zswap_tree *trees = zswap_trees[type];
+	unsigned int i;
 
-	if (!tree)
+	if (!trees)
 		return;
 
-	/* walk the tree and free everything */
-	spin_lock(&tree->lock);
-	rbtree_postorder_for_each_entry_safe(entry, n, &tree->rbroot, rbnode)
-		zswap_free_entry(entry);
-	tree->rbroot = RB_ROOT;
-	spin_unlock(&tree->lock);
-	kfree(tree);
+	for (i = 0; i < nr_zswap_trees[type]; i++) {
+		struct zswap_tree *tree = trees + i;
+		struct zswap_entry *entry, *n;
+
+		/* walk the tree and free everything */
+		spin_lock(&tree->lock);
+		rbtree_postorder_for_each_entry_safe(entry, n,
+						     &tree->rbroot,
+						     rbnode)
+			zswap_free_entry(entry);
+		tree->rbroot = RB_ROOT;
+		spin_unlock(&tree->lock);
+	}
+
+	kvfree(trees);
+	nr_zswap_trees[type] = 0;
 	zswap_trees[type] = NULL;
 }

-- 
b4 0.10.1
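
[Editorial aside, for a sense of the cost of the per-range allocation
above; the device size is an illustrative assumption, not from the
patch. A 64 GiB swap device holds 64 GiB / 4 KiB = 16777216 pages, so
zswap_swapon() allocates

        nr = DIV_ROUND_UP(16777216, SWAP_ADDRESS_SPACE_PAGES)
           = 16777216 / 16384
           = 1024 trees,

and since each struct zswap_tree is just an rb_root plus a spinlock, the
kvcalloc() comes to a few tens of kilobytes at most even for a large
device.]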
* Re: [PATCH 2/2] mm/zswap: split zswap rb-tree 2024-01-17 9:23 ` [PATCH 2/2] mm/zswap: split " Chengming Zhou @ 2024-01-18 15:11 ` Johannes Weiner 2024-01-19 6:20 ` Chengming Zhou 2024-01-18 19:24 ` Nhat Pham 1 sibling, 1 reply; 23+ messages in thread From: Johannes Weiner @ 2024-01-18 15:11 UTC (permalink / raw) To: Chengming Zhou Cc: Andrew Morton, Yosry Ahmed, linux-kernel, linux-mm, Chris Li, Nhat Pham On Wed, Jan 17, 2024 at 09:23:19AM +0000, Chengming Zhou wrote: > Each swapfile has one rb-tree to search the mapping of swp_entry_t to > zswap_entry, that use a spinlock to protect, which can cause heavy lock > contention if multiple tasks zswap_store/load concurrently. > > Optimize the scalability problem by splitting the zswap rb-tree into > multiple rb-trees, each corresponds to SWAP_ADDRESS_SPACE_PAGES (64M), > just like we did in the swap cache address_space splitting. > > Although this method can't solve the spinlock contention completely, it > can mitigate much of that contention. Below is the results of kernel build > in tmpfs with zswap shrinker enabled: > > linux-next zswap-lock-optimize > real 1m9.181s 1m3.820s > user 17m44.036s 17m40.100s > sys 7m37.297s 4m54.622s > > So there are clearly improvements. > > Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> One minor nit: > @@ -265,6 +266,10 @@ static bool zswap_has_pool; > * helpers and fwd declarations > **********************************/ > > +#define swap_zswap_tree(entry) \ > + (&zswap_trees[swp_type(entry)][swp_offset(entry) \ > + >> SWAP_ADDRESS_SPACE_SHIFT]) Make this a static inline function instead? ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 2/2] mm/zswap: split zswap rb-tree 2024-01-18 15:11 ` Johannes Weiner @ 2024-01-19 6:20 ` Chengming Zhou 0 siblings, 0 replies; 23+ messages in thread From: Chengming Zhou @ 2024-01-19 6:20 UTC (permalink / raw) To: Johannes Weiner Cc: Andrew Morton, Yosry Ahmed, linux-kernel, linux-mm, Chris Li, Nhat Pham On 2024/1/18 23:11, Johannes Weiner wrote: > On Wed, Jan 17, 2024 at 09:23:19AM +0000, Chengming Zhou wrote: >> Each swapfile has one rb-tree to search the mapping of swp_entry_t to >> zswap_entry, that use a spinlock to protect, which can cause heavy lock >> contention if multiple tasks zswap_store/load concurrently. >> >> Optimize the scalability problem by splitting the zswap rb-tree into >> multiple rb-trees, each corresponds to SWAP_ADDRESS_SPACE_PAGES (64M), >> just like we did in the swap cache address_space splitting. >> >> Although this method can't solve the spinlock contention completely, it >> can mitigate much of that contention. Below is the results of kernel build >> in tmpfs with zswap shrinker enabled: >> >> linux-next zswap-lock-optimize >> real 1m9.181s 1m3.820s >> user 17m44.036s 17m40.100s >> sys 7m37.297s 4m54.622s >> >> So there are clearly improvements. >> >> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> > > Acked-by: Johannes Weiner <hannes@cmpxchg.org> > > One minor nit: > >> @@ -265,6 +266,10 @@ static bool zswap_has_pool; >> * helpers and fwd declarations >> **********************************/ >> >> +#define swap_zswap_tree(entry) \ >> + (&zswap_trees[swp_type(entry)][swp_offset(entry) \ >> + >> SWAP_ADDRESS_SPACE_SHIFT]) > > Make this a static inline function instead? Good suggestion, will do. Thanks. ^ permalink raw reply [flat|nested] 23+ messages in thread
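
[Editorial aside: the converted helper would presumably look roughly like
the sketch below; the name and index math are taken from the macro in the
patch, while the static-inline form is only an illustration of the
suggested change, not code from a posted revision.]

static inline struct zswap_tree *swap_zswap_tree(swp_entry_t swp)
{
        return &zswap_trees[swp_type(swp)][swp_offset(swp)
                >> SWAP_ADDRESS_SPACE_SHIFT];
}

Compared with the macro, the inline function gets type checking on the
swp_entry_t argument and avoids multiple evaluation of whatever
expression the caller passes in.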
* Re: [PATCH 2/2] mm/zswap: split zswap rb-tree 2024-01-17 9:23 ` [PATCH 2/2] mm/zswap: split " Chengming Zhou 2024-01-18 15:11 ` Johannes Weiner @ 2024-01-18 19:24 ` Nhat Pham 2024-01-19 6:24 ` Chengming Zhou 1 sibling, 1 reply; 23+ messages in thread From: Nhat Pham @ 2024-01-18 19:24 UTC (permalink / raw) To: Chengming Zhou Cc: Andrew Morton, Yosry Ahmed, linux-kernel, Johannes Weiner, linux-mm, Chris Li On Wed, Jan 17, 2024 at 1:23 AM Chengming Zhou <zhouchengming@bytedance.com> wrote: > > Each swapfile has one rb-tree to search the mapping of swp_entry_t to > zswap_entry, that use a spinlock to protect, which can cause heavy lock > contention if multiple tasks zswap_store/load concurrently. > > Optimize the scalability problem by splitting the zswap rb-tree into > multiple rb-trees, each corresponds to SWAP_ADDRESS_SPACE_PAGES (64M), > just like we did in the swap cache address_space splitting. > > Although this method can't solve the spinlock contention completely, it > can mitigate much of that contention. Below is the results of kernel build > in tmpfs with zswap shrinker enabled: > > linux-next zswap-lock-optimize > real 1m9.181s 1m3.820s > user 17m44.036s 17m40.100s > sys 7m37.297s 4m54.622s That's really impressive, especially the sys time reduction :) Well done. > > So there are clearly improvements. > > Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Code looks solid too. I haven't read the xarray patch series too closely yet, but this patch series is clearly already an improvement. It is simple, with existing precedent (from swap cache), and experiments show that it works quite well to improve zswap's performance. If the xarray patch proves to be even better, we can always combine it with this approach (a per-range xarray?), or replace it with the xarray. 
But for now: Acked-by: Nhat Pham <nphamcs@gmail.com> > --- > include/linux/zswap.h | 4 +-- > mm/swapfile.c | 2 +- > mm/zswap.c | 69 ++++++++++++++++++++++++++++++++------------------- > 3 files changed, 47 insertions(+), 28 deletions(-) > > diff --git a/include/linux/zswap.h b/include/linux/zswap.h > index eca388229d9a..91895ce1fdbc 100644 > --- a/include/linux/zswap.h > +++ b/include/linux/zswap.h > @@ -30,7 +30,7 @@ struct zswap_lruvec_state { > bool zswap_store(struct folio *folio); > bool zswap_load(struct folio *folio); > void zswap_invalidate(int type, pgoff_t offset); > -int zswap_swapon(int type); > +int zswap_swapon(int type, unsigned long nr_pages); > void zswap_swapoff(int type); > void zswap_memcg_offline_cleanup(struct mem_cgroup *memcg); > void zswap_lruvec_state_init(struct lruvec *lruvec); > @@ -51,7 +51,7 @@ static inline bool zswap_load(struct folio *folio) > } > > static inline void zswap_invalidate(int type, pgoff_t offset) {} > -static inline int zswap_swapon(int type) > +static inline int zswap_swapon(int type, unsigned long nr_pages) > { > return 0; > } > diff --git a/mm/swapfile.c b/mm/swapfile.c > index 6c53ea06626b..35aa17b2a2fa 100644 > --- a/mm/swapfile.c > +++ b/mm/swapfile.c > @@ -3164,7 +3164,7 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags) > if (error) > goto bad_swap_unlock_inode; > > - error = zswap_swapon(p->type); > + error = zswap_swapon(p->type, maxpages); > if (error) > goto free_swap_address_space; > > diff --git a/mm/zswap.c b/mm/zswap.c > index d88faea85978..4a6dbc620c7c 100644 > --- a/mm/zswap.c > +++ b/mm/zswap.c > @@ -239,6 +239,7 @@ struct zswap_tree { > }; > > static struct zswap_tree *zswap_trees[MAX_SWAPFILES]; > +static unsigned int nr_zswap_trees[MAX_SWAPFILES]; > > /* RCU-protected iteration */ > static LIST_HEAD(zswap_pools); > @@ -265,6 +266,10 @@ static bool zswap_has_pool; > * helpers and fwd declarations > **********************************/ > > +#define swap_zswap_tree(entry) \ > + (&zswap_trees[swp_type(entry)][swp_offset(entry) \ > + >> SWAP_ADDRESS_SPACE_SHIFT]) > + > #define zswap_pool_debug(msg, p) \ > pr_debug("%s pool %s/%s\n", msg, (p)->tfm_name, \ > zpool_get_type((p)->zpools[0])) > @@ -865,7 +870,7 @@ static enum lru_status shrink_memcg_cb(struct list_head *item, struct list_lru_o > * until the entry is verified to still be alive in the tree. 
> */ > swpoffset = swp_offset(entry->swpentry); > - tree = zswap_trees[swp_type(entry->swpentry)]; > + tree = swap_zswap_tree(entry->swpentry); > list_lru_isolate(l, item); > /* > * It's safe to drop the lock here because we return either > @@ -1494,10 +1499,9 @@ static void zswap_fill_page(void *ptr, unsigned long value) > bool zswap_store(struct folio *folio) > { > swp_entry_t swp = folio->swap; > - int type = swp_type(swp); > pgoff_t offset = swp_offset(swp); > struct page *page = &folio->page; > - struct zswap_tree *tree = zswap_trees[type]; > + struct zswap_tree *tree = swap_zswap_tree(swp); > struct zswap_entry *entry, *dupentry; > struct scatterlist input, output; > struct crypto_acomp_ctx *acomp_ctx; > @@ -1569,7 +1573,7 @@ bool zswap_store(struct folio *folio) > src = kmap_local_page(page); > if (zswap_is_page_same_filled(src, &value)) { > kunmap_local(src); > - entry->swpentry = swp_entry(type, offset); > + entry->swpentry = swp; > entry->length = 0; > entry->value = value; > atomic_inc(&zswap_same_filled_pages); > @@ -1651,7 +1655,7 @@ bool zswap_store(struct folio *folio) > mutex_unlock(&acomp_ctx->mutex); > > /* populate entry */ > - entry->swpentry = swp_entry(type, offset); > + entry->swpentry = swp; > entry->handle = handle; > entry->length = dlen; > > @@ -1711,10 +1715,9 @@ bool zswap_store(struct folio *folio) > bool zswap_load(struct folio *folio) > { > swp_entry_t swp = folio->swap; > - int type = swp_type(swp); > pgoff_t offset = swp_offset(swp); > struct page *page = &folio->page; > - struct zswap_tree *tree = zswap_trees[type]; > + struct zswap_tree *tree = swap_zswap_tree(swp); > struct zswap_entry *entry; > u8 *dst; > > @@ -1757,7 +1760,7 @@ bool zswap_load(struct folio *folio) > > void zswap_invalidate(int type, pgoff_t offset) > { > - struct zswap_tree *tree = zswap_trees[type]; > + struct zswap_tree *tree = swap_zswap_tree(swp_entry(type, offset)); > struct zswap_entry *entry; > > /* find */ > @@ -1772,37 +1775,53 @@ void zswap_invalidate(int type, pgoff_t offset) > spin_unlock(&tree->lock); > } > > -int zswap_swapon(int type) > +int zswap_swapon(int type, unsigned long nr_pages) > { > - struct zswap_tree *tree; > + struct zswap_tree *trees, *tree; > + unsigned int nr, i; > > - tree = kzalloc(sizeof(*tree), GFP_KERNEL); > - if (!tree) { > + nr = DIV_ROUND_UP(nr_pages, SWAP_ADDRESS_SPACE_PAGES); > + trees = kvcalloc(nr, sizeof(*tree), GFP_KERNEL); > + if (!trees) { > pr_err("alloc failed, zswap disabled for swap type %d\n", type); > return -ENOMEM; > } > > - tree->rbroot = RB_ROOT; > - spin_lock_init(&tree->lock); > - zswap_trees[type] = tree; > + for (i = 0; i < nr; i++) { > + tree = trees + i; > + tree->rbroot = RB_ROOT; > + spin_lock_init(&tree->lock); > + } > + > + nr_zswap_trees[type] = nr; > + zswap_trees[type] = trees; > return 0; > } > > void zswap_swapoff(int type) > { > - struct zswap_tree *tree = zswap_trees[type]; > - struct zswap_entry *entry, *n; > + struct zswap_tree *trees = zswap_trees[type]; > + unsigned int i; > > - if (!tree) > + if (!trees) > return; > > - /* walk the tree and free everything */ > - spin_lock(&tree->lock); > - rbtree_postorder_for_each_entry_safe(entry, n, &tree->rbroot, rbnode) > - zswap_free_entry(entry); > - tree->rbroot = RB_ROOT; > - spin_unlock(&tree->lock); > - kfree(tree); > + for (i = 0; i < nr_zswap_trees[type]; i++) { > + struct zswap_tree *tree = trees + i; > + struct zswap_entry *entry, *n; > + > + /* walk the tree and free everything */ > + spin_lock(&tree->lock); > + rbtree_postorder_for_each_entry_safe(entry, 
n, > + &tree->rbroot, > + rbnode) > + zswap_free_entry(entry); > + tree->rbroot = RB_ROOT; > + spin_unlock(&tree->lock); > + } > + > + kvfree(trees); > + nr_zswap_trees[type] = 0; > zswap_trees[type] = NULL; > } > > > -- > b4 0.10.1 ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 2/2] mm/zswap: split zswap rb-tree 2024-01-18 19:24 ` Nhat Pham @ 2024-01-19 6:24 ` Chengming Zhou 0 siblings, 0 replies; 23+ messages in thread From: Chengming Zhou @ 2024-01-19 6:24 UTC (permalink / raw) To: Nhat Pham Cc: Andrew Morton, Yosry Ahmed, linux-kernel, Johannes Weiner, linux-mm, Chris Li On 2024/1/19 03:24, Nhat Pham wrote: > On Wed, Jan 17, 2024 at 1:23 AM Chengming Zhou > <zhouchengming@bytedance.com> wrote: >> >> Each swapfile has one rb-tree to search the mapping of swp_entry_t to >> zswap_entry, that use a spinlock to protect, which can cause heavy lock >> contention if multiple tasks zswap_store/load concurrently. >> >> Optimize the scalability problem by splitting the zswap rb-tree into >> multiple rb-trees, each corresponds to SWAP_ADDRESS_SPACE_PAGES (64M), >> just like we did in the swap cache address_space splitting. >> >> Although this method can't solve the spinlock contention completely, it >> can mitigate much of that contention. Below is the results of kernel build >> in tmpfs with zswap shrinker enabled: >> >> linux-next zswap-lock-optimize >> real 1m9.181s 1m3.820s >> user 17m44.036s 17m40.100s >> sys 7m37.297s 4m54.622s > > That's really impressive, especially the sys time reduction :) Well done. > Thanks! >> >> So there are clearly improvements. >> >> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> > > Code looks solid too. I haven't read the xarray patch series too > closely yet, but this patch series is clearly already an improvement. > It is simple, with existing precedent (from swap cache), and > experiments show that it works quite well to improve zswap's > performance. > > If the xarray patch proves to be even better, we can always combine it > with this approach (a per-range xarray?), or replace it with the > xarray. But for now: > > Acked-by: Nhat Pham <nphamcs@gmail.com> > Right, I agree. We should combine both approaches. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 0/2] mm/zswap: optimize the scalability of zswap rb-tree 2024-01-17 9:23 [PATCH 0/2] mm/zswap: optimize the scalability of zswap rb-tree Chengming Zhou 2024-01-17 9:23 ` [PATCH 1/2] mm/zswap: make sure each swapfile always have " Chengming Zhou 2024-01-17 9:23 ` [PATCH 2/2] mm/zswap: split " Chengming Zhou @ 2024-01-17 18:37 ` Yosry Ahmed 2024-01-17 23:41 ` Chris Li 2024-01-18 15:34 ` Johannes Weiner 2 siblings, 2 replies; 23+ messages in thread From: Yosry Ahmed @ 2024-01-17 18:37 UTC (permalink / raw) To: Chengming Zhou Cc: Andrew Morton, linux-kernel, Johannes Weiner, linux-mm, Chris Li, Nhat Pham On Wed, Jan 17, 2024 at 1:23 AM Chengming Zhou <zhouchengming@bytedance.com> wrote: > > When testing the zswap performance by using kernel build -j32 in a tmpfs > directory, I found the scalability of zswap rb-tree is not good, which > is protected by the only spinlock. That would cause heavy lock contention > if multiple tasks zswap_store/load concurrently. > > So a simple solution is to split the only one zswap rb-tree into multiple > rb-trees, each corresponds to SWAP_ADDRESS_SPACE_PAGES (64M). This idea is > from the commit 4b3ef9daa4fc ("mm/swap: split swap cache into 64MB trunks"). > > Although this method can't solve the spinlock contention completely, it > can mitigate much of that contention. Below is the results of kernel build > in tmpfs with zswap shrinker enabled: > > linux-next zswap-lock-optimize > real 1m9.181s 1m3.820s > user 17m44.036s 17m40.100s > sys 7m37.297s 4m54.622s > > So there are clearly improvements. And it's complementary with the ongoing > zswap xarray conversion by Chris. Anyway, I think we can also merge this > first, it's complementary IMHO. So I just refresh and resend this for > further discussion. The reason why I think we should wait for the xarray patch(es) is there is a chance we may see less improvements from splitting the tree if it was an xarray. If we merge this series first, there is no way to know. Chris, do you intend to send the xarray patch(es) anytime soon? ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 0/2] mm/zswap: optimize the scalability of zswap rb-tree 2024-01-17 18:37 ` [PATCH 0/2] mm/zswap: optimize the scalability of " Yosry Ahmed @ 2024-01-17 23:41 ` Chris Li 2024-01-17 23:47 ` Yosry Ahmed 2024-01-18 15:34 ` Johannes Weiner 1 sibling, 1 reply; 23+ messages in thread From: Chris Li @ 2024-01-17 23:41 UTC (permalink / raw) To: Yosry Ahmed Cc: Chengming Zhou, Andrew Morton, linux-kernel, Johannes Weiner, linux-mm, Nhat Pham Hi Yosry and Chengming, On Wed, Jan 17, 2024 at 10:38 AM Yosry Ahmed <yosryahmed@google.com> wrote: > > On Wed, Jan 17, 2024 at 1:23 AM Chengming Zhou > <zhouchengming@bytedance.com> wrote: > > > > When testing the zswap performance by using kernel build -j32 in a tmpfs > > directory, I found the scalability of zswap rb-tree is not good, which > > is protected by the only spinlock. That would cause heavy lock contention > > if multiple tasks zswap_store/load concurrently. > > > > So a simple solution is to split the only one zswap rb-tree into multiple > > rb-trees, each corresponds to SWAP_ADDRESS_SPACE_PAGES (64M). This idea is > > from the commit 4b3ef9daa4fc ("mm/swap: split swap cache into 64MB trunks"). > > > > Although this method can't solve the spinlock contention completely, it > > can mitigate much of that contention. Below is the results of kernel build > > in tmpfs with zswap shrinker enabled: > > > > linux-next zswap-lock-optimize > > real 1m9.181s 1m3.820s > > user 17m44.036s 17m40.100s > > sys 7m37.297s 4m54.622s > > > > So there are clearly improvements. And it's complementary with the ongoing > > zswap xarray conversion by Chris. Anyway, I think we can also merge this > > first, it's complementary IMHO. So I just refresh and resend this for > > further discussion. > Sorry I have been radio silent busying on a few refreshments of the xarray on the recent kernel tree. There is an assertion triggered on xarray and the rb tree does not agree with each other. It takes some time to debug. I ironed that out, also glad the assert did catch a bug. Currently the xarray patch should have everything it takes to use RCU read lock. However taking out the tree spinlock is more work than previously. If we are going to remove the tree spinlock, I think we should revert back to doing a zswap tree lookup and return the zswap entry with reference increased. The tree mapping can still decouple from the zswap entry reference count drop to zero. Anyway, my V1 of the xarray patch will not include removing the tree spinlock. > The reason why I think we should wait for the xarray patch(es) is > there is a chance we may see less improvements from splitting the tree > if it was an xarray. If we merge this series first, there is no way to > know. > > Chris, do you intend to send the xarray patch(es) anytime soon? Thanks for the heads up. Let me send it out now. Chris ^ permalink raw reply [flat|nested] 23+ messages in thread
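
[Editorial aside: a hedged illustration of the "lookup returns the entry
with its reference raised" shape Chris mentions, pieced together from the
existing zswap helpers; the function name and exact form are hypothetical
and not taken from any posted patch.]

static struct zswap_entry *zswap_tree_find_get(struct zswap_tree *tree,
                                               pgoff_t offset)
{
        struct zswap_entry *entry;

        spin_lock(&tree->lock);
        entry = zswap_rb_search(&tree->rbroot, offset);
        if (entry)
                zswap_entry_get(entry); /* pin it so it outlives the lock */
        spin_unlock(&tree->lock);

        return entry;
}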
* Re: [PATCH 0/2] mm/zswap: optimize the scalability of zswap rb-tree 2024-01-17 23:41 ` Chris Li @ 2024-01-17 23:47 ` Yosry Ahmed 2024-01-18 0:17 ` Chris Li 0 siblings, 1 reply; 23+ messages in thread From: Yosry Ahmed @ 2024-01-17 23:47 UTC (permalink / raw) To: Chris Li Cc: Chengming Zhou, Andrew Morton, linux-kernel, Johannes Weiner, linux-mm, Nhat Pham > Currently the xarray patch should have everything it takes to use RCU > read lock. However taking out the tree spinlock is more work than > previously. If we are going to remove the tree spinlock, I think we > should revert back to doing a zswap tree lookup and return the zswap > entry with reference increased. The tree mapping can still decouple > from the zswap entry reference count drop to zero. Anyway, my V1 of > the xarray patch will not include removing the tree spinlock. Interesting. What do you mean by removing the tree spinlock? My assumption was that the xarray reduces lock contention because we do not need a lock to do lookups, but we still need the lock otherwise. Did you have something in mind to completely remove the tree lock? > > > The reason why I think we should wait for the xarray patch(es) is > > there is a chance we may see less improvements from splitting the tree > > if it was an xarray. If we merge this series first, there is no way to > > know. > > > > Chris, do you intend to send the xarray patch(es) anytime soon? > > Thanks for the heads up. Let me send it out now. Awesome, thanks! I assume Chengming can test whether this series provides the same benefits with the xarray. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 0/2] mm/zswap: optimize the scalability of zswap rb-tree 2024-01-17 23:47 ` Yosry Ahmed @ 2024-01-18 0:17 ` Chris Li 2024-01-18 0:34 ` Yosry Ahmed 2024-01-18 0:49 ` Nhat Pham 0 siblings, 2 replies; 23+ messages in thread From: Chris Li @ 2024-01-18 0:17 UTC (permalink / raw) To: Yosry Ahmed Cc: Chengming Zhou, Andrew Morton, linux-kernel, Johannes Weiner, linux-mm, Nhat Pham Hi Yosry, On Wed, Jan 17, 2024 at 3:48 PM Yosry Ahmed <yosryahmed@google.com> wrote: > > > Currently the xarray patch should have everything it takes to use RCU > > read lock. However taking out the tree spinlock is more work than > > previously. If we are going to remove the tree spinlock, I think we > > should revert back to doing a zswap tree lookup and return the zswap > > entry with reference increased. The tree mapping can still decouple > > from the zswap entry reference count drop to zero. Anyway, my V1 of > > the xarray patch will not include removing the tree spinlock. > > Interesting. What do you mean by removing the tree spinlock? My > assumption was that the xarray reduces lock contention because we do > not need a lock to do lookups, but we still need the lock otherwise. > Did you have something in mind to completely remove the tree lock? In my current xarray series, it adds the xarray alongside the rb tree. Xarray has its own internal lock as well. Effectively zswap now has two locks instead of just one previously. The xarray lock will not have any contention due to the xarray lock taken inside the zswap rb tree lock. The eventual goal is reducing the two locks back to one(xarray lock), which is not in my V1 patch. Your understanding is correct, the xarray still needs to have one lock for protecting the write path. > > > The reason why I think we should wait for the xarray patch(es) is > > > there is a chance we may see less improvements from splitting the tree > > > if it was an xarray. If we merge this series first, there is no way to > > > know. > > > > > > Chris, do you intend to send the xarray patch(es) anytime soon? > > > > Thanks for the heads up. Let me send it out now. > > Awesome, thanks! I assume Chengming can test whether this series > provides the same benefits with the xarray. That would be great. The xarray patch needs more testing for sure. Chris ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 0/2] mm/zswap: optimize the scalability of zswap rb-tree 2024-01-18 0:17 ` Chris Li @ 2024-01-18 0:34 ` Yosry Ahmed 2024-01-18 1:03 ` Chris Li 2024-01-18 0:49 ` Nhat Pham 1 sibling, 1 reply; 23+ messages in thread From: Yosry Ahmed @ 2024-01-18 0:34 UTC (permalink / raw) To: Chris Li Cc: Chengming Zhou, Andrew Morton, linux-kernel, Johannes Weiner, linux-mm, Nhat Pham On Wed, Jan 17, 2024 at 4:18 PM Chris Li <chrisl@kernel.org> wrote: > > Hi Yosry, > > On Wed, Jan 17, 2024 at 3:48 PM Yosry Ahmed <yosryahmed@google.com> wrote: > > > > > Currently the xarray patch should have everything it takes to use RCU > > > read lock. However taking out the tree spinlock is more work than > > > previously. If we are going to remove the tree spinlock, I think we > > > should revert back to doing a zswap tree lookup and return the zswap > > > entry with reference increased. The tree mapping can still decouple > > > from the zswap entry reference count drop to zero. Anyway, my V1 of > > > the xarray patch will not include removing the tree spinlock. > > > > Interesting. What do you mean by removing the tree spinlock? My > > assumption was that the xarray reduces lock contention because we do > > not need a lock to do lookups, but we still need the lock otherwise. > > Did you have something in mind to completely remove the tree lock? > > In my current xarray series, it adds the xarray alongside the rb tree. > Xarray has its own internal lock as well. Effectively zswap now has > two locks instead of just one previously. The xarray lock will not > have any contention due to the xarray lock taken inside the zswap rb > tree lock. The eventual goal is reducing the two locks back to > one(xarray lock), which is not in my V1 patch. Your understanding is > correct, the xarray still needs to have one lock for protecting the > write path. Hmm I don't understand. What's the point of keeping the rbtree if we have the xarray? Doesn't it end up being more expensive and bug-prone to maintain both trees? When you say "eventual goal", do you mean what the patch would morph into in later versions (as in v1 is just a proof of concept without removing the rbtree), or follow up patches? ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 0/2] mm/zswap: optimize the scalability of zswap rb-tree
@ 2024-01-18  1:03 Chris Li
  To: Yosry Ahmed
  Cc: Chengming Zhou, Andrew Morton, linux-kernel, Johannes Weiner,
      linux-mm, Nhat Pham

Hi Yosry,

On Wed, Jan 17, 2024 at 4:35 PM Yosry Ahmed <yosryahmed@google.com> wrote:
> Hmm I don't understand. What's the point of keeping the rbtree if we
> have the xarray? Doesn't it end up being more expensive and bug-prone
> to maintain both trees?

Patch 2/2 removes the rb tree code and keeps just the tree spinlock.

>
> When you say "eventual goal", do you mean what the patch would morph
> into in later versions (as in v1 is just a proof of concept without
> removing the rbtree), or follow up patches?

V1 will remove the rb tree, but does not merge the rb tree lock with
the xarray lock.

Hi Nhat,

> Hmmm why? Is there a reason to keep the rb tree around?

No, not at all.

Chris
* Re: [PATCH 0/2] mm/zswap: optimize the scalability of zswap rb-tree 2024-01-18 1:03 ` Chris Li @ 2024-01-18 3:51 ` Yosry Ahmed 0 siblings, 0 replies; 23+ messages in thread From: Yosry Ahmed @ 2024-01-18 3:51 UTC (permalink / raw) To: Chris Li Cc: Chengming Zhou, Andrew Morton, linux-kernel, Johannes Weiner, linux-mm, Nhat Pham On Wed, Jan 17, 2024 at 5:03 PM Chris Li <chrisl@kernel.org> wrote: > > Hi Yosry, > > On Wed, Jan 17, 2024 at 4:35 PM Yosry Ahmed <yosryahmed@google.com> wrote: > > Hmm I don't understand. What's the point of keeping the rbtree if we > > have the xarray? Doesn't it end up being more expensive and bug-prone > > to maintain both trees? > > Patch 2/2 remove the rb tree code. Just keeping the tree spinlock. > > > > > When you say "eventual goal", do you mean what the patch would morph > > into in later versions (as in v1 is just a proof of concept without > > removing the rbtree), or follow up patches? > > V1 will remove the rb tree, but does not merge the rb tree lock with > the xarray lock. I see you already posted the patches, let's move the discussion there. I will take a look at them as soon as I get the chance to. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 0/2] mm/zswap: optimize the scalability of zswap rb-tree 2024-01-18 0:17 ` Chris Li 2024-01-18 0:34 ` Yosry Ahmed @ 2024-01-18 0:49 ` Nhat Pham 1 sibling, 0 replies; 23+ messages in thread From: Nhat Pham @ 2024-01-18 0:49 UTC (permalink / raw) To: Chris Li Cc: Yosry Ahmed, Chengming Zhou, Andrew Morton, linux-kernel, Johannes Weiner, linux-mm On Wed, Jan 17, 2024 at 4:18 PM Chris Li <chrisl@kernel.org> wrote: > > Hi Yosry, > > On Wed, Jan 17, 2024 at 3:48 PM Yosry Ahmed <yosryahmed@google.com> wrote: > > > > > Currently the xarray patch should have everything it takes to use RCU > > > read lock. However taking out the tree spinlock is more work than > > > previously. If we are going to remove the tree spinlock, I think we > > > should revert back to doing a zswap tree lookup and return the zswap > > > entry with reference increased. The tree mapping can still decouple > > > from the zswap entry reference count drop to zero. Anyway, my V1 of > > > the xarray patch will not include removing the tree spinlock. > > > > Interesting. What do you mean by removing the tree spinlock? My > > assumption was that the xarray reduces lock contention because we do > > not need a lock to do lookups, but we still need the lock otherwise. > > Did you have something in mind to completely remove the tree lock? > > In my current xarray series, it adds the xarray alongside the rb tree. Hmmm why? Is there a reason to keep the rb tree around? ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 0/2] mm/zswap: optimize the scalability of zswap rb-tree 2024-01-17 18:37 ` [PATCH 0/2] mm/zswap: optimize the scalability of " Yosry Ahmed 2024-01-17 23:41 ` Chris Li @ 2024-01-18 15:34 ` Johannes Weiner 2024-01-18 17:30 ` Yosry Ahmed 1 sibling, 1 reply; 23+ messages in thread From: Johannes Weiner @ 2024-01-18 15:34 UTC (permalink / raw) To: Yosry Ahmed Cc: Chengming Zhou, Andrew Morton, linux-kernel, linux-mm, Chris Li, Nhat Pham On Wed, Jan 17, 2024 at 10:37:22AM -0800, Yosry Ahmed wrote: > On Wed, Jan 17, 2024 at 1:23 AM Chengming Zhou > <zhouchengming@bytedance.com> wrote: > > > > When testing the zswap performance by using kernel build -j32 in a tmpfs > > directory, I found the scalability of zswap rb-tree is not good, which > > is protected by the only spinlock. That would cause heavy lock contention > > if multiple tasks zswap_store/load concurrently. > > > > So a simple solution is to split the only one zswap rb-tree into multiple > > rb-trees, each corresponds to SWAP_ADDRESS_SPACE_PAGES (64M). This idea is > > from the commit 4b3ef9daa4fc ("mm/swap: split swap cache into 64MB trunks"). > > > > Although this method can't solve the spinlock contention completely, it > > can mitigate much of that contention. Below is the results of kernel build > > in tmpfs with zswap shrinker enabled: > > > > linux-next zswap-lock-optimize > > real 1m9.181s 1m3.820s > > user 17m44.036s 17m40.100s > > sys 7m37.297s 4m54.622s > > > > So there are clearly improvements. And it's complementary with the ongoing > > zswap xarray conversion by Chris. Anyway, I think we can also merge this > > first, it's complementary IMHO. So I just refresh and resend this for > > further discussion. > > The reason why I think we should wait for the xarray patch(es) is > there is a chance we may see less improvements from splitting the tree > if it was an xarray. If we merge this series first, there is no way to > know. I mentioned this before, but I disagree quite strongly with this general sentiment. Chengming's patches are simple, mature, and have convincing numbers. IMO it's poor form to hold something like that for "let's see how our other experiment works out". The only exception would be if we all agree that the earlier change flies in the face of the overall direction we want to pursue, which I don't think is the case here. With the xarray we'll still have a per-swapfile lock for writes. That lock is the reason SWAP_ADDRESS_SPACE segmentation was introduced for the swapcache in the first place. Lockless reads help of course, but read-only access to swap are in the minority - stores will write, and loads are commonly followed by invalidations. Somebody already went through the trouble of proving that xarrays + segmentation are worth it for swap load and store access patterns. Why dismiss that? So my vote is that we follow the ususal upstreaming process here: merge the ready patches now, and rebase future work on top of it. ^ permalink raw reply [flat|nested] 23+ messages in thread
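
[Editorial aside: the swap cache segmentation Johannes points to is the
precedent from commit 4b3ef9daa4fc cited in the cover letter. Quoted from
memory and therefore approximate; the header these definitions live in
has moved between kernel versions.]

#define SWAP_ADDRESS_SPACE_SHIFT	14
#define SWAP_ADDRESS_SPACE_PAGES	(1 << SWAP_ADDRESS_SPACE_SHIFT)
#define swap_address_space(entry)			    \
	(&swapper_spaces[swp_type(entry)][swp_offset(entry) \
		>> SWAP_ADDRESS_SPACE_SHIFT])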
* Re: [PATCH 0/2] mm/zswap: optimize the scalability of zswap rb-tree 2024-01-18 15:34 ` Johannes Weiner @ 2024-01-18 17:30 ` Yosry Ahmed 2024-01-18 18:06 ` Johannes Weiner 0 siblings, 1 reply; 23+ messages in thread From: Yosry Ahmed @ 2024-01-18 17:30 UTC (permalink / raw) To: Johannes Weiner Cc: Chengming Zhou, Andrew Morton, linux-kernel, linux-mm, Chris Li, Nhat Pham On Thu, Jan 18, 2024 at 7:34 AM Johannes Weiner <hannes@cmpxchg.org> wrote: > > On Wed, Jan 17, 2024 at 10:37:22AM -0800, Yosry Ahmed wrote: > > On Wed, Jan 17, 2024 at 1:23 AM Chengming Zhou > > <zhouchengming@bytedance.com> wrote: > > > > > > When testing the zswap performance by using kernel build -j32 in a tmpfs > > > directory, I found the scalability of zswap rb-tree is not good, which > > > is protected by the only spinlock. That would cause heavy lock contention > > > if multiple tasks zswap_store/load concurrently. > > > > > > So a simple solution is to split the only one zswap rb-tree into multiple > > > rb-trees, each corresponds to SWAP_ADDRESS_SPACE_PAGES (64M). This idea is > > > from the commit 4b3ef9daa4fc ("mm/swap: split swap cache into 64MB trunks"). > > > > > > Although this method can't solve the spinlock contention completely, it > > > can mitigate much of that contention. Below is the results of kernel build > > > in tmpfs with zswap shrinker enabled: > > > > > > linux-next zswap-lock-optimize > > > real 1m9.181s 1m3.820s > > > user 17m44.036s 17m40.100s > > > sys 7m37.297s 4m54.622s > > > > > > So there are clearly improvements. And it's complementary with the ongoing > > > zswap xarray conversion by Chris. Anyway, I think we can also merge this > > > first, it's complementary IMHO. So I just refresh and resend this for > > > further discussion. > > > > The reason why I think we should wait for the xarray patch(es) is > > there is a chance we may see less improvements from splitting the tree > > if it was an xarray. If we merge this series first, there is no way to > > know. > > I mentioned this before, but I disagree quite strongly with this > general sentiment. > > Chengming's patches are simple, mature, and have convincing > numbers. IMO it's poor form to hold something like that for "let's see > how our other experiment works out". The only exception would be if we > all agree that the earlier change flies in the face of the overall > direction we want to pursue, which I don't think is the case here. My intention was not to delay merging these patches until the xarray patches are merged in. It was only to wait until the xarray patches are *posted*, so that we can redo the testing on top of them and verify that the gains are still there. That should have been around now, but the xarray patches were posted in a form that does not allow this testing (because we still have a lock on the read path), so I am less inclined. My rationale was that if the gains from splitting the tree become minimal after we switch to an xarray, we won't know. It's more difficult to remove optimizations than to add them, because we may cause a regression. I am kind of paranoid about having code sitting around that we don't have full information about how much it's needed. In this case, I suppose we can redo the testing (1 tree vs. split trees) once the xarray patches are in a testable form, and before we have formed any strong dependencies on the split trees (we have time until v6.9 is released, I assume). How about that? > > With the xarray we'll still have a per-swapfile lock for writes. 
That > lock is the reason SWAP_ADDRESS_SPACE segmentation was introduced for > the swapcache in the first place. Lockless reads help of course, but > read-only access to swap are in the minority - stores will write, and > loads are commonly followed by invalidations. Somebody already went > through the trouble of proving that xarrays + segmentation are worth > it for swap load and store access patterns. Why dismiss that? Fair point, although I think the swapcache lock may be more contended than the zswap tree lock. > So my vote is that we follow the ususal upstreaming process here: > merge the ready patches now, and rebase future work on top of it. No objections given the current state of the xarray patches as I mentioned earlier, but I prefer we redo the testing once possible with the xarray. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 0/2] mm/zswap: optimize the scalability of zswap rb-tree 2024-01-18 17:30 ` Yosry Ahmed @ 2024-01-18 18:06 ` Johannes Weiner 2024-01-18 18:37 ` Yosry Ahmed 0 siblings, 1 reply; 23+ messages in thread From: Johannes Weiner @ 2024-01-18 18:06 UTC (permalink / raw) To: Yosry Ahmed Cc: Chengming Zhou, Andrew Morton, linux-kernel, linux-mm, Chris Li, Nhat Pham On Thu, Jan 18, 2024 at 09:30:12AM -0800, Yosry Ahmed wrote: > On Thu, Jan 18, 2024 at 7:34 AM Johannes Weiner <hannes@cmpxchg.org> wrote: > > > > On Wed, Jan 17, 2024 at 10:37:22AM -0800, Yosry Ahmed wrote: > > > On Wed, Jan 17, 2024 at 1:23 AM Chengming Zhou > > > <zhouchengming@bytedance.com> wrote: > > > > > > > > When testing the zswap performance by using kernel build -j32 in a tmpfs > > > > directory, I found the scalability of zswap rb-tree is not good, which > > > > is protected by the only spinlock. That would cause heavy lock contention > > > > if multiple tasks zswap_store/load concurrently. > > > > > > > > So a simple solution is to split the only one zswap rb-tree into multiple > > > > rb-trees, each corresponds to SWAP_ADDRESS_SPACE_PAGES (64M). This idea is > > > > from the commit 4b3ef9daa4fc ("mm/swap: split swap cache into 64MB trunks"). > > > > > > > > Although this method can't solve the spinlock contention completely, it > > > > can mitigate much of that contention. Below is the results of kernel build > > > > in tmpfs with zswap shrinker enabled: > > > > > > > > linux-next zswap-lock-optimize > > > > real 1m9.181s 1m3.820s > > > > user 17m44.036s 17m40.100s > > > > sys 7m37.297s 4m54.622s > > > > > > > > So there are clearly improvements. And it's complementary with the ongoing > > > > zswap xarray conversion by Chris. Anyway, I think we can also merge this > > > > first, it's complementary IMHO. So I just refresh and resend this for > > > > further discussion. > > > > > > The reason why I think we should wait for the xarray patch(es) is > > > there is a chance we may see less improvements from splitting the tree > > > if it was an xarray. If we merge this series first, there is no way to > > > know. > > > > I mentioned this before, but I disagree quite strongly with this > > general sentiment. > > > > Chengming's patches are simple, mature, and have convincing > > numbers. IMO it's poor form to hold something like that for "let's see > > how our other experiment works out". The only exception would be if we > > all agree that the earlier change flies in the face of the overall > > direction we want to pursue, which I don't think is the case here. > > My intention was not to delay merging these patches until the xarray > patches are merged in. It was only to wait until the xarray patches > are *posted*, so that we can redo the testing on top of them and > verify that the gains are still there. That should have been around > now, but the xarray patches were posted in a form that does not allow > this testing (because we still have a lock on the read path), so I am > less inclined. > > My rationale was that if the gains from splitting the tree become > minimal after we switch to an xarray, we won't know. It's more > difficult to remove optimizations than to add them, because we may > cause a regression. I am kind of paranoid about having code sitting > around that we don't have full information about how much it's needed. Yeah I understand that fear. I expect the splitting to help more than the move to xarray because it's the writes that are hot. 
Luckily in this case it should be fairly easy to differential-test after it's been merged by changing that tree lookup macro/function locally to always return &trees[type][0], right? > In this case, I suppose we can redo the testing (1 tree vs. split > trees) once the xarray patches are in a testable form, and before we > have formed any strong dependencies on the split trees (we have time > until v6.9 is released, I assume). > > How about that? That sounds reasonable. > > With the xarray we'll still have a per-swapfile lock for writes. That > > lock is the reason SWAP_ADDRESS_SPACE segmentation was introduced for > > the swapcache in the first place. Lockless reads help of course, but > > read-only accesses to swap are in the minority - stores will write, and > > loads are commonly followed by invalidations. Somebody already went > > through the trouble of proving that xarrays + segmentation are worth > > it for swap load and store access patterns. Why dismiss that? > > Fair point, although I think the swapcache lock may be more contended > than the zswap tree lock. Right, it has two updates for each transition, compared to the one for zswap. But we know that in a concurrent system under pressure a globally shared swap lock will hurt. There is a history here: Chengming's numbers, your previous patch to split the zpools, commit 235b62176712b970c815923e36b9a9cc05d4d901, etc. > > So my vote is that we follow the usual upstreaming process here: > > merge the ready patches now, and rebase future work on top of it. > > No objections given the current state of the xarray patches as I > mentioned earlier, but I prefer we redo the testing once possible with > the xarray. Cool, sounds good to me. ^ permalink raw reply [flat|nested] 23+ messages in thread
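[For reference, a minimal sketch of the kind of range-based lookup helper being discussed, together with the local single-tree override suggested above for a before/after comparison. The identifiers zswap_trees, swap_zswap_tree and ZSWAP_FORCE_SINGLE_TREE are illustrative assumptions, not necessarily the names used in the posted patches.]

/*
 * Sketch only: array/helper names are assumptions for illustration,
 * not necessarily the identifiers in the posted series.
 */
struct zswap_tree {
	struct rb_root rbroot;
	spinlock_t lock;
};

static struct zswap_tree *zswap_trees[MAX_SWAPFILES];

static struct zswap_tree *swap_zswap_tree(swp_entry_t swp)
{
#ifdef ZSWAP_FORCE_SINGLE_TREE	/* hypothetical local A/B-test knob */
	/* Pin every lookup to the first tree: recreates the one-lock-per-swapfile case. */
	return &zswap_trees[swp_type(swp)][0];
#else
	/* One tree per SWAP_ADDRESS_SPACE_PAGES (64M) worth of swap slots. */
	return &zswap_trees[swp_type(swp)][swp_offset(swp) >> SWAP_ADDRESS_SPACE_SHIFT];
#endif
}

[With a helper of that shape, the differential test is a one-line local change (or a compile-time switch as above) and does not require touching the callers in zswap_store()/zswap_load().]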
* Re: [PATCH 0/2] mm/zswap: optimize the scalability of zswap rb-tree 2024-01-18 18:06 ` Johannes Weiner @ 2024-01-18 18:37 ` Yosry Ahmed 2024-01-19 6:40 ` Chengming Zhou 0 siblings, 1 reply; 23+ messages in thread From: Yosry Ahmed @ 2024-01-18 18:37 UTC (permalink / raw) To: Johannes Weiner Cc: Chengming Zhou, Andrew Morton, linux-kernel, linux-mm, Chris Li, Nhat Pham On Thu, Jan 18, 2024 at 10:07 AM Johannes Weiner <hannes@cmpxchg.org> wrote: > > On Thu, Jan 18, 2024 at 09:30:12AM -0800, Yosry Ahmed wrote: > > On Thu, Jan 18, 2024 at 7:34 AM Johannes Weiner <hannes@cmpxchg.org> wrote: > > > > > > On Wed, Jan 17, 2024 at 10:37:22AM -0800, Yosry Ahmed wrote: > > > > On Wed, Jan 17, 2024 at 1:23 AM Chengming Zhou > > > > <zhouchengming@bytedance.com> wrote: > > > > > > > > > > When testing the zswap performance by using kernel build -j32 in a tmpfs > > > > > directory, I found the scalability of zswap rb-tree is not good, which > > > > > is protected by the only spinlock. That would cause heavy lock contention > > > > > if multiple tasks zswap_store/load concurrently. > > > > > > > > > > So a simple solution is to split the only one zswap rb-tree into multiple > > > > > rb-trees, each corresponds to SWAP_ADDRESS_SPACE_PAGES (64M). This idea is > > > > > from the commit 4b3ef9daa4fc ("mm/swap: split swap cache into 64MB trunks"). > > > > > > > > > > Although this method can't solve the spinlock contention completely, it > > > > > can mitigate much of that contention. Below is the results of kernel build > > > > > in tmpfs with zswap shrinker enabled: > > > > > > > > > > linux-next zswap-lock-optimize > > > > > real 1m9.181s 1m3.820s > > > > > user 17m44.036s 17m40.100s > > > > > sys 7m37.297s 4m54.622s > > > > > > > > > > So there are clearly improvements. And it's complementary with the ongoing > > > > > zswap xarray conversion by Chris. Anyway, I think we can also merge this > > > > > first, it's complementary IMHO. So I just refresh and resend this for > > > > > further discussion. > > > > > > > > The reason why I think we should wait for the xarray patch(es) is > > > > there is a chance we may see less improvements from splitting the tree > > > > if it was an xarray. If we merge this series first, there is no way to > > > > know. > > > > > > I mentioned this before, but I disagree quite strongly with this > > > general sentiment. > > > > > > Chengming's patches are simple, mature, and have convincing > > > numbers. IMO it's poor form to hold something like that for "let's see > > > how our other experiment works out". The only exception would be if we > > > all agree that the earlier change flies in the face of the overall > > > direction we want to pursue, which I don't think is the case here. > > > > My intention was not to delay merging these patches until the xarray > > patches are merged in. It was only to wait until the xarray patches > > are *posted*, so that we can redo the testing on top of them and > > verify that the gains are still there. That should have been around > > now, but the xarray patches were posted in a form that does not allow > > this testing (because we still have a lock on the read path), so I am > > less inclined. > > > > My rationale was that if the gains from splitting the tree become > > minimal after we switch to an xarray, we won't know. It's more > > difficult to remove optimizations than to add them, because we may > > cause a regression. 
I am kind of paranoid about having code sitting > > around that we don't have full information about how much it's needed. > > Yeah I understand that fear. > > I expect the splitting to help more than the move to xarray because > it's the writes that are hot. Luckily in this case it should be fairly > easy to differential-test after it's been merged by changing that tree > lookup macro/function locally to always return &trees[type][0], right? Yeah that's exactly what I had in mind. Once we have a version of the xarray patch without the locking on the read side we can test with that. Chengming, does this sound reasonable to you? ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 0/2] mm/zswap: optimize the scalability of zswap rb-tree 2024-01-18 18:37 ` Yosry Ahmed @ 2024-01-19 6:40 ` Chengming Zhou 0 siblings, 0 replies; 23+ messages in thread From: Chengming Zhou @ 2024-01-19 6:40 UTC (permalink / raw) To: Yosry Ahmed, Johannes Weiner Cc: Andrew Morton, linux-kernel, linux-mm, Chris Li, Nhat Pham On 2024/1/19 02:37, Yosry Ahmed wrote: > On Thu, Jan 18, 2024 at 10:07 AM Johannes Weiner <hannes@cmpxchg.org> wrote: >> >> On Thu, Jan 18, 2024 at 09:30:12AM -0800, Yosry Ahmed wrote: >>> On Thu, Jan 18, 2024 at 7:34 AM Johannes Weiner <hannes@cmpxchg.org> wrote: >>>> >>>> On Wed, Jan 17, 2024 at 10:37:22AM -0800, Yosry Ahmed wrote: >>>>> On Wed, Jan 17, 2024 at 1:23 AM Chengming Zhou >>>>> <zhouchengming@bytedance.com> wrote: >>>>>> >>>>>> When testing the zswap performance by using kernel build -j32 in a tmpfs >>>>>> directory, I found the scalability of zswap rb-tree is not good, which >>>>>> is protected by the only spinlock. That would cause heavy lock contention >>>>>> if multiple tasks zswap_store/load concurrently. >>>>>> >>>>>> So a simple solution is to split the only one zswap rb-tree into multiple >>>>>> rb-trees, each corresponds to SWAP_ADDRESS_SPACE_PAGES (64M). This idea is >>>>>> from the commit 4b3ef9daa4fc ("mm/swap: split swap cache into 64MB trunks"). >>>>>> >>>>>> Although this method can't solve the spinlock contention completely, it >>>>>> can mitigate much of that contention. Below is the results of kernel build >>>>>> in tmpfs with zswap shrinker enabled: >>>>>> >>>>>> linux-next zswap-lock-optimize >>>>>> real 1m9.181s 1m3.820s >>>>>> user 17m44.036s 17m40.100s >>>>>> sys 7m37.297s 4m54.622s >>>>>> >>>>>> So there are clearly improvements. And it's complementary with the ongoing >>>>>> zswap xarray conversion by Chris. Anyway, I think we can also merge this >>>>>> first, it's complementary IMHO. So I just refresh and resend this for >>>>>> further discussion. >>>>> >>>>> The reason why I think we should wait for the xarray patch(es) is >>>>> there is a chance we may see less improvements from splitting the tree >>>>> if it was an xarray. If we merge this series first, there is no way to >>>>> know. >>>> >>>> I mentioned this before, but I disagree quite strongly with this >>>> general sentiment. >>>> >>>> Chengming's patches are simple, mature, and have convincing >>>> numbers. IMO it's poor form to hold something like that for "let's see >>>> how our other experiment works out". The only exception would be if we >>>> all agree that the earlier change flies in the face of the overall >>>> direction we want to pursue, which I don't think is the case here. >>> >>> My intention was not to delay merging these patches until the xarray >>> patches are merged in. It was only to wait until the xarray patches >>> are *posted*, so that we can redo the testing on top of them and >>> verify that the gains are still there. That should have been around >>> now, but the xarray patches were posted in a form that does not allow >>> this testing (because we still have a lock on the read path), so I am >>> less inclined. >>> >>> My rationale was that if the gains from splitting the tree become >>> minimal after we switch to an xarray, we won't know. It's more >>> difficult to remove optimizations than to add them, because we may >>> cause a regression. I am kind of paranoid about having code sitting >>> around that we don't have full information about how much it's needed. >> >> Yeah I understand that fear. 
>> >> I expect the splitting to help more than the move to xarray because >> it's the writes that are hot. Luckily in this case it should be fairly >> easy to differential-test after it's been merged by changing that tree >> lookup macro/function locally to always return &trees[type][0], right? > > Yeah that's exactly what I had in mind. Once we have a version of the > xarray patch without the locking on the read side we can test with > that. Chengming, does this sound reasonable to you? It's ok, sounds reasonable to me. I agree with Johannes: we will need both, since the xarray still has a spinlock on the write path, so it's clearly better to split the tree. As for testing, we can always return &trees[type][0]. Thanks! ^ permalink raw reply [flat|nested] 23+ messages in thread
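[As a rough illustration of the "we will need both" point above (a conceptual sketch only, not code from either series, with made-up names): keeping the per-64M segmentation on top of an xarray conversion would mean one xarray per range, so concurrent stores in different ranges take different internal xa_locks while loads stay lockless under RCU.]

/*
 * Conceptual sketch, assuming the split is kept after the xarray
 * conversion; identifiers are invented for illustration only.
 */
struct zswap_xa_tree {
	struct xarray entries;	/* the xarray carries its own internal lock */
};

static struct zswap_xa_tree *zswap_xa_trees[MAX_SWAPFILES];

static void *zswap_xa_lookup(swp_entry_t swp)
{
	pgoff_t offset = swp_offset(swp);
	struct zswap_xa_tree *tree =
		&zswap_xa_trees[swp_type(swp)][offset >> SWAP_ADDRESS_SPACE_SHIFT];

	/* Reads are RCU-protected and take no spinlock ... */
	return xa_load(&tree->entries, offset);
}

static int zswap_xa_insert(swp_entry_t swp, void *entry)
{
	pgoff_t offset = swp_offset(swp);
	struct zswap_xa_tree *tree =
		&zswap_xa_trees[swp_type(swp)][offset >> SWAP_ADDRESS_SPACE_SHIFT];

	/* ... while stores serialize only on this range's xa_lock. */
	return xa_err(xa_store(&tree->entries, offset, entry, GFP_KERNEL));
}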
end of thread, other threads:[~2024-01-19  6:40 UTC | newest]

Thread overview: 23+ messages
2024-01-17  9:23 [PATCH 0/2] mm/zswap: optimize the scalability of zswap rb-tree Chengming Zhou
2024-01-17  9:23 ` [PATCH 1/2] mm/zswap: make sure each swapfile always have " Chengming Zhou
2024-01-18 15:05 ` Johannes Weiner
2024-01-18 17:37 ` Yosry Ahmed
2024-01-18 18:16 ` Nhat Pham
2024-01-17  9:23 ` [PATCH 2/2] mm/zswap: split " Chengming Zhou
2024-01-18 15:11 ` Johannes Weiner
2024-01-19  6:20 ` Chengming Zhou
2024-01-18 19:24 ` Nhat Pham
2024-01-19  6:24 ` Chengming Zhou
2024-01-17 18:37 ` [PATCH 0/2] mm/zswap: optimize the scalability of " Yosry Ahmed
2024-01-17 23:41 ` Chris Li
2024-01-17 23:47 ` Yosry Ahmed
2024-01-18  0:17 ` Chris Li
2024-01-18  0:34 ` Yosry Ahmed
2024-01-18  1:03 ` Chris Li
2024-01-18  3:51 ` Yosry Ahmed
2024-01-18  0:49 ` Nhat Pham
2024-01-18 15:34 ` Johannes Weiner
2024-01-18 17:30 ` Yosry Ahmed
2024-01-18 18:06 ` Johannes Weiner
2024-01-18 18:37 ` Yosry Ahmed
2024-01-19  6:40 ` Chengming Zhou