From: Chengming Zhou <chengming.zhou@linux.dev>
To: Usama Arif <usamaarif642@gmail.com>, akpm@linux-foundation.org
Cc: hannes@cmpxchg.org, shakeel.butt@linux.dev, david@redhat.com,
ying.huang@intel.com, hughd@google.com, willy@infradead.org,
yosryahmed@google.com, nphamcs@gmail.com, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, kernel-team@meta.com
Subject: Re: [PATCH v5 1/2] mm: store zero pages to be swapped out in a bitmap
Date: Fri, 14 Jun 2024 20:05:48 +0800
Message-ID: <b5952ee4-3b86-40ff-a3de-8a08f09557bd@linux.dev>
In-Reply-To: <20240614100902.3469724-2-usamaarif642@gmail.com>

On 2024/6/14 18:07, Usama Arif wrote:
> Approximately 10-20% of pages to be swapped out are zero pages [1].
> Rather than reading/writing these pages to flash resulting
> in increased I/O and flash wear, a bitmap can be used to mark these
> pages as zero at write time, and the pages can be filled at
> read time if the bit corresponding to the page is set.
> With this patch, NVMe writes in Meta server fleet decreased
> by almost 10% with conventional swap setup (zswap disabled).
>
> [1] https://lore.kernel.org/all/20171018104832epcms5p1b2232e2236258de3d03d1344dde9fce0@epcms5p1/
>
> Signed-off-by: Usama Arif <usamaarif642@gmail.com>

Looks good to me, only some small nits below.

Reviewed-by: Chengming Zhou <chengming.zhou@linux.dev>

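For readers following the thread, the idea in the commit message can be modelled in userspace roughly as below. This is only a sketch under assumptions, not the patch itself: every demo_* name, the in-memory backing_store array and the write counter are hypothetical stand-ins, and it assumes the write path both sets the bit for a zero page and clears it for a non-zero one.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define DEMO_PAGE_SIZE     4096UL
#define DEMO_NR_SLOTS      64UL
#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical stand-ins for the swap device and its per-slot zeromap. */
static unsigned char backing_store[DEMO_NR_SLOTS][DEMO_PAGE_SIZE];
static unsigned long demo_zeromap[(DEMO_NR_SLOTS + DEMO_BITS_PER_LONG - 1) /
				  DEMO_BITS_PER_LONG];
static unsigned long nr_device_writes;

/* Return true if every byte of the page is zero. */
static bool page_is_zero(const unsigned char *page)
{
	for (size_t i = 0; i < DEMO_PAGE_SIZE; i++)
		if (page[i])
			return false;
	return true;
}

/* Write side: a zero page only sets its bit and skips the device write. */
static void demo_swap_out(unsigned long slot, const unsigned char *page)
{
	assert(slot < DEMO_NR_SLOTS);
	unsigned long mask = 1UL << (slot % DEMO_BITS_PER_LONG);

	if (page_is_zero(page)) {
		demo_zeromap[slot / DEMO_BITS_PER_LONG] |= mask;
		return;
	}
	/* Non-zero data: clear the bit and do the (simulated) write. */
	demo_zeromap[slot / DEMO_BITS_PER_LONG] &= ~mask;
	memcpy(backing_store[slot], page, DEMO_PAGE_SIZE);
	nr_device_writes++;
}

/* Read side: if the bit is set, fill the page with zeroes, no device read. */
static void demo_swap_in(unsigned long slot, unsigned char *page)
{
	assert(slot < DEMO_NR_SLOTS);
	unsigned long mask = 1UL << (slot % DEMO_BITS_PER_LONG);

	if (demo_zeromap[slot / DEMO_BITS_PER_LONG] & mask) {
		memset(page, 0, DEMO_PAGE_SIZE);
		return;
	}
	memcpy(page, backing_store[slot], DEMO_PAGE_SIZE);
}
```

Swapping out an all-zero buffer through demo_swap_out() leaves nr_device_writes unchanged, and a later demo_swap_in() on the same slot returns a zero-filled page.
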
> ---
> include/linux/swap.h | 1 +
> mm/page_io.c | 113 ++++++++++++++++++++++++++++++++++++++++++-
> mm/swapfile.c | 15 ++++++
> 3 files changed, 128 insertions(+), 1 deletion(-)
>
[...]
> +
> +static void swap_zeromap_folio_set(struct folio *folio)
> +{
> + struct swap_info_struct *sis = swp_swap_info(folio->swap);
> + swp_entry_t entry;
> + unsigned int i;
> +
> + for (i = 0; i < folio_nr_pages(folio); i++) {
> + entry = page_swap_entry(folio_page(folio, i));

It seems simpler to use:

	swp_entry_t entry = folio->swap;

	for (i = 0; i < folio_nr_pages(folio); i++, entry.val++)

The current code is good too, no objection.

> + set_bit(swp_offset(entry), sis->zeromap);
> + }
> +}
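
To illustrate why the two loop forms discussed above should mark the same bits, here is a small userspace sketch. It is a model under assumptions: the demo_* types and helpers are hypothetical, the 58-bit offset field is an arbitrary choice for the demo, and demo_page_swap_entry() assumes that the per-page entry is simply the folio's first entry plus the page index.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_TYPE_SHIFT  58	/* hypothetical split between type and offset */
#define DEMO_OFFSET_MASK ((UINT64_C(1) << DEMO_TYPE_SHIFT) - 1)
#define DEMO_NR_SLOTS    256

typedef struct { uint64_t val; } demo_swp_entry_t;

/* Pack a swap type and offset into one value, type in the high bits. */
static demo_swp_entry_t demo_swp_entry(unsigned int type, uint64_t offset)
{
	demo_swp_entry_t e = {
		((uint64_t)type << DEMO_TYPE_SHIFT) | (offset & DEMO_OFFSET_MASK)
	};
	return e;
}

static uint64_t demo_swp_offset(demo_swp_entry_t e)
{
	return e.val & DEMO_OFFSET_MASK;
}

/* Assumed model of a per-page lookup: first entry plus page index. */
static demo_swp_entry_t demo_page_swap_entry(demo_swp_entry_t first,
					     unsigned int idx)
{
	first.val += idx;
	return first;
}

/* Plain (non-atomic) bit set, enough for a single-threaded demo. */
static void demo_set_bit(uint64_t nr, uint64_t *map)
{
	assert(nr < DEMO_NR_SLOTS);
	map[nr / 64] |= UINT64_C(1) << (nr % 64);
}

/* Form used in the patch: look up the entry for each page index. */
static void demo_mark_per_page(demo_swp_entry_t first, unsigned int nr_pages,
			       uint64_t *map)
{
	for (unsigned int i = 0; i < nr_pages; i++) {
		demo_swp_entry_t entry = demo_page_swap_entry(first, i);

		demo_set_bit(demo_swp_offset(entry), map);
	}
}

/* Suggested form: start from the first entry and increment entry.val. */
static void demo_mark_incrementing(demo_swp_entry_t first,
				   unsigned int nr_pages, uint64_t *map)
{
	demo_swp_entry_t entry = first;

	for (unsigned int i = 0; i < nr_pages; i++, entry.val++)
		demo_set_bit(demo_swp_offset(entry), map);
}
```

Both functions set the same bits as long as the entries of a folio are contiguous and the offset field does not overflow into the type bits.
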
> +
[...]
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 9c6d8e557c0f..0b8270359bcf 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -747,6 +747,14 @@ static void swap_range_free(struct swap_info_struct *si, unsigned long offset,
> unsigned long begin = offset;
> unsigned long end = offset + nr_entries - 1;
> void (*swap_slot_free_notify)(struct block_device *, unsigned long);
> + unsigned int i;
> +
> + /*
> + * Use atomic clear_bit operations only on zeromap instead of non-atomic
> + * bitmap_clear to prevent adjacent bits corruption due to simultaneous writes.
> + */
> + for (i = 0; i < nr_entries; i++)
> + clear_bit(offset + i, si->zeromap);

I'm wondering whether we need to clear the bits at all. Since the currently
locked folio is the owner of these bits, they are always updated correctly in
swap_writepage(). So if these swap entries are freed and reused by another
folio, we won't load from the backend until that other folio has gone through
swap_writepage(), which updates these bits correctly.

Maybe I missed something? Anyway, there should be no harm in clearing them
here too.

Thanks.

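
For context on the quoted comment about atomic clear_bit versus non-atomic bitmap_clear: a word-wide read-modify-write can lose a concurrent update to a neighbouring bit in the same word, while a per-bit atomic operation cannot. The following is a minimal userspace sketch of the per-bit approach using C11 atomics. The demo_* helpers are hypothetical and only approximate what the kernel primitives do.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define DEMO_NR_BITS       256UL

/* Hypothetical zeromap: one bit per swap slot, shared between threads. */
static _Atomic unsigned long demo_zeromap[DEMO_NR_BITS /
					  (sizeof(unsigned long) * CHAR_BIT)];

/* Atomically set one bit; other bits in the same word are untouched. */
static void demo_set_bit(unsigned long nr)
{
	assert(nr < DEMO_NR_BITS);
	atomic_fetch_or(&demo_zeromap[nr / DEMO_BITS_PER_LONG],
			1UL << (nr % DEMO_BITS_PER_LONG));
}

/* Atomically clear one bit; other bits in the same word are untouched. */
static void demo_clear_bit(unsigned long nr)
{
	assert(nr < DEMO_NR_BITS);
	atomic_fetch_and(&demo_zeromap[nr / DEMO_BITS_PER_LONG],
			 ~(1UL << (nr % DEMO_BITS_PER_LONG)));
}

static bool demo_test_bit(unsigned long nr)
{
	assert(nr < DEMO_NR_BITS);
	return (atomic_load(&demo_zeromap[nr / DEMO_BITS_PER_LONG]) >>
		(nr % DEMO_BITS_PER_LONG)) & 1UL;
}

/* Clear a range one bit at a time, mirroring the loop in the hunk above. */
static void demo_clear_range(unsigned long offset, unsigned int nr_entries)
{
	for (unsigned int i = 0; i < nr_entries; i++)
		demo_clear_bit(offset + i);
}
```

Each atomic_fetch_and() only changes the bit selected by its mask, so a concurrent demo_set_bit() on an adjacent slot is not overwritten.
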
>
> if (offset < si->lowest_bit)
> si->lowest_bit = offset;
> @@ -2635,6 +2643,7 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
> free_percpu(p->cluster_next_cpu);
> p->cluster_next_cpu = NULL;
> vfree(swap_map);
> + bitmap_free(p->zeromap);
> kvfree(cluster_info);
> /* Destroy swap account information */
> swap_cgroup_swapoff(p->type);
> @@ -3161,6 +3170,12 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
> goto bad_swap_unlock_inode;
> }
>
> + p->zeromap = bitmap_zalloc(maxpages, GFP_KERNEL);
> + if (!p->zeromap) {
> + error = -ENOMEM;
> + goto bad_swap_unlock_inode;
> + }
> +
> if (p->bdev && bdev_stable_writes(p->bdev))
> p->flags |= SWP_STABLE_WRITES;
>
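
As a rough estimate of what the zeromap costs in memory: at one bit per swap slot, rounded up to whole longs, the size can be computed as in the sketch below. demo_zeromap_bytes() is a hypothetical helper written for this estimate, and the rounding to whole longs is an assumption about how the bitmap is allocated.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Bytes needed for a bitmap of maxpages bits, rounded up to whole longs. */
static size_t demo_zeromap_bytes(unsigned long maxpages)
{
	size_t nlongs = maxpages / DEMO_BITS_PER_LONG +
			(maxpages % DEMO_BITS_PER_LONG != 0);
	size_t bytes = nlongs * sizeof(unsigned long);

	/* Sanity check: the bitmap must hold at least maxpages bits. */
	assert(bytes * CHAR_BIT >= maxpages);
	return bytes;
}
```

For example, 1 GiB of swap with 4 KiB pages is 262144 slots, which gives demo_zeromap_bytes(262144) == 32768, i.e. 32 KiB of bitmap.
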