From: Hugh Dickins <hughd@google.com>
To: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: akpm@linux-foundation.org, hughd@google.com, willy@infradead.org,
david@redhat.com, wangkefeng.wang@huawei.com, chrisl@kernel.org,
ying.huang@intel.com, 21cnbao@gmail.com, ryan.roberts@arm.com,
shy828301@gmail.com, ziy@nvidia.com, ioworker0@gmail.com,
da.gomez@samsung.com, p.raghav@samsung.com, linux-mm@kvack.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH v5 4/9] mm: filemap: use xa_get_order() to get the swap entry order
Date: Sun, 25 Aug 2024 14:55:41 -0700 (PDT) [thread overview]
Message-ID: <d3dc75e2-40a7-8439-734c-19d83707164c@google.com> (raw)
In-Reply-To: <6876d55145c1cc80e79df7884aa3a62e397b101d.1723434324.git.baolin.wang@linux.alibaba.com>
On Mon, 12 Aug 2024, Baolin Wang wrote:
> In the following patches, shmem will support the swap out of large folios,
> which means the shmem mappings may contain large order swap entries, so
> using xa_get_order() to get the folio order of the shmem swap entry to
> update the '*start' correctly.
>
> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> ---
> mm/filemap.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 4130be74f6fd..4c312aab8b1f 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -2056,6 +2056,8 @@ unsigned find_get_entries(struct address_space *mapping, pgoff_t *start,
> folio = fbatch->folios[idx];
> if (!xa_is_value(folio))
> nr = folio_nr_pages(folio);
> + else
> + nr = 1 << xa_get_order(&mapping->i_pages, indices[idx]);
> *start = indices[idx] + nr;
> }
> return folio_batch_count(fbatch);
> @@ -2120,6 +2122,8 @@ unsigned find_lock_entries(struct address_space *mapping, pgoff_t *start,
> folio = fbatch->folios[idx];
> if (!xa_is_value(folio))
> nr = folio_nr_pages(folio);
> + else
> + nr = 1 << xa_get_order(&mapping->i_pages, indices[idx]);
> *start = indices[idx] + nr;
> }
> return folio_batch_count(fbatch);
> --
Here we have a problem, but I'm not suggesting a fix for it yet: I
need to get other fixes out first, then turn to thinking about this -
it's not easy.
That code is almost always right, so it works well enough for most
people not to have noticed, but there are two issues with it.
The first issue is that it's assuming indices[idx] is already aligned
to nr: not necessarily so. I believe it was already wrong in the
folio_nr_pages() case, but the time I caught it wrong with a printk
was in the swap (value) case. (I may not be stating this correctly:
again more thought needed but I can't spend so long writing.)
And immediately afterwards that kernel build failed with a corrupted
(all 0s) .o file - I'm building on ext4 on /dev/loop0 on huge tmpfs while
swapping, and happen to be using the "-o discard" option to ext4 mount.
I've been chasing these failures (maybe a few minutes in, maybe half an
hour) for days, then had the idea of trying without the "-o discard":
whereupon these builds can be repeated successfully for many hours.
IIRC ext4 discard to /dev/loop0 entails hole-punch to the tmpfs.
The alignment issue can easily be corrected, but that might not help.
(And those two functions would look better with the rcu_read_unlock()
moved down to just before returning: but again, wouldn't really help.)
The second issue is that swap is more slippery to work with than
folios or pages: in the folio_nr_pages() case, that number is stable
because we hold a refcount (which stops a THP from being split), and
later we'll be taking folio lock too. None of that in the swap case,
so (depending on how a large entry gets split) the xa_get_order() result
is not reliable. Likewise for other uses of xa_get_order() in this series.
There needs to be some kind of locking or retry to make the order usable,
and to avoid shmem_free_swap() occasionally freeing more than it ought.
I'll give it thought after.
Hugh
next prev parent reply other threads:[~2024-08-25 21:55 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-08-12 7:42 [PATCH v5 0/9] support large folio swap-out and swap-in for shmem Baolin Wang
2024-08-12 7:42 ` [PATCH v5 1/9] mm: swap: extend swap_shmem_alloc() to support batch SWAP_MAP_SHMEM flag setting Baolin Wang
2024-08-12 7:42 ` [PATCH v5 2/9] mm: shmem: extend shmem_partial_swap_usage() to support large folio swap Baolin Wang
2024-08-12 7:42 ` [PATCH v5 3/9] mm: shmem: return number of pages beeing freed in shmem_free_swap Baolin Wang
2024-08-12 7:42 ` [PATCH v5 4/9] mm: filemap: use xa_get_order() to get the swap entry order Baolin Wang
2024-08-25 21:55 ` Hugh Dickins [this message]
2024-08-25 23:28 ` Matthew Wilcox
2024-08-27 10:10 ` Baolin Wang
2024-08-29 8:07 ` Hugh Dickins
2024-08-29 12:40 ` Baolin Wang
2024-08-30 10:18 ` Hugh Dickins
2024-08-12 7:42 ` [PATCH v5 5/9] mm: shmem: use swap_free_nr() to free shmem swap entries Baolin Wang
2024-08-12 7:42 ` [PATCH v5 6/9] mm: shmem: support large folio allocation for shmem_replace_folio() Baolin Wang
2024-08-25 22:05 ` Hugh Dickins
2024-08-27 3:06 ` Baolin Wang
2024-08-12 7:42 ` [PATCH v5 7/9] mm: shmem: drop folio reference count using 'nr_pages' in shmem_delete_from_page_cache() Baolin Wang
2024-08-12 7:42 ` [PATCH v5 8/9] mm: shmem: split large entry if the swapin folio is not large Baolin Wang
2024-08-25 22:31 ` Hugh Dickins
2024-08-27 6:46 ` Baolin Wang
2024-08-12 7:42 ` [PATCH v5 9/9] mm: shmem: support large folio swap out Baolin Wang
2024-08-25 23:14 ` Hugh Dickins
2024-08-27 6:58 ` Baolin Wang
2024-08-28 8:28 ` [PATCH] mm: shmem: support large folio swap out fix 2 Baolin Wang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=d3dc75e2-40a7-8439-734c-19d83707164c@google.com \
--to=hughd@google.com \
--cc=21cnbao@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=baolin.wang@linux.alibaba.com \
--cc=chrisl@kernel.org \
--cc=da.gomez@samsung.com \
--cc=david@redhat.com \
--cc=ioworker0@gmail.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=p.raghav@samsung.com \
--cc=ryan.roberts@arm.com \
--cc=shy828301@gmail.com \
--cc=wangkefeng.wang@huawei.com \
--cc=willy@infradead.org \
--cc=ying.huang@intel.com \
--cc=ziy@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).