[PATCH RFC] shmem: don't trim whole folio loop boundaries on partial truncate
From: Brian Foster @ 2025-10-30 16:51 UTC
To: linux-mm; +Cc: Hugh Dickins, Matthew Wilcox, Jan Kara, William Kucharski
An fstests fsx test run in a low-memory cgroup environment on a
huge=always tmpfs mount occasionally reproduces a livelock in
shmem_undo_range() (via a truncate operation). The process ends up
spinning indefinitely in the whole_folios loop.
The root cause appears to be that earlier in the function, a large
folio is handled at the start boundary of the truncate.
truncate_inode_partial_folio() splits the large folio such that the
currently referenced folio now ends before the range of whole folios
to truncate in the second loop. The start index is therefore pushed
backwards (start is set to folio_next_index() of the now-smaller
folio), and the second loop finds/splits some folios in this range.
Ultimately what appears to happen is that we settle into a sequence
where a large, dirty folio sits at the updated start index;
truncate_inode_partial_folio() returns false and doesn't actually
toss the folio, because it's not fully within the logical offset
boundaries; and hence the loop restarts indefinitely, because the
index has moved forward since the previous batch lookup.
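To make the failure mode concrete, here is a deliberately simplified
sketch of the whole_folios loop shape (illustration only, with
details elided; see shmem_undo_range() in mm/shmem.c for the real
code):

	index = start;
	while (index < end) {
		if (!find_get_entries(mapping, &index, end - 1,
				      &fbatch, indices)) {
			...
			index = start;	/* restart to confirm all gone */
			continue;
		}
		for (i = 0; i < folio_batch_count(&fbatch); i++) {
			folio = fbatch.folios[i];
			...
			if (!folio_test_large(folio))
				truncate_inode_folio(mapping, folio);
			else if (!truncate_inode_partial_folio(folio,
							lstart, lend))
				; /* boundary-straddling folio is kept */
		}
		...
	}

Once start has been pushed back below a dirty large folio that
truncate_inode_partial_folio() keeps refusing to toss, every pass
re-finds that folio and the loop never terminates.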
To avoid this problem and simplify the code, remove the start/end
boundary updates in the partial folio handling at the range
boundaries. My understanding is that there is potentially some
overlap and duplicate work here, but that this is relatively harmless
and safer than the alternative. Also, since the folio size can change
as a result of the partial truncate, update the same_folio value
based on the post-truncate folio.
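As a hypothetical worked example of why same_folio needs recomputing
(numbers invented for illustration): suppose lstart = 4096 and
lend = 16383, and the folio covering lstart is a 2MB folio at pos 0.
Before the partial truncate:

	same_folio = lend < folio_pos(folio) + folio_size(folio)
		   = 16383 < 0 + 2097152	-> true

If the split leaves the referenced folio as a single 4KB page at
pos 0, then afterwards:

	same_folio = 16383 < 0 + 4096		-> false

so the folio covering lend must now be looked up separately.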
Signed-off-by: Brian Foster <bfoster@redhat.com>
---
Hi all,
So I've been (finally) getting back to my tmpfs zeroing fixes, and
while running some testing I hit this issue. The test is fsx running
in a 10MB memory.max cgroup against a 4MB file on a tmpfs mount with
huge=always. Note that I first tried to bisect this, since I don't
recall hitting it before, and the bisect landed on 69e0a3b49003 ("mm:
shmem: fix the strategy for the tmpfs 'huge=' options"). I take that
as a behavior change relative to something that precedes my initial
testing rather than a root cause, so I moved away from bisecting in
favor of throwing in some tracing to characterize the behavior.
I'm sending this as an RFC because even though it seems to resolve
the issue in my (limited so far) testing, I'm not familiar enough
with all the complexities around large folio management to be totally
sure it's the right fix. For example, I'm not quite sure whether the
test constraints here are circumstantial or something more. FWIW, I
started with a simpler fix that just prevents start from moving
backwards. That prevents the issue as well, but on further thought
this seemed a little more explicit.
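For reference, that simpler alternative was roughly the following
(untested sketch): clamp instead of assign, so a post-split folio can
never move the whole_folios start backwards:

	if (!truncate_inode_partial_folio(folio, lstart, lend)) {
		/* sketch: clamp so start never moves backwards */
		start = max(start, folio_next_index(folio));
		if (same_folio)
			end = folio->index;
	}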
I notice the same boundary tweaking logic in
truncate_inode_pages_range() (via the same commit [1]), though it
looks like that path is not susceptible to a livelock, since it will
just toss the folios. Again, I'm not totally sure whether there are
outside circumstances that make this less relevant there than for
tmpfs, so treat this as at worst a starting point for discussion.
Thanks.
Brian
[1] b9a8a4195c7d ("truncate,shmem: Handle truncates that split large folios")
mm/shmem.c | 11 +++--------
1 file changed, 3 insertions(+), 8 deletions(-)
diff --git a/mm/shmem.c b/mm/shmem.c
index b9081b817d28..133a7d8213c5 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1133,13 +1133,9 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
same_folio = (lstart >> PAGE_SHIFT) == (lend >> PAGE_SHIFT);
folio = shmem_get_partial_folio(inode, lstart >> PAGE_SHIFT);
if (folio) {
- same_folio = lend < folio_pos(folio) + folio_size(folio);
folio_mark_dirty(folio);
- if (!truncate_inode_partial_folio(folio, lstart, lend)) {
- start = folio_next_index(folio);
- if (same_folio)
- end = folio->index;
- }
+ truncate_inode_partial_folio(folio, lstart, lend);
+ same_folio = lend < folio_pos(folio) + folio_size(folio);
folio_unlock(folio);
folio_put(folio);
folio = NULL;
@@ -1149,8 +1145,7 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
folio = shmem_get_partial_folio(inode, lend >> PAGE_SHIFT);
if (folio) {
folio_mark_dirty(folio);
- if (!truncate_inode_partial_folio(folio, lstart, lend))
- end = folio->index;
+ truncate_inode_partial_folio(folio, lstart, lend);
folio_unlock(folio);
folio_put(folio);
}
--
2.51.0
Re: [PATCH RFC] shmem: don't trim whole folio loop boundaries on partial truncate
From: Brian Foster @ 2025-11-03 20:11 UTC
To: linux-mm; +Cc: Hugh Dickins, Matthew Wilcox, Jan Kara, William Kucharski
On Thu, Oct 30, 2025 at 12:51:21PM -0400, Brian Foster wrote:
> [...]
>
> FWIW, I started with a simpler fix that just prevents start from
> moving backwards. That prevents the issue as well, but on further
> thought this seemed a little more explicit.
>
Self-NAK. After further testing, I think I've reproduced a case where
the undo range races with swapout such that this change causes the
whole_folios loop to toss a whole swap entry that extends outside the
truncation range (where the existing code would have trimmed the
range start), which is wrong. I'll fall back to testing out the
incremental fix mentioned above and follow up when I have something
more generally sorted out...
Brian