* Regression caused by commit 4687fdbb805a ("mm/filemap: Support VM_HUGEPAGE for file mappings")
From: Roman Gushchin @ 2025-08-15 18:43 UTC
  To: Matthew Wilcox, Andrew Morton
  Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org

Hello!

The commit 4687fdbb805a ("mm/filemap: Support VM_HUGEPAGE for file
mappings") causes a regression in our production for containers
which are running short on memory. In some cases they are getting
stuck for hours in a vicious reclaim cycle. Reverting this commit
fixes the problem.

As I understand, the intention of the commit is to allocate large folios
whenever possible, and the idea is to ignore device-specific readahead
settings and the mmap_miss logic to achieve that, which makes total
sense.

However, under heavy memory pressure there must be a mechanism to
revert to order-0 folios, otherwise the memory pressure is inevitably
increased. Maybe mmap_miss heuristics should still be applied? Any other
ideas how to fix it?

Also, a side question: I wonder if it makes sense to allocate 1-2
PMD-sized folios if mapping_large_folio_support() is not there?

Thanks!


* Re: Regression caused by commit 4687fdbb805a ("mm/filemap: Support VM_HUGEPAGE for file mappings")
From: Matthew Wilcox @ 2025-08-15 21:01 UTC
  To: Roman Gushchin
  Cc: Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org

On Fri, Aug 15, 2025 at 11:43:25AM -0700, Roman Gushchin wrote:
> The commit 4687fdbb805a ("mm/filemap: Support VM_HUGEPAGE for file
> mappings") causes a regression in our production for containers
> which are running short on memory. In some cases they are getting
> stuck for hours in a vicious reclaim cycle. Reverting this commit
> fixes the problem.
> 
> As I understand, the intention of the commit is to allocate large folios
> whenever possible, and the idea is to ignore device-specific readahead
> settings and the mmap_miss logic to achieve that, which makes total
> sense.
> 
> However, under heavy memory pressure there must be a mechanism to
> revert to order-0 folios, otherwise the memory pressure is inevitably
> increased. Maybe mmap_miss heuristics should still be applied? Any other
> ideas how to fix it?

What's supposed to happen is that we should have logic like:

                        if (order > min_order)
                                alloc_gfp |= __GFP_NORETRY | __GFP_NOWARN;

so we try a little bit to free memory if we can't allocate an order-9
folio immediately, but we shouldn't be retrying for hours.  Maybe
that got lost somewhere along the line because I don't see it now.
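
As an illustration of the check quoted above, here is a minimal userspace
sketch. It is not kernel code: the DEMO_* flag values and the helper names
are hypothetical and exist only so the snippet compiles on its own. It only
shows that allocations above the minimum order pick up the "give up early"
flags, while minimum-order allocations keep the base mask.

```c
#include <assert.h>

/* Hypothetical flag bits for illustration; the kernel's real values differ. */
#define DEMO_GFP_BASE     0x1u
#define DEMO_GFP_NORETRY  0x2u
#define DEMO_GFP_NOWARN   0x4u

/*
 * Return the allocation mask for a folio of the given order.
 * Orders above min_order are opportunistic, so they get the
 * no-retry / no-warn bits; min_order allocations keep the base mask.
 */
static unsigned int demo_alloc_gfp(unsigned int base, unsigned int order,
                                   unsigned int min_order)
{
        unsigned int alloc_gfp = base;

        if (order > min_order)
                alloc_gfp |= DEMO_GFP_NORETRY | DEMO_GFP_NOWARN;
        return alloc_gfp;
}

/* Small self-check: only the higher order gains the extra bits. */
static void demo_self_check(void)
{
        assert(demo_alloc_gfp(DEMO_GFP_BASE, 0, 0) == DEMO_GFP_BASE);
        assert(demo_alloc_gfp(DEMO_GFP_BASE, 9, 0) & DEMO_GFP_NORETRY);
}
```

Calling demo_alloc_gfp(DEMO_GFP_BASE, 9, 0) models an order-9 attempt with a
minimum order of 0; the same call with order 0 leaves the mask unchanged.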

> Also, a side question: I wonder if it makes sense to allocate 1-2
> PMD-sized folios if mapping_large_folio_support() is not there?

Um, we don't?

        if (!mapping_large_folio_support(mapping) || ra->size < min_ra_size)
                goto fallback;



* Re: Regression caused by commit 4687fdbb805a ("mm/filemap: Support VM_HUGEPAGE for file mappings")
From: Roman Gushchin @ 2025-08-15 22:12 UTC
  To: Matthew Wilcox
  Cc: Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org

Matthew Wilcox <willy@infradead.org> writes:

> On Fri, Aug 15, 2025 at 11:43:25AM -0700, Roman Gushchin wrote:
>> The commit 4687fdbb805a ("mm/filemap: Support VM_HUGEPAGE for file
>> mappings") causes a regression in our production for containers
>> which are running short on memory. In some cases they are getting
>> stuck for hours in a vicious reclaim cycle. Reverting this commit
>> fixes the problem.
>> 
>> As I understand, the intention of the commit is to allocate large folios
>> whenever possible, and the idea is to ignore device-specific readahead
>> settings and the mmap_miss logic to achieve that, which makes total
>> sense.
>> 
>> However, under heavy memory pressure there must be a mechanism to
>> revert to order-0 folios, otherwise the memory pressure is inevitably
>> increased. Maybe mmap_miss heuristics should still be applied? Any other
>> ideas how to fix it?
>
> What's supposed to happen is that we should have logic like:
>
>                         if (order > min_order)
>                                 alloc_gfp |= __GFP_NORETRY | __GFP_NOWARN;
>
> so we try a little bit to free memory if we can't allocate an order-9
> folio immediately, but we shouldn't be retrying for hours.  Maybe
> that got lost somewhere along the line because I don't see it now.

Yeah, I see it in __filemap_get_folio(), but not in ra_alloc_folio().
I'll prepare a fix for this.

>
>> Also, a side question: I wonder if it makes sense to allocate 1-2
>> PMD-sized folios if mapping_large_folio_support() is not there?
>
> Um, we don't?
>
>         if (!mapping_large_folio_support(mapping) || ra->size < min_ra_size)
>                 goto fallback;

Sorry, I wasn't clear: I mean we're still allocating 2-4MB of readahead.
Shouldn't we do something like this instead?

--

diff --git a/mm/filemap.c b/mm/filemap.c
index 983ba1019674..e5fb9034118d 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3222,7 +3222,8 @@ static struct file *do_sync_mmap_readahead(struct vm_fault *vmf)
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
        /* Use the readahead code, even if readahead is disabled */
-       if ((vm_flags & VM_HUGEPAGE) && HPAGE_PMD_ORDER <= MAX_PAGECACHE_ORDER) {
+       if ((vm_flags & VM_HUGEPAGE) && HPAGE_PMD_ORDER <= MAX_PAGECACHE_ORDER &&
+           mapping_large_folio_support(mapping)) {
                fpin = maybe_unlock_mmap_for_io(vmf, fpin);
                ractl._index &= ~((unsigned long)HPAGE_PMD_NR - 1);
                ra->size = HPAGE_PMD_NR;
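
For reference, the arithmetic behind the index masking in the hunk above and
the 2-4MB figure can be sketched in userspace C. The constants are
assumptions (4 KiB pages and a PMD order of 9, as on x86-64), and the DEMO_*
and demo_* names are hypothetical stand-ins, not kernel symbols.

```c
#include <assert.h>

#define DEMO_PAGE_SHIFT       12u   /* assumed: 4 KiB base pages */
#define DEMO_HPAGE_PMD_ORDER  9u    /* assumed: 512 base pages per PMD */
#define DEMO_HPAGE_PMD_NR     (1ul << DEMO_HPAGE_PMD_ORDER)

/* Round a page-cache index down to the start of its PMD-sized block. */
static unsigned long demo_align_index(unsigned long index)
{
        return index & ~(DEMO_HPAGE_PMD_NR - 1);
}

/* Bytes of readahead covered by nr_folios PMD-sized folios. */
static unsigned long demo_ra_bytes(unsigned long nr_folios)
{
        return (nr_folios * DEMO_HPAGE_PMD_NR) << DEMO_PAGE_SHIFT;
}

/* Self-check: index 1000 falls in the block starting at page 512. */
static void demo_self_check(void)
{
        assert(demo_align_index(1000) == 512);
        assert(demo_ra_bytes(1) == 2ul * 1024 * 1024);
}
```

Under these assumptions one PMD-sized folio covers 2 MiB and two cover 4 MiB,
which matches the "1-2 PMD-sized folios" and "2-4MB" mentioned in this thread.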



* Re: Regression caused by commit 4687fdbb805a ("mm/filemap: Support VM_HUGEPAGE for file mappings")
From: Roman Gushchin @ 2025-08-15 22:21 UTC
  To: Matthew Wilcox
  Cc: Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org

Roman Gushchin <roman.gushchin@linux.dev> writes:

> Matthew Wilcox <willy@infradead.org> writes:
>
>> On Fri, Aug 15, 2025 at 11:43:25AM -0700, Roman Gushchin wrote:
>>> The commit 4687fdbb805a ("mm/filemap: Support VM_HUGEPAGE for file
>>> mappings") causes a regression in our production for containers
>>> which are running short on memory. In some cases they are getting
>>> stuck for hours in a vicious reclaim cycle. Reverting this commit
>>> fixes the problem.
>>> 
>>> As I understand, the intention of the commit is to allocate large folios
>>> whenever possible, and the idea is to ignore device-specific readahead
>>> settings and the mmap_miss logic to achieve that, which makes total
>>> sense.
>>> 
>>> However, under heavy memory pressure there must be a mechanism to
>>> revert to order-0 folios, otherwise the memory pressure is inevitably
>>> increased. Maybe mmap_miss heuristics should still be applied? Any other
>>> ideas how to fix it?
>>
>> What's supposed to happen is that we should have logic like:
>>
>>                         if (order > min_order)
>>                                 alloc_gfp |= __GFP_NORETRY | __GFP_NOWARN;
>>
>> so we try a little bit to free memory if we can't allocate an order-9
>> folio immediately, but we shouldn't be retrying for hours.  Maybe
>> that got lost somewhere along the line because I don't see it now.
>
> Yeah, I see it in __filemap_get_folio(), but not in ra_alloc_folio().
> I'll prepare a fix for this.

Actually, I'm wrong. It's there, hidden in readahead_gfp_mask(), and it's
not conditional on the folio order. However, it's not helping, or at least
it's not enough.
