From: Gavin Shan <gshan@redhat.com>
To: David Hildenbrand <david@redhat.com>,
Matthew Wilcox <willy@infradead.org>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
akpm@linux-foundation.org, william.kucharski@oracle.com,
ryan.roberts@arm.com, shan.gavin@gmail.com
Subject: Re: [PATCH] mm/huge_memory: Avoid PMD-size page cache if needed
Date: Sat, 13 Jul 2024 19:25:34 +1000
Message-ID: <a168f908-3906-43e3-8676-360809ed5c8d@redhat.com>
In-Reply-To: <df83a218-e2e5-496e-999a-e446a7d0b383@redhat.com>
On 7/13/24 11:03 AM, David Hildenbrand wrote:
> On 12.07.24 07:39, Gavin Shan wrote:
>>
>> David, I can help clean it up. Could you please confirm that the following
>
> Thanks!
>
>> changes are exactly what you're suggesting? Hopefully there is nothing I've missed.
>> The original issue is fixed by these changes: with them applied, madvise(MADV_COLLAPSE)
>> fails with errno EINVAL (22) in the test program.
>>
>> The Fixes tag needs to be adjusted as well:
>>
>> Fixes: 3485b88390b0 ("mm: thp: introduce multi-size THP sysfs interface")
>>
>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>> index 2aa986a5cd1b..45909efb0ef0 100644
>> --- a/include/linux/huge_mm.h
>> +++ b/include/linux/huge_mm.h
>> @@ -74,7 +74,12 @@ extern struct kobj_attribute shmem_enabled_attr;
>> /*
>> * Mask of all large folio orders supported for file THP.
>> */
>> -#define THP_ORDERS_ALL_FILE (BIT(PMD_ORDER) | BIT(PUD_ORDER))
>
> DAX doesn't have any MAX_PAGECACHE_ORDER restrictions (like hugetlb). So this should be
>
> /*
> * FSDAX never splits folios, so the MAX_PAGECACHE_ORDER limit does not
> * apply here.
> */
> #define THP_ORDERS_ALL_FILE_DAX (BIT(PMD_ORDER) | BIT(PUD_ORDER))
>
> Something like that.
>
Ok. It will be corrected in v2.
>> +#define THP_ORDERS_ALL_FILE_DAX \
>> +	((BIT(PMD_ORDER) | BIT(PUD_ORDER)) & (BIT(MAX_PAGECACHE_ORDER + 1) - 1))
>> +#define THP_ORDERS_ALL_FILE_DEFAULT \
>> +	((BIT(MAX_PAGECACHE_ORDER + 1) - 1) & ~BIT(0))
>> +#define THP_ORDERS_ALL_FILE \
>> +	(THP_ORDERS_ALL_FILE_DAX | THP_ORDERS_ALL_FILE_DEFAULT)
>
> Maybe we can get rid of THP_ORDERS_ALL_FILE (to prevent abuse) and fixup
> THP_ORDERS_ALL instead.
>
Sure, it will be removed in v2.
>> /*
>> * Mask of all large folio orders supported for THP.
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 2120f7478e55..4690f33afaa6 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -88,9 +88,17 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
>>  	bool smaps = tva_flags & TVA_SMAPS;
>>  	bool in_pf = tva_flags & TVA_IN_PF;
>>  	bool enforce_sysfs = tva_flags & TVA_ENFORCE_SYSFS;
>> +	unsigned long supported_orders;
>> +
>>  	/* Check the intersection of requested and supported orders. */
>> -	orders &= vma_is_anonymous(vma) ?
>> -			THP_ORDERS_ALL_ANON : THP_ORDERS_ALL_FILE;
>> +	if (vma_is_anonymous(vma))
>> +		supported_orders = THP_ORDERS_ALL_ANON;
>> +	else if (vma_is_dax(vma))
>> +		supported_orders = THP_ORDERS_ALL_FILE_DAX;
>> +	else
>> +		supported_orders = THP_ORDERS_ALL_FILE_DEFAULT;
>
> This is what I had in mind.
>
> But, do we have to special-case shmem as well or will that be handled correctly?
>
With the previous fixes and this one, I don't see any remaining case where
shmem could end up with a 512MB page cache folio (PMD size with 64KB base
pages), which exceeds MAX_PAGECACHE_ORDER. Hopefully I haven't missed anything
in the code inspection:
- regular read/write paths: covered by the previous fixes
- synchronous readahead: covered by the previous fixes
- asynchronous readahead: page-size granularity, no huge pages
- page fault handling: covered by the previous fixes
- collapsing PTEs into a PMD: to be covered by this patch
- swapin: no 512MB huge pages, since no such huge pages exist at swap-out time
- other cases I missed (?)
Thanks,
Gavin