linux-mm.kvack.org archive mirror
From: David Hildenbrand <david@redhat.com>
To: Baolin Wang <baolin.wang@linux.alibaba.com>,
	Gavin Shan <gshan@redhat.com>,
	Matthew Wilcox <willy@infradead.org>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, william.kucharski@oracle.com,
	ryan.roberts@arm.com, shan.gavin@gmail.com
Subject: Re: [PATCH] mm/huge_memory: Avoid PMD-size page cache if needed
Date: Sat, 13 Jul 2024 06:17:37 +0200
Message-ID: <11c95a82-cf13-414b-b489-1dd48255e022@redhat.com>
In-Reply-To: <b11d6006-1efb-4329-baa0-75799935e019@linux.alibaba.com>

On 13.07.24 06:01, Baolin Wang wrote:
> 
> 
> On 2024/7/13 09:03, David Hildenbrand wrote:
>> On 12.07.24 07:39, Gavin Shan wrote:
>>> On 7/12/24 7:03 AM, David Hildenbrand wrote:
>>>> On 11.07.24 22:46, Matthew Wilcox wrote:
>>>>> On Thu, Jul 11, 2024 at 08:48:40PM +1000, Gavin Shan wrote:
>>>>>> +++ b/mm/huge_memory.c
>>>>>> @@ -136,7 +136,8 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
>>>>>>             while (orders) {
>>>>>>                 addr = vma->vm_end - (PAGE_SIZE << order);
>>>>>> -            if (thp_vma_suitable_order(vma, addr, order))
>>>>>> +            if (!(vma->vm_file && order > MAX_PAGECACHE_ORDER) &&
>>>>>> +                thp_vma_suitable_order(vma, addr, order))
>>>>>>                     break;
>>>>>
>>>>> Why does 'orders' even contain potential orders that are larger than
>>>>> MAX_PAGECACHE_ORDER?
>>>>>
>>>>> We do this at the top:
>>>>>
>>>>>            orders &= vma_is_anonymous(vma) ?
>>>>>                            THP_ORDERS_ALL_ANON : THP_ORDERS_ALL_FILE;
>>>>>
>>>>> include/linux/huge_mm.h:#define THP_ORDERS_ALL_FILE (BIT(PMD_ORDER) | BIT(PUD_ORDER))
>>>>>
>>>>> ... and that seems very wrong.  We support all kinds of orders for
>>>>> files, not just PMD order.  We don't support PUD order at all.
>>>>>
>>>>> What the hell is going on here?
>>>>
>>>> yes, that's just absolutely confusing. I mentioned it to Ryan lately
>>>> that we should clean that up (I wanted to look into that, but am
>>>> happy if someone else can help).
>>>>
>>>> There should likely be different defines for
>>>>
>>>> DAX (PMD|PUD)
>>>>
>>>> SHMEM (PMD) -- but soon more. Not sure if we want separate ANON_SHMEM
>>>> for the time being. Hm. But shmem is already handled separately, so
>>>> maybe we can just ignore shmem here.
>>>>
>>>> PAGECACHE (1 .. MAX_PAGECACHE_ORDER)
>>>>
>>>> ? But it's still unclear to me.
>>>>
>>>> At least DAX must stay special I think, and PAGECACHE should be
>>>> capped at MAX_PAGECACHE_ORDER.
>>>>
>>>
>>> David, I can help to clean it up. Could you please help to confirm the
>>> following
>>
>> Thanks!
>>
>>> changes are exactly what you're suggesting? Hopefully, there is
>>> nothing I've missed.
>>> The original issue can be fixed by the changes. With the changes
>>> applied, madvise(MADV_COLLAPSE)
>>> returns with errno -22 in the test program.
>>>
>>> The Fixes tag needs to be adjusted as well.
>>>
>>> Fixes: 3485b88390b0 ("mm: thp: introduce multi-size THP sysfs interface")
>>>
>>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>>> index 2aa986a5cd1b..45909efb0ef0 100644
>>> --- a/include/linux/huge_mm.h
>>> +++ b/include/linux/huge_mm.h
>>> @@ -74,7 +74,12 @@ extern struct kobj_attribute shmem_enabled_attr;
>>>     /*
>>>      * Mask of all large folio orders supported for file THP.
>>>      */
>>> -#define THP_ORDERS_ALL_FILE    (BIT(PMD_ORDER) | BIT(PUD_ORDER))
>>
>> DAX doesn't have any MAX_PAGECACHE_ORDER restrictions (like hugetlb). So
>> this should be
>>
>> /*
>>    * FSDAX never splits folios, so the MAX_PAGECACHE_ORDER limit does not
>>    * apply here.
>>    */
>> #define THP_ORDERS_ALL_FILE_DAX (BIT(PMD_ORDER) | BIT(PUD_ORDER))
>>
>> Something like that
>>
>>> +#define THP_ORDERS_ALL_FILE_DAX                \
>>> +       ((BIT(PMD_ORDER) | BIT(PUD_ORDER)) & (BIT(MAX_PAGECACHE_ORDER + 1) - 1))
>>> +#define THP_ORDERS_ALL_FILE_DEFAULT    \
>>> +       ((BIT(MAX_PAGECACHE_ORDER + 1) - 1) & ~BIT(0))
>>> +#define THP_ORDERS_ALL_FILE            \
>>> +       (THP_ORDERS_ALL_FILE_DAX | THP_ORDERS_ALL_FILE_DEFAULT)
>>
>> Maybe we can get rid of THP_ORDERS_ALL_FILE (to prevent abuse) and fixup
>> THP_ORDERS_ALL instead.
>>
>>>     /*
>>>      * Mask of all large folio orders supported for THP.
>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>> index 2120f7478e55..4690f33afaa6 100644
>>> --- a/mm/huge_memory.c
>>> +++ b/mm/huge_memory.c
>>> @@ -88,9 +88,17 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
>>>            bool smaps = tva_flags & TVA_SMAPS;
>>>            bool in_pf = tva_flags & TVA_IN_PF;
>>>            bool enforce_sysfs = tva_flags & TVA_ENFORCE_SYSFS;
>>> +       unsigned long supported_orders;
>>> +
>>>            /* Check the intersection of requested and supported orders. */
>>> -       orders &= vma_is_anonymous(vma) ?
>>> -                       THP_ORDERS_ALL_ANON : THP_ORDERS_ALL_FILE;
>>> +       if (vma_is_anonymous(vma))
>>> +               supported_orders = THP_ORDERS_ALL_ANON;
>>> +       else if (vma_is_dax(vma))
>>> +               supported_orders = THP_ORDERS_ALL_FILE_DAX;
>>> +       else
>>> +               supported_orders = THP_ORDERS_ALL_FILE_DEFAULT;
>>
>> This is what I had in mind.
>>
>> But, do we have to special-case shmem as well or will that be handled
>> correctly?
> 
> For anonymous shmem, it is now the same as anonymous THP, which can use
> THP_ORDERS_ALL_ANON.
> For tmpfs, we currently only support PMD-sized THP
> (more large orders will be supported in the future). Therefore, I think we
> can reuse THP_ORDERS_ALL_ANON for shmem now:
> 
> if (vma_is_anonymous(vma) || shmem_file(vma->vm_file))
> 	supported_orders = THP_ORDERS_ALL_ANON;
> ......
> 


It should be THP_ORDERS_ALL_FILE_DEFAULT (the MAX_PAGECACHE_ORDER limitation
applies).

-- 
Cheers,

David / dhildenb


