From: ran xiaokai <ranxiaokai627@163.com>
To: ryan.roberts@arm.com
Cc: 21cnbao@gmail.com, akpm@linux-foundation.org,
	baolin.wang@linux.alibaba.com, david@redhat.com,
	ioworker0@gmail.com, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	peterx@redhat.com, ran.xiaokai@zte.com.cn, ranxiaokai627@163.com,
	svetly.todorov@memverge.com, vbabka@suse.cz,
	yang.yang29@zte.com.cn, si.hao@zte.com.cn,
	wangkefeng.wang@huawei.com, willy@infradead.org, ziy@nvidia.com
Subject: Re: [PATCH 2/2] kpageflags: fix wrong KPF_THP on non-pmd-mappable compound pages
Date: Thu, 27 Jun 2024 12:46:13 +0000
Message-ID: <20240627124613.23377-1-ranxiaokai627@163.com>
In-Reply-To: <4e1a1878-4133-4d78-90fa-1d5bc99d179c@arm.com>

>On 27/06/2024 10:16, Barry Song wrote:
>> On Thu, Jun 27, 2024 at 8:39 PM Ryan Roberts <ryan.roberts@arm.com> wrote:
>>>
>>> On 27/06/2024 05:10, Barry Song wrote:
>>>> On Thu, Jun 27, 2024 at 2:40 AM Zi Yan <ziy@nvidia.com> wrote:
>>>>>
>>>>> On Wed Jun 26, 2024 at 7:07 AM EDT, Ryan Roberts wrote:
>>>>>> On 26/06/2024 04:06, Zi Yan wrote:
>>>>>>> On Tue Jun 25, 2024 at 10:49 PM EDT, ran xiaokai wrote:
>>>>>>>> From: Ran Xiaokai <ran.xiaokai@zte.com.cn>
>>>>>>>>
>>>>>>>> KPF_COMPOUND_HEAD and KPF_COMPOUND_TAIL are set on "common" compound
>>>>>>>> pages, which means of any order, but KPF_THP should only be set
>>>>>>>> when the folio is a 2M pmd mappable THP.
>>>>>>
>>>>>> Why should KPF_THP only be set on 2M THP? What problem does it cause as it is
>>>>>> currently configured?
>>>>>>
>>>>>> I would argue that mTHP is still THP so should still have the flag. And since
>>>>>> these smaller mTHP sizes are disabled by default, only mTHP-aware user space
>>>>>> will be enabling them, so I'll naively state that it should not cause compat
>>>>>> issues as is.
>>>>>>
>>>>>> Also, the script at tools/mm/thpmaps relies on KPF_THP being set for all mTHP
>>>>>> sizes to function correctly. So that would need to be reworked if making this
>>>>>> change.
>>>>>
>>>>> + more folks working on mTHP
>>>>>
>>>>> I agree that mTHP is still THP, but we might want different
>>>>> stats/counters for it, since people might want to keep the old THP counters
>>>>> consistent. See recent commits on adding mTHP counters:
>>>>> ec33687c6749 ("mm: add per-order mTHP anon_fault_alloc and anon_fault_fallback
>>>>> counters"), 1f97fd042f38 ("mm: shmem: add mTHP counters for anonymous shmem")
>>>>>
>>>>> and changes to make the THP counters count only PMD THP:
>>>>> 835c3a25aa37 ("mm: huge_memory: add the missing folio_test_pmd_mappable() for
>>>>> THP split statistics")
>>>>>
>>>>> In this case, I wonder if we want a new KPF_MTHP bit for mTHP and some
>>>>> adjustment on tools/mm/thpmaps.
>>>>
>>>> It seems we have to do this, though I think keeping KPF_THP and adding a
>>>> separate bit like KPF_PMD_MAPPED makes more sense. But tools relying on
>>>> KPF_THP would need to be aware of this and check the new bit, which they
>>>> do not do now.
>>>> Whether mTHP is called mTHP or THP makes no difference in this case :-)
>>>
>>> I don't quite follow your logic for that last part. If there are 2 separate
>>> bits, KPF_THP and KPF_MTHP, and KPF_THP is only set for PMD-sized THP, that
>>> would be a safe/compatible approach, right? Whereas your suggestion requires
>>> changes to existing tools to work.
>> 
>> Right, my point is that mTHP and THP are both types of THP. The only difference
>> is whether they are PMD-mapped or PTE-mapped. Adding a bit to describe how
>> the page is mapped would more accurately reflect reality. However, this change
>> would disrupt tools that assume KPF_THP always means PMD-mapped THP.
>> Therefore, we would still need separate bits for THP and mTHP in this case.
>
>I think perhaps PTE- vs PMD-mapped is a separate issue. The issue at hand is
>whether KPF_THP implies a fixed size (and alignment). If compat is an issue,
>then KPF_THP must continue to imply PMD-size. If compat is not an issue, then
>size can be determined by iterating over the entries.
>
>Having a mechanism to determine the level at which a block is mapped would
>potentially be a useful feature, but seems orthogonal to me.
>
>> 
>> I saw Willy complain about mTHP being called "mTHP," but in this case, calling
>> it "mTHP" or just "THP" doesn't change anything if old tools continue to assume
>> that KPF_THP means PMD-mapped THP.
>
>I think Willy was just ribbing me because he preferred calling it "anonymous
>large folios". That's how I took it anyway.
>
>> 
>>>
>>> Thinking about this a bit more, I wonder if KPF_MTHP is the right name for a new
>>> flag. We don't currently expose the term "mTHP" to user space. I can't think of
>>> a better name though.
>> 
>> Yes.  If "compatibility" is a requirement, we cannot disregard it.
>> 
>>> I'd still like to understand what is actually broken that this change is fixing.
>>> Is the concern that a user could see KPF_THP and advance forward by
>>> "/sys/kernel/mm/transparent_hugepage/hpage_pmd_size / getpagesize()" entries?
>>>
>> 
>> Maybe we need an example of a tool that assumes KPF_THP means PMD-mapped.
>
>Yes, that would help.

For now, the example is the test case in
tools/testing/selftests/mm/split_huge_page_test: if we try to split a THP to
any order other than 0, the test case breaks.
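
To illustrate the failure, here is a simplified model in Python. It is not
the selftest's actual code. It assumes the post-split check is "no page in
the range still reports KPF_THP", and that kpageflags currently sets KPF_THP
on every page of any large folio, as discussed earlier in this thread. The
helper name, PAGES_PER_PMD value and the flag lists are made up for
illustration; only the KPF_THP bit number comes from the uapi header.

```python
KPF_THP = 1 << 22  # bit 22 in include/uapi/linux/kernel-page-flags.h


def pages_reporting_thp(flags):
    """Count entries (one flag word per page) that have KPF_THP set."""
    return sum(1 for f in flags if f & KPF_THP)


PAGES_PER_PMD = 512  # assumption: 4K base pages and a 2M PMD

# Synthetic per-page flag words for one PMD-sized range (not real data).
before_split = [KPF_THP] * PAGES_PER_PMD
# Split to order 0: only base pages remain, so no page reports KPF_THP.
after_split_order0 = [0] * PAGES_PER_PMD
# Split to order 2 (assumed current behaviour): the resulting folios are
# still large folios, so every page keeps KPF_THP.
after_split_order2 = [KPF_THP] * PAGES_PER_PMD

for name, flags in (("before", before_split),
                    ("order 0", after_split_order0),
                    ("order 2", after_split_order2)):
    print(name, pages_reporting_thp(flags))
```

Under these assumptions a "no KPF_THP left" check passes for the order-0
split and fails for the order-2 split, even though the split itself succeeded.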

Maybe we can use KPF_COMPOUND_HEAD and KPF_COMPOUND_TAIL to find a compound
page's start, end, and order. However, these two flags are not set only on
userspace memory.
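
A minimal sketch of that idea in Python, assuming the caller has already
read one flag word per page for a run of consecutive PFNs (for example from
/proc/kpageflags). The function name and the sample input are hypothetical;
the bit numbers are the ones in include/uapi/linux/kernel-page-flags.h.

```python
KPF_COMPOUND_HEAD = 1 << 15
KPF_COMPOUND_TAIL = 1 << 16


def compound_runs(flags):
    """Return (start_index, npages, order) for each compound page found.

    `flags` holds one flag word per page for consecutive PFNs. `order` is
    None when npages is not a power of two, which can happen if the window
    ends in the middle of a compound page.
    """
    runs = []
    i = 0
    n = len(flags)
    while i < n:
        if flags[i] & KPF_COMPOUND_HEAD:
            # A head page is followed by its tail pages.
            j = i + 1
            while j < n and flags[j] & KPF_COMPOUND_TAIL:
                j += 1
            npages = j - i
            is_pow2 = npages & (npages - 1) == 0
            order = npages.bit_length() - 1 if is_pow2 else None
            runs.append((i, npages, order))
            i = j
        else:
            i += 1
    return runs


# Synthetic input: an order-2 compound page, a base page, an order-1 one.
H, T = KPF_COMPOUND_HEAD, KPF_COMPOUND_TAIL
print(compound_runs([0, H, T, T, T, 0, H, T]))
```

This only recovers the size of each compound page. Since the two flags are
also set on compound pages that are not userspace memory, a caller would
still have to filter on other flags, and which ones to use is left open here.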

