linux-mm.kvack.org archive mirror
From: David Hildenbrand <david@redhat.com>
To: Mike Kravetz <mike.kravetz@oracle.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	inuxppc-dev@lists.ozlabs.org, linux-ia64@vger.kernel.org,
	Baolin Wang <baolin.wang@linux.alibaba.com>,
	"Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>,
	Naoya Horiguchi <naoya.horiguchi@linux.dev>,
	Michael Ellerman <mpe@ellerman.id.au>,
	Muchun Song <songmuchun@bytedance.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Christophe Leroy <christophe.leroy@csgroup.eu>
Subject: Re: [PATCH] hugetlb: simplify hugetlb handling in follow_page_mask
Date: Wed, 31 Aug 2022 10:07:24 +0200
Message-ID: <739dc825-ece3-a59f-adc5-65861676e0ae@redhat.com>
In-Reply-To: <Yw6Bpsow+gUMlHCU@monkey>

On 30.08.22 23:31, Mike Kravetz wrote:
> On 08/30/22 09:52, Mike Kravetz wrote:
>> On 08/30/22 10:11, David Hildenbrand wrote:
>>> On 30.08.22 01:40, Mike Kravetz wrote:
>>>> During discussions of this series [1], it was suggested that hugetlb
>>>> handling code in follow_page_mask could be simplified.  At the beginning
>>>
>>> Feel free to use a Suggested-by if you consider it appropriate.
>>>
>>>> of follow_page_mask, there currently is a call to follow_huge_addr which
>>>> 'may' handle hugetlb pages.  ia64 is the only architecture which provides
>>>> a follow_huge_addr routine that does not return error.  Instead, at each
>>>> level of the page table a check is made for a hugetlb entry.  If a hugetlb
>>>> entry is found, a call to a routine associated with that entry is made.
>>>>
>>>> Currently, there are two checks for hugetlb entries at each page table
>>>> level.  The first check is of the form:
>>>> 	if (p?d_huge())
>>>> 		page = follow_huge_p?d();
>>>> the second check is of the form:
>>>> 	if (is_hugepd())
>>>> 		page = follow_huge_pd().
>>>
>>> BTW, what about all this hugepd stuff in mm/pagewalk.c?
>>>
>>> Isn't this all dead code as we're essentially routing all hugetlb VMAs
>>> via walk_hugetlb_range? [yes, all that hugepd stuff in generic code that
>>> overcomplicates stuff has been annoying me for a long time]
>>
>> I am 'happy' to look at cleaning up that code next.  Perhaps I will just
>> create a cleanup series.
>>
> 
> Technically, that code is not dead IIUC.  The call to walk_hugetlb_range in
> __walk_page_range is as follows:
> 
> 	if (vma && is_vm_hugetlb_page(vma)) {
> 		if (ops->hugetlb_entry)
> 			err = walk_hugetlb_range(start, end, walk);
> 	} else
> 		err = walk_pgd_range(start, end, walk);
> 
> We also have the interface walk_page_range_novma() that will call
> __walk_page_range without a value for vma.  So, in that case we would
> end up calling walk_pgd_range, etc.  walk_pgd_range and related routines
> do have those checks such as:
> 
> 		if (is_hugepd(__hugepd(pmd_val(*pmd))))
> 			err = walk_hugepd_range((hugepd_t *)pmd, addr, next, walk, PMD_SHIFT);
> 
> So, it looks like in this case we would process 'hugepd' entries but not
> 'normal' hugetlb entries.  That does not seem right.

:/ walking a hugetlb range without knowing whether it's a hugetlb range
is certainly questionable.
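To make the concern concrete, here is a user-space toy model (not kernel code; every type and function name below is hypothetical) of the dispatch quoted above from __walk_page_range. With a hugetlb VMA, every entry goes to the walk_hugetlb_range() analogue. With vma == NULL, as via walk_page_range_novma(), only hugepd-style entries are special-cased, and an ordinary huge leaf entry is counted as "missed". The model deliberately omits the ops->hugetlb_entry check and the pmd_entry/pte_entry callbacks the real walker has, so it illustrates the argument rather than the kernel's exact behavior.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical types, not the kernel's pagewalk API. */
enum toy_entry { TOY_NONE, TOY_PTE_TABLE, TOY_HUGEPD, TOY_HUGE_LEAF };

struct toy_vma {
	bool is_hugetlb;	/* stands in for is_vm_hugetlb_page(vma) */
};

struct toy_stats {
	int hugetlb;	/* handled by the walk_hugetlb_range() analogue */
	int hugepd;	/* handled by the walk_hugepd_range() analogue */
	int pte_table;	/* handled by the walk_pte_range() analogue */
	int holes;	/* empty entries */
	int missed;	/* huge leaf entries that no path recognized */
};

/* Per-entry dispatch on the walk_pgd_range() path, as described above. */
static void toy_walk_entry(enum toy_entry e, struct toy_stats *s)
{
	switch (e) {
	case TOY_NONE:
		s->holes++;
		break;
	case TOY_PTE_TABLE:
		s->pte_table++;
		break;
	case TOY_HUGEPD:
		/* only hugepd entries get special treatment on this path */
		s->hugepd++;
		break;
	case TOY_HUGE_LEAF:
		/* an ordinary huge leaf entry falls through unrecognized */
		s->missed++;
		break;
	}
}

/* Mirrors the if/else in __walk_page_range(); vma may be NULL (novma). */
static void toy_walk_page_range(const struct toy_vma *vma,
				const enum toy_entry *entries, size_t n,
				struct toy_stats *s)
{
	assert(s != NULL && (entries != NULL || n == 0));
	if (vma && vma->is_hugetlb) {
		s->hugetlb += (int)n;	/* whole range routed to the hugetlb walker */
		return;
	}
	for (size_t i = 0; i < n; i++)
		toy_walk_entry(entries[i], s);
}
```

With entries {table, hugepd, huge leaf, none} and vma == NULL, the model reports one missed huge leaf; with a hugetlb VMA, all four entries take the hugetlb path.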


> 
> Christophe Leroy added this code with commit e17eae2b8399 "mm: pagewalk: fix
> walk for hugepage tables".  This was part of the series "Convert powerpc to
> GENERIC_PTDUMP".  And, the ptdump code uses the walk_page_range_novma
> interface.  So, this code is certainly not dead.

Hm, that commit doesn't actually mention how this can happen, what exactly
will happen ("crazy result"), or whether it ever happened.

> 
> Adding Christophe on Cc:
> 
> Christophe, do you know if is_hugepd is true for all hugetlb entries, not
> just hugepd?
> 
> On systems without hugepd entries, I guess ptdump skips all hugetlb entries.
> Sigh!

IIUC, the idea of ptdump_walk_pgd() is to dump page tables even outside
VMAs (for debugging purposes?).

I cannot convince myself that that's a good idea when only holding the
mmap lock in read mode, because we can see page tables getting freed
concurrently, e.g., during a concurrent munmap(). While holding the mmap
lock in read mode, we may only walk inside VMA boundaries.

That then raises the question of whether we're only calling this on
special MMs (e.g., init_mm), where we cannot really see a concurrent
munmap() and where we shouldn't have hugetlb mappings or hugepd entries.
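As an illustration of the rule above (with the mmap lock held in read mode, only VMA-covered ranges are safe to walk), here is a minimal user-space sketch that splits a requested range into VMA-covered segments and holes. The VMA list is a hypothetical sorted array of half-open ranges, not the kernel's VMA tree or iterator, and all names are made up for illustration. A real walker would descend into page tables only for the covered segments and report the holes without touching page tables, roughly the way walk_page_range() hands gaps to the ->pte_hole callback.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical VMA: half-open [start, end); array sorted, non-overlapping. */
struct toy_area { unsigned long start, end; };

/* in_vma == 1: safe to walk page tables; 0: hole, must not touch them. */
struct toy_segment { unsigned long start, end; int in_vma; };

/*
 * Split [start, end) into VMA-covered segments and holes.
 * Writes at most max segments to out and returns how many were written.
 */
static size_t toy_split_by_vmas(const struct toy_area *vmas, size_t nr_vmas,
				unsigned long start, unsigned long end,
				struct toy_segment *out, size_t max)
{
	size_t n = 0;
	unsigned long addr = start;

	assert(out != NULL || max == 0);
	for (size_t i = 0; i < nr_vmas && addr < end && n < max; i++) {
		unsigned long vs = vmas[i].start, ve = vmas[i].end;

		if (ve <= addr)
			continue;	/* VMA lies entirely before the cursor */
		if (vs >= end)
			break;		/* VMA lies entirely after the range */
		if (vs > addr) {	/* gap between cursor and this VMA */
			out[n++] = (struct toy_segment){ addr, vs, 0 };
			addr = vs;
			if (n == max)
				break;
		}
		unsigned long seg_end = ve < end ? ve : end;
		out[n++] = (struct toy_segment){ addr, seg_end, 1 };
		addr = seg_end;
	}
	if (addr < end && n < max)	/* trailing hole after the last VMA */
		out[n++] = (struct toy_segment){ addr, end, 0 };
	return n;
}
```

For example, with VMAs [0x1000, 0x3000) and [0x5000, 0x6000), the range [0x0, 0x8000) splits into five segments: three holes and two VMA-covered pieces.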

-- 
Thanks,

David / dhildenb




Thread overview: 26+ messages
2022-08-29 23:40 [PATCH] hugetlb: simplify hugetlb handling in follow_page_mask Mike Kravetz
2022-08-30  1:06 ` Baolin Wang
2022-08-30 16:44   ` Mike Kravetz
2022-08-30 18:39     ` Mike Kravetz
2022-08-31  1:07       ` Baolin Wang
2022-09-01  0:00         ` Mike Kravetz
2022-09-01  1:24           ` Baolin Wang
2022-09-01  6:59             ` David Hildenbrand
2022-09-01 10:40               ` Baolin Wang
2022-08-30  8:11 ` David Hildenbrand
2022-08-30 16:52   ` Mike Kravetz
2022-08-30 21:31     ` Mike Kravetz
2022-08-31  8:07       ` David Hildenbrand [this message]
2022-09-02 18:50         ` Mike Kravetz
2022-09-02 18:52           ` David Hildenbrand
2022-09-03  6:59             ` Christophe Leroy
2022-09-03  7:07             ` Christophe Leroy
2022-09-04 11:49               ` Michael Ellerman
2022-09-05  8:37               ` David Hildenbrand
2022-09-05  9:33                 ` Christophe Leroy
2022-09-05  9:46                   ` David Hildenbrand
2022-09-05 16:05                     ` Christophe Leroy
2022-09-05 16:09                       ` David Hildenbrand
2022-08-31  5:08 ` kernel test robot
2022-08-31 20:42   ` Mike Kravetz
2022-09-01 16:19 ` Mike Kravetz
