All of lore.kernel.org
 help / color / mirror / Atom feed
From: Oscar Salvador <osalvador@suse.de>
To: David Hildenbrand <david@redhat.com>
Cc: Peter Xu <peterx@redhat.com>, Gavin Guo <gavinguo@igalia.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	muchun.song@linux.dev, akpm@linux-foundation.org,
	mike.kravetz@oracle.com, kernel-dev@igalia.com,
	stable@vger.kernel.org, Hugh Dickins <hughd@google.com>,
	Florent Revest <revest@google.com>, Gavin Shan <gshan@redhat.com>
Subject: Re: [PATCH v3] mm/hugetlb: fix a deadlock with pagecache_folio and hugetlb_fault_mutex_table
Date: Wed, 28 May 2025 23:34:09 +0200	[thread overview]
Message-ID: <aDeBUXCRLRZobHq0@localhost.localdomain> (raw)
In-Reply-To: <04ecf2e3-651a-47c9-9f30-d31423e2c9d7@redhat.com>

On Wed, May 28, 2025 at 10:26:04PM +0200, David Hildenbrand wrote:
> Digging a bit:
> 
> commit 56c9cfb13c9b6516017eea4e8cbe22ea02e07ee6
> Author: Naoya Horiguchi <nao.horiguchi@gmail.com>
> Date:   Fri Sep 10 13:23:04 2010 +0900
> 
>     hugetlb, rmap: fix confusing page locking in hugetlb_cow()
>     The "if (!trylock_page)" block in the avoidcopy path of hugetlb_cow()
>     looks confusing and is buggy.  Originally this trylock_page() was
>     intended to make sure that old_page is locked even when old_page !=
>     pagecache_page, because then only pagecache_page is locked.
> 
> Added the comment
> 
> +       /*
> +        * hugetlb_cow() requires page locks of pte_page(entry) and
> +        * pagecache_page, so here we need take the former one
> +        * when page != pagecache_page or !pagecache_page.
> +        * Note that locking order is always pagecache_page -> page,
> +        * so no worry about deadlock.
> +        */
> 
> 
> And
> 
> commit 0fe6e20b9c4c53b3e97096ee73a0857f60aad43f
> Author: Naoya Horiguchi <nao.horiguchi@gmail.com>
> Date:   Fri May 28 09:29:16 2010 +0900
> 
>     hugetlb, rmap: add reverse mapping for hugepage
>     This patch adds reverse mapping feature for hugepage by introducing
>     mapcount for shared/private-mapped hugepage and anon_vma for
>     private-mapped hugepage.
>     While hugepage is not currently swappable, reverse mapping can be useful
>     for memory error handler.
>     Without this patch, memory error handler cannot identify processes
>     using the bad hugepage nor unmap it from them. That is:
>     - for shared hugepage:
>       we can collect processes using a hugepage through pagecache,
>       but can not unmap the hugepage because of the lack of mapcount.
>     - for privately mapped hugepage:
>       we can neither collect processes nor unmap the hugepage.
>     This patch solves these problems.
>     This patch include the bug fix given by commit 23be7468e8, so reverts it.
> 
> Added the real locking magic.

Yes, I have been checking "hugetlb, rmap: add reverse mapping for
hugepage", which added locking the now-so-called 'old_folio' in case
hugetlbfs_pagecache_page() didn't return anything.

Because in hugetlb_wp, this was added:

 @@ -2286,8 +2299,11 @@ static int hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma,
  retry_avoidcopy:
         /* If no-one else is actually using this page, avoid the copy
          * and just make the page writable */
 -       avoidcopy = (page_count(old_page) == 1);
 +       avoidcopy = (page_mapcount(old_page) == 1);
         if (avoidcopy) {
 +               if (!trylock_page(old_page))
 +                       if (PageAnon(old_page))
 +                               page_move_anon_rmap(old_page, vma, address);

So, as you mentioned, it was done to keep the rmap stable as I guess rmap test test the
PageLock. 


> Not that much changed regarding locking until COW support was added in
> 
> commit 1e8f889b10d8d2223105719e36ce45688fedbd59
> Author: David Gibson <david@gibson.dropbear.id.au>
> Date:   Fri Jan 6 00:10:44 2006 -0800
> 
>     [PATCH] Hugetlb: Copy on Write support
>     Implement copy-on-write support for hugetlb mappings so MAP_PRIVATE can be
>     supported.  This helps us to safely use hugetlb pages in many more
>     applications.  The patch makes the following changes.  If needed, I also have
>     it broken out according to the following paragraphs.
> 
> 
> Confusing.
> 
> Locking the *old_folio* when calling hugetlb_wp() makes sense when it is
> an anon folio because we might want to call folio_move_anon_rmap() to adjust the rmap root.

Yes, this is clear.

> Locking the pagecache folio when calling hugetlb_wp() if old_folio is an anon folio ...
> does not make sense to me.

I think this one is also clear.

> Locking the pagecache folio when calling hugetlb_wp if old_folio is a pageache folio ...
> also doesn't quite make sense for me.
> Again, we don't take the lock for ordinary pages, so what's special about hugetlb for the last
> case (reservations, I assume?).

So, this case is when pagecache_folio == old_folio.

I guess we are talking about resv_maps? But I think we cannot interfere there.
For the reserves to be modified the page has to go away.

Now, I have been checking this one too:

 commit 04f2cbe35699d22dbf428373682ead85ca1240f5
 Author: Mel Gorman <mel@csn.ul.ie>
 Date:   Wed Jul 23 21:27:25 2008 -0700
 
     hugetlb: guarantee that COW faults for a process that called mmap(MAP_PRIVATE) on hugetlbfs will succeed

And I think it is interesting.
That one added this chunk in hugetlb_fault():

 @@ -1126,8 +1283,15 @@ int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
         spin_lock(&mm->page_table_lock);
         /* Check for a racing update before calling hugetlb_cow */
         if (likely(pte_same(entry, huge_ptep_get(ptep))))
 -               if (write_access && !pte_write(entry))
 -                       ret = hugetlb_cow(mm, vma, address, ptep, entry);
 +               if (write_access && !pte_write(entry)) {
 +                       struct page *page;
 +                       page = hugetlbfs_pagecache_page(vma, address);
 +                       ret = hugetlb_cow(mm, vma, address, ptep, entry, page);
 +                       if (page) {
 +                               unlock_page(page);
 +                               put_page(page);
 +                       }
 +               }

So, it finds and lock the page in the pagecache, and calls hugetlb_cow.

hugetlb_fault() takes hugetlb_instantiation_mutex, and there is a
comment saying:

        /*
         * Serialize hugepage allocation and instantiation, so that we don't
         * get spurious allocation failures if two CPUs race to instantiate
         * the same page in the page cache.
         */
        mutex_lock(&hugetlb_instantiation_mutex);

But it does not say anything about truncation.
Actually, checking the truncation code from back then,
truncate_hugepages() (and none of its callers) take the hugetlb_instantiation_mutex,
as it is done today (e.g: current remove_inode_hugepages() code).

Back then, truncate_hugepages() relied only in lock_page():

 static void truncate_hugepages(struct inode *inode, loff_t lstart)
 {
  ...
  ...
  lock_page(page);
  truncate_huge_page(page);
  unlock_page(page);
 }

While today, remove_inode_hugepages() takes the mutex, and also the lock.
And then zaps the page and does its thing with resv_maps.

So I think that we should not even need the lock for hugetlb_wp
when pagecache_folio == old_folio (pagecache), because the mutex
already protects us from the page to go away, right (e.g: truncated)?
Besides we hold a reference on that page since
filemap_lock_hugetlb_folio() locks the page and increases its refcount.

All in all, I am leaning towards not being needed, but it's getting late
here..


-- 
Oscar Salvador
SUSE Labs

      reply	other threads:[~2025-05-28 21:34 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-05-28  2:33 [PATCH v3] mm/hugetlb: fix a deadlock with pagecache_folio and hugetlb_fault_mutex_table Gavin Guo
2025-05-28  9:27 ` Oscar Salvador
2025-05-28 15:03   ` Peter Xu
2025-05-28 15:09     ` David Hildenbrand
2025-05-28 15:45       ` Oscar Salvador
2025-05-28 16:14         ` James Houghton
2025-05-28 16:24           ` Peter Xu
2025-05-28 16:16         ` Peter Xu
2025-05-28 20:00         ` David Hildenbrand
2025-05-28 20:26           ` David Hildenbrand
2025-05-28 21:34             ` Oscar Salvador [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aDeBUXCRLRZobHq0@localhost.localdomain \
    --to=osalvador@suse.de \
    --cc=akpm@linux-foundation.org \
    --cc=david@redhat.com \
    --cc=gavinguo@igalia.com \
    --cc=gshan@redhat.com \
    --cc=hughd@google.com \
    --cc=kernel-dev@igalia.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mike.kravetz@oracle.com \
    --cc=muchun.song@linux.dev \
    --cc=peterx@redhat.com \
    --cc=revest@google.com \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.