linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Oscar Salvador <osalvador@suse.de>
To: Peter Xu <peterx@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Muchun Song <muchun.song@linux.dev>,
	David Hildenbrand <david@redhat.com>,
	James Houghton <jthoughton@google.com>,
	Gavin Guo <gavinguo@igalia.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 1/3] mm, hugetlb: Clean up locking in hugetlb_fault and hugetlb_wp
Date: Tue, 10 Jun 2025 16:13:45 +0200	[thread overview]
Message-ID: <aEg9mTDDA1TXJ9by@localhost.localdomain> (raw)
In-Reply-To: <aD8_c9uEMn6NXXAX@x1.local>

On Tue, Jun 03, 2025 at 02:31:15PM -0400, Peter Xu wrote:
> > There're actually three places this patch touched, the 3rd one is
> > hugetlb_no_page(), in which case I also think we should lock it, not only
> > because file folios normally does it (see do_fault(), for example), but
> > also that's exactly what James mentioned I believe on possible race of
> > !uptodate hugetlb folio being injected by UFFDIO_CONTINUE, along the lines:
> > 
> > 		folio = alloc_hugetlb_folio(vma, vmf->address, false);
> >                 ...
> > 		folio_zero_user(folio, vmf->real_address);
> > 		__folio_mark_uptodate(folio);
> 
> I think I was wrong here at least partly..  So what this patch changed is
> only the lookup of the no_page path, hence what I said here doesn't apply.
> This patch also mentioned in the commit message on why James's concern was
> ok - the fault mutex was held.  Yes I agree. Actually even without fault
> mutex, the folio is only injected into page cache after mark uptodate.. so
> it looks fine even without the mutex.
> 
> Though it's still true there're three paths to be discussed, which should
> include no_page, and it's still needed to be discussed when any of us like
> to remove folio lock even in the lookup path.
> 
> For example, I'm not sure whether it's always thread safe to do
> folio_test_hwpoison() when without it, even with the fault mutex.
> 
> Sorry for the confusion, Oscar.

So, I'm having another go at this.
I started with replacing the check for cow-on-private-mappings [1].

I'm still to figure out how to approach the other locks though.
The thing is, we only need the lock in two cases:

1) folio is the original folio in the pagecache: We need to lock it to
   copy it over to another page (do_fault->do_cow_page locks the folio
   via filemap_fault).
   Although, the truth be told, we even lock the folio on read-fault in
   do_read_fault. Maybe that is just because __do_fault() ends up
   calling filemap_fault (so it doesn't differentiate?).
   But that is not a problem, just a note.
   So it is ok for hugetlb_no_page() to also take the lock when
   read-faulting.
   We need to take the lock here.

2) folio is anonymous: We need to take the lock when we want to check
   if we can re-use the folio in hugetlb_wp.
   Normal faulting path does it in wp_can_reuse_anon_folio().

Now, the scope for them is different.
Normal faulting path only locks the anonymous folio when calling
wp_can_reuse_anon_folio() (so, it isn't hold during the copy), while we
lock folios from the pagecache in filemap_fault() and keep the lock until
we have a) mapped it into our pagetables (read-fault) b) or we have
copied it over to another page.

Representing this difference of timespan in hugetlb is rather tricky as
is.
Maybe it could be done if we refactor hugetlb_{fault,no_page,wp} to look
more alike to the non-hugetlb faulting path.
E.g: hugetlb_fault could directly call hugetlb_wp if it sees that
FAULT_FLAG_WRITE is on, instead of doing a first pass to
hugetlb_no_page, and leave hugetlb_no_page for only RO stuff.
Not put much though on it, but just an idead of how do_fault() handles
it.

(In an ideal world hugetlb could be re-routed to generic faulting path
but we would need to sort out somehow the need to allocate folios,
reserves etc. so that is still far future)

Anyway, as I said, representing this different of timespan file vs
anonymous in hugetlb is tricky, so I think the current state of locks,
with [1] applied is fine.

hugetlb_fault:
               - locks the folio mapped in pte_orig
hugetlb_no_page:
               - locks the folio from the pagecache
	       - or allocates a new folio
	         - and inserts it into the pagecache and locks it
	         - and locks it (anonymous)

So, hugetlb_wp() already gets the folio locked and doesn't have to
bother about anything else.

Of course, while I think that locks can be keep as they are, I plan to
document them properly so the next poor soul that comes along doesn't
have to spend time trying to figure out why we do what we do.

Another thing of keeping the locks is that for the future we can
potentially rely less on the faulting mutex.


[1] https://github.com/leberus/linux/commit/b8e10b854fc3e427fd69d61d065798c7f31ebd55

 

-- 
Oscar Salvador
SUSE Labs


  reply	other threads:[~2025-06-10 14:13 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-06-02 14:16 [RFC PATCH 0/3] Clean up locking in hugetlb faulting code Oscar Salvador
2025-06-02 14:16 ` [RFC PATCH 1/3] mm, hugetlb: Clean up locking in hugetlb_fault and hugetlb_wp Oscar Salvador
2025-06-02 15:14   ` Peter Xu
2025-06-02 20:47     ` Oscar Salvador
2025-06-02 21:30       ` Peter Xu
2025-06-03 13:50         ` Oscar Salvador
2025-06-03 14:57           ` Peter Xu
2025-06-03 15:08             ` David Hildenbrand
2025-06-03 15:46               ` Peter Xu
2025-06-03 17:19                 ` David Hildenbrand
2025-06-03 19:11                   ` Peter Xu
2025-06-03 18:31             ` Peter Xu
2025-06-10 14:13               ` Oscar Salvador [this message]
2025-06-10 15:57                 ` Peter Xu
2025-06-03 15:12           ` David Hildenbrand
2025-06-02 14:16 ` [RFC PATCH 2/3] mm, hugetlb: Update comments in hugetlb_fault Oscar Salvador
2025-06-02 14:16 ` [RFC PATCH 3/3] mm, hugetlb: Drop unlikelys from hugetlb_fault Oscar Salvador
2025-06-16  3:21 ` [RFC PATCH 0/3] Clean up locking in hugetlb faulting code Gavin Guo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aEg9mTDDA1TXJ9by@localhost.localdomain \
    --to=osalvador@suse.de \
    --cc=akpm@linux-foundation.org \
    --cc=david@redhat.com \
    --cc=gavinguo@igalia.com \
    --cc=jthoughton@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=muchun.song@linux.dev \
    --cc=peterx@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).