From: Oscar Salvador <osalvador@suse.de>
To: Peter Xu <peterx@redhat.com>
Cc: riel@surriel.com, linux-kernel@vger.kernel.org,
kernel-team@meta.com, linux-mm@kvack.org,
akpm@linux-foundation.org, muchun.song@linux.dev,
mike.kravetz@oracle.com, leit@meta.com, willy@infradead.org,
stable@kernel.org, Ackerley Tng <ackerleytng@google.com>
Subject: Re: [PATCH 2/4] hugetlbfs: extend hugetlb_vma_lock to private VMAs
Date: Tue, 12 Nov 2024 12:24:00 +0100 [thread overview]
Message-ID: <ZzM60CLkWKwFzWqa@localhost.localdomain> (raw)
In-Reply-To: <Zy0gqwIw5Y3IuNTD@x1n>
On Thu, Nov 07, 2024 at 03:18:51PM -0500, Peter Xu wrote:
> +Ackerley +Oscar
>
> I'm reading the resv code recently and just stumbled upon this. So want to
> raise this question.
>
> IIUC __vma_private_lock() will return false for MAP_PRIVATE hugetlb vma if
> the vma is dup()ed from a fork(), with/without commit 187da0f8250a
> ("hugetlb: fix null-ptr-deref in hugetlb_vma_lock_write") which fixed a
> slightly different issue.
>
> The problem is the current vma lock for private mmap() is based on the resv
> map, and the resv map only belongs to the process that mmap()ed this
> private vma. E.g. dup_mmap() has:
>
> if (is_vm_hugetlb_page(tmp))
> hugetlb_dup_vma_private(tmp);
>
> Which does:
>
> if (vma->vm_flags & VM_MAYSHARE) {
> ...
> } else
> vma->vm_private_data = NULL; <---------------------
>
> So even if I don't know how many of us are even using hugetlb PRIVATE +
> fork(), assuming that's the most controversial use case that I'm aware of
> on hugetlb that people complains about.. with some tricky changes like
> 04f2cbe35699.. Just still want to raise this pure question, that after a
> fork() on private vma, and if I read it alright, lock/unlock operations may
> become noop..
I have been taking a look at this, and yes, __vma_private_lock will
return false for private hugetlb mappings that were forked .
I quickly checked what protects what and we currently have:
hugetlb_vma_lock_read - copy_hugetlb_page_range (only sharing)
hugetlb_vma_lock_read - hugetlb_wp (only for HPAGE_RESV_OWNER)
hugetlb_vma_lock_read - hugetlb_fault , protects huge_pmd_unshare?
hugetlb_vma_lock_read - pagewalks
hugetlb_vma_lock_write - hugetlb_change_protection
hugetlb_vma_lock_write - hugetlb_unshare_pmds
hugetlb_vma_lock_wirte - move_hugetlb_page_tables
hugetlb_vma_lock_wirte - _hugetlb_zap_begin (unmap_vmas)
the ones taking the hugetlb_vma_lock in write (so, the last four) also
take the i_mmap_lock_write (vma->vm_file->f_mapping), and AFAIK, hugetlb
mappings, private or not, should have vma->vm_file->f_mapping set.
Which means that technically we cannot race between hugetlb_change_protection
and move_hugetlb_page_tables etc.
But, checking
commit bf4916922c60f43efaa329744b3eef539aa6a2b2
Author: Rik van Riel <riel@surriel.com>
Date: Thu Oct 5 23:59:07 2023 -0400
hugetlbfs: extend hugetlb_vma_lock to private VMAs
which its motivation was to protect MADV_DONTNEED vs page_faults, I do
not see how it gets protected with private hugetlb mappings that were
dupped (forked).
madvise_dontneed_single_vma
zap_page_range_single
_hugetlb_zap_begin
hugetlb_vma_lock_write - noop for mappings that do not own the reservation
i_mmap_lock_write
But the hugetlb_fault path only takes hugetlb_vma_lock_*, so theorically
we still could race between page_fault vs madvise_dontneed_single_vma?
A quick way to prove would be map a hugetlb private mapping, fork it and
have two threads tryong to madvise(MADV_DONTNEED) and the other trying
to write to it?
I do not know, maybe we are protected in some other way I cannot see
right now.
I will have a look.
--
Oscar Salvador
SUSE Labs
next prev parent reply other threads:[~2024-11-12 11:24 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-10-06 3:59 [PATCH v7 0/4] hugetlbfs: close race between MADV_DONTNEED and page fault riel
2023-10-06 3:59 ` [PATCH 1/4] hugetlbfs: clear resv_map pointer if mmap fails riel
2023-10-06 3:59 ` [PATCH 2/4] hugetlbfs: extend hugetlb_vma_lock to private VMAs riel
2024-11-07 20:18 ` Peter Xu
2024-11-12 11:24 ` Oscar Salvador [this message]
2023-10-06 3:59 ` [PATCH 3/4] hugetlbfs: close race between MADV_DONTNEED and page fault riel
2023-10-06 17:57 ` Mike Kravetz
2023-10-06 3:59 ` [PATCH 4/4] hugetlbfs: replace hugetlb_vma_lock with invalidate_lock riel
2023-10-17 0:52 ` Mike Kravetz
2023-10-17 13:47 ` Rik van Riel
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZzM60CLkWKwFzWqa@localhost.localdomain \
--to=osalvador@suse.de \
--cc=ackerleytng@google.com \
--cc=akpm@linux-foundation.org \
--cc=kernel-team@meta.com \
--cc=leit@meta.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mike.kravetz@oracle.com \
--cc=muchun.song@linux.dev \
--cc=peterx@redhat.com \
--cc=riel@surriel.com \
--cc=stable@kernel.org \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.