All of lore.kernel.org
 help / color / mirror / Atom feed
From: Oscar Salvador <osalvador@suse.de>
To: Peter Xu <peterx@redhat.com>
Cc: riel@surriel.com, linux-kernel@vger.kernel.org,
	kernel-team@meta.com, linux-mm@kvack.org,
	akpm@linux-foundation.org, muchun.song@linux.dev,
	mike.kravetz@oracle.com, leit@meta.com, willy@infradead.org,
	stable@kernel.org, Ackerley Tng <ackerleytng@google.com>
Subject: Re: [PATCH 2/4] hugetlbfs: extend hugetlb_vma_lock to private VMAs
Date: Tue, 12 Nov 2024 12:24:00 +0100	[thread overview]
Message-ID: <ZzM60CLkWKwFzWqa@localhost.localdomain> (raw)
In-Reply-To: <Zy0gqwIw5Y3IuNTD@x1n>

On Thu, Nov 07, 2024 at 03:18:51PM -0500, Peter Xu wrote:
> +Ackerley +Oscar
> 
> I'm reading the resv code recently and just stumbled upon this. So want to
> raise this question.
> 
> IIUC __vma_private_lock() will return false for MAP_PRIVATE hugetlb vma if
> the vma is dup()ed from a fork(), with/without commit 187da0f8250a
> ("hugetlb: fix null-ptr-deref in hugetlb_vma_lock_write") which fixed a
> slightly different issue.
> 
> The problem is the current vma lock for private mmap() is based on the resv
> map, and the resv map only belongs to the process that mmap()ed this
> private vma.  E.g. dup_mmap() has:
> 
>                 if (is_vm_hugetlb_page(tmp))
>                         hugetlb_dup_vma_private(tmp);
> 
> Which does:
> 
> 	if (vma->vm_flags & VM_MAYSHARE) {
>                 ...
> 	} else
> 		vma->vm_private_data = NULL; <---------------------
> 
> So even if I don't know how many of us are even using hugetlb PRIVATE +
> fork(), assuming that's the most controversial use case that I'm aware of
> on hugetlb that people complains about.. with some tricky changes like
> 04f2cbe35699.. Just still want to raise this pure question, that after a
> fork() on private vma, and if I read it alright, lock/unlock operations may
> become noop..

I have been taking a look at this, and yes, __vma_private_lock will
return false for private hugetlb mappings that were forked .

I quickly checked what protects what and we currently have:

hugetlb_vma_lock_read - copy_hugetlb_page_range (only sharing)
hugetlb_vma_lock_read - hugetlb_wp (only for HPAGE_RESV_OWNER)
hugetlb_vma_lock_read - hugetlb_fault , protects huge_pmd_unshare?
hugetlb_vma_lock_read - pagewalks

hugetlb_vma_lock_write - hugetlb_change_protection
hugetlb_vma_lock_write - hugetlb_unshare_pmds
hugetlb_vma_lock_wirte - move_hugetlb_page_tables
hugetlb_vma_lock_wirte - _hugetlb_zap_begin (unmap_vmas)

the ones taking the hugetlb_vma_lock in write (so, the last four) also
take the i_mmap_lock_write (vma->vm_file->f_mapping), and AFAIK, hugetlb
mappings, private or not, should have vma->vm_file->f_mapping set.

Which means that technically we cannot race between hugetlb_change_protection
and move_hugetlb_page_tables etc.

But, checking 

 commit bf4916922c60f43efaa329744b3eef539aa6a2b2
 Author: Rik van Riel <riel@surriel.com>
 Date:   Thu Oct 5 23:59:07 2023 -0400
 
     hugetlbfs: extend hugetlb_vma_lock to private VMAs

which its motivation was to protect MADV_DONTNEED vs page_faults, I do
not see how it gets protected with private hugetlb mappings that were
dupped (forked).

 madvise_dontneed_single_vma
  zap_page_range_single
   _hugetlb_zap_begin
    hugetlb_vma_lock_write - noop for mappings that do not own the reservation
    i_mmap_lock_write

But the hugetlb_fault path only takes hugetlb_vma_lock_*, so theorically
we still could race between page_fault vs madvise_dontneed_single_vma?

A quick way to prove would be map a hugetlb private mapping, fork it and
have two threads tryong to madvise(MADV_DONTNEED) and the other trying
to write to it?

I do not know, maybe we are protected in some other way I cannot see
right now.

I will have a look.

 

-- 
Oscar Salvador
SUSE Labs


  reply	other threads:[~2024-11-12 11:24 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-06  3:59 [PATCH v7 0/4] hugetlbfs: close race between MADV_DONTNEED and page fault riel
2023-10-06  3:59 ` [PATCH 1/4] hugetlbfs: clear resv_map pointer if mmap fails riel
2023-10-06  3:59 ` [PATCH 2/4] hugetlbfs: extend hugetlb_vma_lock to private VMAs riel
2024-11-07 20:18   ` Peter Xu
2024-11-12 11:24     ` Oscar Salvador [this message]
2023-10-06  3:59 ` [PATCH 3/4] hugetlbfs: close race between MADV_DONTNEED and page fault riel
2023-10-06 17:57   ` Mike Kravetz
2023-10-06  3:59 ` [PATCH 4/4] hugetlbfs: replace hugetlb_vma_lock with invalidate_lock riel
2023-10-17  0:52   ` Mike Kravetz
2023-10-17 13:47     ` Rik van Riel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZzM60CLkWKwFzWqa@localhost.localdomain \
    --to=osalvador@suse.de \
    --cc=ackerleytng@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=kernel-team@meta.com \
    --cc=leit@meta.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mike.kravetz@oracle.com \
    --cc=muchun.song@linux.dev \
    --cc=peterx@redhat.com \
    --cc=riel@surriel.com \
    --cc=stable@kernel.org \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.