Re: [PATCH v5 5/6] mm: handle swap page faults under per-VMA lock

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Alistair Popple <apopple@nvidia.com>
To: Suren Baghdasaryan <surenb@google.com>
Cc: akpm@linux-foundation.org, willy@infradead.org,
	hannes@cmpxchg.org, mhocko@suse.com, josef@toxicpanda.com,
	jack@suse.cz, ldufour@linux.ibm.com, laurent.dufour@fr.ibm.com,
	michel@lespinasse.org, liam.howlett@oracle.com,
	jglisse@google.com, vbabka@suse.cz, minchan@google.com,
	dave@stgolabs.net, punit.agrawal@bytedance.com,
	lstoakes@gmail.com, hdanton@sina.com, peterx@redhat.com,
	ying.huang@intel.com, david@redhat.com, yuzhao@google.com,
	dhowells@redhat.com, hughd@google.com, viro@zeniv.linux.org.uk,
	brauner@kernel.org, pasha.tatashin@soleen.com,
	linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, kernel-team@android.com
Subject: Re: [PATCH v5 5/6] mm: handle swap page faults under per-VMA lock
Date: Thu, 29 Jun 2023 16:04:58 +1000	[thread overview]
Message-ID: <877crm246q.fsf@nvdebian.thelocal> (raw)
In-Reply-To: <20230628172529.744839-6-surenb@google.com>


Looks good and passed the HMM selftests. So:

Tested-by: Alistair Popple <apopple@nvidia.com>
Reviewed-by: Alistair Popple <apopple@nvidia.com>

Suren Baghdasaryan <surenb@google.com> writes:

> When page fault is handled under per-VMA lock protection, all swap page
> faults are retried with mmap_lock because folio_lock_or_retry has to drop
> and reacquire mmap_lock if folio could not be immediately locked.
> Follow the same pattern as mmap_lock to drop per-VMA lock when waiting
> for folio and retrying once folio is available.
> With this obstacle removed, enable do_swap_page to operate under
> per-VMA lock protection. Drivers implementing ops->migrate_to_ram might
> still rely on mmap_lock, therefore we have to fall back to mmap_lock in
> that particular case.
> Note that the only time do_swap_page calls synchronous swap_readpage
> is when SWP_SYNCHRONOUS_IO is set, which is only set for
> QUEUE_FLAG_SYNCHRONOUS devices: brd, zram and nvdimms (both btt and
> pmem). Therefore we don't sleep in this path, and there's no need to
> drop the mmap or per-VMA lock.
>
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> Acked-by: Peter Xu <peterx@redhat.com>
> ---
>  include/linux/mm.h | 13 +++++++++++++
>  mm/filemap.c       | 17 ++++++++---------
>  mm/memory.c        | 16 ++++++++++------
>  3 files changed, 31 insertions(+), 15 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index fec149585985..bbaec479bf98 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -723,6 +723,14 @@ static inline void vma_mark_detached(struct vm_area_struct *vma, bool detached)
>  struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
>  					  unsigned long address);
>  
> +static inline void release_fault_lock(struct vm_fault *vmf)
> +{
> +	if (vmf->flags & FAULT_FLAG_VMA_LOCK)
> +		vma_end_read(vmf->vma);
> +	else
> +		mmap_read_unlock(vmf->vma->vm_mm);
> +}
> +
>  #else /* CONFIG_PER_VMA_LOCK */
>  
>  static inline void vma_init_lock(struct vm_area_struct *vma) {}
> @@ -736,6 +744,11 @@ static inline void vma_assert_write_locked(struct vm_area_struct *vma) {}
>  static inline void vma_mark_detached(struct vm_area_struct *vma,
>  				     bool detached) {}
>  
> +static inline void release_fault_lock(struct vm_fault *vmf)
> +{
> +	mmap_read_unlock(vmf->vma->vm_mm);
> +}
> +
>  #endif /* CONFIG_PER_VMA_LOCK */
>  
>  /*
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 52bcf12dcdbf..d4d8f474e0c5 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -1703,27 +1703,26 @@ static int __folio_lock_async(struct folio *folio, struct wait_page_queue *wait)
>   * Return values:
>   * 0 - folio is locked.
>   * VM_FAULT_RETRY - folio is not locked.
> - *     mmap_lock has been released (mmap_read_unlock(), unless flags had both
> - *     FAULT_FLAG_ALLOW_RETRY and FAULT_FLAG_RETRY_NOWAIT set, in
> - *     which case mmap_lock is still held.
> + *     mmap_lock or per-VMA lock has been released (mmap_read_unlock() or
> + *     vma_end_read()), unless flags had both FAULT_FLAG_ALLOW_RETRY and
> + *     FAULT_FLAG_RETRY_NOWAIT set, in which case the lock is still held.
>   *
>   * If neither ALLOW_RETRY nor KILLABLE are set, will always return 0
> - * with the folio locked and the mmap_lock unperturbed.
> + * with the folio locked and the mmap_lock/per-VMA lock is left unperturbed.
>   */
>  vm_fault_t __folio_lock_or_retry(struct folio *folio, struct vm_fault *vmf)
>  {
> -	struct mm_struct *mm = vmf->vma->vm_mm;
>  	unsigned int flags = vmf->flags;
>  
>  	if (fault_flag_allow_retry_first(flags)) {
>  		/*
> -		 * CAUTION! In this case, mmap_lock is not released
> -		 * even though return VM_FAULT_RETRY.
> +		 * CAUTION! In this case, mmap_lock/per-VMA lock is not
> +		 * released even though returning VM_FAULT_RETRY.
>  		 */
>  		if (flags & FAULT_FLAG_RETRY_NOWAIT)
>  			return VM_FAULT_RETRY;
>  
> -		mmap_read_unlock(mm);
> +		release_fault_lock(vmf);
>  		if (flags & FAULT_FLAG_KILLABLE)
>  			folio_wait_locked_killable(folio);
>  		else
> @@ -1735,7 +1734,7 @@ vm_fault_t __folio_lock_or_retry(struct folio *folio, struct vm_fault *vmf)
>  
>  		ret = __folio_lock_killable(folio);
>  		if (ret) {
> -			mmap_read_unlock(mm);
> +			release_fault_lock(vmf);
>  			return VM_FAULT_RETRY;
>  		}
>  	} else {
> diff --git a/mm/memory.c b/mm/memory.c
> index 345080052003..4fb8ecfc6d13 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3712,12 +3712,6 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
>  	if (!pte_unmap_same(vmf))
>  		goto out;
>  
> -	if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
> -		ret = VM_FAULT_RETRY;
> -		vma_end_read(vma);
> -		goto out;
> -	}
> -
>  	entry = pte_to_swp_entry(vmf->orig_pte);
>  	if (unlikely(non_swap_entry(entry))) {
>  		if (is_migration_entry(entry)) {
> @@ -3727,6 +3721,16 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
>  			vmf->page = pfn_swap_entry_to_page(entry);
>  			ret = remove_device_exclusive_entry(vmf);
>  		} else if (is_device_private_entry(entry)) {
> +			if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
> +				/*
> +				 * migrate_to_ram is not yet ready to operate
> +				 * under VMA lock.
> +				 */
> +				vma_end_read(vma);
> +				ret = VM_FAULT_RETRY;
> +				goto out;
> +			}
> +
>  			vmf->page = pfn_swap_entry_to_page(entry);
>  			vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
>  					vmf->address, &vmf->ptl);

next prev parent reply	other threads:[~2023-06-29  6:06 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-06-28 17:25 [PATCH v5 0/6] Per-VMA lock support for swap and userfaults Suren Baghdasaryan
2023-06-28 17:25 ` [PATCH v5 1/6] swap: remove remnants of polling from read_swap_cache_async Suren Baghdasaryan
2023-06-28 17:25 ` [PATCH v5 2/6] mm: add missing VM_FAULT_RESULT_TRACE name for VM_FAULT_COMPLETED Suren Baghdasaryan
2023-06-28 17:25 ` [PATCH v5 3/6] mm: drop per-VMA lock when returning VM_FAULT_RETRY or VM_FAULT_COMPLETED Suren Baghdasaryan
2023-06-28 17:25 ` [PATCH v5 4/6] mm: change folio_lock_or_retry to use vm_fault directly Suren Baghdasaryan
2023-06-28 17:25 ` [PATCH v5 5/6] mm: handle swap page faults under per-VMA lock Suren Baghdasaryan
2023-06-29  6:04   ` Alistair Popple [this message]
2023-06-30  1:30     ` Suren Baghdasaryan
2023-06-28 17:25 ` [PATCH v5 6/6] mm: handle userfaults under VMA lock Suren Baghdasaryan
2023-06-28 17:32   ` Peter Xu
2023-06-29  0:19     ` Suren Baghdasaryan
2023-06-29 16:32       ` Peter Xu
2023-06-29 16:39         ` Suren Baghdasaryan
2023-06-30  2:07           ` Suren Baghdasaryan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=877crm246q.fsf@nvdebian.thelocal \
    --to=apopple@nvidia.com \
    --cc=akpm@linux-foundation.org \
    --cc=brauner@kernel.org \
    --cc=dave@stgolabs.net \
    --cc=david@redhat.com \
    --cc=dhowells@redhat.com \
    --cc=hannes@cmpxchg.org \
    --cc=hdanton@sina.com \
    --cc=hughd@google.com \
    --cc=jack@suse.cz \
    --cc=jglisse@google.com \
    --cc=josef@toxicpanda.com \
    --cc=kernel-team@android.com \
    --cc=laurent.dufour@fr.ibm.com \
    --cc=ldufour@linux.ibm.com \
    --cc=liam.howlett@oracle.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lstoakes@gmail.com \
    --cc=mhocko@suse.com \
    --cc=michel@lespinasse.org \
    --cc=minchan@google.com \
    --cc=pasha.tatashin@soleen.com \
    --cc=peterx@redhat.com \
    --cc=punit.agrawal@bytedance.com \
    --cc=surenb@google.com \
    --cc=vbabka@suse.cz \
    --cc=viro@zeniv.linux.org.uk \
    --cc=willy@infradead.org \
    --cc=ying.huang@intel.com \
    --cc=yuzhao@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.