From: Jerome Glisse <jglisse@redhat.com>
To: Peter Xu <peterx@redhat.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	David Hildenbrand <david@redhat.com>,
	Hugh Dickins <hughd@google.com>, Maya Gokhale <gokhale2@llnl.gov>,
	Pavel Emelyanov <xemul@virtuozzo.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Martin Cracauer <cracauer@cons.org>, Shaohua Li <shli@fb.com>,
	Marty McFadden <mcfadden8@llnl.gov>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	Denis Plotnikov <dplotnikov@virtuozzo.com>,
	Mike Rapoport <rppt@linux.vnet.ibm.com>,
	Mel Gorman <mgorman@suse.de>,
	"Kirill A . Shutemov" <kirill@shutemov.name>,
	"Dr . David Alan Gilbert" <dgilbert@redhat.com>
Subject: Re: [PATCH v2.1 04/26] mm: allow VM_FAULT_RETRY for multiple times
Date: Thu, 21 Feb 2019 10:53:11 -0500
Message-ID: <20190221155311.GD2813@redhat.com>
In-Reply-To: <20190221085656.18529-1-peterx@redhat.com>

On Thu, Feb 21, 2019 at 04:56:56PM +0800, Peter Xu wrote:
> The idea comes from a discussion between Linus and Andrea [1].
> 
> Before this patch we only allow a page fault to retry once.  We
> achieved this by clearing the FAULT_FLAG_ALLOW_RETRY flag when doing
> handle_mm_fault() the second time.  This was mainly used to avoid
> unexpected starvation of the system by looping forever to handle
> the page fault on a single page.  However that should hardly happen,
> and after all, each code path that returns VM_FAULT_RETRY first
> waits for a condition (during which time we should possibly yield
> the cpu) before VM_FAULT_RETRY is really returned.
> 
> This patch removes the restriction by keeping the
> FAULT_FLAG_ALLOW_RETRY flag when we receive VM_FAULT_RETRY.  It means
> that the page fault handler can now retry the page fault multiple
> times if necessary without the need to generate another page fault
> event.  Meanwhile we still keep the FAULT_FLAG_TRIED flag so the page
> fault handler can still identify whether a page fault is the first
> attempt or not.
> 
> Then we'll have these combinations of fault flags (only considering
> ALLOW_RETRY flag and TRIED flag):
> 
>   - ALLOW_RETRY and !TRIED:  this means the page fault allows to
>                              retry, and this is the first try
> 
>   - ALLOW_RETRY and TRIED:   this means the page fault allows to
>                              retry, and this is not the first try
> 
>   - !ALLOW_RETRY and !TRIED: this means the page fault does not allow
>                              to retry at all
> 
>   - !ALLOW_RETRY and TRIED:  this is forbidden and should never be used
> 
> In existing code we have multiple places that take special care
> of the first condition above by checking against (fault_flags &
> FAULT_FLAG_ALLOW_RETRY).  This patch introduces a simple helper to
> detect the first attempt of a page fault by checking against
> both (fault_flags & FAULT_FLAG_ALLOW_RETRY) and !(fault_flags &
> FAULT_FLAG_TRIED), because now even the 2nd try will have
> ALLOW_RETRY set, and then uses that helper in all existing special
> paths.  One example is __lock_page_or_retry(): now we'll drop the
> mmap_sem only in the first attempt of the page fault and we'll keep
> it in follow-up retries, so the old locking behavior is retained.
> 
> This will be a nice enhancement for current code [2] and at the same
> time supporting material for the future userfaultfd-writeprotect work,
> since in that work there will always be an explicit userfault
> writeprotect retry for protected pages, and if that cannot resolve the
> page fault (e.g., when userfaultfd-writeprotect is used in conjunction
> with swapped pages) then we'll possibly need a 3rd retry of the page
> fault.  It might also benefit other potential users who have
> requirements similar to userfault write-protection.
> 
> GUP code is not touched yet and will be covered in a follow-up patch.
> 
> Please read the thread below for more information.
> 
> [1] https://lkml.org/lkml/2017/11/2/833
> [2] https://lkml.org/lkml/2018/12/30/64

I have a few comments on this one. See below.


> 
> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
> Suggested-by: Andrea Arcangeli <aarcange@redhat.com>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
> 
>  arch/alpha/mm/fault.c           |  2 +-
>  arch/arc/mm/fault.c             |  1 -
>  arch/arm/mm/fault.c             |  3 ---
>  arch/arm64/mm/fault.c           |  5 -----
>  arch/hexagon/mm/vm_fault.c      |  1 -
>  arch/ia64/mm/fault.c            |  1 -
>  arch/m68k/mm/fault.c            |  3 ---
>  arch/microblaze/mm/fault.c      |  1 -
>  arch/mips/mm/fault.c            |  1 -
>  arch/nds32/mm/fault.c           |  1 -
>  arch/nios2/mm/fault.c           |  3 ---
>  arch/openrisc/mm/fault.c        |  1 -
>  arch/parisc/mm/fault.c          |  2 --
>  arch/powerpc/mm/fault.c         |  6 ------
>  arch/riscv/mm/fault.c           |  5 -----
>  arch/s390/mm/fault.c            |  5 +----
>  arch/sh/mm/fault.c              |  1 -
>  arch/sparc/mm/fault_32.c        |  1 -
>  arch/sparc/mm/fault_64.c        |  1 -
>  arch/um/kernel/trap.c           |  1 -
>  arch/unicore32/mm/fault.c       |  6 +-----
>  arch/x86/mm/fault.c             |  2 --
>  arch/xtensa/mm/fault.c          |  1 -
>  drivers/gpu/drm/ttm/ttm_bo_vm.c | 12 +++++++++---
>  include/linux/mm.h              | 12 +++++++++++-
>  mm/filemap.c                    |  2 +-
>  mm/shmem.c                      |  2 +-
>  27 files changed, 25 insertions(+), 57 deletions(-)
> 

[...]

> diff --git a/arch/parisc/mm/fault.c b/arch/parisc/mm/fault.c
> index 29422eec329d..7d3e96a9a7ab 100644
> --- a/arch/parisc/mm/fault.c
> +++ b/arch/parisc/mm/fault.c
> @@ -327,8 +327,6 @@ void do_page_fault(struct pt_regs *regs, unsigned long code,
>  		else
>  			current->min_flt++;
>  		if (fault & VM_FAULT_RETRY) {
> -			flags &= ~FAULT_FLAG_ALLOW_RETRY;

Don't you also need to add:

     flags |= FAULT_FLAG_TRIED;

like the other architectures do?
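
To make the concern concrete, here is a rough user-space sketch (not
kernel code). The flag values and the helper logic are copied from the
include/linux/mm.h hunk quoted further down; retry_without_tried() and
retry_with_tried() are hypothetical names that only model the two
possible variants of an arch retry path:

```c
#include <stdbool.h>

/* Flag values copied from the quoted include/linux/mm.h hunk. */
#define FAULT_FLAG_ALLOW_RETRY	0x04
#define FAULT_FLAG_TRIED	0x20

/* Same logic as the helper introduced by the patch. */
static inline bool fault_flag_allow_retry_first(unsigned int flags)
{
	return (flags & FAULT_FLAG_ALLOW_RETRY) &&
	    (!(flags & FAULT_FLAG_TRIED));
}

/* Hypothetical model: retry path that leaves the flags untouched. */
static inline unsigned int retry_without_tried(unsigned int flags)
{
	return flags;
}

/* Hypothetical model: retry path that marks the fault as tried. */
static inline unsigned int retry_with_tried(unsigned int flags)
{
	return flags | FAULT_FLAG_TRIED;
}
```

Starting from flags = FAULT_FLAG_ALLOW_RETRY, the helper still reports
a first attempt after retry_without_tried(), but not after
retry_with_tried(). If the hunk above really leaves TRIED unset, every
retry on that arch would look like a first attempt.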


[...]

> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> index 248ff0a28ecd..d842c3e02a50 100644
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -1483,9 +1483,7 @@ void do_user_addr_fault(struct pt_regs *regs,
>  	if (unlikely(fault & VM_FAULT_RETRY)) {
>  		bool is_user = flags & FAULT_FLAG_USER;
>  
> -		/* Retry at most once */
>  		if (flags & FAULT_FLAG_ALLOW_RETRY) {
> -			flags &= ~FAULT_FLAG_ALLOW_RETRY;
>  			flags |= FAULT_FLAG_TRIED;
>  			if (is_user && signal_pending(tsk))
>  				return;

So here you have a change in behavior: it can retry indefinitely for as
long as there is no signal pending. Don't you want to test for
FAULT_FLAG_TRIED?
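
A toy user-space model of what I mean (a sketch under assumptions, not
the real fault path). model_handle_mm_fault() and count_attempts() are
made-up names, the VM_FAULT_RETRY value is only illustrative, and the
handler is a deliberate worst case that asks for a retry whenever it is
allowed to; the cap exists only so the model terminates:

```c
#include <stdbool.h>

/* FAULT_FLAG_* values are from the quoted mm.h hunk. */
#define FAULT_FLAG_ALLOW_RETRY	0x04
#define FAULT_FLAG_TRIED	0x20
/* Illustrative value only; any distinct bit would do for this model. */
#define VM_FAULT_RETRY		0x400

/* Hypothetical worst-case handler: retries whenever it is allowed to. */
static inline unsigned int model_handle_mm_fault(unsigned int flags)
{
	return (flags & FAULT_FLAG_ALLOW_RETRY) ? VM_FAULT_RETRY : 0;
}

/*
 * Count how many times the handler runs before the fault completes.
 * clear_allow_retry == true models the old code (retry at most once),
 * false models the patched code. 'cap' bounds the loop for the model.
 */
static inline int count_attempts(bool clear_allow_retry, int cap)
{
	unsigned int flags = FAULT_FLAG_ALLOW_RETRY;
	int attempts = 0;

	while (attempts < cap) {
		attempts++;
		if (!(model_handle_mm_fault(flags) & VM_FAULT_RETRY))
			break;
		if (clear_allow_retry)
			flags &= ~FAULT_FLAG_ALLOW_RETRY;
		flags |= FAULT_FLAG_TRIED;
	}
	return attempts;
}
```

With the old behavior the model stops after two attempts; with the
patched behavior it only stops when it hits the cap. Real handlers wait
for a condition before returning VM_FAULT_RETRY, so this is the
pessimistic case, but nothing in the arch loop itself bounds it.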

[...]

> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 80bb6408fe73..4e11c9639f1b 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -341,11 +341,21 @@ extern pgprot_t protection_map[16];
>  #define FAULT_FLAG_ALLOW_RETRY	0x04	/* Retry fault if blocking */
>  #define FAULT_FLAG_RETRY_NOWAIT	0x08	/* Don't drop mmap_sem and wait when retrying */
>  #define FAULT_FLAG_KILLABLE	0x10	/* The fault task is in SIGKILL killable region */
> -#define FAULT_FLAG_TRIED	0x20	/* Second try */
> +#define FAULT_FLAG_TRIED	0x20	/* We've tried once */
>  #define FAULT_FLAG_USER		0x40	/* The fault originated in userspace */
>  #define FAULT_FLAG_REMOTE	0x80	/* faulting for non current tsk/mm */
>  #define FAULT_FLAG_INSTRUCTION  0x100	/* The fault was during an instruction fetch */
>  
> +/*
> + * Returns true if the page fault allows retry and this is the first
> + * attempt of the fault handling; false otherwise.
> + */

You should also explain in the comment why it returns false when this
is not the first try, i.e. to avoid starvation.
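
Something along these lines, perhaps. This is only a user-space sketch
with draft wording for the comment: the enum and classify_fault_flags()
are hypothetical and exist just to spell out the four flag combinations
listed in the commit message, and the locking rationale in the comment
is my reading of the __lock_page_or_retry() example given there:

```c
#include <stdbool.h>

/* Flag values copied from the quoted include/linux/mm.h hunk. */
#define FAULT_FLAG_ALLOW_RETRY	0x04
#define FAULT_FLAG_TRIED	0x20

/* Hypothetical names for the four combinations in the commit message. */
enum fault_retry_state {
	FAULT_FIRST_TRY,	/* ALLOW_RETRY and !TRIED */
	FAULT_LATER_TRY,	/* ALLOW_RETRY and TRIED */
	FAULT_NO_RETRY,		/* !ALLOW_RETRY and !TRIED */
	FAULT_INVALID,		/* !ALLOW_RETRY and TRIED: forbidden */
};

static inline enum fault_retry_state classify_fault_flags(unsigned int flags)
{
	bool allow = flags & FAULT_FLAG_ALLOW_RETRY;
	bool tried = flags & FAULT_FLAG_TRIED;

	if (allow)
		return tried ? FAULT_LATER_TRY : FAULT_FIRST_TRY;
	return tried ? FAULT_INVALID : FAULT_NO_RETRY;
}

/*
 * Draft wording: returns true if the page fault allows retry and this
 * is the first attempt of the fault handling; false otherwise. It is
 * false on later attempts so that callers keep mmap_sem and wait for
 * the page instead of dropping the lock and retrying again, which
 * avoids starvation.
 */
static inline bool fault_flag_allow_retry_first(unsigned int flags)
{
	return classify_fault_flags(flags) == FAULT_FIRST_TRY;
}
```

The helper is true for exactly one of the four states, which is the
property the comment should spell out.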

> +static inline bool fault_flag_allow_retry_first(unsigned int flags)
> +{
> +	return (flags & FAULT_FLAG_ALLOW_RETRY) &&
> +	    (!(flags & FAULT_FLAG_TRIED));
> +}
> +
>  #define FAULT_FLAG_TRACE \
>  	{ FAULT_FLAG_WRITE,		"WRITE" }, \
>  	{ FAULT_FLAG_MKWRITE,		"MKWRITE" }, \

[...]

