From: Peter Xu <peterx@redhat.com>
To: Alistair Popple <apopple@nvidia.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Jason Gunthorpe <jgg@ziepe.ca>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	David Hildenbrand <david@redhat.com>,
	Matthew Wilcox <willy@infradead.org>,
	"Kirill A . Shutemov" <kirill@shutemov.name>,
	Hugh Dickins <hughd@google.com>,
	Tiberiu Georgescu <tiberiu.georgescu@nutanix.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Axel Rasmussen <axelrasmussen@google.com>,
	Nadav Amit <nadav.amit@gmail.com>,
	Mike Rapoport <rppt@linux.vnet.ibm.com>,
	Jerome Glisse <jglisse@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Miaohe Lin <linmiaohe@huawei.com>
Subject: Re: [PATCH v5 05/26] mm/swap: Introduce the idea of special swap ptes
Date: Thu, 22 Jul 2021 11:21:05 -0400
Message-ID: <YPmM4ThMIde9FTbs@t490s>
In-Reply-To: <5071185.SEdLSG93TQ@nvdebian>

On Thu, Jul 22, 2021 at 11:08:53AM +1000, Alistair Popple wrote:
> On Thursday, 22 July 2021 7:35:32 AM AEST Peter Xu wrote:
> > On Wed, Jul 21, 2021 at 09:28:49PM +1000, Alistair Popple wrote:
> > > On Saturday, 17 July 2021 5:11:33 AM AEST Peter Xu wrote:
> > > > On Fri, Jul 16, 2021 at 03:50:52PM +1000, Alistair Popple wrote:
> > > > > Hi Peter,
> > > > > 
> > > > > [...]
> > > > > 
> > > > > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > > > > > index ae1f5d0cb581..4b46c099ad94 100644
> > > > > > --- a/mm/memcontrol.c
> > > > > > +++ b/mm/memcontrol.c
> > > > > > @@ -5738,7 +5738,7 @@ static enum mc_target_type get_mctgt_type(struct vm_area_struct *vma,
> > > > > >  
> > > > > >  	if (pte_present(ptent))
> > > > > >  		page = mc_handle_present_pte(vma, addr, ptent);
> > > > > > -	else if (is_swap_pte(ptent))
> > > > > > +	else if (pte_has_swap_entry(ptent))
> > > > > >  		page = mc_handle_swap_pte(vma, ptent, &ent);
> > > > > >  	else if (pte_none(ptent))
> > > > > >  		page = mc_handle_file_pte(vma, addr, ptent, &ent);
> > > > > 
> > > > > As I understand it, pte_none() == False for a special swap pte, but
> > > > > shouldn't this be treated as pte_none() here? I.e., does this need to be
> > > > > pte_none(ptent) || is_swap_special_pte(ptent) here?
> > > > 
> > > > Looks correct; here the page/swap cache could hide behind the special pte just
> > > > like a none pte.  Will fix it.  Thanks!
> > > > 
> > > > > 
> > > > > > diff --git a/mm/memory.c b/mm/memory.c
> > > > > > index 0e0de08a2cd5..998a4f9a3744 100644
> > > > > > --- a/mm/memory.c
> > > > > > +++ b/mm/memory.c
> > > > > > @@ -3491,6 +3491,13 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
> > > > > >  	if (!pte_unmap_same(vmf))
> > > > > >  		goto out;
> > > > > >  
> > > > > > +	/*
> > > > > > +	 * We should never call do_swap_page upon a swap special pte; just be
> > > > > > +	 * safe to bail out if it happens.
> > > > > > +	 */
> > > > > > +	if (WARN_ON_ONCE(is_swap_special_pte(vmf->orig_pte)))
> > > > > > +		goto out;
> > > > > > +
> > > > > >  	entry = pte_to_swp_entry(vmf->orig_pte);
> > > > > >  	if (unlikely(non_swap_entry(entry))) {
> > > > > >  		if (is_migration_entry(entry)) {
> > > > > 
> > > > > Are there other changes required here? Because we can end up with stale special
> > > > > ptes, and a special pte is !pte_none, don't we need to fix some of the !pte_none
> > > > > checks in these functions:
> > > > > 
> > > > > insert_pfn() -> checks for !pte_none
> > > > > remap_pte_range() -> BUG_ON(!pte_none)
> > > > > apply_to_pte_range() -> didn't check further but it tests for !pte_none
> > > > > 
> > > > > In general it feels like I might be missing something here though. There are
> > > > > plenty of checks in the kernel for pte_none() which haven't been updated. Is
> > > > > there some rule that says none of those paths can see a special pte?
> > > > 
> > > > My rule here was to only care about vmas that can be backed by RAM,
> > > > mainly shmem/hugetlb, and the special pte can only exist within those
> > > > vmas.  I believe most pte_none() users will never see this special pte.
> > > > 
> > > > So if it's not related to RAM-backed memory at all, maybe it's fine to keep the
> > > > pte_none() usage as before.
> > > > 
> > > > Take insert_pfn(), the first one you listed: I think it can be used to map
> > > > MMIO regions, but I don't think we'll call it on a RAM region (either shmem
> > > > or hugetlb), nor can such a mapping be uffd wr-protected.  So I'm not sure
> > > > adding a special pte check there would help.
> > > > 
> > > > apply_to_pte_range() seems to be a bit special - I think whether the special
> > > > pte matters depends more on the pte_fn_t passed in.  From a quick look, it
> > > > still seems to be used mostly by driver code rather than mm core.  It's used
> > > > in two forms:
> > > > 
> > > >         apply_to_page_range
> > > >         apply_to_existing_page_range
> > > > 
> > > > The first one only creates ptes, so the pte_none() check is ignored there and I skipped it.
> > > > 
> > > > The second one has two call sites:
> > > > 
> > > > *** arch/powerpc/mm/pageattr.c:
> > > > change_memory_attr[99]         return apply_to_existing_page_range(&init_mm, start, size,
> > > > set_memory_attr[132]           return apply_to_existing_page_range(&init_mm, start, sz, set_page_attr,
> > > > 
> > > > *** mm/kasan/shadow.c:
> > > > kasan_release_vmalloc[485]     apply_to_existing_page_range(&init_mm,
> > > > 
> > > > I'll leave the ppc callers for now, as uffd-wp is not even supported there.
> > > > kasan_release_vmalloc() should only operate on kernel-allocated memory, so it
> > > > should not be a target for the special pte either.
> > > > 
> > > > So indeed it's hard to cover 100% of pte_none() users and make sure each one
> > > > is used right.  As stated above, I still believe most callers don't need the
> > > > check; in the worst case, if someone triggers a uffd-wp issue with a specific
> > > > feature, we can look into it then.  I am not sure it's good to add this check
> > > > for all pte_none() users, because mostly they'll be useless checks, imho.
> > > 
> > > I wonder then - should we make pte_none() return true for these special ptes
> > > as well? It seems that if we do miss any callers, it could result in some fairly
> > > hard-to-find bugs if the code follows a different path due to the presence of an
> > > unexpected special pte changing the result of pte_none().
> > 
> > I thought about something similar before, but I didn't dare to change
> > pte_none() as it's been there for ages and I'm afraid people will get confused
> > when its meaning changes.  So even if we want a helper identifying
> > "either none pte or the swap special pte", it should use a different name.
> > 
> > Changing the meaning of pte_none() could also carry other risks, e.g. in places
> > where we really want an empty pte but would now do something else.  It turns
> > out there's no easy way around identifying the cases one by one, at least to me.
> > I'm always open to good suggestions.
> 
> I'm not convinced it's changing the behaviour of pte_none() though, and my
> concern is that introducing special swap ptes does change it. Prior to this,
> clearing a pte would result in pte_none()==True. After this series, clearing a
> pte can sometimes result in pte_none()==False because it doesn't really
> get cleared.

The thing is, the uffd special pte is not literally "none"; there's something
inside it.  That's what makes it feel wrong to me.  I'm not against trapping
all of pte_none(), but as I mentioned, I think it at least needs to be renamed
to something else (maybe pte_none_mostly(), but I'm not sure).
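
To make the naming point concrete, here is a minimal userspace sketch, assuming a made-up pte encoding: the model_* names and bit positions below are hypothetical and are not the kernel's pte_t, pte_none() or any existing helper.  The literal emptiness check keeps its meaning, and a separately named "none mostly" check also accepts the special pte.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace model of a pte value; NOT the kernel's pte_t. */
typedef uint64_t model_pte_t;

/* Hypothetical bit layout, chosen only for illustration. */
#define MODEL_PTE_PRESENT	((model_pte_t)1 << 0)
#define MODEL_SWP_UFFD_WP	((model_pte_t)1 << 2)

/* Hypothetical special pte: not present, empty swap entry, only the wp bit set. */
#define MODEL_UFFD_WP_SPECIAL_PTE	MODEL_SWP_UFFD_WP

/* Literal emptiness: every bit clear, i.e. what pte_none() means today. */
static bool model_pte_none(model_pte_t pte)
{
	return pte == 0;
}

static bool model_is_swap_special_pte(model_pte_t pte)
{
	return pte == MODEL_UFFD_WP_SPECIAL_PTE;
}

/*
 * Separately named helper: "empty, or carrying only the special marker".
 * model_pte_none() itself keeps its literal meaning.
 */
static bool model_pte_none_mostly(model_pte_t pte)
{
	return model_pte_none(pte) || model_is_swap_special_pte(pte);
}
```

With this split, a caller that should treat the special pte as empty has to opt in by name, while existing callers of the literal check see no change in behaviour.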

> 
> Now as you say it's hard to cover 100% of pte_none() uses, so it's possible we
> have missed cases that may now encounter a special pte and take a different
> path (get_mctgt_type() is one example, I stopped looking for other possible
> ones after mm/memory.c).
> 
> So perhaps if we want to keep pte_none() to check for really clear pte's then
> what is required is converting all callers to a new helper
> (pte_none_not_special()?) that treats special swap ptes as pte_none() and warns
> if a special pte is encountered?

By double-checking all the core mm calls to pte_none()?

The special swap pte shouldn't exist in most cases; so far it only exists for
shmem and hugetlbfs.  So IMHO we can sensibly rule out a lot of pte_none() users
depending on the type of memory.
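
A rough sketch of narrowing by memory type, folding in the warn-on-unexpected idea quoted above.  Everything here is hypothetical: the vma classification, the pte encoding and the helper names; a plain counter stands in for WARN_ON_ONCE().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t model_pte_t;

/* Hypothetical encoding of the uffd-wp special pte, for illustration only. */
#define MODEL_UFFD_WP_SPECIAL_PTE	((model_pte_t)1 << 2)

/* Hypothetical classification of what backs a vma. */
enum model_vma_kind {
	MODEL_VMA_ANON,
	MODEL_VMA_SHMEM,
	MODEL_VMA_HUGETLB,
	MODEL_VMA_PFNMAP,	/* e.g. an MMIO mapping set up via insert_pfn() */
};

/* Counts "should never happen" hits; stands in for WARN_ON_ONCE(). */
static unsigned int model_unexpected_special;

/* Only shmem/hugetlb vmas are expected to carry the special pte. */
static bool model_vma_may_have_special_pte(enum model_vma_kind kind)
{
	return kind == MODEL_VMA_SHMEM || kind == MODEL_VMA_HUGETLB;
}

/*
 * RAM-backed shmem/hugetlb vmas treat the special pte like an empty one.
 * Everywhere else the literal emptiness check is kept, and a special pte
 * showing up is flagged instead of silently taking a different path.
 */
static bool model_pte_none_for_vma(enum model_vma_kind kind, model_pte_t pte)
{
	bool special = (pte == MODEL_UFFD_WP_SPECIAL_PTE);

	if (model_vma_may_have_special_pte(kind))
		return pte == 0 || special;
	if (special)
		model_unexpected_special++;
	return pte == 0;
}
```

The trade-off is that only callers which can see shmem/hugetlb vmas need to care; the warning path catches any case the audit missed.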

> 
> > Btw, as you mentioned before, we can use a new number out of MAX_SWAPFILES,
> > which would make all of this a bit easier, and then we wouldn't need to worry
> > about pte_none() issues either.  Two days ago Hugh raised a similar concern
> > about whether it's good to implement this uffd-wp special pte like this.  I
> > think we can discuss this separately.
> 
> Yes, I saw that and personally I still prefer that approach.

Yes, I see your preference.  Let's hold off a bit on the pte_none() discussions;
I'll re-raise this in the cover letter soon.  If everyone is okay with using yet
another swap type carved out of MAX_SWAPFILES and prefers it, then I can switch
the design.  Then I think I can also avoid touching the pte_none() bits at all,
which seems to be controversial here.
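
For comparison, the MAX_SWAPFILES alternative in a tiny model: one more swap type is reserved above the real swapfiles, in the same spirit as the existing non-swap entries (migration, hwpoison), so the marker is just another non-swap entry and pte_none() never sees it.  All MODEL_* constants, including the uffd-wp reservation, are hypothetical; the real limits depend on the kernel configuration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: the swap type is a 5-bit field. */
#define MODEL_SWAPFILES_SHIFT	5u

/* Hypothetical counts of reserved (non-swap) types. */
#define MODEL_MIGRATION_NUM	2u
#define MODEL_HWPOISON_NUM	1u
#define MODEL_UFFD_WP_NUM	1u	/* the new, hypothetical reservation */

/* Real swapfiles get the types below this limit. */
#define MODEL_MAX_SWAPFILES \
	((1u << MODEL_SWAPFILES_SHIFT) - MODEL_MIGRATION_NUM - \
	 MODEL_HWPOISON_NUM - MODEL_UFFD_WP_NUM)

/* Reserved types are stacked directly above the real swapfiles. */
#define MODEL_SWP_UFFD_WP	MODEL_MAX_SWAPFILES
#define MODEL_SWP_HWPOISON	(MODEL_SWP_UFFD_WP + MODEL_UFFD_WP_NUM)
#define MODEL_SWP_MIGRATION	(MODEL_SWP_HWPOISON + MODEL_HWPOISON_NUM)

/* Same idea as non_swap_entry(): types at or above the limit are special. */
static bool model_non_swap_type(unsigned int type)
{
	return type >= MODEL_MAX_SWAPFILES;
}

static bool model_is_uffd_wp_type(unsigned int type)
{
	return type == MODEL_SWP_UFFD_WP;
}
```

The appeal of this layout is that code already branching on non_swap_entry()-style checks would route the marker without any pte_none() changes.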

But still, I am also not convinced that we can blindly replace pte_none() with
"either none pte or some special pte", either in this series or in the future
(if this series switches to swp_entry) when we want to use !pte_present and
!swp_entry ptes.  If we want to replace it, we would still need to check all the
users of pte_none(), which is the same work we should do now, plus a proper
rename.

Thanks,

-- 
Peter Xu



Thread overview: 55+ messages
2021-07-15 20:13 [PATCH v5 00/26] userfaultfd-wp: Support shmem and hugetlbfs Peter Xu
2021-07-15 20:13 ` [PATCH v5 01/26] mm/shmem: Unconditionally set pte dirty in mfill_atomic_install_pte Peter Xu
2021-07-15 20:13 ` [PATCH v5 02/26] shmem/userfaultfd: Take care of UFFDIO_COPY_MODE_WP Peter Xu
2021-07-15 20:13 ` [PATCH v5 03/26] mm: Clear vmf->pte after pte_unmap_same() returns Peter Xu
2021-07-15 20:14 ` [PATCH v5 04/26] mm/userfaultfd: Introduce special pte for unmapped file-backed mem Peter Xu
2021-07-15 20:14 ` [PATCH v5 05/26] mm/swap: Introduce the idea of special swap ptes Peter Xu
2021-07-16  5:50   ` Alistair Popple
2021-07-16 19:11     ` Peter Xu
2021-07-21 11:28       ` Alistair Popple
2021-07-21 21:35         ` Peter Xu
2021-07-22  1:08           ` Alistair Popple
2021-07-22 15:21             ` Peter Xu [this message]
2021-07-15 20:14 ` [PATCH v5 06/26] shmem/userfaultfd: Handle uffd-wp special pte in page fault handler Peter Xu
2021-07-15 20:14 ` [PATCH v5 07/26] mm: Drop first_index/last_index in zap_details Peter Xu
2021-07-15 20:14 ` [PATCH v5 08/26] mm: Introduce zap_details.zap_flags Peter Xu
2021-07-15 20:14 ` [PATCH v5 09/26] mm: Introduce ZAP_FLAG_SKIP_SWAP Peter Xu
2021-07-15 20:14 ` [PATCH v5 10/26] shmem/userfaultfd: Persist uffd-wp bit across zapping for file-backed Peter Xu
2021-07-15 20:15 ` [PATCH v5 11/26] shmem/userfaultfd: Allow wr-protect none pte for file-backed mem Peter Xu
2021-07-15 20:16 ` [PATCH v5 12/26] shmem/userfaultfd: Allows file-back mem to be uffd wr-protected on thps Peter Xu
2021-07-15 20:16 ` [PATCH v5 13/26] shmem/userfaultfd: Handle the left-overed special swap ptes Peter Xu
2021-07-15 20:16 ` [PATCH v5 14/26] shmem/userfaultfd: Pass over uffd-wp special swap pte when fork() Peter Xu
2021-07-15 20:16 ` [PATCH v5 15/26] mm/hugetlb: Drop __unmap_hugepage_range definition from hugetlb.h Peter Xu
2021-07-15 20:16 ` [PATCH v5 16/26] mm/hugetlb: Introduce huge pte version of uffd-wp helpers Peter Xu
2021-07-15 20:16 ` [PATCH v5 17/26] hugetlb/userfaultfd: Hook page faults for uffd write protection Peter Xu
2021-07-20 15:37   ` kernel test robot
2021-07-20 15:37     ` kernel test robot
2021-07-21 21:50     ` Peter Xu
2021-07-21 21:50       ` Peter Xu
2021-07-15 20:16 ` [PATCH v5 18/26] hugetlb/userfaultfd: Take care of UFFDIO_COPY_MODE_WP Peter Xu
2021-07-20 23:59   ` kernel test robot
2021-07-20 23:59     ` kernel test robot
2021-07-15 20:16 ` [PATCH v5 19/26] hugetlb/userfaultfd: Handle UFFDIO_WRITEPROTECT Peter Xu
2021-07-21  8:24   ` kernel test robot
2021-07-21  8:24     ` kernel test robot
2021-07-15 20:16 ` [PATCH v5 20/26] mm/hugetlb: Introduce huge version of special swap pte helpers Peter Xu
2021-07-15 20:16 ` [PATCH v5 21/26] hugetlb/userfaultfd: Handle uffd-wp special pte in hugetlb pf handler Peter Xu
2021-07-15 20:16 ` [PATCH v5 22/26] hugetlb/userfaultfd: Allow wr-protect none ptes Peter Xu
2021-07-15 20:16 ` [PATCH v5 23/26] hugetlb/userfaultfd: Only drop uffd-wp special pte if required Peter Xu
2021-07-15 20:16 ` [PATCH v5 24/26] mm/pagemap: Recognize uffd-wp bit for shmem/hugetlbfs Peter Xu
2021-07-19  9:53   ` Tiberiu Georgescu
2021-07-19 16:03     ` Peter Xu
2021-07-19 17:23       ` Tiberiu Georgescu
2021-07-19 17:56         ` Peter Xu
2021-07-21 14:38           ` Ivan Teterevkov
2021-07-21 16:19             ` David Hildenbrand
2021-07-21 19:54               ` Ivan Teterevkov
2021-07-21 22:28                 ` Peter Xu
2021-07-21 22:57                   ` Peter Xu
2021-07-22  6:27                     ` David Hildenbrand
2021-07-22 16:08                       ` Peter Xu
2021-07-15 20:16 ` [PATCH v5 25/26] mm/userfaultfd: Enable write protection for shmem & hugetlbfs Peter Xu
2021-07-15 20:16 ` [PATCH v5 26/26] userfaultfd/selftests: Enable uffd-wp for shmem/hugetlbfs Peter Xu
2021-07-19 19:21 ` [PATCH v5 00/26] userfaultfd-wp: Support shmem and hugetlbfs David Hildenbrand
2021-07-19 20:12   ` Peter Xu
2021-07-22 18:30 ` Peter Xu
