From: Jerome Glisse <jglisse@redhat.com>
To: Laurent Dufour <ldufour@linux.ibm.com>
Cc: jack@suse.cz, sergey.senozhatsky.work@gmail.com,
peterz@infradead.org, Will Deacon <will.deacon@arm.com>,
mhocko@kernel.org, linux-mm@kvack.org, paulus@samba.org,
Punit Agrawal <punitagrawal@gmail.com>,
hpa@zytor.com, Michel Lespinasse <walken@google.com>,
Alexei Starovoitov <alexei.starovoitov@gmail.com>,
Andrea Arcangeli <aarcange@redhat.com>,
ak@linux.intel.com, Minchan Kim <minchan@kernel.org>,
aneesh.kumar@linux.ibm.com, x86@kernel.org,
Matthew Wilcox <willy@infradead.org>,
Daniel Jordan <daniel.m.jordan@oracle.com>,
Ingo Molnar <mingo@redhat.com>,
David Rientjes <rientjes@google.com>,
paulmck@linux.vnet.ibm.com, Haiyan Song <haiyanx.song@intel.com>,
npiggin@gmail.com, sj38.park@gmail.com, dave@stgolabs.net,
kemi.wang@intel.com, kirill@shutemov.name,
Thomas Gleixner <tglx@linutronix.de>,
zhong jiang <zhongjiang@huawei.com>,
Ganesh Mahendran <opensource.ganesh@gmail.com>,
Yang Shi <yang.shi@linux.alibaba.com>,
Mike Rapoport <rppt@linux.ibm.com>,
linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org,
Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
vinayak menon <vinayakm.list@gmail.com>,
akpm@linux-foundation.org, Tim Chen <tim.c.chen@linux.intel.com>,
haren@linux.vnet.ibm.com
Subject: Re: [PATCH v12 09/31] mm: VMA sequence count
Date: Mon, 22 Apr 2019 11:51:42 -0400
Message-ID: <20190422155142.GD3450@redhat.com>
In-Reply-To: <d217e71c-7d55-ce1a-6461-ce1de732fb57@linux.ibm.com>

On Fri, Apr 19, 2019 at 05:45:57PM +0200, Laurent Dufour wrote:
> Hi Jerome,
>
> Thanks a lot for reviewing this series.
>
> On 19/04/2019 at 00:48, Jerome Glisse wrote:
> > On Tue, Apr 16, 2019 at 03:45:00PM +0200, Laurent Dufour wrote:
> > > From: Peter Zijlstra <peterz@infradead.org>
> > >
> > > Wrap the VMA modifications (vma_adjust/unmap_page_range) with sequence
> > > counts such that we can easily test if a VMA is changed.
> > >
> > > The calls to vm_write_begin/end() in unmap_page_range() are
> > > used to detect when a VMA is being unmapped and thus that a new page fault
> > > should not be satisfied for this VMA. If the seqcount hasn't changed when
> > > the page tables are locked, this means we are safe to satisfy the page
> > > fault.
> > >
> > > The flip side is that we cannot distinguish between a vma_adjust() and
> > > the unmap_page_range() -- where with the former we could have
> > > re-checked the vma bounds against the address.
> > >
> > > The VMA's sequence counter is also used to detect changes to various VMA
> > > fields used during the page fault handling, such as:
> > > - vm_start, vm_end
> > > - vm_pgoff
> > > - vm_flags, vm_page_prot
> > > - vm_policy
> >
> > ^ All above are under mmap write lock ?
>
> Yes, changes are still made under the protection of the mmap_sem.
>
> >
> > > - anon_vma
> >
> > ^ This is either under mmap write lock or under page table lock
> >
> > So my question is do we need the complexity of seqcount_t for this ?
>
> The sequence counter is used to detect write operations done while a reader
> (the SPF handler) is running.
>
> The implementation is quite simple (here without the lockdep checks):
>
> static inline void raw_write_seqcount_begin(seqcount_t *s)
> {
> 	s->sequence++;
> 	smp_wmb();
> }
>
> I can't see why this is too complex here, would you elaborate on this ?
>
> >
> > It seems that using regular int as counter and also relying on vm_flags
> > when vma is unmap should do the trick.
>
> vm_flags is not enough I guess, as some operations do not affect
> vm_flags at all (resizing for instance).
> Am I missing something ?
>
> >
> > vma_delete(struct vm_area_struct *vma)
> > {
> > 	...
> > 	/*
> > 	 * Make sure the vma is marked as invalid, i.e. neither read nor
> > 	 * write, so that a speculative fault backs off. A racing speculative
> > 	 * fault will either see the flags as 0 or the new seqcount.
> > 	 */
> > 	vma->vm_flags = 0;
> > 	smp_wmb();
> > 	vma->seqcount++;
> > 	...
> > }
>
> Well I don't think we can safely clear the vm_flags this way when the VMA is
> unmapped, I think it is used later when cleaning is done.
>
> Later in this series, the VMA deletion is managed when the VMA is unlinked
> from the RB Tree. That is checked using the vm_rb field's value, and managed
> using RCU.
>
> > Then:
> > speculative_fault_begin(struct vm_area_struct *vma,
> > 			struct spec_vmf *spvmf)
> > {
> > 	...
> > 	spvmf->seqcount = vma->seqcount;
> > 	smp_rmb();
> > 	spvmf->vm_flags = vma->vm_flags;
> > 	if (!spvmf->vm_flags) {
> > 		// Back off the vma is dying ...
> > 		...
> > 	}
> > }
> >
> > bool speculative_fault_commit(struct vm_area_struct *vma,
> > 			      struct spec_vmf *spvmf)
> > {
> > 	...
> > 	seqcount = vma->seqcount;
> > 	smp_rmb();
> > 	vm_flags = vma->vm_flags;
> >
> > 	if (spvmf->vm_flags != vm_flags || seqcount != spvmf->seqcount) {
> > 		// Something did change for the vma
> > 		return false;
> > 	}
> > 	return true;
> > }
> >
> > This would also avoid the lockdep issue described below. But maybe what
> > i propose is stupid and i will see it after reviewing things further.
>
> That's true that the lockdep is quite annoying here. But it is still
> interesting to keep in the loop to avoid 2 subsequent write_seqcount_begin()
> call being made in the same context (which would lead to an even sequence
> counter value while write operation is in progress). So I think this is
> still a good thing to have lockdep available here.

Ok so I had to read everything and I should have read everything before
asking all of the above. It does look good in fact; what worried me in
this patch is all the lockdep avoidance, as it is usually a red flag.
But after thinking long and hard I do not see how to easily solve that
one, as unmap_page_range() is in so many different paths... So what is done
in this patch is the most sane thing. Sorry for the noise.

So for this patch:
Reviewed-by: Jérôme Glisse <jglisse@redhat.com>
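
For readers following the thread, the begin/end protocol discussed above can be sketched in user-space C. This is only an illustration under assumptions, not the code from the patch: `struct seq_sketch`, `struct vma_sketch` and every `sketch_*` function are hypothetical names, C11 fences stand in for smp_wmb()/smp_rmb(), and the walk-through is single-threaded. It shows the two points raised in the exchange: a resize changes no flag yet is still detected through the counter, and a nested begin (the case lockdep helps to catch) would leave the counter even while a write is in progress, which the assert() here stands in for.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for the kernel's seqcount_t. */
struct seq_sketch {
	atomic_uint sequence;	/* odd while a write section is open */
};

/* Hypothetical stand-in for the few VMA fields a fault handler samples. */
struct vma_sketch {
	struct seq_sketch seq;
	unsigned long vm_start;
	unsigned long vm_end;
};

/* Writer side: bump to odd, then order the bump before the data stores. */
static void sketch_write_begin(struct seq_sketch *s)
{
	unsigned int old;

	old = atomic_fetch_add_explicit(&s->sequence, 1, memory_order_relaxed);
	/* A nested begin would leave the count even in the middle of a write. */
	assert((old & 1) == 0);
	atomic_thread_fence(memory_order_release);	/* stands in for smp_wmb() */
}

/* Writer side: order the data stores before the bump back to even. */
static void sketch_write_end(struct seq_sketch *s)
{
	atomic_thread_fence(memory_order_release);	/* stands in for smp_wmb() */
	atomic_fetch_add_explicit(&s->sequence, 1, memory_order_relaxed);
}

/* Reader side: take a snapshot of the counter before sampling the fields. */
static unsigned int sketch_read_begin(struct seq_sketch *s)
{
	return atomic_load_explicit(&s->sequence, memory_order_acquire);
}

/* Reader side: true if a writer was active at snapshot time or ran since. */
static bool sketch_changed(struct seq_sketch *s, unsigned int snap)
{
	atomic_thread_fence(memory_order_acquire);	/* stands in for smp_rmb() */
	return (snap & 1) ||
	       atomic_load_explicit(&s->sequence, memory_order_relaxed) != snap;
}

/*
 * Single-threaded walk-through; returns how many of the three checks
 * would make a speculative fault back off.
 */
int sketch_demo(void)
{
	struct vma_sketch vma = { .vm_start = 0x1000, .vm_end = 0x2000 };
	int backoffs = 0;
	unsigned int snap;

	/* 1. No writer in between: the snapshot stays valid. */
	snap = sketch_read_begin(&vma.seq);
	backoffs += sketch_changed(&vma.seq, snap);

	/* 2. A resize touches no flag but still bumps the counter. */
	snap = sketch_read_begin(&vma.seq);
	sketch_write_begin(&vma.seq);
	vma.vm_end = 0x3000;
	sketch_write_end(&vma.seq);
	backoffs += sketch_changed(&vma.seq, snap);

	/* 3. A snapshot taken inside a write section is odd, so it is rejected. */
	sketch_write_begin(&vma.seq);
	snap = sketch_read_begin(&vma.seq);
	backoffs += sketch_changed(&vma.seq, snap);
	sketch_write_end(&vma.seq);

	return backoffs;
}
```

Calling sketch_demo() returns 2: the first check passes, while the resize and the mid-write snapshot both force a back-off. A real reader would then fall back to the regular fault path under mmap_sem rather than spin.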