Re: [PATCH 0 of 9] mmu notifier #v12

From: Robin Holt <holt@sgi.com>
To: Andrea Arcangeli <andrea@qumranet.com>
Cc: Robin Holt <holt@sgi.com>, Christoph Lameter <clameter@sgi.com>,
	akpm@linux-foundation.org, Nick Piggin <npiggin@suse.de>,
	Steve Wise <swise@opengridcomputing.com>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	linux-mm@kvack.org, Kanoj Sarcar <kanojsarcar@yahoo.com>,
	Roland Dreier <rdreier@cisco.com>, Jack Steiner <steiner@sgi.com>,
	linux-kernel@vger.kernel.org, Avi Kivity <avi@qumranet.com>,
	kvm-devel@lists.sourceforge.net, general@lists.openfabrics.org,
	Hugh Dickins <hugh@veritas.com>
Subject: Re: [PATCH 0 of 9] mmu notifier #v12
Date: Tue, 22 Apr 2008 08:36:04 -0500
Message-ID: <20080422133604.GN30298@sgi.com>
In-Reply-To: <20080422132143.GS12709@duo.random>

On Tue, Apr 22, 2008 at 03:21:43PM +0200, Andrea Arcangeli wrote:
> On Tue, Apr 22, 2008 at 08:01:20AM -0500, Robin Holt wrote:
> > On Tue, Apr 22, 2008 at 02:00:56PM +0200, Andrea Arcangeli wrote:
> > > On Tue, Apr 22, 2008 at 09:20:26AM +0200, Andrea Arcangeli wrote:
> > > >     invalidate_range_start {
> > > > 	spin_lock(&kvm->mmu_lock);
> > > > 
> > > > 	kvm->invalidate_range_count++;
> > > > 	rmap-invalidate of sptes in range
> > > > 
> > > 
> > > 	write_seqlock; write_sequnlock;
> > 
> > I don't think you need it here since invalidate_range_count is already
> > elevated which will accomplish the same effect.
> 
> Agreed, seqlock only in range_end should be enough. BTW, the fact

I am a little confused about the value of the seqlock versus a simple
atomic, but I assumed there is a reason and left it at that.
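
One possible reading of why a sequence is wanted in addition to the
count: an invalidate can start and finish entirely while the fault path
is busy looking up the page, so the count is zero both before and after
and reveals nothing.  Below is a minimal user-space sketch of that idea.
It is not taken from the patch; struct inv_state, fault_begin() and
fault_must_retry() are hypothetical names, and locking is left out (the
caller is assumed to hold something like kvm->mmu_lock around each call).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-VM state discussed in the thread. */
struct inv_state {
	int range_count;	/* nonzero while an invalidate is in flight */
	unsigned long seq;	/* bumped each time an invalidate completes */
};

/* Counterpart of invalidate_range_start: mark an invalidate in flight. */
static void range_start(struct inv_state *s)
{
	s->range_count++;
	/* secondary mappings in the range would be torn down here */
}

/* Counterpart of invalidate_range_end: publish completion, then drop the count. */
static void range_end(struct inv_state *s)
{
	assert(s->range_count > 0);
	s->seq++;
	s->range_count--;
}

/* Fault side, step 1: sample the sequence before the slow page lookup. */
static unsigned long fault_begin(const struct inv_state *s)
{
	return s->seq;
}

/*
 * Fault side, step 2: decide whether the looked-up page may be mapped.
 * Retry if an invalidate is still running, or if one completed since
 * the sequence was sampled.
 */
static bool fault_must_retry(const struct inv_state *s, unsigned long seq)
{
	return s->range_count != 0 || s->seq != seq;
}
```

With only the count, a fault that sampled before range_start() and
checked after range_end() would see zero both times and map a stale
page; the sequence comparison is what catches that case in this sketch.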

> seqlock is needed regardless of invalidate_page existing or not,
> really makes invalidate_page a no brainer not just from the core VM
> point of view, but from the driver point of view too. The
> kvm_page_fault logic would be the same even if I remove
> invalidate_page from the mmu notifier patch but it'd run slower both
> when armed and disarmed.

I don't know what you mean by "it'd" run slower and what you mean by
"armed and disarmed".

For the sake of this discussion, I will assume "it'd" means the kernel in
general and not KVM.  With the two call sites for range_start/range_end,
I would agree we have more call sites, but the second is extremely likely
to be cache hot.

By disarmed, I will assume you mean no notifiers registered for a
particular mm.  In that case, the cache will make the second call
effectively free.  So, for the disarmed case, I see no measurable
difference.
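
To make the disarmed case concrete, here is a rough user-space sketch,
assuming the callout wrappers begin with an "any notifiers registered?"
test.  The types and names (struct mm_sketch, notify_range_start(),
count_cb()) are made up for illustration and are not the interfaces
from the patch set.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical notifier: one callback per callout, chained in a list. */
struct notifier {
	void (*range_start)(unsigned long start, unsigned long end);
	void (*range_end)(unsigned long start, unsigned long end);
	struct notifier *next;
};

/* Hypothetical stand-in for the mm; NULL list means "disarmed". */
struct mm_sketch {
	struct notifier *notifiers;
};

static int cb_calls;	/* how many callbacks have run so far */

/* Example callback that only counts invocations. */
static void count_cb(unsigned long start, unsigned long end)
{
	assert(start <= end);
	cb_calls++;
}

/* Returns the number of callbacks invoked for the _start callout. */
static int notify_range_start(struct mm_sketch *mm,
			      unsigned long start, unsigned long end)
{
	int called = 0;

	if (mm->notifiers == NULL)	/* disarmed: one load and a branch */
		return 0;
	for (struct notifier *n = mm->notifiers; n; n = n->next) {
		if (n->range_start) {
			n->range_start(start, end);
			called++;
		}
	}
	return called;
}

/* Same shape for the _end callout; it tests the same list head again. */
static int notify_range_end(struct mm_sketch *mm,
			    unsigned long start, unsigned long end)
{
	int called = 0;

	if (mm->notifiers == NULL)
		return 0;
	for (struct notifier *n = mm->notifiers; n; n = n->next) {
		if (n->range_end) {
			n->range_end(start, end);
			called++;
		}
	}
	return called;
}
```

In this sketch the second wrapper re-reads a pointer the first one just
touched, which is the sense in which the second call is expected to be
cache hot when nothing is registered.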

For the case where there is a notifier registered, I certainly can see
a difference.  I am not certain how to quantify the difference as it
depends on the callee.  In the case of xpmem, our callout is always very
expensive for the _start case.  Our _end case is very light, but it is
essentially the exact same steps we would perform for the _page callout.

When I was discussing this difference with Jack, he reminded me that
the GRU, due to its hardware, does not have any race issues with the
invalidate_page callout simply doing the tlb shootdown and not modifying
any of its internal structures.  He then put a caveat on the discussion
that _either_ method was acceptable as far as he was concerned.  The real
issue is getting a patch in that satisfies all needs and not whether
there is a separate invalidate_page callout.

Thanks,
Robin
