Re: [PATCH 0 of 9] mmu notifier #v12

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Robin Holt <holt@sgi.com>
To: Andrea Arcangeli <andrea@qumranet.com>, akpm@linux-foundation.org
Cc: Robin Holt <holt@sgi.com>, Christoph Lameter <clameter@sgi.com>,
	Nick Piggin <npiggin@suse.de>,
	Steve Wise <swise@opengridcomputing.com>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	linux-mm@kvack.org, Kanoj Sarcar <kanojsarcar@yahoo.com>,
	Roland Dreier <rdreier@cisco.com>, Jack Steiner <steiner@sgi.com>,
	linux-kernel@vger.kernel.org, Avi Kivity <avi@qumranet.com>,
	kvm-devel@lists.sourceforge.net, general@lists.openfabrics.org,
	Hugh Dickins <hugh@veritas.com>
Subject: Re: [PATCH 0 of 9] mmu notifier #v12
Date: Tue, 22 Apr 2008 10:26:00 -0500	[thread overview]
Message-ID: <20080422152600.GP30298@sgi.com> (raw)
In-Reply-To: <20080422134847.GT12709@duo.random>

Andrew, Could we get direction/guidance from you as regards
the invalidate_page() callout of Andrea's patch set versus the
invalidate_range_start/invalidate_range_end callout pairs of Christoph's
patchset?  This is only in the context of the __xip_unmap, do_wp_page,
page_mkclean_one, and try_to_unmap_one call sites.

On Tue, Apr 22, 2008 at 03:48:47PM +0200, Andrea Arcangeli wrote:
> On Tue, Apr 22, 2008 at 08:36:04AM -0500, Robin Holt wrote:
> > I am a little confused about the value of the seq_lock versus a simple
> > atomic, but I assumed there is a reason and left it at that.
> 
> There's no value for anything but get_user_pages (get_user_pages takes
> its own lock internally though). I preferred to explain it as a
> seqlock because it was simpler for reading, but I totally agree in the
> final implementation it shouldn't be a seqlock. My code was meant to
> be pseudo-code only. It doesn't even need to be atomic ;).

Unless there is additional locking in your fault path, I think it does
need to be atomic.

> > I don't know what you mean by "it'd" run slower and what you mean by
> > "armed and disarmed".
> 
> 1) when armed the time-window where the kvm-page-fault would be
> blocked would be a bit larger without invalidate_page for no good
> reason

But that is a distinction without a difference.  In the _start/_end
case, kvm's fault handler will not have any _DIRECT_ blocking, but
get_user_pages() had certainly better block waiting for some other lock
to prevent the process's pages being refaulted.

I am no VM expert, but that seems like it is critical to having a
consistent virtual address space.  Effectively, you have a delay on the
kvm fault handler beginning when either invalidate_page() is entered
or invalidate_range_start() is entered until when the _CALLER_ of the
invalidate* method has unlocked.  That time will remain essentailly
identical for either case.  I would argue you would be hard pressed to
even measure the difference.

> 2) if you were to remove invalidate_page when disarmed the VM could
> would need two branches instead of one in various places

Those branches are conditional upon there being list entries.  That check
should be extremely cheap.  The vast majority of cases will have no
registered notifiers.  The second check for the _end callout will be
from cpu cache.

> I don't want to waste cycles if not wasting them improves performance
> both when armed and disarmed.

In summary, I think we have narrowed down the case of no registered
notifiers to being infinitesimal.  The case of registered notifiers
being a distinction without a difference.

> > When I was discussing this difference with Jack, he reminded me that
> > the GRU, due to its hardware, does not have any race issues with the
> > invalidate_page callout simply doing the tlb shootdown and not modifying
> > any of its internal structures.  He then put a caveat on the discussion
> > that _either_ method was acceptable as far as he was concerned.  The real
> > issue is getting a patch in that satisfies all needs and not whether
> > there is a seperate invalidate_page callout.
> 
> Sure, we have that patch now, I'll send it out in a minute, I was just
> trying to explain why it makes sense to have an invalidate_page too
> (which remains the only difference by now), removing it would be a
> regression on all sides, even if a minor one.

I think GRU is the only compelling case I have heard for having the
invalidate_page seperate.  In the case of the GRU, the hardware enforces a
lifetime of the invalidate which covers all in-progress faults including
ones where the hardware is informed after the flush of a PTE.  in all
cases, once the GRU invalidate instruction is issued, all active requests
are invalidated.  Future faults will be blocked in get_user_pages().
Without that special feature of the hardware, I don't think any code
simplification exists.  I, of course, reserve the right to be wrong.

I believe the argument against a seperate invalidate_page() callout was
Christoph's interpretation of Andrew's comments.  I am not certain Andrew
was aware of this special aspects of the GRU hardware and whether that
had been factored into the discussion at that point in time.

Thanks,
Robin

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

next prev parent reply	other threads:[~2008-04-22 15:26 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-04-08 15:44 [PATCH 0 of 9] mmu notifier #v12 Andrea Arcangeli
2008-04-08 15:44 ` [PATCH 1 of 9] Lock the entire mm to prevent any mmu related operation to happen Andrea Arcangeli
2008-04-16 16:33   ` Robin Holt
2008-04-16 18:35     ` Christoph Lameter
2008-04-16 19:02       ` Robin Holt
2008-04-16 19:15         ` Christoph Lameter
2008-04-17 11:14           ` Robin Holt
2008-04-17 15:51       ` Andrea Arcangeli
2008-04-17 16:36         ` Robin Holt
2008-04-17 17:14           ` Andrea Arcangeli
2008-04-17 17:25             ` Robin Holt
2008-04-17 19:10             ` Christoph Lameter
2008-04-17 22:16               ` Andrea Arcangeli
2008-04-22  5:06   ` Rusty Russell
2008-04-25 16:56     ` Andrea Arcangeli
2008-04-25 17:04       ` Andrea Arcangeli
2008-04-25 19:25       ` Robin Holt
2008-04-26  0:57         ` Andrea Arcangeli
2008-04-08 15:44 ` [PATCH 2 of 9] Core of mmu notifiers Andrea Arcangeli
2008-04-08 16:26   ` Robin Holt
2008-04-08 17:05     ` Andrea Arcangeli
2008-04-14 19:57   ` Christoph Lameter
2008-04-14 19:59   ` Christoph Lameter
2008-04-08 15:44 ` [PATCH 3 of 9] Moves all mmu notifier methods outside the PT lock (first and not last Andrea Arcangeli
2008-04-14 19:57   ` Christoph Lameter
2008-04-08 15:44 ` [PATCH 4 of 9] Move the tlb flushing into free_pgtables. The conversion of the locks Andrea Arcangeli
2008-04-08 15:44 ` [PATCH 5 of 9] The conversion to a rwsem allows callbacks during rmap traversal Andrea Arcangeli
2008-04-08 15:44 ` [PATCH 6 of 9] We no longer abort unmapping in unmap vmas because we can reschedule while Andrea Arcangeli
2008-04-08 15:44 ` [PATCH 7 of 9] Convert the anon_vma spinlock to a rw semaphore. This allows concurrent Andrea Arcangeli
2008-04-08 15:44 ` [PATCH 8 of 9] XPMEM would have used sys_madvise() except that madvise_dontneed() Andrea Arcangeli
2008-04-08 15:44 ` [PATCH 9 of 9] This patch adds a lock ordering rule to avoid a potential deadlock when Andrea Arcangeli
2008-04-08 21:46 ` [PATCH 0 of 9] mmu notifier #v12 Avi Kivity
2008-04-08 22:06   ` Andrea Arcangeli
2008-04-09 13:17 ` Robin Holt
2008-04-09 14:44   ` Andrea Arcangeli
2008-04-09 18:55     ` Robin Holt
2008-04-22  7:20       ` Andrea Arcangeli
2008-04-22 12:00         ` Andrea Arcangeli
2008-04-22 13:01           ` Robin Holt
2008-04-22 13:21             ` Andrea Arcangeli
2008-04-22 13:36               ` Robin Holt
2008-04-22 13:48                 ` Andrea Arcangeli
2008-04-22 15:26                   ` Robin Holt [this message]
2008-04-14 23:09 ` Christoph Lameter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20080422152600.GP30298@sgi.com \
    --to=holt@sgi.com \
    --cc=a.p.zijlstra@chello.nl \
    --cc=akpm@linux-foundation.org \
    --cc=andrea@qumranet.com \
    --cc=avi@qumranet.com \
    --cc=clameter@sgi.com \
    --cc=general@lists.openfabrics.org \
    --cc=hugh@veritas.com \
    --cc=kanojsarcar@yahoo.com \
    --cc=kvm-devel@lists.sourceforge.net \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=npiggin@suse.de \
    --cc=rdreier@cisco.com \
    --cc=steiner@sgi.com \
    --cc=swise@opengridcomputing.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).