linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Andrea Arcangeli <aarcange@redhat.com>
To: Robin Holt <holt@sgi.com>
Cc: Jack Steiner <steiner@sgi.com>,
	cl@linux-foundation.org, mingo@elte.hu, tglx@linutronix.de,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] - Fix unmap_vma() bug related to mmu_notifiers
Date: Thu, 28 Jan 2010 16:20:48 +0100	[thread overview]
Message-ID: <20100128152048.GA1217@random.random> (raw)
In-Reply-To: <20100128132503.GJ6616@sgi.com>

On Thu, Jan 28, 2010 at 07:25:03AM -0600, Robin Holt wrote:
> The GRU is using a hardware TLB of 2MB page size when the
> is_vm_hugetlb_page() indicates it is a 2MB vma.  From my reading of it,
> your callout to mmu_notifier_invalidate_page() will work for gru and I
> think it will work for XPMEM as well, but I am not certain of that yet.
> I am certain that it can be made to work for XPMEM.  I think using the
> range callouts are actually worse because now we are mixing the conceptual
> uses of page and range.

KVM also will obviously be fine, the whole point of transparent
hugepage support is to allow mapping 2M tlb and 2M shadow pages in the
spmd... In fact I'm already calling the 4k callout for the 2M
pmdp_flush_clear_young_notify... because worst case that won't cash
but only swap less smart. However at the moment start/stop is just
safer... and gives more peace of mind and they can't schedule
anyway... but I surely would prefer a single call too, if nothing else
for performance reasons. Said that it's definitely not a fast path
worth worrying about for KVM.

Even the tlb_flush of pmdp_*flush on x86 is a range flush in
transparent huegepage support because I found errata that invlpg isn't
ok to flush PSE tlb on some cpus but then I didn't check the details,
I just wanted less risk right now, later it can be optimized (worst
case dependent on cpuid).

Note also that pmdp_splitting_flush_notify probably can drop the
_notify part. As long as there is symmetry with the "pages" returned
by gup and their respective put_page and you only use the "page" to do
put_page and get its physical address, there is no need to be notified
about a split_huge_page. At the moment it seems just a little more
paranoid but again not a requirement by design because hugepages are
stable, and only thing that can require an invalidate is a physical
relocation that only happens during swap (or similar). split_huge_page
doesn't affect _physical_ at all, and in turn there is in theory no
need to modify the secondary mmu mappings, when the primary mmu
mappings are altered. One of the reasons of altering the primary mmu
mappings is to avoid confusing the code that can't handle huge pmd
natively and would crash on them, so we virtually split the page to
show to that code an environment it won't find itself lost.

> I must be missing something key here.  I thought unmap_mapping_range_vma
> would percolate down to calling mmu_notifier_invalidate_page() which
> xpmem can sleep in.  Based upon that assumption, I don't see the
> need for the other patches.

unmap_mapping_range takes i_mmap_lock (spinlock) and then calls
zap_page_range that calls unmap_vmas under spinlock, that leads to
mmu_notifier_invalidate_range_start under i_mmap_lock. That only
happens for truncate... That was also the reason that Christioph's
first patchset had a sleep parameter in its version of
mmu_notifier_invalidate_something(sleep) (and sleep=0 was passed when
it was called inside truncate IIRC).

> I don't see the mandatory part here.  Maybe it is your broken english

Eheh, my english so bad ah... :(. And my writing is probably better
than my pronunciation ;)

> combined with my ignorance, but I do not see what the statement
> "i_mmap_lock side is mandatory" is based upon.  It looks to me like

Tried to explain again above, hope it is clearer this time.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

      reply	other threads:[~2010-01-28 15:20 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-01-25 17:45 [PATCH] - Fix unmap_vma() bug related to mmu_notifiers Jack Steiner
2010-01-25 19:00 ` Andrea Arcangeli
2010-01-25 21:10   ` Jack Steiner
2010-01-25 21:16     ` Andrea Arcangeli
2010-01-26 21:29       ` Robin Holt
2010-01-26 21:38         ` Andrea Arcangeli
2010-01-28  3:18           ` Robin Holt
2010-01-28  3:49             ` Robin Holt
2010-01-28 10:03               ` Andrea Arcangeli
2010-01-28 13:25                 ` Robin Holt
2010-01-28 15:20                   ` Andrea Arcangeli [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100128152048.GA1217@random.random \
    --to=aarcange@redhat.com \
    --cc=cl@linux-foundation.org \
    --cc=holt@sgi.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mingo@elte.hu \
    --cc=steiner@sgi.com \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).