All of lore.kernel.org
 help / color / mirror / Atom feed
From: Robin Holt <holt@sgi.com>
To: Andrea Arcangeli <aarcange@redhat.com>
Cc: Robin Holt <holt@sgi.com>, Jack Steiner <steiner@sgi.com>,
	cl@linux-foundation.org, mingo@elte.hu, tglx@linutronix.de,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] - Fix unmap_vma() bug related to mmu_notifiers
Date: Thu, 28 Jan 2010 07:25:03 -0600	[thread overview]
Message-ID: <20100128132503.GJ6616@sgi.com> (raw)
In-Reply-To: <20100128100327.GF24242@random.random>

On Thu, Jan 28, 2010 at 11:03:27AM +0100, Andrea Arcangeli wrote:
> On Wed, Jan 27, 2010 at 09:49:44PM -0600, Robin Holt wrote:
> > > I think that with the SRCU patch, we have enough.  Is that true or have
> > > I missed something?
> > 
> > I wasn't quite complete in my previous email.  Your srcu patch
> > plus Jack's patch to move the tlb_gather_mmu to after the
> > mmu_notifier_invalidate_range_start().
> 
> My pmdp_clear_flush_notify with transparent hugepage will give some
> trouble because it's using mmu_notifier_invalidate_range_start to
> provide backwards compatible API to mmu notifier users like GRU that
> may be mapping physical hugepages with 4k secondary tlb mappings
> (which have to be all invalidated not only the first one). So that
> would still require the full series as it's like if the rmap code
> would be using mmu_notifier_invalidate_range_start. But we can
> probably get away by forcing all mmu notifier methods to provide a
> mmu_notifier_invalidate_hugepage.

The GRU is using a hardware TLB of 2MB page size when the
is_vm_hugetlb_page() indicates it is a 2MB vma.  From my reading of it,
your callout to mmu_notifier_invalidate_page() will work for gru and I
think it will work for XPMEM as well, but I am not certain of that yet.
I am certain that it can be made to work for XPMEM.  I think using the
range callouts are actually worse because now we are mixing the conceptual
uses of page and range.

> But in addition to srcu surely you also need i_mmap_lock_to_sem for
> unmap_mapping_range_vma taking i_mmap_lock, basically you missed
> truncate. Which then in cascade requires free_pgtables,

I must be missing something key here.  I thought unmap_mapping_range_vma
would percolate down to calling mmu_notifier_invalidate_page() which
xpmem can sleep in.  Based upon that assumption, I don't see the
need for the other patches.

> rwsem-contended, unmap_vmas (the latter are for the tlb gather
> required to be in atomic context to avoid scheduling to other cpus
> while holding the tlb gather).
> 
> So you only avoid the need of anon-vma switching to rwsem (because
> there's no range-vmtruncate but only rmap uses it on a page-by-page
> basis with mmu_notifier_invalidate_page). So at that point IMHO you
> can as well add a CONFIG_MMU_NOTIFIER_SLEEPABLE and allow scheduling
> everywhere in mmu notifier IMHO, but if you prefer to avoid changing
> anon_vma lock to rwsem and add refcounting that is ok with me too. I
> just think it'd be cleaner to switch them all to sleepable code if we
> have to provide for it and most of the work on the i_mmap_lock side is
> mandatory anyway.

I don't see the mandatory part here.  Maybe it is your broken english
combined with my ignorance, but I do not see what the statement
"i_mmap_lock side is mandatory" is based upon.  It looks to me like
everywhere that is calling an mmu_notifier_invalidate_* while holding
the i_mmap_lock is calling mmu_notifier_invalidate_page().  That is
currently safe for sleeping as far as XPMEM is concerned.  Is there a
place that is calling mmu_notifier_invalidate_range_*() while holding
the i_mmap_lock which I have missed?

Thanks,
Robin

WARNING: multiple messages have this Message-ID (diff)
From: Robin Holt <holt@sgi.com>
To: Andrea Arcangeli <aarcange@redhat.com>
Cc: Robin Holt <holt@sgi.com>, Jack Steiner <steiner@sgi.com>,
	cl@linux-foundation.org, mingo@elte.hu, tglx@linutronix.de,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] - Fix unmap_vma() bug related to mmu_notifiers
Date: Thu, 28 Jan 2010 07:25:03 -0600	[thread overview]
Message-ID: <20100128132503.GJ6616@sgi.com> (raw)
In-Reply-To: <20100128100327.GF24242@random.random>

On Thu, Jan 28, 2010 at 11:03:27AM +0100, Andrea Arcangeli wrote:
> On Wed, Jan 27, 2010 at 09:49:44PM -0600, Robin Holt wrote:
> > > I think that with the SRCU patch, we have enough.  Is that true or have
> > > I missed something?
> > 
> > I wasn't quite complete in my previous email.  Your srcu patch
> > plus Jack's patch to move the tlb_gather_mmu to after the
> > mmu_notifier_invalidate_range_start().
> 
> My pmdp_clear_flush_notify with transparent hugepage will give some
> trouble because it's using mmu_notifier_invalidate_range_start to
> provide backwards compatible API to mmu notifier users like GRU that
> may be mapping physical hugepages with 4k secondary tlb mappings
> (which have to be all invalidated not only the first one). So that
> would still require the full series as it's like if the rmap code
> would be using mmu_notifier_invalidate_range_start. But we can
> probably get away by forcing all mmu notifier methods to provide a
> mmu_notifier_invalidate_hugepage.

The GRU is using a hardware TLB of 2MB page size when the
is_vm_hugetlb_page() indicates it is a 2MB vma.  From my reading of it,
your callout to mmu_notifier_invalidate_page() will work for gru and I
think it will work for XPMEM as well, but I am not certain of that yet.
I am certain that it can be made to work for XPMEM.  I think using the
range callouts are actually worse because now we are mixing the conceptual
uses of page and range.

> But in addition to srcu surely you also need i_mmap_lock_to_sem for
> unmap_mapping_range_vma taking i_mmap_lock, basically you missed
> truncate. Which then in cascade requires free_pgtables,

I must be missing something key here.  I thought unmap_mapping_range_vma
would percolate down to calling mmu_notifier_invalidate_page() which
xpmem can sleep in.  Based upon that assumption, I don't see the
need for the other patches.

> rwsem-contended, unmap_vmas (the latter are for the tlb gather
> required to be in atomic context to avoid scheduling to other cpus
> while holding the tlb gather).
> 
> So you only avoid the need of anon-vma switching to rwsem (because
> there's no range-vmtruncate but only rmap uses it on a page-by-page
> basis with mmu_notifier_invalidate_page). So at that point IMHO you
> can as well add a CONFIG_MMU_NOTIFIER_SLEEPABLE and allow scheduling
> everywhere in mmu notifier IMHO, but if you prefer to avoid changing
> anon_vma lock to rwsem and add refcounting that is ok with me too. I
> just think it'd be cleaner to switch them all to sleepable code if we
> have to provide for it and most of the work on the i_mmap_lock side is
> mandatory anyway.

I don't see the mandatory part here.  Maybe it is your broken english
combined with my ignorance, but I do not see what the statement
"i_mmap_lock side is mandatory" is based upon.  It looks to me like
everywhere that is calling an mmu_notifier_invalidate_* while holding
the i_mmap_lock is calling mmu_notifier_invalidate_page().  That is
currently safe for sleeping as far as XPMEM is concerned.  Is there a
place that is calling mmu_notifier_invalidate_range_*() while holding
the i_mmap_lock which I have missed?

Thanks,
Robin

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2010-01-28 13:25 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-01-25 17:45 [PATCH] - Fix unmap_vma() bug related to mmu_notifiers Jack Steiner
2010-01-25 17:45 ` Jack Steiner
2010-01-25 19:00 ` Andrea Arcangeli
2010-01-25 19:00   ` Andrea Arcangeli
2010-01-25 21:10   ` Jack Steiner
2010-01-25 21:10     ` Jack Steiner
2010-01-25 21:16     ` Andrea Arcangeli
2010-01-25 21:16       ` Andrea Arcangeli
2010-01-26 21:29       ` Robin Holt
2010-01-26 21:29         ` Robin Holt
2010-01-26 21:38         ` Andrea Arcangeli
2010-01-26 21:38           ` Andrea Arcangeli
2010-01-28  3:18           ` Robin Holt
2010-01-28  3:18             ` Robin Holt
2010-01-28  3:49             ` Robin Holt
2010-01-28  3:49               ` Robin Holt
2010-01-28 10:03               ` Andrea Arcangeli
2010-01-28 10:03                 ` Andrea Arcangeli
2010-01-28 13:25                 ` Robin Holt [this message]
2010-01-28 13:25                   ` Robin Holt
2010-01-28 15:20                   ` Andrea Arcangeli
2010-01-28 15:20                     ` Andrea Arcangeli

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100128132503.GJ6616@sgi.com \
    --to=holt@sgi.com \
    --cc=aarcange@redhat.com \
    --cc=cl@linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mingo@elte.hu \
    --cc=steiner@sgi.com \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.