linux-mm.kvack.org archive mirror
From: Robin Holt <holt@sgi.com>
To: Andrea Arcangeli <andrea@qumranet.com>
Cc: Jack Steiner <steiner@sgi.com>, Robin Holt <holt@sgi.com>,
	Christoph Lameter <clameter@sgi.com>,
	Nick Piggin <npiggin@suse.de>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	kvm-devel@lists.sourceforge.net,
	Kanoj Sarcar <kanojsarcar@yahoo.com>,
	Roland Dreier <rdreier@cisco.com>,
	Steve Wise <swise@opengridcomputing.com>,
	linux-kernel@vger.kernel.org, Avi Kivity <avi@qumranet.com>,
	linux-mm@kvack.org, general@lists.openfabrics.org,
	Hugh Dickins <hugh@veritas.com>,
	akpm@linux-foundation.org, Rusty Russell <rusty@rustcorp.com.au>
Subject: Re: [PATCH 01 of 12] Core of mmu notifiers
Date: Thu, 24 Apr 2008 04:51:12 -0500
Message-ID: <20080424095112.GC30298@sgi.com>
In-Reply-To: <20080424064753.GH24536@duo.random>

I am not certain of this, but it seems like this patch leaves things in
a somewhat asymmetric state.  At the very least, I think that asymmetry
should be documented in the comments of either mmu_notifier.h or
mmu_notifier.c.

Before I do the first mmu_notifier_register(), every place that tests
mm_has_notifiers(mm) sees false and takes the fast path.

After I do some mmu_notifier_register()s and their corresponding
mmu_notifier_unregister()s, mm_has_notifiers(mm) will return true
and the slow path will be taken, even though every registered notifier
has since unregistered.

It seems to me the work done by mmu_notifier_mm_destroy should really
be done inside the mm_lock()/mm_unlock() region of mmu_notifier_unregister
and mmu_notifier_release, once the last entry has been removed.  That
would give the user's job the same performance after it is done using
the special device as it had before using it.


On Thu, Apr 24, 2008 at 08:49:40AM +0200, Andrea Arcangeli wrote:
...
> diff --git a/mm/memory.c b/mm/memory.c
> --- a/mm/memory.c
> +++ b/mm/memory.c
...
> @@ -603,25 +605,39 @@
>  	 * readonly mappings. The tradeoff is that copy_page_range is more
>  	 * efficient than faulting.
>  	 */
> +	ret = 0;
>  	if (!(vma->vm_flags & (VM_HUGETLB|VM_NONLINEAR|VM_PFNMAP|VM_INSERTPAGE))) {
>  		if (!vma->anon_vma)
> -			return 0;
> +			goto out;
>  	}
>  
> -	if (is_vm_hugetlb_page(vma))
> -		return copy_hugetlb_page_range(dst_mm, src_mm, vma);
> +	if (unlikely(is_vm_hugetlb_page(vma))) {
> +		ret = copy_hugetlb_page_range(dst_mm, src_mm, vma);
> +		goto out;
> +	}
>  
> +	if (is_cow_mapping(vma->vm_flags))
> +		mmu_notifier_invalidate_range_start(src_mm, addr, end);
> +
> +	ret = 0;

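I don't think this second "ret = 0;" is needed.  ret is already 0 here:
it was set at the top, and the only branch that assigns it (hugetlb)
jumps to out.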

...
> +/* avoid memory allocations for mm_unlock to prevent deadlock */
> +void mm_unlock(struct mm_struct *mm, struct mm_lock_data *data)
> +{
> +	if (mm->map_count) {
> +		if (data->nr_anon_vma_locks)
> +			mm_unlock_vfree(data->anon_vma_locks,
> +					data->nr_anon_vma_locks);
> +		if (data->i_mmap_locks)

I think you really want to test data->nr_i_mmap_locks here, to match
the nr_anon_vma_locks check above.

...
> diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
> new file mode 100644
> --- /dev/null
> +++ b/mm/mmu_notifier.c
...
> +/*
> + * This function can't run concurrently against mmu_notifier_register
> + * or any other mmu notifier method. mmu_notifier_register can only
> + * run with mm->mm_users > 0 (and exit_mmap runs only when mm_users is
> + * zero). All other tasks of this mm already quit so they can't invoke
> + * mmu notifiers anymore. This can run concurrently only against
> + * mmu_notifier_unregister and it serializes against it with the
> + * unregister_lock in addition to RCU. struct mmu_notifier_mm can't go
> + * away from under us as the exit_mmap holds a mm_count pin itself.
> + *
> + * The ->release method can't allow the module to be unloaded, the
> + * module can only be unloaded after mmu_notifier_unregister run. This
> + * is because the release method has to run the ret instruction to
> + * return back here, and so it can't allow the ret instruction to be
> + * freed.
> + */

The second paragraph of this comment seems extraneous.

...
> +	/*
> +	 * Wait ->release if mmu_notifier_unregister run list_del_rcu.
> +	 * srcu can't go away from under us because one mm_count is
> +	 * hold by exit_mmap.
> +	 */

These two sentences don't make any sense to me.

...
> +void mmu_notifier_unregister(struct mmu_notifier *mn, struct mm_struct *mm)
> +{
> +	int before_release = 0, srcu;
> +
> +	BUG_ON(atomic_read(&mm->mm_count) <= 0);
> +
> +	srcu = srcu_read_lock(&mm->mmu_notifier_mm->srcu);
> +	spin_lock(&mm->mmu_notifier_mm->unregister_lock);
> +	if (!hlist_unhashed(&mn->hlist)) {
> +		hlist_del_rcu(&mn->hlist);
> +		before_release = 1;
> +	}
> +	spin_unlock(&mm->mmu_notifier_mm->unregister_lock);
> +	if (before_release)
> +		/*
> +		 * exit_mmap will block in mmu_notifier_release to
> +		 * guarantee ->release is called before freeing the
> +		 * pages.
> +		 */
> +		mn->ops->release(mn, mm);

I am not certain the release callout is needed when the driver has
already told this subsystem it is done.  For XPMEM, this callout would
return immediately, and I would expect the same for GRU.

Thanks,
Robin

