From: Rusty Russell <rusty@rustcorp.com.au>
To: Andrea Arcangeli <andrea@qumranet.com>
Cc: Christoph Lameter <clameter@sgi.com>,
akpm@linux-foundation.org, Nick Piggin <npiggin@suse.de>,
Steve Wise <swise@opengridcomputing.com>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
linux-mm@kvack.org, Kanoj Sarcar <kanojsarcar@yahoo.com>,
Roland Dreier <rdreier@cisco.com>, Jack Steiner <steiner@sgi.com>,
linux-kernel@vger.kernel.org, Avi Kivity <avi@qumranet.com>,
kvm-devel@lists.sourceforge.net, Robin Holt <holt@sgi.com>,
general@lists.openfabrics.org, Hugh Dickins <hugh@veritas.com>
Subject: Re: [PATCH 1 of 9] Lock the entire mm to prevent any mmu related operation to happen
Date: Tue, 22 Apr 2008 15:06:24 +1000
Message-ID: <200804221506.26226.rusty@rustcorp.com.au>
In-Reply-To: <ec6d8f91b299cf26cce5.1207669444@duo.random>

On Wednesday 09 April 2008 01:44:04 Andrea Arcangeli wrote:
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1050,6 +1050,15 @@
> unsigned long addr, unsigned long len,
> unsigned long flags, struct page **pages);
>
> +struct mm_lock_data {
> + spinlock_t **i_mmap_locks;
> + spinlock_t **anon_vma_locks;
> + unsigned long nr_i_mmap_locks;
> + unsigned long nr_anon_vma_locks;
> +};
> +extern struct mm_lock_data *mm_lock(struct mm_struct * mm);
> +extern void mm_unlock(struct mm_struct *mm, struct mm_lock_data *data);
As far as I can tell you don't actually need to expose this struct at all?
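
One way to keep the struct private is an opaque type: the header only declares the name, and the layout stays in the implementation file. The following is a minimal user-space sketch of that idea, not the patch's code. The helper names lock_data_alloc(), lock_data_count() and lock_data_free() are hypothetical, and malloc()/calloc() stand in for the kernel allocators.

```c
#include <stdlib.h>

/* Header side: callers see only the type name, never its layout. */
struct mm_lock_data;
struct mm_lock_data *lock_data_alloc(size_t nr_locks);   /* hypothetical */
size_t lock_data_count(const struct mm_lock_data *data); /* hypothetical */
void lock_data_free(struct mm_lock_data *data);          /* hypothetical */

/* Implementation side: the definition is private to this file. */
struct mm_lock_data {
	void **locks;      /* stand-in for the spinlock_t * arrays */
	size_t nr_locks;
};

struct mm_lock_data *lock_data_alloc(size_t nr_locks)
{
	struct mm_lock_data *data = malloc(sizeof(*data));

	if (!data)
		return NULL;
	/* Allocate at least one slot so a zero count is not an error. */
	data->locks = calloc(nr_locks ? nr_locks : 1, sizeof(*data->locks));
	if (!data->locks) {
		free(data);
		return NULL;
	}
	data->nr_locks = nr_locks;
	return data;
}

size_t lock_data_count(const struct mm_lock_data *data)
{
	return data->nr_locks;
}

void lock_data_free(struct mm_lock_data *data)
{
	if (!data)
		return;
	free(data->locks);
	free(data);
}
```

Callers that only pass the pointer from mm_lock() to mm_unlock() never need the field names, so a forward declaration in the header would be enough for them.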
> + data->i_mmap_locks = vmalloc(nr_i_mmap_locks *
> + sizeof(spinlock_t));
This is why non-typesafe allocators suck. You want 'sizeof(spinlock_t *)'
here.
> + data->anon_vma_locks = vmalloc(nr_anon_vma_locks *
> + sizeof(spinlock_t));
and here.
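
A common way to avoid this class of mistake is to size the allocation from the pointer being assigned rather than from a type name. The sketch below is user-space C under stated assumptions: the spinlock_t here is a stand-in struct (not the kernel type), malloc() replaces vmalloc(), and the function names are hypothetical.

```c
#include <stdlib.h>

/* Stand-in type for illustration only; not the kernel's spinlock_t. */
typedef struct {
	int slock;
	char pad[60];
} spinlock_t;

/*
 * Allocate an array of n pointers to locks.
 * sizeof(*locks) is the size of one array element (a spinlock_t *),
 * so the expression stays correct even if the element type changes.
 */
spinlock_t **alloc_lock_ptrs(size_t n)
{
	spinlock_t **locks = malloc(n * sizeof(*locks));

	return locks;
}

/* Reports the per-element size used above (sizeof does not evaluate *locks). */
size_t lock_ptr_elem_size(void)
{
	spinlock_t **locks = NULL;

	return sizeof(*locks);
}
```

With sizeof(spinlock_t), the array is mis-sized whenever the lock and a pointer differ in size, and under-allocated whenever the lock is the smaller of the two.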
> + err = -EINTR;
> + i_mmap_lock_last = NULL;
> + nr_i_mmap_locks = 0;
> + for (;;) {
> + spinlock_t *i_mmap_lock = (spinlock_t *) -1UL;
> + for (vma = mm->mmap; vma; vma = vma->vm_next) {
...
> + data->i_mmap_locks[nr_i_mmap_locks++] = i_mmap_lock;
> + }
> + data->nr_i_mmap_locks = nr_i_mmap_locks;
How about you track your running counter in data->nr_i_mmap_locks, leave
nr_i_mmap_locks alone, and BUG_ON(data->nr_i_mmap_locks != nr_i_mmap_locks)?
Even nicer would be to wrap this in a "get_sorted_mmap_locks()" function.
Similarly for anon_vma locks.
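
The counter suggestion can be sketched as a helper that keeps its running count in the result struct and checks it against the count from the earlier sizing pass. This is a user-space illustration only: get_sorted_locks() and struct lock_set are hypothetical names, assert() stands in for BUG_ON(), and the sketch sorts with qsort() and drops duplicates rather than using the repeated scan in the quoted patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct lock_set {
	void **locks;   /* distinct lock addresses, ascending */
	size_t nr;      /* the running counter lives here */
};

/* Order two pointers by address. */
static int cmp_ptr(const void *a, const void *b)
{
	uintptr_t x = (uintptr_t)*(void * const *)a;
	uintptr_t y = (uintptr_t)*(void * const *)b;

	return (x > y) - (x < y);
}

/*
 * Hypothetical helper: fill set->locks with the distinct non-NULL
 * pointers from cand[0..n), sorted by address. 'expected' is the count
 * from the sizing pass and is left untouched; a mismatch is a logic bug.
 * Returns 0 on success, -1 if the array cannot be allocated.
 */
int get_sorted_locks(struct lock_set *set, void **cand, size_t n,
		     size_t expected)
{
	void **tmp = malloc((n ? n : 1) * sizeof(*tmp));
	size_t i, m = 0;

	if (!tmp)
		return -1;
	for (i = 0; i < n; i++)
		if (cand[i])
			tmp[m++] = cand[i];
	qsort(tmp, m, sizeof(*tmp), cmp_ptr);

	/* Compact in place: set->nr never runs ahead of i. */
	set->locks = tmp;
	set->nr = 0;
	for (i = 0; i < m; i++)
		if (set->nr == 0 || set->locks[set->nr - 1] != tmp[i])
			set->locks[set->nr++] = tmp[i];

	assert(set->nr == expected);   /* BUG_ON() in kernel terms */
	return 0;
}
```

The same helper could serve both the i_mmap and the anon_vma case, since it only deals in lock addresses.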
Unfortunately, I just don't think we can fail locking like this. In your next
patch unregistering a notifier can fail because of it: that's not usable.
I think it means you need to add a linked list element to the vma for the
CONFIG_MMU_NOTIFIER case. Or track the max number of vmas for any mm, and
keep a pool to handle mm_lock for this number (i.e. if you can't enlarge the
pool, fail the vma allocation).
Both have their problems though...
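
The pool variant might look roughly like the following user-space sketch. It assumes one shared pool and hypothetical names (lock_pool_reserve(), lock_pool_get()), with realloc() standing in for the kernel allocator. The point it illustrates is that the allocation, and therefore the failure, moves to the path that adds a vma, so the locking path itself never allocates.

```c
#include <stdlib.h>

static void **lock_pool;       /* shared scratch array for lock pointers */
static size_t lock_pool_cap;   /* slots currently reserved */

/*
 * Call before any mm grows to nr_vmas vmas.
 * Returns 0 if the pool already covers that count or was enlarged,
 * -1 if it could not be enlarged; the caller would then fail the
 * vma allocation instead of failing later at lock time.
 */
int lock_pool_reserve(size_t nr_vmas)
{
	void **bigger;

	if (nr_vmas <= lock_pool_cap)
		return 0;
	bigger = realloc(lock_pool, nr_vmas * sizeof(*bigger));
	if (!bigger)
		return -1;
	lock_pool = bigger;
	lock_pool_cap = nr_vmas;
	return 0;
}

/*
 * Lock-time path: no allocation happens here. If every vma addition
 * went through lock_pool_reserve(), the NULL branch is never taken.
 */
void **lock_pool_get(size_t nr_vmas)
{
	return nr_vmas <= lock_pool_cap ? lock_pool : NULL;
}
```

This sketch ignores concurrency: a single shared pool would need its own serialization between callers.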
Rusty.