From: Robin Holt <holt@sgi.com>
To: Christoph Lameter <clameter@sgi.com>
Cc: Robin Holt <holt@sgi.com>, Andrea Arcangeli <andrea@qumranet.com>,
akpm@linux-foundation.org, Nick Piggin <npiggin@suse.de>,
Steve Wise <swise@opengridcomputing.com>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
linux-mm@kvack.org, Kanoj Sarcar <kanojsarcar@yahoo.com>,
Roland Dreier <rdreier@cisco.com>, Jack Steiner <steiner@sgi.com>,
linux-kernel@vger.kernel.org, Avi Kivity <avi@qumranet.com>,
kvm-devel@lists.sourceforge.net, general@lists.openfabrics.org,
Hugh Dickins <hugh@veritas.com>
Subject: Re: [PATCH 1 of 9] Lock the entire mm to prevent any mmu related operation to happen
Date: Thu, 17 Apr 2008 06:14:04 -0500
Message-ID: <20080417111404.GL22493@sgi.com>
In-Reply-To: <Pine.LNX.4.64.0804161214170.14657@schroedinger.engr.sgi.com>

On Wed, Apr 16, 2008 at 12:15:08PM -0700, Christoph Lameter wrote:
> On Wed, 16 Apr 2008, Robin Holt wrote:
>
> > On Wed, Apr 16, 2008 at 11:35:38AM -0700, Christoph Lameter wrote:
> > > On Wed, 16 Apr 2008, Robin Holt wrote:
> > >
> > > > I don't think this lock mechanism is completely working. I have
> > > > gotten a few failures trying to dereference 0x100100 which appears to
> > > > be LIST_POISON1.
> > >
> > > How does xpmem unregistering of notifiers work?
> >
> > For the tests I have been running, we are waiting for the release
> > callout as part of exit.
>
> Some more details on the failure may be useful. AFAICT list_del[_rcu] is
> the culprit here and that is only used on release or unregister.

I think I have this understood now. It happens quite quickly (within
10 minutes) on a 128-rank job running a small data set in a loop.

In these failing jobs, all the ranks are nearly symmetric. There is
a certain part of each rank's address space that has access granted.
All the ranks have included all the other ranks, including themselves, in
exactly the same layout at exactly the same virtual address.

Rank 3 has hit _release and is beginning to clean up, but has not deleted
the notifier from its list.

Rank 9 calls the xpmem_invalidate_page() callout. That page was attached
by rank 3, so we call zap_page_range on rank 3, which then calls back into
xpmem's invalidate_range_start callout.

The rank 3 _release callout begins and deletes its notifier from the list.

Rank 9's call to rank 3's zap_page_range notifier returns and dereferences
LIST_POISON1.

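The interleaving above can be replayed single-threaded in user space.
This is only a rough sketch under assumptions: it uses a plain doubly
linked list with list_del()-style poisoning (the real notifier list and
its walker may differ), the poison constants are the ones from
include/linux/poison.h of this era, and the names notifiers,
xpmem_notifier and walker_sees_poison() are made up for illustration.
It checks the pointer the walker would follow next instead of actually
dereferencing it.

```c
#include <stddef.h>
#include <stdint.h>

/* Poison values as in include/linux/poison.h of this era. */
#define LIST_POISON1 ((struct list_head *)(uintptr_t)0x00100100)
#define LIST_POISON2 ((struct list_head *)(uintptr_t)0x00200200)

struct list_head {
    struct list_head *next, *prev;
};

static void list_init(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

static void list_add(struct list_head *entry, struct list_head *head)
{
    entry->next = head->next;
    entry->prev = head;
    head->next->prev = entry;
    head->next = entry;
}

/* Unlink the entry and poison its pointers, as the kernel's list_del() does. */
static void list_del(struct list_head *entry)
{
    entry->prev->next = entry->next;
    entry->next->prev = entry->prev;
    entry->next = LIST_POISON1;
    entry->prev = LIST_POISON2;
}

/*
 * Hypothetical replay of the sequence: returns 1 if the walker's next
 * pointer is LIST_POISON1 once the callout it was running has returned.
 */
int walker_sees_poison(void)
{
    struct list_head notifiers;      /* stand-in for rank 3's notifier list */
    struct list_head xpmem_notifier; /* stand-in for the xpmem notifier entry */
    struct list_head *pos, *next;

    list_init(&notifiers);
    list_add(&xpmem_notifier, &notifiers);

    /* Rank 9's path: the walker is inside the callout for this entry. */
    pos = notifiers.next;

    /* Rank 3's _release unlinks the entry while the callout is in flight. */
    list_del(&xpmem_notifier);

    /* The callout returns and the walker advances from the stale entry. */
    next = pos->next;

    return next == LIST_POISON1;
}
```

If the walker and _release were serialized by a common lock, or the
entry stayed linked until all in-flight callouts had returned, the
walker would read a valid next pointer here rather than the poison.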
I often confuse myself while trying to explain these so please kick me
where the holes in the flow appear. The console output from the simple
debugging stuff I put in is a bit overwhelming.

I am trying to figure out now which locks we hold as part of the zap
callout that should have prevented the _release callout.

Thanks,
Robin