From: Oleg Nesterov <oleg@redhat.com>
To: Frank Mayhar <fmayhar@google.com>
Cc: mingo@elte.hu, roland@redhat.com, adobriyan@gmail.com,
akpm@linux-foundation.org, linux-kernel@vger.kernel.org,
doug.chapman@hp.com
Subject: Re: regression introduced by - timers: fix itimer/many thread hang
Date: Thu, 6 Nov 2008 13:59:51 +0100
Message-ID: <20081106125951.GA5756@redhat.com>
In-Reply-To: <20081105191211.c0316b94.akpm@linux-foundation.org>
> Begin forwarded message:
>
> On Tue, 2008-10-28 at 14:38 -0400, Doug Chapman wrote:
> > On Mon, 2008-10-27 at 11:39 -0700, Frank Mayhar wrote:
> > > On Wed, 2008-10-22 at 13:03 -0400, Doug Chapman wrote:
> > > > Unable to handle kernel paging request at virtual address
> > > > 94949494949494a4
> > >
> > > I take it this can be read as an uninitialized (or cleared) pointer?
> > >
> > > It certainly looks like this is a race in thread (process?) teardown. I
> > > don't have hardware on which to reproduce this but _looks_ like another
> > > thread has gotten in and torn down the process while we've been busy.
> >
> > I finally managed to get kdump working and caught this in the act. I
> > still need to dig into this more but I think these 2 threads will show
> > us the race condition. Note that this is a slightly hacked kernel in
> > that I removed "static" from a few functions to better see what was
> > going on but no real functional changes when compared to a recent (day
> > old or so) git pull from Linus's tree.
>
> After digging through this a bit, I've concluded that it's probably a
> race between process reap and the dequeue_entity() call to update_curr()
> combined with a side effect of the slab debug stuff. The
> account_group_exec_runtime() routine (like the rest of these routines)
> checks tsk->signal and tsk->signal->cputime.totals for NULL to make sure
> they're still valid. It looks like at this point tsk->signal is valid
> (since the tsk->signal->cputime dereference succeeded) but
> tsk->signal->cputime.totals is invalid. That can't happen unless the
> process is being reaped,
Frank, I don't currently have the source code to look at,
so I am probably wrong... But just in case, perhaps we can do
-	account_group_exec_runtime(...);
+	if (lock_task_sighand(...)) {
+		account_group_exec_runtime(...);
+		unlock_task_sighand();
+	}
?
Once we take ->siglock the task can't be reaped, and ->signal becomes
stable and != NULL.
Oleg.
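The locking pattern suggested above can be sketched in user space. This
is only a rough model under stated assumptions, not kernel code: every
model_* name is hypothetical, a pthread mutex stands in for ->siglock,
an atomic pointer stands in for the ->sighand lookup, and the sighand
object is assumed to outlive the task so that locking it is always safe.
The point it illustrates is the one made in the message: the accounting
only touches ->signal after the lock is held and the task is confirmed
not to have been reaped.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; none of these are kernel definitions. */
struct model_sighand {
	pthread_mutex_t siglock;
};

struct model_signal {
	unsigned long long sum_exec_runtime;
};

struct model_task {
	_Atomic(struct model_sighand *) sighand;	/* NULL once reaped */
	struct model_signal *signal;			/* NULL once reaped */
};

/* Take siglock, then re-check that the task was not reaped meanwhile. */
static struct model_sighand *model_lock_sighand(struct model_task *t)
{
	struct model_sighand *sh = atomic_load(&t->sighand);

	if (sh == NULL)
		return NULL;
	pthread_mutex_lock(&sh->siglock);
	if (atomic_load(&t->sighand) != sh) {
		pthread_mutex_unlock(&sh->siglock);
		return NULL;
	}
	return sh;
}

static void model_unlock_sighand(struct model_sighand *sh)
{
	pthread_mutex_unlock(&sh->siglock);
}

/* The reaper clears both pointers while holding siglock. */
static void model_reap(struct model_task *t)
{
	struct model_sighand *sh = atomic_load(&t->sighand);

	if (sh == NULL)
		return;
	pthread_mutex_lock(&sh->siglock);
	t->signal = NULL;
	atomic_store(&t->sighand, NULL);
	pthread_mutex_unlock(&sh->siglock);
}

/* Returns 1 if the runtime was accounted, 0 if the task was already gone. */
static int model_account_exec_runtime(struct model_task *t,
				      unsigned long long ns)
{
	struct model_sighand *sh = model_lock_sighand(t);

	if (sh == NULL)
		return 0;
	t->signal->sum_exec_runtime += ns;	/* ->signal is stable here */
	model_unlock_sighand(sh);
	return 1;
}
```

In this model, an accounting call that loses the race with
model_reap() returns 0 and never dereferences ->signal. Whether taking
such a lock is acceptable on the scheduler path is a separate question
that the sketch does not address.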