From: Oleg Nesterov <oleg@redhat.com>
To: Frank Mayhar <fmayhar@google.com>
Cc: mingo@elte.hu, roland@redhat.com, adobriyan@gmail.com,
akpm@linux-foundation.org, linux-kernel@vger.kernel.org,
doug.chapman@hp.com
Subject: Re: regression introduced by - timers: fix itimer/many thread hang
Date: Thu, 6 Nov 2008 13:59:51 +0100
Message-ID: <20081106125951.GA5756@redhat.com>
In-Reply-To: <20081105191211.c0316b94.akpm@linux-foundation.org>
> Begin forwarded message:
>
> On Tue, 2008-10-28 at 14:38 -0400, Doug Chapman wrote:
> > On Mon, 2008-10-27 at 11:39 -0700, Frank Mayhar wrote:
> > > On Wed, 2008-10-22 at 13:03 -0400, Doug Chapman wrote:
> > > > Unable to handle kernel paging request at virtual address
> > > > 94949494949494a4
> > >
> > > I take it this can be read as an uninitialized (or cleared) pointer?
> > >
> > > It certainly looks like this is a race in thread (process?) teardown. I
> > > don't have hardware on which to reproduce this but _looks_ like another
> > > thread has gotten in and torn down the process while we've been busy.
> >
> > I finally managed to get kdump working and caught this in the act. I
> > still need to dig into this more but I think these 2 threads will show
> > us the race condition. Note that this is a slightly hacked kernel in
> > that I removed "static" from a few functions to better see what was
> > going on but no real functional changes when compared to a recent (day
> > old or so) git pull from Linus's tree.
>
> After digging through this a bit, I've concluded that it's probably a
> race between process reap and the dequeue_entity() call to update_curr()
> combined with a side effect of the slab debug stuff. The
> account_group_exec_runtime() routine (like the rest of these routines)
> checks tsk->signal and tsk->signal->cputime.totals for NULL to make sure
> they're still valid. It looks like at this point tsk->signal is valid
> (since the tsk->signal->cputime dereference succeeded) but
> tsk->signal->cputime.totals is invalid. That can't happen unless the
> process is being reaped,
Frank, I don't have the source code in front of me right now,
so I am probably wrong... But just in case, perhaps we can do
-	account_group_exec_runtime(...);
+	if (lock_task_sighand(...)) {
+		account_group_exec_runtime(...);
+		unlock_task_sighand();
+	}
?
Once we take ->siglock the task can't be reaped, and ->signal becomes
stable and != NULL.
Oleg.
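
The locking idea suggested above can be sketched in user space. This is only an illustrative model, not kernel code: every `model_*` name and both structs are hypothetical stand-ins, a pthread mutex plays the role of ->siglock, and the lock is placed directly in the task for brevity (in the kernel, siglock lives in the sighand structure). It assumes the reaper clears ->signal under the same lock, which is what makes the pointer stable once the lock is held.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's signal and task structures. */
struct signal_model {
	unsigned long long sum_exec_runtime;
};

struct task_model {
	pthread_mutex_t siglock;      /* plays the role of ->siglock */
	struct signal_model *signal;  /* NULL once the task has been reaped */
};

/*
 * Loosely modeled on the idea behind lock_task_sighand(): take the lock,
 * then check whether the task is already gone. Returns 1 with the lock
 * held, or 0 with the lock released.
 */
static int model_lock_task_sighand(struct task_model *tsk)
{
	pthread_mutex_lock(&tsk->siglock);
	if (tsk->signal == NULL) {
		pthread_mutex_unlock(&tsk->siglock);
		return 0;
	}
	return 1;
}

static void model_unlock_task_sighand(struct task_model *tsk)
{
	pthread_mutex_unlock(&tsk->siglock);
}

/* Caller must hold siglock, so ->signal cannot be torn down under us. */
static void model_account_group_exec_runtime(struct task_model *tsk,
					     unsigned long long ns)
{
	assert(tsk->signal != NULL);
	tsk->signal->sum_exec_runtime += ns;
}

/* Reaper side: detach ->signal under the same lock the accounting takes. */
static struct signal_model *model_reap(struct task_model *tsk)
{
	struct signal_model *old;

	pthread_mutex_lock(&tsk->siglock);
	old = tsk->signal;
	tsk->signal = NULL;
	pthread_mutex_unlock(&tsk->siglock);
	return old; /* the caller may free this now */
}

/* The "+" side of the proposed change: account only while locked. */
static int model_update_curr(struct task_model *tsk, unsigned long long ns)
{
	if (!model_lock_task_sighand(tsk))
		return 0; /* task already reaped, nothing to account */
	model_account_group_exec_runtime(tsk, ns);
	model_unlock_task_sighand(tsk);
	return 1;
}
```

In this model, a call to model_update_curr() either sees a live ->signal for the whole update or skips the accounting entirely; it never dereferences a pointer that the reaper has already detached. Whether taking the lock on this path is acceptable in the scheduler is a separate question that the sketch does not address.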