public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Peter Zijlstra <peterz@infradead.org>
To: Frank Mayhar <fmayhar@google.com>
Cc: Doug Chapman <doug.chapman@hp.com>,
	mingo@elte.hu, roland@redhat.com, adobriyan@gmail.com,
	akpm@linux-foundation.org,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Christoph Lameter <cl@linux-foundation.org>
Subject: Re: regression introduced by - timers: fix itimer/many thread hang
Date: Thu, 06 Nov 2008 12:03:40 +0100	[thread overview]
Message-ID: <1225969420.7803.4366.camel@twins> (raw)
In-Reply-To: <1225936715.27507.44.camel@bobble.smo.corp.google.com>

On Wed, 2008-11-05 at 17:58 -0800, Frank Mayhar wrote:
> On Tue, 2008-10-28 at 14:38 -0400, Doug Chapman wrote:
> > On Mon, 2008-10-27 at 11:39 -0700, Frank Mayhar wrote:
> > > On Wed, 2008-10-22 at 13:03 -0400, Doug Chapman wrote:
> > > > Unable to handle kernel paging request at virtual address
> > > > 94949494949494a4
> > > 
> > > I take it this can be read as an uninitialized (or cleared) pointer?
> > > 
> > > It certainly looks like this is a race in thread (process?) teardown.  I
> > > don't have hardware on which to reproduce this but _looks_ like another
> > > thread has gotten in and torn down the process while we've been busy.
> > 
> > I finally managed to get kdump working and caught this in the act.  I
> > still need to dig into this more but I think these 2 threads will show
> > us the race condition.  Note that this is a slightly hacked kernel in
> > that I removed "static" from a few functions to better see what was
> > going on but no real functional changes when compared to a recent (day
> > old or so) git pull from Linus's tree.
> 
> After digging through this a bit, I've concluded that it's probably a
> race between process reap and the dequeue_entity() call to update_curr()
> combined with a side effect of the slab debug stuff.  The
> account_group_exec_runtime() routine (like the rest of these routines)
> checks tsk->signal and tsk->signal->cputime.totals for NULL to make sure
> they're still valid.  It looks like at this point tsk->signal is valid
> (since the tsk->signal->cputime dereference succeeded) but
> tsk->signal->cputime.totals is invalid.  That can't happen unless the
> process is being reaped, and in fact can only happen in a narrow window
> in __cleanup_signal() between the call to thread_group_cputime_free()
> and the kmem_cache_free() of the signal struct itself.  Without the slab
> debug stuff it would either skip the update (by noticing that pointers
> were NULL) or blithely update freed structures.
> 
> I can't see anything that would prevent this from happening in the
> general case.  I don't see what would stop the parent from coming in on
> another CPU and reaping the process after schedule() has picked it off
> the rq but before the deactivate_task->dequeue_task->dequeue_entity->
> update_curr chain completes.  I see that schedule() disables preemption
> on that CPU but will that really do the job?  I also suspect that with
> hyperthreading both of these processes are on the same silicon, meaning
> that one can be unexpectedly suspended while the other runs, thereby
> making the window wider.
> 
> Unfortunately I don't know the code well enough to know what the right
> fix is.  Maybe account_group_exec_runtime() should check for PF_EXITED?

That is just plain ugly. The right fix to me seems to destroy the
signal/thread group stuff _after_ the last task goes away.

> Maybe update_curr() should do that?  I think that it makes more sense
> for dequeue_entity() to do the check and then not call update_curr() if
> the task is exiting but I defer to others with more knowledge of this
> area.

Hell no. Its a race in this signal/thread group stuff, fix it there.

But now you made me look at this patch:

commit f06febc96ba8e0af80bcc3eaec0a109e88275fac
Author: Frank Mayhar <fmayhar@google.com>
Date:   Fri Sep 12 09:54:39 2008 -0700

    timers: fix itimer/many thread hang


and I'm thinking you just rendered big iron unbootable.

You replaced a loop-over-threads by a loop-over-cpus, on every tick. Did
you stop to think what would happen on 4096 cpu machines?

Also, you just introduced per-cpu allocations for each thread-group,
while Christoph is reworking the per-cpu allocator, with one unfortunate
side-effect - its going to have a limited size pool. Therefore this will
limit the number of thread-groups we can have.

Hohumm, not at all liking this..

  reply	other threads:[~2008-11-06 11:03 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <1224694989.8431.23.camel@oberon>
     [not found] ` <1225132746.14792.13.camel@bobble.smo.corp.google.com>
     [not found]   ` <1225219114.24204.37.camel@oberon>
2008-11-06  1:58     ` regression introduced by - timers: fix itimer/many thread hang Frank Mayhar
2008-11-06 11:03       ` Peter Zijlstra [this message]
2008-11-06 15:03         ` Christoph Lameter
2008-11-06 15:08           ` Peter Zijlstra
2008-11-06 16:08             ` Christoph Lameter
2008-11-06 23:52             ` Frank Mayhar
2008-11-07  8:35               ` Ingo Molnar
2008-11-07 10:29               ` Peter Zijlstra
2008-11-07 18:10                 ` Frank Mayhar
2008-11-07 20:26                   ` Peter Zijlstra
2008-11-10 14:38                     ` Christoph Lameter
2008-11-10 14:42                       ` Peter Zijlstra
2008-11-10 15:41                         ` Christoph Lameter
2008-11-10 18:00                         ` Frank Mayhar
2008-11-14  2:42                           ` Roland McGrath
2008-11-14 16:41                             ` Oleg Nesterov
2008-11-17 14:36                               ` Oleg Nesterov
2008-11-17 18:16                                 ` Roland McGrath
2008-11-17 22:18                                   ` Oleg Nesterov
2008-11-17 21:49                                     ` Roland McGrath
2008-11-11  0:20                         ` Ingo Oeser
2008-11-11 13:58                           ` Christoph Lameter
2008-11-21 18:42                 ` Petr Tesarik
2008-11-21 19:26                   ` Frank Mayhar
2008-11-23 14:24                   ` Peter Zijlstra
2008-11-24  8:46                     ` Petr Tesarik
2008-11-24  9:33                       ` Peter Zijlstra
2008-11-24 12:32                         ` Petr Tesarik
2008-11-24 12:59                           ` Peter Zijlstra
2008-11-24 16:06                             ` Peter Zijlstra
2008-11-06 16:31         ` [PATCH] revert: " Peter Zijlstra
2008-11-06 21:44           ` Ingo Molnar
2008-11-06 21:53             ` Christoph Lameter
2008-11-07 10:19               ` Peter Zijlstra
2008-11-13 16:00           ` Doug Chapman
2008-11-13 16:08             ` Ingo Molnar
2008-11-14 14:10               ` Doug Chapman
     [not found] <20081105191211.c0316b94.akpm@linux-foundation.org>
2008-11-06 12:59 ` regression introduced by - " Oleg Nesterov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1225969420.7803.4366.camel@twins \
    --to=peterz@infradead.org \
    --cc=adobriyan@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=cl@linux-foundation.org \
    --cc=doug.chapman@hp.com \
    --cc=fmayhar@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=roland@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox