Re: regression introduced by - timers: fix itimer/many thread hang

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Peter Zijlstra <peterz@infradead.org>
To: Frank Mayhar <fmayhar@google.com>
Cc: Doug Chapman <doug.chapman@hp.com>,
	mingo@elte.hu, roland@redhat.com, adobriyan@gmail.com,
	akpm@linux-foundation.org,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Christoph Lameter <cl@linux-foundation.org>
Subject: Re: regression introduced by - timers: fix itimer/many thread hang
Date: Thu, 06 Nov 2008 12:03:40 +0100	[thread overview]
Message-ID: <1225969420.7803.4366.camel@twins> (raw)
In-Reply-To: <1225936715.27507.44.camel@bobble.smo.corp.google.com>

On Wed, 2008-11-05 at 17:58 -0800, Frank Mayhar wrote:
> On Tue, 2008-10-28 at 14:38 -0400, Doug Chapman wrote:
> > On Mon, 2008-10-27 at 11:39 -0700, Frank Mayhar wrote:
> > > On Wed, 2008-10-22 at 13:03 -0400, Doug Chapman wrote:
> > > > Unable to handle kernel paging request at virtual address
> > > > 94949494949494a4
> > > 
> > > I take it this can be read as an uninitialized (or cleared) pointer?
> > > 
> > > It certainly looks like this is a race in thread (process?) teardown.  I
> > > don't have hardware on which to reproduce this but _looks_ like another
> > > thread has gotten in and torn down the process while we've been busy.
> > 
> > I finally managed to get kdump working and caught this in the act.  I
> > still need to dig into this more but I think these 2 threads will show
> > us the race condition.  Note that this is a slightly hacked kernel in
> > that I removed "static" from a few functions to better see what was
> > going on but no real functional changes when compared to a recent (day
> > old or so) git pull from Linus's tree.
> 
> After digging through this a bit, I've concluded that it's probably a
> race between process reap and the dequeue_entity() call to update_curr()
> combined with a side effect of the slab debug stuff.  The
> account_group_exec_runtime() routine (like the rest of these routines)
> checks tsk->signal and tsk->signal->cputime.totals for NULL to make sure
> they're still valid.  It looks like at this point tsk->signal is valid
> (since the tsk->signal->cputime dereference succeeded) but
> tsk->signal->cputime.totals is invalid.  That can't happen unless the
> process is being reaped, and in fact can only happen in a narrow window
> in __cleanup_signal() between the call to thread_group_cputime_free()
> and the kmem_cache_free() of the signal struct itself.  Without the slab
> debug stuff it would either skip the update (by noticing that pointers
> were NULL) or blithely update freed structures.
> 
> I can't see anything that would prevent this from happening in the
> general case.  I don't see what would stop the parent from coming in on
> another CPU and reaping the process after schedule() has picked it off
> the rq but before the deactivate_task->dequeue_task->dequeue_entity->
> update_curr chain completes.  I see that schedule() disables preemption
> on that CPU but will that really do the job?  I also suspect that with
> hyperthreading both of these processes are on the same silicon, meaning
> that one can be unexpectedly suspended while the other runs, thereby
> making the window wider.
> 
> Unfortunately I don't know the code well enough to know what the right
> fix is.  Maybe account_group_exec_runtime() should check for PF_EXITED?

That is just plain ugly. The right fix to me seems to destroy the
signal/thread group stuff _after_ the last task goes away.

> Maybe update_curr() should do that?  I think that it makes more sense
> for dequeue_entity() to do the check and then not call update_curr() if
> the task is exiting but I defer to others with more knowledge of this
> area.

Hell no. Its a race in this signal/thread group stuff, fix it there.

But now you made me look at this patch:

commit f06febc96ba8e0af80bcc3eaec0a109e88275fac
Author: Frank Mayhar <fmayhar@google.com>
Date:   Fri Sep 12 09:54:39 2008 -0700

    timers: fix itimer/many thread hang


and I'm thinking you just rendered big iron unbootable.

You replaced a loop-over-threads by a loop-over-cpus, on every tick. Did
you stop to think what would happen on 4096 cpu machines?

Also, you just introduced per-cpu allocations for each thread-group,
while Christoph is reworking the per-cpu allocator, with one unfortunate
side-effect - its going to have a limited size pool. Therefore this will
limit the number of thread-groups we can have.

Hohumm, not at all liking this..

next prev parent reply	other threads:[~2008-11-06 11:03 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <1224694989.8431.23.camel@oberon>
     [not found] ` <1225132746.14792.13.camel@bobble.smo.corp.google.com>
     [not found]   ` <1225219114.24204.37.camel@oberon>
2008-11-06  1:58     ` regression introduced by - timers: fix itimer/many thread hang Frank Mayhar
2008-11-06 11:03       ` Peter Zijlstra [this message]
2008-11-06 15:03         ` Christoph Lameter
2008-11-06 15:08           ` Peter Zijlstra
2008-11-06 16:08             ` Christoph Lameter
2008-11-06 23:52             ` Frank Mayhar
2008-11-07  8:35               ` Ingo Molnar
2008-11-07 10:29               ` Peter Zijlstra
2008-11-07 18:10                 ` Frank Mayhar
2008-11-07 20:26                   ` Peter Zijlstra
2008-11-10 14:38                     ` Christoph Lameter
2008-11-10 14:42                       ` Peter Zijlstra
2008-11-10 15:41                         ` Christoph Lameter
2008-11-10 18:00                         ` Frank Mayhar
2008-11-14  2:42                           ` Roland McGrath
2008-11-14 16:41                             ` Oleg Nesterov
2008-11-17 14:36                               ` Oleg Nesterov
2008-11-17 18:16                                 ` Roland McGrath
2008-11-17 22:18                                   ` Oleg Nesterov
2008-11-17 21:49                                     ` Roland McGrath
2008-11-11  0:20                         ` Ingo Oeser
2008-11-11 13:58                           ` Christoph Lameter
2008-11-21 18:42                 ` Petr Tesarik
2008-11-21 19:26                   ` Frank Mayhar
2008-11-23 14:24                   ` Peter Zijlstra
2008-11-24  8:46                     ` Petr Tesarik
2008-11-24  9:33                       ` Peter Zijlstra
2008-11-24 12:32                         ` Petr Tesarik
2008-11-24 12:59                           ` Peter Zijlstra
2008-11-24 16:06                             ` Peter Zijlstra
2008-11-06 16:31         ` [PATCH] revert: " Peter Zijlstra
2008-11-06 21:44           ` Ingo Molnar
2008-11-06 21:53             ` Christoph Lameter
2008-11-07 10:19               ` Peter Zijlstra
2008-11-13 16:00           ` Doug Chapman
2008-11-13 16:08             ` Ingo Molnar
2008-11-14 14:10               ` Doug Chapman
     [not found] <20081105191211.c0316b94.akpm@linux-foundation.org>
2008-11-06 12:59 ` regression introduced by - " Oleg Nesterov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1225969420.7803.4366.camel@twins \
    --to=peterz@infradead.org \
    --cc=adobriyan@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=cl@linux-foundation.org \
    --cc=doug.chapman@hp.com \
    --cc=fmayhar@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=roland@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.