Date: Sun, 23 Nov 2014 19:21:08 +0100
From: Oleg Nesterov
To: Borislav Petkov
Cc: lkml, Rik van Riel, Peter Zijlstra, Steven Rostedt, x86-ml
Subject: Re: task_stat splat
Message-ID: <20141123182108.GA15349@redhat.com>
References: <20141123111220.GA6436@pd.tnic> <20141123172256.GA9625@redhat.com> <20141123175641.GA11619@redhat.com> <20141123181717.GA13198@redhat.com>
In-Reply-To: <20141123181717.GA13198@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Damn, sorry for noise ;)

On 11/23, Oleg Nesterov wrote:
>
> On 11/23, Oleg Nesterov wrote:
> >
> > On 11/23, Oleg Nesterov wrote:
> > >
> > > On 11/23, Borislav Petkov wrote:
> > > >
> > > > where we end up with a zero PMD. RIP is corrupted too so we're somewhere
> > > > off in the fields.
> > >
> > > PMD = 0 is fine I guess, addr == 0 is not mapped.
> > >
> > > > Comment over thread_group_cputime() talks about dead tasks accounting
> > >
> > > This comment simply means that we also need to read the accumulated
> > > counters in tsk->signal.
> > >
> > > > which might be relevant as we're seeing not mapped page hierarchy so
> > > > something must have gone away recently but we try to look at it.
> > >
> > > This is called under ->siglock, we can't race with exit/etc. But this
> > > doesn't matter, it is not that we (say) get t == NULL or something like
> > > this.
> > >
> > > RIP == 0, and this looks "impossible", I do not see indirect function
> > > calls in this paths.
> >
> > Ah, I didn't notice you mentioned tip/master... so it looks as if
> > sched_class->update_curr is NULL?
>
> Perhaps this is migration thread? stop_sched_class doesn't have ->update_curr.

Yes, I think this can explain the problem, but

> could you try to cat /proc/pid-of-migration-thread/stat on your machine?

This won't trigger the crash unless it is running.

Oleg.
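
For illustration only, here is a minimal userspace C sketch of the failure mode suspected above: an ops table in which one class leaves an optional hook unset, so an unconditional indirect call through it would jump to address 0. The names rq_sketch, sched_class_sketch, fair_class_sketch, stop_class_sketch and update_curr_checked are hypothetical stand-ins, not the kernel's definitions, and the NULL check is just one possible way to guard such a call, not necessarily what the scheduler code does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a runqueue; not the kernel's struct rq. */
struct rq_sketch {
    unsigned long long runtime_ns;
};

/* Hypothetical stand-in for an ops table with an optional hook. */
struct sched_class_sketch {
    const char *name;
    void (*update_curr)(struct rq_sketch *rq); /* may be NULL */
};

static void fair_update_curr(struct rq_sketch *rq)
{
    rq->runtime_ns += 1000; /* pretend some runtime was accounted */
}

/* Members not named in a designated initializer are zero-initialized. */
static const struct sched_class_sketch fair_class_sketch = {
    .name = "fair",
    .update_curr = fair_update_curr,
};

static const struct sched_class_sketch stop_class_sketch = {
    .name = "stop",
    /* no .update_curr here, so the pointer is NULL */
};

/*
 * Calling sc->update_curr(rq) unconditionally would be a call through a
 * NULL pointer for stop_class_sketch. This guarded variant returns 1 if
 * the hook ran and 0 if the class does not provide one.
 */
static int update_curr_checked(const struct sched_class_sketch *sc,
                               struct rq_sketch *rq)
{
    assert(sc != NULL && rq != NULL);
    if (!sc->update_curr)
        return 0;
    sc->update_curr(rq);
    return 1;
}
```

With these definitions, update_curr_checked(&stop_class_sketch, &rq) skips the call and leaves rq untouched, while the same call with &fair_class_sketch runs the hook.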