Message-ID: <1431061931.3168.41.camel@gmail.com>
Subject: Re: [KERNEL BUG] do_timer/tick_handover_do_timer 3.10.17
From: Mike Galbraith
To: "Oza (Pawandeep) Oza"
Cc: pawandeep oza, "linux-kernel@vger.kernel.org", malayasen rout
Date: Fri, 08 May 2015 07:12:11 +0200
In-Reply-To: <5C6899BCED92C94EBDCC00F80838E3D52113AB15@SJEXCHMB06.corp.ad.broadcom.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, 2015-05-08 at 04:16 +0000, Oza (Pawandeep) Oza wrote:
> So Mike, is this reason strong enough for you ?

Nope.  I think you did the right thing in removing your dependency on
jiffies reliability in a dying box.  You don't have to convince me of
anything though, CC timer subsystem maintainer, see what he says.

> I understand your point: solve the BUG, and I do tend to agree with you.
>
> But by design and implementation, the BUG() is just a beginning of the end for dying kernel.
> And what happens in between this 'the beginning' and 'the end' is not less important.
> (because say, on our platform we want to get clean RAMDUMP to analyze what happened, and for that we want to get clean reboot)

I don't see anybody else having any trouble getting crash dumps.  I
spent yet another long day just yesterday, rummaging through one.

> Also,
> If somebody's design is to legally Crash the kernel (e.g. where kernel is actually not faulty).
> Then, I do expect that tick/timekeeping framework do its job as long as it can do, and it should do, because kernel is not faulty.
> But in this case it doesn’t handover jiffies incrementing job sanely.

It seems odd to me to use BUG() for what you appear to be using it for..
not that I know exactly what that is, mind you, but when you said that
when some other gizmo in your box has a problem you crash the kernel, my
head tilted to the side - surely there's a more controlled response
possible than poking the big red self destruct button ;-)

	-Mike