From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S933687Ab1JDW43 (ORCPT );
        Tue, 4 Oct 2011 18:56:29 -0400
Received: from mail-ww0-f44.google.com ([74.125.82.44]:38788
        "EHLO mail-ww0-f44.google.com" rhost-flags-OK-OK-OK-OK)
        by vger.kernel.org with ESMTP id S933158Ab1JDW42 (ORCPT );
        Tue, 4 Oct 2011 18:56:28 -0400
To: Frederic Weisbecker
Date: Tue, 4 Oct 2011 23:37:38 +0100
User-Agent: KMail/1.13.6 (Linux/3.1.0-rc8; KDE/4.6.2; x86_64; ; )
Cc: linux-kernel@vger.kernel.org, "Paul E. McKenney", Ingo Molnar,
        Peter Zijlstra
MIME-Version: 1.0
From: Julie Sullivan
Subject: BUG: scheduling while atomic: swapper/0/0x10000002 - spew of 44-odd in all 3.1-rc*
Message-Id: <201110042337.39611.kernelmail.jms@gmail.com>
Content-Type: Text/Plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Frederic and all,

I've been getting a big spew of
'BUG: scheduling while atomic: swapper/0/0x10000002' messages in dmesg
for the current -rc series (not in 3.0).

Bisecting for this produces two behaviours;

1 - affected kernels have typically 44 but there can be between 43 - 47
of these messages in the log (varying on a per boot basis rather than a
per kernel basis.) All the 3.1-rc* kernels are like this. Looking at the
call traces most (but not all) of these seem to be acpi-related. The
traces slightly differ, I wouldn't want to guess what the offending
functions are.

2 - affected kernels have only one 'BUG: ...' message which seems to
always be the same, at least in the examples I've looked at:

[    0.000000] Detected 2393.032 MHz processor.
[    0.001003] Calibrating delay loop (skipped), value calculated using timer frequency.. 4786.06 BogoMIPS (lpj=2393032)
[    0.001008] pid_max: default: 32768 minimum: 301
[    0.001012] BUG: scheduling while atomic: swapper/0/0x10000002
[    0.001020] no locks held by swapper/0.
[    0.001022] Modules linked in:
[    0.001026] Pid: 0, comm: swapper Not tainted 3.1.0-rc6 #96
[    0.001028] Call Trace:
[    0.001036]  [] __schedule_bug+0x75/0x7a
[    0.001041]  [] __schedule+0x95/0x686
[    0.001047]  [] ? kzalloc.clone.0+0x29/0x2b
[    0.001052]  [] __cond_resched+0x2a/0x36
[    0.001055]  [] _cond_resched+0x1b/0x22
[    0.001060]  [] slab_pre_alloc_hook.clone.28+0x3a/0x40
[    0.001064]  [] kmem_cache_alloc_trace+0x2c/0xec
[    0.001068]  [] kzalloc.clone.0+0x29/0x2b
[    0.001073]  [] pidmap_init+0x6a/0xab
[    0.001079]  [] start_kernel+0x2ef/0x37f
[    0.001083]  [] x86_64_start_reservations+0xb6/0xba
[    0.001086]  [] x86_64_start_kernel+0xf2/0xf9
[    0.002039] Security Framework initialized
[    0.002049] SELinux: Initializing.

I'll send a copy of a (sample 44-message) dmesg and my .config shortly.

(btw superficially this looks like a problem discussed by Josh Boyer and
Paul McKenney a few weeks ago but I tried both Josh and Paul's patches
and neither made a difference.)

Bisecting for the 44-odd message behaviour just results in this merge
commit;

commit 1ecc818c51b1f6886825dae3885792d5e49ec798
Merge: 1c09ab0 d902db1
Author: Ingo Molnar
Date:   Fri Jul 1 13:20:51 2011 +0200

    Merge branch 'sched/core-v2' of
    git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing
    into sched/core

but bisecting for the 1-message behaviour more helpfully results in
this:

commit e8f7c70f44f75c827c04239b0ae5f0068b65b76e
Author: Frederic Weisbecker
Date:   Wed Jun 8 01:51:02 2011 +0200

    sched: Make sleeping inside spinlock detection working in !CONFIG_PREEMPT

    Select CONFIG_PREEMPT_COUNT when we enable the sleeping inside
    spinlock detection, so that the preempt offset gets correctly
    incremented/decremented from preempt_disable()/preempt_enable().

    This makes the preempt count eventually working in !CONFIG_PREEMPT
    when that debug option is set and thus fixes the detection of
    explicit preemption disabled sections under such config.
    Code that sleeps in explicitly preempt disabled section can be
    finally spotted in non-preemptible kernels.

    Signed-off-by: Frederic Weisbecker
    Acked-by: Paul E. McKenney
    Cc: Ingo Molnar
    Cc: Peter Zijlstra

So I guess this is not a bug but a change that uncovers other bugs, just
triggering a load of 'scheduling while atomic' messages which didn't
show up before? Although I can't figure out why, if this patch was
released in June, these messages aren't present in 3.0...

Indeed, switching from CONFIG_PREEMPT_VOLUNTARY to CONFIG_PREEMPT
completely gets rid of all of these :-)

As far as I can tell this is a boot-time issue. Starting up seems
straightforward; one kernel boot hung, but I test-booted it again twice
and it was OK. All the other kernels I tested when bisecting (more than
40) also booted OK. System behaviour once up seems unaffected, which is
why I didn't take it especially seriously.

Is this just noise?

Cheers
Julie
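
P.S. For anyone who wants to compare the per-boot message counts
mentioned above (typically 44, sometimes 43 - 47), here is a rough
sketch of how they could be tallied from a saved dmesg dump. It is only
an illustration, not something from the kernel tree: the function names
find_atomic_bugs and summarize are made up for this example, and the
script assumes the message ends with task name, pid and preempt count in
hex, as in 'swapper/0/0x10000002'.

```python
import re
from collections import Counter

# Matches the first line of each report, for example:
#   [    0.001012] BUG: scheduling while atomic: swapper/0/0x10000002
# Assumption: the three '/'-separated fields are the task name (comm),
# the pid, and the preempt count printed in hex.
BUG_RE = re.compile(
    r"BUG: scheduling while atomic: "
    r"(?P<comm>.+)/(?P<pid>-?\d+)/0x(?P<count>[0-9a-fA-F]+)\s*$"
)


def find_atomic_bugs(dmesg_text):
    """Return one (comm, pid, preempt_count) tuple per matching line."""
    hits = []
    for line in dmesg_text.splitlines():
        match = BUG_RE.search(line)
        if match:
            hits.append(
                (
                    match.group("comm"),
                    int(match.group("pid")),
                    int(match.group("count"), 16),
                )
            )
    return hits


def summarize(dmesg_text):
    """Return (total number of reports, Counter keyed by the tuples above)."""
    hits = find_atomic_bugs(dmesg_text)
    return len(hits), Counter(hits)
```

Feeding it the text of a saved log, e.g. summarize(open("dmesg.txt").read())
with dmesg.txt being a hypothetical file name, gives the total for that
boot plus a breakdown by task and preempt count, which makes it easy to
tell the 44-odd behaviour from the 1-message behaviour when bisecting.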